Introduction to Computing and Unix
Today’s Topics
- What is programming?
- Why should you program?
- The Unix environment
First, a few things about computing
See resources page
What is a program?
- a series of commands for your computer
- computers are dumb
- computers read binary
- a compiler or interpreter translates human-readable code into binary instructions (an assembler does this for low-level assembly language)
What is programming?
- one or more scripts saved in text files
- they must be accessible to the operating system
- creating software and scripts is the goal
- the operating system itself is a collection of programs that work together
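Here is a minimal sketch of the points above, assuming a bash shell on Linux or macOS. It shows that a program can be nothing more than commands saved in a text file that the operating system is allowed to run. The file name `hello.sh` and the scratch directory are hypothetical. `mktemp`, `trap`, and `chmod +x` (which marks a file as executable) are not covered elsewhere in these notes.

```shell
#!/usr/bin/env bash
# Sketch: a "program" is a text file of commands the OS is allowed to run.
# hello.sh and the scratch directory are hypothetical names.

workdir=$(mktemp -d)            # scratch directory so nothing real is touched
trap 'rm -rf "$workdir"' EXIT   # remove it when the shell exits
cd "$workdir" || exit 1

# Save two lines of commands into a plain text file.
cat > hello.sh <<'EOF'
#!/usr/bin/env bash
echo "Hello from a script"
EOF

# Mark the file as executable so the operating system will run it.
chmod +x hello.sh

# Run it by its path; ./ means "in the current directory".
output=$(./hello.sh)
echo "$output"
```

Without `chmod +x`, running `./hello.sh` fails with a permission error. This is one concrete meaning of "accessible to the operating system".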
Why program?
- Repeatability
- a script can be a record of what happened
- this is important when things go wrong
- publishing scripts is cool (you want to be cool)
- software builds on itself - get involved
- for example: make a software pipeline that collects, catalogues, and aligns sequences in a repeatable, well-documented fashion; then give to someone else so they can do it too
- Speed
- the first and central goal of computer science
- take advantage of decades of work by other programmers
- for example: “my NGS file is too big and cannot be opened by any text editor known to man!”
- for example: “I need to divide samples from a particular locality by 4.923”
- Automation
- do the same things many times (perhaps with many parameters)
- let’s face it, some tasks are simply below you
- no task is below a computer
- for example: “I collected two months of data on color, sex, body size, and gut content of five different species at 7 different field sites, but my advisor says only take sex and color from 2 species at 5 field sites. How do I put all this in one text file in under 2 seconds?”
Elements of style
- Which language should you use?
- Is your code readable by others?
- Is your code readable by you?
- How can you appropriately break up tasks?
Languages
- so many
- consider:
- speed versus readability
- documentation
- what people in your field use
- Stats: R
- Dense computation: C & C++
- Next-gen: Perl, Python, Unix
- Unix is usually the “glue” in workflows
- Why Python?
- general concepts are almost universal
- readable
- popular
- well-documented
- Why Unix?
- general concepts are almost universal
- operating system written in C
- very fast
- Unix and Unix-like systems (e.g. Linux, macOS) are widely used on personal computers, servers, and supercomputers
A note on backups: Everyone should back up their computer regularly. We will discuss some commands
today that can remove files, or even your entire file system if you are not careful.
File systems
- your computer contains a nested hierarchy of directories
- directory is a folder on your computer which contains files
- keeping track of where you are in the file structure of your computer is an important component of computing
- highest level is the root (denoted: `/`)
- forward slashes divide levels in the nested hierarchy of directories, e.g. `/top_level_directory/second_level_directory`
- there are several high-level directories where program files are stored; users don’t usually need to go into them
- /usr/bin
- /usr/lib
- every file on your computer has an address; if you are going to do an operation on a file, you need its address
- path: the address to a directory or file on your computer. There are, generally, two types of paths:
- absolute/full path represents the path of a given directory or file beginning at the root directory
- relative path represents the path of a given directory/file relative to the working/current directory
- for example, say you have a file “my_favorite_file.txt” located in the directory `/Users/myname/Desktop/my_directory`
- the full path to this file is `/Users/myname/Desktop/my_directory/my_favorite_file.txt`
- the relative path to this file depends on where you are on the computer
- if your working directory is Desktop, the relative path is `my_directory/my_favorite_file.txt`
- if you are in `/Users/myname/`, the relative path becomes `Desktop/my_directory/my_favorite_file.txt`
Remember: whenever you use the full path, you can reach the file from anywhere on your computer. Relative paths change based on your current location.
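You can try the path example safely on a throwaway copy of the same layout. This sketch assumes bash. It builds the directories inside a temporary folder that stands in for `/Users/myname`, so your real Desktop is not touched. `mktemp -d`, `mkdir -p` (create parent directories as needed), and `2>/dev/null` (hide error messages) go beyond the commands covered here. The name `fake_home` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: absolute vs. relative paths on a throwaway copy of the example
# layout. fake_home stands in for /Users/myname (hypothetical).

fake_home=$(mktemp -d)
trap 'rm -rf "$fake_home"' EXIT
mkdir -p "$fake_home/Desktop/my_directory"
echo "favorite" > "$fake_home/Desktop/my_directory/my_favorite_file.txt"

# Absolute (full) path: starts at the root, so it works from anywhere.
full="$fake_home/Desktop/my_directory/my_favorite_file.txt"

# From Desktop, the relative path is my_directory/my_favorite_file.txt
cd "$fake_home/Desktop" || exit 1
from_desktop=$(cat my_directory/my_favorite_file.txt)

# From the home directory, Desktop/ must be added to the relative path
cd "$fake_home" || exit 1
from_home=$(cat Desktop/my_directory/my_favorite_file.txt)

# From the root directory, the full path still works...
cd / || exit 1
from_root=$(cat "$full")
echo "$from_desktop $from_home $from_root"   # prints: favorite favorite favorite

# ...but the Desktop-relative path is now looked up under / and fails
if ! cat my_directory/my_favorite_file.txt 2>/dev/null; then
  echo "not found relative to $(pwd)"
fi
```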
Unix
- commands are small programs
- type the name of a command and hit enter
- the shell searches the directories listed in the PATH variable for the program’s executable file and runs it
- programs accept arguments and options (flags) which change their behavior, e.g. `ls -l`
- these can be found in the program’s manual
- by default, programs look for files in the directory that you are in (the working directory) unless you give them a full path
- we use a “shell” to interact with Unix: it exchanges information between user and program through standard streams
- standard input: input to programs
- standard output: information on screen, i.e. what the program outputs
- standard error: error messages, which also appear on screen by default
- a simple, widely available text editor for Unix is nano
- type `nano` to open it; it runs as a text editor within the shell
- saving, exiting, and other functions are controlled with ctrl + letter keys
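The sketch below shows these ideas in action, assuming bash. It prints the directories the shell searches for commands, then sends data through each of the three standard streams. `command -v`, `printf`, and the `2>` redirection of standard error are not covered in these notes. The file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: how the shell finds a command, and the three standard streams.
# The scratch directory and file names are hypothetical.

# The shell searches the directories listed in $PATH (separated by ":")
# for an executable file with the command's name.
echo "$PATH"
ls_path=$(command -v ls)   # where ls lives on this system, e.g. /bin/ls
echo "$ls_path"

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# stdin: sort reads the two lines coming through the pipe, not a file.
sorted=$(printf 'pear\napple\n' | sort)
echo "$sorted"             # apple, then pear

# stdout and stderr are separate streams: > captures normal output,
# 2> captures error messages. ls on a missing file only writes an error.
ls no_such_file > listing.txt 2> errors.txt || echo "ls failed, as expected"
cat errors.txt             # the error message (wording varies by system)
```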
Let’s try out some commands
Command | Translation | Examples |
---|---|---|
cd | change directory | `cd /absolute/path/of/the/directory/` goes to that directory; go to the home directory by typing simply `cd` or `cd ~`; go up (back) a directory by typing `cd ..` |
pwd | print working directory | `pwd` |
mkdir | make directory | `mkdir newDirectory` creates newDirectory in your current directory; make a directory one level up with `mkdir ../newDirectory` |
cp | copy | `cp file.txt newfile.txt` (and file.txt will still exist!) |
mv | move | `mv file.txt newfile.txt` (but file.txt will no longer exist!) |
rm | remove | `rm file.txt` removes file.txt; `rm -r directoryname/` removes the directory and all files within it (removed files do not go to a trash folder, so be careful) |
ls | list | `ls *.txt` lists all .txt files in the current directory; `ls -a` lists all files, including hidden ones, in the current directory; `ls -l` lists all files in the current directory including file sizes and timestamps; `ls -lh` does the same but shows file sizes in a human-readable format; `ls ../` lists files in the directory above the current one |
man | manual | `man ls` opens the manual for the command ls (press q to exit the page) |
grep | global regular expression print | `grep ">" seqs.fasta` pulls out all sequence names in a FASTA file; `grep -c ">" seqs.fasta` counts the number of those sequences |
cat | concatenate | `cat seqs.fasta` prints the contents of seqs.fasta to the screen (i.e. stdout) |
head | head | `head seqs.fasta` prints the first 10 lines of the file; `head -n 3 seqs.fasta` prints the first 3 lines |
tail | tail | `tail seqs.fasta` prints the last 10 lines of the file; `tail -n 3 seqs.fasta` prints the last 3 lines |
wc | word count | `wc filename.txt` shows the number of newlines, words, and bytes; `wc -l filename.txt` shows only the number of newlines (lines); `wc -c filename.txt` shows only the number of bytes (the same as characters for plain ASCII text) |
sort | sort | `sort filename.txt` sorts the lines of the file and prints the output |
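uniq | unique | `uniq -u filename.txt` shows only the lines that are not repeated, i.e. that appear exactly once (use `sort` first so that repeats sit next to each other); without `-u`, `uniq` collapses each run of repeated lines into one |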
Handy dandy shortcuts
Shortcut | Use |
---|---|
Ctrl + C | kills current process |
Ctrl + L (or `clear`) | clears screen |
Ctrl + A | Go to the beginning of the line |
Ctrl + E | Go to the end of the line |
Ctrl + U | Clears the line before the cursor position |
Ctrl + K | Clear the line after the cursor |
`*` | wildcard character: matches any run of characters in a file name (e.g. `*.txt`) |
Tab | completes a command or file name |
Up Arrow | recalls the previous command |
`.` | current directory |
`..` | one level up |
`~` | home directory |
`>` | redirects stdout to a file, overwriting the file if it already exists |
`>>` | redirects stdout to a file, appending to the end of the file if it already exists |
pipe (`\|`) | redirects stdout to become stdin for the next command |
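Here is a short sketch of the redirection symbols, the pipe, and the `*` wildcard, assuming bash. The file names are made up, and `touch` (create an empty file) is not covered in these notes.

```shell
#!/usr/bin/env bash
# Sketch: >, >>, | and the * wildcard in a scratch directory.
# notes.txt, a.txt, b.txt and c.csv are hypothetical file names.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo "first line" > notes.txt     # > creates notes.txt (or overwrites it)
echo "second line" >> notes.txt   # >> appends to the end of notes.txt
before=$(cat notes.txt)           # first line, second line
echo "$before"

echo "oops" > notes.txt           # > again: the earlier contents are gone
after=$(cat notes.txt)            # oops
echo "$after"

touch a.txt b.txt c.csv           # create three empty files
ls *.txt                          # * matches any name ending in .txt
n_txt=$(ls *.txt | wc -l)         # | sends ls's output into wc: 3 files
echo "$n_txt"
```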
Homework
- using commands and arguments
- Find a partner to do the assignment. Spend 15 minutes using the commands on the cheat sheet with a directory on your computer. Be sure to try different arguments and use both files and directories.
- Look up at least one command and read about all of its options. Share with your partner.
- Create a directory called “My Directory”. What happened?
- redirecting output
- Create a new file containing your bash history.
- How many times did you use the command `ls`?
- Be sure to use these three symbols: `|`, `>`, `>>`
- Google Groups!
- Post at least one question and one answer on the Google Groups page regarding what we covered today.