Working with files
Looking at files
The command cat prints the entire contents of a file directly to the terminal. For large files this can flood the terminal with output for a long time, so remember that you can cancel a running command with ctrl + c.
# ConCATenate
cat E.coli_K12_MG1655.fna
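The name cat is short for concatenate: given several files, it prints them one after another. A minimal sketch of this, using two small throwaway files created with mktemp (hypothetical demo data, not part of the course files):

```shell
# Create two small throwaway files (hypothetical demo data, not course files)
part1=$(mktemp)
part2=$(mktemp)
printf '>seq1\nACGT\n' > "$part1"
printf '>seq2\nTTGA\n' > "$part2"

# cat prints the files one after another, in the order given
combined=$(cat "$part1" "$part2")
echo "$combined"

# Clean up the throwaway files
rm -f "$part1" "$part2"
```

The output is simply the four lines of the two files, first file first.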
The command head displays only the first 10 lines of a file directly on the terminal. Among its options, -n x outputs the first x lines instead, and a negative number, -n -x, outputs all lines except the last x (this works with GNU head, as found on most Linux systems).
# Show file head
head E.coli_K12_MG1655.fna
head -n 1 E.coli_K12_MG1655.fna
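To see the difference between a positive and a negative line count without scrolling through a genome, here is a small sketch on a throwaway five-line file. The file and the variable names are made up for illustration, and the negative count assumes GNU head:

```shell
# Create a throwaway 5-line file (hypothetical demo data)
demo=$(mktemp)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$demo"

# First 2 lines: line1 and line2
first_two=$(head -n 2 "$demo")
echo "$first_two"

# All lines except the last 2: line1 to line3 (negative counts assume GNU head)
all_but_last_two=$(head -n -2 "$demo")
echo "$all_but_last_two"

# Clean up the throwaway file
rm -f "$demo"
```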
The command tail displays only the last 10 lines of a file directly on the terminal. It has similar options to head: -n x outputs the last x lines, and -n +x (note the “+” character) outputs everything starting from line x, that is, all lines except the first x - 1.
# Show file tail
tail E.coli_K12_MG1655.fna
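The two forms of the -n option are easy to mix up, so here is a small sketch on a throwaway five-line file (the file and variable names are made up for illustration). Note that +2 means "start at line 2", so only the first line is skipped:

```shell
# Create a throwaway 5-line file (hypothetical demo data)
demo=$(mktemp)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$demo"

# Last 2 lines: line4 and line5
last_two=$(tail -n 2 "$demo")
echo "$last_two"

# Everything from line 2 onwards: line2 to line5 (only line1 is skipped)
from_line_two=$(tail -n +2 "$demo")
echo "$from_line_two"

# Clean up the throwaway file
rm -f "$demo"
```

The +x form is handy when a file starts with a single header line that you want to skip.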
The command less is a versatile way to look at a file on the command line. Instead of showing you the contents of a file directly on the terminal, it ‘opens’ the file to browse. You can use the arrow keys, page up, page down, home, end and the spacebar to navigate the file. Pressing q will quit. A number of useful options exist for the command, such as showing line numbers (-N) or displaying without line wrapping (-S). It also has a search feature that we will cover later.
# Browse file
less E.coli_K12_MG1655.fna
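The command wc quickly counts the number of lines, words and characters in a file, including invisible characters like ‘newline’ and whitespace. Strictly speaking, the third count is in bytes, which is the same as the number of characters for an ASCII file like this one. Its options let you choose which value to return (-l for lines, -w for words, -c for bytes, -m for characters); otherwise it prints all three.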
# Count things
wc E.coli_K12_MG1655.fna
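As a rough illustration of the individual counts, the sketch below runs wc on a tiny throwaway file whose contents are easy to count by hand. The file and variable names are made up for illustration. Feeding the file in with < is an extra step, used here only so that wc prints the bare number without the file name:

```shell
# Create a throwaway 2-line file (hypothetical demo data)
demo=$(mktemp)
printf 'ACGT ACGT\nTTGA\n' > "$demo"

# Reading from standard input (<) makes wc print only the number
n_lines=$(wc -l < "$demo")   # 2 newline characters
n_words=$(wc -w < "$demo")   # 3 whitespace-separated words
n_bytes=$(wc -c < "$demo")   # 15 bytes, counting the space and both newlines
echo "lines: $n_lines, words: $n_words, bytes: $n_bytes"

# Clean up the throwaway file
rm -f "$demo"
```

The byte count is larger than the 12 visible letters because the space and the two newline characters are counted as well.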
Exercise 0.5
Use cat to look at the E. coli genome file you copied last time. Is cat a suitable way to look at this file?
# Your file should be located in your genomes directory
cd ~/genomes/
# If you have not copied and renamed the file yet, you can use these commands to do so
cp /nfs/teaching/551-0132-00L/1_Unix/genomes/bacteria/escherichia/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna ~/genomes
cd ~/genomes
mv GCF_000005845.2_ASM584v2_genomic.fna E.coli_K12_MG1655.fna
# Looking at the file
cat E.coli_K12_MG1655.fna
# Press ctrl + c to cancel the command
Use head and tail to examine the first and last 10 lines of the genome file. Now try to look at the first and last 20 lines.
# Look at the first 10 lines (10 is the default value)
head E.coli_K12_MG1655.fna
# Look at the last 10 lines
tail E.coli_K12_MG1655.fna
# Look at the first 20 lines
head -n 20 E.coli_K12_MG1655.fna
# Look at the last 20 lines
tail -n 20 E.coli_K12_MG1655.fna
Use less to look at the genome file. Navigate through the file with the keys listed above, then press q to return to the terminal.
# Looking at the genome file
less E.coli_K12_MG1655.fna
# Press q to quit
Use the man command we learned to read about the wc command.
# Read about the wc command
man wc
Can you find out how many lines are in the genome file with the wc command?
# Count the number of lines in the file
wc -l E.coli_K12_MG1655.fna