Looking at files
Most of the time we want to explore the contents of files, since files hold the information we are interested in. Several commands can display file contents; which one to use depends on what we want to know and on the size and type of the file. For example, you would not use the same command to search for a specific word in a text file as you would to view an image file.
The command cat stands for concatenate and displays the entire contents of a file directly on the terminal. For large files this floods the terminal and can take a long time, so remember that you can cancel a command in progress with Ctrl + C.
# Concatenate
cat E.coli_K12_MG1655.fna
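The name concatenate makes more sense when cat is given more than one file: it prints them one after another. The following is a minimal sketch, assuming two small made-up files (first.fna and second.fna are hypothetical names, not part of the course data) created in a temporary directory.

```shell
# Hypothetical demo files in a temporary directory (not part of the course data)
demo_dir=$(mktemp -d)
printf '>seq1\nACGT\n' > "$demo_dir/first.fna"
printf '>seq2\nTTGA\n' > "$demo_dir/second.fna"

# One file: print its contents
cat "$demo_dir/first.fna"

# Several files: print them one after another (concatenation)
cat "$demo_dir/first.fna" "$demo_dir/second.fna"
```

The second command prints four lines: the two lines of first.fna followed by the two lines of second.fna.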
The command head displays only the first lines of a file directly on the terminal. Its most useful option is -n:
with no options, head outputs the first 10 lines
-n x outputs the first x lines instead
a negative number, -n -x, outputs all lines except for the last x (supported by GNU head, as found on most Linux systems)
# Show file head
head E.coli_K12_MG1655.fna
head -n 1 E.coli_K12_MG1655.fna
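The effect of the different -n values is easiest to see on a file whose lines are numbered. This is a small sketch, assuming GNU head (for the negative form) and a hypothetical file numbers.txt generated with seq in a temporary directory.

```shell
# Hypothetical demo file: 20 lines containing the numbers 1 to 20
demo_dir=$(mktemp -d)
seq 1 20 > "$demo_dir/numbers.txt"

# Default: the first 10 lines (1 to 10)
head "$demo_dir/numbers.txt"

# The first 3 lines (1, 2, 3)
head -n 3 "$demo_dir/numbers.txt"

# Everything except the last 3 lines (1 to 17); needs GNU head
head -n -3 "$demo_dir/numbers.txt"
```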
The command tail displays only the last lines of a file directly on the terminal. It has similar options to head:
with no options, tail outputs the last 10 lines
-n x outputs the last x lines instead
a number with a plus sign, -n +x (note the "+" character), outputs everything starting at line x, that is, all lines except for the first x - 1
# Show file tail
tail E.coli_K12_MG1655.fna
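The difference between -n x and -n +x is easy to miss, so here is a short sketch on a numbered file. The file numbers.txt is again a hypothetical demo file created in a temporary directory.

```shell
# Hypothetical demo file: 20 lines containing the numbers 1 to 20
demo_dir=$(mktemp -d)
seq 1 20 > "$demo_dir/numbers.txt"

# Default: the last 10 lines (11 to 20)
tail "$demo_dir/numbers.txt"

# The last 3 lines (18, 19, 20)
tail -n 3 "$demo_dir/numbers.txt"

# Start at line 3 and print to the end (3 to 20), skipping the first 2 lines
tail -n +3 "$demo_dir/numbers.txt"
```

A common use of the plus form is tail -n +2, which skips a single header line.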
The command less is a versatile way to look at a file on the command line. Instead of printing the contents of a file directly on the terminal, it opens the file for browsing. You can use the arrow keys, page up, page down, home, end and the spacebar to navigate the file. Pressing 'q' will quit. A number of useful options exist for the command, such as showing line numbers (-N) or displaying long lines without wrapping (-S). It also has a search feature that we will cover later.
# Browse file
less E.coli_K12_MG1655.fna
q # to close the file and return to the command line
The command wc (word count) quickly counts the number of lines (-l), words (-w) and characters (-m) in a file, including invisible characters such as newlines and whitespace. Its options let you choose which value to return. Without options, wc prints the line, word and byte (-c) counts; for plain ASCII text such as this genome file, the byte count equals the character count.
# Count things
wc E.coli_K12_MG1655.fna
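To check what each count means, it helps to run wc on a file small enough to count by hand. This is a minimal sketch, assuming a hypothetical two-line file demo.txt created in a temporary directory.

```shell
# Hypothetical demo file with two lines: "ACGT ACGT" and "TTGA"
demo_dir=$(mktemp -d)
printf 'ACGT ACGT\nTTGA\n' > "$demo_dir/demo.txt"

# No options: line, word and byte counts, followed by the file name
wc "$demo_dir/demo.txt"

# Individual counts
wc -l "$demo_dir/demo.txt"   # 2 lines
wc -w "$demo_dir/demo.txt"   # 3 words
wc -m "$demo_dir/demo.txt"   # 15 characters: 12 letters, 1 space, 2 newlines

# Reading from standard input prints the count without the file name
wc -l < "$demo_dir/demo.txt"
```

The last form is convenient when only the number is needed, for example when storing it in a variable.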
Exercise 1.6
Use cat to look at the E. coli genome file you copied last time. Is cat suitable for looking at this file?
# Your file should be located in your genomes directory
cd ~/genomes/
# If you have not copied and renamed the file yet, you can use these commands to do so
cp /nfs/teaching/551-0132-00L/1_Unix/genomes/bacteria/escherichia/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna ~/genomes
cd ~/genomes
mv GCF_000005845.2_ASM584v2_genomic.fna E.coli_K12_MG1655.fna
# Looking at the file
cat E.coli_K12_MG1655.fna
# Press Ctrl + C to cancel the command: the file is large, so displaying all of it takes a long time
Use head and tail to examine the first and last 10 lines of the genome file. Now try to look at the first and last 20 lines.
# Look at the first 10 lines (10 is the default value)
head E.coli_K12_MG1655.fna
# Look at the last 10 lines
tail E.coli_K12_MG1655.fna
# Look at the first 20 lines
head -n 20 E.coli_K12_MG1655.fna
# Look at the last 20 lines
tail -n 20 E.coli_K12_MG1655.fna
Use less to look at the genome file. Navigate through the file with the keys listed above, then return to the Terminal.
# Looking at the genome file
less E.coli_K12_MG1655.fna
# Press q to quit
Use less together with Tab autocompletion to open the genome file.
# Looking at the genome file
less E
# Press Tab to autocomplete the file name
Can you find out how many lines are in the genome file with the wc command? What happens if you use wc without any options?
# Count the number of lines in the file
wc -l E.coli_K12_MG1655.fna
# Check what would have happened if you only used wc without specified options
wc E.coli_K12_MG1655.fna