Working with files

Looking at files

The command cat displays the entire contents of a file directly on the terminal. For a large file, such as a genome, this floods the terminal with output for a long time, so remember that you can cancel a command in progress with ctrl + c.

# ConCATenate
cat E.coli_K12_MG1655.fna
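The name cat comes from "concatenate": given several files, it prints them one after another. The minimal sketch below shows this with two tiny practice files. The file names a.fna and b.fna and their sequences are made up for illustration, printf is used only to write them, and everything happens in a temporary directory created with mktemp -d, so use cd ~/genomes afterwards to return.

```shell
# Work in a throwaway directory so the practice files don't clutter ~/genomes
cd "$(mktemp -d)"

# Create two tiny FASTA files (hypothetical names and sequences, for practice only)
printf '>seq1\nACGT\n' > a.fna
printf '>seq2\nTTGA\n' > b.fna

# cat prints the files one after another, i.e. it concatenates them
cat a.fna b.fna
```

The records appear in the order the files were given on the command line: first >seq1 and ACGT, then >seq2 and TTGA.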

The command head displays only the first 10 lines of a file directly on the terminal. Among its options, -n x outputs the first x lines instead, and a negative number (-n -x) outputs every line except the last x.

# Show file head
head E.coli_K12_MG1655.fna
head -n 1 E.coli_K12_MG1655.fna
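To see the effect of -n with positive and negative numbers without scrolling through a genome, the sketch below uses seq to write the numbers 1 to 15 into a hypothetical practice file, numbers.txt, in a temporary directory. It assumes GNU head, as found on most Linux systems; the BSD head shipped with macOS may reject negative line counts.

```shell
# Work in a throwaway directory (use cd ~/genomes to return afterwards)
cd "$(mktemp -d)"

# seq prints 1 to 15, one number per line; > saves the output to a practice file
seq 15 > numbers.txt

# The first 3 lines: 1, 2, 3
head -n 3 numbers.txt

# Every line except the last 12: also 1, 2, 3 (negative counts need GNU head)
head -n -12 numbers.txt
```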

The command tail displays only the last 10 lines of a file directly on the terminal. It has options similar to those of head: -n x outputs the last x lines, and a number with a leading plus sign, -n +x (note the “+” character), outputs the lines starting from line x, i.e. every line except the first x-1.

# Show file tail
tail E.coli_K12_MG1655.fna
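The difference between -n 3 and -n +13 is easiest to see on a short file. In the sketch below, numbers.txt is again a hypothetical practice file holding the numbers 1 to 15, created with seq in a temporary directory. Both commands print 13, 14 and 15 here, because +13 means "start at line 13" rather than "skip 13 lines".

```shell
# Work in a throwaway directory (use cd ~/genomes to return afterwards)
cd "$(mktemp -d)"

# Practice file with the numbers 1 to 15, one per line
seq 15 > numbers.txt

# The last 3 lines: 13, 14, 15
tail -n 3 numbers.txt

# Start at line 13, i.e. skip the first 12 lines: also 13, 14, 15
tail -n +13 numbers.txt
```

A common use of the plus form is tail -n +2, which prints a file without its first line, for example without the single header line of a FASTA file.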

The command less is a versatile way to look at a file on the command line. Instead of printing the contents of a file directly on the terminal, it ‘opens’ the file for browsing. You can use the arrow keys, Page Up, Page Down, Home, End and the spacebar to navigate the file; pressing q quits. Several useful options exist, such as -N to show line numbers and -S to display long lines without wrapping them. It also has a search feature that we will cover later.

# Browse file
less E.coli_K12_MG1655.fna

The command wc (short for "word count") quickly counts the lines, words and characters (strictly, bytes) in a file, including invisible characters such as newlines and spaces. Options such as -l, -w and -c let you choose which value to report; without options it gives all three.

# Count things
wc E.coli_K12_MG1655.fna
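To check what wc actually counts, the sketch below builds a tiny hypothetical file, tiny.fna, with two lines of four bases each, in a temporary directory. The byte count of 10 includes the two invisible newline characters at the end of each line. The exact spacing of the output may differ between systems.

```shell
# Work in a throwaway directory (use cd ~/genomes to return afterwards)
cd "$(mktemp -d)"

# Write two lines of four bases each; each \n becomes one newline byte
printf 'ACGT\nTTGA\n' > tiny.fna

# All three counts: 2 lines, 2 words, 10 bytes
wc tiny.fna

# Only the line count: 2
wc -l tiny.fna

# Only the word count: 2
wc -w tiny.fna

# Only the byte count: 10 (8 bases + 2 newlines)
wc -c tiny.fna
```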

Exercise 0.5

  • Use cat to look at the E. coli genome file you copied last time. Is cat suitable for looking at this file?

# Your file should be located in your genomes directory
cd ~/genomes/

# If you have not copied and renamed the file yet, you can use these commands to do so
cp /nfs/teaching/551-0132-00L/1_Unix/genomes/bacteria/escherichia/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna ~/genomes
cd ~/genomes
mv GCF_000005845.2_ASM584v2_genomic.fna E.coli_K12_MG1655.fna

# Looking at the file
cat E.coli_K12_MG1655.fna
# Press ctrl + c to cancel the command

  • Use head and tail to examine the first and last 10 lines of the genome file. Now try to look at the first and last 20 lines.

# Look at the first 10 lines (10 is the default value)
head E.coli_K12_MG1655.fna

# Look at the last 10 lines
tail E.coli_K12_MG1655.fna

# Look at the first 20 lines
head -n 20 E.coli_K12_MG1655.fna

# Look at the last 20 lines
tail -n 20 E.coli_K12_MG1655.fna

  • Use less to look at the genome file. Navigate through the file with the keys listed above, then press q to return to the terminal.

# Looking at the genome file
less E.coli_K12_MG1655.fna
# Press q to quit

  • Use the man command we learned to read about the wc command.

# Read about the wc command
man wc

  • Can you find out how many lines are in the genome file with the wc command?

# Count the number of lines in the file
wc -l E.coli_K12_MG1655.fna