Search within files

A fast way to search for a specific pattern within a file opened in less (similar to Ctrl+F/Command+F in a text document) is to type /<pattern>. The disadvantage of this approach is that there is no simple way to display all occurrences of the pattern at once. Instead, you have to type n or N to step through the hits one at a time, and less scrolls the file so that the current hit appears on the top line.

# Searching for a pattern within an open file
less E.coli.fna
/AAAAAAAAA + Enter                 # searches for pattern within an open file
q                                  # closes the open file

The command grep allows you to search within files without opening them first with another program. It has a number of useful options to help give you the right output.

# A simple grep
grep "AAAAAAAAA" E.coli.fna        # prints every line containing "AAAAAAAAA" (matches are highlighted if colour output is enabled)

# Useful options
grep -o                             # show only the matching part of each line, one match per output line
grep -c                             # show only a count of the lines that contain a match
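
The difference between -c and -o is easiest to see on a small file. The sketch below is only an illustration: demo.fna and its two sequences are made up for this example and are not part of the course data. It counts matching lines with -c and then counts individual matches by piping the output of -o into wc -l.

```shell
# Build a tiny two-sequence demo file (demo.fna is a hypothetical name, not a course file)
printf '>seq1\nAAAAAAAAATTTAAAAAAAAA\n>seq2\nCCCAAAAAAAAAGGG\n' > demo.fna

# -c counts the lines that contain at least one match
matching_lines=$(grep -c "AAAAAAAAA" demo.fna)

# -o prints every match on its own line, so wc -l counts the matches themselves
# (tr strips the padding that some versions of wc add)
total_matches=$(grep -o "AAAAAAAAA" demo.fna | wc -l | tr -d ' ')

echo "matching lines: $matching_lines"   # prints: matching lines: 2
echo "total matches: $total_matches"     # prints: total matches: 3

# Remove the demo file again
rm demo.fna
```

The first sequence line contains the pattern twice, so the two numbers differ: two lines match, but there are three matches in total.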

Exercise 1.7

  • Navigate to the directory you copied the E. coli files to earlier.

# Navigation
cd ~/ecoli
  • Use less to look at the GCF_000005845.2_ASM584v2_cds_from_genomic.fna file, containing nucleotide gene sequences.

# Look at the file
less GCF_000005845.2_ASM584v2_cds_from_genomic.fna
# Press q to quit
  • Search within less to find the sequence for dnaA (searching within opened file).

less GCF_000005845.2_ASM584v2_cds_from_genomic.fna
/dnaA
# Type n (next hit) or N (previous hit) to see if there are more search hits
# Press q to quit
  • Use grep to find the sequence for dnaA within the file (without opening it).

grep "dnaA" GCF_000005845.2_ASM584v2_cds_from_genomic.fna
  • Use grep -o to only show the matches for the pattern dnaA within the file. How does the output change?

grep -o "dnaA" GCF_000005845.2_ASM584v2_cds_from_genomic.fna
  • Use grep -c to count how many lines in the file contain dnaA.

grep -c "dnaA" GCF_000005845.2_ASM584v2_cds_from_genomic.fna
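
Because grep -c counts every line that contains the pattern, the number it reports can be larger than the number of dnaA genes if the name also appears in the description of another gene. The sketch below shows one possible way to narrow the search. It assumes the header lines carry a tag of the form [gene=dnaA], and the file demo_cds.fna with its two records is invented for this illustration. The -F option makes grep treat the pattern as a fixed string, so the square brackets are matched literally.

```shell
# Build a tiny demo file with two made-up records (hypothetical headers and sequences)
printf '>cds_1 [gene=dnaA] [protein=initiator protein]\nATGTCACTT\n>cds_2 [gene=dnaN] [protein=works with dnaA product]\nATGAAATTT\n' > demo_cds.fna

# Counts every line mentioning dnaA, including the description of cds_2
all_lines=$(grep -c "dnaA" demo_cds.fna)

# -F searches for the literal text, so only the header tagged [gene=dnaA] is counted
gene_lines=$(grep -F -c "[gene=dnaA]" demo_cds.fna)

echo "lines mentioning dnaA: $all_lines"    # prints: lines mentioning dnaA: 2
echo "headers tagged dnaA: $gene_lines"     # prints: headers tagged dnaA: 1

# Remove the demo file again
rm demo_cds.fna
```

Check the header format of your own file with less before relying on a tag like this.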