Sequence data

Topic overview

This lesson introduces the sequence data formats FASTQ and FASTA. You will learn how features such as genes are annotated in the GenBank and GFF formats. Sequence and feature data are available from many online databases, and you will explore the resources hosted by NCBI.
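To make the two sequence formats concrete, here is a minimal sketch in plain Python. The records are made up for illustration, and the helper names `read_fasta` and `read_fastq` are hypothetical, not part of any library. The sketch assumes that a FASTA record is a `>` header line followed by one or more sequence lines, that each FASTQ record spans exactly four lines, and that qualities use the common Phred+33 encoding.

```python
from io import StringIO

# Made-up example records, for illustration only.
FASTA_TEXT = """>seq1 hypothetical example record
ATGGCGTACG
TTAGC
>seq2 another made-up record
GGGCCCAAAT
"""

FASTQ_TEXT = """@read1
ACGTACGT
+
IIIIHHHH
@read2
TTGGCCAA
+
!!!!IIII
"""


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA-formatted handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle):
    """Yield (name, sequence, scores) tuples, assuming four lines per record."""
    lines = [line.rstrip("\n") for line in handle if line.strip()]
    for i in range(0, len(lines), 4):
        name, seq, _separator, qual = lines[i:i + 4]
        # Assumption: Phred+33 encoding, so '!' is 0 and 'I' is 40.
        scores = [ord(char) - 33 for char in qual]
        yield name[1:], seq, scores


for header, sequence in read_fasta(StringIO(FASTA_TEXT)):
    print(header, len(sequence))

for name, sequence, scores in read_fastq(StringIO(FASTQ_TEXT)):
    print(name, sequence, min(scores))
```

In practice a dedicated parser such as Biopython's SeqIO handles edge cases that this sketch ignores, for example wrapped FASTQ records or other quality encodings.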

Learning objectives

Learning objective 1 - You can interpret files containing biological sequence information in standard formats

  • LO 1.1 - You familiarise yourself with the different file formats (FASTA, FASTQ, GenBank, GFF)

  • LO 1.2 - You extract information from these files on the command line

Learning objective 2 - You are able to navigate and search important public databases of biological information

  • LO 2.1 - You inspect major sequence databases (NCBI, ENA, DDBJ)

  • LO 2.2 - You experiment with the Entrez search system of the NCBI

  • LO 2.3 - You download files from databases using the command line

Learning objective 3 - You can use packages for sequence- and feature-based calculations and conversions

  • LO 3.1 - You can use Seq objects and the SeqIO module in Biopython to perform sequence and feature searches and conversions

  • LO 3.2 - You can use Biopython to manipulate sequence and feature files