Sequence data
Topic overview
In this lesson you will be introduced to the sequence data formats FASTA and FASTQ, and you will learn how features such as genes are annotated in the GenBank and GFF formats. Sequence and feature data are available from many online databases; here you will explore the NCBI resources.
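As a first orientation, the sketch below shows roughly what FASTA and FASTQ records look like and how they could be read with plain Python. The records, the function names `read_fasta` and `read_fastq`, and the assumptions of four lines per FASTQ record and Phred+33 quality encoding are all illustrative and not taken from the lesson material.

```python
from io import StringIO

# Hypothetical example records, invented for illustration only.
FASTA_TEXT = """>seq1 example description
ATGCGTAC
GGTTAA
>seq2
TTGACA
"""

FASTQ_TEXT = """@read1
ATGCGT
+
IIIIH#
"""


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA-formatted handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle):
    """Yield (name, sequence, scores), assuming four lines per record."""
    lines = [line.rstrip("\n") for line in handle if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        name, seq, _plus, qual = lines[i:i + 4]
        # Assumes Phred+33 encoding: quality score = ASCII code - 33.
        scores = [ord(ch) - 33 for ch in qual]
        yield name[1:], seq, scores


fasta_records = list(read_fasta(StringIO(FASTA_TEXT)))
fastq_records = list(read_fastq(StringIO(FASTQ_TEXT)))

for header, seq in fasta_records:
    print(header, len(seq))
for name, seq, scores in fastq_records:
    print(name, seq, scores)
```

Real files contain edge cases that this sketch does not handle, such as FASTQ records wrapped over more than four lines, so a tested parser is the safer choice in practice.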
Learning objectives
Learning objective 1 - You can interpret files containing biological sequence information in standard formats
LO 1.1 - You familiarise yourself with the different file formats (FASTA, FASTQ, GenBank, GFF)
LO 1.2 - You extract information from these files on the command line
Learning objective 2 - You are able to navigate and search important public databases of biological information
LO 2.1 - You inspect major sequence databases (NCBI, ENA, DDBJ)
LO 2.2 - You experiment with the Entrez search system of the NCBI
LO 2.3 - You download files from databases with the command line
Learning objective 3 - You can use packages that support sequence- and feature-based calculations and conversions
LO 3.1 - You can use Seq objects and the SeqIO module within Biopython to perform sequence and feature searches and conversions
LO 3.2 - You can use Biopython to manipulate sequence and feature files