In-class problems

In-class Problems Week 3

Remember that for the in-class problems, you can use different resources (command line manual/help pages, browser-based searches, AI-based problem solving) to find an answer. There is no right or wrong approach to finding an answer to a problem.

When you work with a genome or sequence file, a few basic statistics are usually of interest. They can be obtained with command-line tools, although it may be easier to use Python or even a specialised program:
  • How large (in bases, or base pairs, bp) is the genome?

  • What is the GC content of the genome?
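As one possible starting point (not a reference solution), the two statistics above can be sketched in plain Python for a FASTA file. The function name `fasta_stats` and the toy records are made up for illustration. The sketch assumes plain FASTA input and counts every character on a sequence line towards the length, including `N` and other ambiguity codes.

```python
def fasta_stats(lines):
    """Return (total_length, gc_fraction) for an iterable of FASTA lines."""
    length = 0
    gc = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and header lines (those starting with ">").
        if not line or line.startswith(">"):
            continue
        seq = line.upper()  # treat soft-masked (lowercase) bases like any other
        length += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_fraction = gc / length if length else 0.0
    return length, gc_fraction


# Toy input standing in for the lines of a real file (hypothetical data).
example = [">chr1 toy sequence", "ATGCGC", "ttaa", ">plasmid1", "GGCC"]
size, gc = fasta_stats(example)
print(f"{size} bp, GC content {gc:.1%}")  # prints "14 bp, GC content 57.1%"
```

For a real file, an open file handle can be passed in directly, for example `with open(path) as handle: fasta_stats(handle)`, where `path` is whatever FASTA file you are working with.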

If the sequence has been annotated with features you may also want to know:

  • How many genes does the genome contain?

  • What are the lengths of those genes?
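A rough sketch of how the gene count and gene lengths might be read from the feature table of a GenBank file using only the standard library. The function name `gene_lengths` and the toy record are assumptions for illustration. The sketch assumes each `gene` location fits on a single line and is written as one or more `start..end` spans; a dedicated parser (for example Biopython's `SeqIO`) handles the many cases this ignores.

```python
import re


def gene_lengths(lines):
    """Return the length in bp of every 'gene' feature in GenBank-formatted lines."""
    lengths = []
    in_features = False
    for line in lines:
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
        if not in_features:
            continue
        fields = line.split()
        # Feature keys are indented by five spaces; qualifier lines are indented further.
        is_key_line = len(fields) == 2 and line.startswith("     ") and line[5] != " "
        if is_key_line and fields[0] == "gene":
            # Collect every start..end span, e.g. in complement(...) or join(...).
            spans = re.findall(r"(\d+)\.\.[<>]?(\d+)", fields[1])
            lengths.append(sum(int(end) - int(start) + 1 for start, end in spans))
    return lengths


# Toy record (hypothetical data) with two gene features.
example = """\
LOCUS       TOY               300 bp    DNA     linear   BCT 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..300
     gene            10..99
                     /gene="abc"
     CDS             10..99
                     /gene="abc"
     gene            complement(120..299)
                     /gene="xyz"
ORIGIN
//
""".splitlines()

lengths = gene_lengths(example)
print(len(lengths), "genes with lengths", lengths)  # prints "2 genes with lengths [90, 180]"
```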

In many cases, you may only be interested in working with gene or protein sequences, or even a single gene. Consider how you might write a Python script that:

  • Outputs the DNA sequences of the genes in a GenBank file to a multi-FASTA file.

  • Outputs the amino acid sequences instead.

  • Takes a GenBank file as an argument (so that you can apply it to any such file you encounter).

  • Takes the ID of a gene feature as an additional argument and outputs only that sequence.
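The command-line side of such a script could be sketched with `argparse` as below. This is a skeleton under stated assumptions, not a worked solution: the option names `--gene-id` and `--protein`, the helper names, and the toy sequences are all hypothetical, and the step that actually reads the GenBank file is left out (it is replaced here by a small dictionary of made-up sequences).

```python
import argparse


def build_parser():
    """Command-line interface: one GenBank file plus optional filters."""
    parser = argparse.ArgumentParser(
        description="Write gene sequences from a GenBank file in FASTA format."
    )
    parser.add_argument("genbank", help="path to the input GenBank file")
    parser.add_argument("--gene-id", help="only output the feature with this ID")
    parser.add_argument("--protein", action="store_true",
                        help="output amino acid sequences instead of DNA")
    return parser


def to_fasta(name, sequence, width=60):
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(chunks) + "\n"


def select(sequences, gene_id=None):
    """Keep all sequences, or only the one whose ID matches gene_id."""
    if gene_id is None:
        return sequences
    return {name: seq for name, seq in sequences.items() if name == gene_id}


# Demo with an explicit argument list; a real script would call parse_args()
# without arguments so that the values come from the command line.
args = build_parser().parse_args(["genome.gbk", "--gene-id", "geneB"])

# Stand-in for sequences extracted from args.genbank (hypothetical data).
toy_sequences = {"geneA": "ATGAAATAG", "geneB": "ATGCCCGGGTAA"}

for name, seq in select(toy_sequences, args.gene_id).items():
    print(to_fasta(name, seq, width=10), end="")
```

Keeping argument parsing, record selection, and FASTA formatting in separate functions makes it easier to swap in a real GenBank reader later without touching the rest.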

Note: There will be no example solutions for the in-class problems. Students are expected to take notes during the lecture. If questions come up, use the Slack channels to get help.