Problem set

Overview

A set of problems will be handed out on Friday, 23.02.2024. Solutions must be submitted through Moodle by Thursday, 21.03.2024, 23:59. After an initial assessment, students with insufficient answers will get one additional chance to revisit and correct their work until Thursday, 28.03.2024, 23:59.

Submit your solutions on Moodle.

Most of the solutions will be similar to those of the in-class problems we have been solving each week, or can be found by starting from those exercises. We do not want anyone to struggle for a long time trying to get something to work, so please use the usual channels when you are stuck or otherwise need help:

Our Slack channels
Office hours: Mondays, 14:00 to 15:00
Email: Daniel Gehrig gehrigd@ethz.ch, James O’Brien jobrien@ethz.ch, Melanie Stäubli stmelani@ethz.ch
You can also try our offices (HCI F409-413) but there is no guarantee that someone will be around.

Part 1

Problem scenario: The research expedition Tara Pacific sampled microbial communities in the ocean and coral reefs across the Pacific Ocean. You would like to collect some basic statistics about the project, such as: ‘How many and what kind of samples were collected?’ The following problems will test your ability to solve these questions using the command line interface and by executing basic commands. More information about the Tara Pacific expedition can be found here:

Tara Oceans foundation
Tara Pacific expedition

Question 1

Using the command line interface, connect to the server “cousteau.ethz.ch”. In your home directory, create a new directory named problem_set containing two sub-directories, one named script and one named results.
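
As a minimal sketch of the directory layout asked for here: `mkdir -p` creates a directory together with any missing parents in a single call. The example uses a temporary directory as a stand-in for the home directory (the variable name `demo_home` is hypothetical). Connecting would typically be done with something like `ssh <your ETH user name>@cousteau.ethz.ch`, which is an assumption about how the course accounts are set up.

```shell
# Stand-in for the home directory (hypothetical; on the server this would be "$HOME").
demo_home="$(mktemp -d)"

# -p creates problem_set itself and both sub-directories,
# and does not complain if any of them already exist.
mkdir -p "$demo_home/problem_set/script" "$demo_home/problem_set/results"

# Show the resulting layout.
ls "$demo_home/problem_set"
```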

Question 2

Download the metadata file for the Tara Pacific expedition from the internet into your problem_set directory: https://zenodo.org/records/6299409/files/TARA-PACIFIC_samples-provenance_20220131d.tsv

Question 3

Looking at the column “sample-material_label”, how many samples were taken from “CORAL”? Provide the answer AND the command you used to get your answer.

Question 4

How many unique sample materials (e.g. CORAL, SEDIMENT, FISH, …) are listed under “sample-material_label”?

Question 5

How many samples are listed for each unique sample material? Save the result in the directory results as a file named count_unique_samples.txt.
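
The kind of column counting behind Questions 3 to 5 can be sketched on a toy table. Everything below is illustrative: the file, its column name `material`, and its values are made up, and the sketch assumes a tab-separated file with exactly one header line, which may not hold for the real metadata file.

```shell
# Toy table (hypothetical data, not the Tara Pacific file).
demo_tsv="$(mktemp)"
printf 'sample_id\tmaterial\nS1\tCORAL\nS2\tFISH\nS3\tCORAL\nS4\tWATER\n' > "$demo_tsv"

# Find the 1-based index of the column whose header is exactly "material".
col="$(head -n 1 "$demo_tsv" | tr '\t' '\n' | grep -n -x 'material' | cut -d: -f1)"

# Skip the header, extract that column, and count each distinct value.
# uniq -c only merges adjacent duplicates, hence the sort.
counts="$(tail -n +2 "$demo_tsv" | cut -f "$col" | sort | uniq -c)"
echo "$counts"

# Number of distinct values in the column.
n_unique="$(tail -n +2 "$demo_tsv" | cut -f "$col" | sort -u | wc -l | tr -d ' ')"
echo "$n_unique"
```

Redirecting the output of such a pipeline with `>` writes it to a file instead of the terminal.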

Question 6

Compress the Tara Pacific metadata file in your problem_set directory. Consult the manual of the command to achieve the best-possible compression.
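
As a hedged illustration of compression levels, the sketch below uses `gzip`, which is only one of several common tools (the problem does not say which command is intended). The toy file and the variable names are hypothetical. `-9` asks for the strongest and slowest gzip level, and `-c` writes to standard output so the original file is left in place.

```shell
# Repetitive toy content so that compression has something to work with.
demo_file="$(mktemp)"
yes 'CORAL FISH WATER' | head -n 1000 > "$demo_file"

size_before="$(wc -c < "$demo_file" | tr -d ' ')"

# -9: best compression level; -c: write to stdout, keep the input file.
gzip -9 -c "$demo_file" > "$demo_file.gz"

size_after="$(wc -c < "$demo_file.gz" | tr -d ' ')"
echo "$size_before -> $size_after bytes"
```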

Now, assume you have isolated a microorganism and would like to find out more about it. You have asked a company to sequence the genome of the microbe and have received the data in FASTA format. The file is named <your ETH user name>.GCA_*.fasta where GCA_* is the unique accession identifier that has been assigned to the genome by NCBI/GenBank. The file can be found in the directory /nfs/teaching/551-0132-00L/7_Project/FS24_Genomes/. Begin by copying the genome FASTA file to your script directory. Please provide answers to the following questions about your genome.

Question 7

What is the genus and species name of the organism you isolated?

Question 8

How many lines are in the genome FASTA file you were assigned?

Question 9

How many bases are there in the genome (including all chromosomes and plasmids) you were given? Provide a script named genome_bases.sh in your script folder that will output the number of bases.
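
Counting lines and bases in a FASTA file can be sketched on a toy example. The file below is made up, and the sketch assumes that header lines start with `>`, that every other line contains only sequence characters, and that the file uses Unix line endings.

```shell
# Toy FASTA with two records (hypothetical data).
demo_fasta="$(mktemp)"
printf '>chromosome\nATGCGC\nTTAA\n>plasmid\nGGCC\n' > "$demo_fasta"

# Total number of lines, headers included.
n_lines="$(wc -l < "$demo_fasta" | tr -d ' ')"

# Drop header lines, remove line breaks, and count the remaining characters.
n_bases="$(grep -v '^>' "$demo_fasta" | tr -d '\n' | wc -c | tr -d ' ')"

echo "lines: $n_lines, bases: $n_bases"
```

Removing the line breaks before counting matters, because `wc -c` would otherwise count one extra character per sequence line.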

Question 10

What is the longest open reading frame in the genome provided?

Question 11

What is the GC content (in %) of your genome?
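
GC content is the number of G and C bases divided by the total number of bases, times 100. A minimal sketch on a made-up FASTA file follows. It assumes headers start with `>` and treats every non-header character as a base, so ambiguous bases such as N would count toward the total.

```shell
# Toy FASTA with two records (hypothetical data).
demo_fasta="$(mktemp)"
printf '>chromosome\nATGCGC\nTTAA\n>plasmid\nGGCC\n' > "$demo_fasta"

# Total number of bases: sequence lines only, line breaks removed.
total="$(grep -v '^>' "$demo_fasta" | tr -d '\n' | wc -c | tr -d ' ')"

# Number of G and C bases: delete every character that is not G or C.
gc="$(grep -v '^>' "$demo_fasta" | tr -cd 'GCgc' | wc -c | tr -d ' ')"

# The shell only does integer arithmetic, so awk computes the percentage.
gc_percent="$(awk -v gc="$gc" -v total="$total" 'BEGIN { printf "%.2f", 100 * gc / total }')"
echo "$gc_percent"
```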

Question 12

What is the maximum number of times the codon “ATG” occurs in your genome?
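
One pitfall when counting a motif such as “ATG” in a FASTA file is that sequences are wrapped across lines, so an occurrence can be split by a line break. The sketch below shows the difference on a made-up single-record file. It only looks at the given strand, and it is one possible reading of the question rather than the intended solution.

```shell
# Toy FASTA in which one ATG is split across two lines (hypothetical data).
demo_fasta="$(mktemp)"
printf '>seq\nCCAT\nGATG\n' > "$demo_fasta"

# Counting line by line misses the ATG that spans the line break.
per_line="$(grep -v '^>' "$demo_fasta" | grep -o 'ATG' | wc -l | tr -d ' ')"

# Joining the sequence lines first finds both occurrences.
joined="$(grep -v '^>' "$demo_fasta" | tr -d '\n' | grep -o 'ATG' | wc -l | tr -d ' ')"

echo "per line: $per_line, joined: $joined"
```

With several records in one file, joining all lines also glues the end of one record to the start of the next, which could create a match that does not exist in either sequence.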
