Downloads

The graphical user interface allows for easy and fast inspection of individual genomes, associated annotations, studies and samples. Access to multiple datasets is also possible by using the FTP data backend of OMDB.

Alternatively, data can be downloaded from the command line using the OMDB links file. This file (8 MB, MD5=c1b5f14c9b7899f7300ccf41e62f8681) contains links to all genome and genome annotation files on OMDB.
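
The checksum quoted above can be verified once the links file has been downloaded. The following is a minimal Python sketch, assuming the MD5 refers to the compressed file (OMDBv2.0_data.tsv.gz); the text does not say whether it applies to the compressed or the decompressed file, and the function names are illustrative.

```python
import hashlib

# Checksum quoted in the text for the OMDB links file.
EXPECTED_MD5 = "c1b5f14c9b7899f7300ccf41e62f8681"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling matches_expected("OMDBv2.0_data.tsv.gz") before decompressing should return True for an intact download, if the assumption about the compressed file holds.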

# Download the file: either click on the link above or use curl

$ curl -O https://sunagawalab.ethz.ch/share/microbiomics/ocean/db/2.0/data/catalogs/OMDBv2.0_data.tsv.gz

$ gunzip OMDBv2.0_data.tsv.gz

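Each row of the decompressed file describes one genome, with the following tab-separated fields (shown here for one example genome):

GENOME: GARB21-1_SAMN12799101_MAG_00000001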
SAMPLE: GARB21-1_SAMN12799101_METAG
STUDY: GARB21-1
GENOME_FILE: https://sunagawalab.ethz.ch/share/microbiomics/ocean/db/2.0/data/genomes/genomes/GARB21-1/GARB21-1_SAMN12799101_METAG/GARB21-1_SAMN12799101_MAG_00000001/GARB21-1_SAMN12799101_MAG_00000001.fa.gz
GENES_NT_FILE: https://sunagawalab.ethz.ch/share/microbiomics/ocean/db/2.0/data/genomes/genomes/GARB21-1/GARB21-1_SAMN12799101_METAG/GARB21-1_SAMN12799101_MAG_00000001/GARB21-1_SAMN12799101_MAG_00000001.genes.fna.gz
GENES_AA_FILE: https://sunagawalab.ethz.ch/share/microbiomics/ocean/db/2.0/data/genomes/genomes/GARB21-1/GARB21-1_SAMN12799101_METAG/GARB21-1_SAMN12799101_MAG_00000001/GARB21-1_SAMN12799101_MAG_00000001.genes.faa.gz
GENES_GFF_FILE: https://sunagawalab.ethz.ch/share/microbiomics/ocean/db/2.0/data/genomes/genomes/GARB21-1/GARB21-1_SAMN12799101_METAG/GARB21-1_SAMN12799101_MAG_00000001/GARB21-1_SAMN12799101_MAG_00000001.genes.gff.gz
ANTISMASH_FILE: https://sunagawalab.ethz.ch/share/microbiomics/ocean/db/2.0/data/genomes/genomes/GARB21-1/GARB21-1_SAMN12799101_METAG/GARB21-1_SAMN12799101_MAG_00000001/GARB21-1_SAMN12799101_MAG_00000001-antismash.tar.gz
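
To work with these fields by name rather than by column number, the links file can be read into dictionaries. This is a minimal Python sketch, assuming the columns appear in the order of the example record above and that a header row, if present, starts with GENOME; neither is confirmed by the text, and read_links is an illustrative name.

```python
import csv

# Column order assumed from the example record shown in the text.
COLUMNS = [
    "GENOME", "SAMPLE", "STUDY", "GENOME_FILE",
    "GENES_NT_FILE", "GENES_AA_FILE", "GENES_GFF_FILE", "ANTISMASH_FILE",
]


def read_links(lines):
    """Yield one dict per genome from tab-separated lines of the links file."""
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and an optional header row (assumed to start with GENOME).
        if not row or row[0] == "GENOME":
            continue
        yield dict(zip(COLUMNS, row))
```

An open file handle can be passed directly, for example read_links(open("OMDBv2.0_data.tsv", newline="")), after which record["ANTISMASH_FILE"] gives the antiSMASH URL of each genome.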

For example, to download the antiSMASH file for the genome GARB21-1_SAMN12799101_MAG_00000001, first look up its URL (column 8 of the links file) and then pass it to curl:

$ grep "GARB21-1_SAMN12799101_MAG_00000001" OMDBv2.0_data.tsv | cut -f 8
https://sunagawalab.ethz.ch/share/microbiomics/ocean/db/2.0/data/genomes/genomes/GARB21-1/GARB21-1_SAMN12799101_METAG/GARB21-1_SAMN12799101_MAG_00000001/GARB21-1_SAMN12799101_MAG_00000001-antismash.tar.gz

$ curl -O https://sunagawalab.ethz.ch/share/microbiomics/ocean/db/2.0/data/genomes/genomes/GARB21-1/GARB21-1_SAMN12799101_METAG/GARB21-1_SAMN12799101_MAG_00000001/GARB21-1_SAMN12799101_MAG_00000001-antismash.tar.gz

Alternatively, use download.file in R or the requests module in Python to download data in a more systematic way.
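
A systematic download in Python might look like the sketch below. It uses urllib.request from the standard library rather than the requests module mentioned above, so it runs without extra packages. It assumes STUDY is the third column and the antiSMASH URL the eighth, as in the example record; the function names and the skip-if-present behaviour are illustrative choices.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def urls_for_study(lines, study, column=7):
    """Return the URLs in a 0-based column for rows whose STUDY field matches.

    Column 7 corresponds to `cut -f 8` (assumed to be ANTISMASH_FILE).
    """
    urls = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column and fields[2] == study:
            urls.append(fields[column])
    return urls


def local_name(url):
    """Return the file name at the end of a URL path."""
    return os.path.basename(urlsplit(url).path)


def download_all(urls, dest_dir="."):
    """Download each URL into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        if not os.path.exists(target):
            # Write to a temporary name first so an interrupted transfer
            # is not mistaken for a complete file on the next run.
            partial = target + ".part"
            urllib.request.urlretrieve(url, partial)
            os.replace(partial, target)
        paths.append(target)
    return paths
```

For instance, download_all(urls_for_study(open("OMDBv2.0_data.tsv"), "GARB21-1"), "antismash") would fetch the antiSMASH archives of one study into a local directory.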

2. Catalogs

OMDB genomes and derived genes have been compiled into several catalogs and are released on this page:

Gene Catalog (NT)

Complete genes of all OMDB genomes were called, aggregated and clustered in nucleotide space at different sequence identity thresholds.

Gene Catalog (AA)

Complete genes of all OMDB genomes were called, aggregated and clustered in amino acid space at different sequence identity thresholds.

Genome Catalog

Terminology

Methods

Redundant catalogs and the catalogs dereplicated at 100% were generated with custom scripts.

Catalog	Genes	Clustering Threshold	Singletons	Sequences	Clusters
OMDBv2.0_NT_G_R	508,832,278	No clustering	100%	Sequences - 128GB	Clusters - 5GB
OMDBv2.0_NT_G_NR100	325,384,975	100%	85%	Sequences - 88GB	Clusters - 4GB
OMDBv2.0_NT_G_NR95	103,044,829	95%	57%	Sequences - 27GB	Clusters - 3GB

Catalog	Genes	Clustering Threshold	Singletons	Sequences	Clusters
OMDBv2.0_AA_G_R	508,832,278	No clustering	100%	Sequences - 88GB	Clusters - 5GB
OMDBv2.0_AA_G_NR100	249,518,434	100%	79%	Sequences - 46GB	Clusters - 4GB
OMDBv2.0_AA_G_NR50	28,862,112	50%	53%	Sequences - 4GB	Clusters - 4GB
OMDBv2.0_AA_G_NR30	18,342,415	30%	53%	Sequences - 2GB	Clusters - 4GB

Catalog	Genomes	Clustering Threshold	Singletons	Sequences	Clusters
OMDBv2.0_SC_G_R	69,280,421	No clustering	100%	Sequences - 150GB	Clusters - 1GB
OMDBv2.0_SC_G_NR100	68,726,394	100%	99%	Sequences - 145GB	Clusters - 1GB
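
To make the effect of clustering on the gene catalogs concrete, the share of sequences retained at each threshold can be computed from the gene counts in the tables above. This is a short illustrative Python calculation; only the counts come from the tables, and the helper name is hypothetical.

```python
# Gene counts copied from the nucleotide and amino acid catalog tables.
GENE_COUNTS = {
    "OMDBv2.0_NT_G_R": 508_832_278,
    "OMDBv2.0_NT_G_NR100": 325_384_975,
    "OMDBv2.0_NT_G_NR95": 103_044_829,
    "OMDBv2.0_AA_G_R": 508_832_278,
    "OMDBv2.0_AA_G_NR100": 249_518_434,
    "OMDBv2.0_AA_G_NR50": 28_862_112,
    "OMDBv2.0_AA_G_NR30": 18_342_415,
}


def retained_fraction(catalog, reference):
    """Fraction of the reference catalog's genes that remain in `catalog`."""
    return GENE_COUNTS[catalog] / GENE_COUNTS[reference]


for name in GENE_COUNTS:
    # The redundant catalog of the same sequence space ends in "_R".
    reference = name.rsplit("_", 1)[0] + "_R"
    print(f"{name}\t{retained_fraction(name, reference):.1%}")
```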

The OMDBv2.0_NT_G_NR95 catalog was clustered using MMseqs2 with the following commands:

mmseqs createdb OMDBv2.0_NT_G_R.fna OMDBv2.0_NT_G_NR95.mmseqs.db --dbtype 2 --shuffle 0

mmseqs cluster OMDBv2.0_NT_G_NR95.mmseqs.db OMDBv2.0_NT_G_NR95.mmseqs.db.9590.cluster mmseqs_tmp --kmer-per-seq-scale 0 --kmer-per-seq 1000 -s 4 --max-seq-len 80000 --remove-tmp-files 0 --cluster-mode 2 --min-seq-id 0.95 --threads 96 --cov-mode 1 -c 0.9 --spaced-kmer-mode 0 --alignment-mode 3 --cluster-reassign 1 

mmseqs createtsv OMDBv2.0_NT_G_NR95.mmseqs.db OMDBv2.0_NT_G_NR95.mmseqs.db OMDBv2.0_NT_G_NR95.mmseqs.db.9590.cluster OMDBv2.0_NT_G_NR95.mmseqs.9590.cluster.tsv
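
The TSV written by mmseqs createtsv for a clustering result lists one representative and member pair per line, with each representative also listed as a member of its own cluster. A minimal Python sketch for summarising such a file follows; the function names are illustrative, and it assumes a singleton is a cluster with exactly one member, which may not be how the Singletons column in the tables above was computed.

```python
from collections import Counter


def cluster_sizes(lines):
    """Count members per representative from 'representative<TAB>member' lines."""
    sizes = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sizes[fields[0]] += 1
    return sizes


def singleton_fraction(sizes):
    """Fraction of clusters that contain exactly one sequence."""
    if not sizes:
        return 0.0
    singletons = sum(1 for count in sizes.values() if count == 1)
    return singletons / len(sizes)
```

Passing an open handle on OMDBv2.0_NT_G_NR95.mmseqs.9590.cluster.tsv to cluster_sizes streams the file line by line, although the resulting counter still holds one entry per cluster in memory.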

The OMDBv2.0_AA_G_NR50 catalog was clustered using MMseqs2 with the following command:

mmseqs easy-cluster OMDBv2.0_AA_G_R.faa mmseqs_dir mmseqs_tmp --min-seq-id 0.5 -c 0.9 --cov-mode 1 --threads 96

The OMDBv2.0_AA_G_NR30 catalog was clustered using MMseqs2 with the following command:

mmseqs easy-cluster OMDBv2.0_AA_G_R.faa mmseqs_dir mmseqs_tmp --min-seq-id 0.3 -c 0.9 --cov-mode 1 --threads 96
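
The two amino acid clustering commands differ only in the --min-seq-id value. As a sketch of how such runs could be scripted, the Python helper below assembles the same command line for a given threshold and runs it with subprocess; the function names are hypothetical, and mmseqs is assumed to be installed and on the PATH.

```python
import subprocess


def easy_cluster_command(fasta, prefix, tmp_dir, min_seq_id, threads=96):
    """Build the mmseqs easy-cluster argument list used for the AA catalogs."""
    return [
        "mmseqs", "easy-cluster", fasta, prefix, tmp_dir,
        "--min-seq-id", str(min_seq_id),
        "-c", "0.9",
        "--cov-mode", "1",
        "--threads", str(threads),
    ]


def run_easy_cluster(fasta, prefix, tmp_dir, min_seq_id, threads=96):
    """Run the clustering and raise CalledProcessError if mmseqs fails."""
    command = easy_cluster_command(fasta, prefix, tmp_dir, min_seq_id, threads)
    subprocess.run(command, check=True)
```

Because both commands above write to the same output prefix (mmseqs_dir), a separate prefix per threshold would keep one run from overwriting the results of the other.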