**************************************************************************************
* The Ocean Microbial Reference Gene Catalog v2 (OM-RGC.v2) *
**************************************************************************************

Last change: 07.March.2019

**************************************************************************************
* Files *
**************************************************************************************

Existing files:

Gene catalog:
    OM-RGC_v2_release.tsv.gz
    OM-RGC_v2_genes.fna
    OM-RGC_v2_assemblies.tar.gz

Gene profiles:
    OM-RGC_v2_gene_profile_metaG.tsv.gz
    OM-RGC_v2_gene_profile_metaT.tsv.gz

Functional profiles:
    OM-RGC_v2_functional_profile_KEGG.tar.gz
        KO_exp.norm.match.log2.tsv.gz
        KO_metaG.norm.match.log2.tsv.gz
        KO_metaG.norm.tsv.gz
        KO_metaT.norm.match.log2.tsv.gz
        KO_metaT.norm.tsv.gz
    OM-RGC_v2_functional_profile_eggNOG.tar.gz
        OG_exp.norm.match.log2.tsv.gz
        OG_metaG.norm.match.log2.tsv.gz
        OG_metaG.norm.tsv.gz
        OG_metaT.norm.match.log2.tsv.gz
        OG_metaT.norm.tsv.gz
    OM-RGC_v2_functional_profile_eggNOGplusGC.tar.gz
        OGplusGC_exp.norm.match.log2.tsv.gz
        OGplusGC_metaG.norm.match.log2.tsv.gz
        OGplusGC_metaG.norm.tsv.gz
        OGplusGC_metaT.norm.match.log2.tsv.gz
        OGplusGC_metaT.norm.tsv.gz

Taxonomic profiles:
    OM-RGC_v2_taxonomic_profiles.tar.gz
        mitags_tab_class.tsv.gz
        mitags_tab_domain.tsv.gz
        mitags_tab_family.tsv.gz
        mitags_tab_genus.tsv.gz
        mitags_tab_order.tsv.gz
        mitags_tab_otu.tsv.gz
        mitags_tab_phylum.tsv.gz

**************************************************************************************
* File description *
**************************************************************************************

The field separator in all files is a single tab character ('\t').

------------------------------------------------------------
Gene catalog:
------------------------------------------------------------

The file OM-RGC_v2_release.tsv.gz contains all reference genes in a single .tsv file for easy insertion into databases.
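Since the release table is meant for database insertion, the following minimal Python sketch (standard library only) streams it into an SQLite table. The helper name load_release, the table name genes and the header-skipping rule are illustrative assumptions: whether the distributed file starts with a header row, and how missing annotations are encoded, is not stated here, so every field is stored as text exactly as read. Lines are split on the single-tab separator directly instead of using the csv module, so long gene sequences are not limited by csv's default field size.

```python
import gzip
import sqlite3

# Column names in the order given in the OM-RGC.v2 row description (14 fields).
COLUMNS = (
    "gene", "OM-RGC_ID", "KO", "OG", "GC",
    "Domain", "Phylum", "Class", "Order", "Family",
    "Genus", "Species", "Strain", "sequence",
)


def load_release(tsv_gz, conn, batch_size=10000):
    """Stream OM-RGC_v2_release.tsv.gz into a table named 'genes' (hypothetical helper).

    Returns the number of rows inserted. tsv_gz may be a path or a binary file object.
    """
    # Quote identifiers: "OM-RGC_ID" contains a hyphen and "Order" is an SQL keyword.
    col_defs = ", ".join(f'"{c}" TEXT' for c in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS genes ({col_defs})")
    placeholders = ", ".join(["?"] * len(COLUMNS))
    insert = f"INSERT INTO genes VALUES ({placeholders})"

    inserted = 0
    batch = []
    with gzip.open(tsv_gz, "rt") as fh:
        for line in fh:
            if not line.strip():
                continue  # ignore blank lines
            row = line.rstrip("\r\n").split("\t")  # single-tab field separator
            if row[0] == "gene":
                continue  # assumption: a header row, if present, starts with "gene"
            if len(row) != len(COLUMNS):
                raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
            batch.append(row)
            if len(batch) >= batch_size:  # insert in batches to bound memory use
                conn.executemany(insert, batch)
                inserted += len(batch)
                batch.clear()
    if batch:
        conn.executemany(insert, batch)
        inserted += len(batch)
    conn.commit()
    return inserted
```

For example, load_release("OM-RGC_v2_release.tsv.gz", sqlite3.connect("omrgc_v2.sqlite")) would build a database file; adding an index afterwards, e.g. CREATE INDEX genes_ko ON genes("KO"), keeps lookups by KO fast on a catalog of this size.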
Each row of OM-RGC_v2_release.tsv.gz contains the following columns:

     1. gene       Gene identifier (with link to scaffold origin)
     2. OM-RGC_ID  Internal identifier of the sequence (OM-RGC.v2.XXXXXXXXX)
     3. KO         KEGG orthology (KO) annotation of this gene (if available)
     4. OG         eggNOG orthologous group(s) of this gene (if available)
     5. GC         Gene cluster representative ID (if available)
     6. Domain     Taxonomic annotation (Domain)
     7. Phylum     Taxonomic annotation (Phylum)
     8. Class      Taxonomic annotation (Class)
     9. Order      Taxonomic annotation (Order)
    10. Family     Taxonomic annotation (Family)
    11. Genus      Taxonomic annotation (Genus)
    12. Species    Taxonomic annotation (Species)
    13. Strain     Taxonomic annotation (Strain)
    14. sequence   Nucleotide sequence of the gene

The file OM-RGC_v2_genes.fna contains all gene sequences in FASTA format.

The file OM-RGC_v2_assemblies.tar.gz contains one file per sample with all of that sample's assemblies in FASTA format.

------------------------------------------------------------
Gene profiles:
------------------------------------------------------------

Contain the composition (length-normalized insert counts) of each gene in the 180 metagenomic samples [gene_profile_metaG.tsv.gz] and the 187 metatranscriptomic samples [gene_profile_metaT.tsv.gz].

------------------------------------------------------------
Functional profiles:
------------------------------------------------------------

Contain the genomic and transcriptomic composition and the expression of each OG/KO/GC in each sample.
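The gene and functional profile tables can be summarized per sample without loading them fully into memory, which matters most for the gene profiles (one row per catalog gene). The minimal Python sketch below assumes a dense layout: a header row whose first cell labels the identifier column and whose remaining cells are sample IDs, followed by one row per gene, KO, OG or GC with one numeric value per sample. That layout and the helper name summarize_profile are assumptions, not a documented format. For each sample it reports how many features have a value above zero and the column total, which is meaningful for the insert-count and per-cell tables but not for the log2-transformed ones.

```python
import gzip


def summarize_profile(tsv_gz):
    """Return {sample: (n_features_above_zero, column_total)} for a profile table.

    Hypothetical helper; tsv_gz may be a path or a binary file object. Assumes a
    header row (identifier column, then sample IDs) and one numeric value per sample.
    """
    with gzip.open(tsv_gz, "rt") as fh:
        header = fh.readline().rstrip("\r\n").split("\t")
        samples = header[1:]  # column 0 holds the gene/KO/OG/GC identifier
        detected = [0] * len(samples)
        totals = [0.0] * len(samples)
        for line in fh:  # stream row by row instead of loading the whole table
            if not line.strip():
                continue
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != len(header):
                raise ValueError(
                    f"row {fields[0]!r} has {len(fields)} fields, expected {len(header)}"
                )
            for i, value in enumerate(fields[1:]):
                x = float(value)
                totals[i] += x
                if x > 0:
                    detected[i] += 1
    return {s: (detected[i], totals[i]) for i, s in enumerate(samples)}
```

For example, summarize_profile("OM-RGC_v2_gene_profile_metaG.tsv.gz") would give, for every metagenomic sample, the number of catalog genes with a non-zero profile value.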
Profiles are built for three different metrics:
    - genomic composition [*metaG*.tsv.gz]
    - transcriptomic composition [*metaT*.tsv.gz]
    - expression levels (transcriptomic composition / genomic composition) [*exp*.tsv.gz]

Profiles are built for three different functional annotations:
    - KOs [KO_*.tsv.gz]
    - OGs [OG_*.tsv.gz]
    - OGs+GC [OGplusGC_*.tsv.gz]

Profiles are provided with two different normalizations (see details in STAR Methods):
    - per-cell abundance [*norm.tsv.gz]
    - per-cell abundance + variance stabilization + log2 [*norm.match.log2.tsv.gz]

------------------------------------------------------------
Taxonomic profiles:
------------------------------------------------------------

Contain the composition (number of 16S/18S metagenomic reads) of each taxon (and of the unclassified reads) in each sample. Composition tables exist for 7 different taxonomic levels (Domain, Phylum, Class, Order, Family, Genus, 97% OTUs).

**************************************************************************************
* Disclaimer *
**************************************************************************************

The data have been generated according to current standards of scientific conduct. However, the gene coding nucleotide sequences are computational predictions only, and the accuracy of the taxonomic and functional gene assignments depends on the completeness of the reference databases used for annotation. Thus, the data provider cannot be held responsible for potential mispredictions or misannotations.
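The expression metric described under Functional profiles (transcriptomic composition divided by genomic composition) can be illustrated with the minimal Python sketch below. It assumes both per-cell abundance tables (*metaT*.norm.tsv.gz and *metaG*.norm.tsv.gz) have already been parsed into nested dictionaries keyed by feature (KO, OG or OG+GC) and then by sample, and that a metagenome and its matching metatranscriptome share the same sample key; that pairing is an assumption, since the two sample sets contain 180 and 187 samples. Entries where either value is zero are skipped rather than given a pseudocount. This is not the variance-stabilized procedure behind the distributed *exp*.norm.match.log2 files (see STAR Methods), so its output should not be expected to match them; log2_expression is a hypothetical helper.

```python
import math


def log2_expression(meta_t, meta_g):
    """Return {feature: {sample: log2(metaT / metaG)}} where both values are > 0.

    meta_t and meta_g are nested dicts {feature: {sample: per-cell abundance}};
    samples are paired by identical keys (an assumption, see lead-in).
    """
    result = {}
    for feature, t_row in meta_t.items():
        g_row = meta_g.get(feature)
        if g_row is None:
            continue  # feature absent from the metagenomic table
        ratios = {}
        for sample, t in t_row.items():
            g = g_row.get(sample)
            # The ratio is undefined (or -inf on log scale) when either value is zero.
            if g is not None and g > 0 and t > 0:
                ratios[sample] = math.log2(t / g)
        if ratios:
            result[feature] = ratios
    return result
```

On the log2 scale the ratio becomes a difference, log2(T) - log2(G), which is why an expression value of 0 means transcript and gene abundances are equal and positive values mean the feature is over-represented in the metatranscriptome relative to the metagenome.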