![mOTUs logo](https://raw.githubusercontent.com/motu-tool/mOTUs/master/pics/motu_logo.png)

mOTU profiler
========

**This is the alpha release of the mOTUs4 (v4.0.0a) profiler**.

The mOTUs profiler is a computational tool that estimates the (relative) taxonomic abundance of known and currently unknown microbial community members from metagenomic shotgun sequencing data.

If you use mOTUs, please cite:

> **Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments**
>
> Hans-Joachim Ruscheweyh*, Alessio Milanese*, Lucas Paoli, Nicolai Karcher, Quentin Clayssen,
> Marisa Isabell Metzger, Jakob Wirbel, Peer Bork, Daniel R. Mende, Georg Zeller# & Shinichi Sunagawa#
>
> _Microbiome_ (2022)
>
> doi: [10.1186/s40168-022-01410-z](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-022-01410-z)

Pre-requisites/Installation
--------------

The mOTU profiler requires:

* Python = 3.10
* bwa = v0.7.17
* biopython = 1.81
* pysam = 0.21.0

Installation of mOTUs is easiest with the conda package manager. To install conda and enable bioconda, follow the instructions on the official [bioconda](https://bioconda.github.io/) page. Then create a conda environment with the dependencies, download and unpack the mOTUs tool, and download the marker gene database:

```
conda create -n motus4 python=3.10 bwa=0.7.17 pysam=0.21 biopython=1.81
conda activate motus4
wget https://sunagawalab.ethz.ch/share/MOTUS/motus-tool/v4.0.0a/mOTUsv4.0.0a.tar.gz
tar -xzvf mOTUsv4.0.0a.tar.gz
cd motus-tool
python motus.py downloadDB

2024-07-01,12:04:08 INFO: mOTU tool starting
2024-07-01,12:04:08 INFO: Start downloading mOTUs marker gene database. ~6GB
2024-07-01,12:04:20 INFO: Finished downloading mOTUs marker gene database.
2024-07-01,12:04:20 INFO: Start un-taring mOTUs marker gene database.
2024-07-01,12:05:20 INFO: Finished untaring mOTUs marker gene database.
2024-07-01,12:05:20 INFO: mOTU tool shutting down with exitcode 0

python motus.py

Program: motus - a tool for marker gene-based OTU (mOTU) profiling
Version: 4.0.0
Reference: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022). doi: https://doi.org/10.1186/s40168-022-01410-z

motus <command> [options]

 -- Taxonomic profiling
      profile      Perform taxonomic profiling (map_tax + calc_mgc + calc_motu) in a single step
      map_tax      Map reads to the marker gene database
      calc_mgc     Calculate marker gene cluster (MGC) abundance
      calc_motu    Summarize MGC abundances into a mOTU profile

 -- Utilities
      download     Download genomes associated with mOTUs
      downloadDB   Download the mOTUs marker gene database
      merge        Merge several taxonomic profiling results into one table

Type motus <command> to print the help menu for a specific command
motus.py: error: the following arguments are required: command
```

Basic Examples
--------------

The mOTUs profiler takes sequence read files as input, aligns them against the marker gene database and aggregates the alignments into mOTUs. To test the mOTUs profiler, we will download a paired-end sequencing run (ERR479092) from ENA:

```
mkdir motus-test
cd motus-test
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479092/ERR479092_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479092/ERR479092_2.fastq.gz
```

Now run the mOTUs `profile` command:

```
# -f/-r: forward/reverse reads, -n: sample name, -o: output file
# -t: number of threads (adapt to your machine)
python ../motus.py profile -f ERR479092_1.fastq.gz -r ERR479092_2.fastq.gz -n ERR479092 -o ERR479092.motus -t 32

2024-07-01,12:08:07 INFO: mOTU tool starting
2024-07-01,12:08:07 INFO: Loading database ...
2024-07-01,12:08:25 INFO: Loading database finished. Version 4.0 (version date: 2024-06-05) contains 124300 mOTUs, 2075157 markergeneclusters and 3436253 markergenes.
2024-07-01,12:08:25 INFO: Starting mOTUs - map_tax routine - Alignment against the mOTUs database ...
2024-07-01,12:08:25 INFO: Aligning ERR479092_1.fastq.gz
2024-07-01,12:10:04 INFO: Finished alignment. Total reads: 9447842, Total aligned reads 11116, 0.1177% aligned.
2024-07-01,12:10:04 INFO: Aligning ERR479092_2.fastq.gz
2024-07-01,12:11:44 INFO: Finished alignment. Total reads: 9447842, Total aligned reads 28029, 0.2967% aligned.
2024-07-01,12:11:44 INFO: Finished all alignments. Total reads: 18895684, Total aligned reads 39145, 0.2072%
2024-07-01,12:11:44 INFO: Sorting BAM file
2024-07-01,12:12:01 INFO: Finished sorting BAM file
2024-07-01,12:12:01 INFO: Finished mOTUs - map_tax routine - Alignment against the mOTUs database ...
2024-07-01,12:12:02 INFO: Starting mOTUs - calc_mgc routine - Calculating abundances per MGC ...
2024-07-01,12:12:02 INFO: Reading alignment file ...
2024-07-01,12:12:19 INFO: Finished reading alignment file ...
2024-07-01,12:12:19 INFO: Read 31711 aligned inserts of which 39.73% are multimappers
2024-07-01,12:12:19 INFO: Finished mOTUs - calc_mgc routine - Calculating abundances per MGC ...
2024-07-01,12:12:19 INFO: mOTU tool shutting down with exitcode 0
```

This produces the following output files:

```
Output files:
ERR479092.motus        --> The main output file: mOTU abundances, one mOTU per line
ERR479092.motus.relab  --> Same as ERR479092.motus but translated to relative abundances

head -n 10 ERR479092.motus
#TOOL:4.0.0_DB:4.0 report_mode=counts count_mode=INSERT_SCALED min_mgcs=3
MOTU	ERR479092
mOTUv4.0_000002	8
mOTUv4.0_000003	13
mOTUv4.0_000006	64
mOTUv4.0_000007	1
mOTUv4.0_000022	2
mOTUv4.0_000023	3
mOTUv4.0_000026	2
mOTUv4.0_000030	241

head -n 10 ERR479092.motus.relab
#TOOL:4.0.0_DB:4.0 report_mode=relative_abundance count_mode=INSERT_SCALED min_mgcs=3
MOTU	ERR479092
mOTUv4.0_000002	0.00302044
mOTUv4.0_000003	0.00512949
mOTUv4.0_000006	0.02560648
mOTUv4.0_000007	0.00048276
mOTUv4.0_000022	0.00064852
mOTUv4.0_000023	0.00128958
mOTUv4.0_000026	0.00098196
mOTUv4.0_000030	0.09671557

Temporary files (useful for rerunning with different parameters):
ERR479092.motus.bam  --> Filtered alignment file of reads against the mOTUs marker gene database
ERR479092.motus.mgc  --> Alignments aggregated at marker gene cluster (MGC) level (intermediate level between marker gene and mOTU)
```

Profiles can be merged with the `merge` routine. For that, we first need to profile a second sample:

```
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479030/ERR479030_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479030/ERR479030_2.fastq.gz
python ../motus.py profile -f ERR479030_1.fastq.gz -r ERR479030_2.fastq.gz -n ERR479030 -o ERR479030.motus -t 32
```

Next, we merge the two profiles:

```
python ../motus.py merge -i ERR479030.motus ERR479092.motus -o merged.motus

head -n 10 merged.motus
#TOOL:4.0.0_DB:4.0 report_mode=counts count_mode=INSERT_SCALED min_mgcs=3
MOTU	ERR479030	ERR479092
mOTUv4.0_000002	3	8
mOTUv4.0_000003	0	13
mOTUv4.0_000006	1	64
mOTUv4.0_000007	0	1
mOTUv4.0_000009	2	0
mOTUv4.0_000011	1	0
mOTUv4.0_000012	4	0
mOTUv4.0_000016	5	0

head -n 10 merged.motus.relab
#TOOL:4.0.0_DB:4.0 report_mode=relative_abundance count_mode=INSERT_SCALED min_mgcs=3
MOTU	ERR479030	ERR479092
mOTUv4.0_000002	0.00095238	0.00320641
mOTUv4.0_000003	0.00000000	0.00521042
mOTUv4.0_000006	0.00031746	0.02565130
mOTUv4.0_000007	0.00000000	0.00040080
mOTUv4.0_000009	0.00063492	0.00000000
mOTUv4.0_000011	0.00031746	0.00000000
mOTUv4.0_000012	0.00126984	0.00000000
mOTUv4.0_000016	0.00158730	0.00000000
```

All genomes used to build the mOTUs database can be downloaded with the mOTUs tool (or from the [https://motus-db.org/](https://motus-db.org/) website).

```
# List all genomes where the taxonomy matches the word "Angelakisella" exactly
python ../motus.py download -s Angelakisella.genomes -w Angelakisella -l

2024-07-01,12:45:56 INFO: mOTU tool starting
2024-07-01,12:45:56 INFO: Loading database ...
2024-07-01,12:45:56 INFO: Initialising the mOTUs search database.
2024-07-01,12:46:44 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
2024-07-01,12:46:44 INFO: Searching for keyword: Angelakisella.
2024-07-01,12:46:44 INFO: Found: 5485 hits.
2024-07-01,12:46:44 INFO: Found 5485 genomes. Writing genome information to Angelakisella.genomes
2024-07-01,12:46:44 INFO: Finished writing genome information to Angelakisella.genomes
2024-07-01,12:46:44 INFO: mOTU tool shutting down with exitcode 0

head -n 3 Angelakisella.genomes
GENOME	MOTU	PATH	DOMAIN	PHYLUM	CLASS	ORDER	FAMILY	GENUS	SPECIES
ANDE20-1_SAMEA4688840_MAG_00000115	no_mOTU	https://sunagawalab.ethz.ch/share/MOTUS/database/4.0/data/genomes/ANDE20-1/ANDE20-1_SAMEA4688840_METAG/ANDE20-1_SAMEA4688840_MAG_00000115/ANDE20-1_SAMEA4688840_MAG_00000115.fa.gz	Bacteria	Bacillota_A	Clostridia	Oscillospirales	Ruminococcaceae	Angelakisella	Angelakisella sp004554485
ANDE20-1_SAMEA4688843_MAG_00000017	no_mOTU	https://sunagawalab.ethz.ch/share/MOTUS/database/4.0/data/genomes/ANDE20-1/ANDE20-1_SAMEA4688843_METAG/ANDE20-1_SAMEA4688843_MAG_00000017/ANDE20-1_SAMEA4688843_MAG_00000017.fa.gz	Bacteria	Bacillota_A	Clostridia	Oscillospirales	Ruminococcaceae	Angelakisella	Angelakisella sp004554485

# Download all genomes where the taxonomy matches the word "Angelakisella" exactly
python ../motus.py download -s Angelakisella.genomes -w Angelakisella -o Angelakisella_genomes_folder/

2024-07-01,12:47:47 INFO: mOTU tool starting
2024-07-01,12:47:47 INFO: Loading database ...
2024-07-01,12:47:47 INFO: Initialising the mOTUs search database.
2024-07-01,12:48:34 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
2024-07-01,12:48:34 INFO: Searching for keyword: Angelakisella.
2024-07-01,12:48:34 INFO: Found: 5485 hits.
2024-07-01,12:48:34 INFO: Found 5485 genomes. Writing genome information to Angelakisella.genomes
2024-07-01,12:48:34 INFO: Finished writing genome information to Angelakisella.genomes
2024-07-01,12:48:34 INFO: Downloading genomes to Angelakisella_genomes_folder
2024-07-01,12:48:34 INFO: Downloading genome (1 / 5485) ANDE20-1_SAMEA4688840_MAG_00000115 to Angelakisella_genomes_folder/ANDE20-1_SAMEA4688840_MAG_00000115.fa.gz
...
2024-07-01,12:50:08 INFO: Downloading genome (5484 / 5485) ZHUJ18-1_SAMN08993540_MAG_00000012 to Angelakisella_genomes_folder/ZHUJ18-1_SAMN08993540_MAG_00000012.fa.gz
2024-07-01,12:50:08 INFO: Downloading genome (5485 / 5485) ZHUJ18-1_SAMN08993547_MAG_00000060 to Angelakisella_genomes_folder/ZHUJ18-1_SAMN08993547_MAG_00000060.fa.gz
2024-07-01,12:50:08 INFO: Finished downloading genomes
2024-07-01,12:50:08 INFO: mOTU tool shutting down with exitcode 0

# Download all REPRESENTATIVE genomes where the taxonomy matches the word "Angelakisella" exactly
python ../motus.py download -s Angelakisella.representative.genomes -w Angelakisella -o Angelakisella_representative_genomes_folder/ -r

2024-07-01,12:50:59 INFO: mOTU tool starting
2024-07-01,12:50:59 INFO: Loading database ...
2024-07-01,12:50:59 INFO: Initialising the mOTUs search database.
2024-07-01,12:51:47 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
2024-07-01,12:51:47 INFO: Searching for keyword: Angelakisella.
2024-07-01,12:51:47 INFO: Found: 18 hits.
2024-07-01,12:51:47 INFO: Found 18 genomes. Writing genome information to Angelakisella.representative.genomes
2024-07-01,12:51:47 INFO: Finished writing genome information to Angelakisella.representative.genomes
2024-07-01,12:51:47 INFO: Downloading genomes to Angelakisella_representative_genomes_folder
2024-07-01,12:51:47 INFO: Downloading genome (1 / 18) BATT21-1_SAMEA7076242_MAG_00000037 to Angelakisella_representative_genomes_folder/BATT21-1_SAMEA7076242_MAG_00000037.fa.gz
2024-07-01,12:51:47 INFO: Downloading genome (2 / 18) BATT21-1_SAMEA7085688_MAG_00000030 to Angelakisella_representative_genomes_folder/BATT21-1_SAMEA7085688_MAG_00000030.fa.gz
2024-07-01,12:51:47 INFO: Downloading genome (3 / 18) DANK21-1_haib17CEM4890_H2NYMCCXY_SL254772_MAG_00000039 to Angelakisella_representative_genomes_folder/DANK21-1_haib17CEM4890_H2NYMCCXY_SL254772_MAG_00000039.fa.gz
...
2024-07-01,12:51:47 INFO: Downloading genome (17 / 18) XIAO15-1_SAMEA3134386_MAG_00000003 to Angelakisella_representative_genomes_folder/XIAO15-1_SAMEA3134386_MAG_00000003.fa.gz
2024-07-01,12:51:47 INFO: Downloading genome (18 / 18) XIAO16-1_SAMEA3663224_MAG_00000007 to Angelakisella_representative_genomes_folder/XIAO16-1_SAMEA3663224_MAG_00000007.fa.gz
2024-07-01,12:51:47 INFO: Finished downloading genomes
2024-07-01,12:51:47 INFO: mOTU tool shutting down with exitcode 0

# Download the REPRESENTATIVE genome of mOTU "mOTUv4.0_000000"
python ../motus.py download -s mOTUv4.0_000000.representative.genome -w mOTUv4.0_000000 -o mOTUv4.0_000000_representative_genome_folder/ -r

2024-07-01,12:53:00 INFO: mOTU tool starting
2024-07-01,12:53:00 INFO: Loading database ...
2024-07-01,12:53:00 INFO: Initialising the mOTUs search database.
2024-07-01,12:53:47 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
2024-07-01,12:53:47 INFO: Searching for keyword: mOTUv4.0_000000.
2024-07-01,12:53:47 INFO: Found: 1 hits.
2024-07-01,12:53:47 INFO: Found 1 genomes. Writing genome information to mOTUv4.0_000000.representative.genome
2024-07-01,12:53:47 INFO: Finished writing genome information to mOTUv4.0_000000.representative.genome
2024-07-01,12:53:47 INFO: Downloading genomes to mOTUv4.0_000000_representative_genome_folder
2024-07-01,12:53:47 INFO: Downloading genome (1 / 1) RSGB23-1_GCF-023347315-V1_GENO_10000001 to mOTUv4.0_000000_representative_genome_folder/RSGB23-1_GCF-023347315-V1_GENO_10000001.fa.gz
2024-07-01,12:53:47 INFO: Finished downloading genomes
2024-07-01,12:53:47 INFO: mOTU tool shutting down with exitcode 0
```
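The genome table written with `-s` is plain text, so it can be inspected or filtered before (or instead of) downloading. Below is a minimal Python sketch, assuming the file is tab-separated with the `GENOME ... SPECIES` header row shown by `head -n 3 Angelakisella.genomes`. The helper names (`read_genome_table`, `genomes_per_species`, `genome_urls`) are hypothetical and not part of mOTUs. The example input rebuilds the two data rows printed above.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def read_genome_table(handle: TextIO) -> List[Dict[str, str]]:
    """Read a genome table written by `motus.py download -s <file> ...`.

    Assumption: tab-separated, first line is the header
    GENOME, MOTU, PATH, DOMAIN, PHYLUM, CLASS, ORDER, FAMILY, GENUS, SPECIES.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def genomes_per_species(rows: List[Dict[str, str]]) -> Counter:
    """Count how many genomes the table lists for each SPECIES value."""
    return Counter(row["SPECIES"] for row in rows)


def genome_urls(rows: List[Dict[str, str]], motu_assigned_only: bool = False) -> List[str]:
    """Return the PATH (download URL) column, optionally skipping genomes
    whose MOTU column is 'no_mOTU'."""
    return [
        row["PATH"]
        for row in rows
        if not (motu_assigned_only and row["MOTU"] == "no_mOTU")
    ]


# Rebuild the two data rows shown by `head -n 3 Angelakisella.genomes`.
BASE = "https://sunagawalab.ethz.ch/share/MOTUS/database/4.0/data/genomes/ANDE20-1"
HEADER = ["GENOME", "MOTU", "PATH", "DOMAIN", "PHYLUM", "CLASS",
          "ORDER", "FAMILY", "GENUS", "SPECIES"]
TAXONOMY = ["Bacteria", "Bacillota_A", "Clostridia", "Oscillospirales",
            "Ruminococcaceae", "Angelakisella", "Angelakisella sp004554485"]


def example_row(sample: str, genome: str) -> str:
    path = f"{BASE}/{sample}_METAG/{genome}/{genome}.fa.gz"
    return "\t".join([genome, "no_mOTU", path, *TAXONOMY])


EXAMPLE = "\n".join([
    "\t".join(HEADER),
    example_row("ANDE20-1_SAMEA4688840", "ANDE20-1_SAMEA4688840_MAG_00000115"),
    example_row("ANDE20-1_SAMEA4688843", "ANDE20-1_SAMEA4688843_MAG_00000017"),
]) + "\n"

rows = read_genome_table(io.StringIO(EXAMPLE))
print(genomes_per_species(rows))
# Both example rows carry 'no_mOTU', so this list is empty.
print(genome_urls(rows, motu_assigned_only=True))
```

With a real table, `read_genome_table(open("Angelakisella.genomes"))` would replace the `io.StringIO` input, and the URL list could be handed to any downloader of your choice.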
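The `.motus` profiles written by `profile` and `merge` earlier are plain-text tables, so they can be loaded for downstream analysis without extra tooling. The following is a minimal Python sketch, assuming the layout seen in the `head` output: `#`-prefixed metadata lines, a tab-separated `MOTU` header row with one column per sample, and one row per mOTU. `read_motus_profile` and `to_relative` are hypothetical helper names, not part of mOTUs. The renormalisation in `to_relative` is illustrative only; for relative abundances, prefer the `.relab` file the tool writes, whose values need not match this simple calculation.

```python
import csv
import io
from typing import Dict, TextIO

# {sample name: {mOTU id: value}}
Profile = Dict[str, Dict[str, float]]


def read_motus_profile(handle: TextIO) -> Profile:
    """Parse a single-sample or merged mOTUs profile.

    Assumption: '#'-prefixed metadata lines, then a tab-separated header
    'MOTU<TAB>sample...', then one row per mOTU.
    """
    lines = [line for line in handle if line.strip() and not line.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    samples = next(reader)[1:]  # drop the leading 'MOTU' column name
    profile: Profile = {sample: {} for sample in samples}
    for motu, *values in reader:
        for sample, value in zip(samples, values):
            profile[sample][motu] = float(value)
    return profile


def to_relative(counts: Dict[str, float]) -> Dict[str, float]:
    """Divide each count by the total over the mOTUs passed in
    (hypothetical helper; not the tool's own .relab computation)."""
    total = sum(counts.values())
    if total == 0:
        return {motu: 0.0 for motu in counts}
    return {motu: count / total for motu, count in counts.items()}


# First rows of merged.motus from the merge example (counts report).
EXAMPLE = (
    "#TOOL:4.0.0_DB:4.0 report_mode=counts count_mode=INSERT_SCALED min_mgcs=3\n"
    "MOTU\tERR479030\tERR479092\n"
    "mOTUv4.0_000002\t3\t8\n"
    "mOTUv4.0_000003\t0\t13\n"
    "mOTUv4.0_000006\t1\t64\n"
)

profile = read_motus_profile(io.StringIO(EXAMPLE))
detected = {motu: n for motu, n in profile["ERR479030"].items() if n > 0}
print(sorted(detected))
print(to_relative(profile["ERR479092"]))
```

Because the example uses only three rows of the merged table, the fractions it prints are relative to those rows and will not match `merged.motus.relab`.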