This is the alpha release of the mOTUs4 (v4.0.0a) profiler.
The mOTUs profiler is a computational tool that estimates (relative) taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.
If you use mOTUs, please cite:
Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments
Hans-Joachim Ruscheweyh, Alessio Milanese, Lucas Paoli, Nicolai Karcher, Quentin Clayssen, Marisa Isabell Metzger, Jakob Wirbel, Peer Bork, Daniel R. Mende, Georg Zeller# & Shinichi Sunagawa#
Microbiome (2022)
The mOTUs profiler requires Python together with BWA, pysam and Biopython; the conda command below installs all of them.
Installation of mOTUs is easiest with the conda package manager. To install conda and enable bioconda, follow the instructions on the official bioconda page. Then create an environment with the dependencies, download the mOTUs release, and fetch the marker gene database:
conda create -n motus4 python=3.10 bwa=0.7.17 pysam=0.21 biopython=1.81
conda activate motus4
wget https://sunagawalab.ethz.ch/share/MOTUS/motus-tool/v4.0.0a/mOTUsv4.0.0a.tar.gz
tar -xzvf mOTUsv4.0.0a.tar.gz
cd motus-tool
python motus.py downloadDB
2024-07-01,12:04:08 INFO: mOTU tool starting
2024-07-01,12:04:08 INFO: Start downloading mOTUs marker gene database. ~6GB
2024-07-01,12:04:20 INFO: Finished downloading mOTUs marker gene database.
2024-07-01,12:04:20 INFO: Start un-taring mOTUs marker gene database.
2024-07-01,12:05:20 INFO: Finished untaring mOTUs marker gene database.
2024-07-01,12:05:20 INFO: mOTU tool shutting down with exitcode 0
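Before running the profiler, it can help to confirm that the dependencies from the conda environment are visible. The following is a minimal sketch, not part of mOTUs: the function name `check_dependencies` is hypothetical, and the import names (`pysam`, `Bio` for Biopython) and the executable name (`bwa`) are assumptions derived from the conda packages listed above.

```python
import importlib.util
import shutil

# Assumed names: Biopython is imported as "Bio"; the aligner executable is "bwa".
REQUIRED_MODULES = ("pysam", "Bio")
REQUIRED_EXECUTABLES = ("bwa",)


def check_dependencies(modules=REQUIRED_MODULES, executables=REQUIRED_EXECUTABLES):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the executable is not on PATH.
        status[name] = shutil.which(name) is not None
    return status


for name, found in check_dependencies().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `motus4` environment; any entry reported as MISSING points to a package that conda did not install.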
python motus.py
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
Version: 4.0.0
Reference: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
doi: https://doi.org/10.1186/s40168-022-01410-z
motus <command> [options]
-- Taxonomic profiling
profile Perform taxonomic profiling (map_tax + calc_mgc + calc_motu) in a single step
map_tax Map reads to the marker gene database
calc_mgc Calculate marker gene cluster (MGC) abundance
calc_motu Summarize MGC abundances into a mOTU profile
-- Utilities
download Download genomes associated with mOTUs
downloadDB Download the mOTUs marker gene database
merge Merge several taxonomic profiling results into one table
Type motus <command> to print the help menu for a specific command
motus.py: error: the following arguments are required: command
The mOTUs profiler takes sequencing read files as input, aligns them against the marker gene database, and aggregates the alignments into mOTUs. To test the profiler, we download a dataset from ENA:
mkdir motus-test
cd motus-test
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479092/ERR479092_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479092/ERR479092_2.fastq.gz
Now run the mOTUs profile command:
# -t --> number of threads, can be adapted
python ../motus.py profile -f ERR479092_1.fastq.gz -r ERR479092_2.fastq.gz -n ERR479092 -o ERR479092.motus -t 32
2024-07-01,12:08:07 INFO: mOTU tool starting
2024-07-01,12:08:07 INFO: Loading database ...
2024-07-01,12:08:25 INFO: Loading database finished. Version 4.0 (version date: 2024-06-05) contains 124300 mOTUs, 2075157 markergeneclusters and 3436253 markergenes.
2024-07-01,12:08:25 INFO: Starting mOTUs - map_tax routine - Alignment against the mOTUs database ...
2024-07-01,12:08:25 INFO: Aligning ERR479092_1.fastq.gz
2024-07-01,12:10:04 INFO: Finished alignment. Total reads: 9447842, Total aligned reads 11116, 0.1177% aligned.
2024-07-01,12:10:04 INFO: Aligning ERR479092_2.fastq.gz
2024-07-01,12:11:44 INFO: Finished alignment. Total reads: 9447842, Total aligned reads 28029, 0.2967% aligned.
2024-07-01,12:11:44 INFO: Finished all alignments. Total reads: 18895684, Total aligned reads 39145, 0.2072%
2024-07-01,12:11:44 INFO: Sorting BAM file
2024-07-01,12:12:01 INFO: Finished sorting BAM file
2024-07-01,12:12:01 INFO: Finished mOTUs - map_tax routine - Alignment against the mOTUs database ...
2024-07-01,12:12:02 INFO: Starting mOTUs - calc_mgc routine - Calculating abundances per MGC ...
2024-07-01,12:12:02 INFO: Reading alignment file ...
2024-07-01,12:12:19 INFO: Finished reading alignment file ...
2024-07-01,12:12:19 INFO: Read 31711 aligned inserts of which 39.73% are multimappers
2024-07-01,12:12:19 INFO: Finished mOTUs - calc_mgc routine - Calculating abundances per MGC ...
2024-07-01,12:12:19 INFO: mOTU tool shutting down with exitcode 0
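When several paired-end samples need profiling, the command above can be assembled programmatically. The sketch below uses only the flags shown in this README (`-f`, `-r`, `-n`, `-o`, `-t`). The helper names `build_profile_command` and `run_profile`, the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming scheme, and the default script path are assumptions for illustration.

```python
import subprocess
import sys


def build_profile_command(forward, reverse, sample, output, threads=4,
                          motus_script="../motus.py"):
    """Assemble the argument list for `motus.py profile`."""
    return [
        sys.executable, motus_script, "profile",
        "-f", forward,
        "-r", reverse,
        "-n", sample,
        "-o", output,
        "-t", str(threads),
    ]


def run_profile(samples, threads=4):
    """Profile each sample in turn; raises CalledProcessError on a non-zero exit code."""
    for sample in samples:
        cmd = build_profile_command(
            f"{sample}_1.fastq.gz", f"{sample}_2.fastq.gz",
            sample, f"{sample}.motus", threads,
        )
        subprocess.run(cmd, check=True)


# Show the command for the test sample without executing it.
print(" ".join(build_profile_command(
    "ERR479092_1.fastq.gz", "ERR479092_2.fastq.gz",
    "ERR479092", "ERR479092.motus", threads=32,
)))
```

Calling `run_profile(["ERR479092", "ERR479030"], threads=32)` from the `motus-test` directory would profile both samples used in this README, provided the read files are present.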
This produces two output files and two temporary files.
Output files:
ERR479092.motus --> The main output file: mOTU abundances, one mOTU per line
ERR479092.motus.relab --> Same as ERR479092.motus but translated to relative abundances
head -n 10 ERR479092.motus
#TOOL:4.0.0_DB:4.0 report_mode=counts count_mode=INSERT_SCALED min_mgcs=3
MOTU ERR479092
mOTUv4.0_000002 8
mOTUv4.0_000003 13
mOTUv4.0_000006 64
mOTUv4.0_000007 1
mOTUv4.0_000022 2
mOTUv4.0_000023 3
mOTUv4.0_000026 2
mOTUv4.0_000030 241
head -n 10 ERR479092.motus.relab
#TOOL:4.0.0_DB:4.0 report_mode=relative_abundance count_mode=INSERT_SCALED min_mgcs=3
MOTU ERR479092
mOTUv4.0_000002 0.00302044
mOTUv4.0_000003 0.00512949
mOTUv4.0_000006 0.02560648
mOTUv4.0_000007 0.00048276
mOTUv4.0_000022 0.00064852
mOTUv4.0_000023 0.00128958
mOTUv4.0_000026 0.00098196
mOTUv4.0_000030 0.09671557
Temporary files (useful for rerunning with different parameters):
ERR479092.motus.bam --> Filtered alignment file of reads against the mOTUs marker gene database
ERR479092.motus.mgc --> Alignments aggregated at the marker gene cluster (MGC) level, which sits between marker genes and mOTUs
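The profile files shown above are simple text tables, so they are easy to load for downstream analysis. The sketch below assumes the layout visible in the excerpts: one `#` comment line, one header line starting with `MOTU`, then one whitespace-separated row per mOTU. The function names are hypothetical. The normalisation step divides each count by the column total; whether the tool derives the `.relab` file in exactly this way is an assumption, and values computed from an excerpt will differ from those in the full file.

```python
def read_profile(lines):
    """Parse a profile into (comment, sample_names, {mOTU: [values]})."""
    comment, samples, table = None, [], {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            comment = line  # tool and parameter summary
            continue
        fields = line.split()
        if not samples:
            samples = fields[1:]  # header row: "MOTU" followed by sample names
            continue
        table[fields[0]] = [float(value) for value in fields[1:]]
    return comment, samples, table


def to_relative(table, column=0):
    """Divide each value in one column by that column's total."""
    total = sum(values[column] for values in table.values())
    if total == 0:
        return {}
    return {motu: values[column] / total for motu, values in table.items()}


# The first rows of ERR479092.motus as printed above.
EXCERPT = """#TOOL:4.0.0_DB:4.0 report_mode=counts count_mode=INSERT_SCALED min_mgcs=3
MOTU ERR479092
mOTUv4.0_000002 8
mOTUv4.0_000003 13
mOTUv4.0_000006 64
mOTUv4.0_000007 1
mOTUv4.0_000022 2
mOTUv4.0_000023 3
mOTUv4.0_000026 2
mOTUv4.0_000030 241
"""

comment, samples, table = read_profile(EXCERPT.splitlines())
relative = to_relative(table)
print(samples, len(table), max(relative, key=relative.get))
```

To read a real file, pass an open file handle instead of the excerpt, for example `read_profile(open("ERR479092.motus"))`.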
Merging of profiles can be performed using the merge routine. For that, we first need to profile a second sample:
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479030/ERR479030_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479030/ERR479030_2.fastq.gz
python ../motus.py profile -f ERR479030_1.fastq.gz -r ERR479030_2.fastq.gz -n ERR479030 -o ERR479030.motus -t 32
Next we merge the two profile files:
python ../motus.py merge -i ERR479030.motus ERR479092.motus -o merged.motus
head -n 10 merged.motus
#TOOL:4.0.0_DB:4.0 report_mode=counts count_mode=INSERT_SCALED min_mgcs=3
MOTU ERR479030 ERR479092
mOTUv4.0_000002 3 8
mOTUv4.0_000003 0 13
mOTUv4.0_000006 1 64
mOTUv4.0_000007 0 1
mOTUv4.0_000009 2 0
mOTUv4.0_000011 1 0
mOTUv4.0_000012 4 0
mOTUv4.0_000016 5 0
head -n 10 merged.motus.relab
#TOOL:4.0.0_DB:4.0 report_mode=relative_abundance count_mode=INSERT_SCALED min_mgcs=3
MOTU ERR479030 ERR479092
mOTUv4.0_000002 0.00095238 0.00320641
mOTUv4.0_000003 0.00000000 0.00521042
mOTUv4.0_000006 0.00031746 0.02565130
mOTUv4.0_000007 0.00000000 0.00040080
mOTUv4.0_000009 0.00063492 0.00000000
mOTUv4.0_000011 0.00031746 0.00000000
mOTUv4.0_000012 0.00126984 0.00000000
mOTUv4.0_000016 0.00158730 0.00000000
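The merged table above has one column per sample and a zero wherever a mOTU was not detected in a sample. The following sketch reproduces that layout for profiles already loaded into dictionaries. It illustrates the table structure only and is not the implementation of the `merge` routine; the function name `merge_profiles` is hypothetical, and the counts are the first rows of the merged excerpt above.

```python
def merge_profiles(profiles):
    """Combine {sample: {mOTU: count}} into (sample_names, {mOTU: [count per sample]}).

    mOTUs missing from a sample are filled with 0.
    """
    samples = list(profiles)  # keep the order in which samples were given
    motus = sorted(set().union(*(profile.keys() for profile in profiles.values())))
    merged = {
        motu: [profiles[sample].get(motu, 0) for sample in samples]
        for motu in motus
    }
    return samples, merged


# Counts taken from the first rows of the merged excerpt above.
profiles = {
    "ERR479030": {
        "mOTUv4.0_000002": 3, "mOTUv4.0_000006": 1, "mOTUv4.0_000009": 2,
        "mOTUv4.0_000011": 1, "mOTUv4.0_000012": 4, "mOTUv4.0_000016": 5,
    },
    "ERR479092": {
        "mOTUv4.0_000002": 8, "mOTUv4.0_000003": 13,
        "mOTUv4.0_000006": 64, "mOTUv4.0_000007": 1,
    },
}

samples, merged = merge_profiles(profiles)
print("MOTU", *samples)
for motu, counts in merged.items():
    print(motu, *counts)
```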
All genomes used to build the mOTUs database can be downloaded via the mOTUs tool (or via the https://motus-db.org/ website).
# List all genomes where the taxonomy matches the word "Angelakisella" exactly
python ../motus.py download -s Angelakisella.genomes -w Angelakisella -l
2024-07-01,12:45:56 INFO: mOTU tool starting
2024-07-01,12:45:56 INFO: Loading database ...
2024-07-01,12:45:56 INFO: Initialising the mOTUs search database.
2024-07-01,12:46:44 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
2024-07-01,12:46:44 INFO: Searching for keyword: Angelakisella.
2024-07-01,12:46:44 INFO: Found: 5485 hits.
2024-07-01,12:46:44 INFO: Found 5485 genomes. Writing genome information to Angelakisella.genomes
2024-07-01,12:46:44 INFO: Finished writing genome information to Angelakisella.genomes
2024-07-01,12:46:44 INFO: mOTU tool shutting down with exitcode 0
head -n 3 Angelakisella.genomes
GENOME MOTU PATH DOMAIN PHYLUM CLASS ORDER FAMILY GENUS SPECIES
ANDE20-1_SAMEA4688840_MAG_00000115 no_mOTU https://sunagawalab.ethz.ch/share/MOTUS/database/4.0/data/genomes/ANDE20-1/ANDE20-1_SAMEA4688840_METAG/ANDE20-1_SAMEA4688840_MAG_00000115/ANDE20-1_SAMEA4688840_MAG_00000115.fa.gz Bacteria Bacillota_A Clostridia Oscillospirales Ruminococcaceae Angelakisella Angelakisella sp004554485
ANDE20-1_SAMEA4688843_MAG_00000017 no_mOTU https://sunagawalab.ethz.ch/share/MOTUS/database/4.0/data/genomes/ANDE20-1/ANDE20-1_SAMEA4688843_METAG/ANDE20-1_SAMEA4688843_MAG_00000017/ANDE20-1_SAMEA4688843_MAG_00000017.fa.gz Bacteria Bacillota_A Clostridia Oscillospirales Ruminococcaceae Angelakisella Angelakisella sp004554485
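The genome listing can be loaded as a table to summarise hits before downloading anything. The sketch below assumes the file is tab-delimited with the header row shown above; this matters because the SPECIES column contains spaces. The function names are hypothetical, and the two example rows use made-up genome IDs and paths (only the taxonomy values are copied from the listing above).

```python
import csv
import io
from collections import Counter


def read_genome_table(handle):
    """Return one dict per genome from a tab-delimited table with a header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column):
    """Count how many genomes share each value of the given column."""
    return Counter(row[column] for row in rows)


HEADER = ["GENOME", "MOTU", "PATH", "DOMAIN", "PHYLUM", "CLASS",
          "ORDER", "FAMILY", "GENUS", "SPECIES"]
TAXONOMY = ["Bacteria", "Bacillota_A", "Clostridia", "Oscillospirales",
            "Ruminococcaceae", "Angelakisella", "Angelakisella sp004554485"]
# Hypothetical genome IDs and paths, for illustration only.
EXAMPLE_ROWS = [
    ["genome_A", "no_mOTU", "path/to/genome_A.fa.gz"] + TAXONOMY,
    ["genome_B", "no_mOTU", "path/to/genome_B.fa.gz"] + TAXONOMY,
]
example_text = "\n".join("\t".join(row) for row in [HEADER] + EXAMPLE_ROWS) + "\n"

rows = read_genome_table(io.StringIO(example_text))
print(count_by(rows, "SPECIES"))
```

For a real listing, open the file with `newline=""` as the csv module recommends, for example `read_genome_table(open("Angelakisella.genomes", newline=""))`.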
# Download all genomes where the taxonomy matches the word "Angelakisella" exactly
python ../motus.py download -s Angelakisella.genomes -w Angelakisella -o Angelakisella_genomes_folder/
2024-07-01,12:47:47 INFO: mOTU tool starting
2024-07-01,12:47:47 INFO: Loading database ...
2024-07-01,12:47:47 INFO: Initialising the mOTUs search database.
2024-07-01,12:48:34 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
2024-07-01,12:48:34 INFO: Searching for keyword: Angelakisella.
2024-07-01,12:48:34 INFO: Found: 5485 hits.
2024-07-01,12:48:34 INFO: Found 5485 genomes. Writing genome information to Angelakisella.genomes
2024-07-01,12:48:34 INFO: Finished writing genome information to Angelakisella.genomes
2024-07-01,12:48:34 INFO: Downloading genomes to Angelakisella_genomes_folder
2024-07-01,12:48:34 INFO: Downloading genome (1 / 5485) ANDE20-1_SAMEA4688840_MAG_00000115 to Angelakisella_genomes_folder/ANDE20-1_SAMEA4688840_MAG_00000115.fa.gz
...
2024-07-01,12:50:08 INFO: Downloading genome (5484 / 5485) ZHUJ18-1_SAMN08993540_MAG_00000012 to Angelakisella_genomes_folder/ZHUJ18-1_SAMN08993540_MAG_00000012.fa.gz
2024-07-01,12:50:08 INFO: Downloading genome (5485 / 5485) ZHUJ18-1_SAMN08993547_MAG_00000060 to Angelakisella_genomes_folder/ZHUJ18-1_SAMN08993547_MAG_00000060.fa.gz
2024-07-01,12:50:08 INFO: Finished downloading genomes
2024-07-01,12:50:08 INFO: mOTU tool shutting down with exitcode 0
# Download all REPRESENTATIVE genomes where the taxonomy matches the word "Angelakisella" exactly
python ../motus.py download -s Angelakisella.representative.genomes -w Angelakisella -o Angelakisella_representative_genomes_folder/ -r
2024-07-01,12:50:59 INFO: mOTU tool starting
2024-07-01,12:50:59 INFO: Loading database ...
2024-07-01,12:50:59 INFO: Initialising the mOTUs search database.
2024-07-01,12:51:47 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
2024-07-01,12:51:47 INFO: Searching for keyword: Angelakisella.
2024-07-01,12:51:47 INFO: Found: 18 hits.
2024-07-01,12:51:47 INFO: Found 18 genomes. Writing genome information to Angelakisella.representative.genomes
2024-07-01,12:51:47 INFO: Finished writing genome information to Angelakisella.representative.genomes
2024-07-01,12:51:47 INFO: Downloading genomes to Angelakisella_representative_genomes_folder
2024-07-01,12:51:47 INFO: Downloading genome (1 / 18) BATT21-1_SAMEA7076242_MAG_00000037 to Angelakisella_representative_genomes_folder/BATT21-1_SAMEA7076242_MAG_00000037.fa.gz
2024-07-01,12:51:47 INFO: Downloading genome (2 / 18) BATT21-1_SAMEA7085688_MAG_00000030 to Angelakisella_representative_genomes_folder/BATT21-1_SAMEA7085688_MAG_00000030.fa.gz
2024-07-01,12:51:47 INFO: Downloading genome (3 / 18) DANK21-1_haib17CEM4890_H2NYMCCXY_SL254772_MAG_00000039 to Angelakisella_representative_genomes_folder/DANK21-1_haib17CEM4890_H2NYMCCXY_SL254772_MAG_00000039.fa.gz
...
2024-07-01,12:51:47 INFO: Downloading genome (17 / 18) XIAO15-1_SAMEA3134386_MAG_00000003 to Angelakisella_representative_genomes_folder/XIAO15-1_SAMEA3134386_MAG_00000003.fa.gz
2024-07-01,12:51:47 INFO: Downloading genome (18 / 18) XIAO16-1_SAMEA3663224_MAG_00000007 to Angelakisella_representative_genomes_folder/XIAO16-1_SAMEA3663224_MAG_00000007.fa.gz
2024-07-01,12:51:47 INFO: Finished downloading genomes
2024-07-01,12:51:47 INFO: mOTU tool shutting down with exitcode 0
# Download REPRESENTATIVE genome from mOTU "mOTUv4.0_000000"
python ../motus.py download -s mOTUv4.0_000000.representative.genome -w mOTUv4.0_000000 -o mOTUv4.0_000000_representative_genome_folder/ -r
2024-07-01,12:53:00 INFO: mOTU tool starting
2024-07-01,12:53:00 INFO: Loading database ...
2024-07-01,12:53:00 INFO: Initialising the mOTUs search database.
2024-07-01,12:53:47 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
2024-07-01,12:53:47 INFO: Searching for keyword: mOTUv4.0_000000.
2024-07-01,12:53:47 INFO: Found: 1 hits.
2024-07-01,12:53:47 INFO: Found 1 genomes. Writing genome information to mOTUv4.0_000000.representative.genome
2024-07-01,12:53:47 INFO: Finished writing genome information to mOTUv4.0_000000.representative.genome
2024-07-01,12:53:47 INFO: Downloading genomes to mOTUv4.0_000000_representative_genome_folder
2024-07-01,12:53:47 INFO: Downloading genome (1 / 1) RSGB23-1_GCF-023347315-V1_GENO_10000001 to mOTUv4.0_000000_representative_genome_folder/RSGB23-1_GCF-023347315-V1_GENO_10000001.fa.gz
2024-07-01,12:53:47 INFO: Finished downloading genomes
2024-07-01,12:53:47 INFO: mOTU tool shutting down with exitcode 0
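After a large download, it is worth checking that every genome in the listing has arrived. The logs above show each genome being written to `<output folder>/<GENOME>.fa.gz`; the sketch below relies on that naming pattern, which is an assumption taken from the log lines rather than a documented guarantee. The function name `missing_genomes` is hypothetical, and the demonstration uses a temporary folder with made-up genome IDs.

```python
import tempfile
from pathlib import Path


def missing_genomes(genome_ids, folder, suffix=".fa.gz"):
    """Return the genome IDs that have no `<id><suffix>` file in the folder."""
    folder = Path(folder)
    return [
        genome_id for genome_id in genome_ids
        if not (folder / f"{genome_id}{suffix}").is_file()
    ]


# Demonstration with a temporary folder and hypothetical genome IDs.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "genome_A.fa.gz").touch()
    print(missing_genomes(["genome_A", "genome_B"], tmp))
```

In practice the IDs would come from the GENOME column of the file written by `-s`, and the folder would be the one passed to `-o`.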