
mOTU profiler

This is the alpha release of the mOTUs4 (v4.0.0a) profiler.

The mOTUs profiler is a computational tool that estimates the (relative) taxonomic abundance of known and currently unknown microbial community members from metagenomic shotgun sequencing data.

If you use mOTUs, please cite:

Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments

Hans-Joachim Ruscheweyh, Alessio Milanese, Lucas Paoli, Nicolai Karcher, Quentin Clayssen, Marisa Isabell Metzger, Jakob Wirbel, Peer Bork, Daniel R. Mende, Georg Zeller# & Shinichi Sunagawa#

Microbiome (2022)

doi: 10.1186/s40168-022-01410-z

Prerequisites/Installation

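The mOTU profiler requires Python, BWA, pysam and Biopython. The conda command below installs Python 3.10, BWA 0.7.17, pysam 0.21 and Biopython 1.81.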

Installation of mOTUs is easiest with the conda package manager. To install conda and enable bioconda, please follow the instructions on the official bioconda page. Then create a conda environment with the dependencies, download and unpack the mOTUs release, and download the marker gene database:

conda create -n motus4 python=3.10 bwa=0.7.17 pysam=0.21 biopython=1.81
conda activate motus4
wget https://sunagawalab.ethz.ch/share/MOTUS/motus-tool/v4.0.0a/mOTUsv4.0.0a.tar.gz
tar -xzvf mOTUsv4.0.0a.tar.gz
cd motus-tool
python motus.py downloadDB

    2024-07-01,12:04:08 INFO: mOTU tool starting
    2024-07-01,12:04:08 INFO: Start downloading mOTUs marker gene database. ~6GB
    2024-07-01,12:04:20 INFO: Finished downloading mOTUs marker gene database.
    2024-07-01,12:04:20 INFO: Start un-taring mOTUs marker gene database.
    2024-07-01,12:05:20 INFO: Finished untaring mOTUs marker gene database.
    2024-07-01,12:05:20 INFO: mOTU tool shutting down with exitcode 0
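Before profiling, it can help to confirm that the dependencies from the conda environment are visible. The following is a minimal sketch, not part of mOTUs: it assumes the prerequisites are exactly those installed by the conda command above (the bwa executable plus the pysam and Biopython modules, the latter imported as `Bio`), and the function name `check_dependencies` is hypothetical.

```python
import importlib.util
import shutil


def check_dependencies() -> dict:
    """Report which assumed mOTUs prerequisites are visible from this Python process.

    Assumption: the prerequisites are the ones installed by the conda command above.
    """
    return {
        "bwa": shutil.which("bwa") is not None,                     # executable on PATH
        "pysam": importlib.util.find_spec("pysam") is not None,     # importable module
        "biopython": importlib.util.find_spec("Bio") is not None,   # Biopython imports as Bio
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the motus4 environment activated; a MISSING entry usually means the environment is not active.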

Running motus.py without a command prints the list of available commands:

python motus.py

    Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.0
    Reference: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
    taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
    doi: https://doi.org/10.1186/s40168-022-01410-z
    
    motus <command> [options]
    
        -- Taxonomic profiling
              profile     Perform taxonomic profiling (map_tax + calc_mgc + calc_motu) in a single step
    
              map_tax     Map reads to the marker gene database
              calc_mgc    Calculate marker gene cluster (MGC) abundance
              calc_motu   Summarize MGC abundances into a mOTU profile
    
        -- Utilities
              download    Download genomes associated with mOTUs
              downloadDB  Download the mOTUs marker gene database
              merge       Merge several taxonomic profiling results into one table
    
        Type motus <command> to print the help menu for a specific command
    
    motus.py: error: the following arguments are required: command

Basic Examples

The mOTUs profiler takes sequencing read files as input, aligns them against the marker gene database and aggregates the alignments into mOTUs. To test the profiler, we download a paired-end sample (ERR479092) from the European Nucleotide Archive (ENA):

mkdir motus-test
cd motus-test
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479092/ERR479092_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479092/ERR479092_2.fastq.gz

Now run the mOTUs profile command:

# -f/-r: forward/reverse reads, -n: sample name, -o: output file, -t: number of threads (can be adapted)
python ../motus.py profile -f ERR479092_1.fastq.gz -r ERR479092_2.fastq.gz -n ERR479092 -o ERR479092.motus -t 32

    2024-07-01,12:08:07 INFO: mOTU tool starting
    2024-07-01,12:08:07 INFO: Loading database ...
    2024-07-01,12:08:25 INFO: Loading database finished. Version 4.0 (version date: 2024-06-05) contains 124300 mOTUs, 2075157 markergeneclusters and 3436253 markergenes.
    2024-07-01,12:08:25 INFO: Starting mOTUs - map_tax routine - Alignment against the mOTUs database ...
    2024-07-01,12:08:25 INFO: Aligning ERR479092_1.fastq.gz
    2024-07-01,12:10:04 INFO: Finished alignment. Total reads: 9447842, Total aligned reads 11116, 0.1177% aligned.
    2024-07-01,12:10:04 INFO: Aligning ERR479092_2.fastq.gz
    2024-07-01,12:11:44 INFO: Finished alignment. Total reads: 9447842, Total aligned reads 28029, 0.2967% aligned.
    2024-07-01,12:11:44 INFO: Finished all alignments. Total reads: 18895684, Total aligned reads 39145, 0.2072%
    2024-07-01,12:11:44 INFO: Sorting BAM file
    2024-07-01,12:12:01 INFO: Finished sorting BAM file
    2024-07-01,12:12:01 INFO: Finished mOTUs - map_tax routine - Alignment against the mOTUs database ...
    2024-07-01,12:12:02 INFO: Starting mOTUs - calc_mgc routine - Calculating abundances per MGC ...
    2024-07-01,12:12:02 INFO: Reading alignment file ...
    2024-07-01,12:12:19 INFO: Finished reading alignment file ...
    2024-07-01,12:12:19 INFO: Read 31711 aligned inserts of which 39.73% are multimappers
    2024-07-01,12:12:19 INFO: Finished mOTUs - calc_mgc routine - Calculating abundances per MGC ...
    2024-07-01,12:12:19 INFO: mOTU tool shutting down with exitcode 0
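When profiling many samples, the overall alignment rate from the map_tax log is a useful quality check. A minimal sketch, assuming the summary line keeps the exact wording shown above (`Finished all alignments. Total reads: N, Total aligned reads M, ...`) and that you have captured the log text yourself; `parse_alignment_summary` is a hypothetical helper, not part of mOTUs.

```python
import re

# Matches the map_tax summary line, e.g.
# "... INFO: Finished all alignments. Total reads: 18895684, Total aligned reads 39145, 0.2072%"
SUMMARY = re.compile(
    r"Finished all alignments\. Total reads: (\d+), Total aligned reads (\d+)"
)


def parse_alignment_summary(log_text: str):
    """Return (total_reads, aligned_reads, percent_aligned), or None if no summary line."""
    match = SUMMARY.search(log_text)
    if match is None:
        return None
    total, aligned = int(match.group(1)), int(match.group(2))
    # Recompute the percentage instead of relying on the rounded value in the log.
    percent = 100.0 * aligned / total if total else 0.0
    return total, aligned, percent


line = ("2024-07-01,12:11:44 INFO: Finished all alignments. Total reads: 18895684, "
        "Total aligned reads 39145, 0.2072%")
total, aligned, percent = parse_alignment_summary(line)
print(f"{aligned}/{total} reads aligned ({percent:.4f}%)")  # 39145/18895684 reads aligned (0.2072%)
```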

The profile command writes the following files:

Output files:
ERR479092.motus --> The main output file: mOTU abundances reported as counts, one mOTU per line
ERR479092.motus.relab --> Same as ERR479092.motus, but converted to relative abundances

head -n 10 ERR479092.motus
    #TOOL:4.0.0_DB:4.0  report_mode=counts  count_mode=INSERT_SCALED    min_mgcs=3
    MOTU    ERR479092
    mOTUv4.0_000002 8
    mOTUv4.0_000003 13
    mOTUv4.0_000006 64
    mOTUv4.0_000007 1
    mOTUv4.0_000022 2
    mOTUv4.0_000023 3
    mOTUv4.0_000026 2
    mOTUv4.0_000030 241
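For downstream analysis, a profile can be loaded into a dictionary. A minimal sketch, assuming the layout shown above: lines starting with `#` hold metadata, the first remaining line is the `MOTU` header with one column per sample, and every following line holds a mOTU identifier followed by one value per sample, separated by whitespace (tabs are assumed in the example). The same function then also reads merged profiles. `read_motus_profile` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def read_motus_profile(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a mOTUs profile into (sample_names, {motu_id: [value per sample]}).

    Assumes '#' metadata lines, a header line starting with 'MOTU', and
    whitespace-separated rows whose first field contains no spaces.
    """
    samples: List[str] = []
    table: Dict[str, List[float]] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and the '#TOOL:...' metadata line
        fields = stripped.split()
        if fields[0] == "MOTU":
            samples = fields[1:]  # header: one column per sample
            continue
        table[fields[0]] = [float(value) for value in fields[1:]]
    return samples, table


example = (
    "#TOOL:4.0.0_DB:4.0\treport_mode=counts\tcount_mode=INSERT_SCALED\tmin_mgcs=3\n"
    "MOTU\tERR479092\n"
    "mOTUv4.0_000002\t8\n"
    "mOTUv4.0_000003\t13\n"
)
samples, profile = read_motus_profile(example)
print(samples, profile["mOTUv4.0_000003"])  # ['ERR479092'] [13.0]
```

On real output, pass the file contents instead, e.g. `read_motus_profile(open("ERR479092.motus").read())`.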
    
head -n 10 ERR479092.motus.relab
    #TOOL:4.0.0_DB:4.0  report_mode=relative_abundance  count_mode=INSERT_SCALED    min_mgcs=3
    MOTU    ERR479092
    mOTUv4.0_000002 0.00302044
    mOTUv4.0_000003 0.00512949
    mOTUv4.0_000006 0.02560648
    mOTUv4.0_000007 0.00048276
    mOTUv4.0_000022 0.00064852
    mOTUv4.0_000023 0.00128958
    mOTUv4.0_000026 0.00098196
    mOTUv4.0_000030 0.09671557
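The `.relab` file expresses each mOTU as a fraction of the sample. A minimal sketch of that conversion, assuming relative abundance is simply a mOTU's value divided by the sum over all mOTUs in the sample. Treat it as an approximation: the printed integer counts do not reproduce the `.relab` values exactly (compare mOTUv4.0_000002 and mOTUv4.0_000007 above), so mOTUs' own normalization may differ in its details. `to_relative_abundance` is a hypothetical helper.

```python
from typing import Dict


def to_relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Divide each mOTU value by the sample total (approximates the .relab output).

    Returns all zeros for a sample without any counts.
    """
    total = sum(counts.values())
    if total == 0:
        return {motu: 0.0 for motu in counts}
    return {motu: value / total for motu, value in counts.items()}


relab = to_relative_abundance({"mOTUv4.0_000002": 8, "mOTUv4.0_000030": 24})
print(relab)  # {'mOTUv4.0_000002': 0.25, 'mOTUv4.0_000030': 0.75}
```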


Temporary files (useful for rerunning with different parameters):
ERR479092.motus.bam --> Filtered alignment file of the reads against the mOTUs marker gene database
ERR479092.motus.mgc --> Alignments aggregated at the marker gene cluster (MGC) level, an intermediate level between marker genes and mOTUs
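Because the `.bam` file is kept, it can be inspected outside of motus, for example to see which marker genes attract the most alignments. A minimal sketch using pysam (installed in the conda environment above). It assumes the BAM stores one record per alignment and skips unmapped records defensively; the number of distinct read names is only a rough proxy and need not match the insert count reported by motus. `summarize_alignments` and `summarize_bam` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_alignments(reads: Iterable) -> Tuple[int, Counter]:
    """Return (number of distinct read names, alignments per reference sequence).

    `reads` can be any iterable of objects with pysam.AlignedSegment's
    query_name, reference_name and is_unmapped attributes.
    """
    names = set()
    per_reference: Counter = Counter()
    for read in reads:
        if read.is_unmapped:
            continue  # defensive: a filtered alignment file should not contain these
        names.add(read.query_name)
        per_reference[read.reference_name] += 1
    return len(names), per_reference


def summarize_bam(path: str) -> Tuple[int, Counter]:
    """Open a BAM file with pysam and summarize all of its records."""
    import pysam  # imported here so summarize_alignments also works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        # until_eof=True reads the file sequentially, so no BAM index is required
        return summarize_alignments(bam.fetch(until_eof=True))
```

Usage: `n_names, per_ref = summarize_bam("ERR479092.motus.bam")`, then `per_ref.most_common(5)` lists the five most frequently hit marker gene sequences.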

Several profiles can be combined into a single table with the merge command. To try this, first profile a second sample:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479030/ERR479030_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR479/ERR479030/ERR479030_2.fastq.gz
python ../motus.py profile -f ERR479030_1.fastq.gz -r ERR479030_2.fastq.gz -n ERR479030 -o ERR479030.motus -t 32
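With more than a couple of samples, the profile calls can be generated in a loop. A minimal sketch, assuming paired files named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` as in the ENA downloads above, and using only the flags shown in this README (-f, -r, -n, -o, -t). `build_profile_command`, `profile_samples` and the `run` switch are hypothetical; by default the commands are only printed.

```python
import subprocess
from typing import List


def build_profile_command(sample: str, motus: str = "../motus.py", threads: int = 32) -> List[str]:
    """Build the motus profile call used above for one paired-end sample."""
    return [
        "python", motus, "profile",
        "-f", f"{sample}_1.fastq.gz",   # forward reads
        "-r", f"{sample}_2.fastq.gz",   # reverse reads
        "-n", sample,                   # sample name (column header in the profile)
        "-o", f"{sample}.motus",        # output profile
        "-t", str(threads),             # number of threads
    ]


def profile_samples(samples: List[str], run: bool = False) -> None:
    """Print (and optionally execute) one profile command per sample."""
    for sample in samples:
        command = build_profile_command(sample)
        print(" ".join(command))
        if run:
            # check=True stops the loop as soon as motus exits with a non-zero code
            subprocess.run(command, check=True)


profile_samples(["ERR479092", "ERR479030"])  # dry run: prints the commands only
```

Calling `profile_samples([...], run=True)` from the motus-test folder would execute the commands sequentially.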

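Next, merge the two profiles. As with profile, this produces both a count table (merged.motus) and a relative abundance table (merged.motus.relab):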

python ../motus.py merge -i ERR479030.motus ERR479092.motus -o merged.motus


head -n 10 merged.motus
    #TOOL:4.0.0_DB:4.0  report_mode=counts count_mode=INSERT_SCALED min_mgcs=3
    MOTU    ERR479030   ERR479092
    mOTUv4.0_000002 3   8
    mOTUv4.0_000003 0   13
    mOTUv4.0_000006 1   64
    mOTUv4.0_000007 0   1
    mOTUv4.0_000009 2   0
    mOTUv4.0_000011 1   0
    mOTUv4.0_000012 4   0
    mOTUv4.0_000016 5   0

head -n 10 merged.motus.relab
    #TOOL:4.0.0_DB:4.0  report_mode=relative_abundance count_mode=INSERT_SCALED min_mgcs=3
    MOTU    ERR479030   ERR479092
    mOTUv4.0_000002 0.00095238  0.00320641
    mOTUv4.0_000003 0.00000000  0.00521042
    mOTUv4.0_000006 0.00031746  0.02565130
    mOTUv4.0_000007 0.00000000  0.00040080
    mOTUv4.0_000009 0.00063492  0.00000000
    mOTUv4.0_000011 0.00031746  0.00000000
    mOTUv4.0_000012 0.00126984  0.00000000
    mOTUv4.0_000016 0.00158730  0.00000000
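Conceptually, merging aligns the per-sample tables on mOTU identifiers and fills in 0 where a mOTU was not detected in a sample (as for mOTUv4.0_000003 in ERR479030 above). A minimal pure-Python sketch of that outer join, for illustration only; `motus merge` should be used on real data, since it may handle details (such as the metadata line) that are not reproduced here. `merge_profiles` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def merge_profiles(profiles: Dict[str, Dict[str, float]]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Outer-join per-sample {motu_id: value} dicts into one table.

    Returns (sample_names, {motu_id: [value per sample]}) with 0.0 for mOTUs
    absent from a sample and rows sorted by mOTU identifier.
    """
    samples = list(profiles)
    all_motus = sorted({motu for profile in profiles.values() for motu in profile})
    table = {
        motu: [float(profiles[sample].get(motu, 0.0)) for sample in samples]
        for motu in all_motus
    }
    return samples, table


samples, table = merge_profiles({
    "ERR479030": {"mOTUv4.0_000002": 3, "mOTUv4.0_000009": 2},
    "ERR479092": {"mOTUv4.0_000002": 8, "mOTUv4.0_000003": 13},
})
print("MOTU", *samples, sep="\t")
for motu, values in table.items():
    print(motu, *values, sep="\t")
```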

All genomes used to build the mOTUs database can be downloaded with the download command of the mOTUs tool (or from the https://motus-db.org/ website).

# List (-l) all genomes whose taxonomy matches the word "Angelakisella" exactly (-w) and write their details to Angelakisella.genomes (-s)
python ../motus.py download -s Angelakisella.genomes -w Angelakisella -l

    2024-07-01,12:45:56 INFO: mOTU tool starting
    2024-07-01,12:45:56 INFO: Loading database ...
    2024-07-01,12:45:56 INFO: Initialising the mOTUs search database.
    2024-07-01,12:46:44 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
    2024-07-01,12:46:44 INFO: Searching for keyword: Angelakisella.
    2024-07-01,12:46:44 INFO: Found: 5485 hits.
    2024-07-01,12:46:44 INFO: Found 5485 genomes. Writing genome information to Angelakisella.genomes
    2024-07-01,12:46:44 INFO: Finished writing genome information to Angelakisella.genomes
    2024-07-01,12:46:44 INFO: mOTU tool shutting down with exitcode 0


head -n 3 Angelakisella.genomes
    GENOME  MOTU    PATH    DOMAIN  PHYLUM  CLASS   ORDER   FAMILY  GENUS   SPECIES
    ANDE20-1_SAMEA4688840_MAG_00000115  no_mOTU https://sunagawalab.ethz.ch/share/MOTUS/database/4.0/data/genomes/ANDE20-1/ANDE20-1_SAMEA4688840_METAG/ANDE20-1_SAMEA4688840_MAG_00000115/ANDE20-1_SAMEA4688840_MAG_00000115.fa.gz Bacteria    Bacillota_A Clostridia  Oscillospirales Ruminococcaceae Angelakisella   Angelakisella sp004554485
    ANDE20-1_SAMEA4688843_MAG_00000017  no_mOTU https://sunagawalab.ethz.ch/share/MOTUS/database/4.0/data/genomes/ANDE20-1/ANDE20-1_SAMEA4688843_METAG/ANDE20-1_SAMEA4688843_MAG_00000017/ANDE20-1_SAMEA4688843_MAG_00000017.fa.gz Bacteria    Bacillota_A Clostridia  Oscillospirales Ruminococcaceae Angelakisella   Angelakisella sp004554485
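The genome list has one genome per line with the columns GENOME, MOTU, PATH and the taxonomy from DOMAIN to SPECIES; genomes not assigned to a mOTU carry `no_mOTU`. A minimal sketch for summarizing such a list, for example counting how many listed genomes belong to each mOTU. It assumes the file is tab-separated with the header shown above (an assumption about the exact format), and `genomes_per_motu` is hypothetical. The example rows reuse the two genomes above with placeholder PATH values.

```python
import csv
import io
from collections import Counter


def genomes_per_motu(table_text: str) -> Counter:
    """Count genomes per value of the MOTU column (assumes a tab-separated table)."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return Counter(row["MOTU"] for row in reader)


header = "\t".join(["GENOME", "MOTU", "PATH", "DOMAIN", "PHYLUM", "CLASS",
                    "ORDER", "FAMILY", "GENUS", "SPECIES"])
taxonomy = ["Bacteria", "Bacillota_A", "Clostridia", "Oscillospirales",
            "Ruminococcaceae", "Angelakisella", "Angelakisella sp004554485"]
rows = [
    # PATH values are placeholders, not real download URLs
    ["ANDE20-1_SAMEA4688840_MAG_00000115", "no_mOTU", "path/to/genome1.fa.gz", *taxonomy],
    ["ANDE20-1_SAMEA4688843_MAG_00000017", "no_mOTU", "path/to/genome2.fa.gz", *taxonomy],
]
example = "\n".join([header] + ["\t".join(row) for row in rows]) + "\n"
print(genomes_per_motu(example))  # Counter({'no_mOTU': 2})
```

On the real file: `genomes_per_motu(open("Angelakisella.genomes").read()).most_common(5)`.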


# Download all genomes where the taxonomy matches the word "Angelakisella" exactly
python ../motus.py download -s Angelakisella.genomes -w Angelakisella -o Angelakisella_genomes_folder/

    2024-07-01,12:47:47 INFO: mOTU tool starting
    2024-07-01,12:47:47 INFO: Loading database ...
    2024-07-01,12:47:47 INFO: Initialising the mOTUs search database.
    2024-07-01,12:48:34 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
    2024-07-01,12:48:34 INFO: Searching for keyword: Angelakisella.
    2024-07-01,12:48:34 INFO: Found: 5485 hits.
    2024-07-01,12:48:34 INFO: Found 5485 genomes. Writing genome information to Angelakisella.genomes
    2024-07-01,12:48:34 INFO: Finished writing genome information to Angelakisella.genomes
    2024-07-01,12:48:34 INFO: Downloading genomes to Angelakisella_genomes_folder
    2024-07-01,12:48:34 INFO: Downloading genome (1 / 5485) ANDE20-1_SAMEA4688840_MAG_00000115 to Angelakisella_genomes_folder/ANDE20-1_SAMEA4688840_MAG_00000115.fa.gz
    ...
    2024-07-01,12:50:08 INFO: Downloading genome (5484 / 5485) ZHUJ18-1_SAMN08993540_MAG_00000012 to Angelakisella_genomes_folder/ZHUJ18-1_SAMN08993540_MAG_00000012.fa.gz
    2024-07-01,12:50:08 INFO: Downloading genome (5485 / 5485) ZHUJ18-1_SAMN08993547_MAG_00000060 to Angelakisella_genomes_folder/ZHUJ18-1_SAMN08993547_MAG_00000060.fa.gz
    2024-07-01,12:50:08 INFO: Finished downloading genomes
    2024-07-01,12:50:08 INFO: mOTU tool shutting down with exitcode 0
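After a download, a quick sanity check of the genome files can catch truncated or corrupted transfers. A minimal sketch using only the standard library, assuming the files are gzip-compressed FASTA as the `.fa.gz` extension suggests; Biopython's `SeqIO` would work equally well. `count_fasta`, `fasta_stats` and `summarize_folder` are hypothetical helpers.

```python
import gzip
from pathlib import Path
from typing import Iterable, Tuple


def count_fasta(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sequences, total sequence length) for FASTA lines."""
    n_sequences = 0
    total_length = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            n_sequences += 1  # each header line starts a new sequence (contig)
        elif line:
            total_length += len(line)  # sequence lines may be wrapped
    return n_sequences, total_length


def fasta_stats(path: Path) -> Tuple[int, int]:
    """Apply count_fasta to a gzip-compressed FASTA file (raises if it is not valid gzip)."""
    with gzip.open(path, "rt") as handle:
        return count_fasta(handle)


def summarize_folder(folder: str) -> None:
    """Print sequence count and total length for every *.fa.gz file in `folder`."""
    for path in sorted(Path(folder).glob("*.fa.gz")):
        n_sequences, total_length = fasta_stats(path)
        print(f"{path.name}\t{n_sequences} sequences\t{total_length} bp")
```

Usage: `summarize_folder("Angelakisella_genomes_folder")` prints one line per downloaded genome.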


# Download all REPRESENTATIVE genomes where the taxonomy matches the word "Angelakisella" exactly
python ../motus.py download -s Angelakisella.representative.genomes -w Angelakisella -o Angelakisella_representative_genomes_folder/ -r

    2024-07-01,12:50:59 INFO: mOTU tool starting
    2024-07-01,12:50:59 INFO: Loading database ...
    2024-07-01,12:50:59 INFO: Initialising the mOTUs search database.
    2024-07-01,12:51:47 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
    2024-07-01,12:51:47 INFO: Searching for keyword: Angelakisella.
    2024-07-01,12:51:47 INFO: Found: 18 hits.
    2024-07-01,12:51:47 INFO: Found 18 genomes. Writing genome information to Angelakisella.representative.genomes
    2024-07-01,12:51:47 INFO: Finished writing genome information to Angelakisella.representative.genomes
    2024-07-01,12:51:47 INFO: Downloading genomes to Angelakisella_representative_genomes_folder
    2024-07-01,12:51:47 INFO: Downloading genome (1 / 18) BATT21-1_SAMEA7076242_MAG_00000037 to Angelakisella_representative_genomes_folder/BATT21-1_SAMEA7076242_MAG_00000037.fa.gz
    2024-07-01,12:51:47 INFO: Downloading genome (2 / 18) BATT21-1_SAMEA7085688_MAG_00000030 to Angelakisella_representative_genomes_folder/BATT21-1_SAMEA7085688_MAG_00000030.fa.gz
    2024-07-01,12:51:47 INFO: Downloading genome (3 / 18) DANK21-1_haib17CEM4890_H2NYMCCXY_SL254772_MAG_00000039 to Angelakisella_representative_genomes_folder/DANK21-1_haib17CEM4890_H2NYMCCXY_SL254772_MAG_00000039.fa.gz
    ...
    2024-07-01,12:51:47 INFO: Downloading genome (17 / 18) XIAO15-1_SAMEA3134386_MAG_00000003 to Angelakisella_representative_genomes_folder/XIAO15-1_SAMEA3134386_MAG_00000003.fa.gz
    2024-07-01,12:51:47 INFO: Downloading genome (18 / 18) XIAO16-1_SAMEA3663224_MAG_00000007 to Angelakisella_representative_genomes_folder/XIAO16-1_SAMEA3663224_MAG_00000007.fa.gz
    2024-07-01,12:51:47 INFO: Finished downloading genomes
    2024-07-01,12:51:47 INFO: mOTU tool shutting down with exitcode 0

# Download the REPRESENTATIVE genome of mOTU "mOTUv4.0_000000"
python ../motus.py download -s mOTUv4.0_000000.representative.genome -w mOTUv4.0_000000 -o mOTUv4.0_000000_representative_genome_folder/ -r

    2024-07-01,12:53:00 INFO: mOTU tool starting
    2024-07-01,12:53:00 INFO: Loading database ...
    2024-07-01,12:53:00 INFO: Initialising the mOTUs search database.
    2024-07-01,12:53:47 INFO: Finished initialising the mOTUs search database. Found 124288 mOTUs, 3747151 genomes and 82452 taxonomy search words.
    2024-07-01,12:53:47 INFO: Searching for keyword: mOTUv4.0_000000.
    2024-07-01,12:53:47 INFO: Found: 1 hits.
    2024-07-01,12:53:47 INFO: Found 1 genomes. Writing genome information to mOTUv4.0_000000.representative.genome
    2024-07-01,12:53:47 INFO: Finished writing genome information to mOTUv4.0_000000.representative.genome
    2024-07-01,12:53:47 INFO: Downloading genomes to mOTUv4.0_000000_representative_genome_folder
    2024-07-01,12:53:47 INFO: Downloading genome (1 / 1) RSGB23-1_GCF-023347315-V1_GENO_10000001 to mOTUv4.0_000000_representative_genome_folder/RSGB23-1_GCF-023347315-V1_GENO_10000001.fa.gz
    2024-07-01,12:53:47 INFO: Finished downloading genomes
    2024-07-01,12:53:47 INFO: mOTU tool shutting down with exitcode 0