Command manual

This page describes each command and its options in full. On the command line, you can always type motus <command> to obtain a short description of the available flags and their usage.

To execute the motus profiler you need to call motus <command> [options]. The possible values for command are:

profile, perform taxonomic profiling on a sample (map_tax + calc_mgc + calc_motu);
merge, append different profiles to create a table.

The profile command can be split into:

map_tax, map reads to the marker gene database, output a SAM/BAM file;
calc_mgc, aggregate reads from the same marker gene cluster (mgc) and output the mgc abundance table. It uses the SAM/BAM file produced by map_tax;
calc_motu, from an mgc abundance table (created by calc_mgc), produce the mOTUs abundance table.
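
As a rough sketch of how the three steps fit together, the shell snippet below runs them one after another. The file names and the `run` helper are placeholders introduced for illustration. With `DRY_RUN=1` (the default in this sketch) the commands are only printed, and nothing is executed until you set `DRY_RUN=0`. Combining `-b` with `-o` in `map_tax` is assumed to behave as the option tables suggest.

```shell
#!/bin/sh
# Sketch: run the three sub-steps of `motus profile` one at a time.
# File names are placeholders; DRY_RUN=1 (default) only prints the commands.
set -eu

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

profile_in_steps() {
    fwd="$1"; rev="$2"; name="$3"
    # 1. Map reads to the marker gene database and save a BAM file.
    run motus map_tax -f "$fwd" -r "$rev" -b -o "$name.bam"
    # 2. Aggregate reads per marker gene cluster (MGC).
    run motus calc_mgc -i "$name.bam" -n "$name" -o "$name.mgc"
    # 3. Turn the MGC table into the mOTUs abundance table.
    run motus calc_motu -i "$name.mgc" -n "$name" -o "$name.motus"
}

profile_in_steps sample_1.fq sample_2.fq sample
```

Keeping the intermediate files lets you rerun only the later steps, for example with a different `-g` value, without mapping the reads again.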

We also have a command to handle long reads. You can find a more detailed tutorial here.

prep_long, convert long read data into short read data that can then be used by motus profile.

There are also commands to perform SNV profiling using the metaSNV package (https://metasnv.embl.de/). Again, a more detailed tutorial using these commands is available here.

map_snv, map reads to the mOTUs marker gene database and produce a BAM file suitable for metaSNV.
snv_call, call SNVs using metaSNV.

`profile`

Performs taxonomic profiling from reads in fastq format, outputs the relative abundances of each profiled species.

Input options

Option	Input type	Description	Example
`-f`	FILE[,FILE]	Input fastq file(s) in the forward orientation. When present, `-r` must also be present. The file(s) can be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated.	`motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq > sample.motus`
`-r`	FILE[,FILE]	Input fastq file(s) in the reverse orientation. When present, `-f` must also be present. The file(s) can be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated. It is important that the order of the files is consistent with `-f`.	`motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq > sample.motus`
`-s`	FILE[,FILE]	Input fastq file(s) for unpaired reads. The file(s) can be compressed (.gz or .bz2). You can analyze unpaired read files alone or together with forward and reverse reads.	`motus profile -s sample_lane1.fq,sample_lane2.fq > sample.motus` `motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -s sample_lane1.fq > sample.motus`
`-n`	STR	Name of the sample. Supplying a unique name is essential for merging profiles later on.	`motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -n sample > sample.motus`
`-i`	FILE	Use the intermediate alignment result of `map_tax` (as produced by the `-I` option) as input to create a taxonomic profile	`motus profile -i sample.bam > sample.motus`
`-db`	DIR	Provide a different database directory DIR	If the database is in directory `~/database` `motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -db ~/database > sample.motus`
`-m`	FILE	From the intermediate MGC read count table (as produced by the `-M` option) as input, create a taxonomic profile	`motus profile -m sample.mgc > sample.motus`
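
Because the forward and reverse lists must name the lanes in the same order, it can help to build both lists in a single loop. The sketch below does this for a hypothetical naming scheme (`<sample>_lane<N>_1.fq` and `<sample>_lane<N>_2.fq`). The function name is made up for illustration, and the function only prints the resulting command.

```shell
#!/bin/sh
# Sketch: build matching comma-separated -f and -r lists for several lanes.
# Assumes files are named <sample>_lane<N>_1.fq and <sample>_lane<N>_2.fq.
set -eu

build_profile_cmd() {
    sample="$1"; shift
    fwd=""; rev=""
    for lane in "$@"; do
        # Append to both lists in the same iteration so the order always matches.
        fwd="${fwd:+$fwd,}${sample}_lane${lane}_1.fq"
        rev="${rev:+$rev,}${sample}_lane${lane}_2.fq"
    done
    printf '%s\n' "motus profile -f $fwd -r $rev -n $sample -o $sample.motus"
}

build_profile_cmd sample 1 2
```

For the call shown, this prints the same `-f` and `-r` lists as the examples in the table, with `-n sample -o sample.motus` added.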

Output options

Option	Input type	Description	Example
`-o`	FILE	Output file name. If you don’t provide this option then it will print to stdout.	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus`
`-I`	FILE	Save the intermediate alignment result of `map_tax` in .bam format in FILE	`motus profile -f sample_1.fq -r sample_2.fq -I sample.bam > sample.motus`
`-M`	FILE	Save the intermediate marker gene cluster (MGC) read count table result from `calc_mgc` in FILE	`motus profile -f sample_1.fq -r sample_2.fq -M sample.mgc > sample.motus`
`-e`		Print the abundances of only the ref-mOTUs in the output. All other mOTU types (meta and ext) will be part of `unassigned`	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -e`
`-c`		Output the taxonomic profile with counts rather than relative abundances	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -c`
`-p`		Print the NCBI TaxID of the mOTU in the output	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -p`
`-B`		Output the taxonomy profile in BIOM format	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -B`
`-C`	STR	Print result in CAMI format (BioBoxes format 0.9.1). Possible values: [precision, recall, parenthesis]. Note that the mOTUs species definition and the NCBI species definition are not always congruent. As a result, you can choose among three methods to save the result in CAMI format: “precision”, where the discrepancies are deleted; “recall”, where the relative abundances of the discrepancies are split; and “parenthesis”, where all the discrepancies are kept.	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -C precision`
`-q`		Report the full rank taxonomy in the taxonomic profile output	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -q`
`-k`	STR	Report abundances at a specific taxonomic level. You can choose between [kingdom, phylum, class, order, family, genus, mOTU].	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -k phylum`
`-A`		Print all taxonomic levels together (kingdom to mOTUs; overrides `-k`)	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -A`
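
If you need the same sample at several taxonomic levels, one option is to save the alignment once with `-I` and then reuse it with `-i`, changing only `-k`. The sketch below assumes `sample.bam` was produced by an earlier `motus profile ... -I sample.bam` run. The `run` helper and the output file names are placeholders, and with `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch: profile one saved alignment at several taxonomic levels.
# Assumes the BAM file was saved earlier with `motus profile ... -I <bam>`.
set -eu

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

profile_levels() {
    bam="$1"; name="$2"; shift 2
    for level in "$@"; do
        # One output file per level, e.g. sample.phylum.motus
        run motus profile -i "$bam" -n "$name" -k "$level" -o "$name.$level.motus"
    done
}

profile_levels sample.bam sample phylum genus mOTU
```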

Algorithm options

Option	Input type	Description	Example
`-g`	INT [Default 3]	Number of marker genes required to calculate a mOTU’s abundance. Given a mOTU, we calculate its abundance if at least `-g` marker genes have a read count different from 0. A value equal to 1 produces results with higher recall, while higher values produce results with higher precision. The minimum value is 1 and the maximum is 10. Default: 3	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -g 1`
`-l`	INT [Default 75]	Minimum alignment length for reads. This has to be lower than the average read length; a warning will be written to stderr if `-l` is larger than the average read length. A smaller value produces higher recall while a larger value produces higher precision. Default: 75	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -l 60`
`-t`	INT [Default 1]	Number of threads to use when running `bwa`. It is suggested to use multiple threads so that bwa will run faster.	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -t 4`
`-y`	STR [Default insert.scaled_counts]	Type of read counts that we use. Possible values: [base.coverage, insert.raw_counts, insert.scaled_counts]	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -y base.coverage`
`-v`	INT [Default 3]	Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging	`motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -v 1`
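
Since `-l` has to stay below the average read length, you may want to check that length before choosing a value. The sketch below estimates it with `awk` and compares it to a candidate `-l`. It assumes an uncompressed fastq file with exactly four lines per record, and the function names are made up for illustration. This is not how `motus` itself performs the check.

```shell
#!/bin/sh
# Sketch: compare a candidate -l value with the average read length of a fastq file.
# Assumes an uncompressed fastq file with four lines per record.
set -eu

# Average length (rounded down) of the sequence lines, i.e. line 2 of each record.
avg_read_length() {
    awk 'NR % 4 == 2 { total += length($0); n++ }
         END { if (n > 0) printf "%d\n", total / n; else print 0 }' "$1"
}

check_min_alignment_length() {
    fastq="$1"; min_len="$2"
    avg="$(avg_read_length "$fastq")"
    if [ "$min_len" -lt "$avg" ]; then
        echo "ok: -l $min_len is below the average read length ($avg)"
    else
        echo "warning: -l $min_len is not below the average read length ($avg)"
    fi
}

# Tiny demonstration file with two reads of length 10 and 8.
demo="$(mktemp)"
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n' > "$demo"
check_min_alignment_length "$demo" 75
rm -f "$demo"
```

For compressed input you would decompress first (for example with `gzip -dc`) and pass the result through the same `awk` program.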

`merge`

Merges taxonomic profiles from multiple samples into one (tab-separated) table. Requires that each profile is named (using -n in motus profile).

Input options

Option	Input type	Description	Example
`-i`	FILE[,FILE]	List of profiled samples to be merged. It is important that each profile has a unique name (given by the `-n` flag in `motus profile`).	`motus merge -i sampleA.motus,sampleB.motus > merged.motus`
`-d`	DIR	Merge all profiles within directory DIR. Note that the command will fail if any file other than mOTUs profiles are present in the directory.	If all profiles to be merged are in DIR `results/`: `motus merge -d results > merged.motus`
`-a`	STR [,STR]	Append profiles pre-computed from publicly available metagenomic and metatranscriptomic samples from various environments (available from `mOTUs` versions 2.6 and up) to your own profiles. A total of `x` public profiles are available from the following environments: [all, air, bioreactor, bee, cat, marine, mouse, pig, sheep, soil, termite, wastewater]. You can append profiles from a single environment or from multiple environments.	Appending profiles from a single environment: `motus merge -d results -a human > merged.motus` Appending profiles from multiple environments: `motus merge -d results -a human,cat > merged.motus` Appending all public profiles: `motus merge -d results -a all > merged.motus`

Output options

Option	Input type	Description	Example
`-o`	FILE	Output file name. If you don’t provide this option then it will print to stdout.	`motus merge -i sampleA.motus,sampleB.motus -o merged.motus`
`-B`		Print result in BIOM format	`motus merge -i sampleA.motus,sampleB.motus -o merged.motus -B`

Algorithm options

Option	Input type	Description	Example
`-v`	INT [Default 3]	Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging	`motus merge -i sampleA.motus,sampleB.motus -o merged.motus -v 2`
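
Putting `profile` and `merge` together, the sketch below profiles several samples, gives each one a unique name with `-n`, writes the profiles into a dedicated directory, and then merges that directory with `-d`. The sample names, the `<sample>_1.fq` and `<sample>_2.fq` naming, and the `run` helper are assumptions for illustration. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch: profile several samples with unique names, then merge them.
# Assumes paired files named <sample>_1.fq and <sample>_2.fq.
set -eu

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

profile_and_merge() {
    outdir="$1"; shift
    # Keep this directory for profiles only: `merge -d` fails on other files.
    run mkdir -p "$outdir"
    for sample in "$@"; do
        run motus profile -f "${sample}_1.fq" -r "${sample}_2.fq" \
            -n "$sample" -o "$outdir/$sample.motus"
    done
    run motus merge -d "$outdir" -o merged.motus
}

profile_and_merge results sampleA sampleB
```

The merged table is written outside the profile directory so that a later `merge -d` run does not pick it up as an input.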

`map_tax`

Maps reads from fastq files to the marker gene database and outputs a SAM/BAM file.

Input options

Option	Input type	Description	Example
`-f`	FILE[,FILE]	Input fastq file(s) in the forward orientation. When present, `-r` must also be present. The file(s) can be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated.	`motus map_tax -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq > sample.sam`
`-r`	FILE[,FILE]	Input fastq file(s) in the reverse orientation. When present, `-f` must also be present. The file(s) can be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated. It is important that the order of the files is consistent with `-f`.	`motus map_tax -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq > sample.sam`
`-s`	FILE[,FILE]	Input fastq file(s) for unpaired reads. The file(s) can be compressed (.gz or .bz2). You can analyze unpaired read files alone or together with forward and reverse reads.	`motus map_tax -s sample_lane1.fq,sample_lane2.fq > sample.sam` `motus map_tax -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -s sample_lane1.fq > sample.sam`
`-db`	DIR	Provide a different database directory DIR	If the database is in directory `~/database` `motus map_tax -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -db ~/database > sample.sam`

Output options

Option	Input type	Description	Example
`-o`	FILE	Output file name. If you don’t provide this option then it will print to stdout.	`motus map_tax -s sample.fq -o sample.sam`
`-b`		Save the result of `bwa` in bam format	`motus map_tax -s sample.fq -b > sample.bam`

Algorithm options

Option	Input type	Description	Example
`-t`	INT	Number of threads to use when running `bwa`. It is suggested to use multiple threads so that bwa will run faster.	`motus map_tax -f sample_1.fq -r sample_2.fq -o sample.sam -t 4`
`-l`	INT [Default 75]	Minimum alignment length for reads. This has to be lower than the average read length; a warning will be written to stderr if `-l` is larger than the average read length. A smaller value produces higher recall while a larger value produces higher precision. Default: 75	`motus map_tax -f sample_1.fq -r sample_2.fq -o sample.sam -l 60`
`-v`	INT [Default 3]	Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging	`motus map_tax -s sample.fq -o sample.sam -v 2`

`calc_mgc`

Aggregates reads from the same marker gene cluster (mgc) and outputs the mgc abundance table. It uses the SAM/BAM file produced by map_tax.

Input options

Option	Input type	Description	Example
`-i`	FILE [,FILE]	Input SAM or BAM file (or list of files) result of `motus map_tax`. The program automatically recognizes the right extension.	`motus calc_mgc -i sample.bam > mgc_reads.count` `motus calc_mgc -i sample1_lane1.bam,sample1_lane2.sam > mgc_reads.count`
`-n`	STR	Name of the sample. Supplying a unique name is essential for merging profiles later on.	`motus calc_mgc -i sample.bam -n sample > mgc_reads.count`
`-db`	DIR	Provide a different database directory DIR	`motus calc_mgc -i sample.bam -db ~/database > mgc_reads.count`

Output options

Option	Input type	Description	Example
`-o`	FILE	Output file name. If you don’t provide this option then it will print to stdout.	`motus calc_mgc -i sample.bam -o mgc_reads.count`

Algorithm options

Option	Input type	Description	Example
`-l`	INT [Default 75]	Minimum alignment length for reads. This has to be lower than the average read length; a warning will be written to stderr if `-l` is larger than the average read length. A smaller value produces higher recall while a larger value produces higher precision. Default: 75	`motus calc_mgc -i sample.bam -y insert.raw_counts -l 50 -o mgc_reads.count`
`-v`	INT [Default 3]	Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging	`motus calc_mgc -i sample.bam -v 2 > mgc_reads.count`
`-y`	STR [Default insert.scaled_counts]	Type of read counts that we use. Possible values: [base.coverage, insert.raw_counts, insert.scaled_counts]	`motus calc_mgc -i sample.bam -y insert.raw_counts -o mgc_reads.count`

`calc_motu`

Produces the mOTUs abundance table (final output of motus profile) from an mgc abundance table (created by calc_mgc).

Input options

Option	Input type	Description	Example
`-i`	FILE	Input MGC read count table (produced by `motus calc_mgc`)	`motus calc_motu -i mgc_reads.count > sample.motus`
`-n`	STR	Name of the sample. Supplying a unique name is essential for merging profiles later on.	`motus calc_motu -i mgc_reads.count -n sample > sample.motus`
`-db`	DIR	Provide a different database directory DIR	If the database is in directory `~/database` `motus calc_motu -i mgc_reads.count -db ~/database > sample.motus`

Output options

Option	Input type	Description	Example
`-o`	FILE	Output file name. If you don’t provide this option then it will print to stdout.	`motus calc_motu -i mgc_reads.count -o sample.motus`
`-e`		Print the abundances of only the ref-mOTUs in the output. All other mOTU types (meta and ext) will be part of `unassigned`	`motus calc_motu -i mgc_reads.count -o sample.motus -e`
`-c`		Output the taxonomic profile with counts rather than relative abundances	`motus calc_motu -i mgc_reads.count -o sample.motus -c`
`-p`		Print the NCBI TaxID of the mOTU in the output	`motus calc_motu -i mgc_reads.count -o sample.motus -p`
`-B`		Output the taxonomy profile in BIOM format	`motus calc_motu -i mgc_reads.count -o sample.motus -B`
`-C`	STR	Print result in CAMI format (BioBoxes format 0.9.1). Possible values: [precision, recall, parenthesis]. Note that the mOTUs species definition and the NCBI species definition are not always congruent. As a result, you can choose among three methods to save the result in CAMI format: “precision”, where the discrepancies are deleted; “recall”, where the relative abundances of the discrepancies are split; and “parenthesis”, where all the discrepancies are kept.	`motus calc_motu -i mgc_reads.count -o sample.motus -C precision`
`-q`		Report the full rank taxonomy in the taxonomic profile output	`motus calc_motu -i mgc_reads.count -o sample.motus -q`
`-k`	STR	Report abundances at a specific taxonomic level. You can choose between [kingdom, phylum, class, order, family, genus, mOTU].	`motus calc_motu -i mgc_reads.count -o sample.motus -k phylum`

Algorithm options

Option	Input type	Description	Example
`-g`	INT [Default 3]	Number of marker genes required to calculate a mOTU’s abundance. Given a mOTU, we calculate its abundance if at least `-g` marker genes have a read count different from 0. A value equal to 1 produces results with higher recall, while higher values produce results with higher precision. The minimum value is 1 and the maximum is 10. Default: 3	`motus calc_motu -i mgc_reads.count -g 4 -o sample.motus`
`-v`	INT [Default 3]	Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging	`motus calc_motu -i mgc_reads.count -v 2 > sample.motus`
`-y`	STR [Default insert.scaled_counts]	Type of read counts that we use. Possible values: [base.coverage, insert.raw_counts, insert.scaled_counts]	`motus calc_motu -i mgc_reads.count -y insert.raw_counts -o sample.motus`

`prep_long`

Prepares long reads to be profiled by mOTUs.

Input options

Option	Input type	Description	Example
`-i`	FILE	Input long read file to convert into shorter reads. The file can be fasta(.gz) or fastq(.gz).	`motus prep_long -i long_reads.fasta > converted_long_reads.fasta.gz`

Output options

Option	Input type	Description	Example
`-o`	FILE	Output file name. If you don’t provide this option then it will print to stdout.	`motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz`
`-no_gz`		Do not compress the output file.	`motus prep_long -i long_reads.fasta -o converted_long_reads.fasta -no_gz`

Algorithm options

Option	Input type	Description	Example
`-sl`	INT [Default 300]	Splitting length for the long reads.	`motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz -sl 250`
`-ml`	INT [Default 50]	Minimum read length. Reads shorter than `ml` are discarded.	`motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz -ml 60`
`-v`	INT	Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging	`motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz -v 2`
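
A minimal long read workflow might look like the sketch below: convert the long reads with `prep_long`, then profile the converted file. Passing the converted reads to `motus profile` through `-s` (unpaired reads) is an assumption here, as are the file names and the `run` helper. With `DRY_RUN=1` (the default in this sketch) the commands are only printed.

```shell
#!/bin/sh
# Sketch: convert long reads, then profile the converted reads.
# Assumes the converted file is passed to `motus profile` as unpaired reads (-s).
set -eu

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

long_read_profile() {
    long_reads="$1"; name="$2"
    # Split the long reads into shorter reads (compressed output by default).
    run motus prep_long -i "$long_reads" -o "$name.converted.fasta.gz"
    # Profile the converted reads like any other unpaired input.
    run motus profile -s "$name.converted.fasta.gz" -n "$name" -o "$name.motus"
}

long_read_profile long_reads.fasta sample
```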

`map_snv`

Maps reads to the marker gene database and produces a BAM file suitable for metaSNV (https://metasnv.embl.de/). You can find a more detailed explanation on this page.

Input options

Option	Input type	Description	Example
`-f`	FILE[,FILE]	Input fastq file(s) in the forward orientation. When present, `-r` must also be present. The file(s) can be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated.	`motus map_snv -f for_sample.fastq -r rev_sample.fastq > sample.bam`
`-r`	FILE[,FILE]	Input fastq file(s) in the reverse orientation. When present, `-f` must also be present. The file(s) can be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated. It is important that the order of the files is consistent with `-f`.	`motus map_snv -f for_sample.fastq -r rev_sample.fastq > sample.bam`
`-s`	FILE[,FILE]	Input fastq file(s) for unpaired reads. The file(s) can be compressed (.gz or .bz2). You can analyze unpaired read files alone or together with forward and reverse reads.	`motus map_snv -s sample_lane1.fq,sample_lane2.fq > sample.bam`
`-db`	DIR	Provide a different database directory DIR	`motus map_snv -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -db DIR > sample.bam`

Output options

Option	Input type	Description	Example
`-o`	FILE	Output BAM file name. If you don’t provide this option then it will print to stdout.	`motus map_snv -s sample.fq -o sample.bam`

Algorithm options

Option	Input type	Description	Example
`-l`	INT [Default 75]	Minimum alignment length for reads.	`motus map_snv -f for_sample.fastq -r rev_sample.fastq -l 50 > sample.bam`
`-t`	INT [Default 1]	Number of threads to use	`motus map_snv -f for_sample.fastq -r rev_sample.fastq -t 4 > sample.bam`
`-v`	INT [Default 3]	Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging	`motus map_snv -f for_sample.fastq -r rev_sample.fastq -v 2 > sample.bam`

`snv_call`

Performs single nucleotide variant calling using the metaSNV package https://metasnv.embl.de/.

Input options

Option	Input type	Description	Example
`-d`	DIR	Call metaSNV on all BAM files in the directory DIR [Mandatory]	`motus snv_call -d DIR -o out.dir`

Output options

Option	Input type	Description	Example
`-o`	DIR	Output Directory. It will fail if output directory already exists.	`motus snv_call -d DIR -o out.dir`
`-K`		Save in the output directory all the files and directories produced by metaSNV. By default cov, distances, filtered, snpCaller are deleted.	`motus snv_call -d DIR -o out.dir -K`

Algorithm options

Option	Input type	Description	Example
`-fb`	FLOAT [Default 80.0]	Coverage breadth, minimal horizontal genome coverage percentage per sample per species. Sample filter.	`motus snv_call -d DIR -fb 85 -o out.dir`
`-fd`	FLOAT [Default 5.0]	Coverage depth: minimal average vertical genome coverage per sample per species. Sample filter.	`motus snv_call -d DIR -fd 20 -o out.dir`
`-fm`	INT [Default 2]	Minimum number of samples per species. mOTU filter.	`motus snv_call -d DIR -fm 10 -o out.dir`
`-fp`	FLOAT [Default 0.9]	Required proportion of informative samples (coverage should be non-zero) per position. Position filter.	`motus snv_call -d DIR -fp 0.8 -o out.dir`
`-fc`	FLOAT [Default 5.0]	Minimum coverage per position per sample per species. Position filter.	`motus snv_call -d DIR -fc 10 -o out.dir`
`-t`	INT [Default 1]	Number of threads	`motus snv_call -d DIR -t 8 -o out.dir`
`-v`	INT [Default 3]	Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging	`motus snv_call -d DIR -v 2 -o out.dir`
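
The two SNV commands are meant to be used together: `map_snv` produces one BAM file per sample, and `snv_call` then runs on the directory holding those files. The sketch below strings them together and checks up front that the output directory does not exist yet, since `snv_call` fails otherwise. Sample names, the `<sample>_1.fq` and `<sample>_2.fq` naming, the directory names, and the `run` helper are assumptions for illustration. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch: map every sample with map_snv, then call SNVs on the BAM directory.
# Assumes paired files named <sample>_1.fq and <sample>_2.fq.
set -eu

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

snv_workflow() {
    bam_dir="$1"; out_dir="$2"; shift 2
    # snv_call refuses to write into an existing output directory.
    if [ -e "$out_dir" ]; then
        echo "error: $out_dir already exists; snv_call would fail" >&2
        return 1
    fi
    run mkdir -p "$bam_dir"
    for sample in "$@"; do
        run motus map_snv -f "${sample}_1.fq" -r "${sample}_2.fq" -o "$bam_dir/$sample.bam"
    done
    run motus snv_call -d "$bam_dir" -o "$out_dir"
}

snv_workflow motus_snv_bams motus_snv_out sampleA sampleB
```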
