Command manual

Here we provide a full description of the commands and their options. On the command line, you can always type motus <command> to obtain a short description of the flags and usage.

To execute the motus profiler you need to call motus <command> [options]. The possible values for command are:

  • profile, perform taxonomic profiling on a sample (map_tax + calc_mgc + calc_motu);

  • merge, append different profiles to create a table.

The profile command can be split into:

  • map_tax, map reads to the marker gene database, output a SAM/BAM file;

  • calc_mgc, aggregate reads from the same marker gene cluster (mgc) and output the mgc abundance table. It uses the SAM/BAM file produced by map_tax;

  • calc_motu, from a mgc abundance table (created by calc_mgc), produce the mOTUs abundance table.
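The three steps can be chained by hand to reproduce what motus profile does in one call. The following is a minimal sketch that uses only flags described in this manual; the file names and the run helper (which prints a command instead of executing it when motus is not installed) are illustrative assumptions, not part of the tool.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: execute the command if motus is installed,
# otherwise only print it (dry run).
run() {
  if command -v motus >/dev/null 2>&1; then
    "$@"
  else
    echo "dry-run: $*"
  fi
}

sample=sample            # assumed sample name
bam="$sample.bam"        # intermediate alignment
mgc="$sample.mgc"        # intermediate MGC read count table
profile="$sample.motus"  # final taxonomic profile

# Step 1: map reads to the marker gene database and save the result as BAM (-b).
run motus map_tax -f "${sample}_1.fq" -r "${sample}_2.fq" -b -o "$bam"
# Step 2: aggregate reads per marker gene cluster.
run motus calc_mgc -i "$bam" -n "$sample" -o "$mgc"
# Step 3: compute the mOTUs abundance table.
run motus calc_motu -i "$mgc" -n "$sample" -o "$profile"
```

Keeping the intermediate files makes it possible to rerun only the last step with different settings.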

We also have a command to handle long reads. You can find a more detailed tutorial here.

  • prep_long, which converts long read data into short read data that can then be used by motus profile.

And commands to perform SNV profiling using the metaSNV package https://metasnv.embl.de/. Again you have a more detailed tutorial using these commands here.

  • map_snv, map reads to the mOTUs marker gene database and produce a BAM file suitable for metaSNV.

  • snv_call, SNV calling using metaSNV.

profile

Performs taxonomic profiling from reads in fastq format, outputs the relative abundances of each profiled species.

Input options

Option

Input type

Description

Example

-f

FILE[,FILE]

Input fastq file(s) in the forward orientation. When present, -r is also required. The file(s) can also be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated.

motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq > sample.motus

-r

FILE[,FILE]

Input fastq file(s) in the reverse orientation. When present, -f is also required. The file(s) can also be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated. The files must be listed in the same order as the forward files given with -f.

motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq > sample.motus

-s

FILE[,FILE]

Input fastq file(s) for unpaired reads. The file(s) can also be compressed (.gz or .bz2). You can analyze single read files alone or together with forward and reverse reads.

motus profile -s sample_lane1.fq,sample_lane2.fq > sample.motus
motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -s sample_lane1.fq > sample.motus

-n

STR

Name of the sample. Supplying a unique name is essential for merging profiles later on.

motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -n sample > sample.motus

-i

FILE

Create a taxonomic profile from the intermediate alignment result of map_tax (as produced by the -I option).

motus profile -i sample.bam > sample.motus

-db

DIR

Provide a different database directory DIR

If the database is in directory ~/database
motus profile -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -db ~/database > sample.motus

-m

FILE

Create a taxonomic profile from the intermediate MGC read count table (as produced by the -M option).

motus profile -m sample.mgc > sample.motus
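Because the forward and reverse lists must follow the same order, it can help to build both from a single list of lanes. A minimal sketch, assuming the file naming scheme used in the examples above (sample_<lane>_1.fq and sample_<lane>_2.fq); the script only prints the resulting command.

```shell
#!/bin/sh
set -eu

fwd=""
rev=""
# One loop over the lanes guarantees that -f and -r list files in the same order.
for lane in lane1 lane2; do
  fwd="${fwd:+$fwd,}sample_${lane}_1.fq"
  rev="${rev:+$rev,}sample_${lane}_2.fq"
done

# Print the command instead of running it, so the lists can be inspected first.
echo "motus profile -f $fwd -r $rev -n sample -o sample.motus"
```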

Output options

Option

Input type

Description

Example

-o

FILE

Output file name. If you don’t provide this option then it will print to stdout.

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus  

-I

FILE

Save the intermediate alignment result of map_tax in .bam format in FILE

motus profile -f sample_1.fq -r sample_2.fq -I sample.bam > sample.motus

-M

FILE

Save the intermediate marker gene cluster (MGC) read count table result from calc_mgc in FILE

motus profile -f sample_1.fq -r sample_2.fq -M sample.mgc > sample.motus  

-e

Print the abundances of only the ref-mOTUs in the output. All other mOTU types (meta and ext) will be part of unassigned

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -e  

-c

Output the taxonomic profile with counts rather than relative abundances

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -c  

-p

Print the NCBI TaxID of the mOTU in the output

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -p  

-B

Output the taxonomy profile in BIOM format

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -B  

-C

STR

Print result in CAMI format (BioBoxes format 0.9.1). Possible values: [precision, recall, parenthesis]. Note that the mOTUs species definition and the NCBI species definition are not always congruent. As a result, you can choose among three methods to save the result in CAMI format: “precision”, where the discrepancies are deleted; “recall”, where the relative abundances of the discrepancies are split; and “parenthesis”, where all the discrepancies are kept.

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -C precision

-q

Report the full rank taxonomy in the taxonomic profile output

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -q  

-k

STR

Report abundances at a specific taxonomic level. You can choose between [kingdom, phylum, class, order, family, genus, mOTU].

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -k phylum  

-A

Print all taxonomic levels together (kingdom to mOTUs; overrides -k)

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -A
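When several taxonomic levels are needed, the MGC table saved with -M can be reused so that the reads are mapped only once. A minimal sketch, assuming sample.mgc was saved earlier with -M; the output file names and the run helper (dry run when motus is not installed) are illustrative assumptions.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: execute the command if motus is installed,
# otherwise only print it (dry run).
run() {
  if command -v motus >/dev/null 2>&1; then
    "$@"
  else
    echo "dry-run: $*"
  fi
}

mgc=sample.mgc   # assumed to exist, saved earlier with -M
outputs=""
# Levels taken from the list accepted by -k.
for level in phylum family genus mOTU; do
  out="sample.$level.motus"
  run motus profile -m "$mgc" -k "$level" -o "$out"
  outputs="${outputs:+$outputs }$out"
done
echo "$outputs"
```

If all levels are wanted in a single file, -A covers that case without a loop.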

Algorithm options

Option

Input type

Description

Example

-g

INT [Default 3]

Number of marker genes required to calculate a mOTU’s abundance. Given a mOTU, we calculate its abundance if at least -g marker genes have a read count different from 0. A value equal to 1 produces results with higher recall, while higher values produce results with higher precision. The minimum value is 1 and the maximum is 10. Default: 3

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -g 1

-l

INT [Default 75]

Minimum alignment length for reads. This has to be lower than the average read length; a warning is printed to stderr if -l is larger than the average read length. A smaller value produces higher recall, while a larger value produces higher precision. Default: 75

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -l 60

-t

INT [Default 1]

Number of threads to use when running bwa. It is suggested to use multiple threads so that bwa will run faster.

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -t 4

-y

STR [Default insert.scaled_counts]

Type of read counts that we use. Possible values: [base.coverage, insert.raw_counts, insert.scaled_counts]

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -y base.coverage

-v

INT [Default 3]

Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging

motus profile -f sample_1.fq -r sample_2.fq -o sample.motus -v 1
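The requirement that -l stays below the average read length can be checked before profiling. A minimal sketch, assuming an uncompressed fastq file with four lines per record; the avg_read_len helper is hypothetical (not part of motus), and the tiny fastq file written to a temporary location is made up so that the sketch runs on its own.

```shell
#!/bin/sh
set -eu

# Average read length of an uncompressed fastq file
# (the sequence is the 2nd line of every 4-line record).
avg_read_len() {
  awk 'NR % 4 == 2 { n++; s += length($0) }
       END { if (n) printf "%d\n", s / n; else print 0 }' "$1"
}

# Made-up input: two reads of length 10 and 20.
tmp=$(mktemp)
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n' > "$tmp"

avg=$(avg_read_len "$tmp")
min_len=75   # default value of -l
if [ "$min_len" -ge "$avg" ]; then
  echo "warning: -l $min_len is not below the average read length ($avg)"
fi
rm -f "$tmp"
```

For compressed input, the file would first have to be decompressed, for example by piping it through gzip -dc.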

merge

Merges taxonomic profiles from multiple samples into one (tab-separated) table. Requires that each profile is named (using -n in motus profile).

Input options

Option

Input type

Description

Example

-i

FILE[,FILE]

List of profiled samples to be merged. It is important that each profile has a unique name (given by the -n flag in motus profile).

motus merge -i sampleA.motus,sampleB.motus > merged.motus

-d

DIR

Merge all profiles within directory DIR. Note that the command will fail if any files other than mOTUs profiles are present in the directory.

If all profiles to be merged are in DIR results/:
motus merge -d results > merged.motus

-a

STR[,STR]

Append profiles pre-computed from publicly available metagenomic and metatranscriptomic samples from various environments (available from mOTUs version 2.6 onward) to your own profiles. A total of x public profiles are available from the following environments: [all, air, bioreactor, bee, cat, marine, mouse, pig, sheep, soil, termite, wastewater]. You can append profiles from a single environment or multiple environments.

Appending profiles from a single environment: motus merge -d results -a human > merged.motus
Appending profiles from multiple environments: motus merge -d results -a human,cat > merged.motus
Appending all public profiles: motus merge -d results -a all > merged.motus
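Since merging relies on every profile carrying a unique name, one way to guarantee this is to derive the name passed to -n from the input file name. A minimal sketch, assuming paired files named reads/<name>_1.fq and reads/<name>_2.fq; the directory layout and the run helper (dry run when motus is not installed) are illustrative assumptions.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: execute the command if motus is installed,
# otherwise only print it (dry run).
run() {
  if command -v motus >/dev/null 2>&1; then
    "$@"
  else
    echo "dry-run: $*"
  fi
}

run mkdir -p results
names=""
for fwd in reads/A_1.fq reads/B_1.fq; do
  name=$(basename "$fwd" _1.fq)   # unique sample name taken from the file name
  run motus profile -f "$fwd" -r "reads/${name}_2.fq" -n "$name" -o "results/$name.motus"
  names="${names:+$names }$name"
done

# The merged table is written outside results/, which holds only profiles.
run motus merge -d results -o merged.motus
```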

Output options

Option

Input type

Description

Example

-o

FILE

Output file name. If you don’t provide this option then it will print to stdout.

motus merge -i sampleA.motus,sampleB.motus -o merged.motus

-B

Print result in BIOM format

motus merge -i sampleA.motus,sampleB.motus -o merged.motus -B

Algorithm options

Option

Input type

Description

Example

-v

INT [Default 3]

Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging

motus merge -i sampleA.motus,sampleB.motus -o merged.motus -v 2

map_tax

Maps reads from fastq files to the marker gene database and outputs a SAM/BAM file.

Input options

Option

Input type

Description

Example

-f

FILE[,FILE]

Input fastq file(s) in the forward orientation. When present, -r is also required. The file(s) can also be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated.

motus map_tax -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq > sample.sam

-r

FILE[,FILE]

Input fastq file(s) in the reverse orientation. When present, -f is also required. The file(s) can also be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated. The files must be listed in the same order as the forward files given with -f.

motus map_tax -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq > sample.sam

-s

FILE[,FILE]

Input fastq file(s) for unpaired reads. The file(s) can also be compressed (.gz or .bz2). You can analyze single read files alone or together with forward and reverse reads.

motus map_tax -s sample_lane1.fq,sample_lane2.fq > sample.sam
motus map_tax -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -s sample_lane1.fq > sample.sam

-db

DIR

Provide a different database directory DIR

If the database is in directory ~/database
motus map_tax -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -db ~/database > sample.sam

Output options

Option

Input type

Description

Example

-o

FILE

Output file name. If you don’t provide this option then it will print to stdout.

motus map_tax -s sample.fq -o sample.sam

-b

Save the result of bwa in bam format

motus map_tax -s sample.fq -b > sample.bam

Algorithm options

Option

Input type

Description

Example

-t

INT

Number of threads to use when running bwa. It is suggested to use multiple threads so that bwa will run faster.

motus map_tax -f sample_1.fq -r sample_2.fq -o sample.sam -t 4

-l

INT [Default 75]

Minimum alignment length for reads. This has to be lower than the average read length; a warning is printed to stderr if -l is larger than the average read length. A smaller value produces higher recall, while a larger value produces higher precision. Default: 75

motus map_tax -f sample_1.fq -r sample_2.fq -o sample.sam -l 60

-v

INT [Default 3]

Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging

motus map_tax -s sample.fq -o sample.sam -v 2

calc_mgc

Aggregates reads from the same marker gene cluster (mgc) and outputs the mgc abundance table. It uses the SAM/BAM file produced by map_tax.

Input options

Option

Input type

Description

Example

-i

FILE[,FILE]

Input SAM or BAM file (or comma-separated list of files) produced by motus map_tax. The program automatically recognizes the right extension.

motus calc_mgc -i sample.bam > mgc_reads.count
motus calc_mgc -i sample1_lane1.bam,sample1_lane2.sam > mgc_reads.count  

-n

STR

Name of the sample. Supplying a unique name is essential for merging profiles later on.

motus calc_mgc -i sample.bam -n sample > mgc_reads.count

-db

DIR

Provide a different database directory DIR

motus calc_mgc -i sample.bam -db ~/database > mgc_reads.count

Output options

Option

Input type

Description

Example

-o

FILE

Output file name. If you don’t provide this option then it will print to stdout.

motus calc_mgc -i sample.bam -o mgc_reads.count

Algorithm options

Option

Input type

Description

Example

-l

INT [Default 75]

Minimum alignment length for reads. This has to be lower than the average read length; a warning is printed to stderr if -l is larger than the average read length. A smaller value produces higher recall, while a larger value produces higher precision. Default: 75

motus calc_mgc -i sample.bam -y insert.raw_counts -l 50 -o mgc_reads.count

-v

INT [Default 3]

Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging

motus calc_mgc -i sample.bam -v 2 > mgc_reads.count

-y

STR [Default insert.scaled_counts]

Type of read counts that we use. Possible values: [base.coverage, insert.raw_counts, insert.scaled_counts]

motus calc_mgc -i sample.bam -y insert.raw_counts -o mgc_reads.count

calc_motu

Produces the mOTUs abundance table (final output of motus profile) from a mgc abundance table (created by calc_mgc).

Input options

Option

Input type

Description

Example

-i

FILE

Input MGC read count table (produced by motus calc_mgc)

motus calc_motu -i mgc_reads.count > sample.motus

-n

STR

Name of the sample. Supplying a unique name is essential for merging profiles later on.

motus calc_motu -i mgc_reads.count -n sample > sample.motus

-db

DIR

Provide a different database directory DIR

If the database is in directory ~/database
motus calc_motu -i mgc_reads.count -db ~/database > sample.motus

Output options

Option

Input type

Description

Example

-o

FILE

Output file name. If you don’t provide this option then it will print to stdout.

motus calc_motu -i mgc_reads.count -o sample.motus

-e

Print the abundances of only the ref-mOTUs in the output. All other mOTU types (meta and ext) will be part of unassigned

motus calc_motu -i mgc_reads.count -o sample.motus -e

-c

Output the taxonomic profile with counts rather than relative abundances

motus calc_motu -i mgc_reads.count -o sample.motus -c  

-p

Print the NCBI TaxID of the mOTU in the output

motus calc_motu -i mgc_reads.count -o sample.motus -p  

-B

Output the taxonomy profile in BIOM format

motus calc_motu -i mgc_reads.count -o sample.motus -B  

-C

STR

Print result in CAMI format (BioBoxes format 0.9.1). Possible values: [precision, recall, parenthesis]. Note that the mOTUs species definition and the NCBI species definition are not always congruent. As a result, you can choose among three methods to save the result in CAMI format: “precision”, where the discrepancies are deleted; “recall”, where the relative abundances of the discrepancies are split; and “parenthesis”, where all the discrepancies are kept.

motus calc_motu -i mgc_reads.count -o sample.motus -C precision

-q

Report the full rank taxonomy in the taxonomic profile output

motus calc_motu -i mgc_reads.count -o sample.motus -q  

-k

STR

Report abundances at a specific taxonomic level. You can choose between [kingdom, phylum, class, order, family, genus, mOTU].

motus calc_motu -i mgc_reads.count -o sample.motus -k phylum  

Algorithm options

Option

Input type

Description

Example

-g

INT [Default 3]

Number of marker genes required to calculate a mOTU’s abundance. Given a mOTU, we calculate its abundance if at least -g marker genes have a read count different from 0. A value equal to 1 produces results with higher recall, while higher values produce results with higher precision. The minimum value is 1 and the maximum is 10. Default: 3

motus calc_motu -i mgc_reads.count -g 4 -o sample.motus

-v

INT [Default 3]

Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging

motus calc_motu -i mgc_reads.count -v 2 > sample.motus

-y

STR [Default insert.scaled_counts]

Type of read counts that we use. Possible values: [base.coverage, insert.raw_counts, insert.scaled_counts]

motus calc_mgc -i sample.bam -y insert.raw_counts -o mgc_reads.count
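Because calc_motu starts from the MGC table, the trade-off between recall and precision controlled by -g can be explored without mapping the reads again. A minimal sketch, assuming mgc_reads.count was produced by calc_mgc; the valid_g check mirrors the documented range of 1 to 10, and both it and the run helper (dry run when motus is not installed) are hypothetical helpers.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: execute the command if motus is installed,
# otherwise only print it (dry run).
run() {
  if command -v motus >/dev/null 2>&1; then
    "$@"
  else
    echo "dry-run: $*"
  fi
}

# -g accepts values from 1 (higher recall) to 10 (higher precision).
valid_g() {
  [ "$1" -ge 1 ] && [ "$1" -le 10 ]
}

for g in 1 3 6; do
  if valid_g "$g"; then
    run motus calc_motu -i mgc_reads.count -n sample -g "$g" -o "sample.g$g.motus"
  else
    echo "skipping -g $g: outside the range 1 to 10" >&2
  fi
done
```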

prep_long

Prepares long reads to be profiled by mOTUs.

Input options

Option

Input type

Description

Example

-i

FILE

Input long read file to convert into shorter reads. The file can be fasta(.gz) or fastq(.gz).

motus prep_long -i long_reads.fasta > converted_long_reads.fasta.gz

Output options

Option

Input type

Description

Example

-o

FILE

Output file name. If you don’t provide this option then it will print to stdout.

motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz

-no_gz

Do not compress the output file.

motus prep_long -i long_reads.fasta -o converted_long_reads.fasta -no_gz

Algorithm options

Option

Input type

Description

Example

-sl

INT [Default 300]

Splitting length for the long reads.

motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz -sl 250

-ml

INT [Default 50]

Minimum read length. Reads shorter than ml are discarded.

motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz -ml 60

-v

INT

Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging

motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz -v 2
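The converted reads are meant to be passed on to motus profile. A minimal sketch of the two steps follows; passing the converted file with -s (as unpaired reads) is an assumption here, and the file names and the run helper (dry run when motus is not installed) are illustrative.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: execute the command if motus is installed,
# otherwise only print it (dry run).
run() {
  if command -v motus >/dev/null 2>&1; then
    "$@"
  else
    echo "dry-run: $*"
  fi
}

long=long_reads.fasta
# Derive the output name from the input name (compressed by default).
converted="${long%.fasta}.converted.fasta.gz"

# Split the long reads using the default splitting and minimum lengths.
run motus prep_long -i "$long" -sl 300 -ml 50 -o "$converted"
# Assumption: the converted reads are profiled as unpaired reads with -s.
run motus profile -s "$converted" -n long_sample -o long_sample.motus
```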

map_snv

Maps reads to the marker gene database and produces a BAM file suitable for metaSNV https://metasnv.embl.de/. You can find a more detailed explanation on this page.

Input options

Option

Input type

Description

Example

-f

FILE[,FILE]

Input fastq file(s) in the forward orientation. When present, -r is also required. The file(s) can also be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated.

motus map_snv -f for_sample.fastq -r rev_sample.fastq > sample.bam

-r

FILE[,FILE]

Input fastq file(s) in the reverse orientation. When present, -f is also required. The file(s) can also be compressed (.gz or .bz2). Multiple files, as in the case of reads analyzed in two lanes, need to be comma-separated. The files must be listed in the same order as the forward files given with -f.

motus map_snv -f for_sample.fastq -r rev_sample.fastq > sample.bam

-s

FILE[,FILE]

Input fastq file(s) for unpaired reads. The file(s) can also be compressed (.gz or .bz2). You can analyze single read files alone or together with forward and reverse reads.

motus map_snv -s sample_lane1.fq,sample_lane2.fq > sample.bam

-db

DIR

Provide a different database directory DIR

motus map_snv -f sample_lane1_1.fq,sample_lane2_1.fq -r sample_lane1_2.fq,sample_lane2_2.fq -db DIR > sample.bam

Output options

Option

Input type

Description

Example

-o

FILE

Output BAM file name. If you don’t provide this option then it will print to stdout.

motus map_snv -s sample.fq -o sample.bam

Algorithm options

Option

Input type

Description

Example

-l

INT [Default 75]

Minimum alignment length for reads.

motus map_snv -f for_sample.fastq -r rev_sample.fastq -l 50 > sample.bam

-t

INT [Default 1]

Number of threads to use

motus map_snv -f for_sample.fastq -r rev_sample.fastq -t 4 > sample.bam

-v

INT [Default 3]

Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging

motus map_snv -f for_sample.fastq -r rev_sample.fastq -v 2 > sample.bam

snv_call

Performs single nucleotide variant calling using the metaSNV package https://metasnv.embl.de/.

Input options

Option

Input type

Description

Example

-d

DIR

Call metaSNV on all BAM files in the directory DIR [Mandatory]

motus snv_call -d DIR -o out.dir

Output options

Option

Input type

Description

Example

-o

DIR

Output directory. The command fails if the output directory already exists.

motus snv_call -d DIR -o out.dir

-K

Save in the output directory all the files and directories produced by metaSNV. By default cov, distances, filtered, snpCaller are deleted.

motus snv_call -d DIR -o out.dir -K

Algorithm options

Option

Input type

Description

Example

-fb

FLOAT [Default 80.0]

Coverage breadth: minimal horizontal genome coverage percentage per sample per species. Sample filter.

motus snv_call -d DIR -fb 85 -o out.dir

-fd

FLOAT [Default 5.0]

Coverage depth: minimal average vertical genome coverage per sample per species. Sample filter.

motus snv_call -d DIR -fd 20 -o out.dir

-fm

INT [Default 2]

Minimum number of samples per species. mOTU filter.

motus snv_call -d DIR -fm 10 -o out.dir

-fp

FLOAT [Default 0.9]

Required proportion of informative samples (coverage should be non-zero) per position. Position filter.

motus snv_call -d DIR -fp 0.8 -o out.dir

-fc

FLOAT [Default 5.0]

Minimum coverage per position per sample per species. Position filter.

motus snv_call -d DIR -fc 10 -o out.dir

-t

INT [Default 1]

Number of threads

motus snv_call -d DIR -t 8 -o out.dir

-v

INT [Default 3]

Change verbosity level: 1=error, 2=warning, 3=message, 4+=debugging

motus snv_call -d DIR -v 2 -o out.dir
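Putting map_snv and snv_call together, a typical run maps every sample into one directory of BAM files and then calls SNVs on that directory. A minimal sketch, assuming paired files named <sample>_1.fq and <sample>_2.fq; the directory names, the outdir_is_free check and the run helper (dry run when motus is not installed) are illustrative assumptions.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: execute the command if motus is installed,
# otherwise only print it (dry run).
run() {
  if command -v motus >/dev/null 2>&1; then
    "$@"
  else
    echo "dry-run: $*"
  fi
}

# snv_call fails if the output directory already exists, so check first.
outdir_is_free() {
  [ ! -e "$1" ]
}

bam_dir=bams
out_dir=snv_out

run mkdir -p "$bam_dir"
for s in A B; do
  run motus map_snv -f "${s}_1.fq" -r "${s}_2.fq" -o "$bam_dir/$s.bam"
done

if outdir_is_free "$out_dir"; then
  run motus snv_call -d "$bam_dir" -o "$out_dir" -t 4
else
  echo "output directory $out_dir already exists, choose another name" >&2
fi
```

With the default -fm 2, a species is only considered when at least two samples pass the sample filters, so a directory with a single BAM file is unlikely to yield results.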