Profiling long-read samples¶
Long reads can be profiled with mOTUs version 3.0.3 and later. The workflow first splits the long reads into shorter reads (using the command motus prep_long) that can then be profiled with the default motus profile command.
Installing mOTUs 3.0.3¶
If you have difficulty installing the latest version of mOTUs with conda, installing it via pip may work better.
Download example data¶
Let us profile long reads from a mock community using mOTUs. First, download the long reads:
wget https://sunagawalab.ethz.ch/share/MOTUS_DATA/motus_3.0.3/motus_long_reads/HiFi-ATCC-MSA-1003.250k.fastq.gz
This dataset is a subsample of the larger dataset SRR9328980 (subsampled to 10% of the original number of reads).
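Before preparing the reads, it can be useful to confirm that the download is intact and really contains long reads. The following is a minimal sketch, not part of mOTUs: `fastq_stats` is a hypothetical helper that assumes the standard four-lines-per-record FASTQ layout and reports the number of reads and their mean length.

```shell
# Hypothetical helper (not part of mOTUs): summarise a gzipped FASTQ file.
# Assumes the standard 4-line-per-record FASTQ layout (no wrapped sequences).
fastq_stats() {
    gzip -dc "$1" | awk '
        NR % 4 == 2 { n++; total += length($0) }   # line 2 of each record is the sequence
        END {
            if (n == 0) { print "reads=0 mean_length=0"; exit }
            printf "reads=%d mean_length=%.1f\n", n, total / n
        }'
}

# Usage:
#   fastq_stats HiFi-ATCC-MSA-1003.250k.fastq.gz
```

Running the same helper on the prepared file from the next step should show a shorter mean read length than on the original download.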
Preparing the long reads¶
We first prepare the sample by splitting the long reads into shorter reads.
motus prep_long -i HiFi-ATCC-MSA-1003.250k.fastq.gz -o HiFi-ATCC-MSA-1003.250k.short.fastq -no_gz
gzip HiFi-ATCC-MSA-1003.250k.short.fastq
# or "pigz -p 32 HiFi-ATCC-MSA-1003.250k.short.fastq" if pigz is installed
# Or download the prepared result produced by the command:
wget https://sunagawalab.ethz.ch/share/MOTUS_DATA/motus_3.0.3/motus_long_reads/HiFi-ATCC-MSA-1003.250k.short.fastq.gz
Note: We compress the file manually because of performance issues with the Python gzip module.
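The manual compression step can be wrapped so that it uses pigz when it is installed and falls back to gzip otherwise. This is a sketch only: `compress_fastq` is a hypothetical helper name, and the default of 4 threads is an arbitrary assumption.

```shell
# Hypothetical helper (not part of mOTUs): compress a file with pigz if it is
# available, otherwise with gzip. The second argument is the pigz thread count.
compress_fastq() {
    file=$1
    threads=${2:-4}   # assumed default; adjust to your machine
    if command -v pigz >/dev/null 2>&1; then
        pigz -p "$threads" "$file"
    else
        gzip "$file"
    fi
}

# Usage:
#   compress_fastq HiFi-ATCC-MSA-1003.250k.short.fastq 32
```

Both tools replace the input file with a `.gz` file of the same name, so the result can be passed directly to motus profile.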
Running mOTUs¶
We can now use the usual motus profile command on the prepared long reads.
# We use -A to be consistent with the report shown below. -A prints out all the taxonomic levels
# -t defines the number of threads
motus profile -A -s HiFi-ATCC-MSA-1003.250k.short.fastq.gz -o HiFi-ATCC-MSA-1003.motus -t 32
# Or download the prepared result produced by the command:
wget https://sunagawalab.ethz.ch/share/MOTUS_DATA/motus_3.0.3/motus_long_reads/HiFi-ATCC-MSA-1003.motus
Exploring the result, we get:
# Get abundances at the genus level
grep "g__" HiFi-ATCC-MSA-1003.motus | grep -v "s__"
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas 0.0256625687
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Moraxellaceae|g__Acinetobacter 0.0041047739
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia 0.1690642823
k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhodobacterales|f__Rhodobacteraceae|g__Rhodobacter 0.2692651120
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Neisseriales|f__Neisseriaceae|g__Neisseria 0.0016740789
k__Bacteria|p__Proteobacteria|c__Epsilonproteobacteria|o__Campylobacterales|f__Helicobacteraceae|g__Helicobacter 0.0019215690
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Clostridiaceae|g__Clostridium 0.0065334779
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae|g__Bacillus 0.0208843590
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae|g__Staphylococcus 0.0725363474
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus 0.1647341426
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lactobacillus 0.0009095220
k__Bacteria|p__Deinococcus-Thermus|c__Deinococci|o__Deinococcales|f__Deinococcaceae|g__Deinococcus 0.0013104470
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Propionibacteriales|f__Propionibacteriaceae|g__Cutibacterium 0.0035726154
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas 0.1891746967
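The genus-level rows are easier to read when reduced to the genus name and sorted by abundance. The sketch below builds on the same grep filter; `genus_ranking` is a hypothetical helper, and it assumes that the abundance is the last whitespace-separated field of each row.

```shell
# Hypothetical helper (not part of mOTUs): list the genus-level rows of a
# `motus profile -A` report as "genus<TAB>abundance", highest abundance first.
# Assumes the abundance is the last whitespace-separated field on each row.
genus_ranking() {
    grep "g__" "$1" | grep -v "s__" |
        awk '{
            abundance = $NF
            tax = $0
            sub(/[ \t]+[^ \t]+$/, "", tax)      # drop the trailing abundance field
            n = split(tax, ranks, "[|]")        # the last rank on these rows is the genus
            genus = ranks[n]
            sub(/^g__/, "", genus)
            printf "%s\t%s\n", genus, abundance
        }' |
        LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}

# Usage:
#   genus_ranking HiFi-ATCC-MSA-1003.motus
```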
# Get abundances at the mOTU/species level
grep "s__" HiFi-ATCC-MSA-1003.motus
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia coli [ref_mOTU_v3_00095] 0.1690642823
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas|s__Pseudomonas aeruginosa [ref_mOTU_v3_00201] 0.0256625687
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Moraxellaceae|g__Acinetobacter|s__Acinetobacter baumannii [ref_mOTU_v3_00259] 0.0041047739
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae|g__Bacillus|s__Bacillus sp. [ref_mOTU_v3_00329] 0.0208843590
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus aureus [ref_mOTU_v3_00340] 0.0053912499
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus epidermidis [ref_mOTU_v3_00346] 0.0671450975
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Propionibacteriales|f__Propionibacteriaceae|g__Cutibacterium|s__Cutibacterium acnes [ref_mOTU_v3_00800] 0.0035726154
k__Bacteria|p__Proteobacteria|c__Epsilonproteobacteria|o__Campylobacterales|f__Helicobacteraceae|g__Helicobacter|s__Helicobacter pylori [ref_mOTU_v3_00897] 0.0019215690
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas|s__Porphyromonas gingivalis [ref_mOTU_v3_00985] 0.1891746967
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lactobacillus|s__Lactobacillus gasseri [ref_mOTU_v3_01039] 0.0009095220
k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhodobacterales|f__Rhodobacteraceae|g__Rhodobacter|s__Rhodobacter sphaeroides/johrii [ref_mOTU_v3_01513] 0.2692651120
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Neisseriales|f__Neisseriaceae|g__Neisseria|s__Neisseria meningitidis [ref_mOTU_v3_01539] 0.0016740789
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus|s__Streptococcus mutans [ref_mOTU_v3_01605] 0.1567639053
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus|s__Streptococcus agalactiae [ref_mOTU_v3_01860] 0.0079702373
k__Bacteria|p__Deinococcus-Thermus|c__Deinococci|o__Deinococcales|f__Deinococcaceae|g__Deinococcus|s__Deinococcus radiodurans [ref_mOTU_v3_02207] 0.0013104470
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Clostridiaceae|g__Clostridium|s__Clostridium beijerinckii [ref_mOTU_v3_03007] 0.0065334779
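In the report above, species-level abundances add up to the corresponding genus-level values: the two Staphylococcus species sum to 0.0725363474 and the two Streptococcus species to 0.1647341426, matching their genus rows. A minimal sketch for checking this on any report follows; `species_sum_by_genus` is a hypothetical helper that assumes every species row carries a `g__` rank and ends with the abundance.

```shell
# Hypothetical helper (not part of mOTUs): sum species-level abundances per
# genus in a `motus profile -A` report and print "genus<TAB>total".
# Assumes each species row contains a g__ rank and ends with the abundance.
species_sum_by_genus() {
    grep "s__" "$1" |
        awk '{
            abundance = $NF
            n = split($0, ranks, "[|]")
            for (i = 1; i <= n; i++)
                if (ranks[i] ~ /^g__/) {
                    genus = substr(ranks[i], 4)   # strip the "g__" prefix
                    total[genus] += abundance
                }
        }
        END { for (g in total) printf "%s\t%.10f\n", g, total[g] }' |
        LC_ALL=C sort
}

# Usage:
#   species_sum_by_genus HiFi-ATCC-MSA-1003.motus
```

The totals can then be compared by eye with the genus-level rows obtained from the first grep command.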