########################### # MOCAT README FILE # ########################### Content: - Note - Installation - Requirements - Running on SGE or PBS - Guides and examples - Quick Guide ========== Note ========== This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. ================== INSTALLATION ================== To setup MOCAT, simply (from within the MOCAT base directory. The MOCAT base directory is the direcotry containing this file) 1. run: ./install_MOCAT.pl 2. run: source ~/.bashrc; source ~/.bash_profile 3. You're now ready to run MOCAT.pl and get started! 4. Check out the section QUICK GUIDE below how to setup a new project! This step will download and install required software and databases. The largest file to download is the custom made human genome 19 database, which has a download size of 1GB and will take 2-3 hours to setup and build. If you wish to download the simulated metagenome, the size of these files are 3.5GB in total. After running this step, all you need to do is start a new session of UNIX (to load needed variables), and then you should be able to execute the main script MOCAT.pl from within any direcotry. ADDITIONAL NOTE FOR USING USEARCH AND METAGENEMARK! If you want to run gene prediction using MetaGeneMark, or screen a custom fasta file for contaminants, you have to download and extract MetaGeneMark and Usearch, respectively, into the /ext/metagenemark and /ext/usearch directories respectively. You can download MetaGeneMark from http://exon.gatech.edu/GeneMark/license_download.cgi And download Usearch from Usearch from http://www.drive5.com/usearch/nonprofit_form.html. NOTE: MOCAT supports Usearch v5 and v6. We recommend v5, because the memory restrictions for 32bit systems affect input files to a higher degree in version 6. This additional step is required because you have to get a custom licence. IMPORTANT! To be able to reproduce the results made in the article, you need to download both MetaGeneMark and Usearch. ================== Requirements ================== To run MOCAT you need: - UNIX 64 bit system - Perl 5.8.8+ - R (www.r-project.org) We also recommend: - SGE or PBS queuing system Required CPUs and hard disk space varies greatly on your needs, but we recommend a hard disk of 500+ GB and a 8 core CPU, preferably a UNIX cluster with much more CPUs and hard disk space. ================================ Running on SGE, PBS or LSF ================================ FAQ: Q1. Why doesn't my jobs start? A1: Try different parallel environment settings, see below. Also, procesing steps requires 1-8 CPUs depending on the step. If you have less that 8 CPUs available your job will not start. You can change this by forcing a lower number of CPUs to be used with the -p flag when executing MOCAT. MOCAT has been designed to run on SGE primarily. The system has also been tested and run on PBS and LSF. However, the setup of these queuing system is often very different and it is difficult to design a system that will work for all permutations of setups of these queuing system. We have designed the system to have as few user settings as possible and currently, the only changable options are: MOCAT_qsub_system : SGE [SGE,PBS,LSF,none] MOCAT_SGE_qsub_add_param : [-l mem_free=6G -l h_vmem=6G] MOCAT_PBS_qsub_add_param : [-l select=mem=6gb] MOCAT_LSF_qsub_add_param : [-l select=mem=6gb] MOCAT_LSF_queue : [] MOCAT_LSF_memory_limit : [] MOCAT_SGE_parallell_env : smp [smp,mpi,make,qstate,-or other setting on your system-] And these can be changed in the CONFIG file. Specifically YOU MAY MOST LIKELY HAVE TO CHANGE THE MOCAT_SGE_parallell_env setting (which only affects SGE systems), this is done during the installation. The additional paremeter can be specified if your queuing system has specific limitations or other constraints that need to be addressed when submitting jobs. To change number of CPUs required you can use the -p option when running MOCAT. By default different processing steps requires between 1 and 8 CPUs and memory usage depends on the size of your metagenomes. If you have trouble you can also create all the files and submit the jobs manually by specifying the -x option when running MOCAT. ========================= Guides and Examples ========================= The MOCAT_user_manual.pdf is the user manual in PDF. If you chose to download the datasets in the article, they will be located in the /article_datasets folder ================= QUICK GUIDE ================= After installation, to run MOCAT, you can execute MOCAT.pl from any directory. You can run the example files in the /example-folder There are two shell scripts that you can execute directly. QUICK NOTES! 1. You need to copy the MOCAT.cfg config file to the project folder! - Check that this file is correct for you specific needs! 2. You need a sample file containing names of folders, in which you have the FastQ files For a complete and extensive manual how to use MOCAT, execute 'MOCAT.pl -man' A new project should have the following structure (if pair-end reads): CONFIG and SAMPLE file: | PROJECT_DIR | PROJECT_DIR/MOCAT.cfg | PROJECT_DIR/sample_file FASTQ files: | PROJECT_DIR/SAMPLE_1 | PROJECT_DIR/SAMPLE_1/sample_file_lane1.1.fq.gz | PROJECT_DIR/SAMPLE_1/sample_file_lane1.2.fq.gz | PROJECT_DIR/SAMPLE_1/sample_file_lane2.1.fq.gz | PROJECT_DIR/SAMPLE_1/sample_file_lane2.2.fq.gz | ... IMPORTANT NOTE FOR PAIRED-END READ SAMPLE FILES: The sample files must end with .1.fq or .1.fq.gz for the 1st pair of paired end reads, and .2.fq or .2.fq.gz for the second paired end reads. You can of course have as many samples, and lanes per samples as you want. IF THE READS ARE SINGLE END: Th sample files have to end with .fq or .dq.gz. THE CONFIG FILE: The MOCAT.cfg file is required in the base project direcotry. This file can be copied from this folder. THE SAMPLE FILE: The sample_file can be any file (anywhere), containing the names of the samples (subfolders) to analyze, and is specified when running MOCAT.pl each time. The structure of the sample file is: | SAMPLE_1 | SAMPLE_2 | ... - end of file -