###########################
#    MOCAT README FILE    #
###########################


Content:
- Note
- Installation
- Requirements
- Running on SGE, PBS or LSF
- Guides and examples
- Quick Guide


==========
   Note
==========
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.


==================
   INSTALLATION
==================
To set up MOCAT, run the following steps from within
the MOCAT base directory (the directory containing
this file):

1. run: ./install_MOCAT.pl
2. run: source ~/.bashrc; source ~/.bash_profile

3. You're now ready to run MOCAT.pl and get started!
4. See the QUICK GUIDE section below for how to set up a new project!

Step 1 downloads and installs the required
software and databases. The largest download
is the custom-made human genome 19 database,
which is 1 GB in size and takes 2-3 hours
to set up and build.

If you wish to download the simulated metagenome,
these files are 3.5 GB in total.

After running the installation, all you need to do is
start a new UNIX session (to load the needed variables).
You should then be able to execute the main
script MOCAT.pl from within any directory.
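To confirm that a new session picked up the installation, a quick check
along the following lines may help. This is only a sketch: it assumes the
installer makes MOCAT.pl reachable through PATH, and the function name
mocat_on_path is made up for this example.

```shell
# Sketch: report whether MOCAT.pl can be found via PATH in this session.
# Assumption: the installer makes MOCAT.pl reachable through PATH.
mocat_on_path() {
    # command -v prints the resolved location, or fails if nothing is found
    if command -v MOCAT.pl >/dev/null 2>&1; then
        echo "MOCAT.pl found: $(command -v MOCAT.pl)"
    else
        echo "MOCAT.pl not found; start a new session or re-source ~/.bashrc" >&2
        return 1
    fi
}
```

Run mocat_on_path after opening the new session; a non-zero exit status
means the variables have not been loaded yet.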

ADDITIONAL NOTE FOR USING USEARCH AND METAGENEMARK!
If you want to run gene prediction using MetaGeneMark, or screen
a custom fasta file for contaminants using Usearch, you have to
download the tool and extract it into the /ext/metagenemark or
/ext/usearch directory, respectively.

You can download MetaGeneMark from
http://exon.gatech.edu/GeneMark/license_download.cgi

You can download Usearch from
http://www.drive5.com/usearch/nonprofit_form.html

NOTE: MOCAT supports Usearch v5 and v6. We recommend v5, because
the memory restrictions for 32-bit systems limit input files
more severely in v6.


This additional step is required because
you have to get a custom licence.
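After extracting the two tools, a check like the one below can confirm
that they ended up in the expected place. It is a sketch only: it assumes
that /ext/metagenemark and /ext/usearch are located inside the MOCAT base
directory, and check_ext_tools is a hypothetical helper, not part of MOCAT.

```shell
# Sketch: check that the optional tools were extracted into place.
# Assumption: the ext/ directories live inside the MOCAT base directory,
# which is passed as the first argument.
check_ext_tools() {
    base=$1
    missing=0
    for tool in metagenemark usearch; do
        dir="$base/ext/$tool"
        # Populated means: the directory exists and holds at least one entry
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
            echo "ok: $dir"
        else
            echo "missing or empty: $dir"
            missing=1
        fi
    done
    return "$missing"
}
```

Example: check_ext_tools /path/to/MOCAT (replace the path with your own
MOCAT base directory).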

IMPORTANT!
To reproduce the results presented in the article,
you need to download both MetaGeneMark and Usearch.


==================
   Requirements
==================
To run MOCAT you need:
- 64-bit UNIX system
- Perl 5.8.8+
- R (www.r-project.org)

We also recommend:
- SGE or PBS queuing system

The required number of CPUs and amount of hard disk
space vary greatly depending on your needs, but we
recommend at least 500 GB of hard disk space and an
8-core CPU, preferably a UNIX cluster with many more
CPUs and much more hard disk space.


================================
   Running on SGE, PBS or LSF
================================

FAQ:
Q1: Why don't my jobs start?
A1: Try different parallel environment settings, see below.
Also, processing steps require 1-8 CPUs depending on the step.
If you have fewer than 8 CPUs available, your job will not start.
You can change this by forcing a lower number of CPUs to be used
with the -p flag when executing MOCAT.

MOCAT has been designed to run on SGE primarily.
The system has also been tested and run on PBS and LSF.

However, the setup of these queuing systems often differs a lot
between sites, and it is difficult to design a system that will
work for all permutations of setups.

We have designed the system to have as few user settings
as possible. Currently, the only changeable options are:
MOCAT_qsub_system        : SGE [SGE,PBS,LSF,none]
MOCAT_SGE_qsub_add_param : [-l mem_free=6G -l h_vmem=6G] 
MOCAT_PBS_qsub_add_param : [-l select=mem=6gb]
MOCAT_LSF_qsub_add_param : [-l select=mem=6gb]
MOCAT_LSF_queue          : []
MOCAT_LSF_memory_limit   : []
MOCAT_SGE_parallell_env  : smp [smp,mpi,make,qstate,-or other setting on your system-]

These can be changed in the CONFIG file.
In particular, YOU WILL MOST LIKELY HAVE TO CHANGE THE
MOCAT_SGE_parallell_env setting (which only affects SGE systems);
this is done during the installation.
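To see which queue settings a project is currently using, the relevant
lines can be pulled out of the config file. The sketch below matches only
the setting names listed above and assumes that each setting starts at the
beginning of a line; show_queue_settings is a made-up name.

```shell
# Sketch: print the queue-related settings from a MOCAT config file.
# Assumption: each setting name starts at the beginning of its line.
show_queue_settings() {
    cfg=$1
    grep -E '^MOCAT_(qsub_system|(SGE|PBS|LSF)_(qsub_add_param|queue|memory_limit|parallell_env))' "$cfg"
}
```

Example: show_queue_settings PROJECT_DIR/MOCAT.cfg
On SGE systems, 'qconf -spl' lists the parallel environments defined on
the cluster, which may help when choosing MOCAT_SGE_parallell_env.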

The additional parameters (the *_qsub_add_param settings) can be
specified if your queuing system has specific limitations or other
constraints that need to be addressed when submitting jobs.

To change the number of CPUs required, you can use the -p option when running MOCAT.
By default, the different processing steps require between 1 and 8 CPUs, and memory
usage depends on the size of your metagenomes.
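A value for the -p option can be derived from the number of CPUs on the
machine. The following is a rough sketch, not part of MOCAT: it caps the
value at 8, the largest default mentioned above, and the helper name
suggest_cpus is hypothetical.

```shell
# Sketch: pick a CPU count for the -p option, capped at 8.
suggest_cpus() {
    # Use the count given as $1, else ask the system how many CPUs are online
    avail=${1:-$(getconf _NPROCESSORS_ONLN)}
    if [ "$avail" -gt 8 ]; then
        echo 8
    else
        echo "$avail"
    fi
}
```

The result can then be passed along with your other MOCAT.pl options,
for example: -p "$(suggest_cpus)"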

If you have trouble, you can also specify the -x option when running MOCAT
to create all the job files, and then submit the jobs manually.


=========================
   Guides and Examples
=========================
The user manual is available as a PDF file: MOCAT_user_manual.pdf

If you chose to download the datasets used in the article,
they will be located in the /article_datasets folder.


=================
   QUICK GUIDE
=================
After installation, to run MOCAT, you can
execute MOCAT.pl from any directory.

You can run the example files in the /example folder.
It contains two shell scripts that you can execute directly.

QUICK NOTES!
1. You need to copy the MOCAT.cfg config file to the project folder!
    - Check that this file is correct for your specific needs!
2. You need a sample file containing the names of the folders
that hold your FastQ files.

For a complete and extensive manual
on how to use MOCAT, execute 'MOCAT.pl -man'.

A new project should have the following
structure (for paired-end reads):


CONFIG and SAMPLE file:
| PROJECT_DIR
| PROJECT_DIR/MOCAT.cfg
| PROJECT_DIR/sample_file

FASTQ files:
| PROJECT_DIR/SAMPLE_1
| PROJECT_DIR/SAMPLE_1/sample_file_lane1.1.fq.gz
| PROJECT_DIR/SAMPLE_1/sample_file_lane1.2.fq.gz
| PROJECT_DIR/SAMPLE_1/sample_file_lane2.1.fq.gz
| PROJECT_DIR/SAMPLE_1/sample_file_lane2.2.fq.gz
| ...
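
The layout above could be created along the following lines. This is a
minimal sketch under stated assumptions: make_project_skeleton is a
made-up helper, the sample file is written with one sample name per line,
and MOCAT.cfg and the FastQ files still have to be copied in by hand.

```shell
# Sketch: create the project folder, one subfolder per sample,
# and a sample file listing the sample names.
make_project_skeleton() {
    project_dir=$1
    shift
    mkdir -p "$project_dir" || return 1
    # Start with an empty sample file
    : > "$project_dir/sample_file"
    for sample in "$@"; do
        mkdir -p "$project_dir/$sample" || return 1
        # One sample (subfolder) name per line
        printf '%s\n' "$sample" >> "$project_dir/sample_file"
    done
}
```

Example: make_project_skeleton PROJECT_DIR SAMPLE_1 SAMPLE_2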

IMPORTANT NOTE FOR PAIRED-END READ FILES:
The FastQ files must end with .1.fq or .1.fq.gz
for the first read of each pair, and with .2.fq or
.2.fq.gz for the second read of each pair.
You can of course have as many samples, and as many
lanes per sample, as you want.

IF THE READS ARE SINGLE END:
The FastQ files have to end with .fq or .fq.gz.
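
The naming rules can be expressed as a small check. The sketch below is
illustrative only and assumes single-end files end with .fq or .fq.gz;
classify_read_file is a hypothetical helper. Note that a single-end file
whose name happens to end with .1.fq would be reported as the first read
of a pair.

```shell
# Sketch: classify a FastQ file name by its suffix.
classify_read_file() {
    case $1 in
        # The paired-end patterns must come first, since they also end in .fq
        *.1.fq|*.1.fq.gz) echo pair1 ;;
        *.2.fq|*.2.fq.gz) echo pair2 ;;
        *.fq|*.fq.gz)     echo single ;;
        *)                echo invalid ;;
    esac
}
```

Example: classify_read_file sample_file_lane1.1.fq.gz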

THE CONFIG FILE:
The MOCAT.cfg file is required in the base project
directory. This file can be copied from this folder
(the MOCAT base directory).

THE SAMPLE FILE:
The sample_file can be any file, located anywhere,
that contains the names of the samples (subfolders) to analyze.
It is specified each time you run MOCAT.pl.
The structure of the sample file is:

| SAMPLE_1
| SAMPLE_2
| ...
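
Before starting a run, it may be worth checking a project against its
sample file. The sketch below is not part of MOCAT; check_project and
has_fastq are made-up helpers. It assumes one sample name per line and
only tests that MOCAT.cfg exists and that each sample folder holds at
least one .fq or .fq.gz file.

```shell
# Sketch: succeed if the folder in $1 holds at least one .fq or .fq.gz file.
has_fastq() {
    for f in "$1"/*.fq "$1"/*.fq.gz; do
        # An unmatched pattern stays literal and fails the -e test
        [ -e "$f" ] && return 0
    done
    return 1
}

# Sketch: check MOCAT.cfg and every sample listed in the sample file.
check_project() {
    project_dir=$1
    sample_file=$2
    status=0
    [ -f "$project_dir/MOCAT.cfg" ] || { echo "missing: $project_dir/MOCAT.cfg"; status=1; }
    # The extra test keeps a last line that has no trailing newline
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue   # skip blank lines
        if [ ! -d "$project_dir/$sample" ]; then
            echo "missing folder: $project_dir/$sample"
            status=1
        elif ! has_fastq "$project_dir/$sample"; then
            echo "no FastQ files: $project_dir/$sample"
            status=1
        fi
    done < "$sample_file"
    return "$status"
}
```

Example: check_project PROJECT_DIR PROJECT_DIR/sample_file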


- end of file -