Automated workflow for OTU classification using shotgun data from Illumina MiSeq fastq(s). It follows the Mothur MiSeq SOP except for the initial quality-control steps. High-quality read length and abundance have been shown to be primary factors in avoiding spurious or inflated OTU classification, so special emphasis is placed on those steps. The workflow also attempts to refine freshwater OTU classifications using TaxAss.
- Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing 2013
- Accuracy of microbial community diversity estimated by closed- and open-reference OTUs 2017
It leverages third-party tools and databases:
- MeFit 2016: a merging and filtering tool for Illumina paired-end reads, designed specifically for 16S rRNA amplicon sequencing data
- Mothur 2009: software for describing and comparing microbial communities
- Silva reference files v128 2016: 16S rRNA seed database and sequence/taxonomy references
- TaxAss 2017: fine-scale taxonomic assignment for freshwater datasets (by default)
To build CASPER (used by MeFit for read merging), make sure the g++ compiler and the Boost libraries are installed.
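Before running the build, it can help to confirm that both prerequisites are visible. The following is a minimal sketch, not part of the project: `have_cmd` is a hypothetical helper name, and the Boost check only tests whether a Boost header is on the compiler's default include path. It does not verify that compiled Boost libraries are present, nor which Boost components CASPER actually needs.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd g++; then
    echo "g++ found: $(g++ --version | head -n 1)"
else
    echo "g++ not found; install it before running build.sh" >&2
fi

# Header-only check: preprocess a one-line file that includes a Boost header.
if have_cmd g++ && echo '#include <boost/version.hpp>' | g++ -x c++ -E - >/dev/null 2>&1; then
    echo "Boost headers found"
else
    echo "Boost headers not found on the default include path" >&2
fi
```

If either check fails, install the missing package with your system's package manager and rerun the check.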
- Clone the repository
$ git clone https://github.com/jordangumm/omics_16s.git
- Build the local conda environment with dependencies
$ cd omics_16s
$ ./build.sh
- Activate the root environment
$ source <path_to_project>/dependencies/miniconda/bin/activate
Use the runner script against a sequencing run directory of fastq(s):
$ python runner.py <run_dp>
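A quick check that the run directory actually contains fastq files can save a failed run. This is a rough sketch under stated assumptions: `count_fastq` is a hypothetical helper, and it assumes the files sit directly in the run directory and end in `.fastq` or `.fastq.gz`. How `runner.py` itself discovers its input is not documented here, so adjust the patterns to match your data.

```shell
# Hypothetical helper: count FASTQ files (plain or gzipped) directly under a directory.
count_fastq() {
    find "$1" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | wc -l | tr -d ' '
}

# Use the first script argument as the run directory, defaulting to the current directory.
run_dp="${1:-.}"
n=$(count_fastq "$run_dp")
if [ "$n" -eq 0 ]; then
    echo "no fastq files found in $run_dp" >&2
else
    echo "$n fastq file(s) found in $run_dp"
fi
```

Saved as a script, this would be called with the same `<run_dp>` path that is passed to `runner.py`.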
You can also run on flux!
$ python runner.py -q <queue> --flux <run_dp> --account <flux_account_name>