Skip to content

NeatFreq/Automaton Reference-Free Coverage Reduction Distribution Package

License

Notifications You must be signed in to change notification settings

BioinformaticsArchive/NeatFreq

 
 

Repository files navigation

##################################################################
# README: 
# NeatFreq version 1.0.1 - 7/1/2014
# J. McCorrison (ICS, J. Craig Venter Inst.)
# contact : jmccorri@jcvi.org
# web resources : https://github.com/bioh4x/NeatFreq
##################################################################



# 1.0 - INSTALL ##################################################

Deposit the entire contents of the NeatFreq distributable bundle in your desired install location. This bundle includes additional 1st and 3rd party tools available by open source license, expected in the NeatFreq install's /lib/ directory. 

(Optional) If you would like to remove the requirement to point to this area using the '-N' flag on every execution, update the global variable "$NEATFREQ_INSTALL" at the top of scripts NeatFreq_preprocess.pl, NeatFreq.pl, NeatFreq_postprocess_random.pl and NeatFreq_auto.pl.



# 2.0 - PREPROCESSING ############################################

NeatFreq assumes that only a single organism is present in the input dataset and the presence of a contaminant in any input data set may cause abnormal reduction of the target sample and maximal recruitment of the contaminant.

Targeted bin reduction with "-b align" requires preliminary read correction to remove as many false low frequency kmers as possible before NeatFreq preprocessing.  An example preprocessing pipeline is supplied in the assembly_automaton/ directory of the NeatFreq distributable code base.

High quantities of mate pairs may slow processing during targeted bin selection.  If this occurs, consider processing pairs as fragments, extracting as mate pairs following the process using NeatFreq_postprocess_random.pl.

For run method information, see [NEATFREQ_INSTALL]/assembly_automaton/README_automaton.txt



# 3.0 - REDUCTION ################################################

Reduction can occur by bin selection setting "random" or "targeted".  Random reduction has lower resource requirements but does not include methods for preferential selection of the most unique sequences nor does it force selection of 2-sided mate pair relationships.  The targeted reduction method is intended for extremely variable coverage and should be used cautiously with large datasets.


# 3.1 - AUTOMATIC EXECUTION OF ALL PROCESSING STAGES #############

Auto curate all 3 stages of NeatFreq processing, automatically handling fasta/fastq file types (-q) and maintaining maximal paired end information, even when run as fragment-only (-g).

EXAMPLE USAGE:
NeatFreq_auto.pl -N /install_loc/NeatFreq -z -q -k 19 -p PREFIX -m 1000 -x 10 -b align -f 33 -r fragments.fastq -rpairs pairs.fastq

EXAMPLE USAGE (run reduction without mate metadata, then recruit mates from output):
NeatFreq_auto.pl -N /install_loc/NeatFreq -z -q -k 19 -p PREFIX -m 1000 -x 10 -b align -f 33 -r fragments.fastq -rpairs pairs.fastq -g


# 3.2 - HAND-CURATED EXECUTION IN 2 TO 3 STAGES ##################

##### 3.2.A - Standard Execution

EXAMPLE USAGE (preprocess):
NeatFreq_preprocess.pl -N /install_loc/NeatFreq -q -f 33 -m 19 -frag fragments.fastq -pair pairs.fastq

EXAMPLE USAGE (reduce):
NeatFreq.pl -N /install_loc/NeatFreq -p align_80 -b align -x 80 -r AUTO_FORMAT.frg.fasta -rpairs AUTO_FORMAT.prs.fasta -c allreads.mer_counts.minocc1.FIX.txt -z -m 10000 -v


##### 3.2.B - Using Pairs as Fragments

EXAMPLE USAGE (preprocess):
NeatFreq_preprocess.pl -N /install_loc/NeatFreq -q -f 33 -m 19 -pair pairs.fastq

EXAMPLE USAGE (reduce):
NeatFreq.pl -N /install_loc/NeatFreq -p align_as_fragments_80 -b align -x 80 -r AUTO_FORMAT.frg.fasta -c allreads.mer_counts.minocc1.FIX.txt -z -m 10000 -v

EXAMPLE USAGE (postprocess):
(Calculate number of fragments in AUTO_FORMAT.fasta for use in flag -numfrg):
NeatFreq_postprocess_random.pl -N /install_loc/NeatFreq -ids REDUCED_COVERAGE_ID_LIST.txt -numfrg 500 -frag AUTO_FORMAT.frg.fasta -pair AUTO_FORMAT.prs.fasta

About

NeatFreq/Automaton Reference-Free Coverage Reduction Distribution Package

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C 83.1%
  • Ruby 5.6%
  • Perl 3.9%
  • TeX 2.1%
  • Python 1.6%
  • Lua 0.9%
  • Other 2.8%