forked from bcthomas/pullseq
-
Notifications
You must be signed in to change notification settings - Fork 0
Utility program for extracting sequences from a fasta/fastq file by size or by header name
License
ilovefungi/pullseq
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This software includes two applications: pullseq and seqdiff. Pullseq Summary: pullseq - extract sequences from a fasta/fastq file. This program is fast, and can be useful in a variety of situations. You can use it to extract sequences from one fasta/fastq file into a new file, given either a list of header ids to include / exclude or a size minimum / maximum sequence lengths. Additionally, it can convert from fastq to fasta or visa-versa and can change the length of the output sequence lines. NOTE: pullseq prints to standard out, so you need to use redirection (e.g. pullseq input.fasta -m 10 *>* output.fasta ) to create output files. Synopsis: # general extraction with a list of names pullseq --input=<input fasta/fastq file> --names=<fasta header ids file> # general extraction with a minimum size requirement pullseq --input=<input fasta/fastq file> --min=<minimum size sequence to extract> # only sequences with min 200 and max 500 pullseq -i input.fasta -m 200 -a 500 > new.fasta Options: -i, --input, Input fasta/fastq file (required) -n, --names, File of header id names to select -m, --min, Minimum sequence length -a, --max, Maximum sequence length -l, --length, Sequence characters per line (default 50) -c, --convert, Convert input to fastq/fasta (e.g. if input is fastq, output will be fasta) -q, --quality, ASCII code to use for fasta->fastq quality conversions -e, --excluded, Exclude the header id names in the list (-n) -t, --count, Just count the possible output, but don't write it -h, --help, Display this help and exit -v, --verbose, Print extra details during the run --version, Output version information and exit Seqdiff Summary: seqdiff - compare two fasta (or fastq) files to determine overlap of sequences. This overlap can be at the sequence level (are two sequences exactly the same in both files?) or at the header name level (do two sequences contain the same header name between the two files?). Synopsis: seqdiff -1 first_file.fa -2 second_file.fa Usage: seqdiff -1 <first input fasta/fastq file> -2 <second fasta/fastq file> Options: -1, --first, First sequence file (required) -2, --second, Second sequence file (required) -a, --a_output, File name for uniques from first file -b, --b_output, File name for uniques from second file -c, --c_output, File name for common entries -d, --headers, Compare headers instead of sequences (default: false) -s, --summary, Just show summary stats? (default: false) -h, --help, Display this help and exit -v, --verbose, Print extra details during the run --version, Output version information and exit REQUIREMENTS: Pullseq/Seqdiff require a C compiler and has been tested to work with either GCC or clang. They also require (and include) kseq.h (Heng Li) and uthash.h (http://troydhanson.github.com/uthash/). kseq.h also requires Zlib (so your linker should be able to handle the '-lz' option). You can obtain zlib from http://www.zlib.net/ or commonly from your OS package manager (e.g. apt-get zlib or emerge zlib). INSTALL: To install, do the following in a shell on your system... From Git: git clone https://github.com/bcthomas/pullseq.git # checkout the code using git cd pullseq ./bootstrap # get set up for config/build after cloning ./configure # configure the application based on your system make # will build the application make install # will install in /usr/local by default From a Tar file: tar xvf pullseq_version.tar.gz cd pullseq_version ./configure # configure the application based on your system make # will build the application make install # will install in /usr/local by default
About
Utility program for extracting sequences from a fasta/fastq file by size or by header name
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published