Pandora's Toolbox

License – GNU GPL3

Condition for use

Please do NOT cite Pandora's Toolbox. Instead, cite the papers by the authors of the individual programs you use. The references are given below.

Description

Pandora’s Toolbox is a collection of the source code of well-known bioinformatics programs, all in one place. You can run ‘git clone’ and then ‘make’ once in the main folder. This is meant to compile everything (not all programs build this way yet) and collect the executables in the ‘bin’ folder.

Previously, every time we tried to build these programs, we had to collect the source code from various websites. Then each program had to be compiled individually, and some of those using Boost were nightmares to compile. Also, many programs depend on each other, and we ended up with five copies of different versions of the BWA or samtools code.

Our collection reduces those dependencies and makes life easier for those working with different bioinformatics programs. The collection includes a boost_1_55_0 folder (Boost 1.55.0), so no external Boost installation is needed. We are working on sorting out the remaining BWA and other interdependencies.

The following programs are included in the current version. Please DO NOT cite us, but cite the authors of individual programs.

KMC2
========

KMC2 is an efficient k-mer counter that does not require much RAM. It is disk-based and uses a minimizer-like method to partition the reads.
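
The partitioning idea can be illustrated with a small in-memory sketch. It assumes plain lexicographic minimizers, ignores reverse complements, bins individual k-mers rather than the super-k-mers KMC2 writes to disk, and uses `zlib.crc32` as an arbitrary bin hash; all function names are hypothetical.

```python
import zlib
from collections import Counter, defaultdict


def minimizer(kmer: str, m: int) -> str:
    """Lexicographically smallest m-mer inside the k-mer."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def partition_kmers(reads, k: int, m: int, n_bins: int):
    """Assign every k-mer to a bin chosen from its minimizer.

    Neighboring k-mers of a read usually share a minimizer, so they tend to
    land in the same bin, and identical k-mers always do.
    """
    bins = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            bin_id = zlib.crc32(minimizer(kmer, m).encode()) % n_bins
            bins[bin_id].append(kmer)
    return bins


def count_kmers(reads, k: int = 5, m: int = 3, n_bins: int = 4) -> Counter:
    """Count each bin on its own (KMC2 would stream bins from disk)."""
    totals = Counter()
    for kmers in partition_kmers(reads, k, m, n_bins).values():
        totals.update(Counter(kmers))  # bins never share a k-mer
    return totals


counts = count_kmers(["ACGTACGTAC"])
print(counts["ACGTA"], sum(counts.values()))  # 2 6
```

Because a k-mer's bin depends only on its minimizer, each bin can be counted independently with a small memory footprint.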

Deorowicz, S., Kokot, M., Grabowski, Sz., Debudaj-Grabysz, A., KMC 2: Fast and resource-frugal k-mer counting, Bioinformatics, 2015.
Deorowicz, S., Debudaj-Grabysz, A., Grabowski, Sz., Disk-based k-mer counting on a PC, BMC Bioinformatics, 2013, 14:160.

BWA and BWA-MEM
===============

BWA-MEM aligns short and long sequence reads against a reference sequence or collection of sequences. For example, it can map millions of short reads to the human genome.
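
BWA (the Burrows-Wheeler Aligner) looks up seeds in an FM-index built from the Burrows-Wheeler transform (BWT) of the reference. The toy sketch below shows only the core backward search that counts exact occurrences of a pattern. It is not BWA's seeding or extension step, it builds the BWT naively in quadratic time, and the function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' marks the end)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_index(text: str):
    last = bwt(text)
    alphabet = sorted(set(last))
    # C[c]: number of characters in the text that sort before c
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += last.count(ch)
    # occ[c][i]: occurrences of c in last[:i]
    occ = {ch: [0] * (len(last) + 1) for ch in alphabet}
    for i, symbol in enumerate(last):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (symbol == ch)
    return C, occ, len(last)


def count_occurrences(pattern: str, index) -> int:
    """Backward search: extend the match one character at a time, right to left."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of sorted rotations prefixed by the match
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = fm_index("GATTACAGATTACA")
print(count_occurrences("ATTA", index), count_occurrences("CAG", index))  # 2 1
```

Each pattern character costs a constant number of table lookups, independent of the reference length. This is why FM-index lookups scale to genomes once the index is built with a proper suffix-array construction.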

Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Heng Li (2013).

DALIGNER
========

DALIGNER finds overlaps between long, noisy reads that contain many insertion and deletion errors.
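
A rough sketch of a k-mer filter for overlap candidates follows. Two overlapping reads share many k-mers, and because most errors are small insertions and deletions, those shared k-mers fall near the same diagonal (position in one read minus position in the other). The sketch counts shared k-mers per diagonal band. DALIGNER itself is reportedly far more engineered: it sorts and merges k-mer lists and then verifies candidates with local alignment. The function names, band width and thresholds here are hypothetical.

```python
from collections import defaultdict


def kmer_positions(seq: str, k: int):
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def candidate_overlap(a: str, b: str, k: int = 4, band: int = 4, min_hits: int = 3):
    """Return (band_id, hits) for the best diagonal band, or None.

    A k-mer shared at position i of a and j of b lies on diagonal i - j;
    indels shift the diagonal only a little, so hits are pooled in bands.
    """
    positions_b = kmer_positions(b, k)
    hits = defaultdict(int)
    for i in range(len(a) - k + 1):
        for j in positions_b.get(a[i:i + k], ()):
            hits[(i - j) // band] += 1
    if not hits:
        return None
    best = max(hits, key=hits.get)
    return (best, hits[best]) if hits[best] >= min_hits else None


a = "GATTACAGGCTTCA"
b = a[5:] + "TTTT"  # b begins where a's suffix starts: diagonal 5
print(candidate_overlap(a, b))  # (1, 6)
```

Pooling hits in bands, rather than requiring a single exact diagonal, is what lets the filter tolerate indel-rich reads.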

Eugene Myers, WABI 2014 (Workshop on Algorithms in Bioinformatics), Sept 8-10, Wroclaw, Poland.

HMMER3
======

HMMER is a sensitive search program for nucleotide and protein sequences. It uses profile hidden Markov models.
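
HMMER3 scores a sequence against a profile HMM by summing over all possible alignments with the Forward algorithm. The sketch below computes a Forward log-likelihood for a generic, hypothetical two-state HMM in log space. It omits the match/insert/delete architecture of profile HMMs, the null model used for log-odds scores, and HMMER's vectorized filters.

```python
import math


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    top = max(values)
    if top == -math.inf:
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_log_likelihood(seq, states, start, trans, emit) -> float:
    """log P(seq | model), summed over every hidden state path."""
    f = {s: _log(start[s]) + _log(emit[s].get(seq[0], 0.0)) for s in states}
    for symbol in seq[1:]:
        f = {
            s: logsumexp([f[r] + _log(trans[r][s]) for r in states])
            + _log(emit[s].get(symbol, 0.0))
            for s in states
        }
    return logsumexp(list(f.values()))


# Hypothetical two-state model: an A/T-rich state and a G/C-rich state.
states = ["AT", "GC"]
start = {"AT": 0.5, "GC": 0.5}
trans = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
print(forward_log_likelihood("ATGC", states, start, trans, emit))
```

Working in log space avoids the numerical underflow that products of many small probabilities would cause on long sequences.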

Accelerated profile HMM searches. S. R. Eddy. PLoS Comp. Biol., 7:e1002195, 2011.

bcalm – de Bruijn graph compressor
==================================

Splitting the short reads into k-mers and building the de Bruijn graph structure is usually the first step of assembly for de Bruijn graph-based algorithms, and this step is often very memory-intensive. A compressed de Bruijn graph merges contiguous k-mers into longer sequences and can be held in less memory. BCALM is a memory-efficient program that generates a compressed de Bruijn graph from a collection of k-mers.
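
Compression can be sketched as merging k-mers along non-branching paths into unitigs. This in-memory version is only an illustration. Unlike BCALM, it treats k-mers as single-stranded (no reverse complements) and holds everything in RAM, and the function name is hypothetical.

```python
def compact_unitigs(kmers):
    """Merge k-mers into maximal non-branching paths (unitigs)."""
    kmers = set(kmers)

    def successors(x):
        return [x[1:] + c for c in "ACGT" if x[1:] + c in kmers]

    def predecessors(x):
        return [c + x[:-1] for c in "ACGT" if c + x[:-1] in kmers]

    def next_in_unitig(x):
        # x -> y is merged only if x has one successor and y one predecessor
        succ = successors(x)
        if len(succ) == 1 and len(predecessors(succ[0])) == 1:
            return succ[0]
        return None

    def prev_in_unitig(x):
        pred = predecessors(x)
        if len(pred) == 1 and len(successors(pred[0])) == 1:
            return pred[0]
        return None

    used, unitigs = set(), []
    for seed in sorted(kmers):
        if seed in used:
            continue
        start = seed
        # Walk back to the first k-mer of this unitig (stopping on a cycle).
        while (p := prev_in_unitig(start)) is not None and p != seed:
            start = p
        path = [start]
        used.add(start)
        while (n := next_in_unitig(path[-1])) is not None and n not in used:
            path.append(n)
            used.add(n)
        unitigs.append(path[0] + "".join(x[-1] for x in path[1:]))
    return unitigs


read = "ACGTTGCA"
kmers = {read[i:i + 3] for i in range(len(read) - 2)}
print(compact_unitigs(kmers))  # ['ACGTTGCA']
print(compact_unitigs({"ACG", "CGA", "CGT"}))  # ['ACG', 'CGA', 'CGT']
```

In the second call, `ACG` branches into two successors, so nothing can be merged. The six k-mers of the first call collapse into a single sequence.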

On the representation of de Bruijn graphs, Rayan Chikhi et al. (2014).

Minia – assembler
=================

Minia is a contig assembler for short reads. It uses very little memory; for example, it needs less than 4 GB to assemble a human genome.
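
Minia stores the k-mers of the de Bruijn graph in a Bloom filter and finds a node's neighbors by querying the four possible extensions. A minimal sketch of that idea follows. The hash construction (salted SHA-256), filter size and names are assumptions. Unlike Minia, the sketch does nothing about the false-positive k-mers a Bloom filter introduces: Minia stores the critical ones separately, and the cascading Bloom filter paper reduces the memory they need.

```python
import hashlib


class BloomFilter:
    """Bit array with n_hashes hash functions; may report false positives."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmer_filter(reads, k: int, n_bits: int = 1 << 16, n_hashes: int = 3) -> BloomFilter:
    bloom = BloomFilter(n_bits, n_hashes)
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])
    return bloom


def successors(bloom: BloomFilter, kmer: str):
    """Edges are not stored: query the four possible one-letter extensions."""
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in bloom]


bloom = kmer_filter(["ACGTTGCA"], k=3)
print("ACG" in bloom, successors(bloom, "ACG"))
```

A Bloom filter never forgets an inserted k-mer, so true neighbors are always found. The cost of the small memory footprint is occasional spurious neighbors.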

R. Chikhi, G. Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI 2012.
K. Salikhov, G. Sacomoto and G. Kucherov. Using cascading Bloom filters to improve the memory usage for de Brujin graphs, WABI 2013.

samtools – NGS data handling
============================

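Samtools is a suite of utilities for processing short-read alignment data in the SAM/BAM formats, for example viewing, sorting, indexing and filtering alignments.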
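
Much of the filtering done with samtools (for example `samtools view -f` and `-F`) works on the bitwise FLAG field of SAM records. The sketch below decodes FLAG bits and the first six mandatory fields of a SAM line. The bit values come from the SAM specification; the short labels and function names are our own choices for illustration.

```python
SAM_FLAG_BITS = [
    (0x1, "PAIRED"),           # template has multiple segments
    (0x2, "PROPER_PAIR"),      # each segment properly aligned
    (0x4, "UNMAP"),            # segment unmapped
    (0x8, "MUNMAP"),           # next segment (mate) unmapped
    (0x10, "REVERSE"),         # sequence reverse complemented
    (0x20, "MREVERSE"),        # mate reverse complemented
    (0x40, "READ1"),           # first segment in the template
    (0x80, "READ2"),           # last segment in the template
    (0x100, "SECONDARY"),      # secondary alignment
    (0x200, "QCFAIL"),         # not passing quality controls
    (0x400, "DUP"),            # PCR or optical duplicate
    (0x800, "SUPPLEMENTARY"),  # supplementary alignment
]


def decode_flag(flag: int):
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def parse_sam_line(line: str) -> dict:
    """Split the first six mandatory, tab-separated SAM columns."""
    qname, flag, rname, pos, mapq, cigar = line.rstrip("\n").split("\t")[:6]
    return {
        "qname": qname,
        "flags": decode_flag(int(flag)),
        "rname": rname,
        "pos": int(pos),  # 1-based leftmost mapping position
        "mapq": int(mapq),
        "cigar": cigar,
    }


record = parse_sam_line("r1\t99\tchr1\t100\t60\t8M\t=\t300\t208\tACGTACGT\tIIIIIIII")
print(record["flags"])  # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']
```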

https://github.com/samtools

SOAPdenovo2 – genome assembler
==============================

SOAPdenovo uses de Bruijn graph-based algorithms to assemble large eukaryotic genomes from short read libraries.
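
The first step of a de Bruijn graph assembler can be sketched as follows: cut the reads into k-mers, discard k-mers seen too rarely (they probably contain sequencing errors), and link each remaining k-mer's (k-1)-prefix to its (k-1)-suffix. The threshold, the reads and the function name are hypothetical. SOAPdenovo2's actual graph construction, error removal and scaffolding are far more involved.

```python
from collections import Counter, defaultdict


def de_bruijn_graph(reads, k: int, min_count: int = 2):
    """Nodes are (k-1)-mers; each sufficiently frequent k-mer adds an edge."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    graph = defaultdict(set)
    for kmer, n in counts.items():
        if n >= min_count:  # rare k-mers probably contain sequencing errors
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


reads = ["ACGTAC", "ACGTAC", "ACGTTC"]  # the third read carries an error
print(de_bruijn_graph(reads, k=4))
# {'ACG': {'CGT'}, 'CGT': {'GTA'}, 'GTA': {'TAC'}}
```

Without the frequency filter, the erroneous read would add a spurious branch `CGT -> GTT` to the graph.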

De novo assembly of human genomes with massively parallel short read sequencing. R. Li et al. Genome Res 20, 265-72 (2010).
SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. R. Luo et al. GigaScience (2012).

SOAPdenovo-trans – transcriptome assembler
==========================================

SOAPdenovo-trans assembles short reads from RNA-seq experiments into transcripts.

SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads, Yinlong Xie et al. (2014).

SPAdes – genome assembler
=========================

SPAdes, developed by the Algorithmic Biology Lab in Saint Petersburg, Russia, is among the most efficient and versatile de Bruijn graph-based genome assemblers. SPAdes was originally designed to assemble bacterial genomes from single-cell data, but it appears to work well for standard multicell data, as well as for small to mid-sized eukaryotic genomes.

Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander V. Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, and Pavel A. Pevzner. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology 19(5) (2012), 455-477. doi:10.1089/cmb.2012.0021

Trinity – transcriptome assembler
=================================

Trinity is a standalone de novo transcriptome assembler that uses de Bruijn graph-based algorithms.

Full-length transcriptome assembly from RNA-Seq data without a reference genome, Manfred G Grabherr et al. Nature Biotechnology 29, 644-652 (2011).

sailfish – RNAseq expression analysis
=====================================

Sailfish is a lightweight program for quantifying the abundance of previously annotated RNA isoforms in RNAseq data.
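
Sailfish estimates abundances with an expectation-maximization (EM) procedure over k-mers shared between transcripts. The simplified sketch below runs EM on hypothetical read counts grouped by the set of transcripts each read is compatible with. It ignores transcript lengths and bias correction, so it illustrates the idea rather than Sailfish's actual model, and the function name is our own.

```python
def em_abundance(classes, transcripts, n_iter: int = 200) -> dict:
    """Estimate the fraction of reads coming from each transcript.

    classes: list of (compatible_transcripts, read_count) pairs.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(count for _, count in classes)
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: share each class's reads in proportion to current abundances.
        for compatible, count in classes:
            norm = sum(theta[t] for t in compatible)
            if norm == 0:
                continue
            for t in compatible:
                expected[t] += count * theta[t] / norm
        # M-step: new abundances are the expected read shares.
        theta = {t: expected[t] / total for t in transcripts}
    return theta


classes = [({"A"}, 60), ({"A", "B"}, 40), ({"B"}, 20)]
theta = em_abundance(classes, ["A", "B"])
print({t: round(v, 3) for t, v in theta.items()})  # {'A': 0.75, 'B': 0.25}
```

The 40 ambiguous reads are not split evenly: they follow the 60:20 ratio of the unambiguous reads, giving A 60 + 30 and B 20 + 10 reads out of 120.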

RAPsearch2
==========

RAPsearch2 searches a database of protein sequences for matches to a query protein sequence. It is 80x faster than BLAST without significant loss of quality.

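TopHat and Cufflinks
====================

TopHat aligns RNA-seq reads to a reference genome and identifies the splice junctions between exons. Cufflinks then assembles the aligned reads into transcripts and estimates their expression.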
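
In SAM/BAM output from spliced aligners such as TopHat, an intron skipped by a read is recorded as an `N` operation in the CIGAR string. The sketch below recovers junction coordinates from a record's 1-based POS and its CIGAR. The function name and the inclusive (first, last) intron convention are our own choices.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")


def junctions(pos: int, cigar: str):
    """Return introns as (first, last) 1-based inclusive reference positions."""
    ref = pos
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in CONSUMES_REFERENCE:
            ref += length
    return introns


print(junctions(100, "50M200N50M"))  # [(150, 349)]
print(junctions(100, "10M5I10M100N20M"))  # [(120, 219)]
```

Insertions (`I`) and soft clips (`S`) do not advance the reference position, which is why only the listed operations move `ref`.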

TopHat: discovering splice junctions with RNA-Seq. Cole Trapnell, Lior Pachter and Steven L. Salzberg. Bioinformatics (2009).
Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Cole Trapnell et al. Nature Protocols 7, 562-578 (2012).

About the name

We named the modules after Symbion pandora, an unusual animal discovered by Danish researchers Peter Funch and R. M. Kristensen in 1995. Many thanks to Professor Funch for sharing the SEM image at the top of this post.
