hisat2

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). Based on an extension of the BWT for graphs [1], we designed and implemented a graph FM index (GFM), an original approach and, to the best of our knowledge, its first implementation. In addition to one global GFM index that represents the general population, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index represents a genomic region of 56 Kbp, and 55,000 indexes are needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable effective alignment of sequencing reads. This new indexing scheme is called the Hierarchical Graph FM index (HGFM). We developed HISAT2 based on the HISAT [2] and Bowtie 2 [3] implementations. See the HISAT2 website at ccb.jhu.edu/software/hisat2.
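To make the local-index figures concrete, here is a minimal Python sketch of how a genome position could be assigned to a 56 Kbp local index, and how many such indexes a genome of a given length would need. It is illustrative only and is not HISAT2's actual code: the function names, the fixed non-overlapping windows, and the genome length of about 3.1 Gbp are assumptions made for this example.

```python
# Illustrative sketch only; not taken from the HISAT2 source.
LOCAL_INDEX_SPAN = 56_000              # bp covered by one local index (from the text)
ASSUMED_GENOME_LENGTH = 3_100_000_000  # assumed approximate human genome length in bp


def local_index_id(position: int, span: int = LOCAL_INDEX_SPAN) -> int:
    """Return the 0-based id of the window containing a 0-based genome position.

    Assumes fixed-size, non-overlapping windows (a simplification).
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    return position // span


def num_local_indexes(genome_length: int, span: int = LOCAL_INDEX_SPAN) -> int:
    """Return how many windows of `span` bp are needed to cover the genome."""
    return (genome_length + span - 1) // span  # integer ceiling division


if __name__ == "__main__":
    print(local_index_id(56_000))                    # 1
    print(num_local_indexes(ASSUMED_GENOME_LENGTH))  # 55358
```

Under these assumptions the count comes out at roughly 55,000, in line with the number quoted above.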

A few notes:

The HISAT2 index (HGFM) for the human reference genome plus 12.3 million common variants is 6.2 GB. The variants consist of 11 million single nucleotide polymorphisms, 728,000 deletions, and 555,000 insertions. Insertions and deletions used in this index are small (usually <20 bp). We plan to incorporate structural variations (SVs) into this index.
HISAT2 also allows mapping reads directly against a transcriptome, similar to TopHat2.
The memory footprint of HISAT2 is relatively low at 6.7 GB.
HISAT2 is estimated to run slower than HISAT (30–100% slower on some data sets).
HISAT2 provides greater accuracy for alignment of reads containing SNPs.
We released the first (beta) version of HISAT2 on September 8, 2015.
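As a quick consistency check on the variant counts in the notes above, the following Python snippet sums the three categories. The dictionary and its keys are named for this example only; the counts are the rounded figures quoted in the notes.

```python
# Rounded counts as quoted in the notes above.
variant_counts = {
    "single nucleotide polymorphisms": 11_000_000,
    "deletions": 728_000,
    "insertions": 555_000,
}

total = sum(variant_counts.values())
print(total)                          # 12283000
print(f"{total / 1e6:.1f} million")   # 12.3 million
```

The total rounds to the 12.3 million figure given for the index.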

References:

[1] Sirén J, Välimäki N, Mäkinen V (2014) Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics 11: 375–388. doi: 10.1109/tcbb.2013.2297101

[2] Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nature Methods.

[3] Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357–359.
