FMindex implementation using wavelet trees

Goals

implement the simplest FM-index using just basic data structures
implement FM-index using wavelet trees
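The first goal can be illustrated with a minimal sketch of an FM-index built only from lists and dictionaries. This is not the repository's code: the function names `build_fm_index` and `locate`, the `$` sentinel, and the 0-based positions are assumptions made for illustration, and the suffix array is built by naive sorting, which is only practical for short texts.

```python
def build_fm_index(text):
    """Build a naive FM-index: suffix array, C table and occurrence matrix."""
    text += "$"  # sentinel (assumption): sorts before every other symbol
    # Suffix array by plain sorting of suffixes; fine for a demo, slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix (text[-1] is the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c[ch] = number of symbols in text that are strictly smaller than ch.
    c, total = {}, 0
    for ch in alphabet:
        c[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i] (the "matrix" table).
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c, occ


def locate(pattern, sa, c, occ):
    """Return sorted 0-based start positions of pattern, via backward search."""
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for ch in reversed(pattern):
        if ch not in c:
            return []
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, c, occ = build_fm_index("banana")
print(locate("ana", sa, c, occ))
```

The occurrence matrix stores one counter per symbol per position, which is what makes this variant simple but memory-hungry; a wavelet tree answers the same rank queries in less space.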

How to get source

git clone --recursive https://github.com/iborko/fmindex

Build instructions

run make
executables are placed in the bin folder (bin/fmindex, bin/test)

Usage

The fmindex binary is placed in the bin folder. Usage:
fmindex <sequence> <reads> [<occurrence_table> [<bucket_size>]]

<sequence> - path to the sequence to search in, FASTA format
<reads> - path to the reads to search for, FASTQ format
<occurrence_table> - optional; 0 (matrix occurrence table) or 1 (wavelet-tree-based occurrence table)
<bucket_size> - optional; bucket size for bit string rank in the wavelet tree, default is 20
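To make the last two parameters more concrete, here is a rough sketch of a wavelet tree whose nodes answer rank queries on bit strings using precomputed per-bucket counts. The class names `BitVector` and `WaveletTree`, the balanced alphabet split, and the use of Python lists for bits are assumptions for illustration only; the actual implementation may be organized differently.

```python
class BitVector:
    """Bit string with rank support via precomputed per-bucket counts."""

    def __init__(self, bits, bucket_size=20):
        self.bits = list(bits)
        self.bucket_size = bucket_size
        # prefix[k] = number of 1 bits in bits[:k * bucket_size]
        self.prefix = [0]
        for start in range(0, len(self.bits), bucket_size):
            ones = sum(self.bits[start:start + bucket_size])
            self.prefix.append(self.prefix[-1] + ones)

    def rank1(self, i):
        """Number of 1 bits in bits[:i]: one lookup plus a scan within a bucket."""
        bucket = i // self.bucket_size
        start = bucket * self.bucket_size
        return self.prefix[bucket] + sum(self.bits[start:i])

    def rank0(self, i):
        """Number of 0 bits in bits[:i]."""
        return i - self.rank1(i)


class WaveletTree:
    """Wavelet tree over a sequence; rank(ch, i) counts ch in seq[:i]."""

    def __init__(self, seq, alphabet=None, bucket_size=20):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.bv = self.left = self.right = None
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            lower = set(self.alphabet[:mid])
            # 0 = symbol belongs to the lower half of the alphabet, 1 = upper half
            self.bv = BitVector([0 if s in lower else 1 for s in seq], bucket_size)
            self.left = WaveletTree([s for s in seq if s in lower],
                                    self.alphabet[:mid], bucket_size)
            self.right = WaveletTree([s for s in seq if s not in lower],
                                     self.alphabet[mid:], bucket_size)

    def rank(self, ch, i):
        if ch not in self.alphabet:
            return 0
        node = self
        while node.bv is not None:  # descend until a single-symbol leaf
            mid = len(node.alphabet) // 2
            if ch in node.alphabet[:mid]:
                i, node = node.bv.rank0(i), node.left
            else:
                i, node = node.bv.rank1(i), node.right
        return i


tree = WaveletTree("annb$aa", bucket_size=2)
print(tree.rank("a", 7))
```

In this sketch a larger bucket size stores fewer precomputed counts but scans more bits per rank query, so the parameter trades memory for speed.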

Example:
fmindex Esch_coli_536.fna Esch_coli_536_reads.fq 1 40

For every read in <reads>, the program prints two lines: the first is the FASTQ header and the second is the list of all positions at which the read occurs in <sequence>. Positions are separated by whitespace.
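The two-line output format described above could be consumed with a small parser along these lines. The function name `parse_fmindex_output` is hypothetical, and the sketch assumes that a read with no matches still produces an (empty) second line, which the description does not state explicitly.

```python
def parse_fmindex_output(lines):
    """Turn alternating header / positions lines into (header, positions) pairs."""
    lines = [line.rstrip("\n") for line in lines]
    results = []
    # Walk the lines two at a time; a trailing unpaired line is ignored.
    for k in range(0, len(lines) - 1, 2):
        header = lines[k]
        # Assumption: an empty second line means the read was not found.
        positions = [int(token) for token in lines[k + 1].split()]
        results.append((header, positions))
    return results


sample = ["@read1\n", "12 480 9001\n", "@read2\n", "\n"]
for header, positions in parse_fmindex_output(sample):
    print(header, positions)
```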

The test_run.sh script runs the program on one of the example sequences, which are located in the test_data folder.

Example run:
test_run.sh test_data/Esch_coli_536.fna
The script generates sets of 1000, 5000, 10000, 50000, 100000, 500000 and 1000000 reads and runs fmindex on each set, measuring CPU time and memory with the time program.

References

[1] Jochen Singer: A Wavelet Tree Based FM-Index for Biological Sequences in SeqAn
