Skip to content

the-overengineer/bioinf-bwt

Repository files navigation

Dynamic Suffix Array Implementation

This repository contains an implementation of the dynamic suffix array described in the paper A four-stage algorithm for updating a Burrows-Wheeler transform by Salson et al. It was implemented as a project assignment for our Bioinformatics course.

Dependencies

We are using Gerlach's implentation of Mäkinen and Navarro’s dynamic structures. This allows us to perform our rankc, insert and delete operations in O(log n log σ), where n is the length of the text and σ is the size of the alphabet Σ. These structures also include a sampler, which we are using for (more) efficient computation of ISA values at a given index.

Relevant files

The majority of our implementation is contained within the header file DynamicSuffixArray.h and its implementation DynamicSuffixArray.cpp. The class, variable and method comments are all contained within the header file. The comments inside the C++ file only explain the implementation details or reference the paper.

The rest of the C++ files are unchanged or mostly unchanged, having required at most a few (re)definitions to satisfy our compilers.

The building process is managed by the Makefile. It manages the building process, as well as the compiler options and flags. The building can simply be started by running

make

or

make all

Running the examples

For testing purposes we implemented unit tests using Catch automated test framework. Tests are grouped into categories that can be run independently. To start all tests use:

./test

To start all tests with time monitoring options use:

./test -d yes

To print all possible options use:

./test -?

Test results

Action Length Time[s] Memory[MB]
Calculate BWT 1031 0.003357 1
Calculate BWT 5385 0.028451 1
Calculate BWT 35970 0.176832 1
Calculate BWT 189561 1.214295 2
Calculate BWT 2494862 21.097746 9
Adding string 512 into 1032 0.001728 1
Deleting string 512 from 1032 0.001916 1
Deleting string 512 from 35970 0.002748 2
Adding string 1413 into 189561 0.010186 2
Deleting string 42000 from 189561 0.0203301 2
Deleting string 126183 from 700918 0.738056 4
Adding string 126183 into 700918 0.902695 4

About

Dynamic suffix array implementation for Bioinformatics course project

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages