Multi-sample de novo assembly and variant calling using de Bruijn graphs. Variant calling with and without a reference genome. Between closely related samples or highly diverged ones. From bacterial to mammalian genomes. Minimal configuration. And it's free.
Isaac Turner's experimental rewrite of cortex_var, to handle larger populations with better genome assembly. PhD supervisor: Prof Gil McVean. Collaborators: Zam Iqbal, Kiran Garimella. Based at the Wellcome Trust Centre for Human Genetics, University of Oxford.
Note: Currently under development. Expect bugs, fixes and vague documentation until we hit our first release in the next month. Feel free to try out McCortex and watch this space for the release. An announcement will be made on the cortex mailing list.
30 May 2014
Compiles with clang and gcc. Tested on Mac OS X and Linux. Requires zlib. The first compile will take a while since the libraries in libs/ need to be downloaded and compiled.
To compile for a maximum kmer size of 31:
make
To compile for a maximum kmer size of 63:
make MAXK=63
Executables appear in the bin/ directory.
To update the libraries included:
cd libs && make
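The two build commands above can be wrapped in a small helper that picks the MAXK value for a given kmer size. This is only a sketch: `pick_maxk` is a hypothetical function, not part of McCortex, and it assumes that the MAXK=31 build covers every supported k up to 31 (the usage output reports k=3..31) and that larger k needs the MAXK=63 build. Only the two MAXK values documented here are handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of McCortex): print the MAXK value to pass
# to `make` for a desired kmer size. Only the two values documented in this
# README (31 and 63) are known, so anything larger is rejected.
pick_maxk() {
  k=$1
  if [ "$k" -le 31 ]; then
    echo 31
  elif [ "$k" -le 63 ]; then
    echo 63
  else
    echo "k=$k exceeds the MAXK values documented here (31, 63)" >&2
    return 1
  fi
}

# Example use from the repository root (left commented out so the snippet
# can be sourced anywhere):
#   make MAXK="$(pick_maxk 41)"
```

Building once per MAXK value keeps both sets of executables available in bin/.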
usage: ctx31 <command> [options] <args>
version: ctx=XXXX zlib=1.2.5 htslib=0.2.0-rc7-74-g996b3c0 ASSERTS=ON CHECKS=ON k=3..31
Commands:
  breakpoints  use a trusted assembled genome to call large events
  bubbles      find bubbles in graph which are potential variants
  build        construct cortex graph from FASTA/FASTQ/BAM
  check        load and check graph (.ctx) and path (.ctp) files
  clean        clean errors from a graph
  contigs      pull out contigs for a sample
  correct      error correct reads
  coverage     print contig coverage
  inferedges   infer graph edges between kmers before calling `thread`
  join         combine graphs, filter graph intersections
  pjoin        merge path files (.ctp)
  place        place variants against a reference
  pview        view read threading information
  reads        filter reads against a graph
  rmsubstr     reduce set of strings to remove substrings
  subgraph     filter a subgraph using seed kmers
  supernodes   pull out supernodes
  thread       thread reads through cleaned graph
  unique       remove duplicated bubbles, produce VCF
  view         view and check a cortex graph file (.ctx)
Type a command with no arguments to see help.
Common Options:
  -m --memory <M>      Memory e.g. 1GB [default: 1GB]
  -n --nkmers <H>      Hash entries [default: 4M, ~4 million]
  -c --ncols <C>       Number of graph colours to load at once [default: 1]
  -a --asyncio <A>     Limit on file reading threads [default: 4]
  -t --threads <T>     Limit on processing threads [default: 2]
  -o --out <file>      Output file
  -p --paths <in.ctp>  Assembly file to load (can specify multiple times)
Type a command with no arguments to see usage. The following may also be useful:
- HipChat
- wiki
- website
- mailing list
- Report a bug
- Email me: Isaac Turner turner.isaac@gmail.com
Issues can be submitted on github. Pull requests welcome. Please add your name to the AUTHORS file.
Code should compile on mac/linux with clang/gcc without errors or warnings.
Code is organised as:
- libs/: library code bundled from other projects (third-party code)
- src/basic/: files that do not depend on MAX_KMER_SIZE
- src/kmer/: files that need recompiling for each MAX_KMER_SIZE
- src/tools/: complex operations performed on the graph
- src/commands/: one file per cortex command, named ctx_COMMAND
- src/main/: files with a main function
Files only include files from their own directory or from directories earlier in this list. E.g. src/kmer/* files only include files in src/kmer/, src/basic/ and libs/.
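The layering rule above can be checked mechanically. The following is a rough sketch, not a script shipped with the project: `check_layers` is a hypothetical helper, and it assumes headers are included by bare filename (`#include "foo.h"`), that header basenames are unique across the listed directories, and that source files sit directly inside each directory. Headers it cannot locate (system headers, or headers nested deeper inside libs/) are ignored.

```shell
#!/bin/sh
# Hypothetical helper: check_layers ROOT
# Prints every quoted #include that points at a directory later in the list,
# and returns non-zero if any such include is found.
check_layers() {
  root=$1
  layers="libs src/basic src/kmer src/tools src/commands src/main"
  status=0
  rank=0
  for dir in $layers; do
    rank=$((rank + 1))
    [ -d "$root/$dir" ] || continue
    for src in "$root/$dir"/*.[ch]; do
      [ -f "$src" ] || continue
      # Pull out the names used in quoted #include directives.
      for hdr in $(sed -n 's/^[[:space:]]*#[[:space:]]*include[[:space:]]*"\([^"]*\)".*/\1/p' "$src"); do
        # Find the first listed directory that holds this header (0 = unknown).
        hdr_rank=0
        r=0
        for d in $layers; do
          r=$((r + 1))
          if [ -f "$root/$d/$(basename "$hdr")" ]; then
            hdr_rank=$r
            break
          fi
        done
        if [ "$hdr_rank" -gt "$rank" ]; then
          echo "$src includes $hdr from a later layer"
          status=1
        fi
      done
    done
  done
  return $status
}

# Example use from the repository root (commented out so the snippet can be
# sourced anywhere):
#   check_layers . || echo "layering violations found"
```

A check like this only approximates the rule, since it looks at include directives rather than link-time dependencies.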
Static analysis can be run with cppcheck:
cppcheck src
or with clang:
rm -rf bin/ctx31
scan-build make RECOMPILE=1
Occasionally we also run Coverity Scan.
Bundled libraries may have different licenses:
- GNU Scientific Library (GPL)
- CityHash (MIT)
- lookup3 (Public Domain)
- htslib (MIT)
- bcftools (MIT)
- vcflib (MIT)
- seq_file (Public Domain)
- string_buffer (Public Domain)
- BitArray (Public Domain)
- msg-pool (Public Domain)
- seq-align (Public Domain)
Used in testing:
- bioinf-perl (Public Domain)
'Cortex with low memory and read threading' is currently unpublished. Please cite previous cortex_var papers:
- De novo assembly and genotyping of variants using colored de Bruijn graphs, Iqbal, Caccamo, Turner, Flicek, McVean (Nature Genetics) (2012) (doi:10.1038/ng.1028)
- High-throughput microbial population genomics using the Cortex variation assembler, Iqbal, Turner, McVean (Bioinformatics) (Nov 2012) (doi:10.1093/bioinformatics/bts673)