Multi-sample de novo assembly and variant calling using de Bruijn graphs. Variant calling with and without a reference genome. Between closely related samples or highly diverged ones. From bacterial to mammalian genomes. Minimal configuration. And it's free.
Isaac Turner's experimental rewrite of cortex_var, to handle larger populations with better genome assembly. PhD supervisor: Prof Gil McVean. Collaborators: Zam Iqbal, Kiran Garimella. Based at the Wellcome Trust Centre for Human Genetics, University of Oxford.
Note: Currently under development. Expect bugs, fixes and vague documentation until we hit our first release. Feel free to try out McCortex and watch this space for the release. An announcement will be made on the cortex mailing list.
10 August 2015
McCortex compiles with clang and gcc. Tested on Mac OS X and Linux. Requires zlib. Download with:
git clone --recursive https://github.com/mcveanlab/mccortex
To compile for a maximum kmer size of 31:
make
To compile for a maximum kmer size of 63:
make MAXK=63
Executables appear in the `bin/` directory.
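As a rough aid for choosing which build you need, the sketch below maps a desired kmer size to a `MAXK` value. It assumes that valid `MAXK` values have the form 32n - 1 (31, 63, 95, ...), which fits the two values shown above but is not stated in this README. `maxk_for_k` is a hypothetical helper name, not part of McCortex.

```shell
#!/bin/sh
# Hypothetical helper (not part of McCortex): print the smallest MAXK of the
# form 32*n - 1 that can hold kmer size k.
# Assumption: MAXK values follow the 31, 63, 95, ... pattern.
maxk_for_k() {
  k=$1
  # Round k up to the next multiple of 32, then subtract 1.
  echo $(( ( (k + 31) / 32 ) * 32 - 1 ))
}

maxk_for_k 21   # prints 31
maxk_for_k 41   # prints 63
```

The result could then be passed to the build, for example `make MAXK=$(maxk_for_k 41)`.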
usage: mccortex31 <command> [options] <args>
version: ctx=XXXX zlib=1.2.5 htslib=1.2.1 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
Commands: breakpoints use a trusted assembled genome to call large events
bubbles find bubbles in graph which are potential variants
build construct cortex graph from FASTA/FASTQ/BAM
calls2vcf convert bubble/breakpoint calls to VCF
check load and check graph (.ctx) and path (.ctp) files
clean clean errors from a graph
contigs assemble contigs for a sample
correct error correct reads
coverage print contig coverage
index index a sorted cortex graph file
inferedges infer graph edges between kmers before calling `thread`
join combine graphs, filter graph intersections
links clean and plot link files (.ctp)
pjoin merge path files (.ctp)
popbubbles Pop bubbles in the population graph
pview text view of a cortex path file (.ctp)
reads filter reads against a graph
rmsubstr reduce set of strings to remove substrings
sort sort the kmers in a graph file
subgraph filter a subgraph using seed kmers
thread thread reads through cleaned graph
uniqkmers generate random unique kmers
unitigs pull out unitigs in FASTA, DOT or GFA format
view text view of a cortex graph file (.ctx)
Type a command with no arguments to see help.
Common Options:
-h, --help Help message
-q, --quiet Silence status output normally printed to STDERR
-f, --force Overwrite output files if they already exist
-m, --memory <M> Memory e.g. 1GB [default: 1GB]
-n, --nkmers <H> Hash entries [default: 4M, ~4 million]
-t, --threads <T> Limit on proccessing threads [default: 2]
-o, --out <file> Output file
-p, --paths <in.ctp> Assembly file to load (can specify multiple times)
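To illustrate the `<command> [options] <args>` layout, here is a minimal dry-run sketch that only prints a command line built from the common options listed above (`-m`, `-t`, `-o`). `ctx_cmdline` is a hypothetical helper, and the file names and the arguments given to `clean` are assumptions for illustration only. Type the command with no arguments to see its actual usage before running anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of McCortex): print, but do not run, a
# command line following "mccortex31 <command> [options] <args>".
# Only the common options -m, -t and -o from the usage text are used.
ctx_cmdline() {
  cmd=$1      # McCortex command, e.g. clean
  mem=$2      # value for -m/--memory, e.g. 2GB
  threads=$3  # value for -t/--threads
  out=$4      # value for -o/--out
  shift 4     # remaining arguments are passed through as <args>
  echo "bin/mccortex31 $cmd -m $mem -t $threads -o $out $*"
}

# Assumed example: in.ctx and cleaned.ctx are placeholder file names.
ctx_cmdline clean 2GB 4 cleaned.ctx in.ctx
```

Printing the command first makes it easy to check memory and thread settings before starting a long job.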
Type a command with no arguments to see usage. The following may also be useful:
- wiki
- website
- mailing list
- Report a bug / feature request on GitHub
- Email me: Isaac Turner turner.isaac@gmail.com
Live chat:
- HipChat for instant messaging; please email me first to arrange a time
Issues can be submitted on GitHub. Pull requests welcome. Please add your name to the AUTHORS file. Code should compile on Mac/Linux with clang/gcc without errors or warnings.
More on the wiki
Unit tests are run with `make test` and integration tests with `cd tests; ./run`. Both of these test suites are run automatically with Travis CI when commits are pushed to GitHub.
Static analysis can be run with cppcheck:
cppcheck src
or with clang:
rm -rf bin/mccortex31
scan-build make RECOMPILE=1
Occasionally we also run Coverity Scan. This is done by pushing to the `coverity_scan` branch on GitHub, which triggers Travis CI to upload the latest code to Coverity.
Bundled libraries may have different licenses:
- BitArray (Public Domain)
- cJSON (MIT)
- CityHash (MIT)
- htslib (MIT)
- lookup3 (Public Domain)
- madcrowlib (MIT)
- msg-pool (Public Domain)
- seq-align (Public Domain)
- seq_file (Public Domain)
- sort_r (Public Domain)
- string_buffer (Public Domain)
- xxHash (BSD)
Used in testing:
- bcftools (MIT)
- bioinf-perl (Public Domain)
- bwa (MIT)
- readsim (Public Domain)
- samtools (MIT)
'Cortex with low memory and read threading' is currently unpublished. Please cite previous cortex_var papers:
- De novo assembly and genotyping of variants using colored de Bruijn graphs, Iqbal, Caccamo, Turner, Flicek, McVean (Nature Genetics) (2012) (doi:10.1038/ng.1028)
- High-throughput microbial population genomics using the Cortex variation assembler, Iqbal, Turner, McVean (Bioinformatics) (Nov 2012) (doi:10.1093/bioinformatics/bts673)