McCortex: Population De Novo Assembly and Variant Calling

Multi-sample de novo assembly and variant calling using de Bruijn graphs. Variant calling with and without a reference genome. Between closely related samples or highly diverged ones. From bacterial to mammalian genomes. Minimal configuration. And it's free.

Isaac Turner's experimental rewrite of cortex_var, to handle larger populations with better genome assembly. PhD supervisor: Prof Gil McVean. Collaborators: Zam Iqbal, Kiran Garimella. Based at the Wellcome Trust Centre for Human Genetics, University of Oxford.

Note: Currently under development. Expect bugs, fixes and vague documentation until we hit our first release. Feel free to try out McCortex and watch this space for the release. An announcement will be made on the cortex mailing list.

10 August 2015

Build

McCortex compiles with clang and gcc and has been tested on Mac OS X and Linux. It requires zlib. Download with:

git clone --recursive https://github.com/mcveanlab/mccortex
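
The `--recursive` flag suggests the repository pulls in its bundled libraries as git submodules. The sketch below prints the command to use in either situation: a fresh clone, or an existing checkout that was cloned without `--recursive`. The helper name `fetch_cmd` and the target directory `mccortex` are assumptions made for illustration; the commands are printed rather than executed so they can be inspected first.

```shell
#!/bin/sh
# Print the git command needed to obtain McCortex together with its submodules.
# $1 = repository URL, $2 = local directory (hypothetical name chosen by the caller)
fetch_cmd() {
  repo_url=$1
  dir=$2
  if [ -d "$dir/.git" ]; then
    # Existing checkout: fetch any submodules that a plain clone skipped.
    echo "git -C $dir submodule update --init --recursive"
  else
    # Fresh download: clone the repository and its submodules in one step.
    echo "git clone --recursive $repo_url $dir"
  fi
}

fetch_cmd https://github.com/mcveanlab/mccortex mccortex
```

Pipe the printed line to `sh` once it looks right.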

To compile for a maximum kmer size of 31:

make

To compile for a maximum kmer size of 63:

make MAXK=63

Executables appear in the bin/ directory.
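
As a rough guide to choosing between the two builds, the sketch below maps a desired kmer size to the `make` invocation shown above. It only covers the two maximum kmer sizes mentioned in this README (31 and 63); the function names `maxk_for_kmer` and `build_cmd` are hypothetical helpers, not part of McCortex.

```shell
#!/bin/sh
# Map a desired kmer size to the MAXK value of the smallest sufficient build.
# Only the two builds described above (31 and 63) are handled.
maxk_for_kmer() {
  k=$1
  if [ "$k" -le 31 ]; then
    echo 31
  elif [ "$k" -le 63 ]; then
    echo 63
  else
    echo "kmer size $k is not covered by this sketch" >&2
    return 1
  fi
}

# Print the make command for a given kmer size.
build_cmd() {
  maxk=$(maxk_for_kmer "$1") || return 1
  if [ "$maxk" -eq 31 ]; then
    echo "make"            # default build, maximum kmer size 31
  else
    echo "make MAXK=$maxk"
  fi
}

build_cmd 21   # prints: make
build_cmd 41   # prints: make MAXK=63
```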

Commands

usage: mccortex31 <command> [options] <args>
version: ctx=XXXX zlib=1.2.5 htslib=1.2.1 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31

Commands:   breakpoints  use a trusted assembled genome to call large events
            bubbles      find bubbles in graph which are potential variants
            build        construct cortex graph from FASTA/FASTQ/BAM
            calls2vcf    convert bubble/breakpoint calls to VCF
            check        load and check graph (.ctx) and path (.ctp) files
            clean        clean errors from a graph
            contigs      assemble contigs for a sample
            correct      error correct reads
            coverage     print contig coverage
            index        index a sorted cortex graph file
            inferedges   infer graph edges between kmers before calling `thread`
            join         combine graphs, filter graph intersections
            links        clean and plot link files (.ctp)
            pjoin        merge path files (.ctp)
            popbubbles   Pop bubbles in the population graph
            pview        text view of a cortex path file (.ctp)
            reads        filter reads against a graph
            rmsubstr     reduce set of strings to remove substrings
            sort         sort the kmers in a graph file
            subgraph     filter a subgraph using seed kmers
            thread       thread reads through cleaned graph
            uniqkmers    generate random unique kmers
            unitigs      pull out unitigs in FASTA, DOT or GFA format
            view         text view of a cortex graph file (.ctx)


  Type a command with no arguments to see help.

Common Options:
  -h, --help            Help message
  -q, --quiet           Silence status output normally printed to STDERR
  -f, --force           Overwrite output files if they already exist
  -m, --memory <M>      Memory e.g. 1GB [default: 1GB]
  -n, --nkmers <H>      Hash entries [default: 4M, ~4 million]
  -t, --threads <T>     Limit on processing threads [default: 2]
  -o, --out <file>      Output file
  -p, --paths <in.ctp>  Assembly file to load (can specify multiple times)
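
Since each command prints its own help when run with no arguments, one way to survey the options is to loop over the commands of interest. This is a minimal sketch: the `MCCORTEX` variable and the `show_help` helper are assumptions for illustration, and it is also an assumption that help text may be written to STDERR with a non-zero exit status, so both streams are captured and the status is ignored.

```shell
#!/bin/sh
# Path to the executable; the default build places it at bin/mccortex31.
MCCORTEX=${MCCORTEX:-bin/mccortex31}

# Print the help text for one command by running it with no arguments.
# Assumption: help may go to STDERR and the exit status may be non-zero.
show_help() {
  "$MCCORTEX" "$1" 2>&1 || true
}

# Only loop over commands if the executable has actually been built.
if [ -x "$MCCORTEX" ]; then
  for cmd in build clean thread bubbles calls2vcf; do
    echo "=== $cmd ==="
    show_help "$cmd"
  done
else
  echo "$MCCORTEX not found; run make first" >&2
fi
```

The command names in the loop are taken from the list above; swap in whichever ones you need.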

Getting Help

Type a command with no arguments to see usage. The following may also be useful:

Live chat (email me to fix a time):

  • HipChat for instant messaging
  • Gitter https://gitter.im/mcveanlab/mccortex

Code And Contributing

Issues can be submitted on GitHub. Pull requests are welcome. Please add your name to the AUTHORS file. Code should compile on Mac/Linux with clang/gcc without errors or warnings.

More on the wiki

Unit tests are run with make test and integration tests with cd tests; ./run. Both of these test suites are run automatically with Travis CI when commits are pushed to GitHub.
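
The two test commands above can be chained so that the integration tests only run if the unit tests pass. The sketch below is one way to do that; `run_step`, `run_all_tests` and the `DRY_RUN` switch are hypothetical helpers, and `DRY_RUN` defaults to 1 so the script prints the commands instead of running them until you opt in.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of running it (the default here).
DRY_RUN=${DRY_RUN:-1}

# Run a command inside a directory. $1 = directory, remaining args = command.
run_step() {
  dir=$1
  shift
  if [ "$DRY_RUN" = 1 ]; then
    echo "(cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

# Unit tests first, then integration tests; stop at the first failure.
run_all_tests() {
  run_step . make test && run_step tests ./run
}

run_all_tests
```

Run it with `DRY_RUN=0` from the top of the repository to execute both suites.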

Static analysis can be run with cppcheck:

cppcheck src

or with clang:

rm -rf bin/mccortex31
scan-build make RECOMPILE=1

Occasionally we also run Coverity Scan. This is done by pushing to the coverity_scan branch on GitHub, which triggers Travis CI to upload the latest code to Coverity.

License: MIT

Bundled libraries may have different licenses:

Used in testing:

Citing

'Cortex with low memory and read threading' is currently unpublished. Please cite previous cortex_var papers:

  • De novo assembly and genotyping of variants using colored de Bruijn graphs, Iqbal, Caccamo, Turner, Flicek, McVean (Nature Genetics) (2012) (doi:10.1038/ng.1028)
  • High-throughput microbial population genomics using the Cortex variation assembler, Iqbal, Turner, McVean (Bioinformatics) (Nov 2012) (doi:10.1093/bioinformatics/bts673)
