ARGweaver

Sampling and manipulating genome-wide ancestral recombination graphs (ARGs).

The ARGweaver software package contains programs and libraries for sampling and manipulating ancestral recombination graphs (ARGs). An ARG is a rich data structure for representing the ancestry of DNA sequences undergoing coalescence and recombination.

ARGweaver citation: Matthew D. Rasmussen, Adam Siepel. Genome-wide inference of ancestral recombination graphs. 2013. arXiv:1306.5110 [q-bio.PE]

Requirements

The following dependencies must be installed to compile and run ARGweaver:

Install

To compile the ARGweaver commands and library use the Makefile:

make

Once compiled, install the ARGweaver programs (default install in /usr) using:

make install

By default this installs all files into /usr, which may require superuser permissions. To specify your own installation path, use:

make install prefix=$HOME/local

If you use this option, make sure $HOME/local/bin is in your PATH and $HOME/local/lib/python2.X/site-packages is in your PYTHONPATH.

ARGweaver can also be run directly from the source directory. Add the bin/ directory to your PATH environment variable, or create symlinks from any directory on your PATH to the scripts within bin/. Also add the argweaver source directory to your PYTHONPATH. See examples/ for details.

Quick Start

Here is a brief example of an ARG simulation and analysis. To generate simulated data containing a set of DNA sequences and an ARG describing their ancestry, the following command can be used:

arg-sim \
    -k 8 -L 100000 \
    -N 10000 -r 1.6e-8 -m 1.8e-8 \
    -o test1/test1

This will create an ARG with 8 sequences, each 100 kb in length, evolving in a population of effective size 10,000 (diploid), with a recombination rate of 1.6e-8 recombinations/site/generation and a mutation rate of 1.8e-8 mutations/site/generation. The output will be stored in the following files:
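To get a feel for how much variation these settings should produce, the parameters can be converted to population-scaled rates. The following is a minimal sketch (not part of ARGweaver) that assumes the standard diploid scalings theta = 4*N*m and rho = 4*N*r, and uses Watterson's formula for the expected number of segregating sites under an infinite-sites model. All variable names are chosen here for illustration.

```python
# Parameters copied from the arg-sim command above.
k = 8          # number of sequences (-k)
L = 100000     # sequence length in bp (-L)
N = 10000      # diploid effective population size (-N)
r = 1.6e-8     # recombination rate per site per generation (-r)
m = 1.8e-8     # mutation rate per site per generation (-m)

# Population-scaled rates per site, assuming a diploid population.
theta = 4 * N * m
rho = 4 * N * r

# a_k = sum_{i=1}^{k-1} 1/i, the constant in Watterson's formula.
a_k = sum(1.0 / i for i in range(1, k))

# Expected number of segregating sites across the whole region
# (infinite-sites assumption).
expected_segregating_sites = theta * L * a_k

print("theta per site: %g" % theta)
print("rho per site:   %g" % rho)
print("expected segregating sites: %.1f" % expected_segregating_sites)
```

A simulated test1/test1.sites file with a number of variant sites far from this expectation would be a hint that the parameters were not passed as intended.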

test1/test1.arg   -- an ARG stored in *.arg format
test1/test1.sites -- sequences stored in *.sites format

To infer an ARG from the simulated sequences, the following command can be used:

arg-sample \
    -s test1/test1.sites \
    -N 10000 -r 1.6e-8 -m 1.8e-8 \
    --ntimes 20 --maxtime 200e3 -c 10 -n 100 \
    -o test1/test1.sample/out

This will use the sequences in test1/test1.sites and assume the same population parameters as the simulation (i.e. -N 10000 -r 1.6e-8 -m 1.8e-8). Several sampling-specific options are also given (i.e. 20 discretized time steps, a maximum time of 200,000 generations, a compression of 10bp for the sequences, and 100 sampling iterations). After sampling, the following files will be generated:

test1/test1.sample/out.log
test1/test1.sample/out.stats
test1/test1.sample/out.0.smc.gz
test1/test1.sample/out.10.smc.gz
test1/test1.sample/out.20.smc.gz
...
test1/test1.sample/out.100.smc.gz

The file out.log contains a log of the sampling procedure, out.stats contains various ARG statistics (e.g. number of recombinations, ARG posterior probability, etc.), and out.0.smc.gz through out.100.smc.gz contain 11 samples of an ARG in *.smc file format.
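The count of 11 samples follows from the file names: one file at iteration 0 and one every 10 iterations up to 100. A small sketch of that arithmetic is given below. The helper name expected_sample_files is hypothetical, and the step of 10 is inferred from the listing above rather than taken from documented arg-sample behavior.

```python
def expected_sample_files(prefix, iterations, step):
    """List the *.smc.gz paths expected for a sampling run.

    prefix     -- output prefix given to -o
    iterations -- number of sampling iterations given to -n
    step       -- iterations between saved samples (assumed, see above)
    """
    return ["%s.%d.smc.gz" % (prefix, i)
            for i in range(0, iterations + 1, step)]


files = expected_sample_files("test1/test1.sample/out", 100, 10)
print(len(files))
for path in files:
    print(path)
```

Comparing such a list against the directory contents is a quick way to check whether a sampling run finished.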

To estimate the time to most recent common ancestor (TMRCA) across these samples, the following command can be used:

arg-extract-tmrca test1/test1.sample/out.%d.smc.gz \
    > test1/test1.tmrca.txt

This will create a tab-delimited text file containing six columns: chromosome, start, end, posterior mean TMRCA (generations), lower 2.5 percentile TMRCA, and upper 97.5 percentile TMRCA. The first four columns define a track of TMRCA across the genomic region in BED file format.
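The six-column layout is easy to process further. The sketch below (written for Python 3, not part of ARGweaver) parses such a file and computes a length-weighted average of the posterior mean TMRCA. The names TmrcaRow, read_tmrca and length_weighted_mean are hypothetical, the two example rows are made-up values used only to show the format, and BED-style 0-based half-open coordinates are assumed.

```python
import csv
import io
from collections import namedtuple

TmrcaRow = namedtuple("TmrcaRow", "chrom start end mean lower upper")


def read_tmrca(stream):
    """Parse tab-delimited TMRCA rows from an open text stream."""
    rows = []
    for fields in csv.reader(stream, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        chrom, start, end, mean, lower, upper = fields[:6]
        rows.append(TmrcaRow(chrom, int(start), int(end),
                             float(mean), float(lower), float(upper)))
    return rows


def length_weighted_mean(rows):
    """Average the posterior mean TMRCA, weighting by interval length."""
    total = sum(row.end - row.start for row in rows)
    return sum((row.end - row.start) * row.mean for row in rows) / total


# Made-up example rows; a real run would instead use
# open("test1/test1.tmrca.txt") as the stream.
example = io.StringIO(
    "chr\t0\t1000\t5000.0\t1000.0\t12000.0\n"
    "chr\t1000\t4000\t9000.0\t2500.0\t20000.0\n"
)
rows = read_tmrca(example)
avg = length_weighted_mean(rows)
print(avg)
```

Weighting by interval length matters because the rows cover genomic segments of different sizes, so an unweighted mean over rows would overstate short segments.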

Many other statistics can be extracted from sampled ARGs. For more details see examples/.

Development

The following Python libraries are needed for developing ARGweaver:

nose
pyflakes
pep8

These can be installed using:

pip install -r requirements-dev.txt
