# Weaver
## Allele-Specific Quantification of Structural Variations in Cancer Genomes
Version 0.20
The BamTools libraries are required; they are included in Weaver_SV/lib and Weaver_SV/inc.
The Perl package Parallel::ForkManager is required.
export LD_LIBRARY_PATH=<PREFIX>/Weaver/Weaver_SV/lib/:$LD_LIBRARY_PATH
libz is required (linked with the -lz flag).
1. Modify the required BOOST directory in src/Makefile.
2. Run ./INSTALL.sh
wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz
## EXAMPLE DATA
wget http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_example.tar.gz
Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 solo_ploidy TARGET 2
Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
SV finding
- Input: BAM file from BWA
- Output: VCF file for SV
Master program to generate SV together with other inputs needed for Weaver
- Input: 1000 Genomes Project Phase 1 haplotypes
Core MRF program
- Input: SV
Outputs:
- Purity and haploid-level sequencing coverage
- Allele-specific copy number of genomic regions
- Allele-specific copy number of structural variations
- Relative timing of structural variations
- Cancer scaffolds
- Phasing of germline SNPs in CNV regions
Core MRF program, with SNP phasing disabled to speed up the run
Inputs:
- SV
- reference
- Mappability (available for hg19)
- Region (available for hg19)
- wig (from bam)
Weaver PLOIDY -f -S -s ../SNP_dens -g GAP_20140416_num -w -r 1 -m -p 16
Inputs:
- -f reference file (FASTA). It should match the reference used for the original BAM file. In particular, most TCGA datasets were aligned to //www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta, which does not have the "chr" prefix. [MANDATORY]
- -S SV file, with format consistent with Weaver_SV. [MANDATORY]
- -s SNP file, with ref and alt mappings [MANDATORY]
- -w wig file from bam, storing the coverage information [MANDATORY]
- -r 1 on the first run (generates temp files); 0 to reuse existing temp files. [default 1]
- -m mappability file, downloaded from http://bioen-compbio.bioen.illinois.edu/weaver/Weaver_data.tar.gz [MANDATORY]
- -p number of cores [default 1]
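
As a minimal sketch, the options listed above can be assembled into a command line programmatically. The helper name `ploidy_command` and its parameter names are hypothetical; the flags and the file names in the usage line are taken from the example commands in this README. The `-g` region file appears in those commands, so it is included here.

```python
import shlex

def ploidy_command(fasta, sv, snp, region, wig, mappability,
                   first_run=True, cores=1):
    """Build the argument list for a `Weaver PLOIDY` run (hypothetical helper)."""
    return [
        "Weaver", "PLOIDY",
        "-f", fasta,                      # reference FASTA
        "-S", sv,                         # SV file
        "-s", snp,                        # SNP file
        "-g", region,                     # genome region file
        "-w", wig,                        # wiggle coverage file
        "-r", "1" if first_run else "0",  # 1 = generate temp files, 0 = reuse them
        "-m", mappability,                # mappability file
        "-p", str(cores),                 # number of cores
    ]

cmd = ploidy_command("SIMU.fa", "FINAL_SV", "SNP", "REGION",
                     "X.bam.wig", "map100mer.bd",
                     first_run=False, cores=64)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Weaver is on the `PATH`.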
The wiggle file needs to be declared with fixedStep, step 1 and span 1:

    fixedStep chrom=chr1 start=9994 step=1 span=1

If a chromosome has multiple declaration lines, they need to be sorted by position. The following is not allowed:

    fixedStep chrom=chr1 start=9994 step=1 span=1
    X
    X
    X
    fixedStep chrom=chr1 start=100 step=1 span=1
    X
    X
    X
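
The wiggle requirements above can be checked before running Weaver. The following is a sketch, not part of Weaver: the function name `check_wig` is hypothetical, and it assumes that `step=1` and `span=1` must be written explicitly on every declaration line, as in the example declaration.

```python
def check_wig(lines):
    """Return a list of problems found in fixedStep wiggle lines (empty if none)."""
    problems = []
    last_start = {}  # chromosome -> start of its most recent declaration
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if line.startswith("variableStep"):
            problems.append(f"line {lineno}: variableStep found, fixedStep expected")
            continue
        if not line.startswith("fixedStep"):
            continue  # data values, track lines, blank lines
        # Parse key=value fields such as chrom=chr1 start=9994 step=1 span=1
        fields = dict(tok.split("=", 1) for tok in line.split()[1:] if "=" in tok)
        chrom = fields.get("chrom")
        start = fields.get("start", "")
        if chrom is None or not start.isdigit():
            problems.append(f"line {lineno}: missing chrom or start")
            continue
        if fields.get("step") != "1" or fields.get("span") != "1":
            problems.append(f"line {lineno}: step and span must both be 1")
        if chrom in last_start and int(start) < last_start[chrom]:
            problems.append(f"line {lineno}: {chrom} declarations not sorted by start")
        last_start[chrom] = int(start)
    return problems

# The disallowed ordering from the text above is reported as one problem.
unsorted_example = [
    "fixedStep chrom=chr1 start=9994 step=1 span=1", "1", "1", "1",
    "fixedStep chrom=chr1 start=100 step=1 span=1", "1", "1", "1",
]
print(check_wig(unsorted_example))
```

For a real file, an open file handle can be passed directly, for example `check_wig(open("X.bam.wig"))`.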
Must be sorted and indexed.
SNP file:
NGS SNP link file
1KGP SNP link
Genome region file:
GAP regions in assembly are annotated.
Stores the phased allele-specific copy number of the genome, with the columns:
CHR BEGIN END ALLELE_1_CN ALLELE_2_CN
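
A minimal sketch for reading this five-column output, assuming the columns are whitespace-separated. The names `RegionCN` and `parse_region_cn` are hypothetical, the sample line is made up for illustration, and copy numbers are parsed as floats so that non-integer values would not fail.

```python
from typing import NamedTuple

class RegionCN(NamedTuple):
    chrom: str
    begin: int
    end: int
    allele_1_cn: float
    allele_2_cn: float

def parse_region_cn(line):
    """Parse one line: CHR BEGIN END ALLELE_1_CN ALLELE_2_CN."""
    chrom, begin, end, cn1, cn2 = line.split()[:5]
    return RegionCN(chrom, int(begin), int(end), float(cn1), float(cn2))

# Made-up example line, not real Weaver output.
region = parse_region_cn("1 1000 2000 2 1")
total_cn = region.allele_1_cn + region.allele_2_cn
print(region.chrom, region.end - region.begin, total_cn)
```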
Structural variation copy number, phasing, and category, with the columns:
CHR_1 POS_1 ORI_1 ALLELE CHR_2 POS_2 ORI_2 ALLELE CN germline/somatic_post_aneuploidy/somatic_pre_aneuploidy
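
The ten columns above can be read and tallied by category with a short script. This is a sketch under assumptions: fields are taken to be whitespace-separated, the orientation and allele encodings are not specified here so they are kept as raw strings, and the function name `parse_sv` and both sample lines are made up for illustration.

```python
from collections import Counter

CATEGORIES = {"germline", "somatic_post_aneuploidy", "somatic_pre_aneuploidy"}

def parse_sv(line):
    """Parse one SV line into a dict keyed by column role."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    if f[9] not in CATEGORIES:
        raise ValueError(f"unknown category: {f[9]}")
    return {
        "chr_1": f[0], "pos_1": int(f[1]), "ori_1": f[2], "allele_1": f[3],
        "chr_2": f[4], "pos_2": int(f[5]), "ori_2": f[6], "allele_2": f[7],
        "cn": float(f[8]),
        "category": f[9],
    }

# Made-up example lines, not real Weaver output.
sample = [
    "1 1000 + 1 5 2000 - 2 1 somatic_pre_aneuploidy",
    "2 300 - 1 2 900 + 1 2 germline",
]
svs = [parse_sv(line) for line in sample]
counts = Counter(sv["category"] for sv in svs)
print(counts)
```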
Yang Li

Jian Ma's Computational Genomics Lab at Carnegie Mellon

The code was developed by Yang Li when the Ma lab was at the University of Illinois at Urbana-Champaign.