dentearl/mafJoin

mafJoin

mafJoin is a tool for combining pairs of maf files that share a common sequence.

Authors

Mark Diekhans

Dependencies

The build refers to sonLib and to the kent source tree (kent/src); see Installation.

Installation

  1. Download the package. Consider making the parent of mafJoin a sibling directory to sonLib.
  2. cd into the directory.
  3. Type make kentDir=/path/to/kent/src.

Use

Try mafJoin -help for a usage statement.

mafJoin [optional -treelessRoot1="sequence name" -treelessRoot2="sequence name" ...] "common sequence" first.maf second.maf out.maf

Let there be two mafs, AC.maf and BC.maf, that share a common sequence C. A sequence in this context is the file-wide species name, i.e. the part before the dot in names of the form species.chromosome. For example, sequence hg18 appears on a line as hg18.chr19 if the maf contains chromosome 19.
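
To check which sequence names a maf actually contains before choosing the common sequence, the "s" lines can be scanned. This is a minimal sketch, assuming standard MAF "s" lines whose second field is the species.chromosome name. The function name maf_sequences is hypothetical and not part of mafJoin.

```shell
# List the unique sequence (species) names found in a maf file.
# Assumes standard MAF "s" lines:
#   s <species.chrom> <start> <size> <strand> <srcSize> <text>
# maf_sequences is a hypothetical helper, not part of mafJoin.
maf_sequences() {
    # Keep only "s" lines, take the text before the first dot of the
    # source name, then de-duplicate.
    awk '$1 == "s" { split($2, parts, "."); print parts[1] }' "$1" | sort -u
}
```

Running maf_sequences AC.maf and maf_sequences BC.maf and comparing the two lists should show the name that both files share.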

Let AC.maf contain two sequences, A and C (in practice a maf may contain any n >= 2 sequences), and let BC.maf contain sequences B and C. A call to mafJoin would look like:

mafJoin -treelessRoot1=C -treelessRoot2=C C -maxBlkWidth=10000 -maxInputBlkWidth=1000 AC.maf BC.maf ABC.maf.tmp

For the purposes of evolverSimControl, the mafJoin command comes in two flavors. Given a phylogeny (here in Newick format) ((A,B)C)D; in which A and B are siblings with parent node C, and C is a child node of D, a single maf containing an alignment of A, B, C, and D is created in two steps: first merge AC and BC into ABC, then merge ABC and CD into ABCD:

  • mafJoin -treelessRoot1=C -treelessRoot2=C C -maxBlkWidth=10000 -maxInputBlkWidth=1000 AC.maf BC.maf ABC.maf.tmp
  • mv ABC.maf.tmp ABC.maf
  • mafJoin -treelessRoot2=D C -maxBlkWidth=10000 -maxInputBlkWidth=1000 ABC.maf CD.maf ABCD.maf.tmp
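
The steps above write to a .tmp file and rename it afterwards, so a failed join does not leave a partial maf under the final name. A minimal sketch of that pattern as a wrapper follows. The function name safe_join and the MAFJOIN variable are hypothetical, introduced only for illustration.

```shell
# safe_join OUT ARGS...
# Run mafJoin with ARGS, writing to OUT.tmp, and rename the result to OUT
# only if mafJoin exits successfully.
# safe_join and MAFJOIN are hypothetical names, not part of mafJoin;
# MAFJOIN may be set to the path of the mafJoin binary.
safe_join() {
    out=$1
    shift
    if ! "${MAFJOIN:-mafJoin}" "$@" "$out.tmp"; then
        rm -f "$out.tmp"    # discard any partial output
        return 1
    fi
    mv "$out.tmp" "$out"
}
```

For example, the first join could be written as safe_join ABC.maf -treelessRoot1=C -treelessRoot2=C C -maxBlkWidth=10000 -maxInputBlkWidth=1000 AC.maf BC.maf.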

Note that the second call to mafJoin establishes a root only for CD.maf (-treelessRoot2), because the first call added tree information to ABC.maf, making it a tree-maf (see docs/README for a technical definition of a tree-maf). In a multistep progressive merge, a -treelessRoot option is needed only for a maf that lacks tree information. For example, consider the tree ((A,B)C,(D,E)F)G; in which G is the root node and has two children, C and F. C has two children, A and B, and F has two children, D and E. All of the joins would be:

  • ABC-G
      • mafJoin -treelessRoot1=C -treelessRoot2=C C -maxBlkWidth=10000 -maxInputBlkWidth=1000 AC.maf BC.maf ABC.maf.tmp
      • mv ABC.maf.tmp ABC.maf
      • mafJoin -treelessRoot2=G C -maxBlkWidth=10000 -maxInputBlkWidth=1000 ABC.maf CG.maf ABCG.maf.tmp
      • mv ABCG.maf.tmp ABCG.maf

  • DEF-G
      • mafJoin -treelessRoot1=F -treelessRoot2=F F -maxBlkWidth=10000 -maxInputBlkWidth=1000 DF.maf EF.maf DEF.maf.tmp
      • mv DEF.maf.tmp DEF.maf
      • mafJoin -treelessRoot2=G F -maxBlkWidth=10000 -maxInputBlkWidth=1000 DEF.maf FG.maf DEFG.maf.tmp
      • mv DEFG.maf.tmp DEFG.maf

  • ABCDEFG
      • mafJoin -maxBlkWidth=10000 -maxInputBlkWidth=1000 G ABCG.maf DEFG.maf ABCDEFG.maf.tmp
      • mv ABCDEFG.maf.tmp ABCDEFG.maf

Note that the final join does not have any -treelessRoot options since both incoming mafs are already tree-mafs.
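
The rule for when to pass a -treelessRoot option can be sketched as a small command builder. This is only an illustration: join_cmd is a hypothetical helper, the caller is assumed to know which inputs are already tree-mafs, and the block-width values are simply the ones used in the commands in this README.

```shell
# join_cmd COMMON ROOT1 ROOT2 IN1 IN2 OUT
# Print the mafJoin command for one merge step.
# Pass an empty string for ROOT1 or ROOT2 when that input is already a
# tree-maf, so that no -treelessRoot option is emitted for it.
# join_cmd is a hypothetical helper, not part of mafJoin.
join_cmd() {
    cmd="mafJoin"
    if [ -n "$2" ]; then
        cmd="$cmd -treelessRoot1=$2"
    fi
    if [ -n "$3" ]; then
        cmd="$cmd -treelessRoot2=$3"
    fi
    # Output goes to OUT.tmp, to be renamed with mv after a successful run.
    echo "$cmd $1 -maxBlkWidth=10000 -maxInputBlkWidth=1000 $4 $5 $6.tmp"
}
```

For instance, join_cmd C C C AC.maf BC.maf ABC.maf prints a command with both root options, join_cmd C "" G ABC.maf CG.maf ABCG.maf prints one with only -treelessRoot2, and join_cmd G "" "" ABCG.maf DEFG.maf ABCDEFG.maf prints one with neither. The sketch always places the common sequence before the width options, which is the ordering used in most of the commands here.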

Example

Two mafs are included in the example/ directory and can be joined using the command

$ mafJoin -treelessRoot1=C -treelessRoot2=C C example/AC.maf example/BC.maf ABC.maf
