MO-Phylogenetics is an optimization software tool that infers bi-objective phylogenetic trees optimized simultaneously under two reconstruction criteria: Maximum Parsimony and Maximum Likelihood.

ajnebro/MO-Phylogenetics

 
 


 -------------------------------
|                               |
|  MO-Phylogenetics - README    |
|                               |
 -------------------------------

=======================================================================================
TABLE OF CONTENTS
=======================================================================================
1. Requirements 
2. Installing 
3. Executing 
4. Structure
5. Parameters
6. Results
=======================================================================================

=======================================================================================
1. Requirements 
=======================================================================================

MO-Phylogenetics was developed on Unix machines (Ubuntu and Mac OS X) and on
Windows (MinGW) using the G++ 4.4.7 compiler. The software package is compiled with
the make utility.

MO-Phylogenetics is based on jMetalCpp and requires two frameworks to be installed:

1) Bio++: a set of C++ libraries for Bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics. Home Page:  http://biopp.univ-montp2.fr/

2) PLL - Phylogenetic Likelihood Library: a highly optimized, parallelized software library to ease the development of new software tools dealing with phylogenetic inference. Home Page:  http://www.libpll.org/

Bio++ also requires the CMake utility, and PLL requires the autoreconf utility.

=======================================================================================

=======================================================================================
2. Installing MO-Phylogenetics
=======================================================================================

Copy the compressed file to the location where you want to install MO-Phylogenetics and
unzip it.

Then run install.sh with the following command:
  % sh install.sh

The script installs all the required libraries: Bio++ (Bpp-Core, Bpp-Seq and Bpp-Phyl) and PLL.
Then add the paths of the newly installed libraries to your .bashrc file, replacing
MOPhylogenetics_path with your installation directory.

For Bio++:

   export CPATH=$CPATH:MOPhylogenetics_path/lib/Bpp/include    
   export LIBRARY_PATH=$LIBRARY_PATH:MOPhylogenetics_path/lib/Bpp/lib   
   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:MOPhylogenetics_path/lib/Bpp/lib  

For Pll:

   export CPATH=$CPATH:MOPhylogenetics_path/lib/Pll/src/include    
   export LIBRARY_PATH=$LIBRARY_PATH:MOPhylogenetics_path/lib/Pll/src/lib   
   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:MOPhylogenetics_path/lib/Pll/src/lib  

=======================================================================================


=======================================================================================	
3. Executing MO-Phylogenetics
=======================================================================================

The main binary is in the subfolder 'bin'. Enter this folder to execute MO-Phylogenetics.
	
  % cd bin
  % ./MOPhylogenetics param=parametersfile

Here parametersfile is the name of the file that holds all the parameters required to run the program. For example, to run the program on our 500-sequence test dataset, type the following command:

  % ./MOPhylogenetics param=parameters/params500.txt

The parameters for the state-of-the-art benchmark problems are defined in files stored in the parameters folder. The package also includes the datasets (nucleotide sequences), the partition model files and the initial user trees, which were generated with bootstrap techniques using the PhyML and DNAPars software.
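
For batch runs, the invocation above can be scripted. The following is a minimal
Python sketch, assuming it is started from the folder that contains the binary. The
helper names build_command and run_experiment are hypothetical and are not part of
MO-Phylogenetics.

```python
import subprocess
from pathlib import Path


def build_command(param_file, binary="./MOPhylogenetics"):
    """Return the argument list for one run: <binary> param=<param_file>."""
    return [binary, f"param={param_file}"]


def run_experiment(param_file, workdir="."):
    """Run MO-Phylogenetics once from workdir and return its exit code."""
    # Assumption: param_file is given relative to workdir (or as an absolute path).
    if not (Path(workdir) / param_file).is_file():
        raise FileNotFoundError(f"parameter file not found: {param_file}")
    completed = subprocess.run(build_command(param_file), cwd=workdir)
    return completed.returncode


if __name__ == "__main__":
    # Show the command for the 500-sequence test problem without running it.
    print(build_command("parameters/params500.txt"))
```

Checking for the parameter file first gives a clear error before the binary is
started; the exit code is returned unchanged so a calling script can decide how to
react to a failed run.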


=======================================================================================


=======================================================================================
4. Structure 
=======================================================================================

* MO-Phylogenetics: 
** lib
	*** Bpp: Source code of the Bio++ libraries: Bpp-Core, Bpp-Seq and Bpp-Phyl
	*** Pll: Source code of Phylogenetic Likelihood Library
	*** libjmetal.a: jMetalCpp library

** src: Source code of MO-Phylogenetics, based on the jMetalCpp framework. Some changes were made to adapt it to the phylogenetic problem.
	
** bin: Main Binary of the software

** data
	*** sequences: Sequences file
	*** model: Partition Model file used by Pll 
	*** inputusertrees: Input phylogenetic trees used as a repository to generate the initial population

** parameters: Parameter files, which contain all the parameters needed to customize the software.

	
=======================================================================================
5. Parameters 
=======================================================================================

Parameters of the Metaheuristic:

experimentid	 = Experiment ID; the results files are named using this ID
DATAPATH   	 = Data folder name

algorithm	 = Algorithm name; currently available: NSGAII, SMSEMOA, MOEAD, PAES
populationsize 	 = Population size
maxevaluations 	 = Maximum number of evaluations

offset 		= SMSEMOA parameter only

datadirectory	= MOEA/D parameter only

bisections	= Number of bisections, PAES parameter only
archivesize	= Size of the archive, PAES parameter only

intervalupdateparameters = Interval (in evaluations) between optimizations of the branch
                           lengths and the parameters of the evolutionary model

printtrace 		= Prints a trace of the elapsed time and the score values obtained
                          during the optimization strategies
printbestscores 	= Prints the best scores found during the algorithm execution

Genetic Operators

selection.method	= binarytournament and randomselection (for SMSEMOA) are available.

crossover.method 	= Only the PDG (Prune-Delete-Graft) recombination operator is available.
crossover.probability   = Probability to execute.
crossover.offspringsize = Number of descendants

mutation.method 		= NNI, SPR and TRB Topological mutations are available 
mutation.probability 		= Probability to execute
mutation.distributionindex 	= Distribution Index

# -------------------------------------------------------------------------------------
#                                     Phylogenetic Parameters
# -------------------------------------------------------------------------------------

alphabet	    		= DNA, RNA or Protein are available.
input.sequence.file 		= The sequence file to use (sequences must be aligned!).
		      	  	For example $(DATAPATH)/sequences/55.txt

input.sequence.format		= The alignment format, for example
			  	Phylip(order=sequential, type=extended, split=spaces)

input.sequence.sites_to_use    = Sites to use: all, nogap or complete (= only resolved characters)
input.sequence.max_gap_allowed = Maximum amount of gaps allowed: either an absolute number
				 or a percentage, for example 0%

partitionmodelfilepll 		= Model specifications file. For example: $(DATAPATH)/model/55.model

init.population 		= Method to generate or create the Initial phylogenetic trees: 
				user, random or stepwise
init.population.tree.file 	= FileName of Input User Trees, for example:
				$(DATAPATH)/inputusertrees/bootstrap1000_55

init.population.tree.format 	= Phylogenetic Trees Format: Newick

# ********** Parameters of the Evolutionary Model  ***********
model 				= Evolutionary Model Name, for example GTR+G
#model.kappa 			= Kappa value, HKY+G  Parameter Only 

Frequencies
model.frequences 		= user or empirical
model.piA 			=  PiA
model.piC 			=  PiC 
model.piG 			=  PiG 
model.piT 			=  PiT 

GTR relative rate parameters 
model.AC 			= AC
model.AG 			= AG
model.AT 			= AT
model.CG 			= CG
model.CT 			= CT
model.GT 			= GT, always set to 1
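
The four frequencies (model.piA to model.piT) and the six relative rates (model.AC to
model.GT) together define a GTR substitution rate matrix. The sketch below shows the
standard textbook construction in Python. It is not code from MO-Phylogenetics, Bio++
or PLL, and the function name gtr_rate_matrix is hypothetical.

```python
def gtr_rate_matrix(pi, rates):
    """Build a normalized GTR rate matrix Q (state order A, C, G, T).

    pi    : dict with equilibrium frequencies for "A", "C", "G", "T"
    rates : dict with relative rates "AC", "AG", "AT", "CG", "CT", "GT"
    """
    states = "ACGT"
    n = len(states)
    q = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(states):
        for j, y in enumerate(states):
            if i != j:
                key = "".join(sorted(x + y))   # "CA" -> "AC" (rates are symmetric)
                q[i][j] = rates[key] * pi[y]   # q_xy = r_xy * pi_y
        q[i][i] = -sum(q[i])                   # diagonal makes the row sum to zero
    # Scale so that the expected number of substitutions per unit time is 1.
    mu = -sum(pi[x] * q[i][i] for i, x in enumerate(states))
    return [[value / mu for value in row] for row in q]


if __name__ == "__main__":
    uniform = {s: 0.25 for s in "ACGT"}
    equal_rates = {k: 1.0 for k in ("AC", "AG", "AT", "CG", "CT", "GT")}
    for row in gtr_rate_matrix(uniform, equal_rates):
        print(row)
```

Only the ratios between the six rates matter after normalization, which is why one of
them (here GT) can be fixed to 1 without loss of generality.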
  
# ********** Gamma Distribution  ***********
rate_distribution 		= Only the Gamma distribution is available
rate_distribution.ncat		= Number of categories, 4 by default
rate_distribution.alpha 	= Alpha value
rate_distribution.beta 		= Beta value


# ********** Phylogenetic Optimization  ***********

High-level strategies for searching the tree space: h1 and h2

H1: This strategy is based on the theoretical relationship between parsimony and
likelihood (under some assumptions, minimizing the parsimony score is equivalent to
maximizing the likelihood). It searches for bi-objective phylogenetic trees by applying
topological moves with the Parametric Progressive Neighbourhood (PPN) technique
(Goëffon et al., 2008), which minimizes the parsimony score and thereby also increases
the likelihood. To improve the likelihood further, it optimizes all branch lengths
affected by the topological moves.

H2: This strategy is a parametric combination of two techniques that optimize the two
criteria separately. For parsimony it uses the PPN technique, applying topological
moves and adjusting the affected branch lengths; for likelihood it uses a topological
rearrangement method provided by the Phylogenetic Likelihood Library (PLL).
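
Both strategies rely on the parsimony score of a candidate tree. As a reminder of what
that score measures, the following Python sketch implements the classic Fitch
algorithm on a rooted binary tree. It is a generic illustration, not the
implementation used by MO-Phylogenetics; the nested-tuple tree representation and the
function names are assumptions made for this example.

```python
def fitch(tree, column):
    """Return (state_set, cost) for one alignment column.

    tree   : a leaf name (str) or a pair (left, right) of subtrees
    column : dict mapping each leaf name to its character in this column
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets: one extra substitution is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all columns of a set of aligned sequences."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    tree = (("a", "b"), ("c", "d"))
    alignment = {"a": "AAG", "b": "AAG", "c": "ACG", "d": "ACT"}
    print(parsimony_score(tree, alignment))
```

A topological move such as NNI or SPR changes the nesting of the tuples, and the score
is then recomputed; a lower score means fewer substitutions are needed to explain the
alignment.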

Parameters:

optimization.method 			= h1 or h2
optimization.method.perc 		= Percentage to apply pllRearrangeSearch or PPN technique

optimization.ppn.numiterations 		= Number of iterations of PPN technique
optimization.ppn.maxsprbestmoves 	= Limit of best moves to be applied in PPN Technique


Parameters of the pllRearrangeSearch function:

optimization.pll.percnodes 		= % of nodes to be used as root node in pllRearrangeSearch
optimization.pll.mintranv		= Minimum radius of the annulus in which the topological
					  moves should take place.
optimization.pll.maxtranv		= Maximum radius of the annulus in which the topological
					  moves should take place.
optimization.pll.newton3sprbranch 	= Optimization of the branch lengths affected by topological moves


# ********** Branch Length Optimization ***********

Optimization of the Branch-Length of the initial population of phylogenetic trees
bl_optimization.starting 			= true or false
bl_optimization.starting.method 		= Method: newton and gradient are available
bl_optimization.starting.maxitevaluation	= 1000
bl_optimization.starting.tolerance 		= Precision 0.001

Optimization of the Branch-Length during one of the techniques of tree-space exploration or during each Update Interval.
bl_optimization 				= true or false
bl_optimization.method				= Method: newton and gradient are available
bl_optimization.maxitevaluation			= 1000 
bl_optimization.tolerance 			= 0.001

Optimization of the Branch-Length of the final population of phylogenetic trees produced by the algorithm
bl_optimization.final 				= true or false
bl_optimization.final.method			= Method: newton and gradient are available
bl_optimization.final.maxitevaluation		= 1000
bl_optimization.final.tolerance 		= 0.001


# ********** Evolutionary Model Optimization ***********
Optimization of the Parameters of the Evolutionary Model
model_optimization 				= true or false
model_optimization.method 			= Method: brent and bfgs are available
model_optimization.maxitevaluation 		= 1000
model_optimization.tolerance 			= 0.001

# ********** Distribution Rate Optimization ***********
ratedistribution_optimization = false
ratedistribution_optimization.method = bfgs #brent or bfgs


=======================================================================================
6. Results
=======================================================================================

MO-Phylogenetics writes its results in the jMetalCpp output format,
creating two files named FUN+ExperimentID and VAR+ExperimentID.
The FUN file contains the Pareto front approximation, and
the VAR file contains the optimized phylogenetic trees in Newick format.
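
A FUN file can be post-processed with a few lines of Python. The sketch below assumes
the usual jMetal layout (one solution per line, objective values separated by
whitespace) and assumes that both objectives are stored so that smaller values are
better; check your own output before relying on either assumption. The function names
are hypothetical.

```python
def read_fun(path):
    """Read a FUN file: one solution per line, objectives separated by whitespace."""
    with open(path) as handle:
        return [
            tuple(float(value) for value in line.split())
            for line in handle
            if line.strip()
        ]


def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    import sys

    # Usage: python pareto.py <FUN file>
    for point in nondominated(read_fun(sys.argv[1])):
        print(*point)
```

Filtering is useful when the fronts of several independent runs are merged into one
file, because solutions from one run may be dominated by solutions from another.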

=======================================================================================


