GitHub - KhaosResearch/MORPHY

Branches Tags
Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
bin		bin
data		data
lib		lib
parameters		parameters
src		src
.gitignore		.gitignore
Makefile		Makefile
README		README
install.sh		install.sh
Repository files navigation

 -------------------------------
|                               |
|  MORPHY - README              |
|                               |
 -------------------------------

=======================================================================================
TABLE OF CONTENTS
=======================================================================================
1. Requirements 
2. Installing 
3. Executing 
4. Parameters
5. Structure
6. Results
=======================================================================================

=======================================================================================
1. Requirements 
=======================================================================================

MORPHY has been developed in Unix machines (Ubuntu and MacOS X) using the G++ (4.4.7) compiler. The make utility has been used to compile the software package.

MORPHY is based on MO-Phylogenetics and requires install two frameworks:

1) Bio++: a set of C++ libraries for Bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics. Home Page:  http://biopp.univ-montp2.fr/

2) PLL - Phylogenetic Likelihood Library: a highly optimized, parallized software library to easy the development of new software tools dealing with phylogenetic inference. Home Page: Phylogenetic Likelihood Library http://www.libpll.org/

Bio++ also requeries the CMake utility and Pll requeries Autoreconf utility.

=======================================================================================

=======================================================================================
2. Installing MORPHY
=======================================================================================

Copy the compressed file to the location where you want to install MORPHY and
unzip it.

Then, run as a root user the script install.sh with the following command:
  %sudo sh install.sh

It installs all needed libraries: Bio++ (Bpp-Core, Bpp-Seq and Bpp-Phyl) and Pll
in default directories `/usr/local/bin' and `/usr/local/include'.

=======================================================================================


=======================================================================================	
3. Executing MORPHY
=======================================================================================

The main binary is in the subfolder 'bin'. Enter this folder to execute MORPHY
	
  % cd bin
  % ./MORPHY param= parametersfile

The parameterfile is the filename wich has all the requeried parameters to execute the program. For example, to execute the program using one of our test problem ZILLA_500 sequences dataset, type the following command:
	
  %./MORPHY param=parameters/params500.txt

The parameters to solve the DNA state-of-the-art and AA Test problems are defined in some files saved into the paramaters folder, and also are included the datasets (Amino-Acids and Nucleotide sequences), partition model file and the initial user trees perfomanced by a bootstrap techniques using PhyML, IQTREE, RAxML and DNAPars softwares. 

Simple Example

If you need run a simple example of the algorithm, only define the filename of the dataset of Amino-Acids sequences in Phylip format with the partition file, using the parameters "sequencefile" and "partitionmodelfile"

  %./MORPHY sequencefile=sequences.phy partitionmodelfile=sequences.model

The rest of phylogenetic parameters will be estimated and the parameters of the algorithm will defined with the best default values obtanied in our experiments.

=======================================================================================


=======================================================================================
4. Structure 
=======================================================================================

* MORPHY: 
** lib
	*** Bpp: Source code of the Bio++ libraries: Bpp-Core, Bpp-Seq and Bpp-Phyl
	*** Pll: Source code of Phylogenetic Likelihood Library
	libjmetal.a: jMetalCpp library

** src: Source Code of MORPHY based in the MO-Phylogenetics framework. Some changes were added to adjust to the Phylogenetic problem.
	
** bin: Main Binary of the software

** data
	*** sequences: Sequences file
	*** model: Partition Model file used by Pll 
	*** inputusertrees: Input Phylogenetic Trees used as repository to generate initial population

** parameters: Parameters file wich contains all the needed parameters to customize the software.

	
=======================================================================================
5. Parameters 
=======================================================================================

Parameters of the Metaheuristic:

experimentid			 = Experiment ID, the results file are renamed using this ID
DATAPATH   	 		 = data folder name 

populationsize 	 		 = Population Size
maxevaluations 			 = Number of the evaluations 

intervalupdateparameters	 = Interval of evaluations to Optimize the Branch-length and parameters of the substitution model  

printtrace 			 = Prints trace of the elapsed time and score value performed during optimizing strategies   
printbestscores 		 = Prints best scores found during the algorithm execution 

*****Genetic Operators****

selection.method		= binarytournament and randomselection (for SMSEMOA) are Available.
mutation.method 		= NNI, SPR and TRB Topological mutations are available 
mutation.probability 		= Probability to execute

# ----------------------------------------------------------------------------------------
#                                     Phylogenetic Parameters
# ----------------------------------------------------------------------------------------

alphabet	    		= Protein or DNA are available.
sequencefile	 		= The sequence file to use (sequences must be aligned!).
		      	  	For example $(DATAPATH)/sequences/55.txt

input.sequence.format		= The alignment format, for exampĺe
			  	Phylip(order=sequential, type=extended, split=spaces)

partitionmodelfilepll 		= Model specifications file. For example: $(DATAPATH)/model/55.model

init.population 		= Method to generate or create the Initial phylogenetic trees: 
				user, random or stepwise
init.population.tree.file 	= FileName of Input User Trees, for example:
				$(DATAPATH)/inputusertrees/bootstrap1000_55

init.population.tree.format 	= Phylogenetic Trees Format: Newick

# ********** Parameters of the Evolutionary Model  ***********
model 				= Evolutionary Model Name, for example GTR+G

********** Frequences ********
model.frequences 		= user or empirical
model.piA 			=  PiA
model.piC 			=  PiC 
model.piG 			=  PiG 
model.piT 			=  PiT 

********** GTR relative rate parameters **********
model.AC 			= AC
model.AG 			= AG
model.AT 			= AT
model.CG 			= CG
model.CT 			= CT
model.GT 			= GT, always set 1
  
# ********** Distribution Gamma  ***********
rate_distribution 		= Gamma technique only
rate_distribution.ncat		= Number of categories, 4 by default
rate_distribution.alpha 	= Alpha value


# ********** Phylogenetic Optimization  ***********

High-level Strategies for searching the tree space: h2 y h1

H1: This strategy is based on the theoretical perspective of the strong relation between
parsimony and likelihood (minimizing the parsimony score is equivalent to
maximizing the likelihood under some assumptions), and searchs bi-objective phylogenetic trees applying topological moves using the Parametric Progressive Neighbourhood (PPN) technique to minimizing the parsimony and with this maximizing the likelihood simultaneously; furthermore, with the aim of improve the likelihood, optimize all branch lengths affected for the topological moves. 

H2: This strategy is a parametric combination of two techniques that optimize both
criteria separately, for parsimony uses PPN technique applying topological
moves and adjusting the branch lengths affected, and for the likelihood uses a topological rearrange method provided by the Phylogenetic Likelihood Library (PLL).

Parameters:

optimization.method 			= h2 or h2
optimization.method.perc 		= Percentage to apply pllRearrangeSearch or PPN technique

Parameters of the PPN  function:

optimization.ppn.numiterations 		= Number of iterations of PPN technique
optimization.ppn.maxsprbestmoves 	= Limit of best moves to be applied in PPN Technique

Parameters of the pllRearrangeSearch  function:

optimization.pll.percnodes 		= % of nodes to be used has root node in pllRearrangeSearch optimization.pll.mintranv		= Define the annulus in which the topological moves should 						take place.
optimization.pll.maxtranv		= Define the annulus in which the topological moves should 						take place.
optimization.pll.newton3sprbranch 	= Optimization of branch-length affected by topological moves


# ********** Branch Length Optimization ***********

Optimization of the Branch-Length of the initial population of phylogenetic trees
bl_optimization.starting 			= true or false
bl_optimization.starting.maxitevaluation	= 1000

Optimization of the Branch-Length during one of the techniques of tree-space exploration or during each Update Interval.
bl_optimization 				= true or false
bl_optimization.maxitevaluation			= 1000 

Optimization of the Branch-Length of the final population of phylogenetic trees perfomanced by algorithm
bl_optimization.final 				= true or false
bl_optimization.final.maxitevaluation		= 1000


# ********** Evolutionary Model Optimization ***********
Optimization of the Parameters of the Evolutionary Model in 
model_optimization 				= true or false
model_optimization.tolerance 			= 0.001


#************* Consensus tree *****************
Calculates the consensus tree of the final set of non-dominated solutions, defined from the number of occurrences of bipartitions. A bipartition is included if it is compatible with all previously included bipartitions, and if its score is higher than a threshold

consensus = true/false

consensus_threshold = Minimal acceptable score =number of occurrence of a bipartition/number of trees (0.<=threshold<=1.)

0 will output a fully resolved tree, 0.5 corresponds to the majority rule and 1 to the strict consensus, but any intermediate value can be specified.


=======================================================================================
6. Results
=======================================================================================

The results of MORPHY are based in the MO-Phylogenetics output format, 
creates two files called FUN+ExperimentID and VAR+ExperimentID. 
The FUN file contains the data of the pareto front approximation and 
the VAR file contains the optimized phylogenetic trees in newick format.

=======================================================================================


7.- News
=======================================================================================


=======================================================================================