HyGP

C++ Hybrid Genetic Programming code for symbolic regression of explicit metamodels from data

Main features

Memetic approach

The memetic/hybrid approach is implemented using a sequential quadratic programming (SQP) algorithm to tune the numerical coefficients of the individuals. The number of random initial guesses of the numerical coefficients can be set by the user.
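As a rough illustration of the multi-start idea, the sketch below runs a local optimiser from several random initial guesses and keeps the best set of coefficients. All names (`multi_start_tune`, `TuningResult`, the `[lo, hi]` sampling range) are hypothetical, and the local optimiser is passed in as a callback rather than being an SQP implementation; the actual HyGP code may be organised differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <random>
#include <vector>

using Coeffs = std::vector<double>;

// Best coefficients found so far and the error they achieve.
struct TuningResult {
    Coeffs coeffs;
    double error = std::numeric_limits<double>::infinity();
};

// Runs `local_search` from `n_guesses` random starting points drawn uniformly
// from [lo, hi] and keeps the tuned coefficients with the lowest error.
// In HyGP the local search is an SQP algorithm; here it is just a callback.
TuningResult multi_start_tune(
    const std::function<double(const Coeffs&)>& error,
    const std::function<Coeffs(const Coeffs&)>& local_search,
    std::size_t n_coeffs, int n_guesses, double lo, double hi,
    std::mt19937& rng) {
    assert(n_guesses > 0 && lo < hi);
    std::uniform_real_distribution<double> guess(lo, hi);
    TuningResult best;
    for (int g = 0; g < n_guesses; ++g) {
        Coeffs start(n_coeffs);
        for (double& c : start) c = guess(rng);  // random initial guess
        Coeffs tuned = local_search(start);      // local refinement
        const double e = error(tuned);
        if (e < best.error) {
            best.coeffs = tuned;
            best.error = e;
        }
    }
    return best;
}
```

Here `n_guesses` plays the role of the user-set number of random initial guesses.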

Encoding

Models or individuals are represented by trees with unary and binary operations. The root node is always a binary node.
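A tree of this kind could be represented along the following lines. This is only a sketch: the names `Node`, `NodeKind`, `evaluate` and the `make_*` helpers are hypothetical and are not taken from the HyGP sources.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node kinds; the real HyGP classes may differ.
enum class NodeKind { Constant, Variable, Unary, Binary };

using UnaryOp = double (*)(double);
using BinaryOp = double (*)(double, double);

struct Node {
    NodeKind kind = NodeKind::Constant;
    double value = 0.0;           // used by Constant nodes
    std::size_t var_index = 0;    // used by Variable nodes
    UnaryOp unary = nullptr;      // used by Unary nodes
    BinaryOp binary = nullptr;    // used by Binary nodes
    std::unique_ptr<Node> left;   // only child (Unary) or first child (Binary)
    std::unique_ptr<Node> right;  // second child (Binary only)
};

std::unique_ptr<Node> make_constant(double v) {
    std::unique_ptr<Node> n(new Node());
    n->kind = NodeKind::Constant;
    n->value = v;
    return n;
}

std::unique_ptr<Node> make_variable(std::size_t index) {
    std::unique_ptr<Node> n(new Node());
    n->kind = NodeKind::Variable;
    n->var_index = index;
    return n;
}

std::unique_ptr<Node> make_unary(UnaryOp op, std::unique_ptr<Node> child) {
    std::unique_ptr<Node> n(new Node());
    n->kind = NodeKind::Unary;
    n->unary = op;
    n->left = std::move(child);
    return n;
}

std::unique_ptr<Node> make_binary(BinaryOp op, std::unique_ptr<Node> l,
                                  std::unique_ptr<Node> r) {
    std::unique_ptr<Node> n(new Node());
    n->kind = NodeKind::Binary;
    n->binary = op;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// The root of a model is always a binary node.
bool is_valid_root(const Node& n) { return n.kind == NodeKind::Binary; }

// Recursively evaluates the tree at the input point x.
double evaluate(const Node& n, const std::vector<double>& x) {
    switch (n.kind) {
        case NodeKind::Constant: return n.value;
        case NodeKind::Variable: return x.at(n.var_index);
        case NodeKind::Unary:    return n.unary(evaluate(*n.left, x));
        case NodeKind::Binary:   return n.binary(evaluate(*n.left, x),
                                                 evaluate(*n.right, x));
    }
    return 0.0;  // unreachable for valid kinds
}
```

For example, the model 2 + x0^2 would be a binary addition node at the root, with a constant as its first child and a unary squaring node over the variable x0 as its second.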

Selection

Tournament selection among three individuals, drawn either from the elite or from the whole population.
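A three-way tournament can be sketched as below. The sketch assumes that the population is sorted so that the elite occupies the first `pool_size` positions, that entrants are drawn with replacement, and that a lower fitness value is better; none of these details is confirmed by this README, and `tournament_select` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns the index of the tournament winner among `tournament_size` entrants
// drawn uniformly (with replacement) from the first `pool_size` individuals.
// Pass the elite size to select from the elite, or fitness.size() to select
// from the whole population. Lower fitness is assumed to be better.
std::size_t tournament_select(const std::vector<double>& fitness,
                              std::size_t pool_size,
                              std::mt19937& rng,
                              std::size_t tournament_size = 3) {
    assert(pool_size > 0 && pool_size <= fitness.size());
    std::uniform_int_distribution<std::size_t> pick(0, pool_size - 1);
    std::size_t best = pick(rng);
    for (std::size_t k = 1; k < tournament_size; ++k) {
        const std::size_t challenger = pick(rng);
        if (fitness[challenger] < fitness[best]) best = challenger;
    }
    return best;
}
```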

Genetic operators

The implemented genetic operators are:

reproduction: the elite is copied unchanged to the new generation. If the elite contains copies, the reproduction operator replaces them with new individuals generated from scratch.
crossover
mutation: subtree mutation on even generations, point mutation on odd generations.

The genetic operators are applied independently, that is, a model can be subjected to only one of the three operators at each generation.
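The even/odd alternation between the two mutation types amounts to a parity check on the generation counter. A minimal sketch, with hypothetical names and assuming generations are counted from 0:

```cpp
#include <cassert>

enum class MutationKind { Subtree, Point };

// Even generations use subtree mutation, odd generations use point mutation.
MutationKind mutation_for_generation(int generation) {
    assert(generation >= 0);
    return (generation % 2 == 0) ? MutationKind::Subtree : MutationKind::Point;
}
```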

Fitness function

Multiobjective definition through a weighted approach or MinMax (non-Pareto). The objectives used to control the shape and behaviour of the evolved models are:

individual root mean square error divided by average elite root mean square error
number of numerical coefficients (squared)
number of illegal operations (i.e. division by zero)
number of nodes (model size)
variance and average of target function, computed from provided data
max and min value of provided data
first root of autocorrelation function (for 1D problems)
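The statistical quantities in this list (average, variance, max and min of the target) can be computed from the provided data in a single helper. The sketch below uses hypothetical names (`TargetStats`, `summarize`) and assumes the population variance (division by n); HyGP may use a different estimator.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct TargetStats {
    double mean;
    double variance;  // population variance (assumption: divide by n)
    double min;
    double max;
};

// Summarises the target values y read from the data set.
TargetStats summarize(const std::vector<double>& y) {
    assert(!y.empty());
    const double n = static_cast<double>(y.size());
    const double mean = std::accumulate(y.begin(), y.end(), 0.0) / n;
    double ss = 0.0;  // sum of squared deviations from the mean
    for (double v : y) ss += (v - mean) * (v - mean);
    const auto mm = std::minmax_element(y.begin(), y.end());
    return {mean, ss / n, *mm.first, *mm.second};
}
```

The same helper can be applied to the values produced by a tree, which gives the tree-side statistics that the fitness strategies compare against the data.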

The "strategies" that define how the objectives listed above are combined into the fitness function F are selected through the STRAT_STATP parameter in the input file:

Weighted approach:

Given RMSE(i,t), the root mean square error of the i-th HyGP model at generation t as evaluated on the building data set (declared in the input file); N_(tuning coeff), the number of numerical coefficients in the model; N_(illegal op), the number of illegal operations found in the model; and N_nodes, the number of nodes of the model:

F = a_1 RMSE(i,t)/RMSE(i,t-1) + a_2 N_(tuning coeff)^2 + a_3 N_(illegal op) + a_4 N_nodes + a_8 F_8
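The weighted sum translates directly into code. In the sketch below the struct and function names are hypothetical, and the strategy-dependent term F_8 is passed in as a precomputed value.

```cpp
#include <cassert>

// Weights a_1, a_2, a_3, a_4, a_8 of the weighted approach.
struct Weights { double a1, a2, a3, a4, a8; };

struct ModelMetrics {
    double rmse;         // RMSE(i,t)
    double rmse_ref;     // RMSE(i,t-1), the normalising value in the formula
    int n_tuning_coeff;  // number of numerical coefficients
    int n_illegal_op;    // number of illegal operations
    int n_nodes;         // model size
    double f8;           // strategy-dependent statistical term F_8
};

// F = a_1 RMSE(i,t)/RMSE(i,t-1) + a_2 N_coeff^2 + a_3 N_illegal + a_4 N_nodes + a_8 F_8
double weighted_fitness(const Weights& w, const ModelMetrics& m) {
    assert(m.rmse_ref > 0.0);
    const double nc = static_cast<double>(m.n_tuning_coeff);
    return w.a1 * m.rmse / m.rmse_ref
         + w.a2 * nc * nc
         + w.a3 * m.n_illegal_op
         + w.a4 * m.n_nodes
         + w.a8 * m.f8;
}
```

With weights (1, 0.5, 2, 0.25, 1), RMSE ratio 2/4, 2 coefficients, 1 illegal operation, 8 nodes and F_8 = 3, the terms are 0.5 + 2 + 2 + 2 + 3 = 9.5.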

Strategy 4 (STRAT_STATP=4):

F_8 = sqrt(|input_data_variance - tree_variance|) / [1 + |input_data_mean - tree_mean|]

Strategy 6 (STRAT_STATP=6):

F_8 = (sqrt(|input_data_variance - tree_variance|))^3 + |input_data_mean - tree_mean|^3

Strategy 7 (STRAT_STATP=7):

F_8 = (sqrt(|input_variance - tree_variance|))^2 + |input_mean - tree_mean|^2

Strategy 8 (STRAT_STATP=8):

F_8 = (sqrt(|input_variance - tree_variance|))^3 + |input_mean - tree_mean|^3 + |input_max - tree_max|^3 + |input_min - tree_min|^3

Strategy 9 (STRAT_STATP=9):

F_8 = (sqrt(|input_variance - tree_variance|))^3 + |input_mean - tree_mean|^3 + |input_max - tree_max|^3 + |input_min - tree_min|^3 + a_9 * diverging (1,0)

Strategy 10 (STRAT_STATP=10):

As Strategy 8, but with editing of high level polynomials if found before fitness function evaluation. Diverging terms are replaced by numerical coefficients.

Strategy 13 (STRAT_STATP=13):

F = a_1 F_1 + a_2 F_2 + a_3 F_3 + a_4 F_4 + a_8 F_8 + a_10 F_10 + a_11 F_11

with:

 F_1 = exp(10.0*RMSE(i,t)/|input_max - input_min|)
 F_2 = N_(tuning coeff)^2
 F_3 = N_(illegal op)
 F_4 = N_nodes
 F_8 = exp(10.0*|input_variance-tree_variance|/input_variance) + exp(10.0*|input_mean-tree_mean|/(|input_mean|+1)) + |input_max-tree_max|/|input_max|
 F_10 = exp(10.0*|input_ACF_half_point-tree_ACF_half_point|/input_ACF_half_point)-1.0
 F_11 = (|input_tot_variation - tree_tot_variation|/input_tot_variation)^3
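The exponential terms F_1 and F_8 of this strategy can be sketched as follows. The names are hypothetical, the denominator of the mean term is read as |input_mean| + 1, and the assertions only guard against the divisions by zero that the formulas leave open.

```cpp
#include <cassert>
#include <cmath>

// Summary statistics of either the input data or the tree output.
struct Stats { double mean, variance, min, max; };

// F_1 = exp(10.0 * RMSE(i,t) / |input_max - input_min|)
double f1_strategy13(double rmse, const Stats& input) {
    const double range = std::fabs(input.max - input.min);
    assert(range > 0.0);
    return std::exp(10.0 * rmse / range);
}

// F_8 = exp(10.0*|dvar|/input_variance)
//     + exp(10.0*|dmean|/(|input_mean|+1))
//     + |dmax|/|input_max|
double f8_strategy13(const Stats& input, const Stats& tree) {
    assert(input.variance > 0.0 && input.max != 0.0);
    return std::exp(10.0 * std::fabs(input.variance - tree.variance) / input.variance)
         + std::exp(10.0 * std::fabs(input.mean - tree.mean) / (std::fabs(input.mean) + 1.0))
         + std::fabs(input.max - tree.max) / std::fabs(input.max);
}
```

Note that with these definitions a perfect match gives F_1 = 1 and F_8 = 2 rather than 0, since each exponential evaluates to 1 when its argument vanishes.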

Strategy 14 (STRAT_STATP=14) - evolution of statistical equivalent signal:

F = a_2 F_2 + a_3 F_3 + a_4 F_4 + a_8 F_8 + a_10 F_10 + a_11 F_11

with:

 F_2 = N_(tuning coeff)^2
 F_3 = N_(illegal op)
 F_4 = N_nodes
 F_8 = exp(10.0*|input_variance-tree_variance|/input_variance) + exp(10.0*|input_mean-tree_mean|/(|input_mean|+1)) + |input_max-tree_max|/|input_max|
 F_10 = exp(10.0*|input_ACF_half_point-tree_ACF_half_point|/input_ACF_half_point)-1.0
 F_11 = (|input_tot_variation - tree_tot_variation|/input_tot_variation)^3

MinMax approach:

Strategy 11 (STRAT_STATP=11):

F = Min{Max[a_1 RMSE(i,t)/RMSE(i,t-1), a_2 N_(tuning coeff), a_3 N_(illegal op), a_4 N_nodes, a_5 F_8]}, with F_8 as in Strategy 8
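In the MinMax approach each model is scored by its worst (largest) weighted objective, and evolution then seeks the model that minimises that score. The sketch below computes F_8 as written for Strategy 8 and the inner Max for one model; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Summary statistics of either the input data or the tree output.
struct Stats { double mean, variance, min, max; };

double cube(double v) { return v * v * v; }

// F_8 of Strategy 8: cubed mismatches in standard deviation scale, mean, max and min.
double f8_strategy8(const Stats& input, const Stats& tree) {
    return cube(std::sqrt(std::fabs(input.variance - tree.variance)))
         + cube(std::fabs(input.mean - tree.mean))
         + cube(std::fabs(input.max - tree.max))
         + cube(std::fabs(input.min - tree.min));
}

struct MinMaxWeights { double a1, a2, a3, a4, a5; };

// Inner Max[...] of Strategy 11 for a single model; the outer Min is
// carried out by the evolutionary search over models.
double minmax_score(const MinMaxWeights& w, double rmse, double rmse_ref,
                    int n_tuning_coeff, int n_illegal_op, int n_nodes,
                    double f8) {
    assert(rmse_ref > 0.0);
    return std::max({w.a1 * rmse / rmse_ref,
                     w.a2 * n_tuning_coeff,
                     w.a3 * n_illegal_op,
                     w.a4 * n_nodes,
                     w.a5 * f8});
}
```

Unlike the weighted sum, improving an objective that is not currently the largest one leaves the score unchanged.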

Bounded models

The BOUNDED parameter in the input file forces HyGP to evolve bounded models (BOUNDED=1), for extrapolation purposes.

Termination criteria

Evolution ends when the root mean square error of the best individual falls below a user-defined threshold or the maximum number of generations is reached.
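The two stopping conditions combine into a single predicate checked once per generation. A minimal sketch with hypothetical parameter names, assuming generations are counted from 0:

```cpp
#include <cassert>

// True when the evolution should stop: either the best RMSE is below the
// user-defined threshold or the generation budget is exhausted.
bool should_terminate(double best_rmse, double rmse_threshold,
                      int generation, int max_generations) {
    assert(max_generations >= 0);
    return best_rmse < rmse_threshold || generation >= max_generations;
}
```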

Further measures

A maximum depth restriction is implemented to avoid generating trees of excessive depth.
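A depth check of this kind can be sketched as a recursive traversal. The `Node` layout and function names below are hypothetical, and the depth is counted in nodes along the longest root-to-leaf path (a lone node has depth 1), which may differ from the convention used in HyGP.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>

// Minimal tree node: only the child links matter for the depth.
struct Node {
    std::unique_ptr<Node> left;   // null for terminals
    std::unique_ptr<Node> right;  // null for terminals and unary nodes
};

// Number of nodes on the longest path from n down to a leaf.
int depth(const Node* n) {
    if (n == nullptr) return 0;
    return 1 + std::max(depth(n->left.get()), depth(n->right.get()));
}

// A candidate tree is accepted only if it respects the depth limit.
bool within_depth_limit(const Node& root, int max_depth) {
    assert(max_depth >= 1);
    return depth(&root) <= max_depth;
}
```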

Execution

Sequential or parallel (SGE array job)

