Likelihood curve fitting by nonlinear least squares.
Building the library requires CMake.
Compiling the lcfit C library requires the GNU Scientific Library and NLopt.
Compiling the lcfit C++ extension library, the lcfit-compare
tool, and the test suite requires a C++11-compatible compiler.
Additionally, the lcfit-compare
tool requires the libraries bpp-core
, bpp-seq
, and bpp-phyl
from Bio++ 2.2.0.
On Debian/Ubuntu:
sudo apt-get install \
libgsl0-dev \
libnlopt-dev \
libbpp-core-dev \
libbpp-seq-dev \
libbpp-phyl-dev
Run make
to obtain static and dynamic libraries.
Run make doc
to build documentation (requires Doxygen).
To build the lcfit-compare
tool required for running the example and simulations, run make lcfit-compare
.
To build and run the test suite, run make test
.
nestly is used to build an extensive hierarchy of directories and configuration files to measure the behavior of lcfit
when applied to a wide variety of data.
Running the simulations requires Python 2.7, R 3.2.4, and BppSuite 2.2.0 in addition to the dependencies required for the lcfit-compare
tool described above.
Instructions for installing the Python and R package dependencies and running the simulations can be found in sims/README.md
.
The most straightforward way of fitting a model to an empirical log-likelihood function is via the C API function lcfit_fit_auto()
declared in lcfit_select.h
.
This function accepts a callback function for computing the log-likelihood at a given branch length, an initial model, and the minimum and maximum branch lengths to consider.
As an example, assume you have a function for computing log-likelihood with the following signature:
double my_lnl(tree_t* tree, int node_id, double branch_length);
To adapt this function for lcfit_fit_auto()
, you might do something like the following:
typedef struct {
tree_t* tree;
int node_id;
} my_lnl_data_t;
double my_lnl_callback(double branch_length, void* data)
{
my_lnl_data_t* lnl_data = (my_lnl_data_t*) data;
return my_lnl(lnl_data->tree, lnl_data->node_id, branch_length);
}
You could then use the callback and data struct like this:
my_lnl_data_t my_lnl_data;
// populate my_lnl_data struct
my_lnl_data.tree = ...
my_lnl_data.node_id = ...
// choose initial model
bsm_t lcfit_model = DEFAULT_INIT; // DEFAULT_INIT defined in lcfit.c
// set bounds
double min_t = ...
double max_t = ...
// fit model and estimate maximum-likelihood branch length
double ml_t = lcfit_fit_auto(&my_lnl_callback, &my_lnl_data, &lcfit_model, min_t, max_t);
In C++, callback data isn't limited to plain datatypes or C-style structs.
See the function log_likelihood_callback()
in lcfit_cpp_src/lcfit_compare.cc
for an example of using a C++ class in the callback function for computing log-likelihoods.
The lcfit C++ API includes a simple rejection sampler for sampling from the unnormalized posterior given an lcfit model and an exponential prior. It's used as follows:
gsl_rng* rng = gsl_rng_alloc(gsl_rng_default);
bsm_t lcfit_model = ... // fitted model
double lambda = ... // exponential distribution rate parameter
lcfit::rejection_sampler sampler(rng, lcfit_model, lambda);
// single sample
double x = sampler.sample();
// multiple samples
std::vector<double> xs = sampler.sample_n(1000);
The rejection sampler class also has a few functions exposed for computing the likelihood, density, and cumulative density at a given branch length. The density and cumulative density functions rely on GSL for numerical integration.