Skip to content

julietsitron/RANSA

Repository files navigation

RANSA (Receptor Array Nested Sampling Algorithm) User Guide

Copyright: © 2011 Julia Tsitron and Alexandre V. Morozov

This document provides instructions for using the RANSA (C++) package.

Here is a Summary of what the algorithm does and its applications:

RANSA can be used in artificial nose systems. Artificial noses mimic the mammalian olfactory system by using arrays of sensors. A single odor molecule invokes a complex pattern of responses across a sensor array. Mixtures of odors complicate things further. The advantage of such systems is their ability to detect and quantify a large number of analytes with a relatively small number of sensors but the challenge is to decode their complex output. And how does one build an artificial nose for mixture detection in the first place? RANSA is a Bayesian-based algorithm that allows us to discover design principles for engineered sensor arrays optimized for mixture recognition. For example, analysis on yeast cells expressing human olfactory receptors shows that mixtures are more easily discriminated when they include antagonists - molecules that do not evoke any response signal from the cells but that bind competitively to the receptor, preventing a "real" signal from being detected. In practice, this means that our best array would include a sensor that has no response to the target chemical, a surprising result. RANSA was also used to successfully infer concentrations of chemicals in complex mixtures in several implementations of artificial arrays.

For complete details about the design and applications of the algorithm please refer to the following two papers:

Tsitron J, Ault AD, Broach JR, Morozov AV (2011) Decoding Complex Chemical Mixtures with a Physical Model of a Sensor Array. PLoS Comput Biol 7(10): e1002224. doi:10.1371/journal.pcbi.1002224

and

Tsitron J, Kreller CR, Sekhar PK, Mukundan R, Garzon FH, Brosha EL, Morozov AV. (2014) Bayesian Decoding of the Ammonia Response of a Zirconia-based Mixed-Potential Sensor in the Presence of Hydrocarbon Interference. Sens. Actuators B: Chem 192: 283-93.

RANSA is licensed under the GNU General Public License (see LICENSE for details). If you use RANSA in your research, please cite the appropriate paper(s) listed above.

Requirements:

To compile the executable `sample', run:

make

or

make sample

-f filename

Load Data file

-model model_type

Choose class of models

DEFAULT: recX_ligY

All Possible modes:

  • Calibration and Inference (GPCR paper):

recX_ligY

MULT_SIG_recX_ligY

MuTOT_recX_ligY

(X = number of receptors, Y = number of ligands. Both are single digits)

  • Optimization of array design:

FULL_HESS_recXX_ligYY

(XX = number of receptors, YY = number of ligands. Both are double digits)

  • Calibration and Inference (gas sensors paper):

LINEAR_Xrec_Ylig

NONLIN_CALIB_Xrec

NONLIN_PRED_Xrec

(X = number of receptors, Y = number of ligands. Both are single digits)

-gapfile gapfilename

Load concentration `gap' file with dilution steps.

(Not needed in recX_ligY and FULL_HESS_recXX_ligYY modes)

-mfile modelfilename

Load file with fixed model parameters

-pfile priorfilename

Load file with priors for each parameter

DEFAULT:

  • ΔG: Uniform [-20 +5] (All modes in GPCR paper)
  • A, b: Uniform [0 1] (All modes in GPCR paper)
  • V0: Uniform [-1 1] (All modes in gas sensors paper)
  • A (slope): Uniform [0 1] (All modes in gas sensors paper)
  • σ: Jeffreys [0.0001 100.0] (ALL modes)
  • α: Jeffreys [0.0001 100.0] (ALL modes)
  • μtot: Uniform[-10 -2] (All modes in GPCR paper)
  • [TOTAL]: Uniform[0.0001 1000.0] (All modes in gas sensors paper)

-n number of nested objects

DEFAULT: 100

-max number of nesting iterations

DEFAULT: 1000

-mc number of MCMC trials for finding a new object

DEFAULT: 20

-D size of unweighted posterior sample

DEFAULT: 1000

-out posterior_filename

File with unweighted posterior samples

(nested sampling results in a posterior distribution where each sample is weighted by its log likelihood. Here, you can output an unweighted sample generated by MCMC)

-rs random seed

change random seed from default value

For analysis of data featured in GPCR-based sensor array paper,

Use one of the following Options:

-model recX_ligY

-model MULT_SIG_recX_ligY

-model MuTOT_recX_ligY

You will need a Data file consisting of three columns:

   mu        I      recN
   -3       .99       1
 -3.5       .92       1
   -4       .84       1
   -3       .98       2 
 -3.5       .81       2
   -4       .80       2
  ...       ...      ...    
  ...       ...      ...
  ...       ...      ...    

(where mu = log₁₀[total conc], I = Intensity, and recN = receptor_number)

recX_ligY mode:

  • fit for any or all of the following: ΔG's, A's, b's, α's, and σ
  • usually used for Calibration step
  • there is a single σ (noise parameter) for ALL of the data that you input

MULT_SIG_recX_ligY mode:

  • fit for any or all of the following: ΔG's, A's, b's, α's, and multiple σ's
  • there are X σ's, one for each receptor (i.e., the noise parameters associated with each receptor's data set need not be the same.)

MuTOT_recX_ligY mode:

  • fit for any or all of the following: ΔG's, A's, b's, α's, σ's, and μtot (μtot is defined as log₁₀[total conc] at the reference point - see METHODS in the paper)

  • usually used for Inference step

  • in this mode, the first column of the Data file is ignored. Instead of reading the total concentration from the Data file, you can fit for it at a reference point, and provide a separate `gap' file that will input the dilution steps

  • so, in addition to inputting a Data file, you will need a concentration `gap' file which includes the dilution steps. A dilution step is defined as the difference of logs (base 10) of the total concentrations of two consecutive measurements. The format of the gapfile is a single column:

      deltaValue
      0.0
      0.5
      1.0
      1.5
      2.0
      2.5
      3.0
      3.5
      6.0
    

in this example, the gapfile could correspond to the first column in the following Data file:

  mu           I        recN
  -3           1          1
-3.5        0.98          1
  -4        0.99          1
-4.5        0.77          1
  -5        0.44          1
-5.5        0.14          1
  -6        0.04          1
-6.5        0.02          1
  -9        0.02          1

here, log₁₀[total conc] at the reference point is -3.

For analysis of data featured in gas-sensors paper,

Use one of the following Options:

-model LINEAR_Xrec_Ylig

-model NONLIN_CALIB_Xrec

-model NONLIN_PRED_Xrec

You will need a Data file consisting of three columns:

   TOT       V      recN
    0       .0006         1
    0       .0006         1
    0       .0005         1
   20       .0009         1 
   20       .0009         1
   20       .0009         1
  ...       ...           1
  ...       ...           1 
    0       .0006         2
    0       .0006         2
    0       .0006         2
   20       .0010         2 
   20       .0011         2
   20       .0010         2
  ...       ...           2
  ...       ...           2
  ...       ...          ...    
  ...       ...          ...
  ...       ...          ...    

as well as a corresponding concentration gap file (same as described above). Here, TOT = [TOTAL], V = Voltage Response, and recN = receptor_number). In this example, there are 3 measurements at each concentration.

LINEAR_Xrec_Ylig mode:

  • fit for any or all of the following: V0's, A's, α, σ's and TOTAL
  • LINEAR_1rec_1lig usually used for Calibration step with α, TOTAL fixed
  • LINEAR_Xrec_Ylig, Y not equal to 1, usually used for Prediction step

NONLIN_CALIB_Xrec mode:

  • fit for: C's, p's, σ's, while keeping α and TOTAL fixed
  • Number of ligands is fixed to 2

NONLIN_PRED_Xrec mode:

  • fit for α and TOTAL
  • must provide a Model File with fixed parameters a, b, a', b', c'

Use the Option

-model FULL_HESS_recXX_ligYY

  • It is not necessary to input a Data file in this mode.
  • fit for any or all of the following: ΔG's, A's, b's, α's, and μtot

Any or all of the parameters can be fixed in any mode by including a model file.

For all modes in GPCR paper, use the following format:

  Prm         Val       OnOff       Ind
  ddG          -5         0          1
    A           1         0          1
    B           0         1          1

  ddG          -5         0          2
    A           1         0          2
    B           0         1          2

alpha           1         1          1
sigma       .0001         1          1
mutot          -3         1

You can fix any of the following Prms:

ddG
A
B
alpha
sigma
mutot

and set them to any Val (keeping in mind the priors you are using). OnOff can be either 0 or 1. 1 will fix the given Prm at the indicated Val whereas 0 will allow that parameter to be fit freely. Setting the OnOff switch to 0 is equivalent to excluding that parameter from the model file.

Ind is the index label of a given parameter. Since there are (X × Y = N) ddG's, A's, and B's the Ind of these parameters will run from 1 to N. There are (Y-1) alpha's, X sigma's and only one mutot - so no Ind label is necessary for mutot.

The order of the index labels for ddG, A, and B follows this pattern:rec# lig#Ind

rec1 lig11

rec1 lig22

rec1 lig33

rec2 lig14

rec2 lig25

rec2 lig36

rec3 lig17

......

The order of the index labels for alpha follows this pattern:alphaInd

alpha1 = lig2/lig11

alpha2 = lig3/lig12

alpha3 = lig4/lig13

......

The index label for sigma is just the receptor label.

An example of a Model File for LINEAR_Xrec_Ylig mode from gas sensors paper:

  Prm         Val       OnOff       Ind
  V_0        -.0032       1          1
    A        .00089       1          1

  V_0       .00006        1          2
    A       .00058        1          2

alpha           1         0          1
sigma       .0001         0          1
TOTAL         200         0

Where you can fix any of the Prms:

V_0
A
alpha
sigma
TOTAL

Any or all of the priors can be changed by including a prior file in the following format:

  Prm    Val1    Val2   PriorType   Ind 
  ddG   -20.0     0.0       u        1 
    A     0.0     2.0       u        1 
    B     0.0     1.0       u        1 

  ddG   -20.0     0.0       u        2 
    A     0.0     2.0       u        2 
    B     0.0     1.0       u        2 

alpha    0.01   100.0       j        1 
sigma   0.001   100.0       j        1

You can modify the PriorType and/or the lower and upper bounds (Val1 and Val2, respectively) of the prior for any Prm. For the Gaussian prior, Val1 and Val2 are the mean and standard deviation, respectively.

The possible PriorTypes are:

u (Uniform)

j (Jeffreys)

g (Gaussian)

The Ind labels follow the same pattern as described above (see the section on Model Files)

1. Calibration

To see an example of a Calibration run, you will need to load a Data file (or use the one provided in the ~/RANSA/examples/ directory called UDPglc_H-20.dat):

./sample -f UDPglc_H-20.dat

The end of the output should look like:

dG1 = -6.2547 +- 0.215176
A1 = 0.750912 +- 0.0555299
B1 = 0.0299895 +- 0.0262493
sigma1 = 0.0517271 +- 0.0235276
log(sigma1) = -3.08013 +- 0.504612

To see how parameter estimation improves with an increased number of nested iterations, try:

./sample -f UDPglc_H-20.dat -max 15000

Now, the output should look like:

dG1 = -6.29776 +- 0.0580941
A1 = 0.746728 +- 0.0177048
B1 = 0.0128173 +- 0.00841609
sigma1 = 0.0364009 +- 0.00488333
log(sigma1) = -3.32185 +- 0.130941

If you want to plot a histogram of your results, you will need an unweighted posterior distribution for each parameter. You can output these results with, say, 10,000 samples into a file called Posterior_sample.out. Also, try running with a different random seed to see if your previous results are stable:

./sample -f UDPglc_H-20.dat -max 15000 -out Posterior_sample.out -D 10000 -rs 54345

2. Inference

To see an example of an Inference run, load a Data file and a gapfile (you can use the ones provided in the ~/RANSA/examples/ directory called UDPgal-UDPglc_4rec.dat and deltaFile_4rec.dat) as well as a Model file with all of the ΔG's, A's, b's, and σ's fixed (you can use the one provided called model_4rec_4lig.dat). The objective here is to fit for α's and μtot.

Since we are inputting parameters for four receptors and four ligands in model_4rec_4lig.dat, we need to use the MuTOT_rec4_lig4 mode. UDPgal-UDPglc_4rec.dat is the data collected from four experiments, each using a mixture of lig1 (UDPGal) and lig2 (UDPglc) so we expect that alpha1 = lig2/lig1 should be ≅ 1, and alpha2 = lig3/lig1 and alpha3 = lig4/lig1 should both be ≅ 0. mutot should be ≅ -3. So, if you run:

./sample -f UDPgal-UDPglc_4rec.dat -model MuTOT_rec4_lig4 -gapfile deltaFile_4rec.dat -mfile model_4rec_4lig.dat -max 15000

Your output should include the following lines:

alpha1 = 1.00997 +- 0.0867217

alpha2 = 0.00277454 +- 0.00643838

alpha3 = 0.0114507 +- 0.00107997

mutot = -2.9861 +- 0.0159282

3. Optimization of parameters in sensor array design

For a simple example of an Optimization run, try optimizing array parameters for a one-receptor, two-ligand system. You can use the Model file provided in the ~/RANSA/examples/ directory called model_FULL_HESS_example.dat which fixes the B's to 0, mutot to -3, and alpha to 0.25. The ΔG's and A's are free to be optimized. Run:

./sample -model FULL_HESS_rec01_lig02 -mfile ~/RANSA/examples/model_FULL_HESS_example.dat -max 20000

Your output should include the following lines:

dG1 = -11.4413 +- 0
A1 = 0.999991 +- 0

dG2 = -11.8439 +- 0
A2 = 0.000124153 +- 0

which indicates that this system is optimized when one ligand is a strong agonist (A1 ≅ 1) and the other is a strong antagonist (A2 ≅ 0).

About

Receptor Array Nested Sampling Algorithm

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published