jarkki/mc-control

mc-control is a C++ library for solving stochastic dynamic optimization problems with Monte Carlo optimal control. It solves continuous state & continuous action problems by discretizing the continuous variables.

Example value function contours with optimal policy for optimal savings problem:

Optimal policy for optimal consumption problem

Example discretized probability distribution for optimal savings problem (one state variable):

Discretized probability distribution

Introduction

The library implements the two on-policy algorithms (exploring starts, epsilon-soft policy) described in the 5th chapter of

Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

The optimization problem considered is the stochastic dynamic optimization problem of finding a policy that maximizes the expected discounted rewards over either a finite or an infinite time horizon. The finite horizon problem is

$$\max_{\pi}\; \mathbb{E}\left[\sum_{t=0}^{T} \beta^{t}\, r(x_t, a_t)\right],$$

where $\pi$ is a policy function, $\beta \in (0,1)$ is a discount factor, $r$ is a reward function, $x_t$ is a stochastic state variable, $a_t = \pi(x_t)$ is the action taken by the agent when at state $x_t$, following the policy $\pi$, and $t$ denotes the time period. The state variable is assumed to be Markovian, which is why problems of this type are often called Markov Decision Processes.

While approximate dynamic programming methods like fitted value iteration can be the logical choice for continuous state & continuous action problems, they can be unstable and hard to implement because of their several layers of approximation. The Monte Carlo control algorithms can be very useful for checking the results obtained from dynamic programming methods. Depending on the nature of the problem, the Monte Carlo methods can also be a better fit than other reinforcement learning methods like Q-learning and fitted Q-iteration. MC methods are less prone to violations of the Markov property and do not need a model of the dynamics, only simulations or samples obtained by interacting with the system.
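
The continuous state and action variables are handled by discretizing them onto finite grids. As a rough illustration of that idea only, here is a hypothetical helper (not the library's actual code; the function name and grid values are made up) that maps a continuous value to the index of the nearest grid point, using Armadillo:

#include <armadillo>

// Hypothetical helper: index of the grid point closest to a continuous value x.
arma::uword nearest_bin(double x, const arma::vec & grid){
    return arma::index_min(arma::abs(grid - x));
}

// Example use:
//   arma::vec grid = arma::linspace(0.1, 8.0, 40);   // 40-point grid for income
//   arma::uword state_index = nearest_bin(2.37, grid);

Once every state and action is reduced to such an index, the tabular Monte Carlo control algorithms below apply directly.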

Installation

Dependencies

This library depends on three other libraries:

  • Armadillo for matrices, vectors and random number generation
  • Boost for boost::irange range-based iterator

For plotting you also need

Compilation

mc-control is a header-only library and uses some C++11 features. Just run make in the root directory to compile the example optimal savings model.

If the compiler cannot find Armadillo or Boost, edit the makefile, which has variables for custom header and library search paths for these libraries (Boost is used header-only).

Example

A classic example of a stochastic dynamic optimization problem in economics is the neoclassical consumption model with stochastic income. The agent splits her income into consumption and savings and seeks the savings policy that maximizes her expected discounted utility from consumption over an infinite time horizon:

$$\max_{\pi}\; \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^{t}\, u(c_t)\right]$$

s.t.

$$c_t + a_t \le y_t, \qquad c_t \ge 0,\; a_t \ge 0 \quad \text{(feasibility constraint)},$$

where $y_t$ is income, $a_t$ is the amount saved and $c_t$ is consumption. The transition function for income is

$$y_{t+1} = f(a_t)\, \xi_{t+1},$$

with the action $a_t$ representing the amount to save, given the income $y_t$, $f$ a production function and $\xi_{t+1}$ a random shock.

A popular choice for the shock is the log-normal distribution, $\xi_{t+1} \sim \mathrm{LN}(\mu, \sigma^{2})$. For the utility function, a common choice is logarithmic utility, $u(c) = \ln(c)$.

More details:

  • Stachurski, John. Economic Dynamics: Theory and Computation. MIT Press, 2009.
  • Stokey, Nancy L., and Robert E. Lucas Jr. Recursive Methods in Economic Dynamics. Harvard University Press, 1989.

The implementation can be found in examples/optgrowth.cpp.

Solving the dynamic problem

A dynamic optimization problem such as the optimal consumption/savings problem can be solved with the help of the recursive Bellman equation:

$$v_{\pi}(x) = \mathbb{E}\left[\, r\big(x, \pi(x), x'\big) + \beta\, v_{\pi}(x') \,\right].$$

The Bellman equation represents the value $v_{\pi}(x)$ of being in a state $x$ and following the policy $\pi$.

For the optimal savings problem the Bellman equation represents the rewards/returns as

$$v_{\pi}(y) = \mathbb{E}\left[\, u\big(y - \pi(y)\big) + \beta\, v_{\pi}\big(f(\pi(y))\,\xi\big) \,\right].$$
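
This recursion is the standard one for this class of problems (not specific to mc-control): splitting the first-period utility off the infinite-horizon objective gives

$$v_{\pi}(y_t) = \mathbb{E}\left[\sum_{s=t}^{\infty} \beta^{s-t}\, u\big(y_s - \pi(y_s)\big)\right] = u\big(y_t - \pi(y_t)\big) + \beta\, \mathbb{E}\left[v_{\pi}(y_{t+1})\right],$$

with $y_{t+1} = f(\pi(y_t))\,\xi_{t+1}$ as above.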

C++ Implementation

Any model has to be derived from the base model struct (in mc-control/model.hpp):

/*! Abstract base struct for the models
*
*/
struct Model{

    /*! next_state = f(state, action) */
    virtual vec transition(const vec & state, const double & action) const = 0;

    /*! Samples the transition function n times */
    virtual mat sample_transitions(const double & action, size_t n) const = 0;

    /*! Reward from being in a state, taking an action and ending in next_state */
    virtual double reward (const vec & state_value, const double & action_value, const vec & next_state_value) const = 0;

    /*! Returns true if it is possible to take the action from this state */
    virtual bool constraint(const double & action, const vec & state) const{
        return true;
    };
};
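
For concreteness, here is a hedged sketch of what a derived model for the optimal savings problem could look like. The struct name, the parameter values and the assumption that Model, vec and mat are visible as in the snippet above are all illustrative; the library's actual example model is in examples/optgrowth.cpp.

#include <cmath>
#include <armadillo>
#include "mc-control/model.hpp"   // assumed include path for the base struct

using arma::vec;
using arma::mat;

// Hypothetical optimal savings model: state = income y, action = amount saved a,
// next income y' = f(a) * xi with f(a) = a^theta and a log-normal shock xi.
struct MySavingsModel : public Model {

    double theta = 0.5;              // production function exponent (illustrative)
    double mu = 0.0, sigma = 0.25;   // log-normal shock parameters (illustrative)

    vec transition(const vec & state, const double & action) const {
        // Draw one standard normal variate and form the log-normal shock
        double z  = arma::randn<vec>(1)(0);
        double xi = std::exp(mu + sigma * z);
        return vec({std::pow(action, theta) * xi});
    }

    mat sample_transitions(const double & action, size_t n) const {
        // Next income depends only on the amount saved, so a dummy state is used
        mat samples(n, 1);
        for (size_t i = 0; i < n; ++i){
            samples(i, 0) = transition(vec({0.0}), action)(0);
        }
        return samples;
    }

    double reward(const vec & state_value, const double & action_value, const vec & next_state_value) const {
        // Log utility of consumption c = y - a
        return std::log(state_value(0) - action_value);
    }

    bool constraint(const double & action, const vec & state) const {
        // Cannot save more than current income
        return action <= state(0);
    }
};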

Then one of the two episode generating functions has to be implemented:

// For soft policies
tuple<uvec,uvec,vec> episode_soft_pol(const DiscretizedOptimalGrowthModel & discrete_model,  const uvec & pol);

// For exploring starts
tuple<uvec,uvec,vec> episode_es(const DiscretizedOptimalGrowthModel & discrete_model,  const size_t & state,  const size_t & action, const  uvec & pol);

The episode generating functions return a three-tuple of all the actions, states and returns occurring during the episode.
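
A minimal usage sketch (the helper name is hypothetical, the tuple order of actions, states and returns follows the sentence above, and the construction of discrete_model is assumed to happen elsewhere, as in the example file):

#include <tuple>
#include <armadillo>

// Hypothetical helper: run one episode under a soft policy and return the
// discounted return recorded for the first visited state.
double first_return(const DiscretizedOptimalGrowthModel & discrete_model, const arma::uvec & pol){
    arma::uvec actions, states;
    arma::vec returns;
    std::tie(actions, states, returns) = episode_soft_pol(discrete_model, pol);
    // Assumes returns holds the return following each visited state, so the
    // first entry corresponds to the whole episode.
    return returns.n_elem > 0 ? returns(0) : 0.0;
}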

For a full example implementing the optimal savings model, see examples/optgrowth.cpp.

The two implemented algorithms

  1. Monte Carlo control with exploring starts (Figure 5.4 in Sutton & Barto)
    • For infinite horizon problems (like the optimal savings problem), this algorithm reduces to randomly sampling the state-action space.
  2. Monte Carlo control with a soft policy (epsilon greedy) (Figure 5.6 in Sutton & Barto)

Both algorithms are implemented in mc-control/algorithms.hpp.
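
For orientation, here is a rough, hedged sketch of how the exploring-starts variant fits together (loosely following Figure 5.4 in Sutton & Barto). The function name, the Q-table bookkeeping, the every-visit incremental averaging and the tuple order are illustrative assumptions, not the library's actual code:

#include <tuple>
#include <armadillo>

using arma::mat;
using arma::vec;
using arma::uvec;

// Hypothetical sketch of Monte Carlo control with exploring starts.
// episode_es and DiscretizedOptimalGrowthModel are assumed to be declared as above.
uvec mc_es_sketch(const DiscretizedOptimalGrowthModel & discrete_model,
                  size_t n_states, size_t n_actions, size_t n_episodes){

    mat Q(n_states, n_actions, arma::fill::zeros);       // action-value estimates
    mat counts(n_states, n_actions, arma::fill::zeros);  // visit counts
    uvec pol = arma::randi<uvec>(n_states, arma::distr_param(0, (int)n_actions - 1));

    for (size_t ep = 0; ep < n_episodes; ++ep){
        // Exploring start: a random state-action pair begins the episode
        size_t s0 = arma::randi<uvec>(1, arma::distr_param(0, (int)n_states  - 1))(0);
        size_t a0 = arma::randi<uvec>(1, arma::distr_param(0, (int)n_actions - 1))(0);

        uvec actions, states;
        vec returns;
        std::tie(actions, states, returns) = episode_es(discrete_model, s0, a0, pol);

        // Update the action-value estimates towards the observed returns and
        // make the policy greedy at the visited states
        for (size_t i = 0; i < states.n_elem; ++i){
            size_t s = states(i), a = actions(i);
            counts(s, a) += 1.0;
            Q(s, a) += (returns(i) - Q(s, a)) / counts(s, a);
            arma::rowvec q_s = Q.row(s);
            pol(s) = q_s.index_max();
        }
    }
    return pol;
}

The epsilon-soft variant differs mainly in how episodes are started and in keeping the improved policy epsilon-greedy instead of fully greedy.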

See Chapter 5 of Reinforcement Learning: An Introduction for details.

License

mc-control is made available under the terms of the GPLv3.

See the LICENSE file that accompanies this distribution for the full text of the license.
