
Logistic Regression on MPI

A utility to run logistic regression over MPI. It was built to run gradient descent quickly on high-dimensional data. The idea is to split the dimensions across MPI workers and process gradients in parallel. The implementation consists of two types of MPI processes:

  • Parameter Server: Responsible for distribution and collection of weights and data batches
  • Workers: Responsible for calculating gradient for a range of dimensions

There is a single parameter server and one or more workers (i.e. at least two MPI processes are needed to run). A minimal sketch of this layout is shown below.
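The following sketch shows how such a layout can be expressed with MPI. The rank assignment (rank 0 as the parameter server, all other ranks as workers) matches the description above, but the constant NUM_DIMS and the partitioning arithmetic are illustrative assumptions, not the repository's exact code.

/* Sketch of the process layout: rank 0 is the parameter server, every
 * other rank is a worker owning a contiguous slice of the dimensions.
 * NUM_DIMS and the partitioning below are illustrative assumptions.     */
#include <mpi.h>
#include <stdio.h>

#define NUM_DIMS 1000   /* hypothetical total number of features */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int num_workers = size - 1;        /* rank 0 is the parameter server */
    if (num_workers < 1) {             /* need at least two MPI processes */
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        /* Parameter server: would distribute weights and data batches here
         * and collect gradient slices from each worker.                   */
        printf("parameter server with %d workers\n", num_workers);
    } else {
        /* Worker: derive the range of dimensions this rank is responsible for. */
        int chunk = (NUM_DIMS + num_workers - 1) / num_workers;
        int begin = (rank - 1) * chunk;
        int end   = begin + chunk > NUM_DIMS ? NUM_DIMS : begin + chunk;
        printf("worker %d handles dimensions [%d, %d)\n", rank, begin, end);
    }

    MPI_Finalize();
    return 0;
}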

It uses 20% of the training data for validation. It supports two modes:

  • Synchronous: In this mode the parameter server does not begin calculating the next batch until all workers have returned their gradients for the current batch. This is mathematically sound.
  • Asynchronous: In this mode the parameter server sends the latest weights to a ready worker regardless of whether all workers have completed the previous batch.

Asynchronous mode is expected to have a shorter runtime, especially when run on a heterogeneous cluster, though it may lead to lower accuracy. The parameter-server side of the two modes is sketched below.
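The following sketch illustrates the difference on the parameter-server side only. The message tags, buffer layout, and the apply_slice helper are assumptions made for illustration; the repository's actual message protocol may differ.

/* Sketch of the parameter-server receive loop in each mode (not the
 * repository's exact code; TAG_GRAD, TAG_WEIGHTS and apply_slice are
 * assumptions made for illustration).                                   */
#include <mpi.h>

#define TAG_GRAD    1
#define TAG_WEIGHTS 2

/* Hypothetical helper: fold a worker's gradient slice into the weights. */
static void apply_slice(double *weights, const double *grad,
                        int offset, int len, double lr) {
    for (int i = 0; i < len; ++i)
        weights[offset + i] -= lr * grad[i];
}

void serve(double *weights, double *grad, int num_dims, int slice_len,
           int size, double lr, int synchronous) {
    MPI_Status st;
    if (synchronous) {
        /* Wait for every worker's gradient before moving to the next batch. */
        for (int w = 1; w < size; ++w) {
            MPI_Recv(grad, slice_len, MPI_DOUBLE, w, TAG_GRAD,
                     MPI_COMM_WORLD, &st);
            apply_slice(weights, grad, (w - 1) * slice_len, slice_len, lr);
        }
        for (int w = 1; w < size; ++w)
            MPI_Send(weights, num_dims, MPI_DOUBLE, w, TAG_WEIGHTS,
                     MPI_COMM_WORLD);
    } else {
        /* Answer whichever worker finishes first, even if others are
         * still working on the previous batch.                          */
        MPI_Recv(grad, slice_len, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_GRAD,
                 MPI_COMM_WORLD, &st);
        int w = st.MPI_SOURCE;
        apply_slice(weights, grad, (w - 1) * slice_len, slice_len, lr);
        MPI_Send(weights, num_dims, MPI_DOUBLE, w, TAG_WEIGHTS,
                 MPI_COMM_WORLD);
    }
}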

To build the executable, simply run make.

The executable takes the following command line parameters:

logistic_regression <training file> <delimiter> <learning rate> <regularization parameter> <sync/async> <data passes> [<batch size>]

Batch size is optional; when it is not provided, the entire data set is used in each pass (which is essentially batch gradient descent).
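For example, an illustrative run with four MPI processes (one parameter server and three workers), a comma-delimited training file named data.csv (hypothetical), learning rate 0.01, regularization parameter 0.1, synchronous mode, 10 data passes, and a batch size of 1000 could look like:

mpirun -np 4 ./logistic_regression data.csv , 0.01 0.1 sync 10 1000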

A sample PBS script is provided to run the code on a cluster.
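For reference, an illustrative PBS submission of this kind (not necessarily identical to the provided script) might look like:

#!/bin/bash
#PBS -N logistic_regression
#PBS -l nodes=4:ppn=1
#PBS -l walltime=00:30:00
cd $PBS_O_WORKDIR
mpirun -np 4 ./logistic_regression data.csv , 0.01 0.1 async 10 1000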
