Foreword

This is not a complete package but a starting point for future research on asynchronous SGD based on Caffe. Expect to put in a small amount of effort to make it work.

Compile

git clone --recursive git@github.com:raingo/caffe-mpi.git
Compile ./caffe in CPU-only mode
- This has to be done on a machine with the CUDA SDK installed
Install MPICH3
Refer to ./mpi_env.csh to set up the environment variables
make
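
Before running make, it can help to confirm that the required tools are actually on PATH once the environment from ./mpi_env.csh is in effect. The following is only a minimal sketch: missing_tools is a hypothetical helper, not part of this repository, and the tool names in the usage note assume that MPICH3 provides mpicxx and mpiexec.

```shell
# Hypothetical helper (not part of this repository): print every command
# from the argument list that cannot be found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}
```

For example, `missing_tools git make mpicxx mpiexec` prints nothing and returns 0 when all four commands are found; otherwise it prints one missing command per line.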

Test

Generate nodefile: you usually need to reserve nodes on your cluster first. Edit nodefile so that it lists the hostnames of the reserved machines. For example, on a Torque system, you can reserve machines with qsub, run cat $PBS_NODEFILE to list the reserved nodes, and copy that output into nodefile
./release.sh: packs everything into the distribute directory
- See ./add-deps.sh for an example of how to resolve dependencies on remote machines
./sync.sh: copies everything onto the cluster machines
./run_all_exp.sh: run on the master machine (qsub) to run the experiments
./wrap-evaluate.sh: run on a GPU machine to evaluate the trained model
- Because snapshots are taken very frequently, a special format is used to store multiple snapshots. See snapshot.proto for details.
./wrap-plot.sh: generates the plots in the paper
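
The nodefile step above can be sketched as a small shell function for Torque users. This is a sketch under assumptions: make_nodefile is a hypothetical helper that is not part of this repository, and whether the scripts here expect one line per host or one line per reserved core is not stated, so the deduplication is an assumption you may need to drop.

```shell
# Hypothetical helper (not part of this repository): write the hostnames
# listed in a Torque node file to an output file, one hostname per line.
#   $1: source node list (defaults to $PBS_NODEFILE)
#   $2: destination file (defaults to ./nodefile)
make_nodefile() {
    src="${1:-$PBS_NODEFILE}"
    dst="${2:-nodefile}"
    if [ -z "$src" ] || [ ! -r "$src" ]; then
        echo "make_nodefile: cannot read node list '$src'" >&2
        return 1
    fi
    # Torque typically repeats a hostname once per reserved core.
    # Assumption: one line per host is wanted, so keep each host once.
    sort -u "$src" > "$dst"
}
```

Inside a job started with qsub (for example an interactive one via `qsub -I`), calling `make_nodefile` with no arguments reads $PBS_NODEFILE and writes ./nodefile in the current directory.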

Source Files

./sgd-mpi.cpp: asynchronous SGD
./sgd.cpp: single-node SGD
./mpi.hpp: MPI primitives
./snapshot.proto: snapshot format for persistence
./flags.hpp: common flags used by ./sgd.cpp and ./sgd-mpi.cpp

Contact

Please write your comments on the issue tracker

Reference

Xiangru Lian, Yijun Huang, Yuncheng Li and Ji Liu. "Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization." Advances in Neural Information Processing Systems (NIPS) 2015. arxiv
