benihana/HadoopBNEM

erikreed@cmu.edu

Info

This code amends the libDAI library to provide:

  • A new parameter learning implementation called Age-Layered Expectation Maximization (ALEM), inspired by the ALPS genetic algorithm work of Greg Hornby of NASA Ames.
  • A distributed parameter learning implementation using MapReduce and a population structure, with either ALEM or a large random-restart equivalent that I call Multiple Expectation Maximization (MEM); a rough sketch of the restart idea follows this list.
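
The essence of MEM is a population of EM runs started from different random initializations, with the best-scoring run kept at the end; MapReduce is then used to spread those runs (and their per-record E-steps) across workers. The loop below is only a rough sketch of the restart idea under assumed names: the em binary, its flags, and its output format are illustrative and are not part of this repository's interface.

# Sketch of the Multiple EM (MEM) restart idea. The 'em' binary, its flags,
# and its output format are assumptions for illustration only.
best_ll=""
best_seed=""
for seed in $(seq 1 10); do
    # assume each run prints its final log-likelihood on its last output line
    ll=$(./em --net dat/bnets/asia.net --data dat/samples.dat --seed "$seed" | tail -n 1)
    if [ -z "$best_ll" ] || awk "BEGIN { exit !($ll > $best_ll) }"; then
        best_ll="$ll"
        best_seed="$seed"
    fi
done
echo "best seed: $best_seed (log-likelihood: $best_ll)"

Roughly speaking, ALEM organizes such a population into age layers: fresh restarts enter the youngest layer, and only runs whose likelihood keeps improving survive into the older layers.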

This code can be used to recreate the results shown in my paper:

Reed, E. and Mengshoel, O. “Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce”. In BigLearning Workshop of Neural Information Processing Systems (NIPS 2012).

Additionally, this work can be used (with more emphasis on high-performance computing) to recreate the algorithms described in the following two papers:

Mengshoel, O. J., Saluja, A., & Sundararajan, P. (2012). Age-Layered Expectation Maximization for Parameter Learning in Bayesian Networks. In Proc. of the Fifteenth International Conference on Artificial Intelligence and Statistics (pp. 984-992).

Basak, A., Brinster, I., Ma, X., & Mengshoel, O. J. (2012). Accelerating Bayesian network parameter learning using Hadoop and MapReduce. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BigMine '12) (pp. 101–108). Beijing, China: ACM.

How to test using non-Hadoop local MapReduce

# working directory: ./mapreduce
make
./main.sh
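
To sanity-check streaming-style map/reduce logic without any framework, the usual pattern is to pipe a data shard through the map and reduce executables with a sort in between. The executable and file names below are placeholders, not this repository's actual paths:

# Hand-rolled single map/reduce pass -- 'mapper', 'reducer', and the paths
# are placeholders, not the files this repository actually ships.
cat dat/in/part-0 | ./mapper | sort | ./reducer > out/iteration-1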

How to run using Hadoop locally

This is useful for testing purposes before deploying on Amazon EC2 or another Hadoop cluster.

  1. Install Hadoop (tested up to v1.1.1)

  2. Set the required environment variables, e.g. export HADOOP_PREFIX=/home/erik/hadoop/hadoop-1.1.1

    Also set up Hadoop for pseudo-distributed operation: http://hadoop.apache.org/docs/r1.1.0/single_node_setup.html

  3. Run the following (I have hadoop in my path):

# working directory: .../libdai/mapreduce
make clobber # WARNING: this clears any existing HDFS data
             # (stale HDFS data occasionally seems to cause
             # a bug when accessing the HDFS)
hadoop namenode -format
start-all
  4. Initialize a BN and some data
./scripts/init dat/bnets/asia.net 100 4
  5. Start Hadoop streaming (a sketch of the underlying streaming invocation follows this list)
make
./scripts/streaming dat/in
  6. Gander at results
ls out
  7. Stop Hadoop
stop-all
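
For orientation, ./scripts/streaming drives Hadoop streaming; a generic Hadoop 1.x streaming invocation looks roughly like the one below. The jar path, executable names, and HDFS paths are illustrative and not necessarily the exact arguments the script passes:

# Generic Hadoop 1.x streaming call. Jar path, mapper/reducer names, and
# input/output paths are illustrative, not scripts/streaming's exact arguments.
hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-1.1.1.jar \
    -input dat/in \
    -output out \
    -mapper ./mapper \
    -reducer ./reducer \
    -file mapper -file reducer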

How to run using Amazon EC2

  1. Build the code: make

  2. Launch an EC2 instance and send the mapreduce folder to it.

scp -rCi ~/Downloads/dai-mapreduce.pem mapreduce ec2-user@ec2-107-21-71-50.compute-1.amazonaws.com:
  3. SSH into the instance and launch a cluster.
ssh -Ci ~/Downloads/dai-mapreduce.pem ec2-user@ec2-107-21-71-50.compute-1.amazonaws.com

# Assumes hadoop-ec2 has been configured.
# Launch a cluster of 10 (small) instances
hadoop-ec2 launch-cluster dai-mapreduce 10
# ... wait a really long time for the cluster to initialize
  4. Push the mapreduce folder to the cluster master.
hadoop-ec2 push dai-mapreduce mapreduce
  5. Log in to the cluster master.
hadoop-ec2 login dai-mapreduce
  6. Initialize a big BN and start Hadoop streaming
./scripts/init dat/bnets/water.net 1000 10
# Do standard EM, pop-size=10, mappers=10
./scripts/streaming dat/in -u 10 10
# ... wait a few hours
  7. Collect data! (a retrieval sketch follows below)
ls out
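
If the results live on HDFS rather than on the master's local disk, the standard way to collect them is hadoop fs -get or -getmerge, followed by an scp back to your workstation; the hostname and file names below are placeholders. Also remember to shut the cluster down when finished (most versions of the hadoop-ec2 script provide a terminate-cluster command).

# Merge job output off HDFS onto the master's local disk (if needed),
# then copy it home. Hostname and file names are placeholders.
hadoop fs -getmerge out results.txt
exit   # back on your workstation:
scp -Ci ~/Downloads/dai-mapreduce.pem ec2-user@<master-hostname>:results.txt .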

About

Bayesian Network (BN) parameter learning using Expectation Maximization (EM) on Hadoop! And Age-Layered EM (ALEM) too.
