Dali

An automatic differentiation library that uses reverse-mode differentiation (backpropagation) to differentiate recurrent neural networks, or most mathematical expressions, through control flow, while loops, and recursion.

This is a reimagining of Andrej Karpathy's recurrentJS (GitHub) in C++. It has similar API names, but the backbone uses MShadow and C++11's standard library.

@authors Jonathan Raiman and Szymon Sidor

Features

Automatic differentiation
Broadcasting between matrices and vectors
Speed (a language model trained using a 2-layer LSTM processes 25,000 words per second on an Nvidia GTX 780 Ti, vs. 15,000 words per second on Russell Stewart's NLP-Caffe)
Clarity of API
Lazy evaluation of matrix operations
Hybrid GPU-CPU computation, with best device for each operation selected at runtime
Visualize Neural Network output in real time

Why not use Theano?

Theano is a fantastic tensor and automatic differentiation library, with excellent packages for Deep Learning. Unfortunately, it cannot differentiate through control flow, and computation graphs with many nodes and recurrence require long compilation times (this may change somewhat with the arrival of John Schulman's Computation Graph Toolkit). Long compilation times can be alleviated by moving most operations out of scan loops; however, this strongly limits expressivity or complicates the code. Finally, because of the separation between the computation and the mathematical description, debugging can be hard.

(Note: Hypergrad offers gradient through control flow, but does not match the performance of Theano)

Why not use Torch?

Torch has excellent community support and a wide variety of packages for Deep Learning, including the popular NN and NN Graph packages, which permit automatic differentiation of Torch Tensors. However, use of these packages requires the definition of forward and backward passes, module / param cloning (see the Torch utilities inside Andrej Karpathy's Char-RNN code), pre-allocation of memory when performing recurrence, and the concatenation of all parameters when optimizing in Optim, the de facto solver for Torch. Additionally, transferring computation to the GPU demands pre-allocation of memory, which can be problematic in the case of memory-hungry tasks. Finally, running computations in parallel (Hogwild or otherwise) is tricky in Lua / Torch.

Usage

Run a super duper simple example

Create two 3x3 matrices filled with uniform random noise between -2 and 2:

Mat<float> A(3,3, weights<float>::uniform(-2.0, 2.0));
Mat<float> B(3,3, weights<float>::uniform(-2.0, 2.0));

Now let's multiply them:

auto C = A * B;

Now let's define an error as the sum of the squared entries of C:

auto error = (C ^ 2).sum();

And get the gradient of error with respect to A and B:

error.grad();
graph::backward();

auto A_gradient = A.dw();
auto B_gradient = B.dw();
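
As a sanity check on what A.dw() and B.dw() should hold: with C = A * B and error equal to the sum of the squared entries of C, the chain rule gives d(error)/dA = 2 * C * B^T and d(error)/dB = 2 * A^T * C. The sketch below computes these closed forms with plain std::array matrices rather than Dali; Mat3, matmul, error_of, grad_wrt_A and grad_wrt_B are hypothetical helper names used for illustration, not part of Dali's API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t N = 3;
using Mat3 = std::array<std::array<double, N>, N>;

// C = A * B (ordinary matrix product).
Mat3 matmul(const Mat3& A, const Mat3& B) {
    Mat3 C{};  // value-initialized to zeros
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t k = 0; k < N; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// error = sum of the squared entries of A * B.
double error_of(const Mat3& A, const Mat3& B) {
    double total = 0.0;
    for (const auto& row : matmul(A, B))
        for (double c : row) total += c * c;
    return total;
}

// Closed form: d(error)/dA = 2 * C * B^T.
Mat3 grad_wrt_A(const Mat3& A, const Mat3& B) {
    const Mat3 C = matmul(A, B);
    Mat3 G{};
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t j = 0; j < N; ++j)
                G[i][k] += 2.0 * C[i][j] * B[k][j];
    return G;
}

// Closed form: d(error)/dB = 2 * A^T * C.
Mat3 grad_wrt_B(const Mat3& A, const Mat3& B) {
    const Mat3 C = matmul(A, B);
    Mat3 G{};
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t i = 0; i < N; ++i)
                G[k][j] += 2.0 * A[i][k] * C[i][j];
    return G;
}
```

Comparing these results against a finite-difference estimate of error_of (nudge one entry by a small step and measure the change) is a quick way to convince yourself that the gradients returned by an automatic differentiation library are plausible.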

Behind the scenes:

Each matrix has a companion matrix called dw that holds the gradient of the objective with respect to each of its elements. When we multiply the matrices together we create a new output matrix called C, and we also add this operation to our computational graph (held by a thread-local variable, graph::tape). When we square C and call sum(), those operations are added to the graph as well.

Computing the gradient takes two steps. First, we tell our graph what the objective function is:

error.grad();

error needs to be a scalar (a 1x1 matrix in this implementation) to use grad(). Step 2 is to call graph::backward(), which goes through every operation executed so far in reverse order using graph::tape's record. As we run through the operations backward, we update the gradients of each intermediate object until the dw matrices of A and B get updated. Those are the gradients we were looking for.
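
To make the tape idea concrete, here is a minimal sketch in plain C++11 that does not use Dali itself. It works on scalars instead of matrices, and the names Scalar, tape, mul, add, grad and backward are hypothetical stand-ins chosen to echo the description above; Dali's real internals may well differ.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical scalar stand-in for a matrix: a value `w` and its gradient `dw`.
struct Scalar {
    double w;
    double dw;
    explicit Scalar(double value) : w(value), dw(0.0) {}
};
using ScalarPtr = std::shared_ptr<Scalar>;

// One tape per thread, mirroring the thread-local tape described above.
thread_local std::vector<std::function<void()>> tape;

ScalarPtr make_scalar(double value) { return std::make_shared<Scalar>(value); }

// Forward pass computes the product and records how to push gradients back.
ScalarPtr mul(ScalarPtr a, ScalarPtr b) {
    auto out = make_scalar(a->w * b->w);
    tape.push_back([a, b, out]() {
        a->dw += b->w * out->dw;
        b->dw += a->w * out->dw;
    });
    return out;
}

// Addition passes the incoming gradient through to both inputs unchanged.
ScalarPtr add(ScalarPtr a, ScalarPtr b) {
    auto out = make_scalar(a->w + b->w);
    tape.push_back([a, b, out]() {
        a->dw += out->dw;
        b->dw += out->dw;
    });
    return out;
}

// Step 1: mark the objective by seeding its gradient with 1.
void grad(ScalarPtr objective) { objective->dw = 1.0; }

// Step 2: replay the recorded operations in reverse order, then clear the tape.
void backward() {
    for (auto it = tape.rbegin(); it != tape.rend(); ++it) (*it)();
    tape.clear();
}
```

With a = 3 and b = 4, building e = add(mul(a, b), mul(a, a)), then calling grad(e) and backward(), leaves a->dw equal to b + 2a and b->dw equal to a. Because every operation appends to the tape as it executes, ordinary loops, branches and recursion in the calling code need no special handling.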

Run a simple (yet advanced) example

Let's run a simple example. We will use data from Paul Graham's blog to train a language model. This way we can generate random pieces of startup wisdom at will! After about 5-10 minutes of training time you should see it generate sentences that sort of make sense. To do this, go to the build folder and run:

examples/language_model --flagfile ../flags/language_model_simple.flags

A more extensive example for training a language model can be found under: examples/language_model.cpp.
For a more in-depth description of usage, see the character model tutorial.
For a funny example where you teach stacked LSTMs about multiplication, subtraction, and addition, check this out.

Installation

Get GFlags, HiRedis, Clang, and protobuf, then head to the build folder and use cmake to configure and create the appropriate Makefiles.

You need the latest version of Clang (>= 3.6.0).

1. Dependency Installation

1.a on Mac OSX

brew install cmake
brew install gflags
HOMEBREW_CC=clang HOMEBREW_CXX=clang++ brew install protobuf
brew install libev
HOMEBREW_CC=clang HOMEBREW_CXX=clang++ brew install hiredis
cmake ..

1.b on Fedora Linux

yum install make cmake
yum install blas blas-devel
yum install openblas openblas-devel
yum install clang
yum install gflags gflags-devel
yum install sqlite-devel
yum install protobuf protobuf-devel protobuf-compiler
yum install libev libev-devel
yum install hiredis hiredis-devel

If cblas.h is not found during compilation, installing the Atlas SSE development package fixes the problem:

yum install atlas-sse2-devel

2. Compilation

Then use cmake to create the make targets, and run make to compile the code:

With CUDA (if available):

git submodule init
git submodule update
cd build
cmake ..
make -j 9

Without CUDA:

git submodule init
git submodule update
cd build_cpu
cmake .. -DWITH_CUDA=false
make -j 9

That's it. Now built examples will be stored in build/examples. For instance a character prediction model using Stacked LSTMs is built under build/examples/character_prediction.

Tests

To compile and run tests you need Google Test. Download it here.

1. Compile and run tests

From the build (or build_cpu) folder do the following:

cmake ..
make -j 9 run_tests

2.a Install Gtest on Mac OSX

Homebrew does not offer a way of installing gtest; however, in a few steps you can get it running:

wget https://googletest.googlecode.com/files/gtest-1.7.0.zip
# Unpack the archive before entering it
unzip gtest-1.7.0.zip
cd gtest-1.7.0
mkdir mybuild
cd mybuild
cmake ..
make -j 9
cp libgtest_main.a /usr/local/lib/libgtest_main.a
cp libgtest.a /usr/local/lib/libgtest.a
cp -R ../include/* /usr/local/include/
cd ../..
rm -rf gtest-1.7.0

2.b Install Gtest on Fedora Linux

Using yum it's a piece of cake:

sudo yum install gtest gtest-devel

Latest Clang compiler on Mac OSX

Until Apple decides to fully embrace the thread_local abstraction, we are sadly forced to update our compilers manually (and no, replacing it with __thread is not enough...). Here are the steps for updating your compiler:

# Go to http://llvm.org/releases/download.html
# Download "Clang for OSX" (tarball). Use version
# 3.6.0 or above
# Unpack .tar.xz (which will by default be in ~/Downloads)
tar xf CLANG.tar.xz
# Then cd into clang and copy to /usr/local:
cd CLANG
cp -R ./* /usr/local/

Utils

In the utilities namespace you will find several tools to make data processing and saving easier.

To create folders similar to how os.makedirs works in Python, you can do:

utils::makedirs("folder/subfolder/");

Generate a random integer between 0 and 2 (inclusive):

utils::randint(0, 2);
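
For reference, the inclusive-bounds behaviour can be reproduced with the standard library alone. The sketch below is an assumption about how such a helper could be written, not Dali's actual implementation; randint_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for utils::randint: uniform integer in [lower, upper].
int randint_sketch(int lower, int upper) {
    assert(lower <= upper);
    // One generator per thread, seeded once from the system entropy source.
    static thread_local std::mt19937 generator(std::random_device{}());
    // std::uniform_int_distribution includes both endpoints.
    std::uniform_int_distribution<int> distribution(lower, upper);
    return distribution(generator);
}
```

Calling randint_sketch(0, 2) can return 0, 1 or 2, matching the "inclusive" wording above.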

Check whether a file is gzipped:

utils::is_gzip("folder/suspicious.gz");
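
A common way to perform this kind of check is to look at the first two bytes of the file, which are 0x1f 0x8b for gzip data. Whether Dali does exactly this is an assumption; the sketch below uses the hypothetical name looks_gzipped and only the standard library.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical stand-in for utils::is_gzip: true if the file starts with
// the gzip magic bytes 0x1f 0x8b. Missing or too-short files yield false.
bool looks_gzipped(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    unsigned char magic[2] = {0, 0};
    file.read(reinterpret_cast<char*>(magic), 2);
    return file.gcount() == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

Checking the content rather than the file extension means a mislabeled file such as a plain-text "data.gz" is reported as not gzipped.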

Get the indices that would sort a list, np.argsort style:

auto sorted_lengths = utils::argsort(lengths);
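
To illustrate what an argsort returns, here is a standard-library sketch. It assumes ascending order (as np.argsort does by default) and uses a stable sort so ties keep their original order; argsort_sketch is a hypothetical name, and Dali's own implementation may differ in these details.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for utils::argsort: returns the indices that would
// sort `values` in ascending order, leaving `values` itself untouched.
template <typename T>
std::vector<std::size_t> argsort_sketch(const std::vector<T>& values) {
    std::vector<std::size_t> indices(values.size());
    std::iota(indices.begin(), indices.end(), 0);  // 0, 1, 2, ...
    std::stable_sort(indices.begin(), indices.end(),
                     [&values](std::size_t a, std::size_t b) {
                         return values[a] < values[b];
                     });
    return indices;
}
```

For lengths = {30, 10, 20} the result is {1, 2, 0}: the smallest element sits at index 1, the next at index 2, and the largest at index 0. This is handy for grouping sequences of similar length into the same minibatch.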

Future steps

Add ImageNet, Caffe loading, and broader ConvNet support (currently have conv2d and conv1d, but no pooling)
Web interface for managing experiments (today Dali-visualizer only shows progress and sample predictions)
Web interface for visualizing network activity
Add some mathematical expressions from DeepMind's Torch Cephes module
Distribute training over multiple machines
Ensure feature parity with Python extension
Implement multi-GPU support with Fast Asynchronous Parallel SGD
Make it brew, yum/dnf, and apt-get installable

Additional Notes

Debugging Assertion Failures

You can use gdb to debug assertion failures in Dali. The majority of the assertions in Dali use utils::assert2 instead of the usual assert macro to provide more informative error messages. It is easy to catch and trace these errors using gdb:

gdb --args example/dali_code.o arg1 arg2
...
catch throw
run
...
backtrace

A stack trace for the assertion error should now appear.
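
The reason catch throw works here is that a failing assertion surfaces as a C++ exception rather than an abort. The sketch below shows one way such a helper could look; the name assert2_sketch, its signature and the choice of std::runtime_error are assumptions for illustration, not a copy of Dali's utils::assert2.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for utils::assert2: throws with a descriptive
// message instead of aborting, so callers (and gdb's `catch throw`)
// can intercept the failure.
void assert2_sketch(bool condition, const std::string& message) {
    if (!condition) {
        throw std::runtime_error(message);
    }
}
```

A call such as assert2_sketch(rows == cols, "matrix must be square") (with rows and cols being whatever values you are checking) would stop gdb at the throw site when the condition is false, and backtrace then shows the call chain that led there.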
