You will need Eigen (http://eigen.tuxfamily.org/) to compile this.
The CPU version is very slow, so I have included a GPU version written in BSGP, which is a wrapper for CUDA (more on BSGP here: http://houqiming.net/).
Demos will be available soon on my website...