DrahmA/HLearn

# Homomorphic Learning

HLearn is a suite of libraries for interpreting machine learning models according to their algebraic structure. Every structure has associated algorithms useful for learning. When we show that a model is an instance of a particular structure, we get the associated algorithms "for free."

| Structure | What we get |
| --- | --- |
| Monoid | parallel batch training |
| Monoid | online training |
| Monoid | fast cross-validation |
| Abelian group | "untraining" of data points |
| Abelian group | more fast cross-validation |
| R-Module | weighted data points |
| Vector space | fractionally weighted data points |
| Functor | fast simple preprocessing of data |
| Monad | fast complex preprocessing of data |
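To make the first table rows concrete, here is a minimal self-contained sketch (hypothetical names; not HLearn's actual implementation) of a normal distribution represented by its sufficient statistics, which form a monoid:

```haskell
-- A toy normal-distribution model: count, sum, and sum of squares.
-- Merging two models just adds the statistics, so training is a
-- monoid homomorphism.
data NormalStats = NormalStats
  { nCount :: Double
  , nSum   :: Double
  , nSumSq :: Double
  } deriving (Show, Eq)

instance Semigroup NormalStats where
  NormalStats n1 s1 q1 <> NormalStats n2 s2 q2 =
    NormalStats (n1 + n2) (s1 + s2) (q1 + q2)

instance Monoid NormalStats where
  mempty = NormalStats 0 0 0

-- Train by folding each point into a singleton model and merging.
trainNormal :: [Double] -> NormalStats
trainNormal = foldMap (\x -> NormalStats 1 x (x * x))

meanOf :: NormalStats -> Double
meanOf d = nSum d / nCount d
```

Because `trainNormal (xs ++ ys) == trainNormal xs <> trainNormal ys`, chunks of data can be trained independently and merged in any grouping, which is exactly what makes parallel batch training and online training come "for free."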

This interpretation of machine learning is somewhat limiting in that not all models have obvious algebraic structure. But many important models do. Currently implemented models include:

- Univariate distributions: exponential, log-normal, normal, kernel density estimator, binomial, categorical, geometric, Poisson

- Multivariate distributions: normal, categorical, a subset of Markov networks

- Classifiers: naive Bayes, full Bayes, perceptron, bagging, boosting (sort of)

- Univariate regression: exponential, logistic, power law, polynomial

- NP-hard approximations: k-centers, bin packing, multiprocessor scheduling

- Other: Markov chains, many nearest-neighbor-based algorithms with cover trees

Note: these models are not included in the latest Hackage releases: decision stumps/trees and k-nearest neighbor (kd-tree based).

## Example: normal distribution

Every model in HLearn is trained from a data set using the function `train`. The type signature specifies which model we're training.

```haskell
let dataset = [1,2,3,4,5,6]
let dist = train dataset :: Normal Double
```

We can train in parallel using the higher-order function `parallel`. The GHC runtime automatically takes advantage of multiple cores on your computer; with 4 cores, training runs up to 4x faster.

```haskell
let dist' = parallel train dataset :: Normal Double
```
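The algebra that `parallel` exploits can be sketched generically (this is a hypothetical sequential sketch of the idea; HLearn's real version evaluates the chunks on separate cores):

```haskell
-- Split the data into chunks of size k, train each chunk as an
-- independent model, and merge the partial models with mconcat.
-- Associativity of the monoid guarantees the same result as
-- training the whole data set at once.
trainChunked :: Monoid m => ([a] -> m) -> Int -> [a] -> m
trainChunked trainChunk k = mconcat . map trainChunk . chunks
  where
    chunks [] = []
    chunks xs = let (h, t) = splitAt k xs in h : chunks t
```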

We can also train in online mode, adding data points to an already trained model with either `add1dp` or `addBatch`.

```haskell
let dist_online1 = add1dp dist 7
let dist_online2 = addBatch dist [7,8,9,10]
```
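Online updates are just monoid merges with a singleton model, and when the model forms an abelian group, each point also has an inverse, which is what makes "untraining" possible. A minimal sketch with hypothetical names (not HLearn's implementation):

```haskell
-- A toy model: running count and total.
data Stats = Stats { sCount :: Double, sTotal :: Double }
  deriving (Show, Eq)

instance Semigroup Stats where
  Stats a b <> Stats c d = Stats (a + c) (b + d)

instance Monoid Stats where
  mempty = Stats 0 0

-- Group inverse: negate the sufficient statistics.
inverse :: Stats -> Stats
inverse (Stats a b) = Stats (negate a) (negate b)

-- Adding a point merges a singleton model; removing a point
-- merges its inverse, undoing the addition exactly.
addPoint, subPoint :: Stats -> Double -> Stats
addPoint m x = m <> Stats 1 x
subPoint m x = m <> inverse (Stats 1 x)
```

Removing points this way is what enables the fast cross-validation in the table above: each fold is obtained by untraining a subset rather than retraining from scratch.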

Finally, once we've trained a model, we can do all the normal operations on it that we would expect. One common operation on distributions is evaluating the probability density function, which we do with the `pdf` function.

```haskell
pdf dist 10
```
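For a trained mean and variance, `pdf` evaluates the standard normal density. A self-contained sketch of that formula (HLearn derives the mean and variance from the model's internal statistics):

```haskell
-- Normal density: exp(-(x - mu)^2 / (2 var)) / sqrt(2 pi var)
pdfNormal :: Double -> Double -> Double -> Double
pdfNormal mu var x =
  exp (negate ((x - mu) ^ (2 :: Int)) / (2 * var)) / sqrt (2 * pi * var)
```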

For more details on why the Normal distribution has algebraic structure and what we can do with it, see the blog post *Normal distributions form a monoid and why machine learning experts should care*.

## More documentation

There are three main sources of documentation. First, there are a number of tutorials on my personal blog. These provide the most detail, are geared toward beginners, and are probably the easiest way to get started. Next, there are two papers about the internals of the HLearn library; they are a good resource for understanding the theory behind why the library works. Finally, there's the Hackage documentation.

Tutorial links:

Comparison to other libraries:

Paper links:

Hackage links:

## Future directions

HLearn is under active development. At present, it is primarily a research tool. This means that the interfaces may change significantly in the future (but will definitely follow the PVP). I'm hoping HLearn will eventually become a stable package that will make it easy to incorporate machine learning techniques into Haskell programs.

Current development is focused on two areas. The first is implementing new models and their algebraic structures. Many unimplemented models have "trivial" algebraic structure, but for many successful models it is unknown whether they admit any interesting structure. The second is investigating new structures. Many models have Functor/Applicative/Monoid structure (or, in some strict sense, almost have these structures), and I'm working out how to exploit them.

Any comments / questions / pull requests are greatly appreciated!
