Please visit our web page for information about MeTA!
MeTA is a modern C++ data sciences toolkit featuring:
- text tokenization, including deep semantic features like parse trees
- inverted and forward indexes with compression and various caching strategies
- various ranking functions for the indexes
- topic modeling algorithms
- language modeling algorithms
- clustering and similarity algorithms
- classification algorithms
- wrappers for liblinear and slda
Doxygen documentation can be found here.
Our current goal for MeTA is to publish in JMLR's Machine Learning Open-Source Software.
- This project requires a highly conformant C++11 compiler. Currently, clang is the de facto compiler for use with this project.
- Additionally, you will need a conformant implementation of the C++11 standard library and ABI; currently libc++ and libc++abi are the best options for this. See your distribution's package manager for more information on installing these dependencies.
- Windows users: YMMV. Windows is not currently supported, but things may work. You will likely need Visual Studio 2013 for the C++11 features.
- This project makes use of several git submodules. To initialize these, run:
git submodule init
git submodule update
- Once the submodules are initialized, go to deps/libsvm-modules and run make in the liblinear and libsvm directories if you plan on using the svm_wrapper class.
- To compile initially, run the following commands:
mkdir build
cd build
# omit CXX=clang++ if you want to use your default compiler
CXX=clang++ cmake ../ -DCMAKE_BUILD_TYPE=Debug
make
- The generated Makefile also has rules for clean, tidy, and doc. (Also, after you have run the cmake command once, you should be able to just run make as usual while developing; it will detect when the CMakeLists.txt file has changed and regenerate the Makefiles if it needs to.)
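
The choice between clang++ and your default compiler in the configure step above could be scripted. The following is a minimal sketch and not part of MeTA: pick_cxx is a hypothetical helper name, and the fallback assumes your system exposes its default C++ compiler as c++.

```shell
#!/bin/sh
# pick_cxx is a hypothetical helper (not part of MeTA). It prints clang++
# when that compiler is found on PATH; otherwise it prints c++, which is
# assumed here to be the name of the system's default C++ compiler.
pick_cxx() {
    if command -v clang++ >/dev/null 2>&1; then
        echo "clang++"
    else
        echo "c++"
    fi
}

# Report which compiler would be handed to cmake.
echo "Selected compiler: $(pick_cxx)"
```

From the build directory, the result could then be passed to the configure step as CXX=$(pick_cxx) cmake ../ -DCMAKE_BUILD_TYPE=Debug.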