Skip to content

KoreanFoodComics/meta

 
 

Repository files navigation

MeTA: ModErn Text Analysis

Please visit our web page for information about MeTA!

Overview

MeTA is a modern C++ data sciences toolkit featuring

  • text tokenization, including deep semantic features like parse trees

  • inverted and forward indexes with compression and various caching strategies

  • various ranking functions for the indexes

  • topic modeling algorithms

  • language modeling algorithms

  • clustering and similarity algorithms

  • classification algorithms

  • wrappers for liblinear and slda

Doxygen documentation can be found here.

Our current goal for MeTA is to publish in JMLR's Machine Learning Open-Source Software.

Build Status (by branch)

  • master: Build Status
  • develop: Build Status

Project setup

  • This project requires a very well conforming C++11 compiler. Currently, clang is the de-facto compiler for use with this project

  • Additionally, you will need a conformant implementation of the C++11 standard library and ABI---currently libc++ and libc++abi are the best options for this. See your distribution's package manager for more information on installing these dependencies.

  • Windows users: YMMV. It is not currently supported, but things may work. You will likely need Visual Studio 2013 for the C++11 features.

  • This project makes use of several git submodules. To initialize these, run

git submodule init
git submodule update
  • Once the submodules are instantiated, go to deps/libsvm-modules and run make in the liblinear and libsvm directories if you plan on using the svm_wrapper class.

  • To compile initially, run the following commands

mkdir build
cd build
# omit CXX=clang++ if you want to use your default compiler
CXX=clang++ cmake ../ -DCMAKE_BUILD_TYPE=Debug
make
  • There are rules for clean, tidy, and doc. (Also, once you run the cmake command once, you should be able to just run make like usual as you're developing---it'll detect when the CMakeLists.txt file has changed and rebuild Makefiles if it needs to.)

About

A Modern C++ Data Sciences Toolkit

Resources

License

MIT, NCSA licenses found

Licenses found

MIT
LICENSE.mit
NCSA
LICENSE.ncsa

Stars

Watchers

Forks

Packages

No packages published