For the faster, latest version (supporting Modified Kneser-Ney and its extensions), please see:

Welcome to LM-SDSL

This is our implementation used in the following paper:

@inproceedings{shareghicompact,
  author={Shareghi, Ehsan and Petri, Matthias and Haffari, Gholamreza and Cohn, Trevor},
  title={Compact, Efficient and Unlimited Capacity: Language Modeling with Compressed Suffix Trees},
  booktitle={Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, September 19-21, 2015, Lisbon, Portugal},
  year={2015},
}

Compile instructions

Check out the repository
git submodule init
git submodule update
cd build
cmake ..

Usage

Create a collection:

./create-collection.x -i toyfile.txt -c ../collections/toy

Build index:

./build-index.x -c ../collections/toy/

Query index:

./query-index-knm.x -c ../collections/toy/ -p toyquery.txt -n 3

Single CST method

By default, queries use the Dual CST method. To use the Single CST method instead, which is faster and more space-efficient, pass -b:

./query-index-knm.x -c ../collections/toy/ -p toyquery.txt -n 3 -b
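
The three usage steps above (create a collection, build the index, query it) can be chained from a small driver script. The following is a minimal sketch, not part of LM-SDSL: the helper name `pipeline_commands` and its parameters are hypothetical, and it only assembles the command lines using the flags shown above (-i, -c, -p, -n, -b) without running them.

```python
from typing import List


def pipeline_commands(text_file: str, collection_dir: str, query_file: str,
                      n: int, single_cst: bool = False) -> List[List[str]]:
    """Assemble the create / build / query command lines shown in this README.

    Hypothetical helper: it only builds argument lists from flags that
    appear in the usage section and does not execute anything.
    """
    query = ["./query-index-knm.x", "-c", collection_dir,
             "-p", query_file, "-n", str(n)]
    if single_cst:
        query.append("-b")  # -b selects the Single CST method
    return [
        ["./create-collection.x", "-i", text_file, "-c", collection_dir],
        ["./build-index.x", "-c", collection_dir],
        query,
    ]


commands = pipeline_commands("toyfile.txt", "../collections/toy",
                             "toyquery.txt", 3, single_cst=True)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside the build directory, so that a failing step stops the pipeline before the next one starts.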

Running unit tests

rm -r ../collections/unittest/
./create-collection.x -i ../UnitTestData/data/training.data -c ../collections/unittest
./unit-test.x

Note: This version requires the training and test data to be numberized.
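
To illustrate what numberizing involves, the sketch below maps each token to an integer ID and rewrites every line as a sequence of IDs. It is an illustration under stated assumptions, not the preprocessing used by the authors: the exact input format expected by create-collection.x is not documented here, so the whitespace-separated layout, the starting ID, and the ID used for unseen test tokens are all assumptions, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def build_vocab(lines: Iterable[str], first_id: int = 2) -> Dict[str, int]:
    """Assign consecutive integer IDs to tokens in order of first appearance.

    first_id is an assumption: which IDs (if any) the tool reserves
    is not stated in this README.
    """
    vocab: Dict[str, int] = {}
    for line in lines:
        for token in line.split():
            if token not in vocab:
                vocab[token] = first_id + len(vocab)
    return vocab


def numberize(lines: Iterable[str], vocab: Dict[str, int],
              unk_id: int = 1) -> List[str]:
    """Rewrite each line as space-separated IDs; unseen tokens get unk_id (assumed)."""
    return [" ".join(str(vocab.get(tok, unk_id)) for tok in line.split())
            for line in lines]


train = ["the cat sat", "the dog sat"]
test = ["the bird sat"]

vocab = build_vocab(train)            # vocabulary comes from training data only
train_ids = numberize(train, vocab)
test_ids = numberize(test, vocab)     # "bird" is unseen, so it maps to unk_id

print(train_ids)
print(test_ids)
```

The vocabulary is built from the training data alone and reused for the test data, so the same token always receives the same ID in both files.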
