Tools and library for lexical processing of the Slovak language.
- Word and sentence boundary identification
- Number normalization
- Abbreviation identification
- Number rewriting to spoken form
- Fast string replacement
- Includes a forking TCP server based on libuv
Requires: CMake, a C++ compiler, the lemonstring library, Ragel (optional), libuv (optional)
- Install lemonstring
- mkdir bin
- cd bin
- cmake ../src
- make
- sudo make install
cat slovaktext.txt | slovaktokenizer > tokenizedtext.txt
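Because the tokenizer reads standard input and writes standard output, it can be wrapped in a loop to process several files. The following is a minimal sketch, not part of the project: the function name `tokenize_dir`, the `TOKENIZER` variable, and the `.tok.txt` output suffix are hypothetical names chosen for illustration. It assumes only the stdin/stdout behaviour shown in the command above.

```shell
#!/bin/sh
# Sketch: tokenize every .txt file in a directory.
# TOKENIZER is a hypothetical override; it defaults to the slovaktokenizer
# command from this README and is assumed to read stdin and write stdout.
TOKENIZER="${TOKENIZER:-slovaktokenizer}"

tokenize_dir() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$f" ] || continue
        name=$(basename "$f" .txt)
        # Same pipeline as the single-file command, using redirection.
        "$TOKENIZER" < "$f" > "$out_dir/$name.tok.txt" || return 1
    done
}
```

For example, `tokenize_dir corpus tokenized` would write `tokenized/NAME.tok.txt` for each `corpus/NAME.txt`; both directory names are placeholders.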
GPLv3. For other uses, contact the author at daniel.hladek@tuke.sk