C++ program calculating the entropy of a language based on a corpus of texts.
The usage template is as follows: ./entropy [command] [store_filename] [additional options]
More precisely, there are three commands available:
assimilate
takes a text file as input and computes the conditional probability of appearance of each character/word(*), assuming the text follows an nth-order stationary Markov model. All data is then stored in a file. The exact usage is: ./entropy assimilate [store_filename] [input_filename] [markov_order]
calculate
computes the entropy of the Markov model (see explanation below for more details). Usage: ./entropy calculate [store_filename] [markov_order]
generate
is undoubtedly the fanciest feature. It randomly generates text based on the probabilities computed with the assimilate
command. Usage: ./entropy generate [store_filename] [output_filename] [text_length] [markov_order]
We assume the sequence of characters follows a stationary Markov model of order k. This means that, for all n,

P(X_n = x_n | X_{n-1} = x_{n-1}, ..., X_1 = x_1) = P(X_n = x_n | X_{n-1} = x_{n-1}, ..., X_{n-k} = x_{n-k}).

More precisely, this probability equals p(x_n | x_{n-1}, ..., x_{n-k}), with p a function independent of n (the Markov chain is assumed stationary).
I added a few Python scripts to show how the code can be used to calculate the entropy of, say, French.
-
First you should download the raw material using:
$ python Scripts/french_texts_dl.py
$ python Scripts/code_civil_dl.py
(You may want to run the second script inside the Text folder created by the first one, so that all the files end up in the same place.)
-
Then launch the program on all files (don't forget to adjust the Markov order in the Python file):
$ python Scripts/assimilate_folder.py
Of course you can change any parameter you want in the python files ;)