forked from ealdent/lda-ruby

A Ruby wrapper for Latent Dirichlet Allocation (LDA).


taf2/lda-ruby

 
 


Latent Dirichlet Allocation – Ruby Wrapper

What is LDA-Ruby?

This wrapper is based on C code by David M. Blei. In a nutshell, it can be used to automatically cluster documents into topics. The number of topics is chosen beforehand, and the topics found are usually fairly intuitive. Details of the implementation can be found in the paper by Blei, Ng, and Jordan.

The original C code relied on files for both input and output. We felt it was necessary to depart from that model and use Ruby objects for these steps instead. The only file required is the data file (in a format similar to that used by SVMlight). You may also need a vocabulary file to extract the words belonging to each topic.
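As a rough illustration of what such a data file might contain, the sketch below assumes the layout documented for Blei's lda-c: one document per line, beginning with the number of distinct terms and followed by `term_id:count` pairs with zero-based term ids. The helper names `build_vocabulary` and `encode_document` are hypothetical and are not part of lda-ruby; consult the rdocs for the exact format the wrapper expects.

```ruby
# Hypothetical helpers for illustration only; they are NOT part of lda-ruby.
# Assumed line layout (as in Blei's lda-c): "<distinct terms> <id>:<count> ..."

# Collect every distinct token, in order of first appearance.
def build_vocabulary(docs)
  docs.flatten.uniq
end

# Turn one tokenized document into a single data-file line.
def encode_document(tokens, vocab_index)
  counts = Hash.new(0)
  tokens.each { |token| counts[vocab_index.fetch(token)] += 1 }
  pairs = counts.sort.map { |id, n| "#{id}:#{n}" }
  ([counts.size] + pairs).join(" ")
end

docs = [
  %w[apple banana apple],
  %w[banana cherry]
]

vocab = build_vocabulary(docs)
vocab_index = vocab.each_with_index.to_h   # word => zero-based term id
lines = docs.map { |doc| encode_document(doc, vocab_index) }

puts lines
```

Writing `lines` to one file and `vocab` (one word per line) to another should give inputs of the kind passed to `Lda::Corpus.new` and `load_vocabulary` in the usage example, assuming the wrapper follows the lda-c layout.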

Example usage:

require 'lda'
lda = Lda::Lda.new      # create an Lda object for training
corpus = Lda::Corpus.new("data/data_file.dat")
lda.corpus = corpus
lda.em("random")        # run EM algorithm using random starting points
lda.load_vocabulary("data/vocab.txt")
lda.print_topics(20)    # print the top 20 words per topic

See the rdocs for further information. If you have questions about this project, check out its mailing list or mail lda-ruby@groups.google.com. For general questions about Latent Dirichlet Allocation, I urge you to use the topic models mailing list, since the people who monitor it are very knowledgeable.

Resources

References

Blei, David M., Ng, Andrew Y., and Jordan, Michael I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (Mar. 2003), 993-1022.
