Babel_SGD

Data Analysis (ML) on Babel data set

Code applying several ML techniques to a phone-based speech recognition data set with 360 features.

Techniques shown include:

- SGD with two sets of beta parameters: one mapping the features to the 42 phones, the other mapping them to the 1000 states.
- An SGD tree that first maps to one of the 42 phones and then to one of the states associated with that phone. SGD was run independently for each of the 42 phones, and once at the top level to map from the features to the 42 phones. The resulting 42 + 1 sets of parameters are then combined to generate log likelihoods.
- Nearest neighbours on the PCA-reduced data. The locality-sensitive hashing (LSH) approach is fast, but it gives up a substantial amount of accuracy in exchange for that speed.
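The way the 42 + 1 parameter sets of the SGD tree combine can be sketched as follows. This is a minimal illustration, not the code in this repository: it assumes each parameter set defines a softmax (multinomial logistic) model, and the names `logSoftmax`, `treeLogLikelihoods`, `phoneBeta` and `stateBeta` are hypothetical. Under that assumption, the log likelihood of a state is the log probability of its phone plus the log probability of the state given that phone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // one row of beta weights per class

// Numerically stable log-softmax of the linear scores beta * x.
// Assumes beta has at least one row and every row has x.size() entries.
Vec logSoftmax(const Mat& beta, const Vec& x) {
    Vec scores(beta.size(), 0.0);
    for (std::size_t k = 0; k < beta.size(); ++k)
        for (std::size_t j = 0; j < x.size(); ++j)
            scores[k] += beta[k][j] * x[j];

    const double maxScore = *std::max_element(scores.begin(), scores.end());
    double sumExp = 0.0;
    for (double s : scores) sumExp += std::exp(s - maxScore);
    const double logZ = maxScore + std::log(sumExp);

    for (double& s : scores) s -= logZ;
    return scores;
}

// Combine the top-level phone model with the per-phone state models
// (assumed softmax models; hypothetical names).
// result[p][s] = log P(phone p | x) + log P(state s | phone p, x)
std::vector<Vec> treeLogLikelihoods(const Mat& phoneBeta,
                                    const std::vector<Mat>& stateBeta,
                                    const Vec& x) {
    assert(phoneBeta.size() == stateBeta.size());  // one state model per phone
    const Vec logPhone = logSoftmax(phoneBeta, x);

    std::vector<Vec> result(stateBeta.size());
    for (std::size_t p = 0; p < stateBeta.size(); ++p) {
        result[p] = logSoftmax(stateBeta[p], x);
        for (double& v : result[p]) v += logPhone[p];
    }
    return result;
}
```

With 42 rows in `phoneBeta` and 42 matrices in `stateBeta`, the returned values cover every state, and exponentiating and summing them over all phones and states gives 1, so the tree defines a proper distribution over states.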


adamdossa/Babel_SGD