BigOptim – Large-Scale Finite-Sum Cost Function Optimization for R


Description

BigOptim is an R package that implements the Stochastic Average Gradient (SAG) [1] optimization method. For strongly convex problems, SAG achieves the convergence rate of batch gradient descent while keeping the per-iteration cost of stochastic gradient descent. This allows for efficient training of machine learning models with convex cost functions.
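
For intuition, the core SAG update can be sketched in a few lines of plain R. The sketch below only illustrates the update rule from [1] for l2-regularized logistic regression with labels coded as -1/1; it is not the package's optimized C implementation, and every name in it is made up for the example.

## Illustrative SAG update for l2-regularized logistic regression (y in {-1, 1}).
## Not the package's C code -- only the update rule from [1].
sag_sketch <- function(X, y, lambda, alpha, npasses = 10) {
  n <- nrow(X); p <- ncol(X)
  w <- numeric(p)                  ## parameter vector
  grad_table <- matrix(0, n, p)    ## last stored gradient of each sample
  d <- numeric(p)                  ## running sum of the stored gradients
  for (iter in seq_len(n * npasses)) {
    i <- sample.int(n, 1)
    xi <- X[i, ]; yi <- y[i]
    ## gradient of log(1 + exp(-yi * xi'w)) at the current w
    gi <- -yi * xi / (1 + exp(yi * sum(xi * w)))
    d <- d - grad_table[i, ] + gi  ## refresh the running sum of gradients
    grad_table[i, ] <- gi
    ## step along the average stored gradient plus the l2 penalty
    w <- w - alpha * (d / n + lambda * w)
  }
  w
}

In practice the package chooses the step size for you (constant, line search, or adaptive), as shown in the examples below.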

Setup

install.packages("devtools")
devtools::install_github("hadley/devtools")  ## Optional: development version of devtools
devtools::install_github("IshmaelBelghazi/bigoptim")

Example: Fit with Linesearch

## Loading Data set
data(covtype.libsvm)
## Normalizing Columns and adding intercept
X <- cbind(rep(1, NROW(covtype.libsvm$X)), scale(covtype.libsvm$X))
y <- covtype.libsvm$y
y[y == 2] <- -1
## Setting seed for reproducibility
set.seed(0)
## Setting up problem
maxiter <- NROW(X) * 10  ## 10 passes through the dataset
lambda <- 1/NROW(X) 
sag_ls_fit <- sag_fit(X=X, y=y, lambda=lambda,
                      maxiter=maxiter, 
                      tol=1e-04, 
                      family="binomial", 
                      fit_alg="linesearch",
                      standardize=FALSE)
## Getting weights
weights <- coef(sag_ls_fit)
## Getting cost
cost <- get_cost(sag_ls_fit)
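
The recovered weights can be used directly for prediction. A minimal sketch, assuming the intercept column added above and labels coded as -1/1 (any predict method the package may provide is not used here):

## In-sample predictions from the fitted weights
prob <- 1 / (1 + exp(-as.vector(X %*% weights)))  ## estimated P(y = 1 | x)
pred <- ifelse(prob > 0.5, 1, -1)
mean(pred == y)                                   ## training accuracy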

Example: Demo – Monitoring gradient norm

demo("monitoring_training")

Gradient L2 norm over the training passes on covtype (see misc/readme/grad_norm_covtype.png).
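
For a rough, self-contained approximation of what the demo shows, one can refit with an increasing number of passes and compute the exact gradient norm in plain R. This is only a sketch; the demo itself records the norm during a single run and is the reference.

## Gradient L2 norm of the regularized logistic loss at the fitted weights
grad_norm <- function(w, X, y, lambda) {
  margins <- y * as.vector(X %*% w)
  g <- -crossprod(X, y / (1 + exp(margins))) / NROW(X) + lambda * w
  sqrt(sum(g^2))
}
passes <- c(1, 2, 5, 10)
norms <- sapply(passes, function(k) {
  fit <- sag_fit(X = X, y = y, lambda = lambda, maxiter = NROW(X) * k,
                 tol = 1e-04, family = "binomial", fit_alg = "linesearch",
                 standardize = FALSE)
  grad_norm(coef(fit), X, y, lambda)
})
plot(passes, norms, type = "b", log = "y",
     xlab = "passes over the data", ylab = "gradient L2 norm")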

Runtime comparison

Run on an Intel i7-4710HQ with 16 GB RAM, using the Intel MKL and Intel compilers.

demo("run_times")

Dense dataset: Logistic regression on covertype

Logistic Regression on Covertype – 581012 sample points, 55 variables

                                          constant    linesearch  adaptive    glmnet
Cost at optimum                           0.513603    0.513497    0.513676    0.513693
Gradient L2 norm at optimum               0.001361    0.001120    0.007713    0.001806
Approximate gradient L2 norm at optimum   0.001794    0.000146    0.000214    NA
Time (seconds)                            1.930       2.392       8.057       8.749

Sparse dataset: Logistic regression on rcv1_train

Logistic Regression on RCV1_train – 20242 sample points, 47237 variables

                                          constant      linesearch    adaptive      glmnet
Cost at optimum                           0.046339      0.046339      0.046339      0.046342
Gradient L2 norm at optimum               3.892572e-07  4.858723e-07  6.668943e-10  7.592185e-06
Approximate gradient L2 norm at optimum   3.318267e-07  4.800463e-07  2.647663e-10  NA
Time (seconds)                            0.814         0.872        1.368         4.372
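
The figures above can be reproduced approximately with demo("run_times"), or with a direct comparison along the lines of the sketch below (assuming the fit_alg values match the column names above; absolute numbers depend on the machine and BLAS used):

## Rough timing of the three SAG step-size strategies on the covtype problem
algs <- c("constant", "linesearch", "adaptive")
times <- sapply(algs, function(alg) {
  system.time(
    sag_fit(X = X, y = y, lambda = lambda, maxiter = maxiter,
            tol = 1e-04, family = "binomial", fit_alg = alg,
            standardize = FALSE)
  )[["elapsed"]]
})
times  ## elapsed seconds per algorithm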

References

[1] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing Finite Sums with the Stochastic Average Gradient. arXiv:1309.2388 [cs, math, stat], September 2013.
