R package data.table extends data.frame.
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by reference by group using no copies at all, list columns (cells can contain vectors), chained queries and a fast file reader (fread). It offers a natural and flexible syntax for faster development.
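As a small illustration of three of the features listed above (fread, chained queries and list columns), here is a minimal sketch. The column names `grp` and `x` and the inline data are made up for the example. It relies on fread() treating an input string that contains a newline as the data itself, so no file is needed.

```r
library(data.table)

# fread() treats an input string containing "\n" as the data itself,
# so this example does not need a file on disk (toy data, made up here)
DT <- fread("grp,x\na,1\na,2\nb,3\nb,4\nb,5")

# Chained query: aggregate by group, then filter the aggregated result
totals <- DT[, list(total = sum(x)), by = grp][total > 5]
print(totals)

# List column: each cell of 'xs' holds a whole integer vector
nested <- DT[, list(xs = list(x)), by = grp]
print(nested$xs[[2]])
```

Chaining simply applies `[...]` to the result of the previous `[...]`, so each step reads left to right without intermediate variables.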
The main benefit of data.table is its syntax: the ability to combine where, select|update and by into one query without having to string together a sequence of isolated function calls. In fact, speed is only secondary. data.table saves, in order of importance:
- Programming time (easier to write, read, debug and maintain).
- Compute time.
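A minimal sketch of what "one query" means, using a small made-up table with columns `a`, `b` and `c`: the row filter (where), the computation (select or update) and the grouping (by) all go into a single `DT[i, j, by]` call. The base R version is only one possible way of writing the same aggregation and is included for comparison.

```r
library(data.table)

# Toy data, made up for illustration
DT <- data.table(
  a = c("x", "x", "y", "y", "y"),
  b = c(1, 2, 3, 4, 5),
  c = c(10, 20, 30, 40, 50)
)

# where (i), select (j) and by in one call:
# keep rows with b > 1, compute sum(b*c) within each group of a
res <- DT[b > 1, list(s = sum(b * c)), by = a]
print(res)

# One way to do the same in base R: separate subset and aggregate steps
df <- as.data.frame(DT)
sub <- df[df$b > 1, ]
base_res <- tapply(sub$b * sub$c, sub$a, sum)
print(base_res)

# where (i), update (j) and by in one call: add column 'share' by reference,
# only for rows with b > 1 (the other rows get NA)
DT[b > 1, share := c / sum(c), by = a]
print(DT)
```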
* DT[X] # fast join for large data; DT (keyed) and X are data.tables
* DT[, sum(b*c), by=a] # fast aggregation; a, b and c are column names
* DT[i, b := 3.14] # add new column b (or modify existing column b) by reference, for the rows selected by i
* DT[, p := x/sum(x), by=grp] # fast sub-assignment to column p by reference, computed within each group of grp
* fread('big.csv') # 3+ times faster than read.csv(), even when colClasses, nrows, etc. are specified
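The one-liners above can be run end to end on toy data. This is a minimal sketch: the tables `DT` and `X` and their columns are made up for illustration, and DT is keyed with setkey() so that `DT[X]` knows which column to join on.

```r
library(data.table)

# Toy data, made up for illustration
DT <- data.table(grp = c("a", "a", "b"), x = c(1, 3, 4))
setkey(DT, grp)   # sort DT by grp by reference and mark grp as the join key

# Lookup table to join against (hypothetical example data)
X <- data.table(grp = c("a", "b"), label = c("first", "second"))

# DT[X]: for each row of X, find the matching rows of DT via DT's key
joined <- DT[X]
print(joined)

# Add a new column by reference, only for rows where x > 2 (others get NA)
DT[x > 2, b := 3.14]

# Per-group share, added by reference (DT is not copied)
DT[, p := x / sum(x), by = grp]
print(DT)
```

Because `:=` modifies DT in place, there is no `DT <- ...` reassignment; this is what "by reference" means in the list above.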
All even-numbered releases (e.g. 1.9.0, 1.9.2) are stable versions available on CRAN; all odd-numbered releases (e.g. 1.9.3) are development versions.
The current stable release is v1.9.2 on CRAN, released 27th Feb 2014. To install, open an R session and type:
install.packages("data.table")
The current development version is 1.9.3. If you're interested in staying up-to-date, you can do so by installing the latest commit using devtools as follows:
devtools::install_github("Rdatatable/datatable")
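To tell which kind of release you have, the even/odd convention can be checked mechanically. The sketch below is a hypothetical helper (`release_type` is not part of data.table), assuming the convention applies to the last component of the version number, as in the examples above. It uses only base R.

```r
# Hypothetical helper (not part of data.table): classify a version string
# by the parity of its last component (even = stable, odd = development)
release_type <- function(v) {
  parts <- unclass(package_version(v))[[1]]   # "1.9.3" -> c(1L, 9L, 3L)
  if (parts[length(parts)] %% 2 == 0) "stable (CRAN)" else "development"
}

release_type("1.9.2")
release_type("1.9.3")
# For the installed copy:
# release_type(as.character(packageVersion("data.table")))
```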
To be updated...
Stack Overflow's data.table tag is an excellent place to get started. You can search whether your question has already been answered and, if it hasn't, post it there with a good reproducible example. At the time of writing, 93.7% of the questions under the data.table tag have answers.
Another place to ask questions is the data.table mailing list. Posting requires a subscription, which is straightforward to set up. Once you've subscribed, you can post by sending an email to datatable-help @ lists.r-forge.r-project.org.
You can browse questions previously asked on the mailing list via Nabble, Gmane, the HTML archive or the RSS feed.