R package data.table extends data.frame. Fast aggregation of large data, fast ordered joins, fast add/modify/delete of columns by reference by group using no copies at all, cells can contain vectors, chained queries and a fast file reader (fread). Offers a natural and flexible syntax, for faster development.

kaybenleroll/datatable

 
 

data.table

R package data.table extends data.frame.

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by reference by group using no copies at all, cells can contain vectors, chained queries and a fast file reader (fread). Offers a natural and flexible syntax, for faster development.

The main benefit of data.table is its syntax: the ability to combine where, select|update and by into one query, without having to string together a sequence of isolated function calls. In fact, speed is only secondary.
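As a minimal sketch of that single-query form, the example below filters rows, computes a summary and groups the result in one call. The table `DT` and its columns `grp`, `x` and `y` are made up for illustration and do not come from the package documentation.

```r
library(data.table)

# Hypothetical example data: the column names grp, x and y are illustrative only.
DT <- data.table(grp = c("a", "a", "b", "b", "b"),
                 x   = c(1, 2, 3, 4, 5),
                 y   = c(10, 20, 30, 40, 50))

# where  (i):  keep only rows with x > 1
# select (j):  compute the sum of y
# by:          produce one result row per value of grp
res <- DT[x > 1, list(total_y = sum(y)), by = grp]
print(res)
```

In base R the same result would typically need a separate subsetting step followed by an aggregation call.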

data.table builds on base-R functionality to reduce two types of time:
  1. Programming time (easier to write, read, debug and maintain).
  2. Compute time.

Operations using data.table:

* DT[X]                          # fast join for large data, DT and X are data.tables.
* DT[, sum(b*c), by=a]           # fast aggregation, a, b and c are column names
* DT[i, b := 3.14]               # fast sub-assignment (to column b, in rows i) by reference.
* DT[, p := x/sum(x), by=grp]    # add new column p (or modify existing column) by reference, by group.
* fread('big.csv')               # is 3+ times faster than read.csv(), even with colClasses, nrows etc. set.
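The operations listed above can be tried on a small table. This is only a sketch under stated assumptions: the tables `DT` and `X`, their columns, and the temporary CSV file are hypothetical, and the data is far too small to show any speed difference.

```r
library(data.table)

# Hypothetical data; every table and column name here is illustrative.
DT <- data.table(a = c("u", "v", "u", "w"),
                 b = c(1, 2, 3, 4),
                 c = c(2, 2, 2, 2))
X  <- data.table(a = c("u", "w"), label = c("first", "second"))

# Join: key both tables on a, then DT[X] returns the rows of DT
# that match each row of X, with X's extra columns attached.
setkey(DT, a)
setkey(X, a)
joined <- DT[X]

# Aggregation: sum of b*c within each value of a (result column is named V1).
agg <- DT[, sum(b * c), by = a]

# Sub-assignment by reference: overwrite b only in rows where a == "v".
DT[a == "v", b := 3.14]

# New column by reference, by group: each row's share of its group's total of b.
DT[, p := b / sum(b), by = a]

# fread: write a small temporary CSV and read it back as a data.table.
tmp <- tempfile(fileext = ".csv")
write.csv(DT, tmp, row.names = FALSE)
DT2 <- fread(tmp)
```

Note that the two `:=` calls change `DT` in place, so no `DT <- ...` assignment is needed.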

Installation

All even-numbered releases (e.g. 1.9.0, 1.9.2) are stable versions available on CRAN. All odd-numbered releases are development versions.

Stable version

The current stable release is v1.9.2 on CRAN, released 27th Feb 2014. To install, open an R session and type:

install.packages("data.table")

Development version

The current development version is 1.9.3. To stay up to date, install the latest commit using devtools as follows:

devtools::install_github("datatable", "Rdatatable")

How to get started?

To be updated...

Getting help

Stack Overflow

Stack Overflow's data.table tag is an excellent place to get started. Search first to see whether your question has already been answered; if not, post it there with a small reproducible example. At the time of writing, 93.7% of the questions under the data.table tag have answers.

Mailing list

Another place to ask questions is the data.table mailing list. You need to subscribe first, which is fairly straightforward. Once subscribed, you can post by sending an email to datatable-help @ lists.r-forge.r-project.org.

Previous questions from the mailing list can be browsed on Nabble, on Gmane, in the HTML archive or through the RSS feed.
