The COOPY toolbox

Diffing, patching, merging, and revision control for spreadsheets and databases. Focused on keeping data in sync across different technologies (e.g. a MySQL table and an Excel spreadsheet).

For a stripped-down js version see http://paulfitz.github.com/coopyhx/
See BUILD.md for information on building the programs.
- Summary: CMake
See SERVE.txt for server-side information.
- Summary: fossil
See COPYING.txt for copyright and license information.
- Summary: GPL. Relicensing of library core planned for version 1.0.

Example uses

Enumerating differences between any pairwise combination of CSV files, database tables, or spreadsheets.
Applying changes to a database or spreadsheet without losing metadata (formatting of a spreadsheet, indexing/type information for a database). Particularly useful for applying changes made in an exported CSV file back to the original source.
Editing a MySQL/Sqlite database in gnumeric/openoffice/Excel/...
Distributed editing of a spreadsheet/database using a DVCS. Benefits: revision history, offline editing in tool of choice, self-hosting possible.

The main programs

ssdiff - generate diffs for spreadsheets and databases.
sspatch - apply patches to spreadsheets and databases.
ssmerge - merge tables with a common ancestor.
ssfossil - the fossil DVCS, modified to use tabular diffs rather than line-based diffs.
coopy - a graphical interface to ssfossil.
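
To make "merge tables with a common ancestor" concrete, here is a minimal Python sketch of a cell-level three-way merge. It is an illustration only, not the code used by ssmerge: the function names `merge_cell` and `merge_rows` are hypothetical, and it assumes the three rows have already been aligned and have the same columns.

```python
def merge_cell(ancestor, ours, theirs):
    """Three-way merge of one cell. Returns (value, is_conflict)."""
    if ours == theirs:
        return ours, False      # both sides agree (or neither changed)
    if ours == ancestor:
        return theirs, False    # only "theirs" changed the cell
    if theirs == ancestor:
        return ours, False      # only "ours" changed the cell
    return ours, True           # both changed it differently: conflict


def merge_rows(ancestor, ours, theirs):
    """Merge three aligned rows (lists of cells of equal length).

    Returns the merged row and the indices of conflicting columns.
    On conflict this sketch arbitrarily keeps the "ours" value.
    """
    merged, conflicts = [], []
    for col, (a, o, t) in enumerate(zip(ancestor, ours, theirs)):
        value, conflict = merge_cell(a, o, t)
        merged.append(value)
        if conflict:
            conflicts.append(col)
    return merged, conflicts


if __name__ == "__main__":
    base = ["1", "apple", "red"]
    mine = ["1", "apple", "green"]   # changed the colour
    yours = ["1", "pear", "red"]     # changed the fruit
    print(merge_rows(base, mine, yours))
```

Changes made on only one side are accepted automatically; a cell changed differently on both sides is reported as a conflict rather than silently resolved.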

Supported data formats

CSV (comma separated values)
SSV (semicolon separated values)
TSV (tab separated values)
Excel formats (via gnumeric's libspreadsheet)
Other spreadsheet formats (via gnumeric's libspreadsheet)
Sqlite
MySQL
Microsoft Access format (via mdbtools - READ ONLY)
A JSON representation of tables.
A custom "CSVS" format that is a minimal extension of CSV to handle multiple sheets in a single file, allow for unambiguous header rows, and have a clear representation of NULL.

Supported diff formats

TDIFF (format developed with Joe Panico of diffkit.org)
DTBL (csv-compatible format, COOPY specific, may be dropped)
SQL (Sqlite flavor)

Features

By default, when comparing tables, no initial assumption is made about schema similarity. Column names are not required to exist, or to be preserved between tables. The number and order of columns may also differ.
If schema changes are not expected, COOPY can be directed to use certain columns as a trusted identity for rows (a key).
Respects row order for table representations for which row order is meaningful (spreadsheets, csv).
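
When a key column can be trusted, row comparison becomes a simple lookup. The sketch below shows the idea in Python under stated assumptions: rows are dictionaries keyed by column name, the key value is unique per row, and `diff_by_key` is a hypothetical helper, not part of COOPY.

```python
def diff_by_key(old_rows, new_rows, key):
    """Compare two tables whose rows are dicts, using `key` as row identity.

    Returns (inserted, deleted, updated), where `updated` maps each key
    value to {column: (old_value, new_value)} for the cells that differ.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}

    inserted = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]

    updated = {}
    for k in old:
        if k in new and old[k] != new[k]:
            columns = set(old[k]) | set(new[k])
            updated[k] = {
                c: (old[k].get(c), new[k].get(c))
                for c in columns
                if old[k].get(c) != new[k].get(c)
            }
    return inserted, deleted, updated


if __name__ == "__main__":
    before = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
    after = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Carol"}]
    print(diff_by_key(before, after, "id"))
```

This only works if the key survives every edit. Without such a guarantee, rows have to be matched by content instead, which is the default behaviour described above.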

Algorithm

The core of the COOPY toolbox is a 3-way comparison between an ancestor and two descendants. First, rows are compared using bags of substrings drawn from across all columns. Once corresponding rows are known, columns are compared, again using bags of substrings. Row and column assignments are optimized and ordered using a Viterbi lattice. Once the pairwise relationships between each descendant and its ancestor are known, differences are computed, and a good merged ordering is determined (again using the Viterbi algorithm).
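
The row-matching step can be illustrated with a short Python sketch. It is a simplification, not COOPY's implementation: the substring length `n`, the `threshold`, the multiset Jaccard score, and the function names are all assumptions made for illustration, and a greedy best-match loop stands in for the Viterbi optimization.

```python
from collections import Counter


def substring_bag(row, n=3):
    """Bag (multiset) of length-n substrings drawn from every cell of a row."""
    bag = Counter()
    for cell in row:
        text = str(cell)
        if 0 < len(text) < n:
            bag[text] += 1              # keep short cells as a single token
        for i in range(len(text) - n + 1):
            bag[text[i:i + n]] += 1
    return bag


def bag_similarity(a, b):
    """Multiset Jaccard similarity between two bags, in the range [0, 1]."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0


def match_rows(ancestor, descendant, threshold=0.3):
    """Greedily map each descendant row index to an ancestor row index.

    Rows with no sufficiently similar, unused ancestor row map to None.
    """
    ancestor_bags = [substring_bag(row) for row in ancestor]
    used = set()
    matches = {}
    for j, row in enumerate(descendant):
        bag = substring_bag(row)
        best, best_score = None, threshold
        for i, ancestor_bag in enumerate(ancestor_bags):
            if i in used:
                continue
            score = bag_similarity(bag, ancestor_bag)
            if score > best_score:
                best, best_score = i, score
        if best is not None:
            used.add(best)
        matches[j] = best
    return matches


if __name__ == "__main__":
    ancestor = [["1", "Alice", "London"], ["2", "Bob", "Paris"]]
    descendant = [["2", "Bob", "Paris"], ["1", "Alice", "Londres"], ["3", "Carol", "Rome"]]
    print(match_rows(ancestor, descendant))
```

Because the bags are drawn from all cells regardless of column, the score is unaffected by reordered or renamed columns, which is why no schema assumptions are needed at this stage. The same scoring could be applied to columns once the rows are paired.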

Status

COOPY targets a stable, fully documented release at version 1.0. At the time of writing, the version number is just beyond 0.5, so the project is about halfway there.
