dvcs-autosync is a project to create an open source replacement for Dropbox/Wuala/Box.net/etc. based on distributed version control systems (DVCS). It offers nearly instantaneous mutual updates when a file is added or changes on one side but with the added benefit of (local, distributed) versioning and does not rely on a centralized service prov…

Dieterbe/dvcs-autosync

What does it do?
------------------------
Automatically keep DVCS repositories in sync whenever changes happen by automatically committing and pushing/pulling.

How does it do it?
------------------------
0. Set up desktop notifications (for these nice bubble-style popups when anything happens) and log into a Jabber/XMPP account specified in the config file.

1. Monitor a specific path (and its subdirectories) for changes with inotify.
At the moment, only one path is supported, so multiple script instances have to be run for multiple disjoint paths. This path is assumed to be (part of) a repository. Currently tested with git, but most DVCS should work (the config file lets you specify the DVCS commands called when interacting with it).
Optionally, an [ignores] file is read with one exclusion pattern per line, and files matching any of the patterns are ignored. This will typically be the .gitignore file already existing in the git tree.

2. When changes are detected, check them into the repository that is being monitored (or delete, or move, etc.).
It automatically ignores any patterns listed in .gitignore, and the config file lets you exclude other directories (e.g. repositories within the main repository).

3. Wait for a configurable time. When nothing else changes in between, commit.

4. Wait a few seconds longer (again configurable) and, if nothing else is committed, initiate a push.

5. After the push has finished, send an XMPP message to self (that is, to all clients logged in with the same account) to notify the other clients of the push.
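
Steps 3 and 4 amount to a two-stage quiet-period timer. The following is only a minimal sketch of that idea, not the code used by dvcs-autosync: the class name QuietPeriodScheduler, its methods and the delay values are made up for illustration, and time is passed in explicitly so the logic can be tried without waiting.

```python
class QuietPeriodScheduler:
    """Two-stage timer: commit after a quiet period, push after a further one."""

    def __init__(self, commit_delay, push_delay):
        self.commit_delay = commit_delay  # seconds without changes before committing
        self.push_delay = push_delay      # seconds without commits before pushing
        self.last_change = None           # time of the newest uncommitted change
        self.last_commit = None           # time of the newest unpushed commit

    def on_change(self, now):
        """Record a file system change; this restarts the commit timer."""
        self.last_change = now

    def tick(self, now):
        """Return the actions ('commit' or 'push') that are due at time `now`."""
        actions = []
        if self.last_change is not None and now - self.last_change >= self.commit_delay:
            actions.append("commit")
            self.last_change = None
            self.last_commit = now  # a new commit restarts the push timer
        elif self.last_commit is not None and now - self.last_commit >= self.push_delay:
            actions.append("push")
            self.last_commit = None
        return actions


# Hypothetical delays: commit after 2 quiet seconds, push 5 seconds after the commit.
scheduler = QuietPeriodScheduler(commit_delay=2, push_delay=5)
scheduler.on_change(now=0)
for now in (1, 2, 7):
    print(now, scheduler.tick(now))
```

With these values the loop reports nothing at t=1, a commit at t=2 and a push at t=7. A real daemon would call tick() from a periodic timer and run the configured DVCS commands instead of returning strings.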

Furthermore:

 * At any time in between, when receiving a proper XMPP message, pull from the repository.
 * A PID file is written to [pidfile] for killing the daemon later on.
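
The PID file can be used to stop the daemon from another script. The sketch below assumes that the file contains nothing but the decimal process ID, which is the usual convention but is not confirmed here; the helper names read_pid and stop_daemon are hypothetical.

```python
import errno
import os
import signal


def read_pid(pidfile):
    """Return the process ID stored in `pidfile` (assumed: a single integer)."""
    with open(os.path.expanduser(pidfile)) as handle:
        return int(handle.read().strip())


def stop_daemon(pidfile, sig=signal.SIGTERM):
    """Send `sig` to the process named in `pidfile`.

    Returns False if no such process exists (stale PID file), True otherwise.
    """
    pid = read_pid(pidfile)
    try:
        os.kill(pid, sig)
    except OSError as err:
        if err.errno == errno.ESRCH:  # no such process
            return False
        raise
    return True
```

Calling stop_daemon() with the path configured as [pidfile] would then ask the daemon to terminate; passing sig=0 only checks whether the process is still alive.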

Dependencies
-----------------------

 * Python >= 2.6
 * patched JabberBot (>= 0.9) (included in this repository)
   the patch allows reception of messages from its own XMPP id (patch already pushed upstream and will be included in next upstream JabberBot version)
 * xmpppy (http://xmpppy.sourceforge.net/)

Linux:

 * Linux kernel with inotify enabled
 * Pyinotify (better performance with version >= 0.9)

Mac OS X:

 * MacFSEvents (https://github.com/malthe/macfsevents/)
 * Python 2.7 (included in Lion)

Recommended:

 * Pynotify (for desktop notifications on linux)
 * Growl python binding (for desktop notifications on Mac OS X, included in this repository)

Installation
------------------------

(on Mac OS X, see INSTALL_MAC file for detailed instructions)

[PREFERRED] PACKAGE INSTALLATION
 * Either install the Debian package (generated by dpkg-buildpackage from the source tree) or use the Arch package
 * or (on other systems) simply execute (to install to /usr/local/bin and /usr/share/dvcs-autosync):
   1. python setup.py build
   2. sudo python setup.py install

MANUAL INSTALLATION
 * Copy dvcs-autosync to a location in $PATH and jabberbot.py to a location in $PYTHONPATH
 * (Quick and dirty: keep both in the same directory and run ./dvcs-autosync later)

Create the repository and do initial push
----------------------

 [on the server used to host the central git repository] 
 $ git init --bare autosync.git

 [on the first host using that repository]
 $ cd ~ && git clone <server>:autosync.git autosync
 $ cd autosync
 $ [ populate initial contents and add to index ]
 $ git commit -m 'Initial commit'
 $ git push origin master

 [on each additional host]
 $ git clone <server>:autosync.git autosync

Note that these are only examples. You can use arbitrary directories and repositories.

Configuration
-----------------------

 * Create an XMPP/Jabber account (for example on jabber.org, or set up your own server)
 * Copy the included .autosync-example config file to ~/.autosync (or wherever you want)
 * Change it to your needs

Running the program
-----------------------

    autosync.py [config file] # config defaults to ~/.autosync

Potential pitfalls
----------------------
 * for Jabber login, there probably needs to be a
  _xmpp-client._tcp.<domain name of jabber account> SRV entry in DNS so that
  the Python XMPP module can look up the server and port to use. Without such
  an SRV entry, Jabber login may fail even if the account details are correct
  and the server is reachable.

 * when errors like
  ERROR:pyinotify:add_watch: cannot watch ...
  appear on startup, the cause is either an invalid file or directory name
  that cannot be watched for changes, or the number of files a user may watch
  concurrently using the kernel inotify interface has reached the set limit.
  In the latter case, the limit can be changed by modifying the sysctl variable
  fs.inotify.max_user_watches and increasing it to a sufficient value
  (e.g. 500000).

 * Note that an automatically synchronized repository may grow quickly when
   it holds frequently changing media files (or other large binaries). In the
   current version, dvcs-autosync never deletes any history and keeps all
   previous versions. This is intentional for documents and text files, but may
   be problematic for large binaries. I will try to address this problem in
   future versions, e.g. by integrating with git-annex (see TODO).

Thoughts that should be considered at some point but have not yet been implemented:
------------------------
- The XMPP push message already contains a parameter, namely the repository the push went to. Add another parameter to specify the repository in which the change happened so that others can try to pull directly from there, in case it is quicker. The main use case for this optimization is my standard one: the laptop sitting next to the desktop and both of them syncing each other's home directories. Going via the main, hosted server is quite a bit more inefficient than pulling via a 1 Gbit/s LAN.

- Pulls and pushes can and should be optimized. At the moment, I take a conservative locking approach whenever a conflict may occur, and performance is reasonable on my main work tree with ca. 16 GB (cloned git repo), but not stellar. Specifically, actually implement the "optimized" pull lock strategy already described in the example config file.

- Implement another option for synchronization besides XMPP (idea: a simple broadcast reflector on a single TCP port that could even run on e.g. OpenWRT, or re-use whatever the Sparkleshare server does).

- Automatically adding some context to each commit message besides the automatic date/time would be useful for finding out why a change happened. Nepomuk anybody (just kidding, maybe, for now...)?

- Allow specifying commit messages via popups. When a popup is ignored, use the default commit message.


Disclaimer
------------------------
This is my first Python program that is longer than 100 lines. Please be easy on me with the patches, complaints and "what did you think, doing it this way?" messages. I have tried to comment wherever I found it necessary for my own understanding, but this is neither the best structured nor the most elegant program I ever wrote. Any hints for improving it are greatly welcome, and interoperability patches to work with Sparkleshare even more so. In the future, the two projects should definitely interoperate, which will come down to implementing each other's notification mechanism. My autosync Python script could then be used wherever headless operation might be required and/or Mono is not installed.
I have tested it between three systems and, in this version, it works reasonably well. However, there does seem to be the occasional kink when editors go crazy on temporary file creation, renaming, deleting originals, etc. These might be races, but I don't know for certain yet. Additional test cases are more than welcome. This script should be fairly safe to try, considering that the worst it will do is add a few hundred commits to your DVCS repo and push them to the configured default remote. But, after all, what is the point in using a DVCS if you can't roll back any changes made by you or a buggy script? (Yes, I did have to do that a number of times while developing the manual inotify event coalescing to cooperate better with git add/remove/mv actions.)


Rene Mayrhofer <rene@mayrhofer.eu.org>
