This is a prototype of a deduplication system named destor. Its features include:
- Container-based storage;
- Chunk-level pipeline;
- Fixed-size chunking and Content-Defined Chunking (CDC);
- A variety of fingerprint indexes, including DDFS, Extreme Binning, Sparse Index, SiLo, etc.;
- A variety of rewriting algorithms, including CFL, CBR, CAP, HAR, etc.;
- A variety of restore algorithms, including LRU, the optimal replacement algorithm, and rolling forward assembly.
The design of the fingerprint index:
a) Avoiding the Disk Bottleneck in the Data Domain Deduplication File System, @FAST'08.
b) Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality, @FAST'09.
c) Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup, @MASCOTS'09.
d) SiLo: A Similarity-Locality based Near-Exact Deduplication Scheme with Low RAM Overhead and High Throughput, @USENIX ATC'11.
e) Building a High-Performance Deduplication System, @USENIX ATC'11.
f) Block Locality Caching for Data Deduplication, @SYSTOR'13.
g) The design of a similarity based deduplication system, @SYSTOR'09.
The fragmentation problem:
a) Chunk Fragmentation Level: An Effective Indicator for Read Performance Degradation in Deduplication Storage, @HPCC'11.
b) Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets, @MASCOTS'12.
c) Reducing impact of data fragmentation caused by in-line deduplication, @SYSTOR'12.
d) Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication, @FAST'13.
e) Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information, @USENIX ATC'14.
The restore cache:
a) A Study of Replacement Algorithms for a Virtual-Storage Computer, @IBM Systems Journal'1966.
b) Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication, @FAST'13.
c) Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information, @USENIX ATC'14.
Garbage collection (available in the first release, Mouse; to be continued in the new version):
a) Building a High-Performance Deduplication System, @USENIX ATC'11.
b) Cumulus: Filesystem Backup to the Cloud, @FAST'09.
c) Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information, @USENIX ATC'14.
destor runs on 64-bit Linux. Dependencies:
- libssl-dev is required to calculate the SHA-1 digest;
- GLib 2.32 or a later version (the header files, including glib.h, must be in a searchable include path, such as /usr/local/include);
- The Makefile is automatically generated by GNU autoconf and automake.
If all dependencies are installed, compiling destor is straightforward:
./configure
make
make install
To uninstall destor, run
make uninstall
If compilation and installation succeed, the executable, destor, is installed to /usr/local/bin by default. Create a config file, destor.config, in the directory where you run destor; a sample destor.config is in the project directory. Run the rebuild script to clean data.
destor can run as follows:
- start a backup task:
        destor /path/to/data -p"a line as in config file"
- start a restore task:
        destor -r /path/to/restore -p"a line as in config file"
- start a delete job (disabled in the latest version):
        destor -d
- look up system statistics:
        destor -s
- print help:
        destor -h
- make a trace:
        destor -t /path/to/data
A sample configuration is shown in destor.config.
- If a running destor instance crashes, whether deliberately or unexpectedly, data consistency is not guaranteed; you should run the rebuild script before using destor again.
- Does NOT support concurrent backup/restore.
- If the working path in destor.config is modified, the rebuild script must be updated accordingly.
Min Fu,
Email : fumin@hust.edu.cn
Blog : fumin.hustbackup.cn