Skip to content

mpetri/rlz-store

Repository files navigation

Install Instructions

  1. install libboost-all-dev and liblzma-dev on Ubuntu or equivalent
  2. git clone git@github.com:/mpetri/rlz-store.git
  3. cd rlz-store
  4. git submodule update --init --recursive
  5. mkdir build
  6. cd build
  7. cmake ..
  8. make -j

Creating a collection and index

Create a collection from a file english.200MB

./create-collection.x -c ../collections/english200 -i english.200MB

Output:

Processing english.200MB
[======================================================================] 100 %
Replaced zeros = 0
Replaced ones = 0
Copied 209715200 bytes.
Writing document order file.

Create the index using a 8 MiB dictionary and 5 threads

./rlz-create.x -c ../collections/english200 -s 8 -t 5

Output:

Parsing command line arguments
Parsing collection directory ../collections/english.200MB/
Found input text with size 200 MiB
Create dictionary (dict_uniform_sample_budget-1024)
	Create dictionary with budget 8 MiB
	Dictionary samples = 8192
	Sample steps = 25600
	Writing dictionary.
	dict_uniform_sample_budget-1024 Total time = 0.227 sec
Dictionary hash before pruning '728721042'
Prune dictionary with dict_prune_none
	No dictionary pruning performed.
Dictionary after pruning '728721042'
Create/Load dictionary index
	Construct and store dictionary index
	Store reverse dictionary
	Construct index
Factorize text - 200 MiB (5 threads) - (factorizor-65536-l0-factor_select_first-factor_coder_blocked-t=3-zlib-9-zlib-9-zlib-9)
   (1) FACTORS = 972902 BLOCKS = 160 F/B = 6080.64 Total F/B = 6080.64 SPEED = 2.00441 MiB/s (25%)
   (4) FACTORS = 970562 BLOCKS = 160 F/B = 6066.01 Total F/B = 6066.01 SPEED = 1.98216 MiB/s (25%)
   (0) FACTORS = 1016864 BLOCKS = 160 F/B = 6355.4 Total F/B = 6355.4 SPEED = 1.95274 MiB/s (25%)
   (3) FACTORS = 1040484 BLOCKS = 160 F/B = 6503.02 Total F/B = 6503.02 SPEED = 1.89251 MiB/s (25%)
   (2) FACTORS = 1074809 BLOCKS = 160 F/B = 6717.56 Total F/B = 6717.56 SPEED = 1.79921 MiB/s (25%)
   (0) FACTORS = 963648 BLOCKS = 320 F/B = 6022.8 Total F/B = 6189.1 SPEED = 2.07555 MiB/s (50%)
   (4) FACTORS = 989139 BLOCKS = 320 F/B = 6182.12 Total F/B = 6124.07 SPEED = 2.0012 MiB/s (50%)
   (1) FACTORS = 1019185 BLOCKS = 320 F/B = 6369.91 Total F/B = 6225.27 SPEED = 1.90476 MiB/s (50%)
   (2) FACTORS = 981985 BLOCKS = 320 F/B = 6137.41 Total F/B = 6427.48 SPEED = 2.0145 MiB/s (50%)
   (3) FACTORS = 1030735 BLOCKS = 320 F/B = 6442.09 Total F/B = 6472.56 SPEED = 1.87371 MiB/s (50%)
   (4) FACTORS = 987986 BLOCKS = 480 F/B = 6174.91 Total F/B = 6141.01 SPEED = 1.92382 MiB/s (75%)
   (0) FACTORS = 1045156 BLOCKS = 480 F/B = 6532.23 Total F/B = 6303.48 SPEED = 1.86846 MiB/s (75%)
   (1) FACTORS = 1045955 BLOCKS = 480 F/B = 6537.22 Total F/B = 6329.25 SPEED = 1.88324 MiB/s (75%)
   (2) FACTORS = 998092 BLOCKS = 480 F/B = 6238.07 Total F/B = 6364.35 SPEED = 1.94024 MiB/s (75%)
   (3) FACTORS = 1006081 BLOCKS = 480 F/B = 6288.01 Total F/B = 6411.04 SPEED = 1.94628 MiB/s (75%)
   (0) FACTORS = 1016034 BLOCKS = 640 F/B = 6350.21 Total F/B = 6315.16 SPEED = 1.95198 MiB/s (100%)
   (4) FACTORS = 1022341 BLOCKS = 640 F/B = 6389.63 Total F/B = 6203.17 SPEED = 1.90042 MiB/s (100%)
   (1) FACTORS = 1057448 BLOCKS = 640 F/B = 6609.05 Total F/B = 6399.2 SPEED = 1.90223 MiB/s (100%)
   (3) FACTORS = 1013531 BLOCKS = 640 F/B = 6334.57 Total F/B = 6391.92 SPEED = 1.90767 MiB/s (100%)
   (2) FACTORS = 1069202 BLOCKS = 640 F/B = 6682.51 Total F/B = 6443.89 SPEED = 1.87512 MiB/s (100%)
Factorization time = 21.013 sec
Factorization speed = 9.51792 MB/s
Factorize done. (factorizor-65536-l0-factor_select_first-factor_coder_blocked-t=3-zlib-9-zlib-9-zlib-9)
Output factorization statistics
=====================================================================
text size          = 209715200 bytes (200 MB)
encoding size      = 90267296 bytes (86.0856 MB)
compression ratio  = 43.0428 %
space savings      = 56.9572 %
number of factors  = 20322139
bits per factor    = 35.5346
number of blocks   = 3200
avg factors/block  = 6350.67
=====================================================================
Merge factorized text blocks
	Rename offset 0 block to output files
	Copy block 1 factors/offsets/counts
	Copy block 2 factors/offsets/counts
	Copy block 3 factors/offsets/counts
	Copy block 4 factors/offsets/counts
	Delete temporary files
Create block map (block_map_uncompressed)
	Load block offsets from file
RLZ construction complete. time = 22 sec
Loading RLZ store into memory
	Load block map
	Load dictionary
	Determine text size
RLZ store ready
Verify that factorization is correct.
SUCCESS! Text sucessfully recovered.

About

High performance Relative LZ store

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published