shuffle

Shuffle is a small C++ library with Python bindings that shuffles the bytes of an array in a manner similar to the Shuffle filter of the HDF5 file format. This is particularly useful when compressing arrays of numerical values of similar magnitudes.

For example, in Python:

import gzip

import numpy
import shuffle

vec = numpy.random.rand(1024 * 1024)  # 1M float64 values = 8 MiB

# Compress the raw buffer and the shuffled buffer once each.
plain = len(gzip.compress(vec.data))
shuffled = len(gzip.compress(shuffle.shuffle_ndarray(vec).data))

print('Vector size:                  %d' % vec.nbytes)
print('Compressed size:              %d' % plain)
print('Compressed size with shuffle: %d' % shuffled)
print('Compressed difference:        %0.2f%% smaller' % (100 * (1 - shuffled / plain)))

gives results along the lines of:

Vector size:                  8388608
Compressed size:              7910645
Compressed size with shuffle: 7355671
Compressed difference:        7.02% smaller
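
To illustrate the idea behind an HDF5-style byte shuffle, here is a minimal pure-Python sketch. It is not this library's implementation, and the names `byte_shuffle` and `byte_unshuffle` are hypothetical. It assumes the input is a flat buffer of fixed-size elements and regroups it so that the first byte of every element comes first, then the second byte of every element, and so on. Whether the output matches `shuffle.shuffle_ndarray` byte for byte has not been verified.

```python
def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    """Group the k-th byte of every element together (illustrative sketch)."""
    if itemsize <= 0 or len(data) % itemsize != 0:
        raise ValueError("buffer length must be a multiple of itemsize")
    # data[k::itemsize] picks byte k of each element, in element order.
    return b"".join(data[k::itemsize] for k in range(itemsize))


def byte_unshuffle(shuffled: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle, restoring the original element layout."""
    if itemsize <= 0 or len(shuffled) % itemsize != 0:
        raise ValueError("buffer length must be a multiple of itemsize")
    n = len(shuffled) // itemsize  # number of elements
    out = bytearray(len(shuffled))
    for k in range(itemsize):
        # Plane k holds byte k of every element; scatter it back in place.
        out[k::itemsize] = shuffled[k * n:(k + 1) * n]
    return bytes(out)


# Two little-endian 16-bit values, 0x0102 and 0x0304.
raw = bytes([0x02, 0x01, 0x04, 0x03])
mixed = byte_shuffle(raw, 2)
restored = byte_unshuffle(mixed, 2)
```

With a NumPy array, the same sketch could be applied as `byte_shuffle(vec.tobytes(), vec.itemsize)`. Values of similar magnitude tend to share their sign and exponent bytes, so grouping those bytes produces longer repetitive runs for the compressor to exploit.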
