# A study of lossy compression schemes

## By Tao Lu

Scientific simulations generate large volumes of floating-point data, which compress poorly with traditional data reduction methods such as deduplication and lossless compression. Emerging lossy floating-point compression is a promising way to meet the data reduction demands of HPC systems. However, lossy compression has not been widely adopted in HPC production systems. We believe one main reason is the lack of a comprehensive evaluation of the benefits and pitfalls of applying lossy compression to scientific data.

To speed the adoption of lossy compression on HPC production platforms, we conduct an extensive evaluation of state-of-the-art lossy compression schemes, including ZFP, ISABELA, and SZ, using real and representative HPC datasets. Our evaluation reveals the strong influence of compressor design on compression performance. We also examine the impact of high compression ratios on data analytics. Our evaluation gives domain scientists a clear understanding of what to expect from lossy compression. Moreover, we propose compressor-aware sampling methods and build a compression ratio estimation model to accurately estimate the compression ratio. Because compression consumes computational resources and not all datasets are highly compressible, the proposed estimation mechanisms can help domain scientists or data management middleware make “compress or not” decisions.
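
The idea of estimating a ratio from a sample before deciding whether to compress can be illustrated with a small POSIX `sh` sketch. This is not the paper's compressor-aware sampling or its estimation model: it assumes plain, evenly spaced byte-block sampling and uses `gzip` (one of the `run.sh` compressors) as a stand-in, so the printed value only approximates what a full-file run would give. The function names `estimate_ratio` and `should_compress` and any threshold you pass are hypothetical.

```sh
# Hypothetical sketch (not the paper's method): estimate a file's compression
# ratio from NBLOCKS evenly spaced blocks of BLOCKSIZE bytes, using gzip as a
# stand-in compressor. Prints sample_bytes / compressed_sample_bytes.
estimate_ratio() {
    file=$1 nblocks=$2 bs=$3
    total=$(wc -c < "$file" | tr -d ' ')
    nfull=$((total / bs))                    # whole blocks available in FILE
    [ "$nfull" -ge 1 ] || { echo "file smaller than one block" >&2; return 1; }
    [ "$nblocks" -ge 1 ] || nblocks=1
    [ "$nblocks" -le "$nfull" ] || nblocks=$nfull
    stride=$((nfull / nblocks))              # distance between sampled blocks
    sample=$(mktemp)
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        # copy one block, skipping i*stride blocks from the start of FILE
        dd if="$file" bs="$bs" skip=$((i * stride)) count=1 2>/dev/null >> "$sample"
        i=$((i + 1))
    done
    raw=$(wc -c < "$sample" | tr -d ' ')
    # read from stdin so gzip does not store the temp file name in its header
    comp=$(gzip -c < "$sample" | wc -c | tr -d ' ')
    rm -f "$sample"
    awk -v r="$raw" -v c="$comp" 'BEGIN { printf "%.2f\n", r / c }'
}

# should_compress RATIO THRESHOLD: succeed (exit 0) when RATIO >= THRESHOLD.
should_compress() {
    awk -v r="$1" -v t="$2" 'BEGIN { exit !(r >= t) }'
}
```

For example, `r=$(estimate_ratio fpc/testdouble_8_8_128.dat 4 4096)` followed by `should_compress "$r" 2 && echo compress` skips compression when the sampled ratio falls below the chosen threshold. Sampling cost grows with the number of blocks, not the file size, which is the point of estimating before compressing.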

## Run the code

```sh
./run.sh -c gzip -i testdouble_8_8_128.zfp

./run.sh -c fpc -i fpc/testdouble_8_8_128.dat -t file.fpc

./run.sh -c zfp -i fpc/testdouble_8_8_128.dat -t testdouble_8_8_128.zfp

./run.sh -c sz -i testdouble_8_8_128.dat

./run.sh -c isb -i fpc/testdouble_8_8_128.dat -t testdouble_8_8_128.isb
```
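
To compare compressors after runs like these, a small helper can compute the compression ratio from file sizes. This is a hypothetical sketch, not part of `run.sh`, and it assumes you know which file each run produced: the README does not document the `-t` option, so treating it as the compressed output path is an assumption.

```sh
# Hypothetical helper: print original_bytes / compressed_bytes for two files.
compression_ratio() {
    orig=$(wc -c < "$1" | tr -d ' ')
    comp=$(wc -c < "$2" | tr -d ' ')
    [ "$comp" -gt 0 ] || { echo "empty compressed file: $2" >&2; return 1; }
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.2f\n", o / c }'
}
```

For instance, `compression_ratio fpc/testdouble_8_8_128.dat testdouble_8_8_128.zfp` would report the ZFP ratio if that run wrote its output to `testdouble_8_8_128.zfp`.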

## Note

Please cite our IPDPS'18 paper if you use any material in this repository:

Tao Lu, Qing Liu, Xubin He, Huizhang Luo, Eric Suchyta, Norbert Podhorszki, Scott Klasky, Matthew Wolf and Tong Liu, Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data, IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2018.