-
Notifications
You must be signed in to change notification settings - Fork 1
hantaniold/parallel-gzip
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
For the curious, gzip-src contains our initial modifications using OpenMP. The real meat is in pgzip-src, containing a parallel implementation of gzip using p-threads. The below describes the contents of the pgzip-src folder. To test out the suite, navigate to parallel-gzip/pgzip-src, and run ./test_suite.sh . This will get you all the binaries and test it on a bunch of files. The output times and othe rinfo will go to tso.txt. The test suite only does filesizes of 20, 100 and 250 MB, though if you wish you can also do them for 500 MB, 750, 1000, 2000 (or any size you desire) by commenting out/adding the desired lines in test_suite.sh . Last modified: 12/2/11. ---------------------------------------------------------------------------- This is a set of various implementations, both sequential and parallel of the gzip compression program. There are four folders: gzip, pigz, quickzip, and pgzip. The folder gzip is the standard implementation of gzip found at www.gzip.org. There is a script called init_script.sh which can be run and will download the file, build it, and copy the executable to the directory. The folder pigz is the standard implementation of parallel gzip found at www.zlib.net/pigz/. There is a script called init_script.sh which can be run and will download the file, build it, and copy the executable to the directory. These are the standard two implementations most people use. The next folder quickzip, implements pseudo gzip parallelism, by simply splitting a file into chunks (unix split command) and then compresses each in a background process, we could join them back by removing the last two bytes in each chunk and then cat the compressed files and add a crc byte and a length byte. Furthermore we would need to remove the end block codes of each seperate chunk except the last one to comply with gzip file format. However we do not join in this manner because we wanted a clean implemenation and did not want to modify end chunk codes, which would need to be done in the gzip src (see LINE 653 in deflate.c source from gzip). That is why we call it pseudo gzip, since it is technically not RFC Gzip, but does the same job. The next folder pgzip, implements true gzip parallelism. It follows a similar approach as pigz, but absolutely no source code is borrowed or taken as inspiration from that project. It is developed purely from the gzip source files, and proceeds to create a threadpool and then send chunks of data to free threads, and then takes finished work and flushes to a file appropriately. For the time being it only supports compression. ---------------------------------------------------------------------------- Build Details: gzip: simply go to directory gzip and run init_script.sh pigz: simply go to directory pigz and run init_script.sh pgzip: simply go to directory pgzip and run make quickzip: depends on having gzip and python, but otherwise you're set ---------------------------------------------------------------------------- Contributors: https://github.com/a-wild-tigger https://github.com/SeanHogan https://github.com/norseboar ----------------------------------------------------------------------------
About
parallelized gzip implementation using pthreads
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published