lnangong/polybenchGpu
  • PolyBench/GPU 1.0: PolyBench benchmarks on the GPU using CUDA, OpenCL, HMPP, and OpenACC.

Copyright (c) 2012, 2013 University of Delaware. Contact: Scott Grauer-Gray (sgrauerg@gmail.com), William Killian (killian@udel.edu), John Cavazos (cavazos@udel.edu)

This benchmark suite is partially derived from the PolyBench benchmark suite developed by Louis-Noel Pouchet (pouchet@cse.ohio-state.edu), available at http://www.cse.ohio-state.edu/~pouchet/software/polybench/

If using this work, please cite the following paper: Scott Grauer-Gray, Lifan Xu, Robert Searles, Sudhee Ayalasomayajula, and John Cavazos.
Auto-tuning a High-Level Language Targeted to GPU Codes. Proceedings of Innovative Parallel Computing (InPar '12), 2012.

Paper available at http://www.eecis.udel.edu/~grauerg/


  • Available benchmarks:

Convolution: 2DCONV 3DCONV

Linear Algebra: 2MM 3MM ATAX BICG DOITGEN GEMM GESUMMV GRAMSCHMIDT LU MVT SYR2K SYRK

Datamining: CORRELATION COVARIANCE

Stencils: ADI FDTD-2D JACOBI-1D JACOBI-2D

The CUDA, OpenCL, and HMPP codes are based on PolyBench 2.0 (except convolution, which is not part of PolyBench 2.0). The OpenACC codes are based on PolyBench 3.2.


  • Instructions - to compile/run CUDA, OpenCL, and HMPP (OpenACC described separately below):

CUDA:

  1. Set up PATH and LD_LIBRARY_PATH environment variables to point to CUDA installation
  2. Run "make" in target folder(s) with codes to generate executable(s)
  3. Run the generated .exe file(s).
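The CUDA steps above can be sketched as a shell session. The paths and folder names are illustrative assumptions; adjust CUDA_HOME and the benchmark folder to match your installation and checkout.

```shell
# Point PATH and LD_LIBRARY_PATH at the CUDA installation (path is an assumption).
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

cd CUDA/2MM      # any target benchmark folder (illustrative name)
make             # builds the executable
./2mm.exe        # run the generated .exe file
```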

OpenCL:

  1. Set up PATH and LD_LIBRARY_PATH environment variables to point to OpenCL installation
  2. Set location of SDK in common.mk file in OpenCL folder
  3. Run "make" in target folder(s) to generate executable(s)
  4. Run the generated .exe file(s).
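A minimal sketch of the OpenCL steps. The variable name OPENCL_SDK_PATH and the SDK location are assumptions; open OpenCL/common.mk and edit whichever variable actually holds the SDK location in your copy.

```shell
# Set the SDK location in common.mk (variable name and path are assumptions).
sed -i 's|^\(OPENCL_SDK_PATH[[:space:]]*=\).*|\1 /opt/opencl-sdk|' OpenCL/common.mk

cd OpenCL/GEMM   # any target benchmark folder (illustrative name)
make
./gemm.exe
```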

HMPP:

  1. Change to bash shell if not currently in it.
  2. Set up PATH and LD_LIBRARY_PATH environment variables to point to CUDA/OpenCL installation
  3. Set up HMPP environment variables with source hmpp-env.sh command in {HMPP_INSTALLATION}/bin folder
  4. Run "make exe" in target folder(s) with codes to generate executable(s)
  5. Run generated .exe file(s). If there's an error when running the .exe file(s), try running them with the "make" or "make run" command in the folder.
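The HMPP steps above, sketched as a bash session. All paths are illustrative assumptions; HMPP_INSTALLATION stands for your HMPP Workbench install root.

```shell
# Assumes a bash shell; CUDA paths below are an assumption.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
source "$HMPP_INSTALLATION/bin/hmpp-env.sh"   # sets up HMPP environment variables

cd HMPP/2MM      # any target benchmark folder (illustrative name)
make exe
./2mm.exe        # if this errors, try "make" or "make run" instead
```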

  • Modifying Codes:

Parameters such as the input sizes, data type, and threshold for GPU-CPU output comparison can be modified via constants within the codes. After modifying, run "make clean" and then "make" on the relevant code for the changes to take effect in the resulting executable.

NOTES ABOUT PARAMETERS:

DATA_TYPE: By default, the DATA_TYPE used in these codes is float; it can be changed to double by changing the DATA_TYPE typedef to "double".

PERCENT_DIFF_ERROR_THRESHOLD: The percent difference (0.0-100.0) by which the GPU and CPU results are allowed to differ while still being considered "matching"; this parameter can be adjusted per benchmark in its source file.

======================================================================

OPENACC INFO:

** To compile OpenACC version using HMPP Workbench / CAPS Compiler:

  • Targeting CUDA:

$> hmpp --codelet-required --openacc-target=CUDA gcc -O2 -I./utilities -I./linear-algebra/kernels/gemm/gemm utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc

OR

$> capsmc --codelet-required --openacc-target=CUDA gcc -O2 -I./utilities -I./linear-algebra/kernels/gemm/gemm utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc

  • Targeting OpenCL:

$> hmpp --codelet-required --openacc-target=OPENCL gcc -O2 -I./utilities -I./linear-algebra/kernels/gemm/gemm utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc

OR

$> capsmc --codelet-required --openacc-target=OPENCL gcc -O2 -I./utilities -I./linear-algebra/kernels/gemm/gemm utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc

** To generate the reference output of a benchmark:

  • Pass the -DPOLYBENCH_DUMP_ARRAYS flag to the host compiler (e.g., gcc or icc) when compiling

$> ./gemm_acc 2>gemm_ref.out
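Putting the two steps together, a reference run might look like this. The compile line mirrors the CUDA-target example above; the comparison file name is illustrative.

```shell
# Compile with array dumping enabled, then capture the dump from stderr.
hmpp --codelet-required --openacc-target=CUDA gcc -O2 -DPOLYBENCH_DUMP_ARRAYS \
    -I./utilities -I./linear-algebra/kernels/gemm/gemm \
    utilities/polybench.c linear-algebra/kernels/gemm/gemm.c -o gemm_acc

./gemm_acc 2> gemm_ref.out       # live-out arrays are written to stderr
diff gemm_ref.out gemm_new.out   # compare a later run against the reference
```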


  • Some available options (OpenACC):

They are all passed as macro definitions at compile time (e.g., -Dname_of_the_option).

  • POLYBENCH_TIME: output execution time (gettimeofday) [default: off]

  • POLYBENCH_NO_FLUSH_CACHE: don't flush the cache before calling the timer [default: flush the cache]

  • POLYBENCH_LINUX_FIFO_SCHEDULER: use the Linux FIFO real-time scheduler for the kernel execution; the program must be run as root and compiled with -lc (Linux only) [default: off]

  • POLYBENCH_CACHE_SIZE_KB: cache size to flush, in kB [default: 33MB]

  • POLYBENCH_STACK_ARRAYS: use stack allocation instead of malloc [default: off]

  • POLYBENCH_DUMP_ARRAYS: dump all live-out arrays on stderr [default: off]

  • POLYBENCH_CYCLE_ACCURATE_TIMER: Use Time Stamp Counter to monitor the execution time of the kernel [default: off]

  • POLYBENCH_PAPI: turn on PAPI timing (see below) [default: off]

  • MINI_DATASET, SMALL_DATASET, STANDARD_DATASET, LARGE_DATASET, EXTRALARGE_DATASET: set the dataset size to be used [default: STANDARD_DATASET]

  • POLYBENCH_USE_C99_PROTO: Use standard C99 prototype for the functions.

  • POLYBENCH_USE_SCALAR_LB: Use scalar loop bounds instead of parametric ones.


  • PAPI support (OpenACC):

** To compile a benchmark with PAPI support:

$> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_PAPI -lpapi -o atax_papi

** To specify which counter(s) to monitor:

Edit utilities/papi_counters.list, adding one line per event to monitor. Each line (including the last one) must end with a ','; both native and standard events are supported.
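For instance, a two-counter list might look like this (PAPI_TOT_CYC and PAPI_L2_TCM are standard PAPI preset events; pick ones your hardware supports):

```
PAPI_TOT_CYC,
PAPI_L2_TCM,
```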

The whole kernel is run once per counter (no multiplexing), and no sampling is used for the counter values.


  • Accurate performance timing (OpenACC):

With kernels whose execution time is on the order of a few tens of milliseconds, it is critical to validate any performance number by repeating the experiment several times. A companion script is available to perform reasonable performance measurement of a PolyBench.

$> gcc -O3 -I utilities -I linear-algebra/kernels/atax utilities/polybench.c linear-algebra/kernels/atax/atax.c -DPOLYBENCH_TIME -o atax_time

$> ./utilities/time_benchmark.sh ./atax_time

This script runs the benchmark five times (it must be a PolyBench compiled with -DPOLYBENCH_TIME), eliminates the two extremal times, and checks that the deviation of the three remaining times does not exceed a given threshold, set to 5%.
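The script's acceptance criterion can be sketched in Python. This is an illustration of the procedure described above, not the script itself; the 5% threshold mirrors the text.

```python
def stable_time(times, max_dev=0.05):
    """Given five timings, drop the min and max, then require the
    remaining three to agree with their mean within max_dev (5%)."""
    assert len(times) == 5, "time_benchmark.sh runs the benchmark five times"
    kept = sorted(times)[1:-1]               # eliminate the two extremal times
    mean = sum(kept) / len(kept)
    if any(abs(t - mean) / mean > max_dev for t in kept):
        raise ValueError("timings deviate by more than 5%; rerun")
    return mean

# A stable run: the 0.250 outlier and the 0.098 minimum are discarded.
print(stable_time([0.101, 0.099, 0.100, 0.250, 0.098]))
```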

It is also possible to use POLYBENCH_CYCLE_ACCURATE_TIMER to use the Time Stamp Counter instead of gettimeofday() to monitor the number of elapsed cycles.


  • Generating macro-free benchmark suite (OpenACC):

(from the root of the archive:)

$> PARGS="-I utilities -DPOLYBENCH_TIME"
$> for i in $(cat utilities/benchmark_list); do ./utilities/create_cpped_version.sh "$i" "$PARGS"; done

This creates, for each benchmark file 'xxx.c', a new file 'xxx.preproc.c'. The PARGS variable in the example above can be set to the desired configuration, for instance to create a full C99 version (parametric arrays):

$> PARGS="-I utilities -DPOLYBENCH_USE_C99_PROTO"
$> for i in $(cat utilities/benchmark_list); do ./utilities/create_cpped_version.sh "$i" "$PARGS"; done

======================================================================


  • Contact Info:

Contacts: Scott Grauer-Gray sgrauerg@gmail.com Will Killian killian@udel.edu John Cavazos cavazos@udel.edu


  • Paper describing work:

Scott Grauer-Gray, Lifan Xu, Robert Searles, Sudhee Ayalasomayajula, and John Cavazos.
Auto-tuning a High-Level Language Targeted to GPU Codes. In Proceedings of Innovative Parallel Computing (InPar '12), 2012.

The codes are based on PolyBench codes that can be parallelized on the GPU; the original PolyBench codes are available at http://www.cse.ohio-state.edu/~pouchet/software/polybench/.

Acknowledgement: This work was funded in part by the U.S. National Science Foundation through the NSF Career award 0953667 and the Defense Advanced Research Projects Agency through the DARPA Computer Science Study Group (CSSG).
