Distributed OpenCL for Networked GPUs

distributedCL has been tested on OS X 10.9.3, as well as Ubuntu 12.04 LTS.

It has been tested using OpenCL v1.2, and OpenMPI v1.8.1.

OpenCL

The first requirement for running distributedCL is that OpenCL is available on every machine you wish to target. There are many guides online for setting up OpenCL; a good one for Linux is on Andreas Klöckner’s wiki at http://wiki.tiker.net/OpenCLHowTo.

OpenMPI

The second requirement is that every machine has OpenMPI installed. You’ll compile and run your program like any other MPI program. All the information needed to set up OpenMPI on your machines can be found at http://www.open-mpi.org/.

Samples

Some samples are included in the repository under src/samples.

To run them, first compile them with both OpenMPI and OpenCL, then run them as follows:

nonblocking_communication: This must be compiled with MPI and run with at least three processes. A sample command to run it locally is "mpirun -n 3 ./bin/nonblocking_communication". See the code for citations.
heat_simulation: Based on the code provided by Dr. D.B. Thomas for his High Performance Computing course. The step world function has been modified to support distributed computing. More information can be found in the makefile included in this sample.

Coding with distributedCL

After including “distributedCL.h”, OpenCL code can be almost directly transcribed to distributedCL code, with a few caveats.

A distributedCL program must start by constructing an instance of the distributedCL class, before any other distributedCL function is called.
A distributedCL program must always terminate with a call to distributedCL::Finalize.
Every distCL_event that has been passed into a function must have distributedCL::WaitForEvents called on it before distributedCL::Finalize.
All distributedCL functions return errors as a cl_int. OpenCL functions that ordinarily return some other object instead take a reference to that object as the first parameter.
In place of void pointers for data in read and write commands, you must pass a pointer to a data_barrier.
Functions that involve cross-process communication allow for two events (send and receive) rather than the standard one.
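
The error-return caveat is easiest to see next to plain OpenCL, where a call such as clCreateBuffer returns the created object and reports its error through a trailing pointer argument. The sketch below illustrates only the shape of the alternative convention described above, using stand-in names: status_t, MockBuffer, create_buffer and allocate_and_measure are hypothetical and are not part of OpenCL or distributedCL. It is an illustration of the calling pattern, not the library's actual signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these names come
// from OpenCL or distributedCL.
using status_t = int;                  // plays the role of cl_int
constexpr status_t kSuccess = 0;       // plays the role of CL_SUCCESS
constexpr status_t kInvalidSize = -1;  // an arbitrary error code

struct MockBuffer {
    std::vector<unsigned char> bytes;
};

// Out-parameter-first convention: the created object is written through the
// first parameter and the return value carries only the error code.
status_t create_buffer(MockBuffer& out, std::size_t size) {
    if (size == 0) {
        return kInvalidSize;  // 'out' is left untouched on failure
    }
    out.bytes.assign(size, 0);
    return kSuccess;
}

// Typical call site: check the returned status before using the object.
std::size_t allocate_and_measure(std::size_t size) {
    MockBuffer buf;
    status_t err = create_buffer(buf, size);
    assert(err == kSuccess || buf.bytes.empty());
    return err == kSuccess ? buf.bytes.size() : 0;
}
```

With this shape every call site reads the same way: the status comes back as the return value, and the object is usable only when that status indicates success.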

Other than that, any CL function needed is provided by the distributedCL class. For the most part, apart from some of the above mentioned cases, it’s enough to initialize distributedCL as dCL and replace the ‘cl’ at the beginning of any function call with ‘dCL.’ and add a final parameter ‘’. However, not all functions have currently been implemented and some will not work.
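
The renaming rule can be written down as a small string transformation. The helper below is a hypothetical illustration (to_dcl_call is not part of distributedCL) and assumes the instance is named dCL as in the paragraph above. It rewrites only the function name; it does not adjust the argument list, and it does not check whether distributedCL actually implements the function in question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, for illustration only: rewrites an OpenCL function
// name into the member-call spelling described above, assuming the
// distributedCL instance is named dCL. The argument list is not handled.
std::string to_dcl_call(const std::string& cl_name) {
    const std::string prefix = "cl";
    // Only rewrite names of the form "cl" + uppercase letter, e.g. clFinish.
    if (cl_name.size() > prefix.size() &&
        cl_name.compare(0, prefix.size(), prefix) == 0 &&
        cl_name[prefix.size()] >= 'A' && cl_name[prefix.size()] <= 'Z') {
        return "dCL." + cl_name.substr(prefix.size());
    }
    return cl_name;  // not an OpenCL-style name; leave it unchanged
}
```

For example, clWaitForEvents maps to dCL.WaitForEvents, which matches the WaitForEvents member named in the caveats above.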
