Skip to content

A Clang-based compiler for compiling Kokkos code (with no syntactical differences) with the aim of generating optimized code for parallel targets such a multithreaded and GPU (NVIDIA/CUDA) and preserving domain awareness.

License

lanl/kokkos-clang

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Copyright (c) 2016, Los Alamos National Security, LLC All rights
reserved. Copyright 2016. Los Alamos National Security, LLC. This
software was produced under U.S. Government contract DE-AC52-06NA25396
for Los Alamos National Laboratory (LANL), which is operated by Los
Alamos National Security, LLC for the U.S. Department of Energy. The
U.S. Government has rights to use, reproduce, and distribute this
software.  NEITHER THE GOVERNMENT NOR LOS ALAMOS NATIONAL SECURITY,
LLC MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LIABILITY
FOR THE USE OF THIS SOFTWARE.  If software is modified to produce
derivative works, such modified software should be clearly marked, so
as not to confuse it with the version available from LANL.
 
Additionally, redistribution and use in source and binary forms, with
or without modification, are permitted provided that the following
conditions are met: 

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of Los Alamos National Security, LLC, Los Alamos
National Laboratory, LANL, the U.S. Government, nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
 
THIS SOFTWARE IS PROVIDED BY LOS ALAMOS NATIONAL SECURITY, LLC AND
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL LOS
ALAMOS NATIONAL SECURITY, LLC OR CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kokkos Clang was developed as part of the IDEAS (Interoperable Design of
Extreme-scale Application Software) project funded by the DOE Office of Science.

This code is unclassified and has been assigned LA-CC-16-054.

====== Kokkos GPU Compiler A.K.A Kokkos Clang

The Kokkos Clang compiler is a version of the Clang C++ compiler that
has been modified to perform targeted code generation for Kokkos
constructs in the goal of generating highly optimized code and to
provide semantic (domain) awareness throughout the compilation
toolchain of these constructs such as parallel for and parallel
reduce. This approach is taken to explore the possibilities of
exposing the developer’s intentions to the underlying compiler
infrastructure (e.g. optimization and analysis passes within the
middle stages of the compiler) instead of relying solely on the
restricted capabilities of C++ template metaprogramming. To date our
current activities have focused on correct GPU code generation and
thus we have not yet focused on improving overall performance.  The
compiler is implemented by recognizing specific (syntactic) Kokkos
constructs in order to bypass normal template expansion mechanisms and
instead use the semantic knowledge of Kokkos to directly generate code
in the compiler’s intermediate representation (IR); which is then
translated into an NVIDIA-centric GPU program and supporting runtime
calls. In addition, by capturing and maintaining the higher-level
semantics of Kokkos directly within the lower levels of the compiler
has the potential for significantly improving the ability of the
compiler to communicate with the developer in the terms of their
original programming model/semantics.

Developed by:

Nick Moss (nickm@lanl.gov) and Pat McCormick (pat@lanl.gov)

====== Project Description / Status

-A Clang-based compiler for compiling Kokkos code (with no syntactical
differences) with the aim of generating optimized code for specific
targets such a multithreaded and GPU (NVIDIA/CUDA) and preserving
domain awareness.

-Runtime for thread pooling, synching, function queueing, and GPU -
run kernel from PTX, manage device memory for Kokkos views and
dynamically allocated C++ arrays, and device <-> host memory transfer.

-Both multithreaded and GPU modes intercept the code generation paths
that would normally be followed when encountering a parallel
for/reduce - analyze C++ templated parts of the AST to pull out the
compile-time information needed for interface to the runtime.

-Multithreaded mode generates "closures" for a parallel for's which
wrap the arguments and values and generate and IR function ->
queueable functions to the runtime.

-GPU mode generates a kernel for each parallel for -- and IR function
w/ transformations which gets code-generated to PTX and embedded in
the module - uses CUDA driver library to launch kernels, allocate
memory, etc.

-GPU: translate CPU-side Kokkos views runtime and static dimensions
into device memory - translate view indexing into cuda array index

-GPU: runtime manages device memory for Kokkos views - analyze
read/write patterns in the kernels and copies memory to/from device
only as needed -  device memory for shared views across kernels is
reused/persistent. CFG and AST visitor analysis for read/write
dependence, auto-copy views/arrays to/from device memory, also used
for asynchronous parallel for/reduce launches using CUDA streams.

-Currently looking into dynamic parallelism for nested forall
-launches.

-Established a flexible LLVM foundation - each kernel is IR which can
be analyzed and manipulated - possibile further optimizations:
warp/divergence, shared memory, better overlap IO, different memory
layouts for views, view slicing, module wide optimizations across
different kernels. Many possibilities exist for
transforming/optimizing the generated code without requiring any
changes or instrumentation to the user's original Kokkkos code.

====== Code

-runtime resides in top-level runtime directory

-to locate the places within LLVM/Clang which were modified/extended, 
grep for "=== ideas"

====== Known problems

The compiled Clang may not properly find system C++ headers so the
-isystem and proper include paths may have to be specified. See the
regression tests.

====== Building the Compiler

-Clone Kokkos from https://github.com/kokkos/kokkos and place the 'kokkos' directory at the top-level directory of kokkos-clang
git checkout tags/2.02.15


-from the top-level ideas directory, e.g:

-mkdir build

-cd build

-cmake .. OR cmake -DCMAKE_BUILD_TYPE=RELEASE ..

-make

====== Running

-The build process creates a C++/Clang compiler in build/llvm/bin which 
can be used just like the ordinary clang++ compiler

-Regression tests are located in test/regress

====== Darwin cluster specific build

module load cuda/8.0
module load cmake
module load gcc/5.2.0

About

A Clang-based compiler for compiling Kokkos code (with no syntactical differences) with the aim of generating optimized code for parallel targets such a multithreaded and GPU (NVIDIA/CUDA) and preserving domain awareness.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published