A Clang-based compiler for Kokkos code (with no syntactic differences) that aims to generate optimized code for parallel targets such as multithreaded CPUs and GPUs (NVIDIA/CUDA) while preserving domain awareness.
lanl/kokkos-clang
Copyright (c) 2016, Los Alamos National Security, LLC
All rights reserved.

Copyright 2016. Los Alamos National Security, LLC. This software was produced under U.S. Government contract DE-AC52-06NA25396 for Los Alamos National Laboratory (LANL), which is operated by Los Alamos National Security, LLC for the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this software. NEITHER THE GOVERNMENT NOR LOS ALAMOS NATIONAL SECURITY, LLC MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LIABILITY FOR THE USE OF THIS SOFTWARE. If software is modified to produce derivative works, such modified software should be clearly marked, so as not to confuse it with the version available from LANL.

Additionally, redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of Los Alamos National Security, LLC, Los Alamos National Laboratory, LANL, the U.S. Government, nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY LOS ALAMOS NATIONAL SECURITY, LLC AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL LOS ALAMOS NATIONAL SECURITY, LLC OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Kokkos Clang was developed as part of the IDEAS (Interoperable Design of Extreme-scale Application Software) project funded by the DOE Office of Science.

This code is unclassified and has been assigned LA-CC-16-054.

====== Kokkos GPU Compiler A.K.A Kokkos Clang

The Kokkos Clang compiler is a version of the Clang C++ compiler that has been modified to perform targeted code generation for Kokkos constructs such as parallel for and parallel reduce. The goals are to generate highly optimized code and to provide semantic (domain) awareness of these constructs throughout the compilation toolchain. This approach is taken to explore the possibilities of exposing the developer's intentions to the underlying compiler infrastructure (e.g. optimization and analysis passes within the middle stages of the compiler) instead of relying solely on the restricted capabilities of C++ template metaprogramming. To date, our activities have focused on correct GPU code generation, and thus we have not yet focused on improving overall performance.

The compiler is implemented by recognizing specific (syntactic) Kokkos constructs in order to bypass normal template expansion mechanisms and instead use the semantic knowledge of Kokkos to directly generate code in the compiler's intermediate representation (IR), which is then translated into an NVIDIA-centric GPU program and supporting runtime calls.
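For readers unfamiliar with the constructs named above, the following is a minimal sketch of what "parallel for" and "parallel reduce" look like at the source level. It does not use the Kokkos headers and is not output of this compiler: the `sketch` namespace, its two function templates, and `dot_example` are hypothetical stand-ins that only loosely mirror the shape of calls such as `Kokkos::parallel_for` and `Kokkos::parallel_reduce`, and they run serially so the example builds with any C++11 compiler.

```cpp
// Stand-in sketch: NOT the Kokkos headers and NOT this compiler's output.
// Everything in namespace `sketch` is hypothetical and runs serially.
#include <cstddef>
#include <vector>

namespace sketch {

// Apply body(i) for every i in [0, n). A real backend may run the
// iterations in any order or concurrently; this stand-in runs them in order.
template <typename Body>
void parallel_for(std::size_t n, Body body) {
  for (std::size_t i = 0; i < n; ++i) body(i);
}

// Apply body(i, partial) for every i in [0, n) and store the accumulated
// value in result. Only an accumulation starting from T() is modelled here.
template <typename Body, typename T>
void parallel_reduce(std::size_t n, Body body, T& result) {
  T partial = T();
  for (std::size_t i = 0; i < n; ++i) body(i, partial);
  result = partial;
}

}  // namespace sketch

// Hypothetical user code: fill x and y, then compute their dot product.
inline double dot_example(std::size_t n) {
  std::vector<double> x(n), y(n);
  double* xp = x.data();
  double* yp = y.data();

  // The lambda states what happens per index; how iterations are scheduled
  // is left to whatever implements parallel_for.
  sketch::parallel_for(n, [=](std::size_t i) {
    xp[i] = static_cast<double>(i);
    yp[i] = 2.0;
  });

  double result = 0.0;
  sketch::parallel_reduce(n, [=](std::size_t i, double& sum) {
    sum += xp[i] * yp[i];
  }, result);
  return result;
}
```

In a template-only library, calls of this shape are resolved entirely through template expansion. The approach described in this README instead has the compiler recognize such calls and generate IR for them directly.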
In addition, capturing and maintaining the higher-level semantics of Kokkos directly within the lower levels of the compiler has the potential to significantly improve the compiler's ability to communicate with the developer in terms of their original programming model/semantics.

Developed by: Nick Moss (nickm@lanl.gov) and Pat McCormick (pat@lanl.gov)

====== Project Description / Status

-A Clang-based compiler for compiling Kokkos code (with no syntactical differences) with the aim of generating optimized code for specific targets such as multithreaded and GPU (NVIDIA/CUDA) and preserving domain awareness.
-Runtime for thread pooling, syncing, function queueing, and GPU - run kernel from PTX, manage device memory for Kokkos views and dynamically allocated C++ arrays, and device <-> host memory transfer.
-Both multithreaded and GPU modes intercept the code generation paths that would normally be followed when encountering a parallel for/reduce - analyze C++ templated parts of the AST to pull out the compile-time information needed to interface with the runtime.
-Multithreaded mode generates "closures" for each parallel for, which wrap the arguments and values, and generates an IR function -> queueable functions to the runtime.
-GPU mode generates a kernel for each parallel for - an IR function with transformations which gets code-generated to PTX and embedded in the module - uses the CUDA driver library to launch kernels, allocate memory, etc.
-GPU: translate CPU-side Kokkos views' runtime and static dimensions into device memory - translate view indexing into CUDA array index
-GPU: runtime manages device memory for Kokkos views - analyzes read/write patterns in the kernels and copies memory to/from device only as needed - device memory for shared views across kernels is reused/persistent.
CFG and AST visitor analysis for read/write dependence, auto-copying views/arrays to/from device memory; also used for asynchronous parallel for/reduce launches using CUDA streams.
-Currently looking into dynamic parallelism for nested forall launches.
-Established a flexible LLVM foundation - each kernel is IR which can be analyzed and manipulated - possible further optimizations: warp/divergence, shared memory, better overlap of I/O, different memory layouts for views, view slicing, module-wide optimizations across different kernels. Many possibilities exist for transforming/optimizing the generated code without requiring any changes or instrumentation to the user's original Kokkos code.

====== Code

-The runtime resides in the top-level runtime directory
-To locate the places within LLVM/Clang which were modified/extended, grep for "=== ideas"

====== Known problems

The compiled Clang may not properly find system C++ headers, so -isystem and the proper include paths may have to be specified. See the regression tests.

====== Building the Compiler

-Clone Kokkos from https://github.com/kokkos/kokkos and place the 'kokkos' directory at the top-level directory of kokkos-clang
    git checkout tags/2.02.15
-From the top-level ideas directory, e.g.:
    mkdir build
    cd build
    cmake ..   OR   cmake -DCMAKE_BUILD_TYPE=RELEASE ..
    make

====== Running

-The build process creates a C++/Clang compiler in build/llvm/bin which can be used just like the ordinary clang++ compiler
-Regression tests are located in test/regress

====== Darwin cluster specific build

module load cuda/8.0
module load cmake
module load gcc/5.2.0
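To make the multithreaded-mode item under Project Description / Status more concrete, here is a rough sketch of the general idea of packing a loop body's values into a closure record and handing a plain, queueable function to a runtime. It is an illustration under assumptions, not the kokkos-clang runtime: `AxpyClosure`, `BodyFn`, `axpy_body`, `run_parallel_for`, and `axpy_example` are hypothetical names, and the stand-in runtime creates one thread per chunk rather than using a thread pool.

```cpp
// Illustrative sketch only: hypothetical names, not the kokkos-clang runtime.
#include <cstddef>
#include <thread>
#include <vector>

// A "closure" record: the values a loop body needs, packed into one struct.
struct AxpyClosure {
  double a;
  const double* x;
  double* y;
};

// The loop body as a plain function with a fixed signature, so a runtime
// can queue it without knowing anything about the original source construct.
using BodyFn = void (*)(void* closure, std::size_t i);

void axpy_body(void* closure, std::size_t i) {
  AxpyClosure* c = static_cast<AxpyClosure*>(closure);
  c->y[i] += c->a * c->x[i];
}

// Minimal stand-in for a runtime entry point: split [0, n) into contiguous
// chunks, run each chunk on its own thread, then wait for all of them.
// (A pooled runtime would reuse threads instead of creating them per call.)
void run_parallel_for(BodyFn body, void* closure, std::size_t n,
                      std::size_t num_threads) {
  if (num_threads == 0) num_threads = 1;
  std::vector<std::thread> workers;
  const std::size_t chunk = (n + num_threads - 1) / num_threads;
  for (std::size_t t = 0; t < num_threads; ++t) {
    const std::size_t begin = t * chunk;
    const std::size_t end = (begin + chunk < n) ? begin + chunk : n;
    if (begin >= end) break;  // no work left for this or later threads
    workers.emplace_back([=] {
      for (std::size_t i = begin; i < end; ++i) body(closure, i);
    });
  }
  for (std::thread& w : workers) w.join();
}

// Hypothetical driver: computes y = a * x + y over n elements and returns y.
std::vector<double> axpy_example(std::size_t n, std::size_t num_threads) {
  std::vector<double> x(n, 1.0), y(n, 2.0);
  AxpyClosure closure{3.0, x.data(), y.data()};
  run_parallel_for(&axpy_body, &closure, n, num_threads);
  return y;
}
```

Calling `axpy_example(10, 4)` from a `main` function exercises the sketch. Each index is written by exactly one thread, so no locking is needed for this particular body. On some toolchains, linking requires the `-pthread` flag.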