TeaLeaf

OpenCL quirks

The name of the C++ MPI library depends on the platform and on which MPI implementation is being used, so it must be specified by passing the MPICXX_LIB option to the Makefile. Possible names include:
- -lmpi_cxx
- -lmpiCC
- -lmpichcxx

Devices are assigned to MPI ranks starting at the index given by opencl_device in tea.in. For example, if there are 2 devices on a system and opencl_device is 0, rank 0 will take device 0 and rank 1 will take device 1.

Extra tea.in flags

Turn on OpenCL kernel use by putting use_opencl_kernels in tea.in.

Solver flags

tl_max_iters specifies the maximum number of iterations to run before stopping.
tl_eps specifies the acceptable error level at which to stop.

Enabling these flags will turn on the relevant solver:

tl_use_jacobi - use a simple Jacobi iteration.
tl_use_cg - use the conjugate gradient method with a diagonal Jacobi preconditioner.
tl_use_chebyshev - use the Chebyshev solver after running a few iterations of the conjugate gradient solver (not preconditioned) to approximate the eigenvalues. The number of CG iterations to run before switching to the Chebyshev solver can be specified with the tl_chebyshev_steps flag (e.g. tl_chebyshev_steps=20).

OpenCL

opencl_vendor chooses the vendor. Typical values are nvidia, advanced (AMD), or intel; any chooses the first available platform.
opencl_type chooses the device type. Typical values are gpu, cpu, or accelerator; all chooses the first device from the specified platform. Setting this to list will list all platforms and devices.
opencl_device specifies the index of the device to use (0 indexed).
opencl_usefirst makes device selection ignore MPI ranks and makes every process try to use device 0 in whatever platform it has.
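
The selection rule described by opencl_device and opencl_usefirst can be sketched as follows. This is only an illustration of the behaviour described in this README, not the code's actual implementation: the function name select_device, its arguments, and the error raised when a rank maps past the last device are all assumptions.

```python
def select_device(rank, opencl_device=0, num_devices=1, opencl_usefirst=False):
    """Return the device index a given MPI rank would try to use.

    Hypothetical helper: the name, signature and error handling are
    assumptions made for illustration only.
    """
    if opencl_usefirst:
        # Ignore the MPI rank entirely: every process tries device 0.
        return 0
    # Devices are handed out starting at opencl_device, one per rank.
    device = opencl_device + rank
    if device >= num_devices:
        raise ValueError("rank %d maps to device %d, but only %d device(s) exist"
                         % (rank, device, num_devices))
    return device
```

With two devices and opencl_device set to 0, this gives device 0 to rank 0 and device 1 to rank 1, matching the example given under OpenCL quirks.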

TODO

Make preconditioner selectable from tea.in and not a compile time flag

Compiling

In many cases just typing make in the required software version will work.

If the MPI compilers have different names then the build process needs to be notified of this by defining two environment variables, MPI_COMPILER and C_MPI_COMPILER.

For example on some Intel systems:

make MPI_COMPILER=mpiifort C_MPI_COMPILER=mpiicc

Or on Cray systems:

make MPI_COMPILER=ftn C_MPI_COMPILER=cc

OpenMP Build

Each compiler uses different arguments to enable OpenMP compilation. A simple call to make will invoke the compiler with -O3, which does not usually enable OpenMP. To build with OpenMP for a specific compiler, a further variable, COMPILER, must be defined; it selects the correct OpenMP option for that compiler.

For example with the Intel compiler:

make COMPILER=INTEL

This appends -openmp to the build flags.

Other supported compilers that will be recognised are:

CRAY
SUN
GNU
IBM
PATHSCALE
PGI

The default flags for each of these are shown below:

INTEL: -O3 -ipo
SUN: -fast
GNU: -ipo
XL: -O5
PATHSCALE: -O3
PGI: -O3 -Minline
CRAY: -em (note that by default the Cray compiler will pick the optimum options for performance)

Other Flags

The default compilation with the COMPILER flag set chooses the best-performing set of flags for the specified compiler, but with no hardware-specific options or IEEE compatibility.

To produce a version that has IEEE compatibility, a further flag has to be set on the compile line.

make COMPILER=INTEL IEEE=1

This flag has no effect if the compiler flag is not set because IEEE options are always compiler specific.

For each compiler the flags associated with IEEE are shown below:

INTEL: -fp-model strict -fp-model source -prec-div -prec-sqrt
CRAY: -hpflex_mp=intolerant
SUN: -fsimple=0 -fns=no
GNU: -ffloat-store
PGI: -Kieee
PATHSCALE: -mieee-fp
XL: -qstrict -qfloat=nomaf

Note that the MPI communications have been written to ensure bitwise identical answers independent of core count. However, under some compilers this only holds if the IEEE flag is set. This is certainly the case for the Intel and Cray compilers. Even with the IEEE options set, there is no guarantee that different compilers or platforms will produce the same answers. Indeed, a Fortran run can give different answers from a C run with the same compiler, the same options and the same hardware.

Extra options can be added without modifying the makefile by adding two further flags, OPTIONS and C_OPTIONS, one for the Fortran and one for the C options.

make COMPILER=INTEL OPTIONS=-xavx C_OPTIONS=-xavx

Finally, a DEBUG flag can be set to use debug options for a specific compiler.

make COMPILER=PGI DEBUG=1

These flags are also compiler specific, and so will depend on the COMPILER environment variable.

So on a system without the standard MPI wrappers, for a build that requires OpenMP, IEEE and AVX, the command would look like this:

make COMPILER=INTEL MPI_COMPILER=mpiifort C_MPI_COMPILER=mpiicc IEEE=1 \
OPTIONS="-xavx" C_OPTIONS="-xavx"

File Input

The contents of tea.in define the geometric and run time information, apart from task and thread counts.

A complete list of options is given below, where <R> shows that the option takes a real number as an argument. Similarly, <I> is an integer argument.

initial_timestep <R>

Set the initial time step for TeaLeaf. This time step stays constant through the entire simulation. The default value is

end_time <R>

Sets the end time for the simulation. When the simulation time is greater than this number the simulation will stop.

end_step <I>

Sets the end step for the simulation. When the simulation step is equal to this number the simulation will stop.

In the event that both the above options are set, the simulation will terminate on whichever completes first.
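
The interaction of end_time and end_step can be sketched as a simple loop. This is an illustration of the rules as stated in this README, not the code's actual time loop; the function name run_steps and the order in which the two conditions are checked are assumptions.

```python
def run_steps(dt, end_time=None, end_step=None):
    """Count steps until either stopping condition is hit (illustrative only)."""
    if end_time is None and end_step is None:
        raise ValueError("set end_time, end_step, or both")
    time, step = 0.0, 0
    while True:
        step += 1
        time += dt  # the time step stays constant throughout the run
        if end_time is not None and time > end_time:
            break  # simulation time has passed end_time
        if end_step is not None and step == end_step:
            break  # requested number of steps completed
    return step, time
```

For example, run_steps(0.5, end_time=2.0, end_step=3) stops after 3 steps because the step limit is reached first.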

xmin <R>

xmax <R>

ymin <R>

ymax <R>

The above four options set the size of the computational domain. The default domain size is a 10cm square.

x_cells <I>

y_cells <I>

The two options above set the cell count for each coordinate direction. The default is 10 cells in each direction.
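
As a worked example of how the domain extents and cell counts relate, the sketch below computes the cell size in each direction. It assumes a uniform grid and a default domain running from 0 to 10 in each direction; the function name cell_size is hypothetical.

```python
def cell_size(xmin, xmax, ymin, ymax, x_cells, y_cells):
    """Return (dx, dy) for a uniform grid covering the domain."""
    dx = (xmax - xmin) / x_cells
    dy = (ymax - ymin) / y_cells
    return dx, dy

# Defaults described above: a domain 10 units on a side, 10 cells per direction.
print(cell_size(0.0, 10.0, 0.0, 10.0, 10, 10))  # (1.0, 1.0)
```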

The geometric information and initial conditions are set using the following keywords with three possible variations. Note that state 1 is always the ambient material and any geometry information is ignored. Areas not covered by other defined states receive the energy and density of state 1.

state <I> density <R> energy <R> geometry rectangle xmin <R> ymin <R> xmax <R> ymax <R>

Defines a rectangular region of the domain with the specified energy and density.

state <I> density <R> energy <R> geometry circle xmin <R> ymin <R> radius <R>

Defines a circular region of the domain with the specified energy and density.

state <I> density <R> energy <R> geometry point xmin <R> ymin <R>

Defines a cell in the domain with the specified energy and density.

Note that the generator is simple and the defined state completely fills any cell with which it intersects. In the case of overlapping regions, the last state takes priority. Hence a circular region will have a stepped interface, and a point will fill the cell it lies in with its defined energy and density.
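
The behaviour of such a generator can be sketched as below. This is a minimal illustration under stated assumptions, not the generator used by the code: the function name generate_states and the dict layout are hypothetical, and testing a circle against the cell centre is an assumption chosen to produce the stepped interface described above.

```python
def generate_states(xmin, ymin, dx, dy, x_cells, y_cells, states):
    """Return a y_cells by x_cells grid of 1-based state numbers.

    `states` lists states 2, 3, ... in input order (hypothetical layout);
    state 1, the ambient material, fills every cell first.
    """
    grid = [[1] * x_cells for _ in range(y_cells)]
    for number, s in enumerate(states, start=2):
        for k in range(y_cells):
            for j in range(x_cells):
                x0, y0 = xmin + j * dx, ymin + k * dy  # lower-left corner
                x1, y1 = x0 + dx, y0 + dy              # upper-right corner
                if s["geometry"] == "rectangle":
                    # Any overlap with the rectangle fills the whole cell.
                    hit = (x1 > s["xmin"] and x0 < s["xmax"] and
                           y1 > s["ymin"] and y0 < s["ymax"])
                elif s["geometry"] == "circle":
                    # Assumption: test the cell centre against the radius.
                    cx, cy = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
                    hit = ((cx - s["xmin"]) ** 2 + (cy - s["ymin"]) ** 2
                           <= s["radius"] ** 2)
                else:
                    # Point: fill the single cell containing it.
                    hit = x0 <= s["xmin"] < x1 and y0 <= s["ymin"] < y1
                if hit:
                    grid[k][j] = number  # later states overwrite earlier ones
    return grid
```

Because states are applied in input order, a point state listed after a rectangle overrides the rectangle in the one cell it occupies.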

visit_frequency <I>

This is the step frequency of visualisation dumps. The files produced are text-based VTK files and are easily viewed in an application such as VisIt. The default is to output no graphical data. Note that the overhead of output is high, so it should not be enabled when performance benchmarking is being carried out.

summary_frequency <I>

This is the step frequency of summary dumps. This requires a global reduction and associated synchronisation, so performance will be slightly affected as the frequency is increased. The default is for a summary dump to be produced every 10 steps and at the end of the simulation.

tl_ch_cg_presteps <I>

This option specifies the number of Conjugate Gradient iterations completed before the Chebyshev method is started. This is necessary to provide approximate minimum and maximum eigenvalues to start the Chebyshev method. The default value is 30.

tl_ppcg_inner_steps <I>

Number of inner steps to run when using the PPCG solver. The default value is 10.

tl_ch_cg_errswitch

If enabled alongside the Chebyshev/PPCG solver, switch when a certain error is reached instead of when a certain number of steps is reached. The default for this is off.

tl_ch_cg_epslim

Error level at which to switch from CG to Chebyshev when using the Chebyshev solver with the tl_ch_cg_errswitch option enabled. The default value is 1e-5.
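
How tl_ch_cg_presteps, tl_ch_cg_errswitch and tl_ch_cg_epslim combine can be sketched as a single decision function. This reflects only the description in this README; the function name, and the choice of "less than or equal" for the error comparison, are assumptions rather than the code's actual logic.

```python
def should_switch_to_chebyshev(cg_iters, error, presteps=30,
                               errswitch=False, epslim=1e-5):
    """Decide whether to leave CG and start the Chebyshev method.

    Hypothetical helper; defaults mirror the values documented above.
    """
    if errswitch:
        # Switch on error level instead of on iteration count.
        return error <= epslim
    # Otherwise switch after a fixed number of CG iterations.
    return cg_iters >= presteps
```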

tl_check_result

After the solver reaches convergence, calculate ||b-Ax|| to make sure the solver has actually converged. The default for this option is off.
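
The check amounts to recomputing the residual from the final solution. A minimal sketch on a small dense system is shown below; the use of the 2-norm and of nested lists for the matrix are assumptions made for illustration, and residual_norm is a hypothetical name.

```python
import math

def residual_norm(A, x, b):
    """Return the 2-norm of b - A x for a dense matrix given as nested lists."""
    r = [bi - sum(aij * xj for aij, xj in zip(row, x))
         for row, bi in zip(A, b)]
    return math.sqrt(sum(ri * ri for ri in r))

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = [1.0 / 11.0, 7.0 / 11.0]  # exact solution of A x = b
print(residual_norm(A, x, b) < 1e-12)  # True
```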

tl_preconditioner_type

This keyword selects the preconditioner. Options are:

none - No preconditioner.
jac_diag - Diagonal Jacobi preconditioner. Typically reduces the condition number by around 5% but may not reduce time to solution.
jac_block - Block Jacobi preconditioner (with a currently hardcoded block size of 4). Typically reduces the condition number by around 50% but may not reduce time to solution.

tl_use_jacobi

This keyword selects the Jacobi method to solve the linear system. Note that this is a very slowly converging method compared to the other options. This is the default method if no method is explicitly selected.

tl_use_cg

This keyword selects the Conjugate Gradient method to solve the linear system.

tl_use_ppcg

This keyword selects the Polynomially Preconditioned Conjugate Gradient (PPCG) method to solve the linear system.

tl_use_chebyshev

This keyword selects the Chebyshev method to solve the linear system.

profiler_on

This option turns the code's coarse-grained internal profiler on. Timing information is reported at the end of the simulation in the tea.out file. The default is no profiling.

verbose_on

This option prints out extra information, such as the residual per iteration of a solve.

tl_max_iters <I>

This option provides an upper limit on the number of iterations used for the linear solve in a step. If this limit is reached, the solution vector at this iteration is used as the solution, even if the convergence criteria have not been met. For this reason, care should be taken when comparing the performance of a slowly converging method, such as Jacobi, as the convergence criteria may not have been met for some of the steps. The default value is 1000.

tl_eps <R>

This option sets the convergence criteria for the selected solver. It uses a least squares measure of the residual. The default value is 1.0e-10.
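
The interplay between tl_max_iters and tl_eps can be illustrated with a small Jacobi solve. This is a sketch on a dense matrix stored as nested lists, not the code's stencil-based solver; the function name jacobi_solve and the use of the 2-norm of the residual as the convergence measure are assumptions.

```python
import math

def jacobi_solve(A, b, tl_eps=1.0e-10, tl_max_iters=1000):
    """Jacobi iteration; returns (x, iterations_used, converged)."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, tl_max_iters + 1):
        # Update every unknown using only values from the previous sweep.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if math.sqrt(sum(ri * ri for ri in r)) < tl_eps:
            return x, iteration, True
    # Iteration limit reached: the current vector is used as the solution
    # even though the convergence test has not been satisfied.
    return x, tl_max_iters, False
```

Calling jacobi_solve with a very small tl_max_iters returns converged as False, which is the situation the note on tl_max_iters warns about when comparing solver performance.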

tl_coefficient_density

This option uses the density as the conduction coefficient. This is the default option.

tl_coefficient_inverse_density

This option uses the inverse density as the conduction coefficient.

test_problem <I>

This keyword selects a standard test with a "known" solution. Test problem 1 is automatically generated if the tea.in file does not exist. Test problems 2-5 are shipped in the TeaLeaf repository. Note that the known solution for an iterative solver is not an analytic solution but is the solution for a single core simulation with IEEE options enabled with the Intel compiler and a strict convergence of 1.0e-15. The difference to the expected solution is reported at the end of the simulation in the tea.out file. There is no default value for this option.
