This is a micro-benchmark suite for the HSA platform.
It is adapted from the original CLOC sample code. Make sure you have set up the HSA environment before using it.
# Execute
- Run `run.sh` in this directory.
# Problem
- Some problems still need to be fixed:
- It is unclear why `global_ld` and `const_ld` do not execute successfully (the ISA of both kernels is the same).
- It is unclear why the measured memory bandwidth for the global, local, private, and const segments does not make sense (the HSA runtime does not support the private segment yet).
# Result
- See the images in this directory.
- HSA_enqueue: `no_branch` uses an empty kernel; `vector_copy` uses a simple kernel (a plain vector copy) to make sure the result is accurate.
- SNACK_enqueue: `no_branch` and `vector_copy` are the same functions as in HSA_enqueue.
- Memory_bandwidth: the `const` and `global_ld` results do not make sense yet. The private segment is not supported by the HSA runtime yet.
- branch: a micro-benchmark for the branch penalty. The penalty is almost linear.
- reduce: a simple example, `find max`, is used to test reduction.
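The reduce benchmark's kernel is not reproduced in this README. As a rough idea of what a tree-style `find max` reduction does, here is a minimal sequential C sketch; the function name `tree_find_max` is hypothetical and this is not the benchmark's actual kernel. Each pass halves the number of active elements, which is the pattern a work-group reduction typically follows, with a barrier between passes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: tree-style max reduction, performed in place.
 * In each pass, element i keeps the larger of itself and its partner
 * i + half. On a GPU, every i within one pass would typically be handled
 * by a separate work-item, with a barrier before the next pass.
 * Note: the contents of a[] are overwritten. Requires n >= 1. */
static int tree_find_max(int *a, size_t n)
{
    assert(a != NULL && n >= 1);
    while (n > 1) {
        /* ceil(n / 2): lets the loop handle lengths that are not powers of two */
        size_t half = (n + 1) / 2;
        for (size_t i = 0; i + half < n; ++i) {
            if (a[i + half] > a[i])
                a[i] = a[i + half];
        }
        n = half; /* only the first `half` elements are still active */
    }
    return a[0];
}
```

For example, calling `tree_find_max` on `{3, 17, -4, 9}` returns 17. The number of passes grows with log2(n), which is why this pattern maps well onto parallel hardware even though the sketch above runs sequentially.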
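When the Memory_bandwidth numbers look wrong, a first sanity check is to recompute effective bandwidth from the bytes a kernel moves and its measured time. A minimal C sketch, assuming you know the total bytes each kernel reads and writes; the helper `bandwidth_gbps` is hypothetical and not part of this benchmark:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes).
 * bytes_read and bytes_written are the total traffic the kernel is
 * assumed to generate; elapsed_s is the measured kernel time in seconds. */
static double bandwidth_gbps(size_t bytes_read, size_t bytes_written,
                             double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return (double)(bytes_read + bytes_written) / elapsed_s / 1e9;
}
```

For instance, a copy kernel that reads and writes 250 MB each in 0.25 s gives 2.0 GB/s. Counting both directions matters for copy-style kernels; a load-only kernel would pass 0 for `bytes_written`.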
# Author
NTU PASLAB

WeiChen Lin: weichen8157@gmail.com
Medicine Yeh: freedomyeh@hotmail.com

If you have any questions, feel free to email us.