Table Comparing Syntax for Different Compute APIs

| Term | CUDA | HIP | HC | C++AMP | OpenCL |
|---|---|---|---|---|---|
| Device | `int deviceId` | `int deviceId` | `hc::accelerator` | `concurrency::accelerator` | `cl_device_id` |
| Queue | `cudaStream_t` | `hipStream_t` | `hc::accelerator_view` | `concurrency::accelerator_view` | `cl_command_queue` |
| Event | `cudaEvent_t` | `hipEvent_t` | `hc::completion_future` | `concurrency::completion_future` | `cl_event` |
| Memory | `void *` | `void *` | `void *`; `hc::array`; `hc::array_view` | `concurrency::array`; `concurrency::array_view` | `cl_mem` |
| | grid | grid | extent | extent | NDRange |
| | block | block | tile | tile | work-group |
| | thread | thread | thread | thread | work-item |
| | warp | warp | wavefront | N/A | sub-group |
| Thread-index | `threadIdx.x` | `hipThreadIdx_x` | `t_idx.local[0]` | `t_idx.local[0]` | `get_local_id(0)` |
| Block-index | `blockIdx.x` | `hipBlockIdx_x` | `t_idx.tile[0]` | `t_idx.tile[0]` | `get_group_id(0)` |
| Block-dim | `blockDim.x` | `hipBlockDim_x` | `t_ext.tile_dim[0]` | `t_ext.tile_dim0` | `get_local_size(0)` |
| Grid-dim | `gridDim.x` | `hipGridDim_x` | `t_ext[0]` | `t_ext[0]` | `get_global_size(0)` |
| Device Kernel | `__global__` | `__global__` | lambda inside `hc::parallel_for_each` or `[[hc]]` | `restrict(amp)` | `__kernel` |
| Device Function | `__device__` | `__device__` | `[[hc]]` (detected automatically in many cases) | `restrict(amp)` | Implied in device compilation |
| Host Function | `__host__` (default) | `__host__` (default) | `[[cpu]]` (default) | `restrict(cpu)` (default) | Implied in host compilation |
| Host + Device Function | `__host__ __device__` | `__host__ __device__` | `[[hc]] [[cpu]]` | `restrict(amp,cpu)` | No equivalent |
| Kernel Launch | `<<< >>>` | `hipLaunchKernel` | `hc::parallel_for_each` | `concurrency::parallel_for_each` | `clEnqueueNDRangeKernel` |
| Global Memory | `__global__` | `__global__` | Unnecessary / Implied | Unnecessary / Implied | `__global` |
| Group Memory | `__shared__` | `__shared__` | `tile_static` | `tile_static` | `__local` |
| Constant | `__constant__` | `__constant__` | Unnecessary / Implied | Unnecessary / Implied | `__constant` |
| | `__syncthreads` | `__syncthreads` | `tile_static.barrier()` | `t_idx.barrier.wait()` | `barrier(CLK_LOCAL_MEM_FENCE)` |
| Atomic Builtins | `atomicAdd` | `atomicAdd` | `hc::atomic_fetch_add` | `concurrency::atomic_fetch_add` | `atomic_add` |
| Precise Math | `cos(f)` | `cos(f)` | `hc::precise_math::cos(f)` | `concurrency::precise_math::cos(f)` | `cos(f)` |
| Fast Math | `__cosf(f)` | `__cosf(f)` | `hc::fast_math::cos(f)` | `concurrency::fast_math::cos(f)` | `native_cos(f)` |
| Vector | `float4` | `float4` | `hc::short_vector::float4` | `concurrency::graphics::float_4` | `float4` |
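
The thread-index, block-index, and block-dim rows of the table are usually combined into a single global index inside a kernel. The following is a minimal host-side sketch in plain C++ that emulates a 1D launch so the arithmetic can be followed without a GPU toolchain. `Index1D`, `vector_add_kernel`, and `emulate_launch` are hypothetical names introduced for illustration and are not part of any of the APIs above; on a real device the two loops are replaced by the kernel-launch mechanism of the chosen API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CUDA-style built-in index types;
// not part of any real API.
struct Index1D {
    unsigned x;
};

// Body of a vector-add "kernel", written the way a CUDA or HIP kernel
// would compute its position in a 1D grid.
void vector_add_kernel(Index1D threadIdx, Index1D blockIdx, Index1D blockDim,
                       const float* a, const float* b, float* c, std::size_t n) {
    // Global index = block-index * block-dim + thread-index.
    // In OpenCL this is the value of get_global_id(0) when the global
    // offset is zero.
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may extend past the end of the data
        c[i] = a[i] + b[i];
    }
}

// Host-side emulation of a 1D launch: every (block, thread) pair runs
// the kernel body exactly once.
std::vector<float> emulate_launch(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  unsigned block_size) {
    assert(a.size() == b.size() && block_size > 0);
    const std::size_t n = a.size();
    // Round up so that a partial final block is still launched.
    const unsigned grid_size =
        static_cast<unsigned>((n + block_size - 1) / block_size);
    std::vector<float> c(n, 0.0f);
    for (unsigned block = 0; block < grid_size; ++block) {
        for (unsigned thread = 0; thread < block_size; ++thread) {
            vector_add_kernel({thread}, {block}, {block_size},
                              a.data(), b.data(), c.data(), n);
        }
    }
    return c;
}
```

For example, `emulate_launch(a, b, 2)` on five-element inputs runs three blocks of two threads each; the bounds check inside the kernel body discards the one out-of-range thread in the final block.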

Notes

  1. For HC and C++AMP, assume a captured tiled_ext named "t_ext" and a captured extent named "ext"; "t_idx" in the table is the tiled index passed to the kernel lambda. These languages pass information to the kernel through captured variables and lambda parameters rather than special built-in functions, so the exact variable names may vary.
  2. The indexing functions (starting with thread-index) show the terminology for a 1D grid. Some APIs use the reverse order of xyz / 012 indexing for 3D grids.
  3. HC allows tile dimensions to be specified at runtime, while C++AMP requires that tile dimensions be specified at compile time. Thus the HC syntax for tile dimensions is t_ext.tile_dim[0], while the C++AMP syntax is t_ext.tile_dim0.
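
The difference described in note 3 can be sketched in plain C++. The two types below are hypothetical stand-ins written for illustration only; they are not the real `hc` or `concurrency` classes. They show why a compile-time tile size is naturally spelled as a static member (`tile_dim0`) while a runtime tile size is spelled as an ordinary indexed data member (`tile_dim[0]`).

```cpp
#include <cassert>

// Hypothetical stand-in for a tiled extent whose tile size is a template
// argument, in the style note 3 describes for C++AMP.
template <int TileDim0>
struct StaticTiledExtent {
    static constexpr int tile_dim0 = TileDim0;  // fixed at compile time
    int size;                                   // total number of threads

    // Number of tiles needed to cover `size` threads, rounding up.
    int tile_count() const { return (size + tile_dim0 - 1) / tile_dim0; }
};

// Hypothetical stand-in for a tiled extent whose tile size is chosen at
// run time, in the style note 3 describes for HC.
struct RuntimeTiledExtent {
    int size;         // total number of threads
    int tile_dim[1];  // set when the object is built

    // Number of tiles needed to cover `size` threads, rounding up.
    int tile_count() const { return (size + tile_dim[0] - 1) / tile_dim[0]; }
};

// Builds a runtime-tiled extent; the tile size can come from user input
// or from a device query.
RuntimeTiledExtent make_runtime_tiled_extent(int size, int tile_size) {
    assert(size >= 0 && tile_size > 0);
    return RuntimeTiledExtent{size, {tile_size}};
}
```

With the static form, changing the tile size means instantiating a different type (`StaticTiledExtent<64>` versus `StaticTiledExtent<256>`); with the runtime form, the same type serves every tile size.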