Idiots Algorithm: OpenCL

1. What is OpenCL

OpenCL: Open Computing Language
Open Specification
Proposed by Apple
Specification developed by a number of companies
Specification is maintained by the Khronos Group

2. Why OpenCL?

Gains in computational performance now come from more cores rather than higher clock speeds
Systems combine multi-core CPUs and programmable GPUs
Need a programming interface that allows users to take advantage of all the system resources
Supports general purpose parallel computations
OpenCL is device agnostic
As an open standard, code should be portable across implementations
No single company controls the specification, so it is vendor neutral

3. OpenCL Devices

Commonly CPUs and GPUs
FPGAs
Embedded processors
DSPs

4. Uses of OpenCL

Image, video, and audio processing
Simulations and scientific calculations
Medical imaging
Financial models
Data parallel algorithms

5. What OpenCL is not suited for

Sequential problems
Calculations that require a lot of pointer chasing or constant data permutation
Calculations that require frequent communication between work-items or frequent result updates
Device-dependent limitations, such as available memory or maximum work-group size
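
To make the difference between a data-parallel problem and a sequential one concrete, here is a minimal sketch in plain C (not OpenCL code). The function names scale_add and recurrence, and the reversed flag, are hypothetical and exist only for illustration. The first loop gives the same result in any iteration order, which is what lets each iteration become its own work-item; the second depends on the order, so it cannot be split up that way.

```c
#include <assert.h>
#include <stddef.h>

/* Data-parallel: element i depends only on inputs at index i, so the
   iterations may run in any order (or all at once, one per work-item). */
void scale_add(const float *x, float *y, float a, size_t n, int reversed)
{
    assert(x != NULL && y != NULL);
    for (size_t k = 0; k < n; ++k) {
        size_t i = reversed ? n - 1 - k : k;
        y[i] = a * x[i] + y[i];
    }
}

/* Sequential: element i needs the finished value of element i - 1,
   so the iteration order is part of the result. */
void recurrence(float *x, float a, float b, size_t n, int reversed)
{
    assert(x != NULL);
    for (size_t k = 1; k < n; ++k) {
        size_t i = reversed ? n - k : k;
        x[i] = a * x[i - 1] + b;
    }
}
```

Calling scale_add forwards and backwards on identical inputs leaves identical arrays, while recurrence produces different arrays for the two orders.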

6. OpenCL Programming Model

In developing an OpenCL project, the first step is to code the host application. This runs on a user's computer (the host) and dispatches kernels to connected devices. The host application can be coded in C or C++, and every host application requires five data structures: cl_device_id, cl_kernel, cl_program, cl_command_queue, and cl_context.

Data Structures

Device: An OpenCL device receives kernels from the host. It is represented by a cl_device_id.
Kernel: The host application distributes kernels to devices. Each kernel is represented by a cl_kernel.
Program: The host selects kernels from a program, which is represented by a cl_program.
Command queue: Each device receives kernels through a command queue, which is represented by a cl_command_queue.
Context: An OpenCL context allows devices to receive kernels and transfer data. It is represented by a cl_context.

OpenCL Kernels

One of OpenCL's great advantages is that kernels can execute on high-performance computing devices such as GPUs.

The OpenCL Execution Model: Kernels are executed by one or more work-items. Work-items are collected into work-groups and each work-group executes on a compute unit.
The OpenCL Memory Model: Kernel data must be explicitly placed in one of four address spaces: global memory, constant memory, local memory, or private memory. The location of the data determines how quickly it can be processed.
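
The relationship between work-items and work-groups can be sketched by emulating a one-dimensional launch in plain C. This is only an illustration that runs sequentially on the host; run_ndrange_1d, work_item_fn, and square_kernel are hypothetical names, not part of the OpenCL API. It assumes the global size is a multiple of the work-group size and that there is no global offset, in which case a work-item's global ID equals its group ID times the group size plus its local ID. Inside a real kernel these values come from get_global_id, get_group_id, and get_local_id.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel body: one call is one work-item. */
typedef void (*work_item_fn)(size_t global_id, size_t group_id,
                             size_t local_id, void *args);

/* Sequentially emulate a 1-D launch of global_size work-items split into
   work-groups of local_size. A real device runs work-items concurrently. */
void run_ndrange_1d(size_t global_size, size_t local_size,
                    work_item_fn kernel, void *args)
{
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        for (size_t local = 0; local < local_size; ++local) {
            size_t global = group * local_size + local;
            kernel(global, group, local, args);
        }
    }
}

/* Example kernel: each work-item squares exactly one array element. */
void square_kernel(size_t global_id, size_t group_id, size_t local_id,
                   void *args)
{
    float *data = (float *)args;
    (void)group_id;  /* unused in this simple kernel */
    (void)local_id;
    data[global_id] = data[global_id] * data[global_id];
}
```

For example, run_ndrange_1d(8, 4, square_kernel, data) emulates eight work-items in two work-groups of four, each squaring one element of data.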

7. OpenCL Memory model

The OpenCL memory model identifies four address spaces:

Global memory: Stores data for the entire device.
Constant memory: Similar to global memory, but read-only for kernels.
Local memory: Stores data for the work-items in a work-group.
Private memory: Stores data for an individual work-item.
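
As a rough illustration of how the four address spaces divide up the data, the following plain C sketch emulates one partial sum per work-group. It runs sequentially on the host and is not OpenCL code; group_sums and LOCAL_SIZE are hypothetical names chosen for this example. The comments mark which variable plays the role of which address space.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 4  /* hypothetical work-group size */

/* Emulate one partial sum per work-group; n must be a multiple of LOCAL_SIZE. */
void group_sums(const float *global_in, /* global memory: whole device */
                float *global_out,      /* global memory: one slot per group */
                float scale,            /* stands in for read-only constant data */
                size_t n)
{
    assert(global_in != NULL && global_out != NULL);
    assert(n % LOCAL_SIZE == 0);
    for (size_t group = 0; group < n / LOCAL_SIZE; ++group) {
        /* Local memory: shared by the work-items of this group only. */
        float local_scratch[LOCAL_SIZE];

        /* Phase 1: each work-item stages one scaled element. */
        for (size_t lid = 0; lid < LOCAL_SIZE; ++lid) {
            /* Private memory: belongs to a single work-item. */
            float value = scale * global_in[group * LOCAL_SIZE + lid];
            local_scratch[lid] = value;
        }
        /* A real kernel would need barrier(CLK_LOCAL_MEM_FENCE) here. */

        /* Phase 2: one work-item combines the group's staged values. */
        float sum = 0.0f; /* private */
        for (size_t lid = 0; lid < LOCAL_SIZE; ++lid)
            sum += local_scratch[lid];
        global_out[group] = sum;
    }
}
```

In a real kernel the staging buffer would be declared with the __local qualifier and the input and output pointers with __global. Keeping intermediate values in local or private memory is the usual way to avoid repeated trips to slower global memory.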

FAQ on OpenCL

What makes OpenCL fast?
What types of devices does it work on?
What is the difference between CUDA, OpenMP, and OpenCL?
How stable is OpenCL?

Thursday, 23 March 2017
