Using exactly the same machine, the same CPU, the same GPU, and the same programming language, who doesn't want to see their programs run faster?
- Haskell: FFI binding to the CUDA interface for programming NVIDIA GPUs
- Ruby: SGC-Ruby-CUDA
- Python: PyOpenCL
So, for those of you who haven't experienced the elegance of parallel computing, what are you waiting for?
First of all, there is CUDA, which is maintained by NVIDIA. I have been developing programs with CUDA for about four years now. The framework itself is pretty simple and straightforward to use, and I am amazed by the number of sample programs NVIDIA ships along with the CUDA Toolkit. They make it incredibly easy to learn the basics: not only how to use the CUDA API, but also some important algorithms and optimization methods for developing parallel programs with CUDA.
Having said that, it was actually pretty hard to develop with the CUDA SDK back then (when it was still version 1.x). But as time went on, the SDK became easier to understand, to install, and to develop with. However, since CUDA is not an open-source framework, it is only available for NVIDIA's GPUs (from the GeForce 8800 up to the newest models).
FYI: The technique of taking a computation that traditionally runs on the CPU and porting it to the GPU is called GPGPU (General-Purpose computing on GPUs).
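To make the porting idea concrete, here is a minimal sketch of it in CUDA C: a plain CPU loop over array elements becomes a kernel where each GPU thread handles one element. This is a hypothetical example, not from the original post; it assumes the CUDA Toolkit is installed (compile with `nvcc add.cu`) and an NVIDIA GPU is present.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// The GPU kernel: the body of the old CPU loop, run by one thread per element.
__global__ void add(int n, const float *a, const float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard against threads past the array end
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Allocate device copies and move the inputs to GPU memory.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    add<<<(n + 255) / 256, 256>>>(n, da, db, dc);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", c[0]);     // expect 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(a); free(b); free(c);
    return 0;
}
```

The explicit malloc/memcpy choreography is the price of GPGPU: the GPU has its own memory, so data must be shipped over before the kernel runs and the results copied back afterwards.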
Then there is OpenCL, short for Open Computing Language. OpenCL was initially developed by Apple and the Khronos Group. It is another parallel programming framework, with the perk of being able to run on many platforms. At first OpenCL only shipped with a kernel language based on standard C99, but C++ wrappers for the runtime API were added later.
Several major vendors support it (AMD, Intel, and NVIDIA), each with its own compiler. Since every vendor manufactures different computational hardware, each releases its own OpenCL programming SDK, available from the respective vendor's website: the AMD OpenCL SDK, the Intel OpenCL SDK, and the NVIDIA OpenCL SDK. Each SDK comes with sample programs. IMO, learning OpenCL will not be that hard if you have previously done some CUDA programming.
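A rough sketch of the same vector addition on the OpenCL side may help show why the switch from CUDA is not that hard: the kernel is written in OpenCL's C99-derived language (here as a string, compiled at runtime), and the host code works with whatever device the installed driver exposes. This is a hypothetical, error-handling-free example; it assumes one of the vendor OpenCL SDKs above is installed (link with `-lOpenCL`).

```c
#include <stdio.h>
#include <CL/cl.h>

/* The kernel source: OpenCL C, compiled at runtime for whatever device is found.
   get_global_id(0) plays the role of CUDA's blockIdx/threadIdx arithmetic. */
static const char *src =
    "__kernel void add(__global const float *a, __global const float *b,\n"
    "                  __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Pick the first available platform and device. */
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* Build the kernel from source and create device buffers. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "add", NULL);
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    /* Bind arguments, run one work-item per element, read the result back. */
    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    printf("c[0] = %f\n", c[0]);
    return 0;
}
```

The extra platform/device boilerplate is the cost of portability: the same host code can target an AMD, Intel, or NVIDIA device, whereas CUDA bakes the NVIDIA-only assumption into the toolchain.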
There are plenty more frameworks out there; some are free (or may even come pre-installed on your computer), while others aren't.