GPGPU opens up with OpenCL 1.0 spec release - 10 December 2008

Earlier this month, the final version of the Apple-backed OpenCL specification was delivered to standards group participants for vetting, and the final OpenCL 1.0 specification has now been released. Simultaneously with the final spec's release, NVIDIA announced full support for it in its GPU products. Now we know why Apple has standardized on NVIDIA GPU hardware across its entire product line.

In a nutshell, OpenCL is a so-called "GPGPU" specification that enables programmers to tap the power of the GPU as a data-parallel coprocessor without having to learn to speak the specialized language of graphics, i.e., OpenGL or a DirectX flavor. NVIDIA had been pushing things in this direction for some time with C for CUDA, and Microsoft is also headed there with DirectX 11 Compute, so it was natural that Apple would move to ensure that its forthcoming "Snow Leopard" version of Mac OS X would sport comparable capabilities.
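To make "data-parallel coprocessor" a bit more concrete, here is a minimal sketch in plain C of the programming model, not actual OpenCL API code. The names `vec_add_item` and `run_data_parallel` are hypothetical and exist only for illustration. The idea is that the programmer writes a small function that computes one output element, and the runtime invokes it once per index. In this sketch a serial loop stands in for the runtime; on a real GPU, many instances would be assumed to run at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element "kernel": computes exactly one output element.
   In a GPGPU runtime, one instance would run for every index i. */
static void vec_add_item(size_t i, const float *a, const float *b, float *out)
{
    out[i] = a[i] + b[i];
}

/* Hypothetical stand-in for the runtime: calls the kernel once per index.
   This loop is serial; real hardware is assumed to run the instances
   concurrently, which is safe because no instance touches another's data. */
static void run_data_parallel(size_t n, const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        vec_add_item(i, a, b, out);
    }
}
```

Calling `run_data_parallel(4, a, b, out)` on two four-element arrays fills `out` with their element-wise sums. Because each call writes only `out[i]`, the order of the calls does not matter, and that independence is what lets a GPU spread the work across its many execution units.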

Apple and NVIDIA collaborated heavily on the development of OpenCL, but (as its name implies) the standard is open and has been vetted and put out under the auspices of the Khronos Group, a consortium of companies that have banded together to develop and promote royalty-free media APIs. AMD has also promised OpenCL support on its hardware; Intel is a Khronos Group member, so it will presumably support OpenCL with Larrabee, as well.

Many of the industries that stand to benefit the most dramatically from GPGPU have been extremely reluctant to invest a lot of development labor in a single vendor's toolchain (i.e., NVIDIA's C for CUDA, which has been the only real game in town). OpenCL gives them an open, multivendor alternative to C for CUDA, although the two specs aren't quite interchangeable. The following comparison of the two was taken from a slide in NVIDIA's OpenCL presentation:

C for CUDA	OpenCL
C with parallel keywords	Hardware API—similar to OpenGL
C runtime that abstracts driver API	Programmer has complete access to hardware device
Memory managed by C runtime	Memory managed by programmer
Generates PTX	Generates PTX

(Note: PTX is the low-level, assembly-like instruction set that CUDA code compiles to. It's the layer that sits closest to NVIDIA's GPU hardware.)

The official spec launches today, and NVIDIA plans to have a beta OpenCL implementation running on its hardware by the first quarter of the coming year, with the final release arriving in the second quarter.

[Figure: OpenCL memory model]

I'm going to speculate that we'll see support for this on consoles before long, with the PlayStation 3 being the most obvious candidate (since it's powered by an NVIDIA GPU). This will give game developers who want to go nuts rethinking the standard SGI render pipeline (I'm thinking of Epic's Tim Sweeney) a cross-platform way to access GPU horsepower.

In connection with this, it's also worth mentioning that OpenCL can take a regular CPU as a target; the design goals listed on slide 13 of the OpenCL slide deck (PDF) are to:

Enable use of all computational resources in a system
Program GPUs, CPUs, Cell, DSP and other processors as peers
Support both data- and task-parallel compute models

Note also the support for "task-parallel compute models." Task-parallel compute models aren't exactly a good fit for a conventional GPU, but they are for Cell and Larrabee.
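For readers who want the distinction spelled out, the following is a rough sketch in plain C of what "task-parallel" means, in contrast to a data-parallel kernel that applies one operation across many elements. It does not use the OpenCL API. The `task` struct, `run_tasks`, and the two example jobs are hypothetical names made up for this illustration. Each task is a different piece of work with its own arguments, and the serial loop stands in for a scheduler that would be assumed to hand the tasks to separate cores.

```c
#include <assert.h>
#include <stddef.h>

/* A task is an independent unit of work: a function plus its argument. */
typedef void (*task_fn)(void *arg);
typedef struct { task_fn fn; void *arg; } task;

/* Two unrelated jobs over the same read-only input. */
typedef struct { const int *values; size_t n; long result; } sum_job;
typedef struct { const int *values; size_t n; int result; } max_job;

static void sum_task(void *arg)
{
    sum_job *j = arg;
    j->result = 0;
    for (size_t i = 0; i < j->n; ++i) {
        j->result += j->values[i];
    }
}

static void max_task(void *arg)
{
    max_job *j = arg;
    assert(j->n > 0);
    j->result = j->values[0];
    for (size_t i = 1; i < j->n; ++i) {
        if (j->values[i] > j->result) {
            j->result = j->values[i];
        }
    }
}

/* Hypothetical stand-in for a scheduler: runs each queued task in turn.
   The tasks share no writable state, so a real runtime could run them
   on different cores at the same time. */
static void run_tasks(const task *tasks, size_t count)
{
    for (size_t t = 0; t < count; ++t) {
        tasks[t].fn(tasks[t].arg);
    }
}
```

Queuing a `sum_task` and a `max_task` over the same array and calling `run_tasks` leaves each job's `result` field filled in. The point of the sketch is that the two tasks execute different code paths, which suits a handful of general-purpose cores better than a wide array of GPU units running in lockstep.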

Quick note: Apple and OpenCL

I've mentioned before that Apple has an internal "GPGPU" group that serves the company's in-house app developers by giving them ways to use the GPU to boost performance. Apple announced that an OpenCL implementation will be a major feature of Snow Leopard, so third-party developers will be able to get the same kinds of GPU-based speedups that internal Apple developers see. This is good for developers, and it's also good for Apple, because having the Mac ecosystem's collective eyes on the code means more improvements in the company's implementation.

Source: http://arstechnica.com/