OpenCL: transforming the industry or another tool in the toolbox?

The use of the Open Computing Language (OpenCL™) has been heralded as an easier method to harness the power of devices which provide hardware acceleration, for use in heterogeneous computing applications. Chris May, an expert in electronics hardware and FPGA design at PA Consulting, discusses the issues.


He writes:

From our experience using OpenCL, we have found that it provides several advantages over traditional development flows. To be suitable for use across a wider range of systems and applications, and to become truly transformational, it needs to develop further, but it is a great start.

Growing processing demands

The demands made by computationally intensive processing applications continue to grow, and can often exceed the performance available within CPU-bound software platforms. Examples of such applications include image and video processing, encryption, radar, signal processing, and high performance computing.

Devices such as DSPs, GPUs and FPGAs can be used as hardware accelerators or compute devices to offload CPUs, and enable significant performance improvements for these applications through the use of specialised hardware processing pipelines or by executing tasks in parallel. Traditionally, it has been difficult to develop designs that include these accelerators and take full advantage of their performance, due to complex device architectures and tool chains.

OpenCL aims to address some of these issues, and is gradually gaining traction. However, while it offers several benefits which transform the development process, developers must keep in mind language, device and system architecture considerations.

Why OpenCL?

OpenCL is a high-level C-based framework intended for the development of heterogeneous computing applications. The OpenCL standard is maintained by the Khronos Group, and as of 2016 there are conformant implementations for CPUs, GPUs, FPGAs and DSPs, as well as some ASICs.

OpenCL allows developers to harness the power of existing compute devices for use alongside CPUs while avoiding some of the low level design and integration tasks. In the past, development work for the components of heterogeneous computing systems was often performed using separate tools, and each stage was then followed by a system integration step. When using OpenCL SDKs, the development, verification and implementation of designs for host and compute devices take place within the same OpenCL IDE as part of a single development flow. An added benefit of this integrated development process is that developers are able to skip porting and rework steps when compiling designs which are created as higher level representations in OpenCL. All of these aspects are intended to yield improved productivity, reduced development time, and reduced time to market.

These claims of increased productivity and reduced development time have been made before. The key question is whether the tools will deliver on these promises. From our experience working on heterogeneous computing applications and working with OpenCL, so far it appears to be on the right track.

Designing with OpenCL

Another benefit of OpenCL is its portability. However, developers must be aware that functional portability may not immediately translate into performance portability. Design performance will depend on the architecture and resources of the compute devices, and on coding style.

Key components of an OpenCL system are the host and the compute device. OpenCL SDKs allow developers to write host code that uses the OpenCL API, and Kernels to process the data on compute devices. The Kernel code is compiled to run on the hardware architecture of the compute device.

The framework supports data and task parallelism, which can be defined for applications by host API calls. Kernel instances can execute concurrently over virtual 1D, 2D or 3D data processing grids to create parallel processing structures.
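To make the grid idea concrete, here is a minimal sketch in plain Python (not OpenCL itself) that emulates how a Kernel body is invoked once per point of a 2D index space. The names `run_ndrange` and `scale_kernel` are hypothetical and chosen for illustration only. In real OpenCL the Kernel would be written in OpenCL C and would query its position with `get_global_id`, and the instances could run concurrently rather than in a loop.

```python
from itertools import product


def run_ndrange(kernel, global_size, *args):
    """Call `kernel` once per point of a 1D, 2D or 3D index space.

    This loop is sequential; a real device may run the instances
    concurrently, so a kernel must not rely on execution order.
    """
    if not 1 <= len(global_size) <= 3:
        raise ValueError("index space must have 1, 2 or 3 dimensions")
    for global_id in product(*(range(n) for n in global_size)):
        kernel(global_id, *args)


def scale_kernel(global_id, src, dst, width, factor):
    """Hypothetical kernel body: each instance handles one pixel."""
    row, col = global_id
    index = row * width + col  # flatten the 2D position into a buffer offset
    dst[index] = src[index] * factor


height, width = 2, 3
image = [1, 2, 3, 4, 5, 6]      # row-major 2x3 "image" buffer
result = [0] * len(image)
run_ndrange(scale_kernel, (height, width), image, result, width, 10)
print(result)
```

Because each instance writes only to its own output element, the result does not depend on the order in which instances run, which is the property that lets a device execute them in parallel.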

When an application is executed, the host triggers processing on the compute device, and data is moved between the host and the device, and often within compute devices, via shared memory buffers.

OpenCL for FPGAs

GPUs were some of the first devices to find use with OpenCL. The next logical step for OpenCL is FPGAs, and the use of these devices opens up a new range of benefits for system designers, including reduced system power envelopes, increased design flexibility, and increased performance.

Before getting started, developers must be aware that OpenCL compilers for FPGAs require a Board Support Package (BSP) to describe the FPGA’s hardware environment. The development of a BSP for a custom board can be a non-trivial design activity. However, several COTS board vendors already provide FPGA OpenCL development platforms along with BSPs in order to accelerate development.

OpenCL SDKs for FPGAs translate Kernel code into custom processing pipelines in the FPGA fabric. This translation process may give FPGAs a performance per watt advantage over competing devices: FPGAs exploit pipeline parallelism, whereas other devices may, for example, exploit SIMD functions based on their fixed internal architecture.
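A back-of-the-envelope model shows why pipelining helps. The sketch below assumes an idealised pipeline with no stalls that accepts one new data item per clock cycle. The function names and the `items` and `depth` values are illustrative assumptions, not figures from any vendor tool.

```python
def sequential_cycles(items, depth):
    """Cycles if each item must pass through all `depth` stages
    before the next item is allowed to start."""
    return items * depth


def pipelined_cycles(items, depth):
    """Cycles for an ideal pipeline: `depth` cycles until the first
    result appears, then one further result per cycle."""
    if items == 0:
        return 0
    return depth + items - 1


items, depth = 1000, 20
seq = sequential_cycles(items, depth)
pipe = pipelined_cycles(items, depth)
print(seq, pipe, round(seq / pipe, 1))
```

In this model the speed-up approaches the pipeline depth as the number of items grows. Real designs typically fall short of the ideal, for example when the pipeline stalls waiting on external memory.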

Another advantage of FPGAs is the ability to select from different types of external memory in order to satisfy the performance demands of the application. For example, Quad Data Rate SRAM is often used in applications which need large buffers of high performance random access memory, whereas SDRAM is better suited to streaming memory traffic patterns.

Upcoming FPGA vendor extensions for OpenCL will provide further benefits by enabling connections to other types of streaming IO interfaces in these heterogeneous systems, e.g. 10 Gigabit Ethernet interfaces. New extensions will also allow developers to increase performance further by defining the topology of data movement between Kernels, and so avoid sharing data through internal memory buffers. The end result should be larger, more integrated and higher performance systems.
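The idea of passing data directly between Kernels can be illustrated with a plain Python analogy, in which generators stand in for Kernels connected by a stream. This is not a vendor extension API. The stage names `producer`, `scale` and `accumulate` are hypothetical, and the point is only that no intermediate buffer holding the full data set is ever created.

```python
def producer(count):
    """Stand-in for a streaming input such as a network interface."""
    for sample in range(count):
        yield sample


def scale(stream, factor):
    """First stage: transform each sample as it arrives."""
    for sample in stream:
        yield sample * factor


def accumulate(stream):
    """Second stage: consume the stream and reduce it to one value."""
    total = 0
    for sample in stream:
        total += sample
    return total


# Samples flow one at a time from stage to stage; nothing is stored
# in a shared buffer between the stages.
total = accumulate(scale(producer(5), 3))
print(total)
```

The topology here is a simple chain, but the same pattern extends to stages that fan out or merge, which is the kind of data movement the extensions aim to let developers describe.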

When an OpenCL design is created for an FPGA, the SDK will generate and include infrastructure to enable communication between the various parts of the design. This feature takes a significant amount of design activity off the developer’s plate. The tool-generated infrastructure can be non-trivial in size, because system and external memory arbiters are included depending on the complexity of the design.

Kernel functions and infrastructure are implemented in the FPGA through the use of pre-verified vendor IP. Developers are able to influence the performance of the design generated by the compiler through the use of tool constraints and coding style, rather than by setting explicit limits as is done in the traditional FPGA build process. Generally, the tool-generated output does not yet match that which can be created by expert developers, but the tools continue to improve.

In addition to the changes involved in the FPGA design process, the use of OpenCL with FPGAs presents some changes to the traditional FPGA design verification flow too. Developers must carefully plan their verification effort, now that a significant portion of the development takes place within the OpenCL SDK instead of in separate simulators and simulation environments.

OpenCL, powerful and useful

OpenCL allows developers to access hardware acceleration features from a wide range of existing devices, to enable the continued scaling of CPU driven computationally intensive processing applications.

OpenCL is a relatively new language, first released in 2008. It has been used for many applications already, and the standard continues to evolve. The SDKs offer some promising features, with upcoming improvements targeted at increased design performance, efficiency and ease of use.

After comparing OpenCL to the other tools that are used in PA Consulting’s advanced software development and electronics systems groups, we have concluded that OpenCL makes developing heterogeneous processing applications easier, and is a great addition to the developer’s toolbox. The tools deliver on the promises of increased productivity and reduced development time too. For OpenCL to be truly transformational and to be used across a wider range of systems, the language and tools must mature further, and the development community should continue to improve them and add new features targeted at further optimisations and the creation of larger, integrated and higher performance systems. However, it is a great start.

 For more information please contact chris.may@paconsulting.com.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

 

PA Consulting

An independent firm of over 2,600 people, we operate globally from offices across the Americas, Europe, the Nordics, the Gulf and Asia Pacific.

We are experts in consumer and manufacturing, defence and security, energy and utilities, financial services, government, healthcare, life sciences, and transport, travel and logistics.

Our deep industry knowledge together with skills in management consulting, technology and innovation allows us to challenge conventional thinking and deliver exceptional results that have a lasting impact on businesses, governments and communities worldwide.

Our clients choose us because we don’t just believe in making a difference. We believe in making the difference.
