TECHNOLOGY IN SYSTEMS
Developing Hybrid Code Using OpenCL
Parallel Computing with AMD Fusion-Based Computer-on-Modules
The integration of powerful graphics processors on the same die with multicore x86 processors is opening new areas of compute-intensive embedded applications.
JOHN DOCKSTADER, CONGATEC
Embedded computing tasks are getting more and more demanding across all applications. The same applies to the processors, which must be flexible and customizable in order to encode or decode a variety of media signals and data formats such as JPEG, MP3 and MPEG2. Depending on the specific requirements, a choice of processor types is available. If the application is highly specific and individual, a digital signal processor (DSP) is a common choice. If the application can be handled by an x86 processor, general-purpose computing on a graphics processing unit (GPGPU) can enhance performance. AMD Fusion-based Computer-on-Modules, which include AMD’s integrated GPGPU, are now appearing on the market and provide compute capabilities beyond the traditional x86 performance scope (Figure 1).
Figure 1: Computer-on-Module concept with AMD Fusion.
For a long time, CPUs have been required to offer dedicated, often parallel, performance for processing complex algorithms on top of high generic, mostly serial, processing power. This is necessary, for instance, when encoding or decoding high-definition video, processing raw data (as in industrial image processing) or performing complex vector calculations in diagnostic medical imaging procedures. Until now, if processed in an x86 design, these tasks required high computing performance at high clock frequencies, resulting in high energy consumption and heat dissipation. While multicore technology and continuous efficiency improvements in processor technology can address these issues to a certain degree, the fact remains that raising the clock rate alone is not enough to meet all application requirements.
For example, high 3D performance is required for appealing animation, visualization and smooth playback of HD content. The graphics core also needs to support the CPU when decoding HD videos—something that is of particular importance in medical technology, as in 4D ultrasound or endoscopy, and also in infotainment applications. The closer the embedded application gets to the consumer sector, the higher the user expectations.
For this reason, AMD has combined both technologies in one package with the release of the embedded G-Series and R-Series platforms. Users can now take advantage of an extremely powerful graphics unit with highly scalable processor performance. The so-called Accelerated Processing Unit (APU) combines the serial processing power of the processor cores with the parallel processing power of the graphics unit. This signals an end to the previous software-based division of tasks between the processor and the graphics unit. Simply put, the processor cores can offload parallel tasks to the graphics unit, thereby increasing the overall performance of the system far beyond what has previously been possible.
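How much an offload of this kind can help depends on how much of the workload is actually parallel. The sketch below estimates the overall gain with Amdahl's law, a standard model that the article itself does not cite. The function name `offload_speedup` and the figures used (80% parallel work, a 10x faster graphics unit) are hypothetical values chosen for illustration, not measurements of any AMD platform.

```python
def offload_speedup(parallel_fraction, accel_factor):
    """Amdahl's law: overall speedup when only the parallel part is accelerated."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accel_factor <= 0:
        raise ValueError("accel_factor must be positive")
    serial_part = 1.0 - parallel_fraction  # still runs on the x86 cores
    return 1.0 / (serial_part + parallel_fraction / accel_factor)


# Hypothetical workload: 80% of the run time is data-parallel and the
# graphics unit executes that part 10 times faster than the CPU cores.
print(round(offload_speedup(0.8, 10.0), 2))  # 3.57
```

Under these assumptions the serial 20% dominates the remaining run time, which is why the balance between CPU cores and graphics unit matters more than either figure alone.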
Driven by the consumer market, the performance of graphics cores has steadily increased. In particular, the 3D representation of virtual worlds has pushed the specialization of graphics cards and created a demand for high parallel processing capacity. Because graphics workloads are so varied, ranging from texture and volume calculations to 3D modeling for collision detection and vertex shaders for geometry calculations, the functions are no longer firmly cast in hardware but are freely programmable. As a consequence, advanced graphics units provide an enormous and highly flexible performance potential.
With the help of GPGPUs, this potential can be used not just for the calculation and representation of graphics, but also for data processing. Possible uses include the calculation of 3D ultrasound images in medical applications, face recognition in the security sector, industrial image processing and data encryption or decryption. Certain types of data, such as data from sensors, transducers, transceivers and video cameras, can be processed faster and more efficiently with dedicated processing cores than with the generic serial computing power of x86 processors. This works because, for a GPGPU, it is irrelevant whether the data is generated purely virtually by the program code or supplied by an external source. So it makes good sense to unite the CPU and GPU in an APU for an even stronger team (Figure 2).
Figure 2: The AMD R-Series integrates two to four x86 cores along with a SIMD parallel processing engine that was originally designed for high-end graphics but can also be used for numerically intensive parallel operations.
What matters is not so much CPU performance as APU performance. This means OEMs and users need to say goodbye to the phrase “excellent CPU performance,” because processing power is no longer defined by the CPU alone. These days the graphics unit plays a crucial role as well. In addition to the pure representation of graphics, it is already used in mass applications such as the filtering algorithms of photo editing programs like Photoshop, programs for encoding and converting video data, and Adobe Flash Player. In the past, developers struggled with the fact that traditional CPU architectures and programming tools were of limited use for vector-oriented data models with parallel multi-threading. With the introduction of AMD Fusion technology, that hurdle has been overcome. Easy-to-use APIs such as Microsoft DirectCompute or OpenCL, which are supported by the AMD Fusion technology, enable application developers to efficiently harness the power of the graphics core of the APU for a variety of tasks beyond imaging, provided, of course, that the graphics core supports it.
The AMD embedded G-Series and R-Series platforms, with integrated graphics, do exactly this and AMD offers software development kits for it. This makes moving to a new type of data processing easier than ever before.
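The data-parallel model behind APIs such as OpenCL can be illustrated without any GPU at all. In real OpenCL the kernel is written in OpenCL C, each work-item obtains its index with `get_global_id(0)`, and the host launches the kernel with `clEnqueueNDRangeKernel`. The sketch below is only a pure-Python emulation of that idea, not OpenCL code: `threshold_kernel` and `run_kernel` are hypothetical names, and the loop is a serial stand-in for what the graphics core would execute in parallel.

```python
def threshold_kernel(gid, pixels, out, level):
    """Body of one work-item: binarize a single pixel.

    gid plays the role that get_global_id(0) plays in an OpenCL C kernel.
    """
    out[gid] = 255 if pixels[gid] >= level else 0


def run_kernel(kernel, global_size, *args):
    """Serial stand-in for a kernel launch: one call per work-item index."""
    for gid in range(global_size):
        kernel(gid, *args)


# Hypothetical 8-bit grey values, e.g. one row of a camera image.
pixels = [12, 200, 90, 255, 0, 128]
out = [0] * len(pixels)
run_kernel(threshold_kernel, len(pixels), pixels, out, 128)
print(out)  # [0, 255, 0, 255, 0, 255]
```

Because each work-item touches only its own element, the order of the calls does not matter. That independence is what lets a graphics core spread the work across its engines.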
In signal processing, a GPU covers a specific application area. Even though a GPU has fewer engines than a DSP, it wins on programmability. The individual engines can be used flexibly and can be allocated to different tasks. For example, out of a total of up to 80 possible engines, it is possible to use 30 in parallel for fast Fourier transform (FFT), 20 for JPEG and another 30 for MPEG2 encoding.
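The allocation example above is simple budget arithmetic: 30 + 20 + 30 = 80 engines. A minimal bookkeeping sketch follows. The function `allocate_engines` and the contiguous index ranges are assumptions made for illustration. Real GPU runtimes schedule work onto engines themselves and do not necessarily expose this kind of explicit assignment.

```python
def allocate_engines(total, requests):
    """Split an engine budget among tasks; fail if the budget is exceeded.

    Returns a plan mapping each task to a range of engine indices,
    plus the number of engines left idle.
    """
    needed = sum(requests.values())
    if needed > total:
        raise ValueError(f"requested {needed} engines, only {total} available")
    plan, start = {}, 0
    for task, count in requests.items():
        plan[task] = range(start, start + count)
        start += count
    return plan, total - needed


# The split from the text: FFT, JPEG and MPEG2 encoding share 80 engines.
plan, idle = allocate_engines(80, {"fft": 30, "jpeg": 20, "mpeg2": 30})
for task, engines in plan.items():
    print(task, engines.start, engines.stop - 1)
print("idle engines:", idle)
```

With these numbers the three tasks use the whole budget, so any further task would have to wait or take engines from one of the others.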
For specific tasks, a GPU is therefore more efficient than a DSP. In general, applications with smaller data volumes and simple algorithms are the best fit, because they avoid overloading the system and memory bus.
Good examples from the medical industry are portable ultrasound devices with low imaging rates or image analysis equipment. Another very exciting application is in the multiple security checks used to validate the authenticity of banknotes. In these applications, the developer is not tied to existing algorithms, but can program his or her own security mechanisms.
A classic DSP is often used for smaller applications such as seamless processing of digital audio or video signals. A distinction is primarily made between floating-point and fixed-point DSPs. The DSP is optimized for a single operation, is massively parallelized and achieves a fast execution speed. Typical applications include mixing consoles for sound manipulation, hard drives or speaker crossovers.
In the future, GPGPUs will be able to fulfill even more of the classic functions of DSPs. But it is also clear that a pure DSP application will not be replaced by a GPGPU (Figure 3). For a GPGPU to perform digital signal processing effectively, the application has to support typical computing features.
Figure 3: AMD Fusion GPU architecture. In addition to the integrated GPGPU, an external graphics processor or DSP can be attached for specialized tasks.
GPGPUs also work for “simple” embedded computing tasks. AMD Fusion technology is not positioned exclusively for specialized applications. On the contrary, COM Express Computer-on-Modules from congatec with AMD Fusion can be used across the entire embedded computing spectrum. Thanks to high scalability, ranging from single-core to quad-core processors based on the AMD R-Series, the new AMD platform covers approximately 80% of all application requirements in the embedded market, from low power right through to high performance applications. Breaking down the performance spectrum to known standards, we can also say that the AMD embedded G-Series platform is scalable for solutions requiring anything between an Intel Atom and an Intel Core i5 dual-core processor.
It is important to note that this performance comparison does not take into account the superior graphics performance, which thanks to the GPGPU can also be used for other embedded computing tasks. So depending on the application, the performance potential may be much higher still. OEMs can therefore implement their entire product range on the basis of a single processor architecture, regardless of the specific sector. This not only reduces development time, but also simplifies supply chain and lifecycle management and reduces the associated costs. For OEMs and developers who prefer to use core computing components without much design effort and who strive to optimize their supply chain management by using highly flexible COTS platforms, Computer-on-Modules are the appropriate solution.
San Diego, CA.