Integrating for Parallelism, Performance and Power—A Dance with Complexity


  • Page 1 of 1
    Bookmark and Share

The effects of integration on high performance embedded computing are definitely producing high performance. While this may be basically enabled by Moore’s law, it has unleashed a wide range of creativity. And like every other outbreak of innovation, these developments are taking markedly different directions. We have only to remember the over 100 schemes for switched fabrics that emerged some ten years ago, which ultimately resulted in the much smaller number used today, to appreciate what a positive thing this is.

Time will tell how this all plays out, but from here it looks like the ability to integrate really powerful hardware performance while maintaining a high degree of configurability and programmability is poised to push the ASIC into ever more rarified zones of high volume and special needs. With the development time for a highly integrated specialized ASIC stretching as long as four years (!), these other choices are going to look increasingly attractive.

In what may appear to be a somewhat subjective classification, I see this generation of highly integrated core devices breaking out in a number of ways, some of which appear to be the integration of what were once distinct devices on a board or module. There are, for example, the now fairly well known integrations of multicore ARM processors in the same silicon die with a set of their standard peripherals and an FPGA fabric along with, in some cases, additional analog components. These offerings come notably from Altera, Xilinx and Microsemi.

Then we have other offerings from companies such as NVIDIA and AMD that integrate multicore CPUs on the same die with very powerful and parallel general-purpose graphics processing units (GPGPUs) tightly integrated with the CPUs. These GPGPUs are designed to do very high-level graphics, video and machine vision processing—tasks that also often involve intensive mathematical operations, all of which lend themselves to be executed with a high degree of parallelism.

Next there are families of highly integrated microcontrollers that incorporate CPU cores along with a highly integrated set of on-chip peripherals, memory, memory interfaces and graphic processors along with internal buses. Families like the PIC32MZ from Microchip and the Atom Z36xxx and Z37xxx (formerly “Bay Trail”) from Intel come with versions that provide different combinations of internal functions that the designer can select from to best fit his or her needs.

Finally, there are multicore processors that replicate CPU cores with identical instruction sets in devices with two to ten or more cores. These include multicore processors from AMD, Intel, ARM partners and many more. The CPU/FPGA, GPGPU and multicore directions have in common the fact that they are trying along different approaches to increase performance by offering parallelism in terms of the programmable fabric, the parallel architecture of the GPGPU, or the multicore architecture.

One general observation about these different approaches is that they appear to involve different levels and complexities of software issues. Perhaps the most difficult and as of yet not fully solved hurdle comes with the CPU/FPGA combinations. Here we are bringing together two different disciplines of programmable devices that traditionally have been programmed by their own specialists. Individual manufacturers do supply tools, but there is so far no overall programming/configuration or analysis tool methodology that applies to all of them.

The CPU/GPGPU approach fares better in that there are tools and software platforms that let developers express themselves in an extended world of the C/C++ language. NVIDIA has developed the CUDA platform for its Kepler architecture, which will parallelize C code intended for execution on the GPGPU. AMD has selected the OpenCL platform developed for this same purpose to use on its graphics coprocessors to implement parallel mathematical operations. OpenCL also has the advantage that it is starting to be used for programming parallel operations in FPGAs as well.

The world of advanced microcontrollers can be programmed in a single language, as long as the manufacturers provide drivers for their internal peripherals as they of necessity must do. The “homogenous” multicore processors enjoy several alternatives. They can be programmed with a single operating system and a single language, or they can make use of such things as hypervisors and virtualization to accommodate multiple OSs. Such devices also lend themselves to having what would otherwise be special hardware peripherals implemented in software instead.

Integrating diverse hardware elements also has implications for complexity across interfaces via different protocols as well as obstacles to scalability. These are just an indication of some of the issues that will face developers pursuing greater device integration as we move through an exciting and promising period of innovation.