INDUSTRY INSIGHT

From Multiprocessor to Multicore

Moving from Multiprocessors to Multiple Cores

The advent of powerful multicore architectures like the Cell Broadband Engine can significantly enhance applications that were already boosted by multiprocessor approaches. The trick lies in knowing how to optimize the newly available resources.

WILLIAM LUNDGREN, KERRY BARNES AND JAMES STEED, GEDAE

Using multiple processing elements has become essential to software development, and a variety of multicore and DSP processors are available. While each processing core can perform a variety of tasks, some processing elements are better suited to certain tasks than others. With traditional development methods, the choice of processor for each task must be made at the beginning of development. Making this choice early allows the partitioning and mapping of work to processors to be planned before coding starts, which minimizes risk to the project. However, this preplanning requires considerable technical experience and insight into both the type of problem and the capabilities of the processors. The sense of experimentation that draws most engineers and programmers into science is constrained by the structure needed to improve the chances of bringing an expensive project to fruition.

Other options exist. Software development tools are available that automate the implementation of distributed software. Working from a model of the software that can be constructed on a single workstation, such a tool generates the separate threads and executables that make up the parallel implementation, and many types of processors can be supported using the same infrastructure. With these tools, the distribution of work to processors, and even the choice of processors themselves, can be delayed until the final stages of software development. Through experimentation and analysis, engineers can find the optimum implementation. This not only enables the search for better software but also reduces risk to the project, because implementation parameters that used to be set in stone before coding can be altered iteratively.

One example of the benefits of this approach is recent work to move a synthetic aperture radar (SAR) benchmark from a quad PowerPC DSP system to the Cell Broadband Engine (Cell/B.E.) processor. The SAR algorithm consists of three main components: range processing, a matrix transpose and azimuth processing. The range and azimuth processing contain many compute-intensive vector operations, including FFTs, inverse FFTs and vector multiplies. The work of the range and azimuth processing can be easily distributed to multiple processors, but doing so requires the matrix transpose to be distributed as well, an operation called a “corner turn.”
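
The three-stage structure can be sketched in a few lines of Python. This is only an illustration, not the benchmark's code: the article does not give the exact sequence of operations inside each stage, so the sketch assumes each stage is an FFT, a vector multiply by a reference spectrum, and an inverse FFT. The function names and the naive DFT (standing in for a real FFT) are likewise assumptions.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform, standing in for an FFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(x):
    """Naive inverse DFT, standing in for an inverse FFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]

def filter_rows(image, reference):
    """Assumed stage shape: FFT each row, multiply by a reference, inverse FFT."""
    out = []
    for row in image:
        spectrum = dft(row)
        out.append(idft([s * r for s, r in zip(spectrum, reference)]))
    return out

def transpose(image):
    """Matrix transpose; the distributed form of this step is the corner turn."""
    return [list(col) for col in zip(*image)]

def sar_pipeline(image, range_ref, azimuth_ref):
    ranged = filter_rows(image, range_ref)    # range processing, row by row
    turned = transpose(ranged)                # matrix transpose
    return filter_rows(turned, azimuth_ref)   # azimuth processing on the former columns
```

Each row is processed independently in `filter_rows`, which is why the range and azimuth stages split easily across processors. Only `transpose` needs data from every row at once.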

The existing SAR benchmark was implemented in Gedae, a programming language and multithreading compiler that enables experimentation with many different processors and processor topologies. Gedae was used to generate an implementation for the quad PowerPC system, as shown in Figure 1. Each PowerPC in the system runs at 500 MHz and has 256 Mbytes of memory. While the 500 MHz processors are several years old, the ample memory allows the large SAR images to be processed one at a time; in other words, once distributed, one SAR image easily fits in the four memories. Because of this, the corner turn is implemented simply by sending the i-th section of the subimage on the j-th processor to the j-th section of the subimage on the i-th processor, a trivial implementation of a distributed matrix transpose. The quad PowerPC implementation achieves a frame rate of 3 Hz.
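
The section exchange described above can be simulated on one machine. The following sketch assumes that each processor holds a contiguous block of rows, that a "section" is a contiguous group of columns, and that the image dimensions divide evenly by the number of processors. It also assumes each section is transposed locally on arrival, which the description implies but does not state. The function names are hypothetical, and this is not the Gedae-generated code.

```python
def split_rows(image, p):
    """Give each of p simulated processors a contiguous block of rows."""
    rows_per = len(image) // p
    return [image[k * rows_per:(k + 1) * rows_per] for k in range(p)]

def corner_turn(subimages):
    """Transpose an image that is distributed by rows across the processors.

    Section i of processor j's subimage becomes section j of processor i's
    subimage, with each section transposed locally on arrival.
    """
    p = len(subimages)
    cols_per = len(subimages[0][0]) // p
    # result[i] holds processor i's rows of the transposed image.
    result = [[[] for _ in range(cols_per)] for _ in range(p)]
    for i in range(p):        # destination processor
        for j in range(p):    # source processor
            # Section i of the subimage on processor j.
            tile = [row[i * cols_per:(i + 1) * cols_per] for row in subimages[j]]
            # Transpose the tile and append it as section j on processor i.
            for r, col in enumerate(zip(*tile)):
                result[i][r].extend(col)
    return result

# Usage: an 8 x 12 image spread over four simulated processors.
image = [[r * 12 + c for c in range(12)] for r in range(8)]
turned = corner_turn(split_rows(image, 4))
```

Concatenating the blocks in `turned` gives the ordinary transpose of `image`. On real hardware the inner loop body would be a message or DMA transfer between processors. The exchange is trivial here because every processor has room for its whole subimage plus the incoming sections.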

Using traditional development techniques, re-implementing this application on the Cell/B.E. processor would be a significant programming project. The Cell Broadband Engine Architecture is a heterogeneous multicore architecture developed through a collaboration among Sony, Toshiba and IBM. The current implementation of the Cell/B.E. processor combines one Power Processing Element (PPE) with eight identical Synergistic Processing Elements (SPEs), as shown in Figure 2. The PPE is a dual-threaded PowerPC core, and each SPE contains a high-speed processor with its own 256 Kbyte local store and DMA (Direct Memory Access) engine. Using the SPEs effectively is a key programming challenge when targeting the processor. While processing can be placed on both PPE threads, the power of the processor is only unleashed when the SPEs are heavily utilized, and using the SPEs heavily means the software developer must overcome the hurdle of each SPE’s 256 Kbyte local store.
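
A rough budget shows how tight the 256 Kbyte local store is. The sketch below rests on assumptions that are not in the article: samples are single-precision complex values of 8 bytes each, part of the local store is reserved for code and stack, and the data buffers are double-buffered so DMA transfers can overlap computation. The function name and the 64 Kbyte reservation in the usage line are illustrative only.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local store size, from the article

def max_samples_per_buffer(reserved_bytes, buffers, bytes_per_sample=8):
    """Largest buffer, in samples, that fits once code and stack are reserved.

    reserved_bytes   -- assumed space set aside for program code and stack
    buffers          -- number of equally sized data buffers held at once
    bytes_per_sample -- 8 assumes single-precision complex samples
    """
    available = LOCAL_STORE_BYTES - reserved_bytes
    if available <= 0:
        raise ValueError("nothing left for data after the reservation")
    return available // (buffers * bytes_per_sample)

# Usage: reserve a hypothetical 64 Kbytes and double-buffer input and output.
samples = max_samples_per_buffer(reserved_bytes=64 * 1024, buffers=4)  # 6144
```

Under these assumptions each buffer holds 6,144 samples, and even an empty local store would hold only 32,768. An SPE therefore works on a small strip of the image at a time rather than a whole subimage, unlike a PowerPC with 256 Mbytes of memory.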
