Who’s Afraid of Asymmetric Multiprocessing?

Lately, the idea of symmetric multiprocessing has been gaining popularity in the embedded world. But there’s also a growing amount of buzz around an alternative called asymmetric multiprocessing for embedded systems.


Symmetric multiprocessing (SMP) is a well-known way to make a number of software tasks run concurrently by distributing the processing load among multiple CPUs. SMP designs typically have a number of general-purpose processors connected through a bus to a large shared memory. It is the job of an operating system for SMP to spread the processing load among the processors, in order to accelerate the execution of application software. Such SMP designs are typically limited to aggregations of general-purpose processors that are identical in functionality. In addition, SMP is not well suited for many applications that require predictable, real-time response. It does not scale well to very large systems, and it lacks fault-tolerant features.

Asymmetric multiprocessing (AMP) addresses these limitations by taking an alternative approach to distributing the processing load. In AMP, separate specialized processors are used for specific groups of processes or tasks. The processors might be identical, or they might differ considerably in their architectures, functionality and interfaces. They may be on separate silicon chips, or they may be cores sharing a single chip and package. Examples of such multi-core devices include Texas Instruments’ OMAP processors, Broadcom’s SiByte family of processors and Freescale’s 8641D dual-core PowerPC-based SoC.

Specialized RTOSs Provide Basic Services

Within each processor in an AMP configuration, basic Real-Time Operating System (RTOS) services are needed. These basic services are provided in an RTOS kernel (Figure 1). Task scheduling is shown at the center of the RTOS kernel. Most operating systems for embedded applications control the execution of application software by using priority-based preemptive scheduling. The RTOS’s task scheduler allows tasks to run and switches among them according to one rule: the highest-priority task that is ready to run should always be the task that is actually running.
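The scheduling rule can be illustrated with a toy model. The sketch below is not taken from any particular RTOS: the `Scheduler` class, its method names, and the convention that a lower number means a higher priority are all assumptions made for illustration.

```python
import heapq


class Scheduler:
    """Toy priority-based preemptive scheduler (hypothetical, for illustration).

    Assumption: a lower number means a higher priority.
    """

    def __init__(self):
        self._ready = []     # min-heap of (priority, arrival order, task name)
        self._order = 0      # tie-breaker: earlier arrivals run first
        self.running = None  # entry of the task that currently owns the CPU

    def make_ready(self, name, priority):
        """A task becomes ready; preempt the running task if it is outranked."""
        entry = (priority, self._order, name)
        self._order += 1
        if self.running is None:
            self.running = entry
        elif priority < self.running[0]:
            heapq.heappush(self._ready, self.running)  # preempted, still ready
            self.running = entry
        else:
            heapq.heappush(self._ready, entry)
        return self.running[2]

    def block_running(self):
        """The running task waits for something; run the next ready task."""
        self.running = heapq.heappop(self._ready) if self._ready else None
        return self.running[2] if self.running else None


sched = Scheduler()
trace = [
    sched.make_ready("logger", 5),   # only ready task, so it runs
    sched.make_ready("sampler", 1),  # higher priority: preempts logger
    sched.make_ready("display", 3),  # outranked by sampler: waits
    sched.block_running(),           # sampler blocks, display runs
    sched.block_running(),           # display blocks, logger resumes
]
print(trace)
```

Each entry in `trace` is the task that owns the processor after the corresponding event, so the list shows the rule at work: the running task is always the highest-priority task that is ready.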

The second main section of an RTOS kernel is inter-task communication and synchronization. Inter-task communication and synchronization mechanisms are necessary in preemptive multitasking, because without them tasks could communicate corrupted information or otherwise interfere with one another. Most RTOS kernels offer a variety of inter-task communication and synchronization mechanisms that may include message queues, pipes, semaphores, mailboxes, event groups and asynchronous signals. It is often confusing for a software designer to choose the appropriate mechanism for a particular application need. For AMP systems, a uniform and straightforward inter-task communication model focused on asynchronous message passing is recommended, as shown in Figure 2. This uniform communication model can be used both for information transfer between tasks on the same processor, and for information transfer between tasks on different processors.

Asynchronous message passing is a simple and intuitive, loosely coupled approach to information transfer from task to task. A task sending a message does not wait for anything from the receiver task, and thus is neither blocked nor brought down if the receiver task has failed or become inaccessible. It also avoids many of the complexities and pitfalls associated with semaphores and mutexes. It is an elegant conceptual “gateway” to multi-core, fault-tolerant and AMP embedded systems design. These basic kernel services (task scheduling and inter-task communication) have predictable real-time response times when provided by an RTOS kernel.
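A minimal sketch of this fire-and-forget behavior follows. The `Task` class, its mailbox, and the decision to silently drop messages addressed to a failed task are assumptions of the sketch, not the API of any specific kernel.

```python
from collections import deque


class Task:
    """A task with a mailbox (hypothetical model, not a real RTOS API)."""

    def __init__(self, name):
        self.name = name
        self.mailbox = deque()  # messages queued for this task
        self.alive = True       # False models a failed or unreachable task

    def send(self, receiver, signal, payload=None):
        """Fire-and-forget: queue the message and return immediately.

        The sender never waits for a reply. If the receiver is dead the
        message is simply dropped (an assumption of this sketch).
        """
        if receiver.alive:
            receiver.mailbox.append((self.name, signal, payload))

    def receive(self):
        """Return the oldest queued message, or None if the mailbox is empty."""
        return self.mailbox.popleft() if self.mailbox else None


sensor = Task("sensor")
logger = Task("logger")

sensor.send(logger, "TEMPERATURE", 36.6)
sensor.send(logger, "TEMPERATURE", 36.7)
first = logger.receive()  # messages arrive in the order they were sent

logger.alive = False                      # the receiver fails ...
sensor.send(logger, "TEMPERATURE", 36.8)  # ... the sender carries on regardless
```

In a real kernel the receiving task would normally block until a message arrives; the sketch returns `None` instead so that it can run in a single thread.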

In an AMP configuration, the actual underlying RTOS software may be different on the different participating processors, because of their differing silicon architectures, functionalities and interfaces. However, there would be great advantages if those RTOSs were to offer application software developers a uniform application programming interface (API) through which their tasks could request RTOS services, whether running on a DSP, a control processor, or a network communication processor. Such a uniform API makes large and complex software systems for AMP conceptually simpler to design. It also opens up the enticing possibility of moving software that was originally intended to run on one type of processor over to other types of processors.

If the processors differ in relatively minor ways such as their I/O interfaces, the migrated software might run without modification. If the processors are more fundamentally different, recompilation and re-linking could be required, as might some modification of source code to cope with differences such as word length or endianness. RTOS service calls would not need modification. For example, a control algorithm that was originally written to run on a DSP could be dynamically loaded and activated on a higher-level control processor and run there instead, if the DSP’s signal processing burden were to become overly heavy at times.
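The endianness issue is easy to demonstrate. The snippet below uses Python's standard `struct` module to show how the same 32-bit value is laid out on little-endian and big-endian processors, and what happens when one layout is read as the other. The helper names `encode_sample` and `decode_sample` are hypothetical, and fixing a single agreed byte order is only one common remedy, not necessarily the one a given RTOS uses.

```python
import struct

reading = 0x12345678  # a 32-bit sample value

little = struct.pack("<I", reading)  # byte order of a little-endian core
big = struct.pack(">I", reading)     # byte order of a big-endian core

# Reading little-endian bytes as if they were big-endian scrambles the value.
misread = struct.unpack(">I", little)[0]


def encode_sample(value):
    """Serialize a 32-bit unsigned sample in an agreed (big-endian) order."""
    return struct.pack(">I", value)


def decode_sample(data):
    """Inverse of encode_sample; gives the same result on any host."""
    return struct.unpack(">I", data)[0]


print(little.hex(), big.hex(), hex(misread))
```

Code that always goes through an explicit encode and decode step like this can move between processors of different byte order without source changes.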

Linking the Processors

RTOS kernels focus on basic services within a single processor in an AMP environment. Support for communication between tasks residing on different processors is normally not located there. Instead, support for such inter-processor links is provided by an optional add-on RTOS component called a link handler.

Link handlers offer an elegant and very general service for asynchronous message delivery between applications running on different processors. The link handler does so by using the same asynchronous message passing model that is used in kernel-based intra-processor task-to-task communication, and extending it into the realm of distributed and multi-core multiprocessor systems. It is not in any sense a master-slave model, but rather implements a peer-to-peer relationship among the various processors in the system: If a task or a processor fails, the failure is not propagated to other processors. Other tasks and processors can continue working unhindered.

Link handlers do not require that application software “understand” the structure of a distributed or multi-core system. Nor does application task code need to know whether or not it is operating in a distributed environment at all. Rather, the location of a task’s communication partner(s) is transparent to the application software: Application software is concerned only with the passing of messages from task to task. The link handler takes care of message delivery if the destination is on a different processor than the message sender—making such message communication transparent across processor boundaries.
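Location transparency can be sketched in a few lines. In the model below, which is an illustration and not the interface of any actual link handler, each `Processor` keeps mailboxes for its local tasks and a routing table for tasks on linked processors. The class, its methods and the task names are all hypothetical.

```python
from collections import deque


class Processor:
    """One processor: local mailboxes plus a link-handler routing table."""

    def __init__(self, name):
        self.name = name
        self.mailboxes = {}  # local task name -> queue of messages
        self.remote = {}     # remote task name -> processor hosting it

    def create_task(self, task):
        self.mailboxes[task] = deque()

    def link(self, other):
        """Link-handler setup: learn each other's tasks (both directions)."""
        for a, b in ((self, other), (other, self)):
            for task in b.mailboxes:
                a.remote[task] = b

    def send(self, dest, message):
        """The one call application code uses, wherever `dest` lives."""
        if dest in self.mailboxes:   # same processor: kernel delivers
            self.mailboxes[dest].append(message)
        elif dest in self.remote:    # other processor: link handler delivers
            self.remote[dest].mailboxes[dest].append(message)
        # Unknown destination: dropped here, so the sender is not disturbed.

    def receive(self, task):
        box = self.mailboxes[task]
        return box.popleft() if box else None


dsp = Processor("dsp")
cpu = Processor("cpu")
dsp.create_task("filter")
cpu.create_task("analysis")
cpu.create_task("display")
dsp.link(cpu)  # tasks must exist before linking in this sketch

dsp.send("analysis", "spectrum ready")  # crosses the processor boundary
cpu.send("display", "redraw")           # stays on one processor
cpu.send("filter", "change gain")       # crosses in the other direction
```

The application-level call is `send(dest, message)` in all three cases; only the routing underneath differs.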

Link handlers need to run on all processors in an AMP system where application tasks wish to communicate with application tasks on other processors. They provide the logical channels that can pass messages across processor boundaries in ways transparent to application software (Figure 3). To the programmers of the application tasks, message passing appears to be as simple as in Figure 2, even though the application tasks shown there could be running on three different physical processor cores.

When using link handlers, the physical channels that connect the processors can be any of a wide variety of networking, serial or bus links—and, of course, also shared memory if available. Link handler technology scales well to large systems, and also provides a number of fault-tolerance features.

This approach makes it possible to incorporate a wide range of processors in AMP systems using the link handler-based transparent communication model. For example, link handlers are available for RTOSs that run on a variety of DSPs, control processors and network communication processors. It is also possible to design an AMP system based on a combination of RTOSs and non-real-time operating systems and processors—some processors running perhaps Linux or Solaris or Win32, while other processors run RTOSs. This can be done using a link handler-based facility called a gateway.

A gateway enables RTOS-style asynchronous message passing for AMP systems that use multiple operating systems on multiple processors of heterogeneous design, providing transparent communications between RTOS-based processors and processors running other real-time or non-real-time operating systems (Figure 4). Gateway daemon software executes on the non-RTOS (in this example, Linux) platform, and native applications there use the gateway client library to communicate directly with RTOS tasks that are on RTOS-based computers. From the standpoint of the RTOS-based processors, the gateway gives the appearance that the non-RTOS tasks are, for all intents and purposes, RTOS tasks.
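To give a feel for the division of labor, the sketch below models a client library that marshals a native message into a byte frame and a daemon that unpacks the frame and queues it for an RTOS task. The frame layout (a 16-bit signal number and a 16-bit payload length, both big-endian), the class and the function names are assumptions made for illustration; they are not the actual gateway protocol or client API.

```python
import struct
from collections import deque

HEADER = struct.Struct(">HH")  # signal number, payload length (assumed layout)


def pack_frame(signal, payload):
    """Client-library side: marshal a message into a byte frame."""
    return HEADER.pack(signal, len(payload)) + payload


def unpack_frame(frame):
    """Daemon side: recover (signal, payload) from a byte frame."""
    signal, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    return signal, payload


class GatewayDaemon:
    """Stand-in for the daemon that relays frames toward RTOS tasks."""

    def __init__(self):
        self.rtos_mailboxes = {}  # RTOS task name -> queue of (signal, payload)

    def register(self, task):
        self.rtos_mailboxes[task] = deque()

    def forward(self, task, frame):
        self.rtos_mailboxes[task].append(unpack_frame(frame))


def client_send(daemon, task, signal, payload):
    """What a native (non-RTOS) application would call in this sketch."""
    daemon.forward(task, pack_frame(signal, payload))


daemon = GatewayDaemon()
daemon.register("alarm_task")
client_send(daemon, "alarm_task", 7, b"\x01\x02")
delivered = daemon.rtos_mailboxes["alarm_task"].popleft()
```

The RTOS task sees an ordinary (signal, payload) message and has no need to know that the sender was a native application on another operating system.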

Medical Monitor in a “Dick Tracy” Wristwatch

Some of us are old enough to remember the comics in which Dick Tracy was a technically savvy police detective who communicated using a wireless telephone (and television) built into his wristwatch. We know that modern technology still hasn’t quite achieved this futuristic dream. But along those same lines, we can also imagine a portable medical monitoring instrument small enough to be contained within a wristwatch, based upon asymmetric multiprocessing system architecture in a single multi-core chip.

Figure 5 shows the multi-core chip for this portable medical monitor. It contains a DSP core, shown on the left, which is responsible for medical data acquisition and real-time signal analysis. It also contains a control and communication processor core, shown on the right, which is responsible for higher-level medical analyses, alarm situation detection and communication with the outside world. A detailed assignment of tasks and applications to the processor cores is shown in the figure.

The DSP core of the multi-core chip is used for sampling and signal processing of real-time medical data such as heartbeats, blood oxygen, blood CO2, peripheral body temperature and respiration. Its RTOS makes it possible for its medical signal processing tasks to send their results onward for further processing using RTOS-style message passing. This further processing is done on the ARM core, the control and communication processor that is the second core on the chip. Jobs such as meticulously detailed cardiac arrhythmia analysis, metabolic analyses, alarm detection and database management are performed there.

In addition, the second processor handles such functions as driving the graphical user interface on the wristwatch’s LCD, and communicating via a wireless interface with the “outside world.” This link to the outside world might be via Bluetooth to a nearby cell phone in the user’s pocket, which could then report medical data and/or medical emergencies to anywhere in the world. It could also be used to receive medical or technical inputs from anywhere in the world. For example, a cardiologist in Buenos Aires could instruct the cardiac arrhythmia analysis software to be particularly sensitive to certain cardiac rhythm anomalies such as ventricular tachycardia, based on that cardiologist’s knowledge of the particular person’s past medical history.
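The data flow of the wristwatch monitor can be sketched as two tasks joined by a message queue. Everything here is hypothetical: the message names, the `AnalysisTask` class, and the 150 beats-per-minute default threshold, which is an illustrative number and not a clinical recommendation. A simple heart-rate limit stands in for the far more detailed arrhythmia analysis the article has in mind.

```python
from collections import deque

to_analysis = deque()  # stands in for the message path from DSP core to ARM core


def dsp_task(beat_times_s):
    """DSP side: turn beat timestamps (seconds) into a heart-rate message."""
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    bpm = 60.0 / (sum(intervals) / len(intervals))
    to_analysis.append(("HEART_RATE", bpm))


class AnalysisTask:
    """ARM side: alarm detection with a remotely adjustable threshold."""

    def __init__(self, high_bpm=150.0):  # illustrative value only
        self.high_bpm = high_bpm
        self.alarms = []

    def handle(self, message):
        kind, value = message
        if kind == "SET_HIGH_BPM":  # e.g. sent on a clinician's instruction
            self.high_bpm = value
        elif kind == "HEART_RATE" and value > self.high_bpm:
            self.alarms.append(value)

    def run(self, mailbox):
        """Process every message currently queued."""
        while mailbox:
            self.handle(mailbox.popleft())


analysis = AnalysisTask()
dsp_task([0.0, 0.5, 1.0, 1.5])  # 120 beats per minute: below 150, no alarm
analysis.run(to_analysis)
alarms_before = list(analysis.alarms)

to_analysis.append(("SET_HIGH_BPM", 100.0))  # remote request for more sensitivity
dsp_task([0.0, 0.5, 1.0, 1.5])               # the same rhythm now raises an alarm
analysis.run(to_analysis)
```

The remote tuning request travels as just another message, so the analysis task handles local sensor data and instructions from the outside world through the same mechanism.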

Traditional SMP designs are typically limited to aggregations of identical processors, and are not well suited for applications that require predictable, real-time response. SMP does not scale well to large systems, and it lacks fault-tolerant features. Asymmetric multiprocessing (AMP) addresses these limitations by taking an alternative approach to distributing the processing load: separate specialized processors that can be used for specific groups of processes or tasks.

If AMP is implemented using an RTOS-style kernel, basic services such as task scheduling and inter-task communication can have predictable real-time response times. When link handlers are added to link together the possibly heterogeneous processors in the AMP configuration, the design can scale up to very large systems. Thus AMP is a realistic alternative to SMP, one that gives the system architect great flexibility in designing heterogeneous distributed and multi-core embedded systems.

Enea Embedded Technology
Kista, Sweden and Tempe, AZ.
(480) 753-9200.