TECHNOLOGY IN SYSTEMS
Developing for Multicore Systems
Harnessing the Potential of Multicore Processors for Real-Time Applications
Making a multicore real-time system behave transparently requires a number of communication and system management functions running under the hood while realizing the performance potential of multicore processors.
CHRIS GRUJON, TENASYS
Multicore processing provides many opportunities to the embedded community, including the potential to scale applications while using the same code. In spite of this, many within the embedded community remain averse to using multicore platforms for real-time applications. This is because they have found no optimum way to harness the additional processing capability while maintaining a well-defined environment necessary to ensure determinism. Basic symmetric multi-processing (SMP) and asymmetric multi-processing (AMP) architectures have been tried and found wanting for use with multicore platforms.
As a means for distributing processing resources, SMP is being applied very successfully in non-real-time systems. But for real-time applications, additional features are needed to allow it to work properly, such as the means to allocate a particular core to an application, or what is known as “core affinity.” There are also many unknown variables in an SMP platform that cannot be discounted when developing mission-critical applications. However infrequent it may be, an occasional hiccup can be devastating.
AMP architecture partitions resources into discrete, known entities, an environment that makes it easier to develop real-time control systems. This is more like the original development environment for real-time applications, which makes it easier to move those applications over. But multicore processors are single devices that share many resources and services, such as memory, interrupts and I/Os. For real-time applications to function properly, these resources must also be separated. They also need a managed way to communicate with each other to allow segments of an application to pass information much like they would if they were running as a single application on a single processor.
New technologies are enabling AMP architecture to be deployed successfully on multicore platforms. These technologies include embedded virtualization that allows memory, I/Os and their associated interrupts to be allocated to a specific processor core and the application that is selected to run on it, along with a networking layer that provides the communications link between the processes that are running on the individual processor cores.
With these technologies, developers can create fully scalable, real-time applications for multicore platforms. Such a solution moves easily from one to two to four or more cores or even an entirely different platform without re-engineering the application and while preserving dedicated I/O and cores for time-critical tasks.
Control System Requirements for Multicore Platforms
Historically, devices that handled real-time applications were implemented as discrete functional blocks (FBs) of a control system and each was deployed on a separate processing platform. But recent advancements in processing performance and especially multicore architectures are making it possible to consolidate multiple FBs onto a single platform. This not only reduces the cost of the system but also provides the potential for more efficient interconnect between the function blocks, yielding better overall system performance and greater reliability.
Still, consolidating several FBs onto one multicore processor brings up the same challenges of distributing single applications across several processors as discussed above. It requires a system management layer to apportion the common resources such as memory and I/O and the available processing resources.
If one considers a functional block to consist of application-specific hardware and software components and a processor or controller to implement the function, a multi-system manager can achieve the isolation required to keep the FBs separate. Using embedded virtualization techniques, each FB in a control system can be assigned to run on a separate core of a multicore platform, with dedicated I/O and memory management (Figure 1). Each core operates under the control of an individual system manager (RTOS), essentially functioning as an asymmetric multi-processing (AMP) system.
Figure 1: Functional blocks (FBs) are mapped from individual platforms to a single multicore platform, each FB running on a separate processor core and RTOS with its associated I/Os.
The next step is to enable the FBs to communicate with each other. This functionality can be added to the application layer or embedded within each FB RTOS. Traditionally, this has taken the form of either a network service, which carries more execution overhead, or managed shared memory, which requires extensive engineering effort. Networking interfaces scale nicely but carry a lot of overhead. Shared memory, while optimal in performance, doesn't scale beyond a single system. It is also difficult to engineer a shared-memory implementation that includes an application's priority level management to ensure that tasks initiated from one core observe the priority levels of tasks on the cores they interact with.
Integrating IPC into the RTOS
A better method is to integrate the inter-processor communication (IPC) mechanism into the RTOS (Figure 2), where it can be handled automatically by the RTOS priority scheduler. This enables programmers to use standard "exchange objects," such as mailboxes, semaphores and shared memory blocks, within their code to perform IPC between processor cores and across platforms. This embedded IPC methodology is so transparent that it allows FBs to be separated and distributed among available processors with minimal code changes.
Figure 2: Inter-process communications (IPC) capability is integrated into each RTOS, allowing multi-FB applications to be hosted on multicore processors with minimal code changes. This allows large applications like FB3 to be distributed across two processor cores.
For the IPC mechanism to work reliably there must be a means for the FBs to keep track of the state of other FBs on which they depend. To do this a system management layer needs to be added to the environment. This layer should provide the means of signaling system events, such as the creation and deletion of inter-dependent processes and the FBs that they reside on. In addition, both the IPC and the management layers are designed to be extended across multiple instances of the RTOS to enable a system to scale from one to many processors, whether they are on the same platform or even multiple platforms.
Most importantly, these solutions should require no additional work on the part of the application developer. The IPC link should work implicitly, and exchange objects should be able to reside anywhere within the system and operate at the same priority level the application would require if the system were not partitioned. This is a fundamental requirement for meeting the needs of multiple high-priority processes that could otherwise fight for CPU resources.
Adding a Message-Passing Network Layer
With multiple RTOSs running independently on each core of a multicore processor, it is essential that when a process on one RTOS interacts with an object on another RTOS, the behavior of both RTOS instances remains predictable and the operational overhead is negligible. More importantly, the functional integrity of the system must be preserved. So, as mentioned above, task priorities must remain intact across the RTOSs, and the implementation must ensure that priority inversion, such as a low-priority thread on one node blocking a higher-priority thread on another node, cannot happen.
To minimize overhead and maximize predictability, the implementation needs to consist of a lightweight message-passing layer based on simple shared memory plus an inter-processor interrupt for the transport. A local agent on the receiving node handles operations on behalf of the sending thread, including those that wait on an object. These proxy threads execute at the priority of the original caller, ensuring that the calling thread is woken up according to the scheduling policy of the RTOS where the object resides.
With so many multicore processors available—and more on the way—embedded developers need to find ways to leverage the scalability of these platforms while preserving the core affinity required for real-time functional blocks. Because control systems are actually complex collections of discrete functional blocks, it’s only natural that there should be a way to consolidate them onto a single hardware platform.
As suggested, developers can use a lightweight message-passing layer and system manager to enable inter-process communication between the isolated real-time applications running on discrete FBs partitioned by means of embedded virtualization technology. This concept forms the foundation for TenAsys's INtime Distributed RTOS and enables programmers to write applications that run without modification on different system configurations, spanning from single-core to multicore processor systems to multi-platform systems with multicore processors.
A Solution Example
Embedded virtualization technology partitions a multicore platform, allowing memory and I/Os to be associated with one instance of a TenAsys INtime RTOS (node), with each node running on a separate processor core.
Global Objects Technology
The essence of Global Objects technology is that the INtime RTOS is an object-based operating system that makes use of exchange objects, such as mailboxes and semaphores, to pass data and control to and from processes within the RTOS.
Global Objects extend that capability to processes running on different instances of the RTOS ("nodes"). This means that applications that once ran under one RTOS, on one processor core, can still communicate even when they are split across different nodes and different processor cores. The same communication is also possible between nodes that exist on different platforms, by extending the technology over an Ethernet network.
Global Objects technology (GOBSnet) comprises:
• GOBS manager: The GOBS manager is an RTOS process that runs on every node. Its function is to manage the creation, the removal and the linkage of objects and to pass control messages from node to node on the same platform.
• GOBSnet manager: The GOBSnet manager is an INtime process that runs on every INtime platform. Its function is to manage the communication of GOBS messages between platforms. Messages received by the GOBSnet manager for other nodes on the same platform are forwarded via shared memory to the target node.
• Distributed System Manager (DSM): The DSM tracks the state of the system, monitors the health of its components, and cleans up in the event of component termination or failure.
The DSM provides system-level management by tracking the state of the nodes and automatically notifying nodes that depend on the services of another node if and when that node has shut down.
The DSM also provides process-level management services, such as notification of the termination of a process on a remote node. These services are available to user applications in order to establish process dependency relationships and can be set up with an API.
When a process wants to pass data to a process on a different node, it uses the same APIs as it would if it were communicating with a process within the same node (instance of the RTOS). The only difference is that there needs to be some initial discovery to locate the node where the remote process resides. Once that lookup returns, all interaction with the remote exchange object is identical to that with a local one.
From an application/user point of view, the operations described above and the associated management/housekeeping tasks are executed automatically, and there is no need to be concerned with the inner workings of GOBSnet to make use of it. Existing applications can easily be partitioned to run across multiple processors. The architecture is designed to scale across additional nodes according to the needs of the application.