Advances in System Connectivity

Leveraging Mainstream PCI Express Advances in Embedded Systems

Embedded systems can take advantage of solutions driven by the data center market, exploiting high-volume cost points and powerful components in ways that fit the embedded market even better than their original usage.


There has been a trend in the embedded world to use high-volume devices and software whenever possible for the dedicated, closed systems that populate this market. Except for successful consumer devices, the volumes of embedded products do not warrant having custom-designed components beyond the specialized functions that are unique to the platform. This approach allows the designers of embedded systems to make use of the wide range of high-performance, low-cost, power-efficient components, software development systems (compilers and debuggers) and analysis equipment.

This is especially true in the area of interconnect, since it is difficult, expensive and time-consuming to create a dedicated method of interconnection when there are so many existing methods to provide this capability. Embedded system designers would prefer to focus their limited engineering dollars on where they add value, rather than in reinventing transport mechanisms that provide limited additional benefits.

Some examples of general-purpose embedded interconnect usage are AdvancedTCA (ATCA) and PXI Express. ATCA provides the traces for a variety of different existing interconnect standards, specifying an efficient form factor but allowing standard components to be used to construct the backplane. PXI Express takes the PCI Express (PCIe) standard as its basis and adds measurement and automation improvements.

There are several recent trends in the interconnect world that have a direct and beneficial application to embedded platforms. These are being deployed in the data center, where the volume being driven by the explosion of Internet-related traffic is high enough to justify the expense of creating specialized subsystems. But the same constraints that are forcing the innovation—power and space at affordable cost points, coupled with high performance—apply directly to many embedded applications. As difficult and expensive as it is to power and cool a data center rack, for example, it is even more challenging in an embedded context, where power and cooling distribution may be limited and the system is often battery-powered. Compounding the difficulty, some embedded applications cannot accommodate a fan or heat sink due to size constraints. The trends that address these constraints are convergence and shared I/O—two sides of the same valuable coin.

Convergence: A Question of Efficiency

Convergence describes an approach to interconnect in which the building blocks are separated, or disaggregated, rather than combined into fully functional subsystems. An aggregated system is best understood by looking at a classic client/server machine or a traditional rack-based data center server. Each motherboard or blade has a CPU, storage and communications, and is self-contained. This is convenient and allows the system to use the same components and subsystems as the even higher-volume client/server systems, but it is highly inefficient for a number of reasons.

For one thing, aggregation tends to delay innovation in the majority of the functions. Since it is time-consuming and expensive to create new blades and motherboards, designers tend to wait until a new processor is available and then include the most up-to-date storage and communication devices on the same board at that time. Since the communications and storage devices deploy improvements on their own cycle, they are by definition not going to be updated when the newest innovations are ready; they have to wait until a CPU is ready. 

Storage technology in particular is changing rapidly as solid-state drive (SSD)-based systems are brought to market. This constraint is especially inefficient in embedded applications, since the storage subsystem is often the critical part of the package, where large amounts of data are being acquired for later processing. And embedded systems often have very long lives, so the subsystems become obsolete long before a new system is ready to be deployed.

The mix of processing power, storage capability, communication speed and interconnect protocol varies depending upon the application. Since it is too expensive and complicated to provide systems with a sufficient number of permutations, and since there is usually some minimal amount of each function that is needed, there is almost always going to be wasted capability somewhere in the system. As true as this is with a data center application, it is even more so in an embedded context, since—unlike the data center—you cannot dedicate boxes to specific applications that better suit the configurations needed. In an embedded product, you often only have that single system. It is difficult to expand or upgrade the system quickly and easily, since you cannot just add more storage, for example, without getting more of the other components you don’t need.

Convergence solves these problems. In a converged (and disaggregated) system, the individual subsystems—processing, storage and communications—are separated and interact through a common, high-speed, low-latency interconnect (Figure 1). Each subsystem can be upgraded or expanded based on the newest technology, and as needed for the application. For example, a protocol analyzer can be tuned to have extremely high-speed storage, in which the raw data is streamed to fast SSD devices, where it is analyzed in a non-real-time manner. Or it can be enhanced to offer very high processing capability, where the data is inspected and analyzed in real time.

Figure 1
A converged system uses a common interface fabric so that functional components have equal access.

It has become clear over the past several years that the most effective converged interconnect in the data center world is PCIe. This interconnect is fast. It can scale up to 64 Gbit/s in each direction with an easily deployed x8 Gen3 connection. It is low-latency (~150ns/switch hop) and is already a native connection on almost every device necessary to create a compelling embedded system. There are technologies that will be deployed over the next several years that will enable PCIe to provide an even more complete fabric for the data center, such as ExpressFabric from PLX Technology, and embedded applications can take advantage of these same enhancements to create powerful dedicated systems.
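The bandwidth and latency figures quoted above follow directly from the PCIe Gen3 signaling parameters. As a back-of-the-envelope sketch (the helper names here are illustrative, not any vendor API), a Gen3 lane runs at 8 GT/s with 128b/130b line coding, so an x8 link delivers roughly 63 Gbit/s of usable bandwidth in each direction:

```python
# Rough PCIe Gen3 link math (illustrative helpers, not a real API).
# Gen3 signals at 8 GT/s per lane with 128b/130b line coding.

def pcie_gen3_throughput_gbps(lanes: int) -> float:
    """Usable line rate in Gbit/s for one direction of a Gen3 link."""
    raw_gt_per_s = 8.0               # Gen3 raw rate per lane
    encoding = 128.0 / 130.0         # 128b/130b coding overhead
    return raw_gt_per_s * encoding * lanes

def switch_path_latency_ns(hops: int, per_hop_ns: float = 150.0) -> float:
    """Cumulative latency through a chain of PCIe switches,
    using the ~150 ns-per-hop figure cited in the text."""
    return hops * per_hop_ns
```

For example, `pcie_gen3_throughput_gbps(8)` yields about 63 Gbit/s, in line with the "64 Gbit/s each direction" figure for an x8 Gen3 connection, and a two-switch path adds only about 300 ns of latency.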

Shared I/O: The Enabler of Truly Converged Systems

One such enhancement is the ability to share I/O devices across multiple CPUs in a system. The discussion above explained why it is more efficient to disaggregate the elements of a system, but disaggregation only makes sense if the storage and communication can be shared among multiple processing elements. Otherwise, you will still be limited in the amount of I/O that you can use in a system. In this way, sharing I/O is a precondition to a truly converged system.

In the data center world, processing elements have grown so power-hungry that they cannot be housed in densities that allow efficient deployment. This is even more of a factor in embedded applications, where the constraints are generally more stringent for the reasons explained above. Data center systems are experimenting with a concept called microservers, in which a larger number of less-powerful CPUs are connected together. This provides a lower-power, more cost-effective approach when the application is either inherently distributed, or when a large single application can be decomposed and processed in parallel. Because of the value of this approach, microserver platforms are coming to market in record numbers.

The applications that suit this approach in the data center—such as Web hosting—are a subset of the total. Embedded systems, however, are an attractive match for this new approach. Most embedded applications are bounded and predictable (unlike a general-purpose server in a cloud data center), and because of this they can be written to take advantage of the underlying hardware that makes up a microserver.

A data analysis platform, for example, is inherently well-suited to a decomposed approach, where separate data streams can have their own dedicated processing engine instead of sharing a much larger CPU. Alternatively, the application can be written to allow portions of the processing to be done in parallel by a large number of smaller CPUs. Data analysis can often be implemented in a vector processing manner, which is highly amenable to decomposition and parallelism. The number of processors in such a system can scale up to hundreds of devices, with thousands of processing cores being involved in the computation. 
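The decomposition described above can be sketched in a few lines: split an incoming data stream into independent chunks, hand each chunk to a separate worker (which, in a microserver-style system, could be its own small CPU), and reduce the partial results. This is a minimal illustration using Python's standard thread pool as a stand-in for real distributed processing engines; the function names are hypothetical:

```python
# Sketch of decomposing a data-analysis job into independent chunks,
# each of which could be dispatched to its own processing engine in a
# microserver-style system. Thread pool used here purely as a stand-in.
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(samples):
    """Per-chunk work: a running energy sum as a stand-in for real analysis."""
    return sum(s * s for s in samples)

def analyze_stream(samples, workers=4):
    """Split the stream into chunks, analyze them in parallel, and reduce."""
    n = max(1, len(samples) // workers)
    chunks = [samples[i:i + n] for i in range(0, len(samples), n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(analyze_chunk, chunks))
```

The key property is that `analyze_chunk` has no shared state, so the same structure scales from a handful of threads to the hundreds of devices and thousands of cores mentioned above.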

For this vast number of processing engines to operate efficiently, they need common access to the information in the system. The data flowing in comes from a communication device or a special-purpose acquisition front end. It is saved in—and processed from—the storage subsystem. This can be enabled by a “fabric” that has shared I/O capability (Figure 2).

Figure 2
Creating powerful hybrid systems by adding functions as needed.

Here, again, PCIe with some standards-based enhancements is a compelling solution to this problem. There already exist commodity devices and software to allow multiple CPUs to share storage and communication devices, and a fabric can quickly and easily be created to allow a high-performance, low-latency flow from input, through processing, into storage and back out for further analysis.

PCIe has a number of further benefits coming from high-performance computing technology that are advantages for the embedded world. SSDs have been created by taking flash memory and enabling it to be accessed in a manner similar to hard disks. In the high-performance enterprise arena, the fastest-growing segment of this market hooks SSDs directly to PCIe. So, once you already have the PCIe backbone, it is quick and easy to hook up large, powerful SSD arrays to your system.
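When sizing such an SSD array, a quick calculation shows how fast PCIe-attached drives can outrun the uplink into the fabric. The drive counts and lane widths below are assumptions for illustration (a common configuration gives each drive an x4 Gen3 link), not figures from the article:

```python
# Back-of-the-envelope sizing for a PCIe-attached SSD array.
# Illustrative only: drive counts and lane widths are assumptions.

GEN3_LANE_GBPS = 8.0 * 128 / 130   # usable Gbit/s per Gen3 lane (one direction)

def array_bandwidth_gbps(drives: int, lanes_per_drive: int = 4) -> float:
    """Aggregate one-direction bandwidth of an SSD array on x4 links."""
    return drives * lanes_per_drive * GEN3_LANE_GBPS

def oversubscription(drives: int, lanes_per_drive: int, uplink_lanes: int) -> float:
    """Ratio of total drive bandwidth to the uplink into the fabric."""
    return array_bandwidth_gbps(drives, lanes_per_drive) / (uplink_lanes * GEN3_LANE_GBPS)
```

For instance, eight x4 drives behind an x16 uplink are 2:1 oversubscribed, which is often acceptable because drives rarely all stream at full rate simultaneously; the point is that the arithmetic is simple once everything speaks native PCIe.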

The other technology that can be leveraged is a general-purpose graphics processing unit (GPGPU), where graphics processing engines have been repurposed to handle tasks that are highly parallel in nature. Once again, this describes many applications in test and measurement systems. The GPGPU devices are connected to the system—and to each other—through wide PCIe interfaces.

And the most powerful part of the solutions that have been described here is that they can be combined as necessary to get the right mix of processing (high-performance CPU, microserver or GPGPU), storage and communication. Modular systems can even be deployed that allow a system to be upgraded in the field, or by the customer, so that the basic platform can have an extremely long life. 

PLX Technology
Sunnyvale, CA
(408) 774-9060