TECHNOLOGY IN CONTEXT
New Interconnect Fabric Trends Help Unlock Potential of High-Performance Embedded Computing
Fabric usage trends in today’s more sophisticated HPEC systems point to the need to weigh the pros and cons of each fabric’s technical attributes, along with the commercial technologies that drive them.
MARC COUTURE, MERCURY SYSTEMS
There is no shortage of compute engine types available to high-performance embedded computing (HPEC) systems. However, just as one wouldn’t build an entire racing car from a single element of the periodic table, one wouldn’t build an HPEC system from a single type of processing element. Multi-core Intel devices, PowerPCs, graphics processing units (GPUs), field programmable gate arrays (FPGAs) and other processors, ranging from the simple to the exotic, are selected for their individual attributes and strengths in order to construct embedded systems that extract every last ounce of performance per unit of size, weight and power. The exact mix and ratios are not only application dependent, but may also be influenced by less technical considerations such as available engineering talent and device availability over the projected lifetime of the program.
The next critical stage of HPEC system construction lies in the sensor ingest, compute interconnect and data egress (i.e., I/O plus fabric). Just as a system draws on the respective strengths of heterogeneous processing elements, it can draw on the respective strengths of different I/O and fabric interconnects. Playing to the individual strengths of fabrics whose protocols have been optimized for different end purposes yields the effective data flow that real-time HPEC processes depend on.
A standards-based form factor is required to support a heterogeneous fabric scheme from a physical infrastructure standpoint. This equates to the need for lots of signaling interconnect, both single-ended and especially differential pair. Speed is paramount and is measured in GHz, reaching 10 GHz and beyond. Maintaining the signal integrity of all this high-density signaling at such rates requires resiliency against brutal vibration profiles, a challenge that becomes especially acute at the temperature extremes of arctic cold and desert heat experienced by many platforms. The 6U VPX form factor, as defined by VITA 46, fits the bill with its use of connectors such as TE Connectivity’s RT 2-R, which supply up to 192 differential pairs of signaling per module for starters.
There are two fundamental types of modules: payload and switch. Payload modules are often compute-centric or geared toward supporting I/O such as streaming sensor data or perhaps high-capacity storage. Switch modules, on the other hand, are usually built around high port-count switching components and act as central fabric hubs for the entire VPX system. These modules are in turn placed into a chassis with a VPX backplane that ranges from just a few slots up to about twenty slots. Larger systems typically contain two switch cards, whereas smaller systems may have just a single switch or none at all, in which case the payload boards are “meshed” together.
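The trade-off between meshing payload boards and adding switch cards comes down largely to link count. The following is a minimal Python sketch, assuming a full mesh (every payload wired directly to every other payload) and a star in which every payload has one link to each switch. The function names and slot counts are illustrative and are not drawn from any OpenVPX profile.

```python
def mesh_links(payloads: int) -> int:
    """Backplane links for a full mesh: one per unordered pair of payloads."""
    return payloads * (payloads - 1) // 2


def switched_links(payloads: int, switches: int) -> int:
    """Backplane links for a star: each payload connects once to each switch."""
    return payloads * switches


def mesh_ports_per_payload(payloads: int) -> int:
    """Fabric ports each payload must provide in a full mesh."""
    return payloads - 1


if __name__ == "__main__":
    # (payload slots, switch slots): hypothetical chassis sizes for illustration.
    for payloads, switches in [(4, 1), (8, 1), (18, 2)]:
        print(
            f"{payloads:2d} payloads: mesh={mesh_links(payloads):3d} links "
            f"({mesh_ports_per_payload(payloads)} ports/payload), "
            f"switched={switched_links(payloads, switches):2d} links "
            f"({switches} ports/payload)"
        )
```

Under these assumptions a four-payload mesh needs 6 links and three ports per board, which is manageable without a switch, while 18 payloads would need 153 links and 17 ports per board versus 36 links through two switches.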
The Right Fabric for the Right Topology
Next comes the mapping of heterogeneous fabrics onto an appropriate backplane topology, and for that there is a plethora of OpenVPX (VITA 65 standard) module, slot and backplane profiles. Of particular relevance to constructing HPEC systems is the fact that the OpenVPX Multi-Plane profiles (e.g., SLT6-PAY-4F1Q2U2T-10.2.6 / MOD6-PAY-4F1Q2U2T-12.2.1-13) lend themselves to supporting multiple fabric interconnect levels between OpenVPX modules. Each standard payload module slot has four main fabric planes as referenced in Figure 1. Similar to the systems of the human anatomy, these four planes coexist, serve different functions and operate in parallel for optimal efficiency.
Figure 1: OpenVPX Multiplane Backplane.
The lowest level plane is the management plane. Physically implemented with I2C along the backplane and Ethernet out of the chassis, this “embedded nervous system” utilizes the IPMI and SNMP protocols to keep an eye on critical physical “vitals” such as temperature, power and health. One level up from that is the control plane; Gigabit Ethernet usually resides here. As the name implies, the control plane is typically centered on a managed switch in an OpenVPX switch slot or chassis manager, interconnecting multiple payload modules for the purpose of passing command, control and application-related status messages among them.
Driving the Data Plane
The OpenVPX data plane is one level up from the control plane. This is where high-rate sensor and signal processing data flows. Just like the control plane, the data plane fabric topology is often designed with one or more switch modules interconnecting many payload modules. There are numerous choices of fabric protocol in the data plane. Serial RapidIO (SRIO), InfiniBand and 10G/40G Ethernet are all popular options, and the choice is often based on an affinity for particular attributes of the given fabric. All three protocols offer high data throughput, low latency and determinism, all of which are critical for real-time sensor-oriented HPEC systems.
Originally developed by Mercury Systems and Motorola, SRIO is one of the more popular data plane fabrics for embedded modules performing digital signal processing. SRIO was specifically designed for clustering networks of peer-to-peer embedded processors with minimal latency and software overhead, and its protocol layers are terminated in hardware, keeping it efficient and compact. SRIO is currently in the process of moving from a 3.125 Gbaud rate to the SRIO Gen 2 rates of 5.0 and 6.25 Gbaud. SRIO endpoints on payload modules are typically implemented with bridging endpoints from Integrated Device Technology Inc. (IDT). Likewise, switch modules leverage RapidIO switch components as shown in Figure 2.
Figure 2: SRIO OpenVPX Switch Module.
The example OpenVPX switch in Figure 2 acts as a system hub not only for the SRIO data plane but also for the Gigabit Ethernet control plane, and even the management plane. There are many processing elements that “speak” native SRIO, such as Freescale PowerPCs and Texas Instruments DSPs; however, there are more that do not. With Intel devices, the packets flowing over the PCI Express (PCIe) lanes must be converted to SRIO before being networked into an SRIO cluster of processors, hence the IDT bridge ASIC. A viable alternative to an ASIC-based approach is to use FPGAs from Altera or Xilinx with the appropriate IP load and enough SERDES lanes to service all interconnecting ports. The FPGA approach offers a significant advantage: protocols can be changed, provided that the FPGA has enough spare resources and that its SERDES can keep up with the required Gbaud rate. For example, IP can be instantiated in the FPGA that marries Intel PCIe lanes to a 10 Gigabit Ethernet switched data plane instead of SRIO. In any case, the roadmap for SRIO has been somewhat quiet of late. However, Gen 3 SRIO rates of 10 Gbaud per lane are in the plan, and it’s a good bet that we will see enabling technology in the not-too-distant future.
Bridges and switches from Mellanox Technologies are slated to have a profound influence on the OpenVPX data plane, providing four-lane InfiniBand at 5 Gbaud per lane (double data rate) and 10 Gbaud per lane (quad data rate). Whereas SRIO originated in the embedded COTS world, InfiniBand has its roots in the High Performance Computing (HPC) and data center worlds. IBM’s BladeCenter servers use InfiniBand to connect large Intel server-class multicore devices, along with software and middleware infrastructures such as Message Passing Interface (MPI) and OpenFabrics Enterprise Distribution (OFED) that have strong intrinsic ties back into the fabric itself. With the injection of the Mellanox fabric technology into OpenVPX payload and switch modules, similar architectures with high core count, high memory capacity, InfiniBand and MPI/OFED can now be deployed in highly mobile, tough operating environments.
The same devices from Mellanox can also be used in a 10/40 Gigabit Ethernet mode, which means the same OpenVPX payload and switch modules can run Ethernet on the data plane. Ethernet’s greatest strength has been and continues to be its ubiquity as a standard, and of course, its highly scalable nature. However, the heavy software involvement needed to terminate the protocol stack, along with the related latency penalties, has precluded it from certain signal processing applications. The solution for embedded processing systems running real-time applications appears to be Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE), which is expected to become increasingly popular.
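The lane rates quoted above for SRIO and InfiniBand translate into usable bandwidth only after line-coding overhead is removed. The sketch below assumes 8b/10b encoding (8 payload bits for every 10 bits on the wire) at all of the listed rates and a four-lane port in each case; the four-lane width for SRIO is an assumption. It ignores packet headers and other protocol overhead, and the function and table names are illustrative.

```python
def usable_gbps(gbaud_per_lane: float, lanes: int,
                payload_bits: int = 8, line_bits: int = 10) -> float:
    """Data rate in Gbit/s per direction after removing line-coding overhead."""
    return gbaud_per_lane * lanes * payload_bits / line_bits


# (label, Gbaud per lane, lanes); the x4 width for SRIO is an assumption.
PORTS = [
    ("SRIO 3.125 x4", 3.125, 4),
    ("SRIO Gen 2 5.0 x4", 5.0, 4),
    ("SRIO Gen 2 6.25 x4", 6.25, 4),
    ("InfiniBand DDR 4x", 5.0, 4),
    ("InfiniBand QDR 4x", 10.0, 4),
]

if __name__ == "__main__":
    for label, gbaud, lanes in PORTS:
        print(f"{label:20s} {gbaud:6.3f} Gbaud/lane -> "
              f"{usable_gbps(gbaud, lanes):5.1f} Gbit/s")
```

Under these assumptions a four-lane SRIO port carries 10 Gbit/s at 3.125 Gbaud and 20 Gbit/s at 6.25 Gbaud, while four-lane InfiniBand carries 16 Gbit/s at double data rate and 32 Gbit/s at quad data rate.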
The Expansion Plane: Lanes and Speed
This brings us to the fourth plane within the OpenVPX Multi-Plane topology, the expansion plane. PCIe is typically implemented on the expansion plane. This is a super highway of as many PCIe lanes as possible running between adjacent payload modules. For instance, an Intel-based payload module may interface to a GPU payload module in an adjacent slot via eight or even 16 lanes of PCIe. The idea with the expansion plane in this capacity is very high bandwidth via many bonded, fast links. Unlike SRIO, InfiniBand and Ethernet, PCIe is not geared for peer-to-peer multi-computing clusters. However, PCIe is well suited to the OpenVPX expansion plane, where a single processor payload acts as the PCIe “root complex” and other “peripheral” modules are memory mapped into it. Given that PCIe is native to so many devices, no protocol adaptation is required. Whereas the data plane is often used to scale to many channels of homogeneous processing elements over a switched fabric, which is dynamically changing its configuration from moment to moment, the PCIe-based expansion plane is often used to create a heterogeneous slice of processing elements (e.g., FPGA to Intel to GPU) over two or three adjacent OpenVPX slots. The next big advance regarding the expansion plane is an increase in speed from 5 Gbaud per lane with PCIe 2.0 to 8 Gbaud per lane with PCIe 3.0.
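The jump from 5 to 8 Gbaud understates the gain, because PCIe 3.0 also replaces the 8b/10b line code used by PCIe 2.0 with the leaner 128b/130b code. The following sketch works through that arithmetic for the eight- and 16-lane links mentioned above. It counts only line-coding overhead, not packet headers or flow control, and the names are illustrative.

```python
# Per-lane signaling rate (Gbaud) and line code as (payload bits, line bits).
PCIE_GENERATIONS = {
    "PCIe 2.0": (5.0, (8, 10)),
    "PCIe 3.0": (8.0, (128, 130)),
}


def link_gbps(generation: str, lanes: int) -> float:
    """Data rate in Gbit/s per direction after line-coding overhead."""
    rate, (payload_bits, line_bits) = PCIE_GENERATIONS[generation]
    return rate * lanes * payload_bits / line_bits


if __name__ == "__main__":
    for lanes in (8, 16):
        gen2 = link_gbps("PCIe 2.0", lanes)
        gen3 = link_gbps("PCIe 3.0", lanes)
        print(f"x{lanes:<2d} PCIe 2.0: {gen2:6.1f} Gbit/s  "
              f"PCIe 3.0: {gen3:6.1f} Gbit/s  gain: {gen3 / gen2:.2f}x")
```

With these figures an eight-lane link moves from 32 Gbit/s to roughly 63 Gbit/s per direction, close to a doubling even though the signaling rate rises by only 60 percent.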
Only time will tell if there will ever be a single fabric winner along the OpenVPX data plane. PCIe might even become a contender on the data plane, as some of the newer PCIe switches from PLX Technology are incorporating non-transparent bridging (NTB) capabilities. This allows peer-to-peer networks to be formed using just PCIe, which once again is a native endpoint in just about all of the main processing devices in the embedded market. A fabric’s salient technical advantages certainly influence the choice; however, legacy momentum often has an even greater pull.
Some customers invest in the development of a custom sensor payload board that injects data right onto the fabric, in which case changing fabrics would require a new board design. An existing code base with strong intrinsic ties to the fabric can create real staying power for the incumbent fabric. Finally, in some markets, such as the defense industry, there are strong incentives to use certain fabrics that are perceived as more standard than others. Regardless of the specific fabric choice, engineers in the embedded computing space are largely subject to other, much larger markets that will dictate whether enabling technologies continue along a thriving roadmap or hit a dead end. One thing is certain: core counts and performance per device will only continue to increase, and until a single device can consume all applications, interconnect fabric will always be a necessity.