PCI Express Meets Serial RapidIO
Serial RapidIO Reaches a Crossroads with PCIe in Intel-Based DSP Designs
Bringing the DSP advantages recently achieved in Intel’s x86 architecture into systems previously dominated by the Power Architecture requires an efficient means of bridging between Intel’s PCIe world and the SRIO scheme implemented in most board designs.
IAN STALKER, CURTISS-WRIGHT CONTROLS EMBEDDED COMPUTING AND DEVASHISH PAUL, IDT
DSP systems designed for use in today’s signal processing applications require optimal bandwidth and reliability in rugged environments. To deliver the near real-time processing of analog sensor data required to locate the signals of interest, these systems require the optimal combination of data throughput and low latency. For these applications, Serial RapidIO (SRIO) is the preferred interconnect because of its high throughput, low latency, and the ease of architecting SRIO peer-to-peer processing clusters on modules, across backplanes and between chassis.
Bridging PCI Express (PCIe) to SRIO changes the embedded military DSP landscape by providing a practical, cost-effective approach for using, for the first time, Intel x86 architecture microprocessors in system designs long dominated by Freescale’s (and formerly, Motorola’s) Power Architecture. Historically, the embedded DSP market has evolved toward using general purpose microprocessors, and away from dedicated DSP processors, such as Analog Devices’ SHARC and Texas Instruments’ 320C40 and 320C6701k. The PowerPC/Power Architecture, with its AltiVec math processor for floating point calculations suitable for DSP algorithm processing, emerged as the clear favorite of DSP board and system designers. Demanding DSP systems typically comprise a mix of CPUs and FPGAs. A consequence of Power Architecture’s prevalence in embedded military DSP system designs was the establishment of SRIO as the favored serial communications fabric for interconnecting these types of devices, since PowerPC processors featured SRIO support built in.
The stature of SRIO in the embedded market is reflected in OpenVPX (VITA 65), VITA’s open standard for building embedded military systems using VPX backplanes, which defines SRIO support for designers integrating high-performance radar, sonar, image processing and signal intelligence applications. Meanwhile, Intel microprocessors have become increasingly attractive to military embedded system designers. Helping to make Intel CPUs attractive for DSP applications is the recent debut, along with Intel’s “Sandy Bridge” architecture second-generation Core i7 processors, of the new Advanced Vector Extensions (AVX) instruction set. AVX is an alternative to AltiVec that widens floating-point vector instructions from the 128 bits of the venerable Power Architecture math unit to 256 bits, doubling the peak floating-point work done per instruction.
Until recently, the major remaining hurdle for using Intel CPUs in multiprocessor embedded DSP designs was the fact that Intel historically provided no support for SRIO. That hurdle has now been surmounted with the introduction of the Tsi721 from Integrated Device Technology (IDT), a PCI Express Gen2 to Serial RapidIO Gen2 protocol conversion bridge for x86 processors (Figure 1). The bridge chip supports 5 Gbaud PCI Express Gen2 and Serial RapidIO Gen2 interfaces. For the designer of embedded DSP systems, the combination of Intel’s AVX-capable Core i7 CPUs and IDT’s bridge chip marks a technology milestone: Intel-based DSP systems can now take part in SRIO-connected designs and bring their performance advantages with them.
Figure 1: The Tsi721 from Integrated Device Technology offers efficient bridging between PCI Express and Serial RapidIO with minimal overhead and small package size, weight and power.
The Serial RapidIO Advantage
The bridge chip addresses limitations faced by earlier attempts to handle PCIe-to-SRIO protocol conversion in FPGAs, an approach that was expensive and lacked support for the SRIO messaging required for control loops in signal processing applications. An efficient bridge chip that features eight direct memory access (DMA) engines and eight messaging engines/channels, each capable of transferring large amounts of data and operating at the wire speed of 16 Gbit/s, overcomes these limitations. While Power Architecture offers built-in SRIO support, options for Intel architecture-based distributed systems are more limited. One option is InfiniBand, a fabric popular in the enterprise computing world but not often used in military system designs. Another choice is Gigabit Ethernet (GbE), but SRIO, especially the latest Gen2 SRIO, offers significant advantages over GbE for DSP designs. SRIO was designed for processor-to-processor communications within a system (be it chip to chip, board to board or chassis to chassis) and features guaranteed data packet delivery, without risk that a packet might be dropped anywhere in the network. Ethernet, designed for very large networks connecting over great distances, doesn’t guarantee packet delivery. With Ethernet, reliable delivery requires a packet verification protocol that adds significant overhead and burdens the processor as it checks every packet.
The new generation of Gen2 SRIO switches operates at 20 Gbit/s signaling, more than 2x the bandwidth of 10 GbE when actual payload data is compared after header information is removed. Compared to 10 GbE, Gen2 SRIO offers significantly higher performance and lower, more predictable end-to-end latency, and it saves valuable board slots. 10 GbE performance drops when packets are small, yet small packets are the preferred approach in embedded systems because they give better real-time performance. For 256-byte packets, 10 GbE delivers only 8 Gbit/s throughput. A bridge chip that features eight DMA and eight messaging transmit and receive queues is able to support the full 16 Gbit/s line rate for 64-byte and larger packets, making it possible to move large amounts of data through a DSP system with low latency.
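The relationship between the 20 Gbit/s signaling rate and the 16 Gbit/s line rate quoted above can be checked with a little arithmetic. The sketch below assumes a x4 port running at 5 Gbaud per lane with 8b/10b line coding (8 data bits carried in every 10 transmitted bits). The function and parameter names are illustrative only and are not part of any SRIO or vendor API.

```python
def usable_rate_gbps(lanes, gbaud_per_lane, data_bits=8, coded_bits=10):
    """Bit rate left for packets after line-coding overhead, in Gbit/s."""
    raw = lanes * gbaud_per_lane          # signaling rate across all lanes
    return raw * data_bits / coded_bits   # strip the line-coding overhead


# Assumed Gen2 port: four lanes at 5 Gbaud each, 8b/10b coded.
signaling = 4 * 5.0
line_rate = usable_rate_gbps(lanes=4, gbaud_per_lane=5.0)

print(f"signaling: {signaling:.0f} Gbit/s, usable: {line_rate:.0f} Gbit/s")
```

Packet headers reduce the payload rate further, and that overhead is deliberately not modeled here.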
In addition, SRIO supports distributed switch architectures, and SRIO switches are small, low-power devices (starting at 21 x 21 mm, ~3W typical). Their size and functionality make it common for board designers to provide SRIO switching onboard DSP engine cards to locally aggregate multiple computing nodes. Ethernet switches, by comparison, are significantly larger (typically 30 x 30 mm to 40 x 40 mm), making them impractical to deploy on 3U or 6U VPX multiprocessor DSP cards. Also, Ethernet switches have no small lane count options, while SRIO Gen2 switches are available in 16 and 32 lane options. Where Ethernet switches are used in DSP applications today, they require a separate card, taking up valuable slot space and adding weight (a typical rugged card weighs 1.0 to 1.2 kg) in size, weight and power (SWaP)-constrained military platforms. For systems that require a high level of fault tolerance, designers must add a second, redundant Ethernet switch, consuming an additional slot and adding even more weight. Additional performance and overall system power penalties associated with Ethernet switches are end-to-end packet termination latency that can be on the order of milliseconds, and the need for processor intervention to terminate the protocol stack.
When it comes to OpenVPX system topologies, SRIO also comes out on top. Most embedded DSP systems deployed today have fewer than eight slots. One of the common topologies used on these distributed processing systems is a full mesh architecture in which each card is connected to every other card. This approach is attractive because it delivers very high card-to-card bandwidth and does not exhibit a single point of failure. OpenVPX defines four ports on the data plane. A system designer can use these four ports to build five-card distributed systems in which each card has a connection to the other four. While the five-card full-mesh is the ultimate in card-to-card bandwidth, larger systems can also be constructed using distributed switching where packets pass through the switches of intermediate cards. The high bandwidth of Serial RapidIO makes this practical for systems up to 16 slots in size.
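The five-card limit follows directly from the port count: in a full mesh every card needs one port for each of the other cards. As a minimal sketch of that reasoning (the helper names below are made up for illustration and do not come from the OpenVPX standard), the links can be enumerated and the per-card port usage checked:

```python
from itertools import combinations


def max_full_mesh_cards(data_plane_ports):
    """Largest full mesh when every card links directly to every other card."""
    return data_plane_ports + 1


def full_mesh_links(cards):
    """All card-to-card links in a full mesh, as (card_a, card_b) pairs."""
    return list(combinations(range(cards), 2))


cards = max_full_mesh_cards(4)   # four data-plane ports, as in the text
links = full_mesh_links(cards)

# Every card should terminate exactly one link per data-plane port.
links_per_card = {c: sum(c in link for link in links) for c in range(cards)}

print(cards, len(links), links_per_card)
```

Systems larger than this cannot be fully meshed with four ports per card, which is where the distributed switching described above comes in.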
The Importance of Slot Count
In comparison, a typical Intel-based DSP system using 10 GbE requires at least six slots, with one for a dedicated Ethernet switch card. A similar SRIO system requires only five slots, since each DSP card can have multiple bridges per processor, mapped into a small onboard SRIO switch that provides four x4 SRIO links to the backplane. In addition to its SWaP benefits, minimizing board count also improves system Mean Time Between Failure (MTBF). Distributed switch systems (one example is the VITA 65 BPK6-CEN05-11.2.5-n backplane profile) make use of the local SRIO switch and thus avoid the need for a separate switch card, saving valuable, costly slot(s). For example, if the system were using a ½ ATR Short enclosure (four 1-inch slots), eliminating one switch card would save 25 percent of the space and a considerable amount of power. For large systems, centralized switch architectures are often preferred, and SRIO is equally adept at this approach.
Curtiss-Wright Controls Embedded Computing (CWCEC) implements the new IDT bridge on its dual second-generation Core i7-based CHAMP-AV8 DSP OpenVPX engine (Figure 2). Each board employs four of the PCIe to SRIO bridge chips, providing two interfaces to each CPU, with each interface offering significantly more bandwidth than a 10 GbE interface. The bridges support a 32 Gbit/s data rate for each Core i7. Overall, the CHAMP-AV8’s processors deliver up to 269 GFLOPS. With IDT’s Tsi721 bridge chip, the card delivers triple the bandwidth of first-generation VPX products, with up to 160 Gbit/s of fabric performance.
Figure 2: The new CHAMP-AV8 from Curtiss-Wright uses four Tsi721 chips along with second-generation Intel Core i7 processors for bandwidth beyond that of 10 Gbit/s Ethernet.
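The 32 Gbit/s per-processor figure is consistent with the bridge numbers given earlier. The sketch below assumes each bridge sustains the 16 Gbit/s line rate and that the four bridges are split evenly between the two processors. The constant names are illustrative, and the calculation does not attempt to derive the 160 Gbit/s fabric figure.

```python
BRIDGES_PER_BOARD = 4        # four PCIe-to-SRIO bridges per board, from the text
CPUS_PER_BOARD = 2           # dual Core i7
BRIDGE_LINE_RATE_GBPS = 16   # per-bridge wire speed quoted earlier
TEN_GBE_GBPS = 10            # nominal 10 GbE rate, for comparison

# Assumption: the bridges are shared evenly between the processors.
bridges_per_cpu = BRIDGES_PER_BOARD // CPUS_PER_BOARD
per_cpu_gbps = bridges_per_cpu * BRIDGE_LINE_RATE_GBPS

print(f"{bridges_per_cpu} bridges per CPU, {per_cpu_gbps} Gbit/s per CPU")
print(f"single bridge vs 10 GbE: {BRIDGE_LINE_RATE_GBPS} vs {TEN_GBE_GBPS} Gbit/s")
```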
Curtiss-Wright Controls Embedded Computing
Integrated Device Technology
San Jose, CA.