Serial Interconnects Move to the Next Generations
PCI Express Gen 3: Twice as Nice -- and Then Some
System designers will soon be able to take advantage of the improved performance and robustness of PCIe Gen 3 technology with new Gen 3 switches that will help them overcome the challenges inherent in multi-gigabit system design.
STEVE MOORE, PLX TECHNOLOGY
With each successive generation of the industry-standard PCI Express (PCIe) interconnect, the technology has doubled its bandwidth while adding features to improve system robustness. Designers now using PCIe Gen 1 and Gen 2 technology can look forward to another significant performance jump: 8 gigabits per second (Gbit/s) per lane, or 128 Gbit/s in designs using x16 port widths. Gen 3 also brings a number of optimizations for enhanced signaling and data integrity, while maintaining full compatibility with the PCIe protocol stack and interoperability with components that support only the lower speeds.
As with PCIe Gen 2, the earliest adopters of Gen 3 technology will be in the graphics space, where there is an insatiable demand for speed. We also expect to see this level of bandwidth used in fabrics for high-performance compute platforms, RAID storage systems, and video capture and broadcast distribution systems, where it will allow PCIe to outpace existing interconnect technologies (Figure 1). Additionally, the improvements in link integrity and equalization will extend the adoption of PCIe in cabling and backplanes, with significant opportunities for reducing cost and power while increasing performance.
PCIe Gen 3 Doubles Bandwidth
PCIe Gen 3 doubles the bandwidth of the interconnect without doubling the encoded bit rate. By comparison, the PCIe Gen 2 bit rate is 5 gigatransfers per second (GT/s), and its 8b/10b encoding scheme provides an interconnect bandwidth of 4 Gbit/s per lane. A simple approach to doubling the interconnect bandwidth would have been to keep the 8b/10b code and double the bit rate to 10 GT/s, giving Gen 3 an 8 Gbit/s per-lane bandwidth. However, after extensive analysis, the PCI-SIG determined that the overhead associated with the 8b/10b code could be eliminated by using scrambling to obtain DC balance, together with a 128b/130b encoding scheme. This results in a useful bandwidth per lane of 8 Gbit/s, less about 1.5 percent due to coding, at an encoded bit rate of only 8 GT/s. The lower bit rate results in lower power consumption, less silicon area and better signal integrity than a standard requiring a full 10 GT/s rate, which in turn translates into reduced cost and improved efficiency. Table 1 shows the migration of PCIe bandwidth performance from Gen 1 through Gen 3.
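The bandwidth arithmetic above can be checked with a short calculation. The signaling rates and code ratios below come straight from the text; the function name is purely illustrative:

```python
# Sketch: effective per-lane bandwidth for each PCIe generation,
# derived from the raw signaling rate and line-code efficiency.

GENERATIONS = {
    # name: (raw rate in GT/s, payload bits, coded bits)
    "Gen 1": (2.5, 8, 10),     # 8b/10b encoding
    "Gen 2": (5.0, 8, 10),     # 8b/10b encoding
    "Gen 3": (8.0, 128, 130),  # 128b/130b encoding
}

def effective_bandwidth_gbps(rate_gt_s, payload_bits, coded_bits):
    """Usable bits per second per lane after line-coding overhead."""
    return rate_gt_s * payload_bits / coded_bits

for name, (rate, payload, coded) in GENERATIONS.items():
    bw = effective_bandwidth_gbps(rate, payload, coded)
    overhead = 100 * (1 - payload / coded)
    print(f"{name}: {bw:.2f} Gbit/s per lane ({overhead:.1f}% coding overhead)")
```

Running this shows Gen 3 delivering just under 8 Gbit/s per lane with roughly 1.5 percent coding overhead, versus the 20 percent overhead of 8b/10b.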
What’s the trade-off? Since there’s no such thing as a free lunch, the move from 8b/10b to a scrambled coding scheme must have some impact. The 8b/10b encoding maps each byte of data into one 10-bit character. While 8b/10b encoding increases the bit rate, the benefit is that it guarantees DC balance, deterministically bounding DC wander. This allows the physical lane signals to be AC-coupled, relaxing the requirements for data recovery and simplifying the receiver design of the PHY.
The Gen 3 coding uses scrambling, rather than 8b/10b encoding. Scrambling is a technique by which a known polynomial is applied to the data stream in a feedback topology. Since the polynomial is known, the data is recovered by applying the inverse polynomial. The drawback at the PHY layer is that DC wander can be introduced, requiring the receiver to either correct for DC wander or be able to tolerate the accompanying margin degradation associated with DC wander.
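As a rough sketch of how such a scrambler works, the toy Python below XORs the data stream with the output of a linear-feedback shift register (LFSR). Because the LFSR stream is deterministic for a given seed, applying the same operation a second time recovers the original data. The register width, taps and seed here are illustrative only; the actual PCIe Gen 3 polynomial and per-lane seeding rules are defined in the specification:

```python
def lfsr_stream(seed, taps, nbits, width=16):
    """Yield a pseudo-random bit stream from a Fibonacci LFSR.

    Illustrative parameters only -- the real PCIe Gen 3 scrambler
    uses a wider register and a spec-defined polynomial.
    """
    state = seed
    for _ in range(nbits):
        yield (state >> (width - 1)) & 1          # output the MSB
        fb = 0
        for t in taps:                            # feedback = XOR of tap bits
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & ((1 << width) - 1)

def scramble(bits, seed=0xACE1, taps=(15, 13, 12, 10)):
    """XOR data with the LFSR stream; applying scramble() twice is a no-op."""
    return [b ^ s for b, s in zip(bits, lfsr_stream(seed, taps, len(bits)))]
```

The self-inverse property is what lets the receiver recover the data simply by running the same known polynomial.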
There is also a drawback at the protocol layer: whereas the 8b/10b scheme provides out-of-band control characters that can identify the beginning and end of a packet, with scrambling these characters do not exist. This requires additional circuitry in the transmitters and receivers, such as packet-length counters, to delineate the beginning and end of each packet. This additional circuitry has the potential to increase cost, power and complexity, but again, there’s no free lunch. Studies show that the trade-offs are worth it, since the reduced bit rate of the scrambled scheme allows the entire PHY to operate at a 20 percent lower frequency while achieving the same link bandwidth.
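The idea behind packet-length counters can be sketched with a toy framing format in which each packet is preceded by a one-byte length; the receiver counts that length down to find the next packet boundary, with no control characters in the stream. The format is purely illustrative, not the framing actually defined by the Gen 3 specification:

```python
def delineate(stream):
    """Split a raw byte stream into packets using a leading length byte.

    Toy format for illustration: each packet is <length><payload...>,
    so boundaries are found by counting, not by special characters.
    """
    packets = []
    i = 0
    while i < len(stream):
        length = stream[i]                    # seed the packet-length counter
        packets.append(bytes(stream[i + 1 : i + 1 + length]))
        i += 1 + length                       # jump to the next boundary
    return packets
```

A stream such as `bytes([3, 0xAA, 0xBB, 0xCC, 2, 0x01, 0x02])` splits cleanly into a 3-byte and a 2-byte packet, showing how framing can survive without out-of-band symbols.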
PCIe Gen 3 New Features
In addition to providing twice the interconnect bandwidth, the PCI-SIG is adding a handful of new transaction layer features to the Gen 3 standard. Transaction layer enhancements focus on two areas: host-intelligent device interactions to support the accelerator model (atomic operations, ID-based ordering, TLP processing hints) and means to better manage and reduce system power consumption (latency tolerance reporting, optimized buffer flush/fill, and dynamic power allocation). These transaction layer protocol options are also being released in PCIe 2.1, since their use doesn’t depend on operation at Gen 3 speeds.
The physical layer interface (PHY) sections of the Gen 3 switches coming from PLX will include several enhancements aimed at improving signal integrity—particularly in long signal traces and in the presence of discontinuities that arise from vias and other layout artifacts.
Transmit pre-emphasis circuitry allows the transmitter to shift energy into the precursor or post-cursor portions of the signal without changing the overall power consumption of the PHY. A Finite Impulse Response (FIR) filter is employed in the transmitter to pre-distort the channel to inversely match the channel loss. This allows for optimized signal integrity by matching the impulse response of the driver to the channel.
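A minimal sketch of such a transmit FIR, assuming NRZ symbols of ±1 and illustrative tap weights: the tap magnitudes sum to 1.0 so the peak swing (and hence transmitter power) stays roughly constant, while symbols in the middle of a run are de-emphasized relative to symbols at transitions:

```python
def fir_preemphasis(symbols, c_pre=-0.1, c_main=0.8, c_post=-0.1):
    """Apply a 3-tap transmit FIR (pre-cursor, main, post-cursor).

    Tap weights are illustrative; a real design tunes them to the
    channel. |c_pre| + c_main + |c_post| = 1.0 keeps the peak output
    swing constant while shaping edges to counter channel loss.
    """
    out = []
    n = len(symbols)
    for i in range(n):
        nxt = symbols[i + 1] if i + 1 < n else 0.0   # pre-cursor sees the next symbol
        prv = symbols[i - 1] if i > 0 else 0.0       # post-cursor sees the previous one
        out.append(c_pre * nxt + c_main * symbols[i] + c_post * prv)
    return out
```

Feeding in a run such as `[-1, -1, 1, 1, 1, -1]` shows the transition symbol retaining more amplitude than the mid-run symbols, which is the pre-distortion that inversely matches a low-pass channel.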
A five-tap decision feedback equalizer (DFE) block is included in the receiver section. While other filter topologies tend to amplify the channel noise, the five-tap DFE operates by, in effect, canceling inter-symbol interference and reflections. The DFE is especially well suited to overcoming the effect of discrete discontinuities in the channel, such as sharp directional changes found in vias and backplanes. The DFE section can be switched into a pass-through mode for reduced power consumption. The receiver section also features a continuous time linear equalizer (CTLE). A CTLE is effective at optimizing the receiver for long continuous channels, as encountered when driving cables.
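The DFE principle can be sketched in a few lines: for each incoming sample, an ISI estimate is built from the most recently decided symbols weighted by the tap coefficients, then subtracted before the slicer makes its decision. Because the estimate is built from clean decisions rather than the noisy input, the channel noise is not amplified. The tap values and channel model below are illustrative; a real receiver adapts its taps:

```python
def dfe(samples, taps):
    """Decision-feedback equalizer sketch: cancel post-cursor ISI using
    the last len(taps) decided symbols, then slice to +/-1."""
    decisions = []
    for x in samples:
        # ISI estimate from previous decisions, most recent first
        recent = reversed(decisions[-len(taps):])
        isi = sum(t * d for t, d in zip(taps, recent))
        decisions.append(1.0 if x - isi >= 0 else -1.0)
    return decisions

# Illustrative channel: each sample carries 1.2x the previous symbol
# as post-cursor ISI -- enough to flip decisions without equalization.
symbols = [1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
received = [s + (1.2 * symbols[i - 1] if i > 0 else 0.0)
            for i, s in enumerate(symbols)]
recovered = dfe(received, taps=[1.2, 0, 0, 0, 0])
```

With ISI this severe, a plain slicer would make errors, while the DFE recovers the transmitted symbols exactly because each cancellation uses an already-correct decision.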
Auto-calibration routines are used both to compensate for PVT changes in analog circuit parameters and to adapt equalizer settings. This allows the system design to use longer trace lengths on the circuit board, providing layout flexibility and enhancing system robustness. Additionally, an advanced receiver detection section is included that prevents the link-down condition often observed in long (lossy) links due to low-amplitude signals.
Advanced debug and test features include jitter injection for system margining and automatic eye-diagram generation. These permit signal integrity testing without the use of external test equipment. The Gen 3 PHY can measure both eye height and width, an improvement over Gen 2 SerDes, which could measure only eye width. Additional enhancements and features are likely to be added as the specification is developed, though not necessarily tied to Gen 3 deployment.
Gen 3 Interconnect Targets Graphics Cards
Graphical displays continue to increase in resolution and complexity. This has continued to drive up the bandwidth requirements of the I/O interconnect for graphics cards, and has also driven up the demands for GPU power. With twice the bandwidth of Gen 2, Gen 3 allows for clearer images and more realistic motion, cutting in half the time required to paint an image at a given resolution. Additionally, the PCIe multicast (MC) feature is very well suited to enhancing the performance of multi-GPU systems. As Figure 2 shows, two GPUs are used to paint a single screen. Because graphics processing follows a predefined set of steps that can run in parallel and in processing order, the CPU can simultaneously cast drawing commands to both GPUs through a PCIe Gen 3 switch with multicast enabled. Each GPU then renders its specified portion of the screen. As shown in Figure 2, GPU2 then transfers its image to GPU1 via the peer-to-peer communication feature already built into the PCIe switch. GPU1 updates the screen with both images, providing more realistic, high-bandwidth video. Using MC reduces CPU utilization, freeing cycles for general-purpose processing of other activities.
Gen 3 technology will also be utilized in the high-performance world of video capture and broadcast distribution. Next-generation video processors will require I/O interconnects that consume less power and provide higher bandwidth. Current video capture systems use PCI and PCIe Gen 1 I/O to connect the video codecs to the CPU. As the demand for higher resolution and greater video channel aggregation continues, the bandwidth of PCI and Gen 1 PCIe links has become insufficient. Deploying PCIe Gen 3 links will provide immediate relief from the I/O congestion caused by these high-speed streams.
PCIe Gen 3 is on the horizon, and like the previous transitions in the PCIe standard, will bring doubled bandwidth along with significantly enhanced features. This will make the PCIe interconnect an even more compelling solution for high-speed graphics, in addition to all other application markets that thrive on speed and demand reduced power and cost.