SOLUTIONS ENGINEERING
I/O Technology and Subsystems
Intelligent Switches: The Next-Generation PCI Express Interconnect
The latest generation of PCIe switches improves not only device performance, with 5.0 GTransfers/s signaling and low latency, but also overall system performance.
MIGUEL RODRIGUEZ, PLX TECHNOLOGY
PCI Express (PCIe) continues to be the interconnect of choice for high-performance embedded applications. Demand for higher system performance increases the number of controllers used and, in turn, the number of PCIe lanes needed to interconnect them. Processors and chipsets have a finite number of lanes, so the need for a PCIe switch becomes obvious. A PCIe switch provides fan-out: additional downstream ports for PCIe endpoints and an upstream port as the path to the processor/chipset. The controllers used in these systems place aggressive data-streaming demands on system memory, and as a result, high-performance PCIe switches with built-in features for monitoring and regulating bandwidth are required.
Even with PCIe Gen2 and its 5 GTransfers/s signaling rate, systems can run into a number of performance bottlenecks. Fortunately, a new generation of PCIe Gen2 switches is on the market that can help system designers overcome these problems.
PCIe Controllers Determine Server Interconnect
In a high-performance server, PCIe-based controllers with interfaces such as Fibre Channel (FC), InfiniBand (IB) and Gigabit Ethernet (GigE), at either one or ten gigabits per second, connect to storage and networking elements. These controllers attempt to transfer data as fast as they can without consideration of other system components. A single endpoint behind a PCIe switch is very unlikely to experience performance limitations as long as the ports in both the switch and the endpoint match in number of lanes and speed. However, it is highly likely that a combination of these PCIe controllers will be connected to a system, with several of them behind a PCIe switch.
When two or more endpoints are connected to a processor/chipset through a PCIe switch, the upstream port link is wider than that of each downstream port. This common PCIe switch configuration results in unbalanced upstream versus downstream link widths. Throughput in the upstream direction is unlikely to be affected. Throughput in the downstream direction, on the other hand, can suffer as a result of the unbalanced port widths. This is particularly true when the number of read requests initiated by the endpoints is weighted in favor of one of them: that endpoint inevitably dominates the bandwidth and, ultimately, the queue resources of the processor/chipset. Consequently, the other endpoints suffer reduced bandwidth.
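To put rough numbers on the imbalance, the raw link bandwidth can be worked out from the Gen2 signaling rate. The sketch below assumes a hypothetical x8 upstream port and x4 downstream ports (the article does not specify widths) and uses the 8b/10b line coding of PCIe Gen2, in which each data byte occupies 10 bits on the wire. The results are raw figures before packet and protocol overhead.

```python
GEN2_GT_PER_S = 5.0        # per-lane signaling rate quoted in the article
LINE_BITS_PER_BYTE = 10    # 8b/10b coding: 10 line bits carry one data byte


def link_bandwidth_mb_s(lanes, gt_per_s=GEN2_GT_PER_S):
    """Raw one-direction link bandwidth in MB/s, before packet overhead."""
    return lanes * gt_per_s * 1000 / LINE_BITS_PER_BYTE


# Hypothetical port widths, chosen only for illustration.
upstream = link_bandwidth_mb_s(8)     # x8 upstream port
downstream = link_bandwidth_mb_s(4)   # x4 downstream port

print(f"x8 upstream:   {upstream:.0f} MB/s per direction")
print(f"x4 downstream: {downstream:.0f} MB/s per direction")
```

Under these assumptions, completions can arrive over the upstream link at twice the rate a single x4 endpoint can absorb them, which is why the downstream direction is the one at risk.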

This phenomenon can make it appear as if the system is congested and, thus, not performing efficiently. Figure 1 illustrates a typical server with PCIe slots fanning out from a PCIe switch. GigE and Fibre Channel controllers are connected to the slots. In this case, the Fibre Channel controllers are the aggressive devices, and as a result, the bandwidth to the GigE devices is affected.
A read request packet, at a high level, consists of a header with no payload. Instead, it carries a request size field, which tells the completer how much data to return to the requester in the form of a completion. A typical PCIe switch blindly forwards the read requests received from the endpoints up to the processor/chipset on a first-come-first-served basis, doing so without violating the flow-control mechanism in place. An endpoint capable of generating many read requests can therefore demand large data completions, which in turn exhaust the available queue resources in the processor/chipset.
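A toy model makes the first-come-first-served effect concrete. The sketch below is an illustration only: the endpoint names, the request mix (seven Fibre Channel reads for every GigE read) and the 16-entry completer queue are all assumed values, not figures from any real chipset.

```python
from collections import Counter, deque


def forward_fcfs(arrivals, queue_slots):
    """Forward read requests in arrival order until the completer's queue
    is full. Returns how many queue slots each endpoint ends up holding."""
    pending = deque(arrivals)
    held = Counter()
    while pending and sum(held.values()) < queue_slots:
        held[pending.popleft()] += 1
    return held


# Assumed traffic: per interval, the FC controller issues 7 reads and the
# GigE controller issues 1; three intervals arrive back to back.
arrivals = (["fc"] * 7 + ["gige"]) * 3
held = forward_fcfs(arrivals, queue_slots=16)

print(dict(held))
```

With blind forwarding, the aggressive requester occupies nearly all of the completer's queue entries, leaving the lighter requester with only a small share.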
Read Pacing is a new feature implemented in Gen2 PCIe switches from PLX (with other variations available from other chip vendors). When Read Pacing is enabled, the PCIe switch throttles the rate at which read requests are forwarded to the processor/chipset instead of forwarding them blindly. The intelligence in the PCIe switch determines the bandwidth capabilities of the endpoint, which in turn determine the rate at which its read requests are allowed to be forwarded up to the processor/chipset. In this manner, the completion bandwidth does not exceed what the endpoint can absorb. As a result, only the number of read requests required to fill the endpoint's bandwidth is forwarded to the processor/chipset. The remaining read requests from the endpoint are queued inside the switch.
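The general idea can be sketched as a per-port byte budget. This is not PLX's implementation, whose details the article does not describe; the class name `PacedPort`, the fixed pacing interval and the 2048-byte budget are hypothetical, and the model assumes no single read request is larger than the budget.

```python
from collections import deque


class PacedPort:
    """Toy model of one downstream port with read pacing (hypothetical)."""

    def __init__(self, bytes_per_interval):
        # Completion bytes the endpoint can absorb per pacing interval.
        self.budget = bytes_per_interval
        self.queue = deque()  # read requests held inside the switch

    def submit(self, request_size):
        """Accept a read request from the endpoint (size in bytes)."""
        if request_size > self.budget:
            raise ValueError("request larger than the per-interval budget")
        self.queue.append(request_size)

    def tick(self):
        """Forward, in order, as many queued reads as the budget allows
        for one interval; the rest stay queued in the switch."""
        remaining = self.budget
        forwarded = []
        while self.queue and self.queue[0] <= remaining:
            size = self.queue.popleft()
            remaining -= size
            forwarded.append(size)
        return forwarded


port = PacedPort(bytes_per_interval=2048)
for _ in range(8):
    port.submit(512)       # endpoint issues eight 512-byte reads at once

first = port.tick()        # only 2048 bytes' worth go upstream
second = port.tick()       # the rest follow in the next interval
third = port.tick()        # nothing left to forward
print(len(first), len(second), len(third))
```

Because forwarding is capped by what the endpoint can consume, the requests it issues beyond that rate wait in the switch rather than occupying queue entries in the processor/chipset.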
