While higher integration is tackling device functionality and cost issues, it simultaneously has shifted the design bottleneck to the interconnect. Here, architects are charged with linking a multitude of devices in a complementary fashion that will provide higher overall system-level performance and greater scalability.
To achieve this, the system bus's efficiency must be improved to allow more transactions and to increase bandwidth. A number of approaches have already been taken to accomplish this, starting with the move from multidrop to point-to-point bus architectures, which enables system designers to implement wider and faster buses. In addition, the move from asynchronous to synchronous parallel buses makes techniques such as pipelining and burst transactions possible, and these also help designers implement wider and faster buses.
However, widening the bus, increasing the bus frequency and synchronizing the bus to an external clock have themselves introduced some new challenges. These techniques, combined with physical limitations, have forced interconnect architects to migrate from synchronous parallel buses toward point-to-point high-speed serial links with embedded clock and data. This migration has many communication system designers wondering about the trade-offs involved in choosing between serial and parallel interconnects, and whether parallel buses will eventually become obsolete. The answers to those questions can be derived from a close examination of the new generation of high-speed interconnect serial buses and their target applications.
Parallel approaches
A parallel bus can be multiplexed or demultiplexed and have a point-to-point or multidrop architecture. While a multiplexed bus has the advantage of requiring fewer pins on each device and fewer traces on the board, a demultiplexed bus has higher performance and throughput.
A multidrop parallel bus is a shared bus between two or more devices; an example could be an SDRAM interface bus on PC motherboards or embedded-microprocessor applications. The PCI bus is a widely used multidrop architecture (Figure 1). With this approach, designers can meet the demand for higher bandwidth by widening the bus, increasing the clock frequency and pipelining the transactions. However, each of these solutions has its drawbacks. Increasing the bus width limits the maximum achievable frequency of the bus because of skew between signals on the same bus. A wider bus also means more pins on the device, more traces on the board and more pins on connectors, all of which translate to higher cost.
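The relationship between bus width, clock frequency and peak bandwidth can be sketched with a short calculation. This is a minimal illustration, not a model of any particular bus: the function name and the example widths and clock rates are assumptions chosen for the example, and the result is a theoretical peak that ignores arbitration, wait states and protocol overhead.

```python
def peak_bandwidth_mbytes(width_bits: int, clock_mhz: float,
                          transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth in megabytes per second.

    width_bits          -- number of data lines on the bus
    clock_mhz           -- bus clock in MHz
    transfers_per_clock -- 1 for single data rate, 2 for double data rate
    """
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Illustrative configurations (assumed values for the example only).
for width, mhz in [(32, 33.0), (64, 33.0), (64, 66.0)]:
    mbytes = peak_bandwidth_mbytes(width, mhz)
    print(f"{width}-bit bus at {mhz:.0f} MHz: {mbytes:.0f} MB/s peak")
```

Doubling either the width or the clock doubles the peak figure, which is why both options are attractive on paper; the costs described above (skew, pin count, board traces) are what the formula leaves out.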
Increasing the multidrop bus frequency limits the number of devices that can be connected to each other on a single segment. To solve that problem, designers typically use bridges to break down a big segment into multiple smaller ones. While bridges help improve the signal integrity, they force designers to use more devices, which directly increases the cost. Examples of multidrop buses and their applications are shown in Table 1.
As both bus frequency and width increase in multidrop parallel architectures, issues such as fan-out and capacitive loading make point-to-point parallel buses more desirable. A point-to-point parallel bus usually is a source-synchronous bus, which consists of data signals, clock and a few control signals. Data signals are synchronized with the source-generated clock. As the frequency and width of the data bus increase, routing the traces on the board becomes more challenging, since just a small skew will cause the receiver to read the wrong data on clock edges. Consequently, designers have to match data signal trace lengths more precisely to the length of the source-generated clock trace. Unfortunately, as bus widths increase, matching lengths for all signals becomes more difficult. All of these traces are competing for limited real estate between two chips, and that ultimately limits the maximum achievable clock frequency.
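To see why trace-length matching gets harder as the clock rises, it helps to compare the skew caused by a length mismatch with the bit period. The sketch below assumes a propagation delay of roughly 170 ps per inch of board trace; that figure is an assumption for illustration and depends on the board material and stack-up. The function names are hypothetical.

```python
PS_PER_INCH = 170.0  # assumed propagation delay; varies with board material


def skew_ps(mismatch_inches: float, ps_per_inch: float = PS_PER_INCH) -> float:
    """Timing skew caused by a trace-length mismatch, in picoseconds."""
    return mismatch_inches * ps_per_inch


def bit_period_ps(clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Time available for one bit on each data line, in picoseconds."""
    return 1e6 / (clock_mhz * transfers_per_clock)


def skew_fraction(mismatch_inches: float, clock_mhz: float,
                  transfers_per_clock: int = 1) -> float:
    """Fraction of the bit period consumed by length-mismatch skew."""
    return skew_ps(mismatch_inches) / bit_period_ps(clock_mhz, transfers_per_clock)


# The same half-inch mismatch at increasing clock rates (assumed values).
for mhz in (100, 400, 1000):
    share = skew_fraction(0.5, mhz)
    print(f"{mhz} MHz: 0.5 in mismatch uses {share:.1%} of the bit period")
```

The mismatch that is negligible at a low clock rate consumes a steadily larger share of the bit period as the clock rises, before jitter and setup-and-hold margins are even counted.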
Another problem with wider buses is the need for more pins on the device (and therefore more power), and this becomes a bigger problem when more than two devices must be connected directly using point-to-point buses. This requires duplicating an entire bus on one of the devices. One solution is to use switches to interconnect multiple devices. Basically, each device is linked to a switch through which it communicates with the other devices. Switches used with a point-to-point parallel bus are pad-limited, since the switch needs to have a dedicated bus for each device.
In short, the disadvantages of designing a system based on high-speed parallel buses are high pin counts, leading to higher power consumption, difficult board routing, more board layers and skew mismatch between traces of the same port. In addition, the high number of single-ended signals swinging in a parallel bus generates noise and EMI problems.
Serial interfaces
Serial interfaces, where address, data and control information are all carried on a single link, have been proposed as a solution to the problems of high pin counts, bit skew and synchronization. This signal can be synchronized to an external clock or can carry embedded timing information with data. A serial bus, like a parallel bus, might have a point-to-point or a multidrop architecture.
The two dominant limiting factors when it comes to increasing the data rate in a source-synchronous bus are clock-to-data skew and channel jitter. A serial link that embeds the clock with data solves the clock-to-data skew problem and allows a serial link to operate four to 10 times faster than a source-synchronous bus. But since the receiver must now recover both the clock and data from the received serial signal, clock-and-data recovery requires more sophisticated circuitry and becomes more challenging on higher-data-rate links.
As the data rate and distance between two endpoints increase on a serial link, modulation and equalization take priority. For modulation, some designers opt for binary signaling, where each pulse corresponds to just 1 bit, while others take a multilevel signaling approach, in which each pulse corresponds to a symbol that carries 2 or more bits.
Each signaling approach has some advantages and some disadvantages. The main advantage of multilevel signaling is the ability to run a link at lower frequency, yet carry the same amount of data as in binary signaling. The main disadvantages are high power consumption and bigger die size due to complicated clock-and-data recovery circuitry.
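The trade-off between binary and multilevel signaling can be illustrated with a small sketch. It groups a bit stream into symbols and computes the symbol rate needed for a given bit rate. The function names, the four-level example and the plain binary bit-to-level mapping are assumptions for illustration; real links may use a different mapping, such as Gray coding.

```python
import math


def symbol_rate(bit_rate_gbps: float, levels: int) -> float:
    """Symbol rate (in Gsymbols/s) needed to carry bit_rate_gbps
    when each symbol can take one of `levels` amplitudes."""
    bits_per_symbol = math.log2(levels)
    return bit_rate_gbps / bits_per_symbol


def to_symbols(bits: list[int], bits_per_symbol: int) -> list[int]:
    """Group bits (MSB first) into symbol values 0 .. 2**bits_per_symbol - 1."""
    if len(bits) % bits_per_symbol:
        raise ValueError("bit count must be a multiple of bits_per_symbol")
    symbols = []
    for i in range(0, len(bits), bits_per_symbol):
        value = 0
        for bit in bits[i:i + bits_per_symbol]:
            value = (value << 1) | bit
        symbols.append(value)
    return symbols


bits = [1, 0, 1, 1, 0, 0, 0, 1]
print(to_symbols(bits, 1))   # binary signaling: one pulse per bit
print(to_symbols(bits, 2))   # four-level signaling: one pulse per 2 bits
print(symbol_rate(10.0, 2), symbol_rate(10.0, 4))
```

With four levels the same eight bits need only four pulses, so the link runs at half the symbol rate. The receiver, however, must now distinguish four amplitudes instead of two, which is where the added circuit complexity comes from.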
Serial I/O applications
High-speed serial I/Os are used for chip-to-chip, board-to-board and chassis-to-chassis connections. In chip-to-chip connections, both chips reside on the same board, and the distance between the two is usually less than 10 inches. In this case all high-speed I/O cells are integrated inside the chip, which puts new requirements, such as low power and small die size, on the I/O cells.
If more than two chips in a design are required to connect to each other directly, a switch is the most popular solution. In this case, each device will consist of one single serial port, and all of the devices are connected to a multiport-switch device. It's important to note that a port may consist of one or more serial I/O links. A focus on low-power, small-die-size, high-speed serial I/O cells becomes more important in designing switches, since each switch is a multiport device and each port may comprise several serial I/Os (Figure 2).
In board-to-board connections, two boards are connected through a backplane. In this case, a high-speed serial connection is routed in three segments. The first segment is on the source board, known as the ingress board; the second is on the backplane; and the third is on the destination, or egress, board. The two backplane connectors, which connect the ingress and egress boards to the backplane, are part of this multisegment trace. A typical trace length in a board-to-board interconnection is somewhere between 20 and 40 inches, with five to six inches of trace on each ingress and egress board and 10 to 28 inches of trace on the backplane. Xaui is one of the most popular backplane serial interface standards, while PCI Express, SPI-5 and Serial RapidIO are three promising high-speed serial interfaces that address both chip-to-chip and board-to-board applications.
In chassis-to-chassis connections, two or more chassis located anywhere from a few meters to several kilometers apart are tied together via fiber-optic or copper cable. Ethernet and Infiniband are the most popular standards to address chassis-to-chassis connections (Table 2). Each of these standards has its own strengths and applications (Table 3).
As shown in Table 3, several standards address the same application, while a single serial standard can be used in several applications. For example, Xaui can be used in chip-to-chip and board-to-board applications in enterprise solutions, even as it gains attention in storage-area network applications. Another example is Serial RapidIO, which mainly targets embedded and DSP applications but covers both chip-to-chip and board-to-board connections.
Interoperability
Each standard specifies an interconnection architecture in a multilayer (usually two or three layers) hierarchy. Layer 1, typically called the physical layer, specifies how packets move between two physical points (chip to chip). Electrical interfaces and flow-control specifications are part of the physical-layer definition. Layer 2 is usually called the transport layer. Transport-layer specifications provide information to route a packet between endpoints (user to user). Layer 3, typically called the logical layer, provides necessary information so that endpoints can process the transaction. For example, the logical layer is responsible for defining the frame format, the addressing scheme and the transaction types.
Each standard may call each layer by a different name or even merge the responsibilities of two layers into a single layer, or vice versa, breaking down the functionality of a single layer into two or more layers. Regardless of the number of layers and their names, each standard should have physical specifications to deal with connections between two physical points. It should describe how to route a packet between two system endpoints, and it must provide information regarding frame format so that all endpoints can initiate and recognize transactions. Partitioning an interconnection architecture in multiple layers provides the opportunity to add features to a layer without modifying the other layers' specifications (Figure 3).
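The layering described above can be sketched as nested encapsulation, where each layer wraps the output of the layer above it. The field layout here is entirely hypothetical (one-byte device IDs, a one-byte transaction type, a 32-bit address, a start byte and a simple checksum) and does not correspond to any of the standards named in this article.

```python
import struct

START_OF_FRAME = 0x7E  # hypothetical framing byte


def logical_layer(tx_type: int, address: int, payload: bytes) -> bytes:
    """Layer 3: transaction type, address and data (what the endpoint processes)."""
    return struct.pack(">BI", tx_type, address) + payload


def transport_layer(src_id: int, dst_id: int, logical_packet: bytes) -> bytes:
    """Layer 2: adds the routing information (destination and source IDs)."""
    return struct.pack(">BB", dst_id, src_id) + logical_packet


def physical_layer(transport_packet: bytes) -> bytes:
    """Layer 1: adds link-level framing and a simple 8-bit checksum."""
    checksum = sum(transport_packet) & 0xFF
    return bytes([START_OF_FRAME]) + transport_packet + bytes([checksum])


def build_frame(src_id: int, dst_id: int, tx_type: int,
                address: int, payload: bytes) -> bytes:
    return physical_layer(
        transport_layer(src_id, dst_id, logical_layer(tx_type, address, payload)))


def destination_of(frame: bytes) -> int:
    """A switch needs only the transport header, not the logical contents."""
    return frame[1]


frame = build_frame(src_id=1, dst_id=2, tx_type=0x01,
                    address=0x1000, payload=b"\xAA\xBB")
print(frame.hex(), "-> route to device", destination_of(frame))
```

Because `destination_of` never looks past the transport header, a new transaction type could be added to the logical layer without changing how a switch routes the frame, which is the benefit of partitioning that the text describes.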
Parallel vs. serial
A system engineer should be careful when choosing between serial and parallel buses. As mentioned, high-speed parallel buses have such disadvantages as channel skew, board area, high pin count and more I/O power. But simply replacing a high-speed parallel bus with a high-speed serial bus is no slam-dunk. Since a high-speed serial signal carries both data and clock on the same link, receiver circuitry needs to recover clock and data from a serial signal and then convert incoming serial high-speed data into lower-speed parallel data. This so-called SerPar function is a result of the fact that the internal logic in a typical chip usually runs 10 to 20 times slower than the serial data rate.
On the output side, the internal lower-speed parallel bus needs to be serialized into a single high-speed serial stream (the ParSer function). Clock-and-data recovery (CDR) as well as deserialization and serialization of data introduce added latency. So it is prudent to say that for applications requiring low latency, such as memory devices, a parallel bus is the better choice, despite its disadvantages. For those applications that can tolerate latency, a serial interface is a better option, even at the price of CDR, SerPar and ParSer circuits and their related introduced latency.
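The ParSer and SerPar functions can be sketched as a pair of bit-shuffling routines. This is a behavioral illustration only: it assumes MSB-first transmission and a receiver that is already word-aligned, and it leaves out clock-and-data recovery and any line coding. The function names and the example rates are assumptions.

```python
def serialize(words: list[int], width: int) -> list[int]:
    """ParSer: turn parallel words into a single bit stream, MSB first."""
    bits = []
    for word in words:
        for i in reversed(range(width)):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits: list[int], width: int) -> list[int]:
    """SerPar: regroup a word-aligned bit stream into parallel words."""
    words = []
    for i in range(0, len(bits) - width + 1, width):
        word = 0
        for bit in bits[i:i + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


def parallel_clock_mhz(serial_rate_mbps: float, width: int) -> float:
    """Clock rate of the internal parallel bus for a given serial bit rate."""
    return serial_rate_mbps / width


words = [0x3A5, 0x001]
stream = serialize(words, width=10)
assert deserialize(stream, width=10) == words

# With a 10- or 20-bit internal bus, the core logic runs 10 or 20 times
# slower than the serial link (example rate is an assumed value).
print(parallel_clock_mhz(3125, 10), parallel_clock_mhz(3125, 20))
```

Each word must be fully assembled before it can be handed to the internal logic, so the receiver adds at least one word time of delay on top of the CDR circuit's own delay. That is the latency cost the text weighs against the pin-count savings.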
Interface bridging
Many systems require both serial and parallel interfaces. Since any interconnection architecture is a multilayer protocol with its own frame-formatting and addressing structure, the use of both parallel and serial protocols on the same system necessitates a bridge to translate protocols back and forth, unless the only difference between the two protocols is the physical layer. In that case, both serial and parallel interconnection protocols are identical at Layer 2 and above, and the only difference between them is the physical specification at Layer 1.
For example, a bridge is required to connect a device with a PCI interface to one with a serial or even parallel RapidIO interface. This bridge (let's call it a PCI-to-RapidIO bridge) will translate PCI transactions to RapidIO transactions and vice versa. On the other hand, no protocol translation is required to translate Serial RapidIO transactions to parallel RapidIO transactions. Also, a RapidIO switch can have multiple serial and several parallel ports, differing only in the physical layer. This is why most of the new serial standards are just extensions of the previous, existing parallel interfaces, as in the case of PCI Express and Serial RapidIO. Layer 2 and 3 specifications for Serial RapidIO are the same as for RapidIO. The RapidIO physical layer is based on a source-synchronous point-to-point parallel bus with four or eight data bit ports; the Serial RapidIO physical layer is based on high-speed serial links with one-lane (or 1x) and four-lane (or 4x) ports.
As data rates and trace lengths increase, designing high-speed serial I/O cells becomes more challenging. Higher-data-rate serial links in binary signaling translate to smaller bit periods, and designing CDR circuitry to recover the clock and data in a smaller eye (shorter time) becomes more difficult. At the same time, increasing the trace length adds to signal distortion and attenuation, which cause a smaller eye on the receiver side. Input equalization, output pre-emphasis and output-level programming are widely used to improve the signal integrity on higher data rates and longer trace lengths. The latter gives the system engineer the option of launching signals with a higher peak-to-peak differential voltage for longer traces.
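One common way to picture output pre-emphasis is as a two-tap filter that boosts the signal whenever the data changes and reduces it when the data repeats, which compensates in advance for the high-frequency loss of a long trace. The sketch below is an assumption about how such a scheme might look, not a description of any particular transmitter; the function name, the tap weight `alpha` and the plus-or-minus-one symbol levels are all hypothetical.

```python
def pre_emphasis(symbols: list[float], alpha: float) -> list[float]:
    """Two-tap transmit filter: y[n] = x[n] - alpha * x[n-1].

    symbols -- transmitted levels, +1.0 for a one and -1.0 for a zero
    alpha   -- strength of the pre-emphasis (0 disables it)

    The sample before the first symbol is assumed equal to the first
    symbol, so the stream starts in a steady state.
    """
    out = []
    previous = symbols[0] if symbols else 0.0
    for s in symbols:
        out.append(s - alpha * previous)
        previous = s
    return out


levels = [1.0, 1.0, -1.0, -1.0, 1.0]
print(pre_emphasis(levels, alpha=0.25))
```

Bits that follow a transition leave the driver at a larger amplitude than bits that repeat the previous value, so the edges that the channel attenuates most start out with extra energy.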
Related Article
1. "Serial RapidIO: Speeding Up Control-Plane Designs," www.commsdesign.com/story/OEG20020116S0076
About the Author
Herman Eiliya (herman.eiliya@analog.com) is a senior applications engineer at Analog Devices Inc. He has a BS in computer engineering from Sharif University of Technology in Tehran, Iran, and an MSEE from California State University at Fullerton.