PCB Signal Integrity Design for DDR2-800 and DDR3

This article covers the signal integrity (SI) and power integrity (PI) considerations that arise when designing printed circuit boards (PCBs) for DDR2 and DDR3 memory, both of which are quite challenging. It focuses on techniques for boards with as few layers as possible, especially 4-layer boards; some of these techniques are already well established.

1. Introduction

The commonly used DDR2 now runs at 800 Mbps and even higher speeds such as 1066 Mbps, while DDR3 has reached 1600 Mbps. At these speeds, the PCB design must meet strict timing matching requirements to preserve waveform integrity. Many factors must be considered, and they all interact. They can nevertheless be grouped into distinct categories: PCB stackup, impedance, interconnect topology, delay matching, crosstalk, power integrity and timing. Many EDA tools are available for calculation and simulation; Cadence ALLEGRO SI-230 and Ansoft’s HFSS are among the more widely used.

Table 1 lists the technical requirements that DDR2 and DDR3 share and those that are specific to each.

2. PCB stackup and impedance

On a board constrained in layer count, such as a 4-layer board, all signal traces must be routed on the TOP and BOTTOM layers. Of the two inner layers, one is the GND plane and the other is the VDD plane; Vtt and Vref are routed on the VDD plane layer. With 6 layers, it becomes easier to implement a dedicated topology, and PI improves because the spacing between the power and GND layers is smaller.

Another parameter, the impedance of the interconnect channel, must be constant and continuous in a DDR2 design. A 50 Ohm matching impedance must be used for all single-ended signals so that impedance matching is achieved. A 100 Ohm matching impedance must be used for the termination of all differential signals, such as CLOCK and DQS. In addition, all matching resistors pulled up to VTT must be 50 Ohms, and the ODT setting must also be kept at 50 Ohms.

In DDR3 designs, a single-ended matching impedance between 40 and 60 Ohms can optionally be chosen for the ADDR/CMD/CNTRL signal lines, which has proven to have many advantages. The termination resistor pulled up to VTT may need a different value depending on the trace impedance and the SI simulation results; it usually lies between 30 and 70 Ohms. The matching impedance for differential signals is always 100 Ohms.
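To illustrate why the pull-up value is tuned to the trace impedance, the following minimal Python sketch evaluates the standard voltage reflection coefficient, (R - Z0) / (R + Z0), for several candidate termination values. The 50 Ohm trace impedance and the list of resistor values are assumptions picked from the ranges quoted above; they are not simulation results.

```python
def reflection_coefficient(r_term_ohm: float, z_line_ohm: float) -> float:
    """Voltage reflection coefficient seen by a wave arriving at the termination."""
    return (r_term_ohm - z_line_ohm) / (r_term_ohm + z_line_ohm)


# Assumed trace impedance and candidate pull-up values (illustrative only)
Z_LINE_OHM = 50.0
for r_term_ohm in (30.0, 40.0, 50.0, 60.0, 70.0):
    gamma = reflection_coefficient(r_term_ohm, Z_LINE_OHM)
    print(f"R_term = {r_term_ohm:4.0f} Ohm -> reflection coefficient = {gamma:+.3f}")
```

A value of zero means no reflection; in practice the final resistor value would still be chosen from SI simulation, as the text states.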

3. Interconnection Path Topology

For DDR2 and DDR3, the DQ, DM and DQS signals are point-to-point interconnections, so no topology needs to be chosen for them. The exception is the design of multi-rank DIMMs (Dual In-line Memory Modules), where this does not hold. In point-to-point mode, impedance matching is easily achieved through the ODT impedance setting, which preserves waveform integrity. ADDR/CMD/CNTRL and some clock signals must connect to multiple points, so a suitable topology has to be selected. Figure 2 lists some related topologies; among them, the Fly-By topology is a special daisy chain that uses only very short stubs, or sometimes none at all.

For DDR3, all of these topologies are applicable, provided the traces are kept as short as possible. The Fly-By topology gives good waveform integrity in the presence of noise, but it is difficult to implement on a 4-layer board and requires 6 or more layers, whereas the daisy-chain topology is easy to implement on a 4-layer board. The tree topology additionally requires that the length of AB be very close to the length of AC (Figure 2). The design must balance waveform integrity against branch trace length while also meeting the layer-count constraint. In a DDR3 design on a 4-layer board, the most reasonable choice is a daisy-chain topology with minimal stubs.

For DDR2-800, all of these topologies apply, with minor differences. However, the daisy-chain topology has proven to be advantageous in terms of SI.

When there are more than two SDRAMs, the topology is usually chosen according to the placement of the devices. Figure 3 shows topologies designed for different placements. Of these, only A and D are well suited to 4-layer PCB designs. For DDR2-800, all of the listed topologies achieve adequate waveform integrity, whereas in DDR3 designs, especially at 1600 Mbps, only D is sufficient.
  

4. Delay matching

Delay matching is often done by routing trombone (serpentine) traces. In addition, vias are added wherever a layer change is unavoidable during routing. Unfortunately, meandered traces and traces with vias do not have the same delay as an ideal straight trace of equal length, as shown in Figure 4.
 

The trombone trace is not equivalent to a straight trace in terms of delay, and for a trace with vias the difference is even more pronounced. For equal centerline length, the delay of the trombone trace is smaller than that of the straight trace, whereas the delay of the trace with vias is larger. There are two ways to deal with this delay difference. One is to perform accurate delay matching calculations in the EDA tool and control the trace lengths accordingly. The other is to reduce the mismatch to within an acceptable range.

For the trombone trace, the delay discrepancy can be reduced by increasing L3, because the parallel segments couple to each other. SigXP simulation shows this clearly: as Figure 5 shows, different values of L3 (labeled S in the figure) give different delays. Making S as large as possible reduces the delay discrepancy. For microstrip lines, L3 needs to be more than 7 times the distance from the trace to the ground plane.
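The 7x rule of thumb for L3 on microstrip can be written as a simple layout check. This is only a sketch: the function names are hypothetical, and the 4 mil height above the ground plane is an assumed value, not one taken from the article.

```python
def min_l3_mils(height_to_ground_mils: float, factor: float = 7.0) -> float:
    """Lower bound on L3 from the rule of thumb: factor times the height above ground."""
    return factor * height_to_ground_mils


def l3_is_sufficient(l3_mils: float, height_to_ground_mils: float) -> bool:
    """True when L3 is strictly greater than the rule-of-thumb bound."""
    return l3_mils > min_l3_mils(height_to_ground_mils)


HEIGHT_MILS = 4.0  # assumed dielectric height between the trace and the ground plane
print(f"L3 must exceed {min_l3_mils(HEIGHT_MILS):.0f} mils")
print("L3 = 30 mils passes:", l3_is_sufficient(30.0, HEIGHT_MILS))
```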

The delay of a trombone trace is affected by coupling between its parallel segments. One way to reduce the coupling without increasing the spacing is to use a sawtooth trace. A sawtooth trace performs better than a trombone trace but requires more space. Because many factors can cause delay differences, in practice the delay matching of traces should be controlled through rigorous calculation with CAD tools.

Consider the vias on the 6-layer board in Figure 2. When a ground via is placed close to a signal via, its effect on delay must be taken into account. As an example, the microstrip on the TOP layer is 150 mils long, the microstrip on the BOTTOM layer is also 150 mils long, the trace width is 4 mils, and the via parameters are: barrel diameter = 8 mils, pad diameter = 18 mils, anti-pad diameter = 26 mils.

Three cases are compared. In the first, there are no ground vias near the signal via, so the return path can only be provided at the edge of the PCB, 250 mils away from the via. The second is a continuous microstrip line 362 mils long. In the third, four ground vias surround the signal via. Figure 6 shows the S-parameters for a nominal 60 Ohm line. The figure shows that the signal via surrounded by four ground vias behaves like a continuous microstrip line, which improves the S21 characteristic. Without a return path near the signal via, the impedance of the signal via increases considerably. In today’s high-speed systems, delay is particularly important.

Now consider a test circuit similar to Figure 5. The source is a linear driver with a 60 Ohm output impedance that produces a trapezoidal signal with 100 ps rising and falling edges and an amplitude of 1 V. The source drives each of the three configurations in Figure 6, terminated in a 60 Ohm load, with a periodic 800 MHz excitation. The delay from source to receiver is measured at the 0.5 V level to show the delay differences between the cases. The result is shown in Figure 7, which shows only the rising edge of the signal. The via with four ground vias adds only 3 ps of delay relative to the straight line, whereas without surrounding ground vias the added delay is 8 ps. Increasing the density of ground vias around signal vias is therefore helpful.

In a 4-layer PCB, however, this is not entirely feasible, because the signal traces are adjacent to the power plane, so the return path of the signal is determined by the degree of coupling between them. In a 4-layer PCB design it is therefore very important to control this coupling in order to meet the power integrity requirements.
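The delay measurement at the 0.5 V level can be sketched in Python as a threshold-crossing search with linear interpolation. The waveforms below are idealized piecewise-linear edges with assumed arrival times (the second is simply shifted by 8 ps to mirror the figure quoted in the text); they are not the simulated waveforms of Figure 7, and the helper names are hypothetical.

```python
def crossing_time_ps(times_ps, volts, threshold_v=0.5):
    """First time the waveform rises through threshold_v, by linear interpolation."""
    for i in range(len(times_ps) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 < threshold_v <= v1:
            t0, t1 = times_ps[i], times_ps[i + 1]
            return t0 + (threshold_v - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never rises through the threshold")


def rising_edge(start_ps, rise_ps=100.0, amplitude_v=1.0):
    """Idealized edge: low until start_ps, a linear ramp of rise_ps, then high."""
    times = [0.0, start_ps, start_ps + rise_ps, start_ps + rise_ps + 200.0]
    volts = [0.0, 0.0, amplitude_v, amplitude_v]
    return times, volts


# Assumed arrival times; the second edge is shifted by 8 ps for illustration
reference = rising_edge(start_ps=50.0)
with_via = rising_edge(start_ps=58.0)

skew_ps = crossing_time_ps(*with_via) - crossing_time_ps(*reference)
ui_ps = 1e6 / 1600.0  # unit interval at 1600 Mbps, in picoseconds
print(f"skew = {skew_ps:.1f} ps, or {100.0 * skew_ps / ui_ps:.2f}% of a {ui_ps:.0f} ps unit interval")
```

Expressing the skew as a fraction of the unit interval gives a quick sense of how much of the timing budget a single via can consume.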
 

In both DDR2 and DDR3 the clock is transmitted differentially. In DDR2, DQS is either single-ended or differential depending on the operating speed, and it is differential at high speeds. For the same length, the switching delay of a differential line is smaller than that of a single-ended line. Depending on the timing simulation results, the clock and DQS traces may need to be slightly longer than the corresponding ADDR/CMD/CNTRL and DATA traces. The clock and DQS traces must also be routed among their associated ADDR/CMD/CNTRL and DQ traces. Because DQ and DM run at very high speed, they require strict length matching within each byte and must not contain vias. Differential signals are relatively insensitive to impedance discontinuities, so layer changes are not a major problem for them. When routing, give priority to the clock and DQS traces.

5. Crosstalk

In microstrip designs, crosstalk is a significant contributor to delay variation. Its effect is usually reduced by increasing the spacing between parallel microstrip traces. However, this comes at the expense of routing space, so the spacing must be kept within reasonable limits. A typical rule is to make the spacing between parallel traces more than twice the distance from the trace to the ground plane. Ground vias also play an important role. Figure 8 shows the coupling between vias with and without ground vias; with multiple ground vias, the coupling is reduced by 7 dB. Given the cost budget of the interconnect, both effects need to be simulated appropriately. When a periodic excitation is applied to all nets, crosstalk produces signal jitter, which can be observed in the time domain through simulation. The optimal trace spacing can then be selected by weighing routing space against signal integrity.
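As a rough illustration of the two numbers in this section, the sketch below converts a coupling change in dB to a voltage ratio and computes the minimum spacing implied by the 2x rule. It assumes the 7 dB figure is a voltage-type quantity (20 log10, as for S-parameters) and uses an assumed 4 mil height above the ground plane; both function names are hypothetical.

```python
def db_to_voltage_ratio(db: float) -> float:
    """Convert a level change in dB to a voltage ratio (20*log10 convention)."""
    return 10.0 ** (db / 20.0)


def min_spacing_mils(height_to_ground_mils: float, factor: float = 2.0) -> float:
    """Rule-of-thumb lower bound on spacing between parallel traces."""
    return factor * height_to_ground_mils


ratio = db_to_voltage_ratio(-7.0)
print(f"A 7 dB reduction leaves about {100.0 * ratio:.0f}% of the coupled voltage")

HEIGHT_MILS = 4.0  # assumed dielectric height between the trace and the ground plane
print(f"Trace spacing should exceed {min_spacing_mils(HEIGHT_MILS):.0f} mils")
```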

6. Power Integrity

Power integrity here refers to keeping the power supply within its tolerance under maximum signal switching conditions. When this tolerance is not met, many problems result, such as increased clock jitter, data jitter and crosstalk.

The theory of decoupling is well understood; the discussion here starts from the definition of “target impedance”.

Ztarget = Voltage tolerance / Transient current    (1)

The key is to understand the transient current in the worst-case switching condition; another important factor is the switching frequency. Across the whole frequency range, the decoupling network must keep its impedance at or below the target impedance (Ztarget). On the PCB, the capacitance formed by the power and ground planes, together with all decoupling capacitors, must provide decoupling from about 100 kHz up to 100-200 MHz. Below 100 kHz, the bulk capacitors of the voltage regulator module provide adequate decoupling. Above 200 MHz, decoupling must come from on-chip capacitance or dedicated package capacitors. Actual power integrity is quite complex and involves the IC package, the switching frequency of the simulated signals and the PCB power network. For PCB design, decoupling to a target impedance is a relatively simple and practical approach.
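Formula (1) can be evaluated directly. In the sketch below, the voltage tolerance is expressed as supply voltage times a fractional tolerance. The nominal supply voltages (1.8 V for DDR2, 1.5 V for DDR3) follow the JEDEC standards, while the 5% tolerance and the 1 A transient current are assumptions for illustration; a real design would take the worst-case current from the device data.

```python
def target_impedance_ohm(supply_v: float, tolerance_fraction: float,
                         transient_current_a: float) -> float:
    """Ztarget = voltage tolerance / transient current, per formula (1)."""
    return supply_v * tolerance_fraction / transient_current_a


TRANSIENT_CURRENT_A = 1.0  # assumed worst-case switching current
for name, supply_v in (("DDR2 VDD", 1.8), ("DDR3 VDD", 1.5)):
    z_target = target_impedance_ohm(supply_v, 0.05, TRANSIENT_CURRENT_A)
    print(f"{name}: Ztarget = {1000.0 * z_target:.0f} mOhm")
```

A larger transient current or a tighter tolerance lowers the target, which is why rails with tight tolerances and large currents are the hardest to decouple.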

A DDR design has three supplies: VDD, VTT and Vref. The tolerance on VDD is 5%, and its transient current varies from Idd2 to Idd7, as described in detail in the JEDEC specifications. Power integrity can be achieved with the plane capacitance of the power layer plus a number of dedicated decoupling capacitors. About 10 decoupling capacitors are used in total, with values ranging from 10 nF to 10 uF. Surface-mount capacitors are the most suitable because of their lower soldering resistance.
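One way to check a capacitor selection against a target impedance is to model each capacitor as a series R-L-C branch and combine the branches in parallel. The sketch below does this for a bank of 10 capacitors between 10 nF and 10 uF. All ESR and ESL values, the quantities per value, and the 0.09 Ohm target are assumptions chosen for illustration; plane capacitance and mounting inductance are ignored, so this is not a substitute for an EDA simulation.

```python
import math


def capacitor_impedance(freq_hz, c_farad, esr_ohm, esl_henry):
    """Series R-L-C model of a real capacitor, returned as a complex impedance."""
    omega = 2.0 * math.pi * freq_hz
    return complex(esr_ohm, omega * esl_henry - 1.0 / (omega * c_farad))


def parallel_impedance(impedances):
    """Impedance of several branches connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)


# Assumed bank: (capacitance in F, ESR in Ohm, ESL in H, quantity)
BANK = [
    (10e-6, 0.010, 1.5e-9, 2),
    (1e-6, 0.015, 1.0e-9, 3),
    (100e-9, 0.030, 0.8e-9, 3),
    (10e-9, 0.050, 0.6e-9, 2),
]
Z_TARGET_OHM = 0.09  # assumed target impedance

for freq_hz in (1e5, 1e6, 1e7, 1e8):
    branches = [capacitor_impedance(freq_hz, c, esr, esl)
                for c, esr, esl, qty in BANK for _ in range(qty)]
    z_mag = abs(parallel_impedance(branches))
    verdict = "meets" if z_mag <= Z_TARGET_OHM else "exceeds"
    print(f"{freq_hz:11.0f} Hz: |Z| = {z_mag:.4f} Ohm ({verdict} target)")
```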

Vref requires a tighter tolerance but carries less current. It therefore needs only a narrow trace, and the target impedance can be met with one or two decoupling capacitors. Because Vref is so important, its decoupling capacitors should be placed as close as possible to the device pins.

Routing VTT is quite challenging, however, because it has both a tight tolerance and large transient currents, which can be readily calculated. Its target impedance can ultimately be met by adding decoupling capacitors.

In a 4-layer PCB, the spacing between the planes is relatively large, so the benefit of the capacitance between the power and ground planes is lost. The number of decoupling capacitors must therefore increase considerably, especially high-frequency capacitors below 10 nF. Detailed calculations and simulations can be carried out with EDA tools.
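The loss of plane capacitance with larger layer spacing follows from the parallel-plate approximation C = eps0 * eps_r * A / d. The sketch below compares two plane separations. The 2 in x 2 in overlap area, the relative permittivity of 4.3 and the 4 mil and 40 mil separations are all assumed values used only to show the scaling.

```python
EPS0_F_PER_M = 8.854e-12  # permittivity of free space
MIL_TO_M = 25.4e-6        # one mil in metres


def plane_capacitance_farad(area_m2: float, separation_m: float, eps_r: float) -> float:
    """Parallel-plate estimate of the capacitance between two planes."""
    return EPS0_F_PER_M * eps_r * area_m2 / separation_m


AREA_M2 = 0.0508 ** 2  # assumed 2 in x 2 in overlap between the planes
EPS_R = 4.3            # assumed FR-4-like dielectric

c_thin = plane_capacitance_farad(AREA_M2, 4 * MIL_TO_M, EPS_R)    # closely spaced planes
c_thick = plane_capacitance_farad(AREA_M2, 40 * MIL_TO_M, EPS_R)  # widely spaced planes

print(f"4 mil separation:  {c_thin * 1e12:.0f} pF")
print(f"40 mil separation: {c_thick * 1e12:.0f} pF")
```

Because capacitance scales inversely with separation, a tenfold increase in spacing removes 90% of the plane capacitance, which discrete capacitors then have to replace.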

7. Timing Analysis

Timing calculation and analysis are described in detail in the related literature. The following 8 items need to be set up and analyzed:

1. Write Setup Analysis: DQ vs. DQS

2. Write Hold Analysis: DQ vs. DQS

3. Read Setup Analysis: DQ vs. DQS

4. Read Hold Analysis: DQ vs. DQS

5. Write Setup Analysis: DQS vs. CLK

6. Write Hold Analysis: DQS vs. CLK

7. Write Setup Analysis: ADDR/CMD/CNTRL vs. CLK

8. Write Hold Analysis: ADDR/CMD/CNTRL vs. CLK

Table 2 presents an example of a Write Setup analysis. Some of the data in the table must be obtained from the controller and memory manufacturers; the data in the “Interconnect” section comes from the SI simulation tool. For DDR2, all 8 items above need to be analyzed, whereas for DDR3, items 5 and 6 need not be considered. In PCB design, the length tolerances must ensure that the total margin is positive.
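The idea of a positive total margin can be sketched as a simple budget: start from the available timing window and subtract each contribution. The example below is a hypothetical write setup budget at 800 Mbps. The line items and all of their values are assumptions for illustration and do not reproduce Table 2; real values come from the controller and memory datasheets and from SI simulation.

```python
def total_margin_ps(available_ps: float, budget_ps: dict) -> float:
    """Remaining margin after subtracting every budget item from the window."""
    return available_ps - sum(budget_ps.values())


DATA_RATE_MBPS = 800.0
unit_interval_ps = 1e6 / DATA_RATE_MBPS   # one bit time in picoseconds
available_ps = unit_interval_ps / 2.0     # assumes DQS is centered in the data bit

# Hypothetical budget items, all in picoseconds
budget_ps = {
    "controller output skew": 150.0,
    "memory setup time requirement": 200.0,
    "interconnect skew from SI simulation": 100.0,
}

margin_ps = total_margin_ps(available_ps, budget_ps)
status = "positive, OK" if margin_ps > 0 else "negative, violates timing"
print(f"total setup margin = {margin_ps:.0f} ps ({status})")
```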

8. PCB Layout

In actual PCB design, meeting the SI requirements often involves many compromises. Signals with more stringent signal integrity requirements usually need to be prioritized. Taking the following factors into account during layout makes the resulting PCB design more reliable.

1. First, set the topology and related constraints in the relevant EDA tools.

2. Break out the BGA pins and place the ADDR/CMD/CNTRL pins in the middle of the DQ/DQS/DM byte groups. Because of this grouping, and to keep signal crossings to a minimum, some individual pins may have to be routed through another area.

3. Based on the crosstalk simulation results, minimize stub lengths. Stubs can often be eliminated, but not for every pin. A connection may be possible with only two trace segments between the BGA pads and the memory pads, but such traces must be very thin, which increases PCB manufacturing cost, and not every connection can be made with only two segments unless microvias and via-in-pad techniques are used. Ultimately, a compromise between signal integrity tolerances and cost may have to be chosen.

4. Place the Vref decoupling capacitors close to the Vref pin, place the Vtt decoupling capacitors at the far end of the farthest SDRAM, and place the VDD decoupling capacitors close to the device. Smaller-value decoupling capacitors need to be placed closer to the device. In a proper decoupling design, not all decoupling capacitors are placed close to the device. All decoupling capacitor pins should be connected through fan-outs, which reduces impedance. Usually, the fan-out traces at both ends run perpendicular to the capacitor.

5. When changing layers, try to match lengths and add ground vias. These should be simulated in the EDA tool beforehand. In time-domain terms, the two lines of a differential pair should usually be delay-matched to within +/- 2 ps, and other signals to within +/- 10 ps.
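To relate the +/- 2 ps and +/- 10 ps delay tolerances to physical trace length, the delay per unit length can be estimated from the effective dielectric constant, t = sqrt(eps_eff) / c. The sketch below uses an assumed eps_eff of 3.0 for a microstrip; the real value depends on the stackup and should come from the field solver in the EDA tool.

```python
import math

C_INCH_PER_PS = 0.011803  # speed of light in vacuum, in inches per picosecond


def delay_ps_per_inch(eps_eff: float) -> float:
    """Propagation delay per inch for a given effective dielectric constant."""
    return math.sqrt(eps_eff) / C_INCH_PER_PS


def length_tolerance_mils(delay_tolerance_ps: float, eps_eff: float) -> float:
    """Trace length, in mils, that corresponds to the given delay tolerance."""
    return 1000.0 * delay_tolerance_ps / delay_ps_per_inch(eps_eff)


EPS_EFF = 3.0  # assumed effective dielectric constant of the microstrip
for label, tol_ps in (("differential pair", 2.0), ("other signals", 10.0)):
    mils = length_tolerance_mils(tol_ps, EPS_EFF)
    print(f"{label}: +/- {tol_ps:.0f} ps is roughly +/- {mils:.0f} mils")
```

This conversion only covers uniform trace; as the delay matching section explains, trombone segments and vias shift the delay away from what length alone predicts.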

9. DIMMs

Most of the rules described above also apply to boards that carry one or more DIMMs. The only exception is that the decoupling considerations for a single DIMM differ from those for a DIMM group. In DIMM groups, daisy-chain topologies and tree topologies with few stubs are suitable for ADDR/CMD/CNTRL.

10. Cases

The rules introduced above have been widely applied to DDR2, DDR3 and DDR3-DIMM PCBs. The following cases use MOSAID’s controller, which supports both DDR2 and DDR3. The SI simulations use IBIS models, with the memory models from Micron Technology, Inc. The DDR3 SDRAM model is specified for 1333 Mbps; here it is operated at 1600 Mbps. The EBD model for the unbuffered DIMM (MT_DDR3_0542cc) is also from Micron Technology. All of the following waveforms were obtained with the usual test method and were calculated and simulated at the SDRAM die level. On the 6-layer board shown in Figure 2, only the TOP and BOTTOM layers are used for routing, and the memory consists of two SDRAMs connected in a daisy chain. In the DIMM case, a single unbuffered DIMM is used. Figures 9-11 show the TOP/BOTTOM layer routing and the corresponding signal integrity simulation results.

(The left is the ADDRESS and CLOCK network, the right is the DATA and DQS network, the clock frequency is 800 MHz, the data communication rate is 1600Mbps)

(The left is the ADDRESS and CLOCK network, the right is the DATA and DQS network, the clock frequency is 400 MHz, the data communication rate is 800Mbps)

(On the left is the ADDRESS and CLOCK network, on the right is the DATA and DQS network)

Finally, Figure 12 compares two data signal eye diagrams, one simulated and the other measured. In all of the above cases, the waveform integrity is excellent.

11. Conclusion

This article has surveyed the SI and PI factors involved in DDR2/DDR3 design. Designing 800 Mbps DDR2 and DDR3 on a 4-layer board is feasible, but 1600 Mbps DDR3 on such a board is very challenging.
