

# Technical Note

## Point-to-Point System Design: Layout and Routing Tips for LPDDR2 and LPDDR3 Devices

### Introduction

LPDDR2 and LPDDR3 devices require a well-designed environment, package, and PCB to support today's high-speed/low-power applications.

Proven layout and routing techniques are required for mobile designs using unterminated point-to-point interfaces. Derived from electronics theory and Micron design experience, the guidelines in this technical note can enhance signal integrity (SI) and reduce noise for LPDDR2 and LPDDR3 devices in unterminated point-to-point and point-to-multipoint multilayer board designs.

The guidelines and examples in this technical note represent one of several acceptable methods and may not be applicable for all point-to-point designs.

Previous Micron technical notes have centered on design, layout, and simulation techniques focused on standard SDRAM designs. Refer to TN-46-11, "DDR Simulation Process," TN-46-14, "Hardware Tips for Point-to-Point System Design," and TN-46-19, "LPSDRAM Unterminated Point-to-Point System Design" for details.

**Table 1: Definitions**

| Term           | Definition                                                                         |
|----------------|------------------------------------------------------------------------------------|
| DDP            | Dual die package                                                                   |
| Power delivery | Power and ground layout and decoupling techniques used to improve signal integrity |
| SDP            | Single die package                                                                 |
| SSO            | Simultaneous switching outputs                                                     |
| $V_{DDQ}$      | DQ and I/O signal power; the two are equivalent unless otherwise noted             |
| $V_{DD}$       | Digital power for the device core                                                  |
| $V_{REFDQ}$    | Reference for DQ input buffers                                                     |
| $V_{SS}$       | Digital ground                                                                     |
| $V_{SSQ}$      | DQ and signal ground; the two are equivalent unless otherwise noted                |

## LPDDR2 and LPDDR3 Comparison

Minimal architecture differences exist between LPDDR2 and LPDDR3 technologies. LPDDR2 and LPDDR3 devices support a source-synchronous data strobe where data is transferred on both the leading and trailing strobe edges.

When designing a point-to-point memory system, the major differences to be aware of between LPDDR2 and LPDDR3 devices are:

- LPDDR3 devices increase in bandwidth from 1066 MT/s to 2133 MT/s
- LPDDR3 devices support programmable on-die DQ termination (ODT), with dynamic control during writes through the ODT signal
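The transfer rates above follow from double-data-rate signaling: two transfers per clock cycle. The sketch below works through that arithmetic for the clock frequencies listed in Table 2, assuming a x32 interface. The function names are illustrative, and the peak figures ignore refresh and command overhead.

```python
def data_rate_mtps(clock_mhz: float) -> float:
    """DDR signaling moves data on both strobe edges: two transfers per clock."""
    return 2 * clock_mhz


def peak_bandwidth_gbps(clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal units), ignoring all overhead."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mtps(clock_mhz) * bytes_per_transfer / 1000


# Clock values are the rounded figures from Table 2 (1066 MHz is nominally 1066.67 MHz).
for name, clock_mhz in [("LPDDR2-1066", 533), ("LPDDR3-1600", 800), ("LPDDR3-2133", 1066)]:
    print(f"{name}: {data_rate_mtps(clock_mhz):.0f} MT/s, "
          f"{peak_bandwidth_gbps(clock_mhz, 32):.2f} GB/s peak on x32")
```

Because the table rounds the highest clock to 1066 MHz, the computed rate comes out as 2132 MT/s rather than the nominal 2133 MT/s.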

The following table provides a detailed comparison of the devices.

**Table 2: Feature Comparison**

| Feature                           | LPDDR2-S4B Device                                                                            | LPDDR3 Device                                              |
|-----------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------|
| Density                           | Up to 8Gb                                                                                    | Up to 32Gb                                                 |
| Prefetch size                     | 4n (128 bit total)                                                                           | 8n (256 bit total)                                         |
| Core voltage ( $V_{DD}$ )         | 1.2V                                                                                         | 1.2V                                                       |
|                                   | 1.8V                                                                                         | 1.8V                                                       |
| I/O voltage                       | 1.2V                                                                                         | 1.2V                                                       |
| Maximum clock frequency/data rate | 533 MHz/DDR1066                                                                              | 800 MHz/DDR1600; 1066 MHz/DDR2133                          |
| Burst lengths                     | 4, 8, 16                                                                                     | 8                                                          |
| Configurations                    | x16, x32                                                                                     | x16, x32                                                   |
| Address command signals           | 14 pins (multiplexed command address)                                                        | 14 pins (multiplexed command address)                      |
| Address/command data rate         | DDR (rising and falling clock edge)                                                          | DDR (rising and falling clock edge)                        |
| PASR                              | Full, half, or quarter-array with individual bank and segment masking for partial bank modes | Individual bank and segment masking for partial bank modes |
| Drive strength                    | 34Ω                                                                                          | 34Ω                                                        |
|                                   | 40Ω                                                                                          | 40Ω                                                        |
|                                   | 48Ω                                                                                          | 48Ω                                                        |
|                                   | 60Ω                                                                                          | —                                                          |
|                                   | 80Ω                                                                                          | —                                                          |
|                                   | 120Ω                                                                                         | —                                                          |
|                                   | ZQ calibration for ± 10% accuracy                                                            | ZQ calibration for ± 10% accuracy                          |
| Per bank refresh                  | Yes (8-bank device only)                                                                     | Yes                                                        |
| Output driver                     | HSUL_12                                                                                      | HSUL_12                                                    |
| DPD                               | Yes                                                                                          | Yes                                                        |
| DLL/ODT                           | No/No                                                                                        | No/Yes (ODT on DQ, DQS, DM pins)                           |
| Package options                   | POP, MCP, discrete                                                                           | POP, MCP, discrete                                         |
| Temperature grades                | WT (-30°C to 85°C)                                                                           | WT (-30°C to 85°C)                                         |
|                                   | AT (-40°C to 105°C)                                                                          | AT (-40°C to 105°C)                                        |

## On-Die Termination

The best signaling is achieved when driver impedance matches trace impedance.

Like LPDDR2 devices, LPDDR3 devices reduce layout constraints by eliminating the need for discrete termination to  $V_{TT}$  and the need for  $V_{TT}$  generation for the data bus. Unlike LPDDR2 devices, however, LPDDR3 devices support on-die DQ termination (ODT), a feature that lets the device switch the termination resistance for the DQ bus on and off during writes via the ODT control pin.

ODT is designed to improve signal integrity of the memory channel by enabling the DRAM controller to independently turn on and off the internal termination resistance for each DRAM device in the system.

ODT, in conjunction with programmable LPDDR3 output drivers, is supported by additional mode register settings that increase system flexibility and optimize signal integrity compared to previous LPDRAM generations. These benefits include:

- Programmable drive strength for the DQ bus—provides closer impedance matching for point-to-point systems, improving signal quality
- Periodic ZQ calibration—neutralizes voltage and temperature shifts, improving signal quality
- Dynamic ODT—applies desired termination opportunistically during WRITE operations

During power-down, control of termination via the ODT pin is disabled if MR11[2] = 0. (See the LPDDR3 data sheet for the timings associated with power-down entry and exit.)

If MR11[2]=1 and MR11[1:0] are non-zero, ODT is supported during CKE power-down with ODT control through the ODT pin.

$V_{REFDQ}$  is offset depending on the termination selected. Based on the ZQ calibration resistor, two settings are supported:  $240\Omega$  and  $120\Omega$ .

A summary of the LPDDR3 ODT resistors is shown in the table below.

**Note:** ODT consumes additional power when activated. LPDDR2 devices do not support ODT.

## ODT Mode Register

**Table 3: MR11 Opcode Bit Definitions**

| Feature    | Type       | Op-Code | Definition                                                                                          |
|------------|------------|---------|-----------------------------------------------------------------------------------------------------|
| DQ ODT     | Write-only | OP[1:0] | 00b: Disabled (default)<br>01b: Reserved<br>10b: RZQ/2<br>11b: RZQ/1                                |
| PD control | Write-only | OP[2]   | 0b: ODT disabled by DRAM during power-down (default)<br>1b: ODT enabled by DRAM during power-down   |
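The MR11 fields in Table 3 can be decoded as in the following sketch. It assumes RZQ is the 240Ω ZQ calibration resistor mentioned earlier, and it treats the reserved encoding as "no termination", which is an assumption made for illustration only. The function name and the returned dictionary keys are hypothetical.

```python
RZQ_OHMS = 240  # ZQ calibration resistor value (RZQ), per the On-Die Termination section

# OP[1:0] -> divisor applied to RZQ, per Table 3; None means no termination value
_ODT_DIVISOR = {0b00: None, 0b01: None, 0b10: 2, 0b11: 1}


def decode_mr11(opcode: int) -> dict:
    """Decode MR11 OP[2:0] into the DQ ODT resistance and power-down behavior."""
    odt_bits = opcode & 0b011
    pd_bit = (opcode >> 2) & 0b1
    divisor = _ODT_DIVISOR[odt_bits]
    odt_ohms = None if divisor is None else RZQ_OHMS // divisor
    return {
        "reserved": odt_bits == 0b01,
        "odt_ohms": odt_ohms,
        # Assumption: the reserved encoding is treated like "disabled" here.
        "odt_in_power_down": pd_bit == 1 and odt_ohms is not None,
    }


print(decode_mr11(0b110))  # RZQ/2 with ODT kept under ODT-pin control in power-down
```

The power-down flag follows the rule stated above: ODT remains available during CKE power-down only when OP[2] is set and OP[1:0] selects a termination value.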

## Programmable Drive Strength and Bus Topology

Because LPDDR2/LPDDR3 devices are designed for mobile point-to-point applications, a programmable drive strength option is provided to match memory DQ/DQS drive strength to the impedance of the memory bus, eliminating the need for a termination voltage supply ( $V_{TERM}$ ) and a series-termination resistor.

For LPDDR2 devices, six drive strengths are supported: 34Ω, 40Ω, 48Ω, 60Ω, 80Ω, and 120Ω. For LPDDR3 devices, three drive strengths are supported: 34Ω, 40Ω, and 48Ω.
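One rough way to compare these drive strengths against a given trace impedance is the standard transmission-line reflection coefficient, (Z_driver - Z0) / (Z_driver + Z0). The sketch below assumes the approximately 50Ω interconnect used for the simulations in this section; the function names are illustrative. It considers impedance matching only, so it does not replace the eye-diagram simulations, which also include crosstalk, package effects, and supply noise.

```python
Z0_OHMS = 50.0  # approximate interconnect impedance used in this section's simulations

LPDDR2_DRIVE_OHMS = [34, 40, 48, 60, 80, 120]
LPDDR3_DRIVE_OHMS = [34, 40, 48]


def source_reflection(z_driver: float, z0: float = Z0_OHMS) -> float:
    """Reflection coefficient seen by a wave arriving back at the driver."""
    return (z_driver - z0) / (z_driver + z0)


def best_match(drive_options, z0: float = Z0_OHMS) -> int:
    """Pick the drive strength with the smallest reflection magnitude."""
    return min(drive_options, key=lambda z: abs(source_reflection(z, z0)))


for z in LPDDR3_DRIVE_OHMS:
    print(f"{z} ohm driver into {Z0_OHMS:.0f} ohm line: gamma = {source_reflection(z):+.3f}")
print("closest match:", best_match(LPDDR3_DRIVE_OHMS), "ohm")
```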

In low-power and low-cost applications, it is recommended to avoid using termination resistors to save power and reduce costs. In point-to-point systems, termination resistors can be eliminated if the signal environment and drive impedance are carefully selected.

This section contains simulation results (data eye diagrams) for READ operations that assess signal integrity. Figure 1–Figure 9 show results with the controller and DRAM positioned approximately 25mm, 15mm, and 5mm apart. The interconnect between the controller and the DRAM is designed so that its characteristic impedance is approximately 50Ω.

Because LPDDR2 and LPDDR3 DRAM devices are used in point-to-point applications, data eye diagrams for WRITE operations would be very similar to those shown for READ operations, provided that the drivers for the memory controller are well-matched to the DRAM drivers. System engineers can also take advantage of on-die termination (ODT) options ( $R_{tt} = 120\Omega$  or  $240\Omega$ ) on LPDDR3 devices for higher speed (1600/1866/2133) or for heavily loaded systems to improve signal integrity.

In the figures:

- AptACDC is the AC DC aperture, which is the opening of the data eye
- RailOShoot is the rail overshoot, which measures peak distortion
- RailUShoot is the rail undershoot, which measures peak distortion

Analysis covers all components in the memory/controller signaling channel (including I/O driver, memory package, PCB traces, controller package, and receiver).

**Note:** Signal quality and integrity can be degraded by many factors, including:

- Crosstalk (within the DRAM and controller packages, as well as with signal bus/traces on the system board)
- Intersymbol interference (ISI), which is related to impedance and bus topology
- Simultaneous switching outputs (SSO)/power delivery noise (SSO-induced power supply noise)

To assess signal integrity at the system level, these factors are included in the simulation results.

Because this technical note is intended for system board designers, various bus topologies are presented to illustrate their impact on overall system performance:

- Bus length: 25mm/15mm/5mm
- Bus/traces spacing: 0.2mm, 0.15mm
- Load at receiver end (SDP—single load, DDP—dual load)
- Bus termination: no termination vs 240Ω
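The four parameters above define a sweep matrix. A minimal sketch of enumerating it is shown below; the dictionary keys are illustrative, and the figures in this section cover only a subset of the full set of combinations.

```python
from itertools import product

BUS_LENGTHS_MM = [25, 15, 5]
TRACE_SPACINGS_MM = [0.2, 0.15]
LOADS = ["SDP", "DDP"]            # single load vs dual load at the receiver end
TERMINATIONS_OHMS = [None, 240]   # None means no termination


def sweep_cases():
    """Return every combination of the four topology parameters as dicts."""
    return [
        {"length_mm": length, "space_mm": space, "load": load, "rtt_ohms": rtt}
        for length, space, load, rtt in product(
            BUS_LENGTHS_MM, TRACE_SPACINGS_MM, LOADS, TERMINATIONS_OHMS
        )
    ]


cases = sweep_cases()
print(len(cases), "combinations; first case:", cases[0])
```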

Figure 1 shows a simulation when the device is configured for drive strengths (DS) of 34Ω, 40Ω, and 48Ω, driving into an ideal 25mm data bus (ideal transmission line with  $Z_0 = 50\Omega$ , no crosstalk for the bus). The eye diagrams show reasonable aperture and voltage margin; signal integrity looks good for this ideal system and serves as a baseline.

**Figure 1: Ideal Channel/No Coupling (SDP; Length = 25mm; Space = N/A; Clock = 800 MHz)**



The next three figures show simulations for non-ideal channels/coupled lines with 25mm, 15mm, and 5mm signal lengths on a single die package (SDP) LPDDR3 platform; the optimal diagrams are shown in Figure 4.

Because it is difficult for board designers to design a system with 5mm signal length, Micron recommends a signal length between 10mm and 20mm to reduce signal noise and crosstalk.

**Figure 2: Non-Ideal Channel/Coupled Lines (SDP; Length = 25mm; Space = 0.2mm; Clock = 800 MHz)**



**Figure 3: Non-Ideal Channel/Coupled Lines (SDP; Length = 15mm; Space = 0.2mm; Clock = 800 MHz)**



**Figure 4: Non-Ideal Channel/Coupled Lines (SDP; Length = 5mm; Space = 0.2mm; Clock = 800 MHz)**



Figure 5 shows non-ideal channels/coupled lines with 25mm channel length on a dual-die package (DDP) LPDDR3 platform without termination. As shown, the signals are degraded and the eye diagrams are not optimal.

Signal integrity can be degraded for a number of reasons, including:

- The rising and falling edges of the signal can become distorted in electrically long structures due to dispersion and losses.
- Coupling from neighboring conductors can introduce additional noise.
- Mismatches can occur and the non-ideal nature of the source and load ends of the interconnect structure can affect results.
- Power supply noise and simultaneous switching outputs can affect results.

To greatly reduce degradations, select  $34\Omega$  pull-down/ $40\Omega$  pull-up drive strengths and use an external  $240\Omega$  termination resistor on the same DDP platform as in Figure 5; the terminated result is shown in Figure 7. Micron recommends using an external  $240\Omega$  weak termination resistor at the controller for any DDP LPDDR2 and LPDDR3 side-by-side application.

**Note:** A six-layer board with 8Gb LPDDR3 DDP was used for the DDP simulation, where it was assumed one die was active and the other die was idle.

**Figure 5: Non-Ideal Channel/Coupled Lines (DDP; Length = 25mm; Space = 0.2mm; Clock = 800 MHz)**



Figure 6 shows non-ideal channels/coupled lines with 25mm channel length on an SDP LPDDR3 platform with 34Ω pull-down, 40Ω pull-up, and a 240Ω external termination resistor. As shown in the figure, SI looks reasonably good.

**Figure 6: Non-Ideal Channel/Coupled Lines (SDP; Length = 25mm; Space = 0.2mm; R<sub>tt</sub> = 240; Clock = 800 MHz)**



The last two figures show non-ideal channels/coupled lines with 25mm channel length on a DDP LPDDR3 platform with 0.15mm spacing instead of 0.2mm, which is a tighter pitch.

**Figure 7: Non-Ideal Channel/Coupled Lines (DDP; Length = 25mm; Space = 0.2mm; R<sub>tt</sub> = 240; Clock = 800 MHz)**



Figure 8 shows the simulation result without termination; Figure 9 shows the simulation result with an external 240Ω termination resistor at the controller. Micron recommends that the minimum spacing between two adjacent signals be 0.15mm or greater for the higher LPDDR3 frequencies.

**Figure 8: Non-Ideal Channel/Coupled Lines (DDP; Length = 25mm; Space = 0.15mm; Clock = 800 MHz)**



**Figure 9: Non-Ideal Channel/Coupled Lines (DDP; Length = 25mm; Space = 0.15mm; R<sub>tt</sub> = 240; Clock = 800 MHz)**



## Power Delivery

As clock frequencies increase, timing and noise margins shrink.

The LPDDR2 and LPDDR3 devices are high-frequency x32 and x64 (dual-channel) devices with multiple I/Os that can perform simultaneous switching output (SSO). When SSO occurs, large amounts of current are sourced from or sunk to the power delivery network (PDN), especially on  $V_{DDQ}$  and  $V_{SSQ}$  lines. If the PDN is not well designed, SSO creates significant noise on the power supply rails, degrading transistor performance, which can translate into timing issues, particularly when full drive strength is selected for the interface. In extreme cases, the power supply can collapse, leading to system failures.

Memory vendors should ensure a robust PDN within the memory package, and controller vendors should ensure a robust PDN within controller packages. System engineers should design a robust PDN for the system board to power all components on the board.

To design a good, solid PDN on a system board:

- Ensure the path impedance from the voltage regulator module (VRM) to the memory MCP device and platform processor is as low as possible. Lower path impedance results in lower voltage ripple on the board.
- Use partial plane structures. If this is not possible, ensure the power delivery routes for the  $V_{DDQ}/V_{SSQ}$  and  $V_{DD}/V_{SS}$  lines are as wide as possible.
- Ensure sufficient decoupling capacitors are placed on the board close to the system processor and the memory MCP devices to absorb high-frequency current spikes.
- Run full system simulation to ensure the PDN is robust enough to sustain the peak current demand with comfortable margin.
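The link between path impedance and ripple in the first guideline is Ohm's law applied to the transient current. The sketch below is a first-order estimate only: the 1.2V supply matches the I/O voltage in Table 2, but the 5% ripple allowance and the 0.5A current step are hypothetical values chosen for illustration, not Micron recommendations. It does not replace the full system simulation called for above.

```python
def ripple_mv(transient_current_a: float, path_impedance_mohm: float) -> float:
    """Voltage ripple in mV: current step (A) times path impedance (milliohms)."""
    return transient_current_a * path_impedance_mohm


def max_path_impedance_mohm(supply_v: float, ripple_fraction: float,
                            transient_current_a: float) -> float:
    """Largest path impedance (milliohms) that keeps ripple within the allowance."""
    return supply_v * ripple_fraction / transient_current_a * 1000


# Hypothetical example: 1.2 V rail, 5% allowed ripple, 0.5 A switching current step.
limit = max_path_impedance_mohm(1.2, 0.05, 0.5)
print(f"impedance limit: {limit:.0f} milliohm, "
      f"ripple at that limit: {ripple_mv(0.5, limit):.0f} mV")
```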

## Decoupling

Adequate power decoupling on the PCB is necessary to prevent excessive  $V_{DD}$  noise and resulting memory errors in applications where power-supply draw can change by magnitudes in a single clock cycle.

Decoupling serves two purposes:

- Maintains stable supply voltages for the components
- Provides a return path for signal currents

Normal practice is to decouple the power planes around the components because the planes provide a low inductance path between the decoupling capacitors and the components.

For decoupling of internal functions, such as refresh, lower frequencies are involved (around the 1 MHz range). For providing a return path for signals, higher frequencies are involved (up to 50 MHz). Above 50 MHz, external decoupling is less effective because the components rely on internal decoupling that is part of all components.

For internal functions, the amount of capacitance is the larger concern. For return currents, inductance is the primary concern. Suitable capacitors in 0402 (standard)/1005 (metric) or 0201 (standard)/0603 (metric) packages provide low enough inductance. Using capacitors with sufficient capacitance in these packages meets the requirements for the full frequency range. Separate capacitors for each range are not required.
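The trade-off between capacitance and inductance can be illustrated with the series L-C model of a capacitor, whose self-resonant frequency is 1 / (2π√(LC)). Below that frequency the part behaves capacitively; above it, the parasitic inductance dominates. The 0.1µF value and the 0.5nH mounted inductance in this sketch are hypothetical illustration values, not figures from this note or from any data sheet.

```python
import math


def self_resonant_hz(capacitance_f: float, esl_h: float) -> float:
    """Self-resonant frequency of a capacitor modeled as a series L-C."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * capacitance_f))


def impedance_ohms(freq_hz: float, capacitance_f: float, esl_h: float,
                   esr_ohms: float = 0.0) -> float:
    """Impedance magnitude of the series R-L-C capacitor model."""
    omega = 2.0 * math.pi * freq_hz
    reactance = omega * esl_h - 1.0 / (omega * capacitance_f)
    return math.hypot(esr_ohms, reactance)


# Hypothetical part: 0.1 uF with 0.5 nH of total mounted inductance.
C_F, ESL_H = 0.1e-6, 0.5e-9
print(f"self-resonance: {self_resonant_hz(C_F, ESL_H) / 1e6:.1f} MHz")
for f_hz in (1e6, 50e6):
    print(f"|Z| at {f_hz / 1e6:.0f} MHz: {impedance_ohms(f_hz, C_F, ESL_H):.3f} ohm")
```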

**Figure 10: Recommended Decoupling Capacitor Placement for 178-Ball LPDDR3 Package**

Capacitors can be shared between components that are side-by-side or back-to-back. When shared back-to-back, the current demand per capacitor is higher. An effort should be made to provide a lower-inductance path for the capacitors. Lower inductance can be achieved by increasing the number of vias and using wider traces; where possible, adding vias is the more effective of the two. Via inductance is related to the length of the via barrel. If the plane that is being decoupled is the next layer down from the capacitor, the inductance is very low and there is little benefit to using a second via at this end on the component. If there is concern about sharing capacitors between components that are back-to-back, add additional capacitors.

See the following Micron technical notes for detailed decoupling information:  
TN-46-02, “Decoupling Capacitor Calculation for a DDR Memory Channel” and  
TN-00-06, “Bypass Capacitor Selection for High Speed Designs.”

## Layout: Trace Widths, Intragroup Spacing, and Intergroup Spacing

Two types of trace spacings influence system signal integrity: intragroup spacing and intergroup spacing.

Intragroup spacing (S1) is the distance between two adjacent traces within a related set of signals having similar or equivalent functionality. The control signals group, clocks, address bus, data bus, and data/strobes are all signal sets. The data bus is sometimes broken down into data bytes (sets of eight signals) plus the associated strobe and mask signal.

Intergroup spacing (S2) is the distance between the two outermost signals of different signal sets. For example, if the control signal set is routed together and adjacent to the address signal set, intergroup spacing is the distance between the two individual signals from the control and address sets that are closest together.

The difference between S1, S2, and trace width (S3) using the control and address groups is shown in the figure below.

**Figure 11: S1, S2, and S3 Spacing**



**Note:** White indicates copper traces.

Recommended spacing is dependent on dielectric thickness and routing pitch. A general recommendation is to have the routing pitch be 3X the dielectric height. Closer spacing than the recommended minimum for S1 or S2 can increase crosstalk. Specific guidelines are shown in Table 4.

If all signals are routed tighter than the recommended spacing for their full length, crosstalk is likely to disrupt SI; however, if spacing limits are not met for short segments, SI is not likely to suffer much.

Crosstalk is a function of trace spacing, dielectric height, and slew rate; for systems with slew rates <1 V/ns, trace spacing can be closer. Lower-speed systems generally have more timing budget, which accommodates more crosstalk without affecting SI.

## Trace Width (S3) Design Guidelines

Recommended S3 for functional signal sets:

- DQ lines = 4 mil
- DQS lines = 4 mil
- CA (Command/Address) lines = 4 mil
- Clock lines = 4 mil

Supply voltages  $V_{DD}$ ,  $V_{DDQ}$ ,  $V_{SS}$ , and  $V_{SSQ}$  must be composed of planes as much as possible. Short connections (<8 mils) are commonly used to attach vias to planes in Micron designs. Any connections required from supply voltages to vias for device pins or decoupling capacitors should be as short and as wide as possible to minimize trace impedance. Micron recommends the trace width match the via size at the decoupling capacitors.

**Table 4: Intragroup and Intergroup Spacing Design Guidelines**

| Signal Set           | Signals                                               | Spacing Type | 4 Mil Dielectric | 5 Mil Dielectric | Unit | Notes |
|----------------------|-------------------------------------------------------|--------------|------------------|------------------|------|-------|
| Data/Data Strobe     | DQ to DQ                                              | S1           | 8                | 11               | mils |       |
|                      | DQ to DQS                                             | S2           | 8                | 11               | mils |       |
|                      | DQS in a byte lane<br>to DQS in a different byte lane | S1           | 8                | –                | mils | 1     |
|                      | DQ and DM                                             | S2           | 8                | 11               | mils |       |
| CA (Command Address) | Adjacent address lines                                | S1           | 8                | 11               | mils |       |
|                      | Address lines                                         | S2           | 8                | 11               | mils |       |
| Clock                | CK#-to-CK                                             | S1           | –                | –                | mils | 2     |
|                      | CK#-to-DQS line<br>(or CD in group of two)            | S2           | –                | –                | mils | 3     |
|                      | Differential pair (CK, CK#)<br>to any other signal    | S2           | 8                | 11               | mils |       |

**Notes:**

1. DQS signals are generally routed in the midst of related nibbles or bytes, so DQS-to-DQS spacing is not relevant.
2. All CK and CK# signal lines should have differential characteristic impedance ( $Z_{diff}$ ) of 90–100Ω. (120Ω is not practical.)
3. Generally not an issue as the CK# and DQS lines are not adjacent.
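The spacing values in Table 4 are consistent with the general rule of a routing pitch of 3X the dielectric height combined with the recommended 4 mil trace width (S3). The sketch below reproduces that arithmetic, assuming pitch is measured center-to-center so that edge-to-edge spacing equals pitch minus trace width. The function names are illustrative.

```python
TRACE_WIDTH_MILS = 4   # recommended S3 for DQ, DQS, CA, and clock lines
PITCH_FACTOR = 3       # routing pitch = 3X dielectric height


def routing_pitch_mils(dielectric_height_mils: float) -> float:
    """Center-to-center routing pitch from the 3X dielectric-height rule."""
    return PITCH_FACTOR * dielectric_height_mils


def edge_spacing_mils(dielectric_height_mils: float,
                      trace_width_mils: float = TRACE_WIDTH_MILS) -> float:
    """Edge-to-edge spacing: pitch minus one trace width (assumes equal widths)."""
    return routing_pitch_mils(dielectric_height_mils) - trace_width_mils


for height in (4, 5):
    print(f"{height} mil dielectric: pitch {routing_pitch_mils(height)} mils, "
          f"spacing {edge_spacing_mils(height)} mils")
```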

## PCB Stackup

A well-designed PCB stackup is critical in eliminating digital switching noise. The ground plane must provide a low-impedance (Low-Z) return path for digital circuits.

Micron has experienced good results using a PCB design with a minimum of six layers:

- Layers 1 (top) and 6 (bottom) for signals and  $V_{DD1}$
- Layers 2 and 5 for ground ( $V_{SS}$ )/power ( $V_{DD}$ )
- Layers 3 and 4 for signals

The required number of signal layers is determined by the number of signal groups to be routed and the required isolation between them. The number of devices to be routed and the size of the PCB dictate whether three or four signal layers are required. Simulation should be done to provide feedback on signal integrity for a given application. The following figure shows a six-layer PCB with four internal layers.

**Figure 12: Six-layer PCB with Four Internal Layers**



## Four-Layer Board Considerations

Using a four-layer board is not recommended. An adequate solution using four layers may be possible if the design is simple enough and meets the following requirements:

- Power delivery is not compromised
- Referencing is maintained:
  - Address and clock reference to 1.2V ( $V_{DD}$ )
  - DQ references to ground
- Adequate decoupling is provided
- Controller aligns or adapts to the layout that the LPDDR2/LPDDR3 component dictates based on routing in only two signal layers

Compromises to the requirements listed above should be kept to a minimum. The most likely compromise is switching from side to side to allow some signals to cross over. If this is done, it is recommended these transitions occur near decoupling capacitors that conduct the return currents between the planes.

The figure below shows an example four-layer board. If a design can be routed similarly to this example, a four-layer board may be considered.

**Figure 13: Example Four-Layer Board Solution**



- The top layer supports address, control, and clock.
- $V_{DD1}$  is provided on the top layer; all  $V_{DD}$  supplies are connected together with the exception of  $V_{DD1}$ .
- Layer 2 is  $V_{DD}$ ; this layer provides the reference for the address, control, and clock.
- Layer 3 is ground; this layer provides reference for the DQ signals.
- The bottom layer provides for the DQ routing; there is limited flexibility for swapping DQ traces. The controller must accommodate the routing as defined by the LPDDR2/LPDDR3 component. The strobes tend to be on the edge of the respective byte lane.

## PCB Dielectric

The dielectric constant of PCB materials for most memory applications is 3.6 to 4.5, varying with frequency, temperature, material, and the resin-to-glass ratio. FR-4, a commonly used dielectric material, averages 4.1 with signaling at 1 GHz. FR-4 is a copper-clad laminate that is adequate for most applications.
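The dielectric constant sets the signal propagation delay. A minimal sketch of the relationship follows, assuming a trace fully surrounded by dielectric (stripline), where delay per unit length is √εr divided by the speed of light. Surface (microstrip) traces see a lower effective dielectric constant and are somewhat faster, so the result is an upper estimate for those layers.

```python
import math

C_INCH_PER_NS = 11.8028526  # speed of light in vacuum, inches per nanosecond


def stripline_delay_ps_per_inch(dielectric_constant: float) -> float:
    """Propagation delay for a trace fully embedded in the dielectric."""
    return 1000.0 * math.sqrt(dielectric_constant) / C_INCH_PER_NS


for er in (3.6, 4.1, 4.5):
    print(f"er = {er}: {stripline_delay_ps_per_inch(er):.0f} ps/inch")
```

For the FR-4 average of 4.1, this gives a little over 170 ps per inch, in the same range as the approximately 165ps per inch figure used in the Routing section.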

## Design with Timing Budget

Suggested practice is to look at the design from a timing budget standpoint to provide flexibility in the routing portion of the design, if there is suitable margin. This starts with simulation. By referencing the eye diagrams in Programmable Drive Strength and Bus Topology, a setup and hold time can be established. From here, the parameters not included in the simulation must be added.

Typical routing for LPDDR2/LPDDR3 components requires two internal signal layers, two surface signal layers, and two other layers ( $V_{DD}$  and  $V_{SS}$ ) as solid reference planes.

Memory devices have  $V_{DD}$  and  $V_{DDQ}$  pins, which are both typically tied to the PCB  $V_{DD}$  plane. Likewise, component  $V_{SS}$  and  $V_{SSQ}$  pins are tied to the PCB  $V_{SS}$  plane. Each plane provides a low-impedance path to the memory devices to deliver  $V_{SSQ}$ . Sharing a single plane for both power and ground does not provide strong signal referencing. With careful design, it is possible for a split-plane design to work adequately:

- Designs should reference DQ and strobe signals to  $V_{SS}$ ; address, command, control, and clock may reference  $V_{DD}$  or  $V_{SS}$ .
- Generally, to provide adequate power delivery, some signals must be referenced to  $V_{DD}$ . Address, command, control and clock are usually good choices.
- Signals should never reference  $V_{DD1}$  (1.8V).

## Return Path

It is required to have a reference plane for all high-speed signals.

Minimizing loop area reduces transient current noise and electromagnetic interference (EMI). To minimize loop area, the return path should be directly below the signal trace. Where this cannot be achieved, loop area will increase. To minimize loop areas, use a single layer as much as possible for the return path of a specific signal. Referring to Figure 12, if layer 5 is the reference plane for a specific signal routing, routing all signals on layer 4 would be a good choice.

Jumping between layer 4 and layer 6 would have minimal disruption on the return path. Jumping between layer 5 and layer 1 would have a more significant disruption. While the reference planes are the same ( $V_{SS}$  or  $V_{DD}$ ) in both cases, the return currents must find a path between these two planes. This will increase the loop area. If this is required, place a  $V_{SS}$  or  $V_{DD}$  via, whichever is appropriate, in the area of the transition to keep the increase in loop area as small as possible. Under components there are always a number of  $V_{SS}$  and  $V_{DD}$  vias, so this area is usually not a concern. If there is a switch from  $V_{SS}$  to  $V_{DD}$  referencing, the path will need to include a capacitor for the plane-to-plane transition. This usually results in a larger increase in loop area so it should be avoided.

## Routing

Standard characteristic impedance ( $Z_0$ ) of 50–60 $\Omega$  is recommended for all traces. The 50–60 $\Omega$  level also provides a good match to the output impedance of the controller/FPGA drivers. Designers are advised to specify  $Z_0$ , enabling board manufacturers to adjust dielectric thickness and line width to achieve the specification.
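To show how dielectric thickness and line width interact, the sketch below uses the widely quoted IPC-2141 closed-form approximation for a surface microstrip, Z0 ≈ 87/√(εr + 1.41) · ln(5.98h / (0.8w + t)). This formula is not part of this note; it is an approximation usually quoted as valid for width-to-height ratios between roughly 0.1 and 2.0. The copper thickness and dielectric heights used here are hypothetical. Board manufacturers typically rely on a field solver for the final stackup.

```python
import math


def microstrip_z0_ohms(width_mils: float, height_mils: float,
                       thickness_mils: float, dielectric_constant: float) -> float:
    """Approximate surface-microstrip impedance (IPC-2141 closed form)."""
    ratio = width_mils / height_mils
    if not 0.1 < ratio < 2.0:
        raise ValueError("approximation is only quoted for 0.1 < w/h < 2.0")
    return (87.0 / math.sqrt(dielectric_constant + 1.41)
            * math.log(5.98 * height_mils / (0.8 * width_mils + thickness_mils)))


# Hypothetical geometry: 4 mil trace, 1.4 mil copper, FR-4 at er = 4.1.
for height in (3.0, 4.0):
    z0 = microstrip_z0_ohms(4.0, height, 1.4, 4.1)
    print(f"h = {height} mils: Z0 is about {z0:.0f} ohm")
```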

Though there are many signals on LPDDR2/LPDDR3 components, most of them have similar functionality and work together. Groups of I/O signals have one of four purposes:

- Carry a binary address
- Transmit or receive data
- Relay a command to the device
- Latch in address/data or a command

The command/address (CA) inputs carry the command and address information defined by the command truth table. The control group includes chip select (CS#) and clock enable (CKE). Each data group/lane contains 10 signals: eight DQ (DQ[7:0]), strobe (DQS), and data mask (DM). Devices with x8 bus widths have only one data group, while x16 and x32 bus-width devices have two and four lanes, respectively.

Related functionality makes minimizing skew critical. This requires the signals of each group to be routed to similar electrical lengths assuming the loads are equivalent. Routing entire bus groups on single layers minimizes skew.

A system works as long as the margin in its timing budget is positive, assuming all parameters are accounted for. Meeting a timing budget depends on many factors; the higher the speed or the more complex the system, the more difficult the budget is to meet. Allowed skew should be judged by how much margin the timing budget has. One inch of trace is approximately 165ps of delay. For a component operating at an 800 MHz clock rate, the bit period is 625ps. It may be more meaningful to consider skew as a percentage of the bit period: 1% = 6.25ps, and 6.25ps is approximately 40 mils of trace. To match lengths to within 1% of the bit period, match the traces to within 40 mils.

For simple systems (one LPDDR2/LPDDR3 component), there should be sufficient margin in the timing budget. Consider matching to 5% or 200 mils. For the most complex systems (four LPDDR2/LPDDR3 components), use 1% or 40 mils. For two components, something between those values (such as 3% or 120 mils) may be acceptable.
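The percentage-to-length conversion above can be sketched as a short calculation. This is a minimal illustration, assuming the 165ps-per-inch figure quoted in the text and a double-data-rate bus (two bits per clock cycle); the function names are hypothetical and not part of any Micron tool.

```python
def bit_period_ps(clock_mhz: float) -> float:
    """Bit period in ps for a double-data-rate bus (two bits per clock cycle)."""
    return 1e6 / (2.0 * clock_mhz)


def skew_to_mils(clock_mhz: float, percent: float, ps_per_inch: float = 165.0) -> float:
    """Trace-length mismatch in mils that consumes `percent` of the bit period.

    ps_per_inch defaults to the approximate value quoted in the text; the real
    value depends on the board stackup and should come from simulation.
    """
    skew_ps = bit_period_ps(clock_mhz) * percent / 100.0
    return skew_ps / ps_per_inch * 1000.0


if __name__ == "__main__":
    for pct in (1, 3, 5):
        print(f"{pct}% of bit period -> {skew_to_mils(800, pct):.0f} mils")
```

The computed values (roughly 38, 114, and 189 mils) are what the text rounds to 40, 120, and 200 mils.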

Serpentine trace patterns contribute the desired delay, but be aware there is some self-coupling that can change the propagation delay for a signal. Use simulations that include coupling to validate timing.

Vias contribute to timing error. If each signal in a bus group contains the same vias transitioning between the same layers, the vias may be ignored. If there is a mismatch, the additional delay may push the timing margin negative. Simulations take vias into account; therefore, if the entire bus is simulated, all vias are accounted for. If the simulations do not include the entire bus, consider adding delay to compensate for the vias. One rule of thumb is to treat each via as additional trace equal to twice the length of the via barrel that the signal actually traverses to change layers.
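As a rough illustration of the via rule of thumb, the sketch below converts a via barrel length into an equivalent delay. It assumes the "2X" figure means the via is treated as trace twice the barrel length, and it reuses the approximate 165ps-per-inch delay quoted earlier; both are assumptions, and `via_delay_ps` is a hypothetical helper name.

```python
def via_delay_ps(barrel_mils: float, ps_per_inch: float = 165.0, factor: float = 2.0) -> float:
    """Estimated delay of one via, modeled as trace `factor` times the barrel length."""
    equivalent_trace_in = factor * barrel_mils / 1000.0
    return equivalent_trace_in * ps_per_inch


def via_mismatch_ps(barrels_a_mils, barrels_b_mils) -> float:
    """Delay difference (signal A minus signal B) caused by unequal via usage."""
    delay_a = sum(via_delay_ps(b) for b in barrels_a_mils)
    delay_b = sum(via_delay_ps(b) for b in barrels_b_mils)
    return delay_a - delay_b


if __name__ == "__main__":
    # Hypothetical case: signal A changes layers through one 40 mil barrel,
    # signal B stays on a single layer.
    print(f"extra delay on A: {via_mismatch_ps([40.0], []):.1f} ps")
```

A mismatch estimated this way can be entered as a routing-skew term in the timing budget when the full bus is not simulated.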

## Point-to-Point Topology

Placing the clock signals on an internal layer minimizes EMI noise.

Match CK trace length to CK# trace length  $\pm 20$  mils. If multiple clock pairs are transmitted from the controller to components, all clock-pair traces should be equivalent within  $\pm 20$  mils.

**Figure 14: Point-to-Point Topology**



Matching CK to DQ requires some simulation work. Even with the same trace length, there will likely be some skew because the loads differ, and simulation is required to determine it. The CK and DQ slew rates will differ, which alone affects the timing relationship between CK and DQ. Any difference in termination also affects this relationship.

Matching CK to address, command, and control requires simulations as well. Differences in slew rate and termination cause skew between CK and the address, command, and control signals.

From simulations for a simple point-to-point configuration, it may be determined there is sufficient margin to allow 3–5% skew in some groups.

For a point-to-point configuration example, the trace impedance is  $60\Omega$  and the routed length is two inches. With an 800 MHz clock, the resulting waveforms for clock, DQ, and address are shown in Figure 15 and Figure 16. For this simulation, there is no offset of any signals. The black trace is the clock; red is address; and blue is DQ. Relative to the clock, there is a small shift in timing for the address and a larger shift (70ps) for the DQ signals.

**Figure 15: Address, Data Eye for Point-to-Point Configuration**



**Figure 16: Address, Data Eye with Clock for Point-to-Point Configuration**



A timing budget example is shown in Table 5. Typically, the slow corner is used for setup and the fast corner is used for hold. This example uses data simulated at typical conditions.

**Table 5: Point-to-Point Timing Budget for Address with 800 MHz Clock**

| Parameter                                                                                                                                   | For Setup (ps) | For Hold (ps) | Notes |
|---------------------------------------------------------------------------------------------------------------------------------------------|----------------|---------------|-------|
| Margins from simulation (slow corner for setup and fast corner for hold)                                                                    | 312            | 254           |       |
| Worst case routing skew from simulated signal (worst case for setup (earliest) will be different from hold (latest) using 2% of the period) | 7              | 7             |       |
| LPDDR3 minimum setup and hold requirements                                                                                                  | 75             | 100           |       |
| Derating based on signal slew rate (may be negative)                                                                                        | 38             | –             | 1     |
| Derating based on clock slew rate                                                                                                           | –              | 25            | 1     |
| Routing skew for clock if appropriate                                                                                                       | 5              | 5             |       |
| Crosstalk—all sources not included in simulation                                                                                            | 50             | 35            |       |
| Controller skew (arbitrary value inserted into example)                                                                                     | 50             | 50            | 2     |
| Clock error (arbitrary value inserted into example) (may be included in other skew parameters)                                              | 10             | 10            | 2     |
| Margin                                                                                                                                      | 77             | 22            |       |

Notes:

1. See the LPDDR2/LPDDR3 data sheet.

2. See the controller data sheet.
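The arithmetic behind Table 5 can be checked with a short script: the remaining margin is the simulated margin minus every error term in the column. The values below are copied from the table; the dictionary layout and the function name are illustrative only.

```python
# Values in ps, copied from Table 5.
SIMULATED_MARGIN = {"setup": 312, "hold": 254}

# Error terms subtracted from the simulated margin.
# None marks an entry that the table leaves blank for that column.
ERROR_TERMS = {
    "routing skew (signal)":      {"setup": 7,    "hold": 7},
    "minimum setup/hold":         {"setup": 75,   "hold": 100},
    "derating, signal slew rate": {"setup": 38,   "hold": None},
    "derating, clock slew rate":  {"setup": None, "hold": 25},
    "routing skew (clock)":       {"setup": 5,    "hold": 5},
    "crosstalk":                  {"setup": 50,   "hold": 35},
    "controller skew":            {"setup": 50,   "hold": 50},
    "clock error":                {"setup": 10,   "hold": 10},
}


def remaining_margin(kind: str) -> int:
    """Simulated margin minus all error terms for 'setup' or 'hold'."""
    used = sum(term[kind] or 0 for term in ERROR_TERMS.values())
    return SIMULATED_MARGIN[kind] - used


if __name__ == "__main__":
    for kind in ("setup", "hold"):
        margin = remaining_margin(kind)
        status = "positive" if margin > 0 else "not positive"
        print(f"{kind}: {margin} ps ({status})")
```

Both results match the Margin row of the table (77ps for setup, 22ps for hold). Replacing the example values with data from a specific controller and simulation gives the budget for that design.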

To find the worst-case (smallest) setup margin, run the simulations at the slow corner. To find the worst-case hold margin, run the simulations at the fast corner. In many cases, nominal impedances are used for both the fast and slow corners; however, to use the extreme worst case, the slow corner should use the trace impedance at the maximum tolerance. Use minimum trace impedances at the fast corner. Additionally:

- Routing skew must be included. If all addresses are included in the simulation this value will be zero. If not, any offset from the simulated signal must be included. The ideal case would be to have a simulated value. Line length differences may also be used to estimate the skew.
- Add the minimum setup and hold time requirements for the specific speed. Data sheet values are defined at specific slew rates. The setup and hold values should be adjusted for the slew rates observed in the simulation. The LPDDR2/LPDDR3 data sheets describe how setup and hold are modified to account for actual slew rate.
- If the timing budget is for the address and there is any branching of the clock, the routing difference must be accounted for.
- Add crosstalk. If this is a coupled simulation, it may already be included. This should include all sources of crosstalk that are not accounted for in the data sheets. Crosstalk can be a significant value where the bus is not terminated. The closer the bus is to being fully terminated, the lower the value of crosstalk.
- Controller skew comes from the data sheet.
- Clock error may be several parameters. Generally, the clock needs to be offset from the address (there may be a parameter for this). The duty cycle of the clock is important because both edges are used (there should be a parameter for this). If the clock is adjustable in step sizes, the step size is an error term.

- If the controller is capable of training the clock position, another method may be considered. For this method, the open data window (open address window) is used. All error terms are subtracted from the open window including a clock placement error term. If the result is positive, there is margin.
- The largest open window for a point-to-point unterminated topology will be when the driver impedance matches the transmission line impedance. If the maximum driver impedance is  $48\Omega$ , consider a  $50\Omega$  transmission line.

## Point-to-Two-Point Configurations

Once the system becomes more complex, additional guidelines are useful to ensure an adequate system:

- With two loads, control reflections to minimize their impact on signaling.
- For address and clock, where signals are always in the same direction, balance the route. After the signal from the controller splits to connect with each LPDDR2/LPDDR3 component, control the timing so that the signal reaches each LPDDR2/LPDDR3 component at the same time.

Typically, the trunk and the branches have the same impedance (for example,  $60\Omega$ ). The two branches in parallel present a  $30\Omega$  impedance at the split, so a reflection returns to the controller. If the controller impedance matches the transmission line impedance, there will not be an additional reflection. Making the branches equal in length helps to control reflections. The best result is obtained when the reflections from the open circuits that the LPDDR2/LPDDR3 components represent return to the branch point at the same time (they add at this point). If the impedance of the trunk is not half the impedance of the branches, there will be another reflection back toward the components; however, at least there will not be additional reflections. Any signal that continues down the trunk will eventually reach the driver. If the driver impedance matches the transmission line impedance of the trunk, there will not be an additional reflection. If the branches are kept short, the reflection that returns to the component from the impedance mismatch between the trunk and branches will occur during the rise time and will not be observed as a separate event.

- In practice, the driver should be within  $20\Omega$  of transmission line impedance of the trunk. The closer the match, the better the result. Simulations should be used to verify adequate results.
- If there are two LPDDR2/LPDDR3 components, not all of the interfaces need to be point-to-two-point. The data bus may still be point-to-point. If this is the case, refer to the point-to-point section for the data bus. Clock is another topology to consider. If there are two clocks available from the controller, one may be dedicated to each LPDDR2/LPDDR3 component. In some cases the clock is terminated and the address and DQ bus are not. This shifts the timing somewhat, so simulations are required.
- Slew rate is dependent on the capacitance at the load and the source impedance. The source impedance is the driver impedance plus the transmission line impedance (mostly the trunk). To improve slew rate, consider shading both the driver impedance and the trunk impedance to the low side. A  $40\Omega$  driver with a  $50\Omega$  trunk and  $60\Omega$  branches may be a good choice. Wider traces require more space, so this is a tradeoff.
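The size of the reflections discussed above can be estimated with the standard transmission-line reflection coefficient, $\Gamma = (Z_2 - Z_1)/(Z_2 + Z_1)$, for a wave traveling from impedance $Z_1$ into impedance $Z_2$. The sketch below is a lossless, first-order illustration using the impedance values mentioned in the text; the helper names are hypothetical, and it is not a substitute for simulation.

```python
def parallel(*impedances: float) -> float:
    """Equivalent impedance of several impedances in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)


def reflection_coefficient(z_from: float, z_to: float) -> float:
    """Fraction of the incident voltage reflected when moving from z_from into z_to."""
    return (z_to - z_from) / (z_to + z_from)


def split_reflection(z_trunk: float, z_branch: float) -> float:
    """Reflection seen by a wave arriving on the trunk at a two-way split."""
    return reflection_coefficient(z_trunk, parallel(z_branch, z_branch))


if __name__ == "__main__":
    # Trunk and branches all 60 ohm: the split looks like 30 ohm.
    print(f"60/60 split:     {split_reflection(60.0, 60.0):+.3f}")
    # 50 ohm trunk with 60 ohm branches, as suggested for better slew rate.
    print(f"50/60 split:     {split_reflection(50.0, 60.0):+.3f}")
    # Wave returning along a 50 ohm trunk into a 40 ohm driver.
    print(f"40 ohm driver:   {reflection_coefficient(50.0, 40.0):+.3f}")
```

With equal 60Ω trunk and branches, about one third of the incident wave is reflected at the split. Lowering the trunk to 50Ω reduces this to one quarter, and a 40Ω driver on that trunk reflects about 11% of whatever returns to it.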

## Point-to-Four-Point Configurations

Point-to-four-point configurations carry the heaviest load and have the lowest bandwidth. If the address is the only bus that requires this configuration, it sets the speed limit for the system. If the DQ bus must be point-to-four-point, it also sets the speed limit for the system, because the capacitance of the DQ pins is higher, which forces a lower slew rate. Clock has a similar capacitance to address. If the clock is point-to-four-point, it will perform similarly to address. Because the clock is only two traces, it is more likely that it can be improved with a lower impedance or termination near the load.

Symmetry is a strong requirement. Each signal may have different lengths; however, within one signal, related branches must be the same length. To have similar timing, the total length from controller to LPDDR2/LPDDR3 component should match across all signals. If the branches are different for different signals, the trunks will also be different so that the total is a constant. This rule is an approximation. Simulations are required to verify the result is adequate.

## Clock Routing in Multiple LPDDR2/LPDDR3 Devices

The LPDDR2/LPDDR3 devices require differential CK and CK# clock inputs. All CA input signals are sampled on both the rising and falling edges of CK. CS# and CKE input signals are sampled at the rising edge of CK only. Therefore, it is important for the LPDDR2/LPDDR3 devices to have a clean differential clock input.

Ideally, CK and CK# are 180° out-of-phase, such that CK and CK# cross V<sub>REF</sub> at the same point. This balances the output data so that each data word has the same valid time. These signals may be generated directly by the controller chip or by a separate clock chip. Best system margins will be obtained when CK and CK# are exactly 180° out-of-phase.

Proper terminations on the lines may provide clean differential clock input to the LPDDR2/LPDDR3 devices.

The figures below show the clock waveforms for various termination values.

**Figure 17: 667 MHz Clock to Four LPDDR3 Devices with  $200\Omega$ ,  $1000\Omega$ , and  $1M\Omega$  Terminations**



**Figure 18: 667 MHz Clock to Four LPDDR3 Devices with  $50\Omega$ ,  $100\Omega$ , and  $150\Omega$  Terminations**



- $R_{TT} = 150\Omega$  and  $R_{TT} = 200\Omega$  provide good clean eyes and adequate voltage swing.
- $R_{TT} = 150\Omega$  is recommended for best results.

**Note:**  $R_{TT} = 1M\Omega$  (open) may cause too much reflection and is not recommended. Choose unterminated differential CK/CK# at your own discretion.

For LPDDR2 and LPDDR3 systems, match CK trace length to CK# trace length  $\pm 20$  mils. If multiple clock pairs are transmitted from the controller to components, all clock-pair traces should be equivalent within  $\pm 20$  mils. Simulations should be run to determine an appropriate relationship between CK and DQS length. After an optimal number is determined, a tolerance of  $\pm 50$  mils is acceptable.

Recommended LPDDR3 routing topology for clock pairs is shown in the following figures.

If the trace lengths from split-point to LPDDR2/LPDDR3 components are less than approximately 1in (25mm), use a single 150–200 $\Omega$  resistor ( $R_{TT}$ ) at the split-point, as shown below.

**Figure 19: Single CK/CK# Differential Resistor Placement at Split-Point**



If the trace lengths from the split-point to LPDDR2/LPDDR3 components are greater than approximately 1in (25mm), use two resistors located near the respective components, as shown below. These resistors are in parallel, so each  $R_{TT}$  should be 300–400 $\Omega$  to keep the effective resistance in the 150–200 $\Omega$  range.

**Figure 20: Dual CK/CK# Differential Resistor Placement at Component**
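The resistor values for the dual-resistor placement follow from the parallel-resistance rule: two equal resistors in parallel present half the value of one. A minimal check, with a hypothetical helper name:

```python
def parallel_resistance(*resistors_ohm: float) -> float:
    """Effective resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


if __name__ == "__main__":
    for r_tt in (300.0, 400.0):
        effective = parallel_resistance(r_tt, r_tt)
        print(f"two {r_tt:.0f} ohm resistors -> {effective:.0f} ohm effective")
```

Two 300Ω resistors give an effective 150Ω and two 400Ω resistors give 200Ω, which spans the range recommended for a single resistor at the split-point.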



Use simulations to determine the optimal relationship between CK and address. Select a tolerance based on the margin in the timing budget. At clock rates of 667 MHz and above, little margin is available.

## Control, Address, and Data Routing to Multiple LPDDR Components

If an application requires additional memory, multiple devices can be used. The additional load and routing will affect signal integrity. With standard LPDDR2 and LPDDR3 applications, the termination scheme would likely change to end termination. With LPDDR4, other techniques may be feasible without using end termination.

When routing the interconnect between the controller and the memory devices, the use of a balanced-T topology is recommended (see the figure below). This will maintain an equal flight time for the signals going to each of the devices. The drive strength selection should still be chosen to match the trace impedance; however, signal integrity and timing margins could improve if the drive strength is set to a lower impedance.

**Figure 21: PCB Layout (Balanced-T Topology) for Multiple Components**



It is important to simulate any multipoint designs. When defining the source impedance, the selections available may not be ideal for your application. In such cases, it may be necessary to add a series termination resistor to achieve the exact source impedance required:

- For unidirectional signals such as address and control lines, the series resistor should be placed as close to the driver as possible.
- For bidirectional signals, the series resistor can be placed at the midpoint of the trace, providing some benefit for all drivers on the network.

In a multipoint system design, address and control lines can take advantage of the unidirectional bus. The equivalence simplification of circuits applies to transmission lines, so at the point of the T where the trace splits, the equivalent impedance of the parallel combined trace is one half that of the original impedance. This impedance discontinuity will cause some disturbance on the signals. To reduce this effect, the impedance of the trace from the split to the device can be increased so that the parallel combination of the traces is closer to the original impedance of the trace. This reduces noise on the signal and improves signal integrity.

## Miscellaneous Routing Recommendations

A 40 mil difference in address-, command-, or signal-group trace lengths equates to  $0.04\text{in} \times (167\text{ps per inch})$ , or a skew of approximately 6.7ps. If the timing budget can absorb this minor amount of lane-to-lane skew and other routing delays, the system will perform normally. Total routing-based delays must meet  $t_{DQSK}$ , controller DQS recovery limits, and other data sheet AC timing parameters.

Regardless of bus type, all signal groups must be properly referenced to a solid  $V_{SSQ}$  or  $V_{DDQ}$  plane. For both READ and WRITE operations, the key relationship is between CK/CK#, DQ, DM, and DQS signals (the LPDDR2/LPDDR3 data group), which operates at twice the speed of other signal groups and makes SI more critical. DQ, DQS, and clock lines are best referenced to  $V_{SSQ}$  to minimize noise. If a  $V_{SSQ}$  layer is not easily accessible, address and command lines can reference a  $V_{DDQ}$  layer.

Keep traces as short as possible. If the trace length (from controller pad to memory device pad) is <1in (2.5cm) in point-to-point applications, routing is simpler and signal quality is usually better. In most cases, trace lengths >2in (5cm) lead to more signal undershoot, overshoot, and ringing, all of which are detrimental to SI.

## Additional Trace-Length Design Guidelines

- Match different DQ byte lanes to within one-tenth of an inch (2.5mm) of each other. A 1in trace-length difference equates to approximately 167ps of propagation delay. Thus, the timing budget must be able to absorb 16.7ps for a 0.1in difference in byte-lane matching.
- Within a byte lane, match all DQ and DQS traces to within  $\pm 50$  mils and route data groups next to a  $V_{SS}$  plane to minimize the return path/loop length.
- Maintain a solid ground reference (no split planes, and so on) for each group to provide a low impedance return path; high-speed signals must not cross a plane split.
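The matching rules above lend themselves to a simple check against exported trace lengths. The sketch below assumes that the $\pm 50$ mil tolerance within a byte lane is measured relative to that lane's DQS trace, and that lane-to-lane matching compares the DQS lengths; both are interpretations rather than statements from this note. The net names and lengths are hypothetical.

```python
from typing import Dict


def lane_matched(lengths_mils: Dict[str, float], ref: str = "DQS", tol_mils: float = 50.0) -> bool:
    """True if every trace in the lane is within tol_mils of the reference trace."""
    ref_len = lengths_mils[ref]
    return all(abs(length - ref_len) <= tol_mils for length in lengths_mils.values())


def lanes_matched(lanes: Dict[str, Dict[str, float]], ref: str = "DQS", tol_mils: float = 100.0) -> bool:
    """True if the reference traces of all lanes are within tol_mils of each other."""
    ref_lengths = [lane[ref] for lane in lanes.values()]
    return max(ref_lengths) - min(ref_lengths) <= tol_mils


if __name__ == "__main__":
    # Hypothetical routed lengths in mils.
    lanes = {
        "byte0": {"DQS": 1500.0, "DQ0": 1520.0, "DQ1": 1470.0, "DM": 1545.0},
        "byte1": {"DQS": 1580.0, "DQ8": 1640.0, "DQ9": 1575.0, "DM": 1590.0},
    }
    for name, lane in lanes.items():
        print(name, "within +/-50 mils of DQS:", lane_matched(lane))
    print("lanes within 0.1in of each other:", lanes_matched(lanes))
```

In this made-up data set, byte1 fails because DQ8 is 60 mils longer than its DQS, while the two lanes differ by 80 mils and pass the 0.1in lane-to-lane check.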

## Simulation

During the layout phase for a new or revised design, Micron strongly recommends simulating I/O performance at regular intervals. Optimizing an interface through simulation can help decrease noise and increase timing margins before building prototypes. Issues are often resolved more easily when found in simulation, as opposed to those found later that require expensive and time-consuming board redesigns or factory recalls.

Micron has created many types of simulation models to match the different tools in use. Component simulation models currently on [micron.com](http://micron.com) include IBIS, Verilog, VHDL, Hspice, Denali, and Synopsys.

Verifying all simulated conditions is impractical, but there are a few key areas to focus on: DC levels, signal slew rates, undershoot, overshoot, ringing, and waveform shape. Also, verifying the design has sufficient signal-eye openings to meet both timing and AC input voltage levels is extremely important.

## Conclusion

Signal integrity, power delivery, routing, and decoupling are all major concerns when designing LPDDR2 and LPDDR3 applications.

Mobile LPDDR2 and LPDDR3 designs provide an attractive alternative to traditional DRAM designs when used for mobile applications. The option to control the drive strength to match the impedance of the memory bus enables removal of the termination voltage ( $V_{TERM}$ ) and series termination resistors. Mobile LPDDR designs can be used to reduce memory cost and power consumption in mobile applications.

Mobile applications, when properly designed and validated through simulations, can realize superior functionality and stability.

## Revision History

### Rev. C – 6/14

- Table 4 typo showing 1 mil instead of 11 mil for Command Control 5 mil dielectric. Table 4 LPDDR3 minimum setup and hold requirements values swapped between columns. Corrected addressing to be consistent with LPDDR2/3 scheme

### Rev. B – 10/13

- Corrected table 1, VDD vs VDDQ definition correction. Page 29 typo correction to change 1in to .1 in

### Rev. A – 8/13

- Initial release

8000 S. Federal Way, P.O. Box 6, Boise, ID 83707-0006, Tel: 208-368-3900  
[www.micron.com/productsupport](http://www.micron.com/productsupport) Customer Comment Line: 800-932-4992  
Micron and the Micron logo are trademarks of Micron Technology, Inc.

All other trademarks are the property of their respective owners.

This data sheet contains minimum and maximum limits specified over the power supply and temperature range set forth herein. Although considered final, these specifications are subject to change, as further product development and data characterization sometimes occur.