

# Challenges and Emerging Solutions in Testing HBM IO & Systems

Salem Abdennadher  
*Intel Corporation*  
Folsom, USA

Michael Altmann  
*Intel Corporation*  
Folsom, USA

Bin Xue  
*Intel Corporation*  
Hillsboro, USA

**Abstract:** With advances in VLSI technology, process, packaging and architecture, Systems on a Chip (SoCs) continue to increase in complexity. This has resulted in an unprecedented increase in design errors, manufacturing flaws and customer returns in modern VLSI systems related to high-speed I/O circuits. The situation will be exacerbated in future systems with smaller form factors, higher integration complexity, embedded I/Os, and more complex manufacturing processes. Systems using High Bandwidth Memory (HBM) with embedded DRAM interconnected via a high density substrate with interposer-like technologies (2.5D packaging) are being introduced in a broad array of products. In this work we define a new testability methodology for 2.5D products. Compared with conventional testing, 2.5D IC test flows are more complex, and we present new DFT methodologies that provide good coverage and visibility to isolate failures in production manufacturing tests.

**Keywords:** EMIB, HBM PHY, HBM DRAM, 3D, 2.5D

## 1. INTRODUCTION

In response to the increasing challenges in maintaining technology advancements through traditional scaling at a pace consistent with or exceeding Moore's Law, alternative methods to achieve enhanced system level performance are becoming increasingly important. Three-dimensional integrated circuits (3D ICs) and 2.5D ICs with a Si interposer, as shown in Figure 1, are regarded as promising candidates to overcome the limitations of Moore's Law because of their advantages of lower power consumption, smaller form factor, higher performance, and higher function density [1-3].



Figure 1: Progression of 3D ICs.

Intel announced its Embedded Multi-die Interconnect Bridge (EMIB) in August 2014. This is a lower cost and simpler 2.5D packaging approach than silicon interposers with through-silicon vias (TSVs) for very high density interconnects between heterogeneous dies on a single package. It involves embedding a passive silicon interconnect chip into the package, enabling localized high-density die-to-die connections, and eliminating the die size restrictions of interposer-based solutions. Figure 2 presents a subsystem representation with EMIB connectivity.



Figure 2: A subsystem with EMIB connectivity

One of the challenges for 2.5D and 3D technology adoption is the insufficient understanding of 3D testing issues and the lack of existing Design for Test (DFT) solutions. This work presents some new design and test techniques to meet the increasing test cost and complexity challenges of these embedded Inputs/Outputs (I/O). The reliability of HBM at the System in Package (SiP) level is also increased. The specific features are described in more detail in section 2. Fault isolation in an HBM based solution will be a key customer enabler. We would like to identify where the failure occurred (i.e. the DRAM stack, interposer interconnect, die attach, via delamination, or the application-specific integrated circuit (ASIC) die). A well-thought-out 2.5D test flow enabling improved testability and fault isolation of these components is presented in section 3.

## 2. TEST FLOW AND HBM PHY DFT FEATURES

2.5D ICs present manufacturing challenges to optimize yield and cost. Multiple tests need to be performed at different stages of the manufacturing flow. Tests before assembly can be performed before or after wafer thinning. After HBM DRAM, EMIB and SoC package assembly, tests can be performed on either partial assembly and/or complete assembly. Once packaged, final packaged test can be performed. A hierarchical test architecture is designed to meet product test coverage and fault isolation requirements. In addition, the HBM Physical layer (PHY) in the host SoC must support specific DFT features. Table 1 summarizes the host HBM PHY DFx features.

Table 1: HBM PHY DFT feature summary

| DFx Features            | Details                                                   |
|-------------------------|-----------------------------------------------------------|
| Internal Loopback BIST  | Pad side and core side loopback support                   |
| Interface BIST          | Uses DRAM Loopback to test, remap and train DRAM          |
| Controller BIST         | Loopback between controller and the PHY                   |
| DRAM BIST               | Mission Mode write and reads into DRAM                    |
| Lane Repair             | Redundant lane usage to increase yield                    |
| Error Injection         | Inject errors during all BIST (lane failures, false pass) |
| Margining               | Voltage and Timing margining capabilities                 |
| Digital Delay Line BIST | Digital Delay Line oscillation test mode                  |
| Test Monitor            | Digital and analog signal debug on test bumps             |
| PLL Test                | PLL Lock, Bypass and test signal monitors                 |
| Scan                    | At speed and Transition scan support                      |
| IEEE 1500 Support       | Internal IEEE 1500 and a pass through mode                |

### 2.1 Internal and Controller Loopback BIST

To minimize the reliance on ATE testing, we will use on-chip BIST structures and loopback modes for both Address/Command and Data IO pins. For Address/Command BIST, selected data patterns use the mission mode transmit path and dedicated loopback receive paths for checking against expected data. For Data loopback BIST, the data traverses the mission mode transmit and receive paths of the respective signal. The HBM PHY also supports Controller BIST with loopback to isolate the connectivity between the Controller and the HBM PHY. As the DFI interface does not have a receive path for Address and Command Signals, the Address/Command Loopback data comparison is localized in the HBM PHY. For the data path, data generation and comparison is done by the HBM controller. These loopback paths are illustrated in Figure 3.



Figure 3: BIST Features in HBM subsystem

### 2.2 Interface BIST

The JEDEC HBM spec [4] specifies a Multiple-Input Signature Register (MISR) and a Linear Feedback Shift Register (LFSR) within the HBM Command (AWORD) and 32 DQ bits (DWORD) I/O blocks. Functionally, these blocks are for testing the link between the host and the HBM DRAM. The HBM PHY will incorporate a 20-bit MISR/LFSR for each DWORD byte, and a 30-bit MISR/LFSR for the AWORD circuit. Identical polynomials are supported in the HBM DRAM die to enable Interface BIST. Additional standardized control hooks synchronize the host and DRAM LFSRs.
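As a rough illustration of the LFSR/MISR pairing described above, the following Python sketch models a 20-bit pattern generator and a signature register of the same width. The feedback taps, the seed, and all function names are assumptions made for this example. The actual polynomials are those defined in the JEDEC specification [4] and are not reproduced here.

```python
from typing import Iterable, List

WIDTH = 20                 # width of the per-byte DWORD register mentioned in the text
TAPS = (19, 16)            # hypothetical feedback taps, NOT the JEDEC-defined polynomial
MASK = (1 << WIDTH) - 1


def feedback(state: int) -> int:
    """Return the XOR of the tapped bits of the current state."""
    bit = 0
    for tap in TAPS:
        bit ^= (state >> tap) & 1
    return bit


def lfsr_step(state: int) -> int:
    """Advance a left-shifting Fibonacci LFSR by one clock."""
    return ((state << 1) | feedback(state)) & MASK


def lfsr_patterns(seed: int, count: int) -> List[int]:
    """Generate `count` pseudo-random words, starting from `seed`."""
    patterns = []
    state = seed & MASK
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns


def misr_signature(words: Iterable[int], seed: int = 0) -> int:
    """Compact a stream of WIDTH-bit words into a single signature."""
    state = seed & MASK
    for word in words:
        # Shift with feedback, then fold the incoming word into the register.
        state = lfsr_step(state) ^ (word & MASK)
    return state
```

In this model the transmitting side would send `lfsr_patterns(seed, n)` and the receiving side would compact what it sees with `misr_signature`. A signature that differs from the one computed over the expected stream indicates that at least one bit was corrupted on the link.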

The Interface BIST will enable direct identification of failing or marginal signal connections between the Host and HBM die. This External Loopback function between the memory and the HBM PHY is designed for I/O link testing, repair, and interface timing training after assembly, as described in more detail in section 2.5. Additionally, all Loopback BIST and Interface BIST modes have voltage/timing margining and digital error injection capability. Error injection in the HBM PHY enables faults to be injected on individual signals to check against false passes in every BIST available in the PHY (PHY loopback BIST, Controller BIST, Interface BIST, DRAM BIST). It is also beneficial for lane repair validation, where it is used to mimic lane failures during lane repair and remap, especially when no broken lanes are encountered.
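The role of error injection in unmasking a false pass can be sketched in a few lines of Python. This is a behavioral model only: the lane count, the stuck-at fault model, and every name below are hypothetical and do not describe the PHY hardware.

```python
from typing import Callable, Iterable, Set

LANES = 8  # hypothetical lane count, chosen only to keep the sketch small
Channel = Callable[[int], int]


def healthy_channel(word: int) -> int:
    """A link with no faults returns the word unchanged."""
    return word


def inject_stuck_at(lane: int, value: int) -> Channel:
    """Build a channel that forces one lane to a constant 0 or 1."""
    def channel(word: int) -> int:
        if value:
            return word | (1 << lane)
        return word & ~(1 << lane)
    return channel


def run_bist(patterns: Iterable[int], channel: Channel,
             checker_enabled: bool = True) -> Set[int]:
    """Send patterns through the channel and return the set of failing lanes."""
    failing: Set[int] = set()
    for sent in patterns:
        received = channel(sent)
        if not checker_enabled:
            continue  # models a broken checker that can never report a failure
        diff = sent ^ received
        failing.update(lane for lane in range(LANES) if (diff >> lane) & 1)
    return failing


def detects_all_injected_faults(patterns: Iterable[int], bist) -> bool:
    """Trust a BIST only if it flags every injected stuck-at fault on its lane."""
    patterns = list(patterns)
    for lane in range(LANES):
        for value in (0, 1):
            if lane not in bist(patterns, inject_stuck_at(lane, value)):
                return False
    return True
```

With patterns such as `[0x00, 0xFF, 0xAA, 0x55]`, a working checker reports every injected fault, while a BIST run with `checker_enabled=False` passes on a healthy link but fails the injection check. That second case is the false pass that error injection is meant to expose.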

### 2.3 DRAM BIST

The HBM PHY supports a local BIST engine to test the external memories by executing writes and reads without a controller. This DRAM access can be done through a string of back-to-back writes followed by a string of back-to-back reads, or as a string of write/read accesses. In addition, all current HBM vendors use an interface die at the bottom of the DRAM stack to provide signal redistribution and other functions [5][6]. One common function is a DRAM-resident memory BIST (MBIST) engine. MBIST is used in production test and repair/re-test to capture failures induced by thermo-mechanical damage during assembly. Typically, the MBIST test suite is a reduced subset of the wafer or stack level production DRAM test suite.
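The two access orderings mentioned above can be modeled as follows. This Python sketch is an illustration under stated assumptions: the memory model, the address-dependent data background, and the stuck address bit fault are all hypothetical and are not taken from the PHY design.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Op = Tuple[str, int, int]  # (kind, address, data), where kind is "W" or "R"


def data_for(addr: int) -> int:
    """Hypothetical address-dependent data background (unique per address)."""
    return (addr * 0x9E3779B1) & 0xFFFFFFFF


class MemoryModel:
    """Dict-backed memory, optionally with one address bit stuck at 0."""

    def __init__(self, stuck_addr_bit: Optional[int] = None) -> None:
        self.cells: Dict[int, int] = {}
        self.stuck_addr_bit = stuck_addr_bit

    def _decode(self, addr: int) -> int:
        if self.stuck_addr_bit is None:
            return addr
        return addr & ~(1 << self.stuck_addr_bit)

    def write(self, addr: int, data: int) -> None:
        self.cells[self._decode(addr)] = data

    def read(self, addr: int) -> Optional[int]:
        return self.cells.get(self._decode(addr))


def burst_sequence(addrs: Iterable[int]) -> List[Op]:
    """All writes back to back, followed by all reads back to back."""
    addrs = list(addrs)
    writes = [("W", a, data_for(a)) for a in addrs]
    reads = [("R", a, data_for(a)) for a in addrs]
    return writes + reads


def interleaved_sequence(addrs: Iterable[int]) -> List[Op]:
    """Each write is immediately followed by a read of the same address."""
    ops: List[Op] = []
    for a in addrs:
        ops.append(("W", a, data_for(a)))
        ops.append(("R", a, data_for(a)))
    return ops


def run_sequence(ops: Iterable[Op], memory: MemoryModel) -> List[int]:
    """Execute the operations and return the addresses whose reads mismatched."""
    failures: List[int] = []
    for kind, addr, data in ops:
        if kind == "W":
            memory.write(addr, data)
        elif memory.read(addr) != data:
            failures.append(addr)
    return failures
```

In this toy model the ordering matters: with address bit 2 stuck at 0 and addresses 0 to 7, the burst sequence reports failures at addresses 0 to 3 because later writes overwrite the aliased cells, whereas the interleaved sequence reads each cell back before it is overwritten and reports nothing.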

Per the JEDEC spec, there are 60 Direct Access (DA) I/O pins in the HBM stack that are provided for direct access test. They must be routed directly to external package I/O pins. There are 20 DA pins that are designated to connect point to point, and 18 DA pins that are designated to connect in parallel to multiple HBM devices on a multi-drop bus. The function of each pin is not defined and is specific to each supplier.

### 2.4 IEEE 1500 Support

The IEEE 1500 standard interface [11] is the main interface to access and activate the test modes within the HBM DRAM. It is accessed from the host (ASIC) PHY and memory controller. IEEE 1500 is the protocol standard used for testing at the system level. Using this protocol, the DRAM can support PHY-level tests, including basic connection test functions such as EXTEST TX/RX (intended for DC I/O connectivity testing, similar to board-level boundary scan), chip reset, internal VREF control, and AC link test with PRBS data traffic. It also enables memory array testing with the DRAM's embedded memory BIST engine, and SiP-level test functions after assembly, such as DRAM cell repair.

A Pass-Through 1500 Interface DFx feature in the ASIC PHY will allow an external controller to issue 1500 instructions to the DRAM. The PHY multiplexes this pass-through interface with an internal 1500 controller that is used for PHY built-in hardware functions. Once in Pass-Through mode, the external controller must implement a full set of IEEE 1500 instructions to allow full testing of the DRAM dies.

### 2.5 HBM Lane Repair

Previous DRAM technologies have concentrated on improving memory-array yield through redundancy, but those devices have relatively low I/O counts. HBM DRAM, with a 4X to 32X increase in I/O count, will include support features to enable DRAM lane repair via lane redundancy. The current HBM spec provides mechanisms to identify broken or marginal connections between the DRAM and the Host PHY through DC connectivity tests using the IEEE 1500 test port EXTEST\_TX and EXTEST\_RX commands. DWORD AC marginality can be exposed via either the mission mode interface (Row & Col IO pins) or the IEEE 1500 test port, by enabling the LFSR/MISR functions. AWORD AC marginality can be exposed using the IEEE 1500 test port and the AWORD-specific LFSR/MISR functions.

HBM DRAM and the HBM Host PHY have a suite of built-in test functions to support system diagnostic requirements at a DWORD level of granularity. Each DWORD within a channel has built-in link test generation and verification logic built around defined LFSR patterns. Lane redundancy can be enabled from the results of any of these test hooks, or can be invoked after (or as part of) link training. Link training on power-on or reset exit can itself be used as a mechanism to detect the health of each lane.

The lane redundancy technique adopted by JEDEC has a minimal impact on area and performance and consists of shifting traffic from ‘bad’ links to redundant lanes. Per channel, JEDEC defines 8 specific bumps as “Redundant Data” pins and two bumps as “Redundant Row/Col” pins, and all 16 DBI ac pins may be used as “Redundant Data” pins (at the cost of the DBI power savings, if used). The lane-repair options are flexible, and lane redundancy could be enabled, for example, in production test or in the field during power-on training.
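The idea of shifting traffic away from a bad lane onto a redundant one can be sketched as a simple remapping. The function below is a minimal Python illustration, assuming a group of lanes with spares at the end and a list of bad lanes produced by an earlier test. The grouping, the names, and the shift rule are assumptions for this example and are not the remapping modes defined by JEDEC.

```python
from typing import Iterable, List


def build_lane_map(num_signals: int, num_spares: int,
                   bad_lanes: Iterable[int]) -> List[int]:
    """Map each logical signal to a physical lane, skipping bad lanes.

    Physical lanes are numbered 0 .. num_signals + num_spares - 1.
    Signals at or above a bad lane shift toward the spare lanes.
    """
    total = num_signals + num_spares
    bad = set(bad_lanes)
    out_of_range = sorted(lane for lane in bad if not 0 <= lane < total)
    if out_of_range:
        raise ValueError(f"lanes outside the group: {out_of_range}")
    good = [lane for lane in range(total) if lane not in bad]
    if len(good) < num_signals:
        raise ValueError("not enough redundant lanes to repair this group")
    # The i-th logical signal uses the i-th healthy physical lane.
    return good[:num_signals]
```

For example, `build_lane_map(8, 1, [3])` keeps signals 0 to 2 in place and moves signals 3 to 7 up by one lane, so the spare absorbs the shift. A second bad lane in the same group raises an error because only one spare is available in this model.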

In HBM systems, test access continues to shrink because of advances in silicon integration (interposer, EMIB ...), at the component level (pin counts and pitch), and in HBM manufacturing (buried layers, embedded I/Os). In addition, signal transmission is becoming more complex as speed demands increase. HBM has more than one thousand PHY I/Os and micro bumps for mission-mode operation. These cannot be accessed directly for test because their pitch is too small to probe and because, in a SiP, they are connected through the interposer. All the DFT features described in this section can be used for the global failure-site isolation techniques presented in Section 3.

## 3. FAULT ISOLATION AND DEBUG FLOW

Although all components will go through rigorous pre-assembly component-level test, defects will still be introduced during assembly, so failure analysis is needed to isolate them. We can conveniently break the system down into subsections and discuss their respective coverage.

### 3.1 ASIC PHY IO Coverage

Internal Loopback Built-In Self-Test (BIST) on the application-specific integrated circuit (ASIC) PHY side is the most effective way to screen the PHY I/O. Figure 4 (left) conceptually illustrates the test in the system context. Note that in this test the DRAM PHY is tristated so that it does not interfere with the test. This particular BIST engine supports both core-side and pad-side modes, which help isolate faults in the analog front end (AFE).

The controller BIST extends the coverage to and from the controller. The caveat is that AWORD error checking must be done within the PHY, since no AWORD data is returned to the controller. Since this test only reaches the PHY I/O pad, EMIB, and DRAM pad, a failure of the test may be attributed to defects in these components.



**Figure 4:** Internal Loopback BIST (left) and controller BIST (right)

### 3.2 EMIB coverage

Post assembly, the EMIB is completely enclosed in the system with limited visibility, so its coverage must rely on ASIC-side and DRAM-side tests, including EXTEST\_TX/RX, Interface BIST, and DRAM BIST. The coverage of these tests is visualized in Figure 5 below.



**Figure 5:** EMIB coverage: EXTEST (left), Interface BIST (middle) and controller/PHY DRAM BIST (right)

### 3.3 HBM DRAM Coverage

The HBM base die is a logic die, unlike conventional DRAM dies. Per the JEDEC HBM specification [4], the base die is IEEE 1500 compliant. In this base die, the DRAM built-in MBIST algorithms are vendor specific, but most include common SCAN and March patterns with a few debug algorithms/tests. The HBM DRAM BIST can be used effectively to test the interface between the PHY and DRAM as well as the basic validity of the DRAM bit cells. DRAM coverage is also complemented by the ASIC PHY DRAM BIST and ASIC Controller DRAM BIST, which are located remotely in the ASIC [7]. They provide full coverage of the mission mode path with real-world DRAM transactions. Controller DRAM BIST has more algorithmic flexibility. The built-in MBIST, DRAM BIST and controller DRAM BIST and their respective coverage are illustrated in Figure 6.

In addition, the 60-pin HBM Direct Access bus, which is routed from the HBM stack to external package I/O pins, can be used effectively for further debug of the HBM stack. This testing is not defined by JEDEC, and each vendor is required to communicate its own DA bus functionality to product teams.



**Figure 6:** DRAM coverage: built-in MBIST (left), DRAM BIST (middle) and controller DRAM BIST (right)

### 3.4 Failure analysis/Isolation challenges

The intended coverage of all DFx features and tests is tabulated in Table 2. In addition, one of the purposes of the base die is to support localized test functions for the DRAM I/Os. We may have access to some of the vendor's DRAM I/O DFT features [8] per agreement. The caveat is that these are only for debug and failure analysis/isolation, and the defective part may need to be separated (disassembled).

**Table 2:** DFx/Tests and their respective coverage

| DFx/Tests            | ASIC PHY | EMIB | DRAM PHY | DRAM TSV | DRAM Cell |
|----------------------|----------|------|----------|----------|-----------|
| Controller BIST      | ✓        |      |          |          |           |
| Loopback BIST        | ✓        |      |          |          |           |
| Interface BIST       | ✓        | ✓    | ✓        |          |           |
| EXTEST               | ✓        | ✓    | ✓        |          |           |
| MBIST                |          |      |          | ✓        | ✓         |
| DRAM BIST            | ✓        | ✓    | ✓        | ✓        | ✓         |
| Controller DRAM BIST | ✓        | ✓    | ✓        | ✓        | ✓         |
| Vendor I/O DFT       |          |      | ✓        |          |           |

Still, fault isolation is a daunting task because of the system-level complexity and the interactions among the components. For example, a defect in the EMIB is very difficult to isolate with the planned DFx/tests. Such a defect could fail multiple tests, and since these tests overlap heavily (ASIC PHY, EMIB, DRAM PHY), it is hard to pinpoint the exact failure location. An ultimate fault isolation analysis may require complete disassembly and component-level analysis, which is a tedious and expensive process. The goal is to conduct extensive test data analysis and learning, along with the DRAM vendor and PHY supplier, so that we can build a robust debug and fault isolation flow with the available DFx/tests. A secondary goal is to minimize the costs associated with rework or scrapping caused by localized I/O, DRAM or EMIB defects. With access to the DRAM vendor's full I/O DFT/tests, the debug burden may be further alleviated. The challenge in 2.5D IC failure analysis remains the lack of an effective solution for evaluating the interface status between the ASIC and the HBM DRAM stack inside the SiP [9].
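One simple way to reason about the overlap in Table 2 is to treat it as a coverage matrix and intersect the coverage of the failing tests. The Python sketch below is a deliberately naive illustration that assumes a single defect and assumes that a passing test fully exonerates every component it covers. Neither assumption is claimed to hold in production, and the function name is hypothetical.

```python
from typing import Dict, Set

ALL_SITES = {"ASIC PHY", "EMIB", "DRAM PHY", "DRAM TSV", "DRAM Cell"}

# Coverage transcribed from Table 2.
COVERAGE: Dict[str, Set[str]] = {
    "Controller BIST": {"ASIC PHY"},
    "Loopback BIST": {"ASIC PHY"},
    "Interface BIST": {"ASIC PHY", "EMIB", "DRAM PHY"},
    "EXTEST": {"ASIC PHY", "EMIB", "DRAM PHY"},
    "MBIST": {"DRAM TSV", "DRAM Cell"},
    "DRAM BIST": set(ALL_SITES),
    "Controller DRAM BIST": set(ALL_SITES),
    "Vendor I/O DFT": {"DRAM PHY"},
}


def candidate_sites(results: Dict[str, bool]) -> Set[str]:
    """Return suspect components given {test name: passed?} results.

    Suspects are the components covered by every failing test, minus
    the components covered by any passing test.
    """
    failing = [test for test, passed in results.items() if not passed]
    if not failing:
        return set()
    suspects = set.intersection(*(COVERAGE[test] for test in failing))
    for test, passed in results.items():
        if passed:
            suspects -= COVERAGE[test]
    return suspects
```

If Loopback BIST, Controller BIST and MBIST pass while Interface BIST, EXTEST and both DRAM BIST variants fail, this model narrows the suspects only to the EMIB and the DRAM PHY. Adding a passing Vendor I/O DFT result removes the DRAM PHY and leaves the EMIB, which mirrors the point that vendor I/O DFT access can ease the debug burden.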

## SUMMARY

HBM provides memory bandwidth far beyond what is available with traditional memory devices (e.g., DDR4) at attractive power and cost, but HBM testing presents new challenges. In this paper we presented emerging DFT techniques and methodologies to address these challenges and ensure high-yielding products. With the integration of multiple dies and designs from different vendors, fault isolation is increasingly crucial to product success and cost control. We showed how the DFT techniques deployed in the different ingredients of the HBM system can help isolate these failures.

## REFERENCES

- [1] Koester SJ, Young AM, Yu RR, Purushothaman S, Chen KN, La Tulipe DC, Rana N, Shi L, Wordeman MR, Sprogis EJ. Wafer-level 3D integration technology. IBM J. Res. Dev. 2008;52:583–597.
- [2] Chen KN, Tan CS. Integration Schemes and Enabling Technologies for Three-Dimensional Integrated Circuits. Very Large Scale Integr. (VLSI) Syst. 2011;5:160–168.
- [3] Lau JH. Recent advances and new trends in nanotechnology and 3D integration for semiconductor industry. 3D Systems Integration Conference. 2012:1–23.
- [4] JEDEC Solid State Technology Association. JESD 235, JEDEC Spec, High Bandwidth Memory (HBM) DRAM Specification Rev 1.30. Jan 2015.
- [5] Samsung Semiconductor. 8Gb B-die HBM datasheet: KHA821201B, KHA841801B, KHA881901B.
- [6] SKHynix Inc. 8Gb HBM M-die datasheet, Stacked Die: H5VR16ESM2H-xxC, H5VR16ESM4H-xxC, H5VR16ESM8H-xxC.
- [7] Kevin Tran (SKHynix Inc.), Paul Silvestri (Amkor Technology Inc.), Bill Isaacson (eSilicon Corporation), Brian Daellenbach (Northwest Logic). High-Bandwidth Memory White Paper, Start Your HBM/2.5D Design Today, 201
- [8] Hongshin Jun, Sangkyun Nam, Hanho Jin, Jong-Chern Lee, Yong Jae Park, and Jae Jin Lee (SK hynix Inc.). High-Bandwidth Memory (HBM) Test Challenges and Solutions. IEEE Design & Test, 2016.
- [9] Dr. Hongshin Jun (SK hynix Inc.). HBM (High Bandwidth Memory) for 2.5D. Semicon Taiwan 2015.
- [10] DesignWare HBM2 PHY Datasheet, July 25, 2017.
- [11] IEEE Std 1500, Embedded Core Test, 30 June 2005. IEEE P1500 Web Site.