

- ① Testing: the process of identifying faults in the actual fabricated chip.
  - A metric of manufacturing quality.
  - Done exhaustively, for every chip.
  - Ensures the delivered device is error free.
- ② Verification: predictive analysis which ensures that the design, when manufactured, will perform the given I/O function.
  - Emphasizes functional correctness as per the specifications.
  - Measures the quality of the design.
  - A single design is verified once for millions of ICs.
- ③ Validation: the process of checking the functional correctness of the chip.
  - Pre-silicon (pre-Si) ↔ verification
  - Post-silicon (post-Si)
- Pre-Si validation is the overall validation and verification effort prior to sending the design for fabrication. It involves activities such as functional tests, assertion coverage, and code coverage.
- Post-Si validation checks a few samples of silicon to ensure that the design is ready for mass production, and is usually done at the target design clock speed.
- It therefore validates functional aspects plus non-functional aspects such as power, noise margin (NM), voltages, and characterization.
- It works at the complete, customer-usable system level. (Testing may be a by-product of validation.)
- Testing: wafer sort, burn-in stages, fault test.

# VLSI TESTING AND VERIFICATION

- Testing
  - Targets faults / defects.
  - Process of identifying faults in the actual fabricated chip.
  - Measures quality of manufacturing; applied to millions of ICs.
- Verification
  - Process of checking that the implementation is functionally correct (logical + syntactic).
  - Measures quality of design vs. cost.
  - Applied to a single design.

- Formal verification: verifies the correctness of a design without exhaustive I/O testing of all possible combinations.

## FLOW OF VLSI DESIGN VERIFICATION



Most commonly used lang. for verification - System

## verification approaches

### Black box

- Lack of visibility & observability
- not generally followed

### white box

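- Requires good knowledge of the design internals.

- Test bench is tied to the implementation within the component.

Specifically, there are 2 approaches (black box and white box); grey box combines the two.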

## Levels of verification



## TESTING

### Testing principle



Testing can happen at different levels:

- during the VLSI development process of the chip,
- during manufacturing: wafer, chip, board,
- and possibly at system level.

Testing acts as a filter.



Testing generally follows the rule of 10 $\Rightarrow$ it costs about 10 times more to detect a faulty device at each higher level of the product manufacturing process.
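The rule of 10 can be illustrated with a few lines of Python. This is only a sketch: the base cost of 0.01 (arbitrary currency units) and the exact factor of 10 per level are illustrative assumptions, and `detection_cost` is a hypothetical helper name.

```python
def detection_cost(base_cost: float, level: int, factor: float = 10.0) -> float:
    """Cost of catching a fault `level` steps above the lowest test level.

    Assumes the cost grows by a constant `factor` per level (rule of 10).
    """
    return base_cost * factor ** level


# Levels named in the notes, from earliest to latest detection.
LEVELS = ["wafer", "chip", "board", "system"]

for i, name in enumerate(LEVELS):
    # base_cost=0.01 is an arbitrary illustrative value
    print(f"{name:>6}: {detection_cost(0.01, i):8.2f}")
```

The point of the sketch is the growth rate: a fault that escapes three levels costs about a thousand times more to catch than at the wafer.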

So what is the actual role of testing?

The role of testing in the VLSI design process is to find out whether the DUT/CUT is performing as expected, i.e. to ensure its quality. If it is not, testing must detect the fault, diagnose the reason, and determine how to correct or remove the error at minimum cost. It also encompasses FMEA, i.e. detection of manufacturing process errors that may have caused defects on the DUT/CUT.

Testing is able to identify:

- Design errors
- Test vector errors

## Challenges in VLSI TESTING.

→ The industry follows Moore's law, but we also have Koomey's law.

- ① The industry currently faces 3 challenges:
  - ↓ in power consumption
  - ↓ in delay (faster devices)
  - ↑ in transistor density

### 1. DEEP-SUBMICRON (nm) TECHNOLOGY CHALLENGES

We are already working at the 5 nm node, with 7 nm already commercialised.

But as feature size shrinks, different types of defects surface, which may be classified as follows:

| PROCESS DEFECTS | MATERIAL DEFECTS | AGING DEFECTS | DESIGN DEFECTS |
|-----------------|------------------|---------------|----------------|
| Process variations:<br>- RDF (random dopant fluctuation)<br>- $W/L$, $T_{ox}$ variations<br>- $V_{TH}$ variations | - PBTI (nMOS) | - HCI | - Worst-corner defects (temperature inversion)<br>- DVFS<br>- Voltage drop<br>- Voltage islands |

Temperature inversion: as Temp ↑, $\mu \downarrow \Rightarrow I_{DS} \downarrow \Rightarrow$ delay ↑.  
This is the normal dependence.

But as Temp ↑, $V_{TH} \downarrow \Rightarrow I_{DS} \uparrow \Rightarrow$ delay ↓ (REVERSE). At low supply voltages this second effect can dominate, so the worst-case delay may occur at low temperature.

So worst-corner plots (Shmoo plots) help identify the extreme dependences between different environment variables, such as $V_{CC}$ vs. T, delay vs. T, etc.

DVFS: Dynamic Voltage and Frequency Scaling, a technique to handle power consumption.  
$P = CV^2 f$

So to reduce power, $V \downarrow$ or $f \downarrow$.
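A quick numeric check of $P = CV^2 f$ shows why voltage scaling is the stronger lever. The capacitance, voltage, and frequency values below are illustrative assumptions, not figures from the notes.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hertz: float) -> float:
    """Dynamic switching power P = C * V^2 * f, in watts."""
    return c_farads * v_volts ** 2 * f_hertz


# Hypothetical operating points: nominal vs. a scaled-down DVFS state.
nominal = dynamic_power(1e-9, 1.0, 2.0e9)
scaled = dynamic_power(1e-9, 0.8, 1.5e9)

# Voltage enters quadratically, frequency only linearly.
print(f"nominal = {nominal:.3f} W, scaled = {scaled:.3f} W")
print(f"ratio   = {scaled / nominal:.2f}")
```

Dropping the voltage by 20% and the frequency by 25% cuts dynamic power to roughly half in this example, because the voltage term is squared.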

Voltage islands: non-uniform power consumption leads to non-uniform voltage drop.

Though transistor feature size has reduced, interconnect length has not scaled at the same pace, so interconnect delays dominate in the DSM era. Capacitive and inductive coupling between adjacent wires also becomes a problem.

### 2. CROSSTALK

As we lower the current drive of devices, the circuit becomes sensitive to noise-induced errors such as crosstalk. As complexity $\uparrow$, it becomes difficult to test each transistor and wire, making the chip more susceptible to errors. This is further aggravated by strong competitive pressure.

### a) TEST GENERATION:

As we progress towards complex chips with high transistor density, exhaustive testing is not possible, so structural testing is used. The effectiveness of structural testing is measured by fault coverage:

$$\text{Fault coverage} = \frac{\text{No. of detected faults}}{\text{Total no. of faults}} \times 100\%$$
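The two ideas above can be made concrete with a small sketch. `fault_coverage` implements the ratio as written; `exhaustive_patterns` assumes a purely combinational block with `n_inputs` binary inputs, for which exhaustive testing needs $2^n$ patterns. Both function names and the sample numbers are illustrative.

```python
def fault_coverage(detected: int, total: int) -> float:
    """Fault coverage in percent: detected faults / total faults * 100."""
    if total <= 0:
        raise ValueError("total number of faults must be positive")
    return 100.0 * detected / total


def exhaustive_patterns(n_inputs: int) -> int:
    """Patterns needed to exhaustively test a combinational block."""
    return 2 ** n_inputs


print(fault_coverage(950, 1000))   # coverage of a hypothetical test set
print(exhaustive_patterns(64))     # already far too many to apply
```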

i) Test patterns of sub-assemblies interfere with each other when both the main assembly and the sub-assembly are stimulated.

ii) Test pattern generation time rises exponentially with the number $N$ of binary inputs and with the number of on-chip I/Os.

With $N_t$ the number of transistors, $N_p$ the number of pins, and $d$ the feature size:

$$N_p = K \sqrt{N_t}, \qquad N_t \propto \frac{1}{d^2} \text{ (for a given chip area)}, \qquad N_p \propto \frac{1}{d}$$

So the number of transistors on a chip rises faster than the number of pins.

$$\therefore \text{test complexity} \propto \frac{N_t}{N_p}$$

In general, test complexity $\uparrow$ every 5 to 6 yrs.

→ Applying test vectors may cause excessive power dissipation, ∴ the test set must be adjusted for power reduction.

Also, as feature size $\downarrow$, $V_{th} \downarrow$, causing $I_D \uparrow$ (leakage), which affects $I_{DDQ}$ testing.

#### 3) Integration of analog & digital devices on a single chip:

This requires different testing techniques for the analog and the digital parts.  
Integration is used for high speed and reduced cost.  
Analog testing techniques: DSP-based testing for A/D converters, BIST,  
chip-by-chip testing, manufacturer report.

#### 4) Rising clk rates:

1997 $\rightarrow$ 2012 (15 yrs): 200 MHz $\rightarrow$ 1830 MHz (almost 10 times).

Fault simulation models are effective only at the operating speed, so tests need to be performed at the operating speed or higher.

This requires ATE to adapt to higher clock rates (GHz). The cost of this is almost \$3000 for every pin.

This not only adds a huge cost to test but additionally takes 2-3 yrs to build such a machine.

Also, at high speeds the chip is more susceptible to noise. This is primarily because inductive effects between wires become active at higher frequencies. This in turn causes:

- Ringing in signal transitions along the wiring.
- Interference due to capacitive & inductive coupling.
- Delay testing of paths becomes difficult.

## Test Economics

Testing is important for the design of reliable devices but comes at an additional cost, which may include:

- cost of equipment (ATE): initial & running costs
- cost of development of CAD tools
- test vector generation
- test programming

Yield: the fraction of acceptable parts out of the total fabricated parts.

$$Y = \frac{\text{No. of acceptable parts}}{\text{Total no. of fabricated parts}}, \qquad 0 < Y < 1$$

It may be further defined as wafer yield / process yield.

Yield loss is of two kinds:

- catastrophic (random defects)
- parametric (process variations)

Reject rate is another measure (expressed in ppm):

$$\text{Reject rate} = \frac{\text{No. of faulty parts passing}}{\text{Total no. of parts passing}}$$

Suppose testing gives a fault coverage of FC:

$$0 < FC < 1, \qquad FC = \frac{\text{detected faults}}{\text{total detectable faults}} = \frac{\text{detected faults}}{\text{total faults} - \text{undetectable faults}}$$

So the defect level (DL) can be computed, which gives the approximate fraction of shipped chips/parts that will be faulty:

$$DL = 1 - Y^{(1 - FC)} = \frac{DPM}{10^6}$$

where DPM is the number of defective parts per million.

$$0 < DL \leq 1 - Y$$

$$QL \;(\text{Quality Level}) = 1 - DL \quad (\text{a measure of good chips})$$
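As a worked example, the sketch below assumes the commonly used relation $DL = 1 - Y^{(1-FC)}$ between yield, fault coverage, and defect level. The yield and coverage values are illustrative, and the helper names are hypothetical.

```python
def defect_level(yield_fraction: float, fault_coverage: float) -> float:
    """Defect level DL = 1 - Y**(1 - FC), with Y and FC as fractions in [0, 1]."""
    return 1.0 - yield_fraction ** (1.0 - fault_coverage)


def defective_parts_per_million(dl: float) -> float:
    """Convert a defect level (fraction) to defective parts per million."""
    return dl * 1e6


for fc in (0.90, 0.99, 0.999):
    dl = defect_level(0.5, fc)  # 50% yield is an assumed example value
    print(f"FC = {fc:.3f}: DL = {dl:.6f}, "
          f"DPM = {defective_parts_per_million(dl):.0f}, QL = {1 - dl:.6f}")
```

With no testing at all (FC = 0) the defect level equals $1 - Y$, and with perfect coverage (FC = 1) it drops to zero, which matches the bound $0 < DL \leq 1 - Y$.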

Further, let $d$ be the defect density (average number of defects per unit area) and $\lambda$ the clustering parameter.

This is based on the fact that clustered faults are much more probable than isolated single faults.

So for a chip of area $A$, the average number of defects is $Ad$.

The probability that a random chip has $x$ defects is

$$p(x) = \text{Prob}(\text{no. of defects on chip} = x) = \frac{\Gamma(\lambda+x)}{x!\,\Gamma(\lambda)} \cdot \frac{(Ad/\lambda)^x}{(1+Ad/\lambda)^{\lambda+x}}$$

where $\Gamma$ is the gamma function:

$$\Gamma(x) = \int_0^\infty e^{-y} y^{x-1}\, dy$$

The mean and variance, which are determined experimentally, are

$$E(x) = Ad, \qquad \sigma^2(x) = Ad\,(1+Ad/\lambda)$$

Let $Y = p(0)$:

$$Y = (1+Ad/\lambda)^{-\lambda}, \qquad \text{and as } \lambda \rightarrow \infty, \; Y \rightarrow e^{-Ad}$$

Test acceptability: the defect level must stay below a specified ppm limit.
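The clustered-defect model can be checked numerically. The sketch below assumes the negative binomial form $p(x) = \frac{\Gamma(\lambda+x)}{x!\,\Gamma(\lambda)} \frac{(Ad/\lambda)^x}{(1+Ad/\lambda)^{\lambda+x}}$, so that $Y = p(0) = (1+Ad/\lambda)^{-\lambda}$. The area, defect density, and clustering values are illustrative only.

```python
import math


def defect_count_prob(x: int, area: float, d: float, lam: float) -> float:
    """p(x): probability that a chip of the given area has exactly x defects."""
    mean = area * d  # average number of defects, A*d
    # Gamma(lam + x) / (x! * Gamma(lam)), computed in log space for stability
    log_coeff = math.lgamma(lam + x) - math.lgamma(x + 1) - math.lgamma(lam)
    return math.exp(log_coeff) * (mean / lam) ** x / (1 + mean / lam) ** (lam + x)


def clustered_yield(area: float, d: float, lam: float) -> float:
    """Yield Y = p(0) = (1 + A*d/lam) ** (-lam)."""
    return (1 + area * d / lam) ** (-lam)


def poisson_yield(area: float, d: float) -> float:
    """Limit of the clustered yield for lam -> infinity: exp(-A*d)."""
    return math.exp(-area * d)


A, D = 1.0, 1.0  # assumed chip area and defect density (A*d = 1)
for lam in (0.5, 2.0, 1e6):
    print(f"lambda = {lam:g}: Y = {clustered_yield(A, D, lam):.4f}")
print(f"Poisson limit: Y = {poisson_yield(A, D):.4f}")
```

For the same average defect count, stronger clustering (smaller $\lambda$) gives a higher yield, because the defects pile up on fewer chips.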

$$\text{Cost of chip} = \frac{\text{Cost of fabricating and testing a wafer}}{\text{Yield} \times \text{No. of chip sites on wafer}}$$
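A short numeric sketch of the chip cost, assuming the wafer cost is spread over the good chips only, i.e. divided by (yield × number of chip sites). The wafer cost, yield, and site count are made-up example values.

```python
def chip_cost(wafer_cost: float, yield_fraction: float, chip_sites: int) -> float:
    """Cost per good chip = wafer cost / (yield * number of chip sites)."""
    good_chips = yield_fraction * chip_sites
    if good_chips <= 0:
        raise ValueError("yield and chip_sites must give at least some good chips")
    return wafer_cost / good_chips


# Hypothetical wafer: 5000 currency units to fabricate and test, 500 sites.
for y in (0.9, 0.8, 0.5):
    print(f"yield = {y:.1f}: cost per good chip = {chip_cost(5000.0, y, 500):.2f}")
```

Lower yield raises the cost of every good chip, since the bad chips still have to be paid for.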

- Diagnosis & Repair:

- Test coverage = $\frac{\text{No. of detected faults}}{\text{No. of testable faults}}$

## Testing Techniques

- ATE
- ATPG: D-algorithm / PODEM
- Fault simulation (speed-up)
- Parametric testing: digital / analog / mixed
- DFT
  - Scan
  - BIST
  - Additional DFT
- Board testing (boundary scan)

We know there are different types of defects, random or systematic in nature, caused by process variations and ST & DI issues.

Testing problem

Given a set of faults in the CUT, how do we obtain test patterns that give high fault coverage? To answer this, a test process is identified:

1. Fault modeling

2. Test pattern generation

3. Fault simulation (measure of F.C.)

4. ATE / BIST

\* Why fault models (F.M.)?

Fault models are necessary for generating and evaluating a set of test vectors, i.e. for fault simulation as well as test generation.

VLSI testing can be classified into four types depending upon the specific purpose it accomplishes:

#### **Characterization**

Also known as *design debug* or *verification testing*, this form of testing is performed on a new design before it is sent to production. The purpose is to verify that the design is correct and the device will meet all specifications. Functional tests are run and comprehensive AC and DC measurements are made. Probing of internal nodes of the chip, commonly not done in production testing, may also be required during characterization. Use of specialized tools such as *scanning electron microscopes* (SEM) and electron beam testers, and techniques such as *artificial intelligence* (AI) and *expert systems*, can be effective. A *characterization test* determines the exact limits of device operating values. We generally test for the worst case, because it is easier to evaluate than average cases and devices passing this test will work for any other conditions. We do this by selecting a test that results in a chip pass/fail decision. We then select a statistically significant sample of devices, repeat the test for every combination of two or more environmental variables, and plot the results. This essentially means repetitively applying functional tests and measuring various DC or AC parameters as we vary different variables such as the supply voltage. This data is plotted as a *Shmoo plot*.
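The procedure of repeating a pass/fail test over every combination of two environmental variables can be sketched as a text Shmoo plot. The pass/fail model in `device_passes` is entirely hypothetical (a made-up delay that shrinks as the supply voltage rises); on real hardware this function would be replaced by an actual tester measurement.

```python
def device_passes(vdd: float, period_ns: float) -> bool:
    """Hypothetical pass/fail model: the chip passes if the clock period
    is at least as long as an assumed critical-path delay."""
    delay_ns = 2.0 / (vdd - 0.4)  # assumed delay model, not measured data
    return period_ns >= delay_ns


def shmoo(voltages, periods_ns):
    """Return one text row per voltage: '*' = pass, '.' = fail."""
    rows = []
    for v in voltages:
        rows.append("".join("*" if device_passes(v, p) else "." for p in periods_ns))
    return rows


voltages = [1.4, 1.2, 1.0, 0.8]
periods = [1.5, 2.2, 3.0, 4.0, 6.0]

print("Vdd   clock period (ns): " + " ".join(str(p) for p in periods))
for v, row in zip(voltages, shmoo(voltages, periods)):
    print(f"{v:.1f}   {row}")
```

The boundary between the passing and failing regions is what a characterization engineer reads off the plot as the operating limits of the device.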


#### **DC Parameter tests**

- Measure steady-state electrical characteristics
- For example, threshold test
  - $0 < V_{OL} < V_{IL}$
  - $V_{IH} < V_{OH} < V_{CC}$
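The threshold test above amounts to checking two inequalities on the measured output levels. A minimal sketch follows; the voltage values in the example are illustrative (typical of a 5 V TTL-style specification) and are not taken from the notes.

```python
def passes_threshold_test(v_ol: float, v_oh: float,
                          v_il: float, v_ih: float, v_cc: float) -> bool:
    """Check 0 < V_OL < V_IL and V_IH < V_OH < V_CC."""
    low_ok = 0 < v_ol < v_il       # output low must read as a valid input low
    high_ok = v_ih < v_oh < v_cc   # output high must read as a valid input high
    return low_ok and high_ok


# Assumed example limits: V_IL = 0.8 V, V_IH = 2.0 V, V_CC = 5.0 V
print(passes_threshold_test(v_ol=0.2, v_oh=4.5, v_il=0.8, v_ih=2.0, v_cc=5.0))
print(passes_threshold_test(v_ol=1.0, v_oh=4.5, v_il=0.8, v_ih=2.0, v_cc=5.0))
```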

#### **AC parametric tests**

- Measure transient electrical characteristics
- For example:
  - Rise time & fall time tests

### **PRODUCTION TESTING:**

Every fabricated chip is subjected to production tests, which are less comprehensive than characterization tests yet they must enforce the quality requirements by determining whether the device meets specifications. The vectors may not cover all possible functions and data patterns but must have a high coverage of modeled faults. The main driver is cost, since every device must be tested. Test time (and therefore cost) must be absolutely minimized. Fault diagnosis is not attempted and only a *go/no-go* decision is made. Production tests are typically short but verify all relevant specifications of the device. It is an outgoing inspection test of each device, and is not repetitive. We test whether some *device-under-test* (DUT) parameters are consistent with the device specifications under normal operating conditions. We test either at the speed required by the application of the device or at the speed guaranteed by the supplier.

### **BURN IN TESTING**

All devices that pass production tests are not identical. When put to actual use, some will fail very quickly while others will function for a long time. *Burn-in* ensures reliability of tested devices by testing, either continuously or periodically, over a long period of time, and by causing the bad devices to actually fail. Correlation studies show that the occurrence of potential failures can be accelerated at elevated temperatures.

Briefly, two types of failures are isolated by burn-in: *Infant mortality failures*, often caused by a combination of sensitive design and process variation, may be screened out by a short-term burn-in (10-30 hours) in a normal or slightly accelerated working environment. *Freak failures*, i.e., the devices having the same failure mechanisms as the reliable devices, require long burn-in time (100-1,000 hours) in an accelerated environment. During *burn-in*, we subject the chips to a combination of production tests, high temperature, and over-voltage power supply. This helps eliminate the weak devices which fail to perform under stress.



- Stage 1: Infant Mortality/Early Life – This is the period where early failures show up in a component. These are due to lack of control in manufacturing processes at the molecular level. During this period components fail at a high rate but this rate decreases with time. (Curve in blue shows failure rate due to early fails)
- Stage 2: Normal/Useful Life – This is the period where rate of failure is nearly constant, and due to randomly occurring faults. (shown with green curve)
- Stage 3: Wear Out/End of Life – Period marked by increase in failure rate due to aging of components; this period marks the end of the useful life span of a device. These fails are due to critical paths in the device getting worn out. (Curve in red shows failure rate due to ageing).

### **Advantages:**

- Delivered product has higher reliability. Fewer customer returns.
- Ability to estimate the product's useful life period.

### **Disadvantages:**

- Higher cost (Burn-in boards degrade over time and must be replaced).
- Mechanical and EOS/ESD damage to parts.
- Non-uniform distribution of stress on device (Inability to put 100% of the device under stress).
- Efficiency of Burn-in test impacted by voltage scaling and power consumption.

### **Faults detected**

Burn-in testing detects faults that are generally due to imperfections in manufacturing and packaging processes, which are becoming more common with the increasing circuit complexity and aggressive technology scaling. Traditional stuck-at testing does not detect these types of faults because they may be dormant and need to be stressed to manifest as "fails" (during burn-in).

The root cause of fails detected during burn-in testing can be identified as dielectric failures, conductor failures, metallization failures, electromigration, mouse-bites, etc. These faults are dormant and randomly manifest into device failures during device life-cycle. With burn-in testing, we stress the device, accelerating these dormant faults to manifest as failures.

## **INCOMING INSPECTION**

System manufacturers perform incoming inspection on the purchased devices before integrating them into the system. Depending upon the context, this testing can be either similar to production testing, or more comprehensive than production testing, or even tuned to the specific systems application. Also, the incoming inspection may be done for a random sample with the sample size depending on the device quality and the system requirement. The most important purpose of this testing is to avoid placing a defective device in a system assembly where the cost of diagnosis may far exceed the cost of incoming inspection.

### Types of Tests

Actual test selection depends upon the manufacturing level (processing, wafer, or package) being tested. Although some testing is done during device fabrication to assess the integrity of the process itself, most device testing is performed after the wafers have been fabricated. The first test, known as *wafer sort or probe*, differentiates potentially good devices from defective ones.

After this, the wafer is scribed and cut, and the potentially good devices are packaged. Also, during wafer sort, a *test site characterization* is performed. Specially designed tests are applied to certain test sites containing specific test patterns. These are designed to characterize the processing technology through measurement of parameters such as gate threshold, polysilicon field threshold, bypass, metal field threshold, poly and metal sheet resistances, contact resistance, etc.

In general, each chip is subjected to two types of tests:

(1) *Parametric Tests*. DC parametric tests include shorts test, opens test, maximum current test, leakage test, output drive current test, and threshold levels test. AC parametric tests include propagation delay test, setup and hold test, functional speed test, access time test, refresh and pause time test, and rise and fall time test. These tests are usually technology-dependent. CMOS voltage output measurements are done with no load while TTL devices require current load.

(2) *Functional Tests*. These consist of the input vectors and the corresponding responses. They check for proper operation of a verified design by testing the internal chip nodes. Functional tests cover a very high percentage of modeled (e.g., stuck type) faults in logic circuits and their generation is the main topic of this tutorial. Often, functional vectors are understood as verification vectors, which are used to verify whether the hardware actually matches its specification. However, in the ATE world, any vectors applied are understood to be functional fault coverage vectors applied during manufacturing test. These two types of functional tests may or may not be the same.

Functional tests may be applied at an elevated temperature to guarantee specifications.

For example, testing may be done at 85 °C to guarantee 70 °C operation. This is called *guardbanding*. Another application is in *speed binning* to grade the chips according to performance. This may be done by applying the tests at several voltages and at varying timing conditions (e.g., clock frequency). The scenario described above represents a generality. The actual test plan for a VLSI device varies depending upon the specific application it is intended for, the manufacturer's test philosophy, available test equipment, and test economics.
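Speed binning can be sketched as assigning each chip to the highest frequency grade at which it still passes. The bin frequencies and the helper name `speed_bin` are hypothetical, and a real flow would also fold in the voltage and temperature guardbands described above.

```python
from typing import Optional, Sequence


def speed_bin(max_passing_mhz: float, bin_edges_mhz: Sequence[float]) -> Optional[float]:
    """Return the highest bin frequency the chip can be sold at, or None
    if it fails even the slowest bin.

    `max_passing_mhz` is the highest clock frequency at which the chip
    passed its functional tests (assumed to be measured elsewhere).
    """
    for edge in sorted(bin_edges_mhz, reverse=True):
        if max_passing_mhz >= edge:
            return edge
    return None


BINS_MHZ = [2000, 2500, 3000]  # assumed product grades

for measured in (3200, 2750, 1500):
    print(f"chip passing up to {measured} MHz -> bin {speed_bin(measured, BINS_MHZ)}")
```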

While the general testing methodology is applicable to memory chips as well, there are some notable differences. Memory tests are *functional*; no stuck-type fault coverage is evaluated for these tests, which are designed to check functional attributes such as address uniqueness, address decoder speed, cell coupling, column and row coupling, data sensitivity, write recovery, and refresh. Elaborate testing may require long vector sequences. To increase throughput, memory testers sometimes have parallel testing capability. Specialized testers may even be able to repair redundant cells used in large memories (256 M bits) to enhance yield.

VLSI chip testing, in many ways, resembles the testing of other digital equipment such as circuit boards [509], but with differences. Circuit boards consist of previously-tested components. A primary objective of board testing is to check the printed wiring and the contacts between wires and components. It is possible to perform a bare-board testing of interconnections before the components are inserted. After the components (such as chips) are in place, *in-circuit testing* [69] (through a *bed-of-nails* fixture) is often used to verify the performance of individual components, although the bed-of-nails fixture is becoming obsolete. Finally, functional testing determines whether or not individual components, possibly designed with different

Further, testing can be ONLINE or OFFLINE.

Online testing (OLT)

- generally monitoring based
- concurrent with system operation
- ∴ at the speed of operation
- embedded ATG

Offline testing

- when the device is out of operation
- generally done once
- slower speed helps save power
- uses external (off-chip) equipment

BIST

- helpful for unmodeled faults

ATE

- e.g. manufacturing test
- good for modeled faults