

# Low-Power Programmable PRPG With Test Compression Capabilities

Michał Filipek, Grzegorz Mrugalski, *Senior Member, IEEE*, Nilanjan Mukherjee, *Senior Member, IEEE*,  
 Benoit Nadeau-Dostie, *Senior Member, IEEE*, Janusz Rajski, *Fellow, IEEE*, Jędrzej Solecki,  
 and Jerzy Tyszer, *Fellow, IEEE*

**Abstract**—This paper describes a low-power (LP) programmable generator capable of producing pseudorandom test patterns with desired toggling levels and enhanced fault coverage gradient compared with the best-to-date built-in self-test (BIST)-based pseudorandom test pattern generators. It is comprised of a linear finite state machine (a linear feedback shift register or a ring generator) driving an appropriate phase shifter, and it comes with a number of features allowing this device to produce binary sequences with preselected toggling (PRESTO) activity. We introduce a method to automatically select several controls of the generator offering easy and precise tuning. The same technique is subsequently employed to deterministically guide the generator toward test sequences with improved fault-coverage-to-pattern-count ratios. Furthermore, this paper proposes an LP test compression method that allows shaping the test power envelope in a fully predictable, accurate, and flexible fashion by adapting the PRESTO-based logic BIST (LBIST) infrastructure. The proposed hybrid scheme efficiently combines test compression with LBIST, where both techniques can work synergistically to deliver high quality tests. Experimental results obtained for industrial designs illustrate the feasibility of the proposed test schemes and are reported herein.

**Index Terms**—Built-in self-test (BIST), low-power (LP) test, pseudorandom test pattern generators (PRPGs), test data volume compression.

## I. INTRODUCTION

Although the primary objective of manufacturing test will remain essentially the same over the next years (to ensure reliable and high-quality semiconductor products), conditions, and consequently test solutions, may undergo a significant evolution. The semiconductor technology, design characteristics, and the design process are among the key factors that will impact this evolution. With new types of defects that one will have to consider to provide the desired test quality for the next technology nodes such as 3-D, it is appropriate to pose the question of what matching design-for-test (DFT) methods will need to be deployed. Test compression, introduced a decade ago, has quickly become the mainstream DFT methodology. However, it is unclear whether test compression will be capable of coping with the rapid rate of technological changes over the next decade. Interestingly, logic built-in self-test (LBIST), originally developed for board, system, and in-field test, is now gaining acceptance for production test as it provides very robust DFT and is used increasingly often with test compression. This hybrid approach seems to be the next logical evolutionary step in DFT. It has potential for improved test quality; it may augment the abilities to run at-speed power aware tests, and it can reduce the cost of manufacturing test while preserving all LBIST and scan compression advantages.

Manuscript received November 14, 2013; revised May 5, 2014; accepted June 9, 2014. Date of publication July 14, 2014; date of current version May 20, 2015.

M. Filipek, J. Solecki, and J. Tyszer are with the Faculty of Electronics and Telecommunications, Poznań University of Technology, Poznań 60-965, Poland (e-mail: michał.piotr.filipek@gmail.com; jędrzej.solecki@gmail.com; jerzy.tyszer@put.poznan.pl).

G. Mrugalski, N. Mukherjee, B. Nadeau-Dostie, and J. Rajski are with Mentor Graphics Corporation, Wilsonville, OR 97070 USA (e-mail: grzegorz\_mrugalski@mentor.com; nilanjan\_mukherjee@mentor.com; benoit\_nadeau-dostie@mentor.com; janusz\_rajski@mentor.com).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/TVLSI.2014.2332465

Attempts to overcome the bottleneck of test data bandwidth between the tester and the chip have made the concept of combining LBIST and test data compression a vital research and development area. In particular, several hybrid BIST schemes store deterministic top-up patterns (used to detect random pattern resistant faults) on the tester in a compressed form, and then use the existing BIST hardware to decompress these test patterns [6], [7], [20]–[22], [27], [30], [51]. Some solutions embed deterministic stimuli by using compressed weights or by perturbing pseudorandom vectors in various fashions [16], [17], [29], [31], [46], [54], [55]. If BIST logic is used to deliver compressed test data, then underlying encoding schemes typically take advantage of low fill rates, as originally proposed in LFSR coding [24], which subsequently evolved first into static LFSR reseeding [10], [15], [18], [26], [50], [52], and then into dynamic LFSR reseeding [2], [39]. Thorough surveys of relevant test compression techniques can be found, for example, in [23] and [44].

As with conventional scan-based test, hybrid schemes may consume much more power than a circuit under test (CUT) was designed to function under, owing to the high data activity associated with scan-based test operations. Because such operation overstresses devices beyond the mission mode, reducing the operating power of ICs in a test mode has been of concern for years. Full-toggle scan patterns may draw several times the typical functional mode power, and this trend continues to grow, particularly with respect to the mission mode's peak power. This power-induced over-test may result in thermal issues, voltage noise, power droop, or excessive peak power over multiple cycles which, in turn, cause a yield loss due to instant device damage, a severe decrease in chip reliability, a shorter product lifetime, or a device malfunction because of timing failures following a significant circuit delay increase, for example. Abnormal switching activity may also cause fully functional chips to fail during testing because of phenomena such as IR-drop, crosstalk, or $di/dt$ problems.

Numerous schemes for power reduction during scan testing have been devised [14]. Among them, there are solutions specifically proposed for BIST to keep the average and peak power below a given threshold. For example, the test power can be reduced by preventing transitions at memory elements from propagating to combinational logic during scan shift. This is achieved by inserting gating logic between scan cell outputs and logic they drive [9], [19]. During normal operations and capture, this logic remains transparent. Gated scan cells are also proposed in [3] and [56]. A synergistic test power reduction method of [57] uses available on-chip clock gating circuitry to selectively block scan chains while employing test scheduling and planning to further decrease BIST power in the Cell processor. A test vector inhibiting scheme of [11] masks test patterns generated by an LFSR as not all produced vectors, often very lengthy, detect faults. Elimination of such tests can reduce switching activity with no impact on fault coverage.

The advent of low-transition test pattern generators has added a new dimension to power aware BIST solutions [5], [32], [42]. For example, a device presented in [49] employs an LFSR to feed scan chains through biasing logic and T-type flip-flop. Since this flip-flop holds the previous value until its input is asserted, the same value is repeatedly scanned into scan chains until the value at the output of biasing logic (e.g., a  $k$ -input AND gate) becomes 1. Depending on  $k$ , one can significantly reduce the number of transitions occurring at the scan chain inputs. A dual-speed LFSR of [48] consists of two LFSRs driven by normal and slow clocks, respectively. The switching activity is reduced at the circuit inputs connected to the slow-speed LFSR, while the whole scheme still ensures satisfactory fault coverage. Mask patterns mitigate the switching activity in LFSR-produced patterns as shown in [41], whereas a bit swapping of [1] achieves the same goal at the primary inputs of CUT. A gated LFSR clock of [12] allows activating only half of LFSR stages at a time. It cuts power consumption as only half of the circuit inputs change every cycle. Combining the low transition generator of [49] (handling easy-to-detect faults) with a 3-weight pseudorandom test pattern generator (PRPG) (detecting random pattern resistant faults) can also reduce BIST switching activity, as demonstrated in [47]. The schemes of [25], [36], and [43] suppress transitions in LFSR-generated sequences by either statistical monitoring or injecting intermediate and highly correlated patterns. Finally, a random single-input change generator can produce low power patterns in a parallel BIST environment, as shown in [13].

As the BIST power consumption can easily exceed the maximum ratings when testing at speed, scan patterns must be shifted at a programmable low speed, and only the last few cycles and the capture cycle are applied at the maximum frequency. In the burst-mode approach presented in [35], typically five consecutive clock cycles are used. The first four cycles serve shifting purposes, whereas the last one is designated for capture. The objective is to stabilize the power supply before the last shift and capture pulses are applied, which are critical for at-speed tests. To reduce the voltage droop related to a higher circuit activity, a burst clock controller slows down some of the shift cycles. It allows a gradual increase of the circuit activity, thereby reducing the $di/dt$ effect. The controller can gate the shift clocks, depending on the need for a gradual warm-up of the circuit. Low power (LP) test compression schemes [28], [33], [41], [53] in turn adapt LFSR reseeding to reduce scan-in transitions: because of the low fill rates, identical test data can be delivered to scan chains for a number of shift cycles directly from the decompressor, thereby reducing the number of transitions.

In this paper, we propose a PRPG for LP BIST applications. The generator primarily aims at reducing the switching activity during scan loading due to its preselected toggling (PRESTO) levels. It can assume a variety of configurations that allow a given scan chain to be driven either by a PRPG itself or by a constant value fixed for a given period of time. Not only does the PRESTO generator allow loading scan chains with patterns having low transition counts, and thus significantly reduced power dissipation, but it also enables fully automated selection of its controls such that the resultant test patterns feature desired, user-defined toggling rates. We will demonstrate that this flexible programming can be further used to produce tests superior to conventional pseudorandom vectors with respect to the resultant fault-coverage-to-test-pattern-count ratio. This paper culminates in showing that the PRESTO generator can also successfully act as a test data decompressor, thus allowing one to implement a hybrid test methodology that combines LBIST and ATPG-based embedded test compression. This is the first LP test compression scheme that is integrated in every way with the BIST environment and lets designers shape the power envelope in a fully predictable, accurate, and flexible fashion. As a result, it creates an environment that can be used to arrive at an efficient hybrid solution combining advantages of scan compression and logic BIST. In addition, both techniques can complement each other to address, for example, a voltage drop caused by a high switching activity during scan testing, constraints of at-speed ATPG-produced test patterns, or new fault models.

This paper is organized as follows. Section II introduces the basic operational principles of the PRESTO generator, while Section III presents all architectural details of its structure with a brief discussion of the generator's abilities to produce patterns with various toggling rates. Section IV demonstrates how the PRESTO generator can be programmed in order to yield pseudorandom test patterns of desired switching activity. Experiments validating this technique are discussed in Section V. In addition, a method to achieve higher BIST fault coverage with shorter test application time by deploying the native PRESTO features is described in Section VI. A PRESTO-based LP test data decompressor is introduced in Section VII, which is followed by the presentation of the corresponding test data-encoding algorithm in Section VIII. Comprehensive experimental results related to both the performance of the LP PRPG and the power-aware test data compression are reported in Section IX, and Section X concludes this paper.

Fig. 1. Basic architecture of a PRESTO generator.

## II. BASIC ARCHITECTURE

Fig. 1 shows the basic structure of a PRESTO generator. An  $n$ -bit PRPG connected with a phase shifter feeding scan chains forms a kernel of the generator producing the actual pseudorandom test patterns. A linear feedback shift register or a ring generator can implement a PRPG. More importantly, however,  $n$  hold latches are placed between the PRPG and the phase shifter. Each hold latch is individually controlled via a corresponding stage of an  $n$ -bit toggle control register. As long as its enable input is asserted, the given latch is transparent for data going from the PRPG to the phase shifter, and it is said to be in the toggle mode. When the latch is disabled, it captures and saves, for a number of clock cycles, the corresponding bit of PRPG, thus feeding the phase shifter (and possibly some scan chains) with a constant value. It is now in the hold mode. It is worth noting that each phase shifter output is obtained by XOR-ing outputs of three different hold latches. Therefore, every scan chain remains in a low-power mode provided only disabled hold latches drive the corresponding phase shifter output [40].

As mentioned previously, the toggle control register supervises the hold latches. Its content comprises 0s and 1s, where 1s indicate latches in the toggle mode, thus transparent for data arriving from the PRPG. The fraction of 1s determines the scan switching activity. The control register is reloaded once per pattern with the content of an additional shift register. The enable signals injected into the shift register are produced in a probabilistic fashion by using the original PRPG with a programmable set of weights. The weights are determined by four AND gates producing 1s with the probability of 0.5, 0.25, 0.125, and 0.0625, respectively. The OR gate allows choosing probabilities beyond simple powers of 2. A 4-bit register *Switching* is employed to activate AND gates, and allows selecting a user-defined level of switching activity. For example, the switching code 0100 will set to 1, on average, 25% of the control register stages, and thus 25% of hold latches will be enabled. Given the phase shifter structure, one can then assess the number of scan chains receiving constant values, and thus the expected toggling ratio.

An additional 4-input NOR gate detects the switching code 0000, which is used to switch the LP functionality off. It is worth noting that when working in the weighted random mode, the switching level selector ensures statistically stable content of the control register in terms of the amount of 1s it carries. As a result, roughly the same fraction of scan chains will stay in the LP mode, though a set of actual low toggling chains will keep changing from one test pattern to another. It will correspond to a certain level of toggling in the scan chains. With only 15 different switching codes, however, the available toggling granularity may render this solution too coarse to be always acceptable. Section III presents additional features that make the PRESTO generator fully operational in a wide range of desired switching rates.
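As a quick check of the weighting logic, the probability of injecting a 1 for a given switching code can be computed in a few lines. This is only a sketch under two assumptions that are not stated explicitly in the text: the outputs of the four AND gates are treated as statistically independent, and the first written digit of the code is taken to enable the 0.5 gate (which agrees with the 0100 to 25% example). The name `switching_probability` is hypothetical.

```python
from math import prod

# Probabilities of a 1 at the outputs of the four AND gates, in the order
# assumed here for the digits of the switching code.
GATE_PROBS = (0.5, 0.25, 0.125, 0.0625)


def switching_probability(code: str) -> float:
    """Probability that the OR gate outputs a 1 for a 4-digit switching code.

    Assumption: enabled AND gates behave as independent random sources.
    Code "0000" yields 0.0 here; in the generator it switches LP mode off.
    """
    if len(code) != 4 or set(code) - {"0", "1"}:
        raise ValueError("switching code must consist of four binary digits")
    enabled = [p for digit, p in zip(code, GATE_PROBS) if digit == "1"]
    # The OR gate outputs 0 only when every enabled AND gate outputs 0.
    return 1.0 - prod(1.0 - p for p in enabled)


if __name__ == "__main__":
    for code in ("1000", "0100", "1100", "1111"):
        print(code, switching_probability(code))
```

Under the independence assumption, combining the 0.5 and 0.25 gates gives 1 - 0.5 × 0.75 = 0.625, which illustrates how the OR gate reaches probabilities that are not powers of 2.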

## III. FULLY OPERATIONAL GENERATOR

Much higher flexibility in forming low-toggling test patterns can be achieved by deploying a scheme presented in Fig. 2. Essentially, while preserving the operational principles of the basic solution, this approach splits up a shifting period of every test pattern into a sequence of alternating hold and toggle intervals. To move the generator back and forth between these two states, we use a T-type flip-flop that switches whenever there is a 1 on its data input. If the flip-flop is set to 0, the generator enters the hold period with all latches temporarily disabled regardless of the control register content. This is accomplished by placing AND gates on the control register outputs to allow freezing of all phase shifter inputs. This property can be crucial in SoC designs where only a single scan chain crosses a given core, and its abnormal toggling may cause locally unacceptable heat dissipation that can only be reduced by means of temporary hold periods. If the T flip-flop is set to 1 (the toggle period), then the latches enabled through the control register can pass test data moving from the PRPG to the scan chains.

Two additional parameters kept in 4-bit Hold and Toggle registers determine how long the entire generator remains either in the hold mode or in the toggle mode, respectively. To terminate either mode, a 1 must occur on the T flip-flop input. This weighted pseudorandom signal is produced in a manner similar to that of the weighted logic used to feed the shift register. The T flip-flop also controls four 2-input multiplexers routing data from the Toggle and Hold registers. This allows selecting a source of control data that will be used in the next cycle to possibly change the operational mode of the generator. For example, when in the toggle mode, the input multiplexers observe the Toggle register. Once the weighted logic outputs 1, the flip-flop toggles, and as a result all hold latches freeze in the last recorded state. They will remain in this state until another 1 occurs on the weighted logic output. The random occurrence of this event is now related to the content of the Hold register, which determines when to terminate the hold mode.
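The alternation between the two modes can be illustrated with a small behavioral model. It is a sketch rather than a description of the actual circuit: the weighted logic is modeled as independent per-cycle random draws, the generator is assumed to start in the toggle mode, and the function and parameter names are hypothetical.

```python
import random


def hold_toggle_profile(length, p_toggle, p_hold, seed=0):
    """Return one flag per shift cycle: 'T' (toggle mode) or 'H' (hold mode).

    p_toggle: per-cycle probability of a 1 on the T flip-flop input while
              the generator is in the toggle mode (set via the Toggle code).
    p_hold:   the same probability while in the hold mode (Hold code).
    """
    rng = random.Random(seed)
    toggling = True  # assumed initial state of the T flip-flop
    profile = []
    for _ in range(length):
        profile.append("T" if toggling else "H")
        # The multiplexers select which register drives the weighted logic.
        p = p_toggle if toggling else p_hold
        if rng.random() < p:
            toggling = not toggling  # a 1 on the T input flips the mode
    return profile


if __name__ == "__main__":
    print("".join(hold_toggle_profile(60, p_toggle=0.25, p_hold=0.125)))
```

In this model each interval length follows a geometric distribution, so a mode terminated with per-cycle probability p lasts 1/p cycles on average.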

A scan switching profile when deploying the PRESTO generator in a hypothetical environment with 15 scan chains is shown in Fig. 3 for two test patterns. Blue (0s) and red (1s) stripes make up the low power-toggling pattern, while gray areas correspond to periods of toggling. All-blue and all-red scan chains are fed by the constant values only. Note that their quantity does not change between patterns though they are not exactly the same in each case. As can be seen, test patterns are divided into hold and toggle intervals of random length, while LP scan chains remain still for the entire duration of a single test pattern.

Fig. 2. Fully operational version of PRESTO.

Fig. 3. Switching activity in scan chains.

When using the PRESTO generator with an existing DFT flow, all LP registers are loaded either once per test or once per test pattern. The registers loaded only once act as test data registers or are parts of an IJTAG network, and are initialized by the test setup procedure. They are triggered using a slow scan shift clock and operate at a very low speed, thereby imposing no timing constraints. Although the remaining registers are loaded once per test pattern (also at the scan shift speed), timing is not compromised because of the shallow logic generating bits to be loaded serially into the registers. With the help of shadow registers, values remain unchanged during capture.

Clearly, it suits LBIST applications, where the shift speeds are quite high. The LP registers are also added during embedded deterministic test (EDT) IP generation and insertion [39]. The associated logic is integrated into the design along with the EDT logic. Since the EDT logic (including LP) is only added in the scan paths, there is no impact on the functional mode of operation.

## IV. AUTOMATIC SELECTION OF CONTROLS

As shown in the previous sections, performance of the PRESTO generator depends primarily on the following three factors (note that in the BIST mode they are delivered only once, at the very beginning of the entire test session):

- 1) the switching code (kept in the switching register);
- 2) the hold duty cycle (HC);
- 3) the toggle duty cycle (TC).

Given the size of PRPG, the number of scan chains and the corresponding phase shifter, the switching code as well as HC and TC values can be selected automatically in such a way that the entire generator will produce pseudorandom test patterns having a desired level of toggling $T$, provided the scan chains are balanced. The procedure of selecting these parameters consists of the following steps.

- 1) For each switching code  $k$ ,  $k = 1, \dots, 15$ , determine the corresponding probability  $p_k$  of injecting a 1 into the shift register. These values are as follows:  $p_1 = 0.5$ ,  $p_2 = 0.25$ ,  $p_3 = 0.625$ ,  $p_4 = 0.125$ ,  $p_5 = 0.5625$ ,  $p_6 = 0.34375$ ,  $p_7 = 0.671875$ ,  $p_8 = 0.0625$ ,  $p_9 = 0.53125$ ,  $p_{10} = 0.296875$ ,  $p_{11} = 0.6484375$ ,  $p_{12} = 0.1796875$ ,  $p_{13} = 0.58984375$ ,  $p_{14} = 0.38476563$ , and  $p_{15} = 0.69238281$ .
- 2) As can be seen in Fig. 2, the values  $p_k$  obtained in step 1 determine as well the probability of asserting the T flip-flop input for each hold (toggle) code  $k$ , and then the corresponding duration  $h_k$  ( $t_k$ ) of the hold (toggle) duty cycle. Clearly,  $h_k = t_k = 1/p_k$ .
- 3) Given the size  $n$  of PRPG, determine, for each switching code  $k$ , the average number  $n_k$  of 1s occurring in the control register. As can be easily verified,  $n_k = p_k \times n$ .
- 4) For each value of  $n_k$  (the number of enabled hold latches), find the average number  $a_k$  of active scan chains, i.e., scan chains that are not in the LP mode. This number is determined by the phase shifter architecture, and it also depends on the actual locations of 1s in the control register. Therefore, 1000  $n$ -bit random combinations having exactly  $n_k$  1s are generated to obtain the number of active scan chains in each case, and finally the number  $a_k$  of active scan chains is averaged over all 1000 samples.
- 5) Given a desired level of toggling  $T$  (%), one can determine the resultant (hypothetical) number  $A$  of active scan chains from the following equation:

$$A = (T \times S)/50 \quad (1)$$

where  $S$  is the total number of scan chains. The above proportion assumes that if all  $S$  scan chains are active, then the resultant toggling is about 50%.

- 6) For each switching code  $k$ , and thus the resulting number  $a_k$  of active scan chains, determine how many additional scan chains should be disabled. In each case, this quantity is given by  $d_k = a_k - A$ . If  $d_k \leq 0$ , then disregard the next steps, as the switching code  $k$  does not guarantee even the smallest (required) number of active scan chains.
- 7) Since disabling extra scan chains cannot be implemented through the control register, this action is carried out by equivalently disabling, with the help of HCs, selected cells belonging to active scan chains. The value of $d_k$ is therefore converted into the number of corresponding cells in active scan chains (see gray areas in Fig. 3). Let $L$ be the scan chain length. Then, we have

$$d_k \times L = (a_k - A) \times L = a_k \times h_k \times v \quad (2)$$

$$(h_k + t_k)v = L \quad (3)$$

where $v$ accounts for the number of hold (toggle) duty cycles. Substituting $L$ from (3) into (2) and canceling $v$ yields $a_k \times t_k = A \times (h_k + t_k)$, and hence

$$r = h_k/t_k = (a_k/A) - 1. \quad (4)$$

Fig. 4. Toggling (WTM) for five designs and 33-bit PRPG.

- 8) Ratio  $r$  is now evaluated for each value of  $h_k$  and  $t_k$  (in total  $15 \times 15 = 225$  combinations) to find the best matching between the actual value of  $r$  and the theoretical value of the expression  $(a_k/A) - 1$ .
- 9) Values of switching, hold, and toggle codes that yield ratio  $r$  with the smallest deviation from the theoretical value are selected as the PRESTO setup parameters.

Results presented in Section V demonstrate that despite certain simplifications [in particular, we assume that HCs and TCs are of the same constant size; furthermore, certain cell locations, typically used in computing weighted transition metrics (WTMs), are not taken into account when looking for active scan cells to be disabled], the above procedure yields controls that allow producing pseudorandom patterns with switching activities tracking very closely the desired levels of toggling.
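The selection procedure of this section can be sketched in code as follows. The sketch makes several assumptions that go beyond the text: the phase shifter is described by a hypothetical list `taps` giving, for each scan chain, the hold latches feeding its XOR output; the AND gates of the weighted logic are treated as independent; $n_k$ is rounded to the nearest integer; and ties between equally good settings are resolved in favor of the first one found. All function names are illustrative.

```python
import random
from math import prod

GATE_PROBS = (0.5, 0.25, 0.125, 0.0625)


def code_probability(k):
    """Step 1: p_k for code k = 1..15 (bit i of k enables gate i)."""
    return 1.0 - prod(1.0 - p for i, p in enumerate(GATE_PROBS) if (k >> i) & 1)


def average_active_chains(taps, n, ones, samples=1000, rng=None):
    """Step 4: average number of chains fed by at least one enabled latch."""
    if rng is None:
        rng = random.Random(0)
    total = 0
    for _ in range(samples):
        enabled = set(rng.sample(range(n), ones))  # exactly `ones` 1s
        total += sum(1 for chain in taps if enabled.intersection(chain))
    return total / samples


def select_controls(taps, n, toggling_percent, samples=1000, seed=0):
    """Steps 1-9: return (switching, hold, toggle, deviation)."""
    if toggling_percent <= 0:
        raise ValueError("requested toggling must be positive")
    rng = random.Random(seed)
    p = {k: code_probability(k) for k in range(1, 16)}
    target_active = toggling_percent * len(taps) / 50.0        # Eq. (1)
    best = None
    for k in range(1, 16):
        ones = round(p[k] * n)                                  # step 3
        a_k = average_active_chains(taps, n, ones, samples, rng)
        if a_k - target_active <= 0:                            # step 6
            continue
        wanted = a_k / target_active - 1.0                      # Eq. (4)
        for hold in range(1, 16):                               # step 8
            for toggle in range(1, 16):
                actual = (1.0 / p[hold]) / (1.0 / p[toggle])    # h / t
                deviation = abs(actual - wanted)
                if best is None or deviation < best[0]:
                    best = (deviation, k, hold, toggle)
    if best is None:
        raise ValueError("no switching code provides enough active chains")
    deviation, k, hold, toggle = best
    return k, hold, toggle, deviation                           # step 9


if __name__ == "__main__":
    # Hypothetical 64-chain phase shifter, three latches per output.
    demo_taps = [(i % 32, (i + 7) % 32, (i + 19) % 32) for i in range(64)]
    print(select_controls(demo_taps, n=32, toggling_percent=25))
```

The nested search covers all 225 hold/toggle code pairs for every admissible switching code, which is inexpensive compared with the sampling of step 4.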

## V. VALIDATING EXPERIMENTS

The approach presented in Section IV has been validated by experiments run on five different scan architectures ($203 \times 300$, $122 \times 104$, $84 \times 416$, $128 \times 353$, $160 \times 541$) used in five industrial designs, with a 33-bit ring generator implementing a primitive polynomial $x^{33} + x^{25} + x^{16} + x^8 + 1$ and feeding a 33-input phase shifter, for 10 000 pseudorandom test patterns. The average toggling rates measured by means of the WTM are plotted in Fig. 4 against successive values of a desired switching activity (the requested toggling). The standard deviation is used to assess a possible dispersion from the average toggling values. Clearly, the lower the values of the standard deviation, the smaller the spread of toggling activity with respect to the desired level of switching activity. The plot of Fig. 4 consists of four different curves. The central red line represents the average value of the toggling ratio computed over all examined designs and all test patterns for successive values of the desired (user-selected) toggling rate varying from 1% to 45% in steps of 1%. Two black lines correspond to the standard deviation, bounding the average value curve from the top and the bottom. The last (blue) curve represents maximal values (averaged over maximal values obtained for all examined designs) recorded for each toggling rate. As can be seen, the resultant switching activity follows the requested rates closely, with small values of standard deviation.

Fig. 5. Toggling (WTM) for five designs with 32- and 33-bit PRPGs.

Fig. 5 gathers experimental results similar to those of Fig. 4 but obtained in a slightly different way. Before plotting the actual values of toggling rates and the remaining statistics, experiments for every single toggling rate were performed for 32- and 33-bit PRPGs (the 32-bit ring generator uses a primitive polynomial  $x^{32} + x^{25} + x^{15} + x^7 + 1$ ). Note that phase shifters are separately synthesized in each case. The resultant toggling rates were compared, and switching activity with a smaller absolute dispersion from the expected value was chosen as the final result. It appears that in certain cases it is preferable to pick a 32-bit PRPG rather than a 33-bit one, or *vice versa*. This strategy yields virtually a straight line with respect to toggling rates, as shown in Fig. 5, hence offering an accurate mapping between the user-selected values of switching activity and the actual circuit response. One can also observe reduced maximal values and smaller standard deviations in this case.

The objective of the second group of experiments was to evaluate tests produced by a 32-bit PRESTO and determine their fault coverage for various requested toggling rates. The results for one of the industrial designs deployed in this paper are shown in Fig. 6. Similar outcomes for a BIST-ready design are shown in Fig. 7. The curves correspond to (requested) toggling rates from 5% to 25% in steps of 5%. In each test case, an additional red curve reports a reference fault coverage obtained by applying purely pseudorandom test patterns with the effective toggling rates around 50%. One result is clear: performance of the PRESTO generator remains highly predictable. In particular, with the increasing switching activity, single stuck-at fault coverage increases as well. In fact, in some designs (Fig. 7) fault coverage of certain LP tests can be higher than that of conventional pseudorandom patterns. Typically, however, one may observe a gap between PRESTO-produced tests and their random counterparts. Fortunately, PRESTO has the ability to reduce this gap by a proper selection of the control register content, as we demonstrate in Section VI.

Fig. 6. Fault coverage for different toggling rates.

Fig. 7. Fault coverage for a BIST-ready design.

## VI. IMPROVING FAULT COVERAGE GRADIENT

A quest to achieve higher BIST fault coverage with shorter test application time generated an immense amount of research in the past. Typically, LFSR-based pseudorandom test sequences were modified either by placing a mapping logic between the PRPG outputs and inputs of a circuit under test [4], [45], or by adjusting the probabilities of outputting 0s and 1s so that the resultant vectors capture characteristics of test patterns for hard-to-detect faults, as done in various forms of weighted-random testing [34], [38], [54]. Test patterns leaving a PRPG can also be transformed in a more deterministic fashion as shown, for example, in [37], [46], and [55]. Along the same lines, we will demonstrate that PRESTO-produced LP test patterns are also capable of visibly improving a fault-coverage-to-pattern-count ratio.

Assuming that the toggle control register can also be driven by deterministic test data (see location of an additional multiplexer in the front of a shift register in Fig. 8), test patterns can be produced with better-than-average fault coverage. The proposed method begins by computing the PRESTO parameters, as described in Section IV. Subsequently, ATPG is repeatedly invoked until either a desired PRESTO pattern count or a fault coverage limit is reached. The ATPG produces test cubes in a one-per-fault fashion. The number of generated test cubes is limited (in each iteration) for performance reasons. As confirmed by many experiments, a properly selected limit has a negligible impact on test quality. The obtained test cubes are then used to derive the content of the control register, as described in the following.

Fig. 8. LP decompressor—modules in gray are disabled. The red items have been added.

Given the PRESTO switching code, our goal is now to find the corresponding distribution of 1s in the control register that maximizes the fault detection probability. The procedure starts by reducing each ATPG-produced test cube to a set of scan chains containing more than one specified bit. This set will be further referred to as a base. For example, let a test cube feature the following specified scan cells:  $\{(s, c): (4, 13), (4, 2), (13, 34), (13, 31), (45, 11)\}$ , where  $s$  is a scan chain, and  $c$  is a cell location within the scan chain. The base is thus given by  $\{4, 13\}$ ; note that chain 45 is not included as it features only one specified scan cell. A good chance (50%) of producing a given logic value in a purely pseudorandom fashion is a rationale behind excluding from any base scan chains hosting a single specified bit. As a result, more bases can be subsequently combined together to produce a single control setting.
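The reduction of a test cube to its base can be sketched as follows, using the example cube from the paragraph above. The function name `base_of` and the representation of a cube as a list of (scan chain, cell) pairs are assumptions made for illustration.

```python
from collections import Counter


def base_of(test_cube):
    """Return the set of scan chains holding more than one specified bit.

    test_cube: iterable of (scan_chain, cell_position) pairs, one per
    specified scan cell.
    """
    counts = Counter(chain for chain, _cell in test_cube)
    # Chains with a single specified bit are left to the pseudorandom fill.
    return {chain for chain, count in counts.items() if count > 1}


if __name__ == "__main__":
    cube = [(4, 13), (4, 2), (13, 34), (13, 31), (45, 11)]
    print(sorted(base_of(cube)))  # [4, 13]
```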

Given the phase shifter architecture, one can determine, for each base, the minimal number of phase shifter inputs (or, equivalently, the number of 1s in the toggle control register) required to activate the specified scan chains. These inputs are obtained by solving the minimum hitting set problem: a greedy procedure searches for the smallest set of phase shifter inputs that intersects all subsets of phase shifter inputs capable of activating the specified scan chains of a given base. Recall that the number of such inputs (and thus the number of 1s in the control register) is further constrained by the preselected switching code. For example, the switching code 0100 limits the number of 1s in the 32-bit control register to 8. Hence, if a base exceeds the limit, it is excluded from subsequent steps of the procedure. Finally, each base is assigned a weight  $w$ , which is simply the number of specified bits in the corresponding test cube. It is worth noting that a reciprocal of  $w$  can be regarded as the likelihood of yielding the test pattern by a generator of purely pseudorandom vectors.
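A minimal sketch of a greedy hitting-set heuristic is given below. It is one common way to approximate the problem and is not necessarily the exact procedure used by the authors. The phase shifter wiring in `phase_shifter` is hypothetical, as are the function and variable names; only the limit of eight 1s for switching code 0100 comes from the text.

```python
def greedy_hitting_set(tap_sets):
    """Pick inputs so that every tap set contains at least one picked input.

    tap_sets: iterable of sets; each set lists the phase shifter inputs
    that can activate one scan chain. Greedy choice: the input that hits
    the largest number of still-unhit sets (ties go to the lowest index).
    """
    remaining = [set(s) for s in tap_sets]
    if any(not s for s in remaining):
        raise ValueError("a scan chain with no driving input cannot be activated")
    chosen = set()
    while remaining:
        counts = {}
        for taps in remaining:
            for inp in taps:
                counts[inp] = counts.get(inp, 0) + 1
        best = max(sorted(counts), key=counts.get)
        chosen.add(best)
        remaining = [taps for taps in remaining if best not in taps]
    return chosen


# Hypothetical wiring: scan chain -> the three generator stages XOR-ed to drive it.
phase_shifter = {4: {0, 5, 9}, 13: {5, 17, 30}, 45: {2, 9, 21}}
base = {4, 13}
LIMIT = 8  # number of 1s allowed by switching code 0100 (from the text)

ones = greedy_hitting_set(phase_shifter[chain] for chain in base)
print(sorted(ones), len(ones) <= LIMIT)
```

In this toy wiring, input 5 feeds both chains of the base, so a single 1 in the control register suffices and the base stays well within the limit.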

Let  $C$  be an initially empty set of bases. Once all weights are determined, we add to  $C$  a minimum-weight base. Next, every remaining base  $B$  is assigned a cost value, which is equal to the smallest number of 1s in the control register that would be required to activate all scan chains in  $\{C \cup B\}$ . A minimum-cost base (or a minimum-weight base if there are two or more bases with the same minimal cost) is then added to  $C$ , and costs associated with the remaining bases are recomputed accordingly. The procedure continues until either the limit of 1s in the control register is reached or all bases are already in  $C$ . The control register content that activates all scan chains from  $C$  is then provided to PRESTO.
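The selection loop described above can be sketched as follows. This is an illustration under stated assumptions: the cost of a candidate is estimated with a greedy hitting-set heuristic, the phase shifter wiring `taps` and the weighted bases are invented for the example, and bases that individually exceed the limit are assumed to have been filtered out beforehand.

```python
def hitting_set(chains, taps):
    """Greedy estimate of the control-register 1s needed to activate `chains`."""
    remaining = [set(taps[c]) for c in chains]
    chosen = set()
    while remaining:
        counts = {}
        for s in remaining:
            for inp in s:
                counts[inp] = counts.get(inp, 0) + 1
        best = max(sorted(counts), key=counts.get)
        chosen.add(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


def combine_bases(bases, taps, limit):
    """bases: list of (weight, set_of_chains). Returns (active chains, register 1s)."""
    pending = sorted(bases, key=lambda b: b[0])
    active = set(pending.pop(0)[1])            # start with a minimum-weight base
    while pending:
        # cost = number of 1s needed for the current set plus the candidate base
        scored = [(len(hitting_set(active | b[1], taps)), b[0], i)
                  for i, b in enumerate(pending)]
        cost, _weight, index = min(scored)     # min cost, then min weight
        if cost > limit:
            break                              # no remaining base fits the limit
        active |= pending.pop(index)[1]
    return active, hitting_set(active, taps)


# Hypothetical wiring and bases, for illustration only.
taps = {1: {0, 1, 2}, 2: {2, 3, 4}, 3: {5, 6, 7}, 4: {8, 9, 10}, 5: {0, 8, 11}}
bases = [(3, {1, 2}), (5, {3}), (4, {4, 5})]
active, register_ones = combine_bases(bases, taps, limit=2)
print(sorted(active), sorted(register_ones))
```

Because the cost is computed greedily, it may overestimate the true minimum; an exact hitting-set solver could admit bases that this sketch rejects.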

For each control register setting, PRESTO is run to produce a certain number of pseudorandom test patterns. These patterns are subsequently fault-simulated, and detected faults are dropped from the list. Experimental results demonstrating feasibility of this method can be found in Section IX.

## VII. LP DECOMPRESSOR

In order to facilitate test data decompression while preserving its original functionality, the circuitry of Fig. 2 has to be rearchitected. This is shown in Fig. 8. The core principle of the decompressor is to disable both weighted logic blocks ( $V$  and  $H$ ) and to deploy deterministic control data instead. In particular, the content of the toggle control register can now be selected in a deterministic manner due to a multiplexer placed in front of the shift register. Furthermore, the Toggle and Hold registers are employed to alternately preset a 4-bit binary down counter, and thus to determine durations of the hold and toggle phases. When this circuit reaches the value of zero, it causes a dedicated signal to go high in order to toggle the T flip-flop. The same signal allows the counter to have the input data kept in the Toggle or Hold register entered as the next state.

Fig. 9. Transitions (arrows) in a test cube.

Both the down counter and the T flip-flop need to be initialized for every test pattern. The initial value of the T flip-flop decides whether the decompressor begins to operate in the toggle or in the hold mode, while the initial value of the counter, further referred to as an offset, determines that mode's duration. As can be seen, the functionality of the T flip-flop remains the same as in the LP PRPG (see Section III) except in two cases. First, the encoding procedure (Section VIII) may completely disable the hold phase (when all hold latches are blocked) by loading the Hold register with an appropriate code, for example, 0000. If such a code is detected (No Hold signal in the figure), it overrides the output of the T flip-flop by using an additional OR gate, as shown in Fig. 8. As a result, the entire test pattern is encoded within the toggle mode exclusively. Second, all hold latches have to be properly initialized. Hence, a control signal First cycle produced at the end of the ring generator initialization phase reloads all latches with the current content of this part of the decompressor.
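The interplay of the down counter, the T flip-flop, and the Toggle and Hold registers can be pictured with a small behavioral model. This is only a sketch: the exact cycle at which the counter reloads, the meaning of a zero Hold value as the No Hold code, and all names below are assumptions rather than a description of the actual hardware.

```python
def mode_schedule(n_cycles, toggle_len, hold_len, offset, start_in_toggle=True):
    """Return one 'T' (toggle) or 'H' (hold) flag per shift cycle.

    Assumed behavior: the counter starts at `offset`, is decremented every
    shift cycle, and on reaching zero flips the T flip-flop and reloads
    itself from the Toggle or Hold register of the newly entered mode.
    """
    if toggle_len < 1 or offset < 1:
        raise ValueError("toggle_len and offset must be at least 1 in this model")
    if hold_len == 0:                 # assumed No Hold code: hold phase disabled
        return ["T"] * n_cycles
    modes = []
    in_toggle = start_in_toggle
    remaining = offset
    for _ in range(n_cycles):
        if remaining == 0:            # counter expired: flip mode and reload
            in_toggle = not in_toggle
            remaining = toggle_len if in_toggle else hold_len
        modes.append("T" if in_toggle else "H")
        remaining -= 1
    return modes


print("".join(mode_schedule(12, toggle_len=2, hold_len=3, offset=1)))
```

With these illustrative settings, the pattern starts with a one-cycle toggle phase (the offset) and then alternates three hold cycles with two toggle cycles.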

Finally, external ATE channels (feeding the original PRPG) allow one to implement a continuous flow test data decompression paradigm such as the dynamic LFSR reseeding. Given the size of PRPG, the number of scan chains and the corresponding phase shifter, the switching code, the offset, as well as the values kept in the Toggle and Hold registers, the entire decompressor will produce deterministic (decompressed) test patterns having a desired level of toggling provided the scan chains are balanced. The corresponding encoding procedure, including an appropriate selection of the aforementioned parameters, consists of steps described in Section VIII.

## VIII. ENCODING ALGORITHM

The decompressor architecture presented in Section VII is tightly coupled with a compression procedure. It partitions a given test pattern into several blocks corresponding alternately to hold and toggle periods. Recall that in the hold mode, all phase shifter inputs are frozen due to disabled hold latches, whereas the toggle mode allows certain inputs of the phase shifter to receive data from the ring generator provided the corresponding bits of the toggle control register are asserted. Since this register is updated once per pattern, scan chains driven only by disabled hold latches are loaded with constant values, and thus remain in the LP mode for the entire pattern. The remaining chains receive either constant values (the HCs) or results of XOR-ing certain outputs of PRPG (during the TCs) among which at least one is enabled.

The actual toggle rate (TR) percentage, measured as a weighted transition metric, is given by

$$TR = 50(n/S)(T/(T + H)) \quad (5)$$

where  $n$  is the number of scan chains driven by at least one enabled phase shifter input,  $S$  is the total number of scan chains, and  $T$  and  $H$  correspond to the durations of toggle and hold periods, respectively. It is also assumed that switching at the level of 50% corresponds to an LP mode turned off. The values of  $T$  and  $H$ , the offset cycles, as well as the content of the toggle control register form LP templates (LPTs). They are determined prior to further encoding steps based on the analysis of test cubes forming a cube pool. As a result, they allow merging and encoding successive test cubes in an incremental fashion, with no repetitions in a flow, as explained in the following.
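Formula (5) is easy to evaluate numerically. The sketch below uses made-up template parameters (40 active chains out of 160, two toggle cycles followed by three hold cycles); only the formula itself comes from the text.

```python
def toggle_rate(n, S, T, H):
    """Toggle rate in percent according to formula (5).

    n: scan chains driven by at least one enabled phase shifter input
    S: total number of scan chains
    T, H: durations of the toggle and hold periods
    """
    return 50.0 * (n / S) * (T / (T + H))


# Illustrative values, not taken from the experiments.
tr = toggle_rate(n=40, S=160, T=2, H=3)
print(f"{tr:.2f}%")

# With every chain active and no hold period, the LP mode is effectively off.
print(f"{toggle_rate(n=160, S=160, T=1, H=0):.2f}%")
```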

Fig. 10. Steps to determine  $H$ ,  $T$ , and  $O$ .

First,  $c$  test cubes from the cube pool are used to initialize  $c$  LPTs. We begin by mapping the test cubes into lists of transitions. Each transition is determined by two successive specified bits of opposite logic values located in the same scan chain. In addition to its flanking bits  $x$  and  $y$ , each transition is characterized by a span, i.e., the number of clock cycles separating  $x$  from  $y$ . It is worth noting that some specified bits contribute to two transitions, whereas other bits are not involved in forming any transitions, as shown in Fig. 9.
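Mapping a test cube into its list of transitions can be sketched as below. The cube is represented as `(chain, cycle, value)` triples and the span is taken as the difference between the two shift cycles; both choices, as well as the sample cube, are assumptions made for illustration.

```python
def transitions(cube):
    """List transitions as (chain, x_cycle, y_cycle, span) tuples.

    cube: iterable of (scan_chain, shift_cycle, logic_value) triples,
    one per specified bit.
    """
    by_chain = {}
    for chain, cycle, value in cube:
        by_chain.setdefault(chain, []).append((cycle, value))
    found = []
    for chain, bits in sorted(by_chain.items()):
        bits.sort()
        # compare each specified bit with the next specified bit in the chain
        for (c0, v0), (c1, v1) in zip(bits, bits[1:]):
            if v0 != v1:
                found.append((chain, c0, c1, c1 - c0))
    return found


# Hypothetical cube: chain 0 has values 1, 0, 1; chain 1 has 0, 0; chain 2 has one bit.
cube = [(0, 2, 1), (0, 5, 0), (0, 9, 1), (1, 3, 0), (1, 7, 0), (2, 4, 1)]
for t in transitions(cube):
    print(t)
```

In this example, the bit at cycle 5 of chain 0 contributes to two transitions, while the bits of chains 1 and 2 contribute to none.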

Having instantiated a given empty template, the corresponding list of transitions is used to arrive at the initial durations of the toggle ( $T$ ), hold ( $H$ ), and offset ( $O$ ) periods. These values are chosen conservatively such that the ratio  $T/H$  is minimal and there are no transitions within a single hold period. The former condition ensures that the template can still accommodate some of the newly produced test cubes. The latter condition can be rephrased as follows: for each transition, either its span is greater than  $H$  or at least one of its flanking bits lies within a toggle period. The actual algorithm to yield the desired values of  $T$ ,  $H$ , and  $O$  can be summarized as follows (Fig. 10).

- 1) Given a test cube and its transitions, find the earliest transition ending point  $e$  (a black triangle in the figure) and assign a single bit toggle phase ( $T = 1$ ) to cycle  $e$ .
- 2) Mark all transitions crossing  $e$ , as they will not end up within a single hold period.
- 3) Increase the toggle period by extending it up to the next unmarked transition starting point. Repeat this step as long as the duration of the toggle period does not exceed a certain threshold (in this paper, ten cycles).
- 4) Find the next unmarked transition ending point  $e'$ —it determines a duration  $H$  of the hold period unless  $H$  is larger than a certain threshold. In the former case go to step 6, otherwise invoke step 5.
- 5) Find the value of  $H$  that minimizes the ratio  $T/H$  and, by adding new hold and toggle phases, keeps the cycle  $e'$  within a toggle period.
- 6) Set the offset period  $O$  to  $e \bmod (T + H) - H$ , if we begin with an incomplete toggle period, and  $O = e \bmod (T + H)$ , otherwise.
- 7) Adjust the values of  $H$ ,  $T$ , and  $O$  if some of the remaining unmarked transitions lie entirely within a single hold period (Fig. 10 shows this phenomenon for a newly added red transition that must not stay within the hold period). Ensure that the sum  $T + H$  remains unchanged. The ratio  $T/H$ , on the other hand, may vary, so minimizing it can guide this step toward an optimal solution. Note that, for example, enlarging the toggle period reduces the length of the hold period and may also affect the number of offset cycles.
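The condition that drives steps 2 and 7, namely that no transition may lie entirely within a single hold period, can be checked with a short routine. The sketch below does not implement the seven steps themselves. It assumes a simplified schedule in which the first  $O$  cycles form the initial phase and full hold and toggle phases alternate afterward; the function names and sample values are hypothetical.

```python
def phase_of(cycle, T, H, O, start_in_toggle=True):
    """Return (mode, phase_index) of a shift cycle under the assumed schedule."""
    first, second = ("T", "H") if start_in_toggle else ("H", "T")
    if cycle < O:                         # still inside the offset phase
        return first, 0
    rounds, within = divmod(cycle - O, T + H)
    second_len = H if second == "H" else T
    if within < second_len:
        return second, 1 + 2 * rounds
    return first, 2 + 2 * rounds


def fits(transitions, T, H, O, start_in_toggle=True):
    """True if no (x_cycle, y_cycle) transition sits inside one hold phase."""
    for x, y in transitions:
        px = phase_of(x, T, H, O, start_in_toggle)
        py = phase_of(y, T, H, O, start_in_toggle)
        if px == py and px[0] == "H":
            return False
    return True


# Schedule for T=2, H=3, O=1 starting in toggle mode: T HHH TT HHH TT ...
print(fits([(2, 5), (3, 7)], T=2, H=3, O=1))   # flanking bits in different phases
print(fits([(1, 3)], T=2, H=3, O=1))           # both bits in the same hold phase
```

A template adjustment step could call such a check after every change of  $T$ ,  $H$ , or  $O$  to confirm that the previously accepted transitions remain encodable.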

Once the above procedure completes, one has to make sure that all scan chains hosting transitions are enabled. This can be achieved as long as there is at least one enabled phase shifter input that feeds a given scan chain through an XOR gate within the phase shifter. Finding the minimal subset of the control register stages needed to activate the required scan chains is equivalent to solving the minimum hitting set problem. Furthermore, the switching activity associated with the template is computed by using formula (5) and compared against the desired toggling ratio  $\tau$ . If the resultant toggling is below  $\tau$ , then the test cube is accepted as a part of the template. Otherwise, the test cube is not compressible under the given power constraints and is discarded, and the template returns to its initial status.

When all templates have been initialized, we attempt to link them with the remaining (new) test cubes. If a template cannot accommodate certain transitions featured by a newly picked test cube, then the durations of toggle, hold, and offset periods can be further adjusted in a fashion similar to that of step 7 of the algorithm presented above. If the cube fits the template, and new active scan chains are known, then we recalculate both the content of the toggle control register and the toggling rate. Again, if the toggling is above  $\tau$ , then the template returns to its previous form, while the test cube is passed to the next template. In addition, if none of the existing templates can accommodate the cube, it remains in the pool until another set of templates is generated such that this particular cube can eventually be assigned to its designated LPT.

The compression of test cubes treats the external test data as Boolean variables used to create linear expressions filling conceptually all scan cells. However, an equation assigned to a given scan cell depends not only on what is yielded by the ring generator, but also on whether a given phase shifter input is enabled or not. If a scan chain is disabled, then a single expression, produced during the first shift-in cycle, represents all of its cells. On the other hand, if a cell belongs to an active scan chain, then its equation is formed by XOR-ing: 1) the corresponding outputs of the ring generator if they are enabled through the hold latches; and 2) expressions produced during the first shift-in cycle on the disabled ring generator outputs. This expression will be used provided a scan cell is in the toggle mode. If it enters the hold mode, then its equation is going to be the same as that of the preceding and nearest cell which is in the toggle mode and belongs to the same chain. Since we only use 3-input XOR gates to create a phase shifter, there are seven different scenarios with at least one XOR tap enabled. Consequently, prior to any compression actions and to save CPU time, we prepare all possible equations for each scan cell, and subsequently select an appropriate expression when working with a particular LPT.
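The count of seven scenarios follows from enumerating the nonempty subsets of three XOR taps. The sketch below also shows how an equation could be selected for one scenario; linear expressions are modeled as sets of variable indices, so that XOR becomes a symmetric difference. The sample expressions and all names are hypothetical.

```python
from itertools import product


def tap_scenarios(n_taps=3):
    """All enable masks of an n-input XOR gate with at least one tap enabled."""
    return [mask for mask in product((0, 1), repeat=n_taps) if any(mask)]


def cell_equation(mask, live, frozen):
    """XOR the live expression of each enabled tap with the first-cycle
    (frozen) expression of each disabled tap."""
    equation = frozenset()
    for enabled, live_expr, frozen_expr in zip(mask, live, frozen):
        equation = equation ^ (live_expr if enabled else frozen_expr)
    return equation


scenarios = tap_scenarios()
print(len(scenarios))

# Hypothetical expressions over seed variables 0..5 for the three taps.
live = [frozenset({1, 2}), frozenset({2, 3}), frozenset({4})]
frozen = [frozenset({0}), frozenset({0, 1}), frozenset({5})]
table = {mask: cell_equation(mask, live, frozen) for mask in scenarios}
print(sorted(table[(1, 0, 1)]))
```

Precomputing `table` once per scan cell mirrors the idea of preparing all possible equations in advance and picking the appropriate one for a particular LPT.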

Having prepared all necessary equations, one can proceed with the test cube encoding. This is carried out in a manner similar to that of the conventional EDT flow. It is worth noting, however, that participation of a given test cube in a template does not guarantee its actual merging and compression because of either conflicts on certain specified bits with other test cubes or limited encoding capabilities. Another notable difference between the presented approach and the traditional EDT scheme is the way compression aborts are reported. Typically, a test cube is regarded as uncompressible if it cannot be encoded when merged as the first component of a test pattern. Here, the test cube is first employed, with other cubes, to form a template, which in turn modifies equations. Hence, an abort is reported only if the cube is used to make up an LPT, is then chosen as the first component of a test pattern, and its encoding fails. All compressed test cubes are removed from the cube pool, which is subsequently refilled. The algorithm continues by creating a new set of templates as long as the pool is not empty.

TABLE I  
CIRCUIT CHARACTERISTICS: 128K RANDOM PATTERNS

| Design | Gates | Scan chains | Longest chain | TC [%] | EP     | WTM load [%] | WSA [%] |
|--------|-------|-------------|---------------|--------|--------|--------------|---------|
| D1     | 590K  | 175         | 137           | 90.59  | 7,583  | 49.84        | 21.80   |
| D2     | 830K  | 84          | 416           | 91.07  | 13,161 | 49.75        | 25.04   |
| D3     | 500K  | 128         | 353           | 85.36  | 9,362  | 49.71        | 18.94   |
| D4     | 1.4M  | 160         | 541           | 93.06  | 10,688 | 49.84        | 15.05   |
| D5     | 1.3M  | 203         | 300           | 91.18  | 17,066 | 49.67        | 22.51   |
| D6     | 220K  | 122         | 104           | 92.63  | 3,450  | 49.06        | 15.27   |
| D7     | 1.9M  | 524         | 258           | 85.89  | 19,929 | 49.60        | 28.42   |
| D8     | 3.6M  | 104         | 3,218         | 84.51  | 16,458 | 49.98        | 11.98   |

## IX. EXPERIMENTAL RESULTS

This section presents experimental results obtained for the PRESTO generator and several industrial designs whose characteristics are given in Table I. For each test case, the table provides the number of gates, the number of scan chains, and the size of the longest scan chain. Furthermore, the column TC reports the resultant test coverage after applying 128K pseudorandom test patterns produced by the PRESTO generator with its LP features disabled. The next column (EP) lists the corresponding number of test patterns that effectively contributed to that level of fault coverage. Finally, the last two columns provide the WTM load for scan shift-in operations and the weighted switching activity (WSA) during the capture operation. As can be seen, WTM remains close to 50%, as typically observed in scan vectors produced in a pseudorandom fashion.

The primary objective of the experiments was to measure test coverage as a function of several parameters, including:

- 1) the number of test patterns;
- 2) the switching activity code;
- 3) the duration of Toggle ( $T$ ) period;
- 4) the duration of Hold ( $H$ ) period.

The actual results are presented in Tables II and III for the industrial designs of Table I. In all experiments reported here, we have used the PRESTO generator with a 32-bit ring generator producing 128K pseudorandom test patterns in an LP mode. Table II is vertically partitioned into columns corresponding to five different (target) toggling rates. Switching activity codes as well as parameters  $H$  and  $T$  were selected automatically, as shown in Section IV. The columns of Table II list the fault coverage for successive test cases. As can be seen, the resultant fault coverage remains close to the reference coverage reported in Table I, while the switching activity is reduced to the desired levels of toggling. Note that some results indicate higher fault coverage if the scan chains receive the low toggling patterns rather than conventional pseudorandom vectors. Even if this is a circuit-specific feature, it nevertheless appears to be the case across several designs.

TABLE II  
FAULT COVERAGE [%]: 128K LOW TOGGLING TEST PATTERNS

| Design | Requested WTM |       |       |       |       |
|--------|---------------|-------|-------|-------|-------|
|        | 5%            | 10%   | 15%   | 20%   | 25%   |
| D1     | 83.13         | 84.08 | 84.29 | 84.58 | 84.74 |
| D2     | 89.13         | 89.76 | 90.03 | 90.10 | 90.14 |
| D3     | 85.55         | 86.21 | 86.07 | 86.52 | 86.16 |
| D4     | 86.37         | 88.50 | 90.20 | 92.37 | 92.63 |
| D5     | 85.61         | 87.41 | 88.16 | 89.64 | 89.45 |
| D6     | 89.68         | 90.97 | 91.26 | 91.73 | 92.07 |
| D7     | 81.78         | 83.73 | 84.56 | 85.59 | 85.80 |
| D8     | 83.53         | 84.27 | 84.47 | 85.25 | 85.12 |

TABLE III  
LOW TOGGLING TEST PATTERN COUNT VERSUS RANDOM VECTORS

| Design                   | Requested WTM |       |      |      |      |
|--------------------------|---------------|-------|------|------|------|
|                          | 5%            | 10%   | 15%  | 20%  | 25%  |
| After 16K test patterns  |               |       |      |      |      |
| D1                       | 7.35          | 4.72  | 3.52 | 2.55 | 1.71 |
| D2                       | 1.51          | 1.43  | 0.64 | 0.69 | 0.70 |
| D3                       | 1.39          | 0.96  | 0.97 | 0.85 | 0.81 |
| D4                       | 6.41          | 3.62  | 2.58 | 1.72 | 1.41 |
| D5                       | 13.89         | 7.58  | 4.72 | 2.78 | 2.36 |
| D6                       | 4.90          | 2.98  | 2.17 | 1.72 | 1.34 |
| D7                       | 6.76          | 3.47  | 2.25 | 1.52 | 1.29 |
| D8                       | 2.16          | 1.67  | 1.52 | 0.96 | 0.90 |
| After 128K test patterns |               |       |      |      |      |
| D1                       | 9.62          | 0.70  | 0.59 | 0.41 | 0.29 |
| D2                       | 4.18          | 2.74  | 2.30 | 2.20 | 2.11 |
| D3                       | 0.90          | 0.61  | 0.70 | 0.57 | 0.64 |
| D4                       | 10.00         | 6.13  | 3.89 | 1.58 | 1.36 |
| D5                       | 27.78         | 12.66 | 8.10 | 3.34 | 3.92 |
| D6                       | 8.44          | 3.19  | 2.39 | 1.71 | 1.27 |
| D7                       | 11.83         | 4.44  | 2.66 | 1.23 | 1.04 |
| D8                       | 2.17          | 1.25  | 1.04 | 0.60 | 0.64 |

The objective of the analysis summarized in Table III was to determine the impact of our LP test generator on the pattern count. In other words, we would like to assess how long it takes to match the fault coverage of purely pseudorandom test patterns (column TC of Table I) with vectors produced by the PRESTO generator. Let  $L(p)$  and  $R(p)$  denote fault coverage obtained by applying  $p$  low toggling and purely random test patterns, respectively. Clearly, there are two possible scenarios: either  $L(p) < R(p)$  or  $L(p) > R(p)$ . In the first case, we can assess a pseudorandom test length  $q$  needed to reach fault coverage  $L(p)$ , where  $q < p$ . The other case is symmetrical; we need to find the number of LP test patterns  $r$  that suffice to match fault coverage  $R(p)$ , where  $r < p$ . The entries of Table III, corresponding directly to those of Table II, are ratios  $v$  that (depending on which of the above scenarios applies) are equal to either  $p/q$  or  $r/p$ . Clearly,  $v < 1$  indicates cases where an LP test is shorter than its random counterpart. If  $v > 1$ , then the presented values are indicative of how many additional LP test patterns must be applied to obtain  $R(p)$ . In Table III, two horizontal segments present results for two values of  $p$ : 16K and 128K. As an example, the entry 2.78 for design D5, 16K vectors, and WTM = 20% indicates that the fault coverage due to 16K low toggling test patterns can be reached almost three times faster by using pseudorandom tests. On the other hand, the entry 0.57 for design D3, 128K vectors, and WTM = 20% indicates that LP tests can offer the same fault coverage as that of 128K random patterns in a little more than half the test time. One may also observe that for some test cases the ratio  $v$  is quite large. This occurs either for aggressively low toggling rates or in designs where certain groups of faults are much more difficult to detect by means of test patterns with relatively low diversity of binary sequences.
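The definition of  $v$  can be made concrete with a small routine operating on sampled coverage curves. The coverage numbers below are invented solely to exercise both scenarios and do not correspond to any design in the tables; the function names are likewise hypothetical.

```python
def patterns_needed(curve, target):
    """Smallest sampled pattern count whose coverage reaches `target`."""
    for count, coverage in curve:
        if coverage >= target:
            return count
    return None


def ratio_v(lp_curve, random_curve, p):
    """Ratio v as defined in the text: p/q if L(p) < R(p), otherwise r/p.

    Curves are lists of (pattern_count, coverage) sorted by pattern count.
    """
    L = dict(lp_curve)[p]
    R = dict(random_curve)[p]
    if L < R:
        q = patterns_needed(random_curve, L)   # random test length reaching L(p)
        return p / q
    r = patterns_needed(lp_curve, R)           # LP test length reaching R(p)
    return r / p


# Invented coverage samples (percent), for illustration only.
random_curve = [(1000, 75.0), (2000, 83.0), (4000, 87.0)]
weak_lp = [(1000, 70.0), (2000, 78.0), (4000, 83.0)]
strong_lp = [(1000, 80.0), (2000, 88.0), (4000, 90.0)]

print(ratio_v(weak_lp, random_curve, 4000))    # random patterns get there sooner
print(ratio_v(strong_lp, random_curve, 4000))  # LP patterns get there sooner
```

With real data the curves would be sampled far more densely, so that  $q$  and  $r$  are resolved to within a few patterns.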

The objective of the second group of experiments is to assess effectiveness of the scheme described in Section VI, i.e., to measure a degree of test time reduction that one can achieve when using a precomputed deterministic content of the control register as compared with application of pseudorandom patterns with otherwise similar power constraints. We present experimental results for industrial designs D1–D6 whose characteristics are given in Table I.

All experiments are conducted using a 32-bit PRESTO generator producing 1K test patterns for each of 128 predetermined control register settings. Hence, the total amount of control data is limited to  $32 \times 128 = 4096$  bits for 128K patterns. The number of test cubes generated in each iteration was set to 1000, typically resulting in three different control register settings per iteration (Section VI). In addition, in order to minimize the average number of specified bits occurring in test cubes, ATPG used a SCOAP-based decision order.

The experimental results for a 10% toggle rate, as measured by the WTM, are shown in Fig. 11. The presented curves correspond to the designs of Table I as follows. For BIST-ready designs D1 and D2, we depict their individual curves, whereas for test cases D3, D4, D5, and D6 a bold red line additionally averages the results shown by their individual curves. Given a number  $t$  of LP pseudorandom PRESTO-generated test patterns (and hence the corresponding fault coverage  $C$  not shown in the figure), a single entry in these plots demonstrates a difference (or equivalently a gain)  $t-g$ , where  $g$  is the number of test patterns applied by a deterministically controlled PRESTO to arrive at the fault coverage  $C$ . For example, consider circuit D2 and its gain curve. As can be seen, we need roughly 70K fewer vectors to reach the same fault coverage as that of 100K PRESTO-produced pseudorandom test patterns with the same switching activity. Clearly, test application time is reduced in this case by more than half. In the large majority of test cases, the deterministic control data allowed us to reduce the number of test patterns, and thus test application time, in a similar fashion. In particular, BIST-ready designs with a moderate number of scan chains exhibit considerably steep gain curves. We have also noticed little improvement in test time reduction for a few non-BIST-ready circuits. These designs appear to feature a large number of scan chains driven by a relatively small phase shifter. Increasing the number of phase shifter inputs typically alleviates the situation.

Fig. 11. Pattern count savings for 10% WTM.

Fig. 12 plots fault coverage results obtained for two BIST-ready designs D1 and D2 while choosing different toggling rates and sweeping the number of applied test patterns. As can be seen, in all examined cases fault coverage of test patterns generated by a deterministically controlled PRESTO (solid lines) is visibly improved over the baseline results (dashed lines) obtained for PRESTO-produced pseudorandom patterns with a similar switching activity. The improvement in fault coverage occurs systematically across all toggling rates, and the deterministically controlled PRESTO outperforms its conventional counterpart for virtually all examined test durations.

Finally, we experimentally assess the performance of the compression scheme of Sections VII and VIII. Experiments are run on industrial designs whose characteristics are given in Table IV. Table V presents results of experiments conducted with 64-bit decompressors and the desired scan shift-in switching level set to 5%, 10%, and 15%. Again, the average WTM estimates the resultant switching activity for scan shift operations, while the average WSA measures toggling in the capture mode by observing the switching activity at each gate in the circuit. All experiments are conducted in such a way that the original EDT-based test coverage is always preserved.

As can be seen, in all examined test cases the resultant scan shift-in switching activity (WTM load) remains very close to the requested one. We have also observed a similar trend for other switching rates, for which results are not reported in Table V. It is worth noting that reducing the load switching has a positive impact on the switching activity during capture and unloading of scan chains. Hence, the corresponding two figures of merit are included in the table as Capture WSA and WTM unload. It is also worth observing that the proposed solution is the first LP compression scheme that offers a mechanism to shape the power envelope in such a flexible and accurate fashion.

Fig. 12. Fault coverage for two BIST-ready designs.

TABLE IV  
CIRCUIT CHARACTERISTICS

| Design | Gates | Scan cells | Scan chains | The longest chain | EDT inputs |
|--------|-------|------------|-------------|-------------------|------------|
| C1     | 1.4M  | 86.4K      | 160         | 541               | 2          |
| C2     | 2.0M  | 127K       | 523         | 256               | 2          |
| C3     | 3.6M  | 297K       | 104         | 3,488             | 10         |
| C4     | 1.3M  | 60.5K      | 203         | 300               | 2          |
| C5     | 1.0M  | 75K        | 160         | 470               | 2          |
| C6     | 226K  | 16.8K      | 122         | 138               | 2          |
| C7     | 1.1M  | 110K       | 861         | 128               | 2          |

The last column reports the ratio  $V_P/V_F$ , where  $V_P$  is the volume of test data used to control the proposed scheme, and  $V_F$  is the corresponding amount of data used by the LP EDT-based scheme presented in [8]. In addition to the actual seed variables,  $V_P$  comprises bits employed to feed the toggle control register, the Hold and Toggle registers, and the offset. Similarly,  $V_F$  includes seed variables and data necessary to control a broadcast scheme delivering gating signals to individual scan chains in an LP mode. On average, our solution consumes only slightly more (1.05 times) test data than [8] for otherwise similar test coverage and switching activity. At the same time, the proposed technique delivers substantial functionality gains, as it is inherently capable of working as a programmable LP PRPG.
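The average data volume ratio can be cross-checked directly from the last column of Table V. The values below are transcribed from that table; the variable names are illustrative.

```python
# Ratios V_P / V_F from the last column of Table V, designs C1..C7.
ratios = {
    5:  [1.29, 1.30, 1.32, 1.14, 0.87, 1.59, 0.84],
    10: [1.18, 0.91, 1.42, 1.14, 1.07, 0.88, 0.67],
    15: [1.16, 0.91, 0.75, 0.96, 1.16, 0.76, 0.87],
}

for target, values in ratios.items():
    print(f"target {target:>2}%: mean ratio {sum(values) / len(values):.3f}")

all_values = [v for values in ratios.values() for v in values]
overall = sum(all_values) / len(all_values)
print(f"overall mean ratio {overall:.3f}")
```

The overall mean comes out between 1.05 and 1.06, in line with the figure quoted in the text, and the per-target means suggest that the overhead shrinks as the toggling target is relaxed.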

TABLE V  
EXPERIMENTAL RESULTS

| Design               | WTM load [%] | Capture WSA [%] | WTM unload [%] | Data volume versus [8] |
|----------------------|--------------|-----------------|----------------|------------------------|
| Target toggling: 5%  |              |                 |                |                        |
| C1                   | 5.16         | 15.16           | 12.29          | 1.29                   |
| C2                   | 7.24         | 25.96           | 11.22          | 1.30                   |
| C3                   | 5.69         | 11.72           | 10.66          | 1.32                   |
| C4                   | 6.41         | 21.73           | 26.12          | 1.14                   |
| C5                   | 4.94         | 6.13            | 5.97           | 0.87                   |
| C6                   | 5.72         | 18.30           | 16.42          | 1.59                   |
| C7                   | 5.63         | 28.69           | 27.03          | 0.84                   |
| Target toggling: 10% |              |                 |                |                        |
| C1                   | 9.84         | 14.33           | 16.39          | 1.18                   |
| C2                   | 11.64        | 26.28           | 14.25          | 0.91                   |
| C3                   | 9.48         | 13.00           | 14.63          | 1.42                   |
| C4                   | 9.59         | 21.28           | 26.97          | 1.14                   |
| C5                   | 8.80         | 6.65            | 9.60           | 1.07                   |
| C6                   | 10.07        | 18.97           | 21.04          | 0.88                   |
| C7                   | 9.48         | 29.51           | 28.56          | 0.67                   |
| Target toggling: 15% |              |                 |                |                        |
| C1                   | 15.05        | 13.66           | 20.64          | 1.16                   |
| C2                   | 15.55        | 26.53           | 16.97          | 0.91                   |
| C3                   | 14.21        | 12.04           | 18.93          | 0.75                   |
| C4                   | 14.60        | 20.76           | 29.14          | 0.96                   |
| C5                   | 13.61        | 6.28            | 14.07          | 1.16                   |
| C6                   | 14.98        | 19.31           | 25.11          | 0.76                   |
| C7                   | 14.52        | 29.77           | 30.47          | 0.87                   |

TABLE VI  
AREA OVERHEAD

|                 | Scheme  | Comb.  | Non-comb. | Total  | Ratio |
|-----------------|---------|--------|-----------|--------|-------|
| 100 scan chains | PRPG-32 | 621    | 307       | 928    | 1.00  |
|                 | F1-32   | 3,708  | 1,335     | 5,043  | 5.43  |
|                 | F2-32   | 3,703  | 1,567     | 5,270  | 5.68  |
|                 | F8-32   | 3,790  | 1,661     | 5,451  | 5.87  |
|                 | PRPG-64 | 825    | 613       | 1,438  | 1.00  |
|                 | F1-64   | 4,189  | 1,902     | 6,091  | 4.24  |
|                 | F2-64   | 4,246  | 2,324     | 6,570  | 4.57  |
|                 | F8-64   | 4,333  | 2,418     | 6,751  | 4.69  |
| 500 scan chains | PRPG-64 | 2,746  | 613       | 3,359  | 1.00  |
|                 | F1-64   | 10,854 | 2,473     | 13,327 | 3.97  |
|                 | F2-64   | 11,096 | 2,894     | 13,990 | 4.16  |
|                 | F8-64   | 11,177 | 2,985     | 14,162 | 4.22  |

The silicon area taken up by the proposed test logic is expressed as an equivalent area of 2-input NAND gates, as shown in Table VI. It provides the actual area costs computed with a commercial synthesis tool for three architectures shown in Figs. 1, 2, and 8 by using 32- and 64-bit ring generators (in the table denoted as F1-32, F2-64, and so on) feeding either  $n = 100$  (the upper part) or  $n = 500$  (the lower part) scan chains. All components of our test logic were synthesized using a 90-nm CMOS standard cell library under a 3.5-ns timing constraint. The table reports the resultant silicon area with respect to combinational and noncombinational devices. The total area is then compared with the corresponding area occupied by a conventional PRPG (typically, the XOR network of a phase shifter consists of  $n$  3-input gates in addition to  $m$  flip-flops forming the ring generator; this reference area is reported in rows labeled as PRPG). For example, a 64-bit LP generator of Fig. 2 driving 100 scan chains is 4.57 times larger than its standard counterpart, while offering exceptional LP features. Consequently, the numbers of Table VI make the proposed scheme attractive as far as its silicon cost is concerned.
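The Ratio column of Table VI can be reproduced from the combinational and noncombinational areas. The sketch below does so for the 64-bit schemes driving 100 scan chains, with values transcribed from the table; the dictionary layout is an illustrative choice.

```python
# (combinational, noncombinational) area in 2-input NAND equivalents,
# 64-bit schemes feeding 100 scan chains (Table VI).
area = {
    "PRPG-64": (825, 613),
    "F1-64": (4189, 1902),
    "F2-64": (4246, 2324),
    "F8-64": (4333, 2418),
}

reference = sum(area["PRPG-64"])
for scheme, (comb, non_comb) in area.items():
    total = comb + non_comb
    print(f"{scheme}: total {total}, ratio {total / reference:.2f}")

f2_ratio = sum(area["F2-64"]) / reference
```

The ratio obtained for F2-64 matches the factor of 4.57 cited in the text.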

## X. CONCLUSION

As shown in the paper, PRESTO—the LP generator—can produce pseudorandom test patterns with scan shift-in switching activity precisely selected through automated programming. The same features can be used to control the generator, so that the resultant test vectors can either yield a desired fault coverage faster than the conventional pseudorandom patterns while still reducing toggling rates down to desired levels, or they can offer visibly higher coverage numbers if run for comparable test times. This LP PRPG is also capable of acting as a fully functional test data decompressor with the ability to control scan shift-in switching activity through the process of encoding. The proposed hybrid solution allows one to efficiently combine test compression with logic BIST, where both techniques can work synergistically to deliver high quality test. It is therefore a very attractive LP test scheme that allows for trading-off test coverage, pattern counts, and toggling rates in a very flexible manner.

## REFERENCES

- [1] A. S. Abu-Issa and S. F. Quigley, "Bit-swapping LFSR for low-power BIST," *Electron. Lett.*, vol. 44, no. 6, pp. 401–402, Mar. 2008.
- [2] C. Barnhart *et al.*, "Extending OPMISR beyond 10x scan test efficiency," *IEEE Design Test*, vol. 19, no. 5, pp. 65–73, Sep./Oct. 2002.
- [3] S. Bhunia, H. Mahmoodi, D. Ghosh, S. Mukhopadhyay, and K. Roy, "Low-power scan design using first-level supply gating," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 13, no. 3, pp. 384–395, Mar. 2005.
- [4] M. Chatterjee and D. K. Pradham, "A novel pattern generator for near-perfect fault-coverage," in *Proc. 13th IEEE Very Large Scale Integr. (VTSI) Test Symp.*, Apr./May 1995, pp. 417–425.
- [5] F. Corno, M. Rebaudengo, M. S. Reorda, G. Squillero, and M. Violante, "Low power BIST via non-linear hybrid cellular automata," in *Proc. 18th IEEE Very Large Scale Integr. (VTSI) Test Symp.*, May 2000, pp. 29–34.
- [6] D. Das and N. A. Touba, "Reducing test data volume using external/LBIST hybrid test patterns," in *Proc. Int. Test Conf. (ITC)*, 2000, pp. 115–122.
- [7] R. Dorsch and H. Wunderlich, "Tailoring ATPG for embedded testing," in *Proc. Int. Test Conf. (ITC)*, 2001, pp. 530–537.
- [8] M. Filipek *et al.*, "Low power decompressor and PRPG with constant value broadcast," in *Proc. 20th Asian Test Symp. (ATS)*, Nov. 2011, pp. 84–89.
- [9] S. Gerstendorfer and H. Wunderlich, "Minimized power consumption for scan-based BIST," in *Proc. Int. Test Conf. (ITC)*, 1999, pp. 77–84.
- [10] V. Gherman, H. Wunderlich, H. Vranken, F. Hapke, M. Wittke, and M. Garbers, "Efficient pattern mapping for deterministic logic BIST," in *Proc. Int. Test Conf. (ITC)*, Oct. 2004, pp. 48–56.
- [11] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A test vector inhibiting technique for low energy BIST design," in *Proc. 17th IEEE VLSI Test Symp. (VTS)*, Apr. 1999, pp. 407–412.
- [12] P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, and H.-J. Wunderlich, "A modified clock scheme for a low power BIST test pattern generator," in *Proc. 19th IEEE VLSI Test Symp. (VTS)*, May 2001, pp. 306–311.
- [13] P. Girard, C. Landrault, S. Pravossoudovitch, A. Virazel, and H.-J. Wunderlich, "High defect coverage with low-power test sequences in a BIST environment," *IEEE Design Test*, vol. 19, no. 5, pp. 44–52, Sep. 2002.

- [14] P. Girard, N. Nicolici, and X. Wen, Ed., *Power-Aware Testing and Test Strategies for Low Power Devices*. New York, NY, USA: Springer-Verlag, 2010.
- [15] A.-W. Hakmi, S. Holst, H. Wunderlich, J. Schloffel, F. Hapke, and A. Glowatz, "Restrict encoding for mixed-mode BIST," in *Proc. 27th IEEE VLSI Test Symp. (VTS)*, May 2009, pp. 179–184.
- [16] A.-W. Hakmi *et al.*, "Programmable deterministic built-in self-test," in *Proc. IEEE VLSI Test Symp. (VTS)*, Oct. 2007, pp. 1–9, paper 18.1.
- [17] S. Hellebrand, H.-G. Liang, and H. Wunderlich, "A mixed mode BIST scheme based on reseeding of folding counters," in *Proc. Int. Test Conf. (ITC)*, 2000, pp. 778–784.
- [18] S. Hellebrand, J. Rajski, S. Tarnick, S. Venkataraman, and B. Courtois, "Built-in test for circuits with scan based on reseeding of multiple-polynomial linear feedback shift registers," *IEEE Trans. Comput.*, vol. 44, no. 2, pp. 223–233, Feb. 1995.
- [19] A. Hertwig and H.-J. Wunderlich, "Low power serial built-in self-test," in *Proc. Eur. Test Workshop (ETS)*, May 1998, pp. 49–53.
- [20] K. Ichino, T. Asakawa, S. Fukumoto, K. Iwasaki, and S. Kajihara, "Hybrid BIST using partially rotational scan," in *Proc. 10th Asian Test Symp. (ATS)*, 2001, pp. 379–384.
- [21] A. Jas, C. V. Krishna, and N. A. Touba, "Hybrid BIST based on weighted pseudo-random testing: A new test resource partitioning scheme," in *Proc. 19th IEEE Proc. VLSI Test Symp. (VTS)*, 2001, pp. 2–8.
- [22] A. Jas, C. V. Krishna, and N. A. Touba, "Weighted pseudorandom hybrid BIST," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 12, pp. 1277–1283, Dec. 2004.
- [23] R. Kapur, S. Mitra, and T. W. Williams, "Historical perspective on scan compression," *IEEE Design Test*, vol. 25, no. 2, pp. 114–120, Mar./Apr. 2008.
- [24] B. Koenemann, "LFSR-coded test patterns for scan designs," in *Proc. Eur. Test Conf. (ETC)*, 1991, pp. 237–242.
- [25] Y. Kim, M.-H. Yang, Y. Lee, and S. Kang, "A new low power test pattern generator using a transition monitoring window based on BIST architecture," in *Proc. Asian Test Symp. (ATS)*, Dec. 2005, pp. 230–235.
- [26] C. V. Krishna and N. A. Touba, "Reducing test data volume using LFSR reseeding with seed compression," in *Proc. Int. Test Conf. (ITC)*, 2002, pp. 321–330.
- [27] C. V. Krishna and N. A. Touba, "Hybrid BIST using an incrementally guided LFSR," in *Proc. 18th IEEE Int. Symp. Defect Fault Tolerance (DFT) VLSI Syst.*, Nov. 2003, pp. 217–224.
- [28] J. Lee and N. A. Touba, "LFSR-reseeding scheme achieving low-power dissipation during test," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 26, no. 2, pp. 396–401, Feb. 2007.
- [29] L. Lei and K. Chakrabarty, "Test set embedding for deterministic BIST using a reconfigurable interconnection network," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 23, no. 9, pp. 1289–1305, Sep. 2004.
- [30] L. Lei and K. Chakrabarty, "Hybrid BIST based on repeating sequences and cluster analysis," in *Proc. Design Autom. Test Eur. (DATE)*, Mar. 2005, pp. 1142–1147.
- [31] H.-G. Liang, S. Hellebrand, and H.-J. Wunderlich, "Two-dimensional test data compression for scan-based deterministic BIST," in *Proc. Int. Test Conf. (ITC)*, 2001, pp. 894–902.
- [32] X. Lin and J. Rajski, "Adaptive low shift power test pattern generator for logic BIST," in *Proc. 19th IEEE Asian Test Symp. (ATS)*, Dec. 2010, pp. 355–360.
- [33] X. Liu and Q. Xu, "On simultaneous shift- and capture-power reduction in linear decompressor-based test compression environment," in *Proc. Int. Test Conf. (ITC)*, Nov. 2009, pp. 1–10, paper 9.3.
- [34] F. Muradali, V. K. Agarwal, and B. Nadeau-Dostie, "A new procedure for weighted random built-in self-test," in *Proc. Int. Test Conf. (ITC)*, Sep. 1990, pp. 660–669.
- [35] B. Nadeau-Dostie, K. Takeshita, and J.-F. Cote, "Power-aware at-speed scan test methodology for circuits with synchronous clocks," in *Proc. Int. Test Conf. (ITC)*, Oct. 2008, pp. 1–10, paper 9.3.
- [36] M. Nourani, M. Tehranipoor, and N. Ahmed, "Low transition LFSR for BIST-based applications," in *Proc. 14th Asian Test Symp. (ATS)*, Dec. 2005, pp. 138–143.
- [37] S. Pateras and J. Rajski, "Generation of correlated random patterns for the complete testing of synthesized multi-level circuits," in *Proc. Design Autom. Conf. (DAC)*, Jun. 1991, pp. 347–352.
- [38] I. Pomeranz and S. M. Reddy, "3-weight pseudo-random test generation based on a deterministic test set for combinational and sequential circuits," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 12, no. 7, pp. 1050–1058, Jul. 1993.
- [39] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, "Embedded deterministic test," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 23, no. 5, pp. 776–792, May 2004.
- [40] J. Rajski, J. Tyszer, G. Mrugalski, and B. Nadeau-Dostie, "Test generator with preselected toggling for low power built-in self-test," in *Proc. IEEE 30th VLSI Test Symp. (VTS)*, Apr. 2012, pp. 1–6.
- [41] P. M. Rosinger, B. M. Al-Hashimi, and N. Nicolici, "Low power mixed-mode BIST based on mask pattern generation using dual LFSR reseeding," in *Proc. IEEE Int. Conf. Comput. Design (ICCD)*, Sep. 2002, pp. 474–479.
- [42] T. Saraswathi, K. Ragini, and C. G. Reddy, "A review on power optimization of linear feedback shift register (LFSR) for low power built in self test (BIST)," in *Proc. 3rd Int. Conf. Electron. Comput. Technol. (ICECT)*, Apr. 2011, pp. 172–176.
- [43] B. Singh, A. Khosla, and S. Bindra, "Power optimization of linear feedback shift register (LFSR) for low power BIST," in *Proc. IEEE Int. Adv. Comput. Conf. (IACC)*, Mar. 2009, pp. 311–314.
- [44] N. A. Touba, "Survey of test vector compression techniques," *IEEE Design Test*, vol. 23, no. 4, pp. 294–303, Apr. 2006.
- [45] N. A. Touba and E. J. McCluskey, "Transformed pseudo-random patterns for BIST," in *Proc. 13th IEEE VLSI Test Symp. (VTS)*, Apr./May 1995, pp. 2–8.
- [46] N. A. Touba and E. J. McCluskey, "Bit-fixing in pseudorandom sequences for scan BIST," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 20, no. 4, pp. 545–555, Apr. 2001.
- [47] S. Wang, "Generation of low power dissipation and high fault coverage patterns for scan-based BIST," in *Proc. Int. Test Conf. (ITC)*, 2002, pp. 834–843.
- [48] S. Wang and S. K. Gupta, "DS-LFSR: A BIST TPG for low switching activity," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 21, no. 7, pp. 842–851, Jul. 2002.
- [49] S. Wang and S. K. Gupta, "LT-RTPG: A new test-per-scan BIST TPG for low switching activity," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 25, no. 8, pp. 1565–1574, Aug. 2006.
- [50] Z. Wang and K. Chakrabarty, "Test data compression for IP embedded cores using selective encoding of scan slices," in *Proc. Int. Test Conf. (ITC)*, Nov. 2005, pp. 581–590.
- [51] P. Wohl, J. A. Waicukauski, S. Patel, and M. B. Amin, "X-tolerant compression and application of scan-ATPG patterns in a BIST architecture," in *Proc. Int. Test Conf. (ITC)*, Sep./Oct. 2003, pp. 727–736.
- [52] P. Wohl, J. A. Waicukauski, S. Patel, F. DaSilva, T. W. Williams, and R. Kapur, "Efficient compression of deterministic patterns into multiple PRPG seeds," in *Proc. Int. Test Conf. (ITC)*, Nov. 2005, pp. 916–925.
- [53] M.-F. Wu, J.-L. Huang, X. Wen, and K. Miyase, "Reducing power supply noise in linear-decompressor-based test data compression environment for at-speed scan testing," in *Proc. Int. Test Conf. (ITC)*, Oct. 2008, pp. 1–10, paper 13.1.
- [54] H. Wunderlich, "Multiple distributions for biased random test patterns," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 9, no. 6, pp. 584–593, Jun. 1990.
- [55] H.-J. Wunderlich and G. Kiefer, "Bit-flipping BIST," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, Nov. 1996, pp. 337–343.
- [56] X. Zhang and K. Roy, "Power reduction in test-per-scan BIST," in *Proc. 6th IEEE Int. On-Line Test. Workshop (OLTW)*, Jul. 2000, pp. 133–138.
- [57] C. Zoellin, H. Wunderlich, N. Maeding, and J. Leenstra, "BIST power reduction using scan-chain disable in the cell processor," in *Proc. Int. Test Conf. (ITC)*, Oct. 2006, pp. 1–8, paper 32.3.



**Michał Filipek** received the B.S. degree in electronics and telecommunications from Wrocław University of Technology, Wrocław, Poland, in 2011 and the M.S. degree in computer science from Poznań University of Technology, Poznań, Poland, in 2013.

He is currently with Tequila Mobile S.A., Wrocław, Poland.



**Grzegorz Mrugalski** (M'06–SM'13) received the M.S. and Ph.D. degrees in electrical engineering from Poznań University of Technology, Poznań, Poland, in 1995 and 2002, respectively.

He joined Mentor Graphics Corporation, Wilsonville, OR, USA, in 2002. He was with the Institute of Electronics and Telecommunications, Poznań University of Technology. Since 2009, he has been a Manager with the Research and Development Laboratory for the Design-for-Test Products, Mentor Graphics Polska, Poznań, Poland.

He has co-authored over 40 technical papers in the area of VLSI testing and holds 21 U.S. patents. His current research interests include computer-aided design of digital circuits, design for testability, built-in self-test, and test compression.

Dr. Mrugalski was a co-recipient of the 2011 Best Paper Award at the IEEE European Test Symposium and the 2012 IEEE International Test Conference Most Significant Paper Award.



**Nilanjan Mukherjee** (S'87–M'89–SM'14) received the B.Tech. (honors) degree in electronics and electrical communication engineering from IIT Kharagpur, Kharagpur, India, in 1989 and the Ph.D. degree from McGill University, Montreal, QC, Canada, in 1996.

He is currently the Software Development Director of the Design-to-Silicon Division at Mentor Graphics Corporation, Wilsonville, OR, USA. He is a co-inventor of the EDT Technology, and was a Lead Developer for the leading test compression tool in the industry, TestKompress. He has authored more than 45 technical articles, and is a co-inventor of 39 U.S. patents. His current research interests include next-generation test methodologies for deep submicrometer designs, test data compression, test synthesis, memory testing, and fault diagnosis.

Dr. Mukherjee was a co-recipient of the Best Paper Award at the 1995 IEEE VLSI Test Symposium, the Best Paper Award at the 2009 VLSI Design Conference, the 2006 IEEE Circuits and Systems Society Donald O. Pederson Outstanding Paper Award recognizing the paper on embedded deterministic test published in the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, and the 2012 IEEE International Test Conference Most Significant Paper Award. He served on the program committees of several IEEE conferences.



**Benoit Nadeau-Dostie** (M'79–SM'95) received the Ph.D. degree in electrical engineering from Université de Sherbrooke, Sherbrooke, QC, Canada, in 1985.

He was an Advisory Engineer at Bell-Northern Research (BNR), Ottawa, ON, Canada, from 1986 to 1994. He was the main Architect of BNR's Design-for-Testability strategy. Since 2009, he has been the Chief Architect at Mentor Graphics Corporation, Wilsonville, OR, USA. He was with LogicVision, San Jose, CA, USA, as a Chief Scientist for 15 years. He edited one book and published several papers, and holds 42 U.S. patents related to memory, logic, and board testing.

Dr. Nadeau-Dostie has been involved in many IEEE activities over the last 25 years of his career, including the Program Committees of International Conferences, the Working Group of the 1149.1 Standard, and the Editorial Board of the IEEE DESIGN AND TEST OF COMPUTERS MAGAZINE.



**Janusz Rajski** (A'87–SM'10–F'11) received the M.S. degree in electrical engineering from the Technical University of Gdańsk, Gdańsk, Poland, in 1973 and the Ph.D. degree in electrical engineering from Poznań University of Technology, Poznań, Poland, in 1982.

He was a Faculty Member with the Poznań University of Technology from 1973 to 1984. In 1984, he joined McGill University, Montreal, QC, Canada, where he became an Associate Professor in 1989. Since 1995, he has been a Chief Scientist at Mentor Graphics Corporation, Wilsonville, OR, USA. He has authored more than 150 research papers, and is a co-inventor of 84 U.S. patents. He is also the Principal Inventor of the Embedded Deterministic Test Technology used in the first commercial test compression product, TestKompress. His current research interests include the testing of VLSI systems, design for testability, built-in self-test, and logic synthesis.

Dr. Rajski was a co-recipient of the Best Paper Award for the paper on logic synthesis published in the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS in 1993, the Best Paper Awards at the IEEE VLSI Test Symposium in 1995 and 1998, the Honorable Mention Awards at the IEEE International Test Conference in 1999 and 2003, the IEEE Circuits and Systems Society Donald O. Pederson Outstanding Paper Award recognizing the paper on embedded deterministic test published in the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS in 2006, the Best Paper Award at the VLSI Design Conference in 2009, the Best Paper Award at the IEEE European Test Symposium in 2011, and the IEEE International Test Conference Most Significant Paper Award in 2011. He has served on technical program committees of various conferences.



**Jędrzej Solecki** received the B.S. and M.S. degrees in computer science from Poznań University of Technology, Poznań, Poland, in 2010 and 2011, respectively, where he is currently working toward the Ph.D. degree with the Faculty of Electronics and Telecommunications.

Mr. Solecki was a co-recipient of the 2011 Best Paper Award at the IEEE European Test Symposium.



**Jerzy Tyszer** (M'91–SM'96–F'13) received the M.S. and Ph.D. degrees in electrical engineering from Poznań University of Technology, Poznań, Poland, in 1981 and 1987, respectively.

He was a Faculty Member of the Poznań University of Technology from 1982 to 1990. In 1990, he joined McGill University, Montreal, QC, Canada, where he was a Research Associate and an Adjunct Professor. Since 1996, he has been a Professor with the Faculty of Electronics and Telecommunications, Poznań University of Technology. He has authored eight books and more than 120 research papers, and is a co-inventor of 55 U.S. patents. His current research interests include the design automation and testing of very large-scale integration (VLSI) systems, design for testability, built-in self-test, embedded test, and computer simulation of discrete event systems.

Prof. Tyszer has served on technical program committees of various conferences. He was a co-recipient of the Best Paper Awards at the IEEE VLSI Test Symposium in 1995 and 1998, the Honorable Mention Award at the IEEE International Test Conference in 2003, the IEEE Circuits and Systems Society Donald O. Pederson Outstanding Paper Award recognizing the paper on embedded deterministic test published in the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS in 2006, the Best Paper Award at the VLSI Design Conference in 2009, the Best Paper Award at the IEEE European Test Symposium in 2011, and the IEEE International Test Conference Most Significant Paper Award in 2012.