

# Letters to the Editor

## High Speed Stored Product Recursive Digital Filters

DANIEL DUBOIS AND WILLEM STEENAART

*Abstract* — A stored product digital filter (SPDF) architecture is developed that will allow for high speed signal processing, for the application of a single address code simultaneously to all product memories, and for the application of either linear binary or nonlinearly coded (PCM) signal codes. The application of this technique will extend real-time digital signal processing capabilities into the megahertz range.

### I. INTRODUCTION

Distributed arithmetic [1] achieves multiplications by the storing of all possible binary sums of the filter coefficients followed by additions. This method has proved its advantage with respect to power dissipation, speed, and cost, in particular when data are processed serially. Structures for parallel processing using this method are also available, but when applied to recursive filters there is a limitation in speed due to the propagation of the

Manuscript received February 13, 1981; revised June 16, 1981, September 14, 1981, and October 27, 1981. This work was supported by the Natural Sciences and Engineering Research Council under Grant A8572, and by a Research Fellowship received from the "Fonds National Suisse de la Recherche Scientifique."

D. Dubois was with the University of Ottawa, Ottawa, Canada. He is now with the Department of Electrical Engineering, Royal Military College, Kingston, Ont., Canada.

W. Steenaart is with the Department of Electrical Engineering, University of Ottawa, Ottawa, Ont K1N 6N5, Canada.

intermediate results through an adder tree of  $\log_2 B$  stages where  $B$  is the signal wordlength.

The stored product digital filter (SPDF) architecture as formulated in [2] and [3] presents an alternative for the elimination of the multipliers through the use of ROM's. Nonlinear quantization of the signal has been proposed [3] to control the number of product bits to be stored.

The purpose of this paper is to present a general configuration for recursive filters involving a minimum number of arithmetic operations to be performed in one cycle. Another way for the reduction of the memory capacity is also proposed, so that linear address codes can be used effectively.

Examples show that high speed, 10 MHz or higher, word throughput rates for parallel operations in two's complement fixed-point arithmetic are feasible with reasonable memory size and standard logic devices.

The high speed possible, combined with the elimination of the coefficient quantization effect [4] makes the SPDF very attractive for digital signal processing of video, radar, and high speed data signals.

### II. MINIMUM NUMBER OF ARITHMETIC OPERATIONS IN RECURSIVE FILTER STRUCTURES

Digital filter networks are collections of interconnected unit delays, adders, and multipliers. The state of the filter, at instant  $nT$ , is entirely determined by the signal values appearing at the output of each unit delay. A linear combination (i.e., a sum of

0098-4094/82/0600-0390\$00.75 ©1982 IEEE



Fig. 1.  $N$ -order canonic realization.



Fig. 2. General  $N$ th-order filter structure.

products) of these signal values, including the input signal  $x(n)$ , will yield the input signals to each unit delay element. At the next time  $(n+1)T$ , these resulting signals will constitute the new state of the filter. Thus the maximum speed will be determined by the duration of the longest sequence of arithmetic operations to be performed in one cycle among all the possible paths from the output of a unit delay element to the next input. These operations will be composed of, at least, one addition and one multiplication. Thus the minimal operation time possible and the speed of the digital filter is determined. In the following, we will derive the configuration for which all paths contain not more than one multiplier and one adder. In addition, the structure is arranged so that all ROM's are addressed simultaneously by the same address code, so that a single input, multiple output ROM can be used and all memory elements can be located on a single chip.

### III. FILTER STRUCTURE FOR MAXIMUM SPEED OF OPERATION

Starting from the canonic  $N$ th-order structure, one separates the recursive and nonrecursive parts, as in Fig. 1. Applying the transposition rules of a graph [5], and connecting the recursive and nonrecursive parts at node  $C$  (dotted line) instead of  $A$ , the structure represented in Fig. 2 is obtained which has the following transfer function:

$$\frac{Y(z)}{X(z)} = \frac{a_0 + a_1 z^{-1} + \cdots + a_N z^{-N}}{1 + b_1 z^{-1} + \cdots + b_N z^{-N}} z^{-N}. \quad (1)$$

The structure of Fig. 2 has no more than one multiplication and one addition in each path between unit delay elements, which limits the cycle time to the minimum possible for parallel arithmetic operations. There are in total  $2N$  adders and  $2N+1$



Fig. 3. General second-order filter section.

multipliers which constitutes the minimum number of arithmetic elements corresponding to the transfer function (1). In addition a single address code is applied to all ROM's.

In the following section, the second-order filter will be evaluated further, as this is a basic building block for any general signal processor. This may then be compared to existing second-order realizations.

### IV. SECOND-ORDER FILTER REALIZATION

For a second-order filter (1) reduces to

$$\frac{Y(z)}{X(z)} = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}} z^{-2}. \quad (2)$$

In (2) the second-order transfer function is multiplied by  $z^{-2}$  which is equivalent to an overall delay of two sample times. The circuit is represented in Fig. 3. A slight modification has been introduced at the input of the circuit. The unit delay element



Fig. 4. Split ROM multiplier.



Fig. 5. Block diagram of the hardware implementation of a 16-bit second-order filter section.

following the first adder in Fig. 2, has been moved to both input lines before it. This transformation, which does not change the transfer function (2), simplifies the cascade connection of second-order sections so that the rule of no more than one multiplication and one addition applies also in the interconnection between sections. The use of two adders in sequence at the input stage is allowed as the first addition takes place simultaneously with the  $-b_1$  coefficient memory access.

#### V. PRACTICAL DESIGN CONSIDERATIONS

An important property of the configurations shown in Figs. 2 and 3 can be noted. Since all ROM elements are connected at a common node, the address words for each memory at a given sample time or clock time are identical. This property may be used in several ways.

I) A single large memory including all the multipliers may be implemented on one chip. This possibility would constitute a very

interesting application for the concentrated efforts of manufacturers on gaining speed, ease of fabrication and lower consumption, along with density [6].

II) For second-order transfer functions having zeros on the unit circle, the coefficients  $a_2$  and  $a_0$  in (2) are identical, thus only one ROM may be used for both products. For the  $N$ th-order filter (1) we have  $a_N = a_0$  and also a reduction of one memory element.

III) A reduction of the length of the words stored in the memory may be achieved by wiring some bits in common, i.e., a single sign bit may be used commonly for all the products of the same sign.

Each stored product, generally of equal length, may be rounded separately according to a specific strategy. Thus in addition to the coefficient accuracy property [4], there is also the possibility of controlling limit cycles by controlled rounding, without additional hardware.

TABLE I  
TYPICAL PERFORMANCES REPORTED FROM MANUFACTURERS' CATALOGS

|                |                                | delay time (ns) |         |         |
|----------------|--------------------------------|-----------------|---------|---------|
|                |                                | 10 bits         | 12 bits | 16 bits |
| Addition       | 74S283 4-bit parallel adder    | 23              | 23      | 30      |
|                | 74S283 4-bit parallel adder    | 23              | 23      | 30      |
| Multiplication | 74S470 256 × 8 PROM            | --              | --      | 50      |
|                | 74186 64 × 8 PROM              | --              | 50      | --      |
|                | 74S188 32 × 8 PROM             | 25              | --      | --      |
| Storage        | 74S374 8 F-F registers         | 10              | 10      | 10      |
|                | Min. cycle time (ns)           | 81              | 106     | 120     |
|                | Max. throughput wordrate (MHz) | 12.3            | 9.4     | 8.3     |

## VI. REDUCING THE NUMBER OF BITS STORED

Basic to the SPDF technique is that all the possible products are stored and consequently the memory capacity required appears to be large, i.e.,  $B \cdot 2^B$  per coefficient where  $B$  is the signal wordlength. For example, for  $B \geq 12$ , at this time MOS technology is required where large memory capacities are available.

A first alternative to this problem has already been proposed [3], which makes use of a nonlinear quantization of the signal to obtain a substantial reduction of the number of bits to be stored. A nonlinear quantizer  $Q$  would have to be introduced in the structure before the common address node in Figs. 2 or 3.

Another method is to split the signal wordlength  $B$  into two parts of length  $B/2$ . Then all the products of each part by the same coefficient are stored separately and added as shown in Fig. 4. This method reduces the storage requirement to  $\frac{3}{2}B \cdot 2^{B/2}$  but requires one supplementary parallel adder for each ROM multiplier. The increase in the cycle time due to that addition will be largely compensated for by the smaller memories and the subsequent reduction in address time.

## VII. PERFORMANCE

Let us estimate the speed of operation and the hardware requirement of the SPDF.

Three types of elements are interconnected, i.e., parallel adders, PROM's or ROM's, and storage registers.

For recursive filters, as shown in Section III, one addition, one multiplication, and one storage in a register must be achieved in every cycle. The cycle time is either  $(2t_a + t_m + t_r)$  for split ROM multipliers or  $(t_a + t'_m + t_r)$  for ROM multipliers with linear address codes of length  $B \leq 12$ . Since  $t'_m > t_m$  the total cycle time may be about the same in the two cases. Since a fully parallel implementation is used, the clock frequency is the same as the sample frequency of the input signal. A higher order digital filter may simply be obtained by connecting in cascade sections, as in Fig. 3, where all the storage registers are driven by the same unique clock signal. Another solution would be to realize the filter of Fig. 2 directly as an  $N$ th-order filter.

In Table I are shown some actual values for a second-order filter section processing 10, 12, and 16 bits including a sign bit with split ROM multipliers in TTL Bipolar technology. A block

diagram of the hardware required for the 16-bit filter is represented in Fig. 5.

## VIII. CONCLUSION

The conditions imposed on digital filters allowing the realization of maximum processing speed for a minimum hardware complexity have been investigated. A general configuration for digital filters using stored products in read only memory elements (ROM's) has been proposed. With this configuration high speed (i.e., 10 MHz or higher word rates are achieved) with a simple hardware implementation requiring standard devices and only a single clock signal for driving the storage registers. Two methods for the reduction of the required memory capacity have been proposed.

Another important property of the SPDF is the elimination of the errors due to finite coefficient wordlength [4]. This make the SPDF a very attractive method for achieving high speed as well as high accuracy in digital signal processing.

## REFERENCES

- [1] A. Peled and B. Liu, "A new hardware realization of digital filters," *IEEE Trans. Acoust. Speech, Signal Processing*, vol. ASSP-22, pp. 456-462, Dec. 1974.
- [2] O. Monkewich and W. Steenaart, "Companding for digital filters" in *Proc. 1975 IEEE ISCAS*, pp. 68-71.
- [3] ——, "Stored product digital filtering with non-linear quantization" in *Proc. 1976 IEEE ISCAS*, pp. 157-160.
- [4] W. Steenaart, D. Dubois, and O. Monkewich, "Stored-product digital filtering, structures, potential and applications" in *European Conf. on Circuit Theory and Design*, Aug. 25-28, 1981, The Hague, The Netherlands) Conf. Proc., pp. 118-126.
- [5] A. Fettweis, "A general theorem for signal-flow networks, with applications," *Arch. Elektron. Uebertrag.*, vol. 25, pp. 557-561, 1971.
- [6] J. A. Ludwig, "A 50kbit Schottky cell bipolar read only memory," *IEEE J. Solid-State Circuits*, vol. SC-15, pp. 816-820, Oct. 1980.