

# Computer Aided Digital Design

- Hitesh Pranav

# Unit - 1 Combinational Logic Design

Levels of abstraction for an electronic computing system

- The critical technique for managing complexity is abstraction
- Abstraction : hiding details which aren't important
- System can be viewed from many different levels of abstraction
- 9 levels are there

Application Software ⇒ Uses the facilities provided by operating system to solve a problem for the user



Operating System ⇒ Handles low level details like accessing hard drive or managing memory



Architecture ⇒ Particular architecture can be implemented by one of many different microarchitectures with different price / performance / power trade-offs



Microarchitecture ⇒ Links logic & architecture levels of abstraction

Involves combining logic elements to execute the instructions defined by architecture



Logic ⇒ Complex Structures like adders / memories are built from digital circuits



Digital Circuits ⇒ Logic gates restrict voltages to discrete ranges which are indicated by 0 & 1



Analog Circuits ⇒ Devices are assembled to create components like amplifiers



Devices ⇒ System is constructed from electronic devices called transistors



Physics ⇒ Motion of electrons



## Number System

→ Decimal Numbers

→ Referred as base 10

$$\text{ex: } 9742_{10} = 9 \times 10^3 + 7 \times 10^2 + 4 \times 10^1 + 2 \times 10^0$$

Nine Thousands      Seven Hundreds      Four Tens      Two Ones

→ Binary Numbers

→ Bits represents one of 2 values, 0 & 1 joined together to form binary numbers

→ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048... column weights

$$\text{ex: } 10110_2 = 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 22_{10}$$

one sixteen      no eight      one four      one two      no one

## → Decimal to Binary

ex:  $84_{10}$  → Binary ?

### Method 1

$$\begin{array}{r} 84_{10} \\ 84 - 64 = 20 \\ 20 - 16 = 4 \\ \hline 1010100_2 \end{array} \quad \begin{array}{r} 64 \times 1 \\ 16 \times 1 \\ 4 \times 1 \end{array}$$

### Method 2

$$\begin{array}{r} 84 \\ 2 | 84 \quad 0 \\ 2 | 42 \\ 2 | 21 \\ 2 | 10 \\ 2 | 5 \\ 2 | 2 \\ 2 | 1 \\ \hline 0 \end{array} \Rightarrow 1010100_2$$

## → Hexadecimal Numbers

- Uses digits 0 to 9 and letters A - F
- Columns in base 16 - 1, 16, 256, 4096 ...

|          |          |
|----------|----------|
| 0 - 0000 | 8 - 1000 |
| 1 - 0001 | 9 - 1001 |
| 2 - 0010 | A - 1010 |
| 3 - 0011 | B - 1011 |
| 4 - 0100 | C - 1100 |
| 5 - 0101 | D - 1101 |
| 6 - 0110 | E - 1110 |
| 7 - 0111 | F - 1111 |

## → Hexadecimal to Binary & Decimal

ex:  $2ED_{16}$

$$\begin{array}{l} 2_{16} = 0010 \\ E_{16} = 1110 \\ D_{16} = 1101 \end{array} \quad \left. \right\} \quad 2ED_{16} = 001011101101_2$$

$$2ED_{16} = 2 \times 16^2 + E \times 16^1 + D \times 16^0 = 749_{10}$$

Two  
Two hundred  
fifty six's      Fourteen  
Sixteens      Thirteen  
                       Ones

## → Binary to Hexadecimal

ex:  $1111010_2$  → Hexadecimal ?

Read from right

$$\begin{array}{r} 0111 \quad 1010 \\ \downarrow \quad \downarrow \\ 7 \quad A \end{array} \Rightarrow 7A_{16}$$

## → Decimal to Hexadecimal

ex:  $333_{10}$  → Hexadecimal ?

$$\begin{array}{r} 333 \quad 256 \times 1 \\ 333 - 256 = 77 \quad 16 \times 4 \\ 77 - 64 = 13 \quad 14 D_{16} \end{array}$$

## → Bytes

- Group of 8 bits is a byte
- $2^8 = 256$  possibilities
- Size of objects in computer usually measured in bytes

## → Nibble

- Group of 4 bits is a nibble
- $2^4 = 16$  possibilities
- One hexadecimal digit stores 1 nibble
- Two hexadecimal digits store 1 byte

Q.  $(41.6875)_{10} = ?$

A.

$$\begin{array}{r} 2 | 41 \\ 2 | 20 \\ 2 | 10 \\ 2 | 5 \\ 2 | 2 \\ 2 | 1 \\ \hline 0 \end{array}$$

$$\begin{array}{l} 0.6875 \times 2 = 1.375 \rightarrow 1 \\ 0.375 \times 2 = 0.75 \rightarrow 0 \\ 0.75 \times 2 = 1.5 \rightarrow 1 \\ 0.5 \times 2 = 1 \rightarrow 1 \end{array}$$

$$(41.6875)_{10} = (101001.1011)_2$$

<https://www.rapidtables.com/convert/number/decimal-to-binary.html?x=41.6875>

use for decimal to binary conversion checking

$2^{10} = 1024 \Rightarrow 1 \text{ Kilo}$   
 $2^{20} = 1 \text{ million} \Rightarrow 1 \text{ Mega}$   
 $2^{30} = 1 \text{ billion} \Rightarrow 1 \text{ Giga}$   
 1024 bytes = 1 KB → Most used  
 1024 bits = 1 Kb  
 Communication Speed in bits/sec

## → Binary Addition

### → Simple rules

$$0 + 0 = 0$$

$$1 + 0 = 1$$

$$0 + 1 = 1$$

$$1 + 1 = 0 \text{ with 1 Carry}$$

$$1 + 1 + 1 = 1 \text{ with 1 Carry}$$

ex:  $(0\ 1\ 1\ 1)_2 + (0\ 1\ 0\ 1)_2$

carry  
 $\begin{array}{r} 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ \hline 1 & 1 & 0 & 0 \end{array}$

$$\Rightarrow 7 + 5 = 12$$

## → Addition with overflow

→ Addition is said to overflow if result is too big to fit in available digits

ex: carry  
 $\begin{array}{r} 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ \hline 1 & 0 & 0 & 1 & 0 \end{array}$

$$13 + 5 = 18$$

## → Signed Binary Numbers

→ Several schemes exist to represent binary numbers but the 2 most widely used are

i) Sign / Magnitude Numbers

ii) Two's Complement

## → Sign / Magnitude Number

→ Uses most significant bit as the sign & remaining  $N-1$  bits as magnitude

→ 0 indicates positive

1 indicates negative

ex: 5 & -5  
 $\begin{array}{c} 0101 \\ \swarrow \quad \searrow \\ 1101 \end{array}$

## → Two's Complement

→ Identical to unsigned binary numbers except that most significant bit position has a weight of  $-2^{N-1}$  instead of  $2^{N-1}$

→ Process consists of inverting all of the bits in the number & then adding 1 to least significant bit position

ex: To get  $-2_{10}$ , invert  $2_{10}$  and add 1

$$2_{10} = 0010_2$$

$$\text{invert} \Rightarrow 1101_2$$

$$\text{add 1} \Rightarrow 1110_2 = -2_{10}$$

ex:  $3_{10} - 5_{10}$

$$= 3_{10} + (-5)_{10}$$

$$3_{10} = 0011$$

$$5_{10} = 0101 \Rightarrow -5_{10} = 1010 + 1 = 1011$$

$$3_{10} + (-5)_{10} = \begin{array}{r} 0011 \\ 1011 \\ \hline 1110 \end{array} = -2_{10}$$

$$\begin{aligned} \text{ex: } -7_{10} + 7_{10} \\ = 1001_2 + 0111_2 = 10000_2 \end{aligned}$$

5th bit is discarded, leaving correct 4 bit result  $0000_2$

- N-bit two's complement numbers represent one of  $2^N$  possible values but the values are split b/w +ve & -ve numbers
- So, a 4 bit two's complement number represents 16 values: -8 to 7 in general  $[-2^{N-1}, 2^{N-1}-1]$
- Adding 2 N-bit +ve numbers or -ve numbers may cause overflow if result  $> 2^{N-1}-1$  or result  $< -2^{N-1}$   
But adding a +ve number to a -ve number never causes overflow

$$\text{ex: } 4_{10} + 5_{10} = 0100_2 + 0101_2 = 1001_2 = 9_{10} \text{ or } -7_{10}$$

The result overflows range of 4-bit +ve two's complement numbers, producing an incorrect negative result. If computation had been done using 5 or more bits, the result  $01001_2 = 9_{10}$  would have been correct

- When a two's complement number is extended to more bits, the sign bit must be copied into the most significant bit positions. This is called **sign extension**

## Comparison of Number System

| System           | Range                         |
|------------------|-------------------------------|
| Unsigned         | $[0, 2^N - 1]$                |
| Sign / Magnitude | $[-2^{N-1} + 1, 2^{N-1} - 1]$ |
| Two's Complement | $[-2^{N-1}, 2^{N-1} - 1]$     |

- In digital electronics, a circuit is a network that processes discrete-valued variables
- 



→ A black box with one or more discrete-valued input terminals one or more discrete-valued output terminals a functional specification describing the relationship b/w inputs & outputs a timing specification describing the delay b/w inputs changing and outputs responding

- Circuits have nodes & elements

An element itself is a circuit with inputs, outputs & specification

A node is a wire, whose voltage conveys a discrete-valued variable



## → Combinational Circuit

→ For a circuit to be combinational:

→ Every circuit element is itself combinational

→ Every node of circuit is designated as an input to the circuit or connects to exactly one output terminal of a circuit element

→ The circuit contains no cyclic paths

ex:



## → Multilevel Combinational Logic

→ Logic in sum-of-products form is called two-level logic because it consists of literals connected to a level of AND gates connected to a level of OR gates

→ Hardware reduction:

→ Some logic functions require enormous amount of hardware when built using two-level logic

→ 8 input XOR using 2 level techniques requires 128 eight input AND gates and one 128 - input OR gate for a two-level sum-of-products implementation but better to use a tree of two-input XOR gates

## → Bubble pushing

→ Shortcut method for forming equivalent logic circuits, based on De-Morgan's Theorem

$$\overline{AB} = \overline{A} \cdot \overline{B}$$

$$\overline{A+B} = \overline{A} \cdot \overline{B}$$

$$\overline{A \cdot B} = \overline{A} + \overline{B}$$



$$= \overline{A \cdot B} \cdot \overline{A \cdot B \cdot C} = \overline{A \cdot B} \cdot \overline{C} = \overline{A \cdot B} + \overline{C} = \overline{A \cdot B \cdot C} + \overline{D}$$

## → X's and Z's

→ Boolean algebra is limited to 0's and 1's, however, real circuits can also have illegal & floating values, represented symbolically by X and Z

## → Illegal Value : X

→ X indicates that the circuit node has unknown or illegal value

→ Commonly appears if it is driven to both 0 and 1 at same time



→ This is called contention which is considered error & must be avoided

Usually voltage on a contented node is between 0 and V<sub>DD</sub>

It is often, but not always, in forbidden zone

→ Contention also causes large amounts of power to be flow between the gates & circuit getting hot & possibly damaged

→ X values are also sometimes used by circuit simulators to indicate an uninitialized value and digital designers also use 'X' to indicate "don't care" values in truth tables

## → Floating value : Z

→ Z indicates node is being driven neither HIGH nor LOW

which means node is said to be floating, high impedance, or high Z

→ In reality, floating node might be 0, 1 or at some voltage in between, depending on the history of the system

→ A floating node doesn't always mean there is an error in the circuit

→ One common way to produce a floating node is to forget to connect a voltage to a circuit input, or to assume that an unconnected input is same as an input with the value of 0.

→ Tristate buffer has 3 outputs : 0, 1, z



| A | E | Y |
|---|---|---|
| 0 | 0 | Z |
| 1 | 0 | Z |
| 0 | 1 | 0 |
| 1 | 1 | 1 |



| A | E | Y |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | Z |
| 1 | 1 | Z |

→ Commonly used on busses connecting multiple chips

→ For a system, each chip can connect to a shared memory bus using tristate buffers but only one chip is allowed to assert its signal to drive a value onto the bus at a time.

Other chips must produce floating values to avoid contention.

ex:



## Karnaugh Maps

→ They are graphical method for simplifying Boolean equations

3 Input function:

a) Truth Table

| A | B | C | Y |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 |

b) K-map



$$Y = \overline{A} \overline{B} \overline{C} + \overline{A} \overline{B} C$$

$$= \overline{A} \overline{B}$$

c) K-map showing minterms



- Simply circle all the rectangular block of 1's in the map, using fewest no. of circles and each circle should be as large as possible, then read off the implicants that were circled
- A boolean equation is minimized when it is written as sum of the fewest no. of prime implicants and each circle on K-map represents a prime implicant
- Each circle must span a rectangular block that is a power of 2 (1, 2, 4, 8)

Q. Suppose we have the function  $Y = F(A, B, C)$  with K-map shown, Minimise the equation using the K-map



A.



$$A'B' + AC' + AB' = \underline{\underline{AC' + B'}}$$

prime implicants

Use for checking if your answer is correct or not  
<http://www.32x8.com/index.html>

Q.  $F(A, B, C) = \Sigma(0, 1, 5)$

A.



$$\Rightarrow \overline{AB} + \overline{BC}$$

Q.  $F(A, B, C) = \Sigma(0, 1, 5, 7)$

A.



$$= \overline{AB} + AC$$

## ex: SEVEN SEGMENT DISPLAY DECODER

→ 7-segment decoder takes a 4-bit data input  $D_{3:0}$  and produces 7 outputs to control LEDs to display digit from 0 to 9. The 7 outputs are often called segments 'a' to 'g', or  $S_a - S_g$ .



Q. Write truth table for outputs & use K-map to find boolean equation for outputs  $S_a \Delta S_b$  (Assume illegal input values (10-15) produce blank readout)

A.

| $D_{3:0} (D_3 D_2 D_1 D_0)$<br>Input | $S_a$ | $S_b$ | $S_c$ | $S_d$ | $S_e$ | $S_f$ | $S_g$ |
|--------------------------------------|-------|-------|-------|-------|-------|-------|-------|
| 0000 - 0                             | 1     | 1     | 1     | 1     | 1     | 1     | 0     |
| 0001 - 1                             | 0     | 1     | 1     | 0     | 0     | 0     | 0     |
| 0010 - 2                             | 1     | 1     | 0     | 1     | 1     | 0     | 1     |
| 0011 - 3                             | 1     | 1     | 1     | 1     | 0     | 0     | 1     |
| 0100 - 4                             | 0     | 1     | 1     | 0     | 0     | 1     | 1     |
| 0101 - 5                             | 1     | 0     | 1     | 1     | 0     | 1     | 1     |
| 0110 - 6                             | 1     | 0     | 1     | 1     | 1     | 1     | 1     |
| 0111 - 7                             | 1     | 1     | 1     | 0     | 0     | 0     | 0     |
| 1000 - 8                             | 1     | 1     | 1     | 1     | 1     | 1     | 1     |
| 1001 - 9                             | 1     | 1     | 1     | 0     | 0     | 1     | 1     |
| others (10-15)                       | 0     | 0     | 0     | 0     | 0     | 0     | 0     |



$$S_a = \overline{D}_3 D_1 + D_3 \overline{D}_2 \overline{D}_1 + \overline{D}_2 \overline{D}_1 \overline{D}_0 + \overline{D}_3 D_2 D_1$$

$$S_b = \overline{D}_3 \overline{D}_2 + \overline{D}_3 \overline{D}_1 \overline{D}_0 + \overline{D}_3 D_1 D_0 + D_3 \overline{D}_2 \overline{D}_1$$

## → Karnaugh maps: Don't cares

- They are indicated by the symbol X, which means entry can be either 0 or 1
- Don't cares also appear in truth table outputs where output value is unimportant or the corresponding input combination can never happen
- They can be treated as either 0's or 1's allowing for logic minimization
- They can be circled if they help & don't have to be circled if not helpful.

ex: SEVEN-SEGMENT DISPLAY DECODER WITH DON'T CARES



$$S_a = D_3 + D_2 + D_2 D_0 + \bar{D}_2 \bar{D}_0$$

$$S_b = \bar{D}_2 + \bar{D}_1 \bar{D}_0 + D_1 D_0$$

## → Combinational Building Blocks

- Combinational logic is often grouped into larger building blocks to build more complex systems
- This is an application of the principle of abstraction, hiding the unnecessary gate-level details to emphasize the function of the building block

## → Multiplexers

- They are the most commonly used combinational circuits

- They choose an output from among several possible inputs based on the value of a select signal

### → 2:1 Multiplexer



$$Y = D_0 \bar{S} + D_1 S$$



Schematic with tristate buffer

$$Y = D_0 \bar{S} + D_1 S$$

## → Wider Multiplexers

→ 4:1 Multiplexer has 4 data inputs and one output



Sum-of-products logic



Tristates



Hierarchical

→ Wider multiplexers, such as 8:1 & 16:1 multiplexers can also be built by expanding

→ In general,  $N:1$  multiplexer needs  $\log_2 N$  select lines

→ Multiplexer can be used as lookup tables to perform logic operations

| $S_0$ | $S_1$ | $y$   |
|-------|-------|-------|
| 0     | 0     | $D_0$ |
| 0     | 1     | $D_1$ |
| 1     | 0     | $D_2$ |
| 1     | 1     | $D_3$ |

$$\Rightarrow \bar{S}_0 \bar{S}_1 D_0 + \bar{S}_0 S_1 D_1 + S_0 \bar{S}_1 D_2 + S_0 S_1 D_3$$

Q. Show the implementation of two-input AND and XOR functions with 2:1 multiplexers

A.



→ 2-input XOR Gate



Q. Alyssa P. Hacker needs to implement the function  $Y = A\bar{B} + \bar{B}\bar{C} + \bar{A}BC$  to finish her senior project, but when she looks in her lab kit, the only part she has left is an 8:1 MUX. How does she implement the function?

A.

| A | B | C | Y |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 |



Q. Alyssa turns on her circuit one more time before the final presentation and blows up the 8:1 multiplexer. She begs her friends for spare parts and they gave her a 4:1 multiplexer and an inverter. Can she build her circuit with only these parts?

A.



→ Decoders

- A decoder has  $N$  inputs &  $2^N$  outputs. It asserts exactly one of its output depending on input combination
- Outputs are called one-hot because exactly one is 'hot' (HIGH) at a given time



| Decoder |                | A <sub>1</sub> | A <sub>0</sub> | Y <sub>3</sub> | Y <sub>2</sub> | Y <sub>1</sub> | Y <sub>0</sub> |
|---------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| 11      | Y <sub>3</sub> | 0              | 0              | 0              | 0              | 0              | 1              |
| 10      | Y <sub>2</sub> | 0              | 1              | 0              | 0              | 1              | 0              |
| 01      | Y <sub>1</sub> | 1              | 0              | 0              | 1              | 0              | 0              |
| 00      | Y <sub>0</sub> | 1              | 1              | 1              | 0              | 0              | 0              |

$$Y_0 \bar{A}_0 \bar{A}_1 + Y_1 A_0 \bar{A}_1 + Y_2 \bar{A}_0 A_1 + Y_3 A_0 A_1$$

- an  $N: 2^N$  decoder can be constructed from  $2^N$  N-input AND gates

→ Timing

- An output takes time to change in response to an input change
- A timing diagram portrays the transient response of the buffer circuit when an input changes



Blue arrow indicates rising edge of  $Y$  that is caused by rising edge of  $A$ . The delay is measured from 50% point of input signal ( $A$ ) to 50% point of output signal ( $Y$ ) b/w LOW & HIGH changes

- Combinational logic is 2 types:

1) Propagational delay ( $t_{pd}$ ) -  
Max time from when an input changes until output(s) reach their final value

2) Contamination delay ( $t_{cd}$ ) -  
Min time from when an input changes until output(s) start to change its value



- Critical path is the longest therefore the slowest path
- Short path is the shortest therefore the fastest path

- Propagation delay of a combinational circuit is sum of propagation delays through each element on circuit path
- Contamination delay of a combinational circuit is sum of contamination delays through each element on circuit path

ex:



- Critical Path
- Short Path



Short Path

$$t_{cd} = t_{cd\text{-AND}}$$

Critical Path

$$t_{pd} = 2t_{pd\text{-AND}} + t_{pd\text{-OR}}$$

- Q. Ben Bitdiddle needs to find the propagation delay and contamination delay of the circuit shown in Figure. According to his data book, each gate has a propagation delay of 100 picoseconds (ps) and a contamination delay of 60 ps.

A. Solution:

- Ben begins by finding the critical path and the shortest path through the circuit.
- Hence,  $t_{pd}$  is three times the propagation delay of a single gate, or 300 ps.
- There are only two gates in the shortest path, so  $t_{cd}$  is 120 ps.



### CONTROL-CRITICAL VS. DATA-CRITICAL

Q. Compare the worst-case timing of the three four-input multiplexer designs shown in Figure. Table lists the propagation delays for the components. What is the critical path for each design? Given your timing analysis, why might you choose one design over the other?



| Gate                      | $t_{pd}(\text{ps})$ |
|---------------------------|---------------------|
| NOT                       | 30                  |
| 2-input AND               | 60                  |
| 3-input AND               | 90                  |
| 4-input OR                | 90                  |
| inverter ( $A$ to $Y$ )   | 30                  |
| inverter (enable to $Y$ ) | 3.5                 |

A.



$$\begin{aligned} t_{pd-sy} &= t_{pd-\text{INV}} + t_{pd-\text{AND3}} + t_{pd-\text{OR4}} \\ &= 30 + 80 + 90 \\ &= 200 \text{ ps} \end{aligned}$$

$$\begin{aligned} t_{pd-dy} &= t_{pd-\text{AND3}} + t_{pd-\text{OR4}} \\ &= 170 \text{ ps} \end{aligned}$$

$$\max(t_{pd-sy}, t_{pd-dy}) = 200 \text{ ps} \rightarrow \text{Worst of the two}$$



$$\begin{aligned} t_{pd-sy} &= t_{pd-\text{INV}} + t_{pd-\text{AND2}} + t_{pd-\text{TR1-sy}} \\ &= 30 + 60 + 35 \\ &= 125 \text{ ps} \end{aligned}$$

$$\begin{aligned} t_{pd-dy} &= t_{pd-\text{TR1-ay}} \\ &= 50 \text{ ps} \end{aligned}$$



$$t_{pd-S0y} = t_{pd-\text{TR1-sy}} + t_{pd-\text{TR1-ay}} = 85 \text{ ps}$$

$$t_{pd-dy} = 2 t_{pd-\text{TR1-ay}} = 100 \text{ ps}$$

→ Best choice not only depends on critical path & input arrival times, but also power, cost, availability of parts etc.,

## Glitches

- Usually single input transition causes single output transition, but multiple output transitions are also possible, and are called Glitches (or) Hazards
- Glitches don't actually cause any harm, but we must recognize them



when  $A \& C = 0$  and  $B$  transitions from 1 to 0  
n<sub>2</sub> falls before n<sub>1</sub> rises

|   | AB | 00 | 01 | 11 | 10 |
|---|----|----|----|----|----|
| C | 0  | 1  | 0  | 0  | 0  |
|   | 1  | 1  | 1  | 1  | 0  |



→ If the circuitry implementing one of the prime implicants turns off before the circuitry of other prime implicant can turn on, there is a glitch

when n<sub>1</sub> rises, Y returns to 1  
Y initially starts at 1, and ends at 1, but glitches to 0  
We can avoid the glitches by adding another gate

In the previous example,

to fix it, we add another covering prime implicant boundary  
adding a redundant term and additional hardware

(NOTE: Glitches cannot be FIXED by adding hardware)



# Unit-2 Sequential Logic Design

- Outputs of sequential logic depend on both current & previous inputs  
Hence, Sequential Logic has memory
- It might explicitly remember certain previous inputs or it might distill previous inputs into smaller ground of information called state of system
- State of digital sequential circuit is set of bits called state variables containing info about past necessary to explain future behaviour

## Bistable Element

- A fundamental building block of memory with 2 stable states
- Consists of pair of inverters connected in a loop



- Inverters are cross-coupled (input of  $I_2$  = Output of  $I_1$  & vice-versa)



## SR Latch

- Simplest sequential circuit which is composed of 2 cross coupled NOR gates

| S | R | Q          | $\bar{Q}$        |
|---|---|------------|------------------|
| 0 | 0 | $Q_{prev}$ | $\bar{Q}_{prev}$ |
| 0 | 1 | 0          | 1                |
| 1 | 0 | 1          | 0                |
| 1 | 1 | 0          | 0                |

(Here  $\bar{Q}$  is not the inverse of  $Q$ )



- If either NOR gates see one TRUE (1) input, they produce FALSE (0) output
- To SET means to make it TRUE ( $Q=1$  &  $\bar{Q}=0$ )
- To RESET means to make it FALSE ( $Q=0$  &  $\bar{Q}=1$ )
- Turning both off means the previous inputs are remembered
- Turning both on simultaneously doesn't make sense, hence impossible

## D Latch

- D Latch solves the problem of SR Latch
- Data Input (D) controls what next state should be
- Clock Input (CLK) controls what state should change

→

| CLK | D | Q          | $\bar{Q}$        |
|-----|---|------------|------------------|
| 0   | X | $Q_{prev}$ | $\bar{Q}_{prev}$ |
| 1   | 0 | 0          | 1                |
| 1   | 1 | 1          | 0                |

$$D = S$$

$$\bar{D} = R$$



(Also called transparent latch / level-sensitive latch)



## D Flip-Flop

- Built from 2 back-to-back D Latches controlled by complementary clocks
- First Latch, L1 → Master
- Second Latch, L2 → Slave
- Node between them → N1

→

| CLK | D | Q          | $\bar{Q}$        |
|-----|---|------------|------------------|
| 0   | X | $Q_{prev}$ | $\bar{Q}_{prev}$ |
| ↑   | 0 | 0          | 1                |
| ↑   | 1 | 1          | 0                |

(Also called master-slave flip flop, edge-triggered flip flop or positive edge triggered flip flop)



- Q. Ben Bitdiddle applies the D and CLK inputs shown to D Latch & D Flip Flop. Determine the output, Q, for both devices

A.



- Q. How many transistors are needed to build D-Flip flop & D-latch?

A. S-R latch is made of 2 NOR Gates  $\Rightarrow 2 \times 4 = 8$  Transistors

AND Gate is made of NAND and NOR Gate  $\Rightarrow 4 + 2 = 6$  Transistors

So, D Latch is made of S-R Latch, 2 AND gates, NOT Gate  $\Rightarrow 8 + (2 \times 6) + 2 = 22$

D Flip flop is made of 2 D latches, 1 NOT gate  $\Rightarrow (2 \times 22) + 2 = 46$  Transistors

## Register



## Enabled Flip Flop

- It adds another input EN or ENABLED to determine if data is loaded on clock edge
- When EN is true, behaves like ordinary D flip flop
- When EN is false, ignores clock & retains its state
- It is used to load a new value into a flip-flop only some time instead of every clock edge



## Resettable Flip flop

- It adds another input RESET
- When RESET is false, behaves like ordinary D flip flop
- When RESET is true, ignores D & resets output to 0
- Used to force a known state (0) into all flip-flops when turned on
- 2 Types : SYNCHRONOUS and ASYNCHRONOUS

↳ Resets themselves only on rising edge of CLK



If  $\overline{\text{RESET}} = \text{FALSE} \Rightarrow 0$  is forced on flip flop input

If  $\overline{\text{RESET}} = \text{TRUE} \Rightarrow 0$  is passed on flip flop input

↳ Resets as RESET becomes TRUE independent of CLK



Q. Alyssa P Hacker encounters 3 misbegotten inverters who have tied themselves in a loop. The 3rd inverter's output is fed back to 1st inverter. Each inverter has a propagation delay of 1ns. Determine what the circuit does.



- A. If initially  $X=0$ , then  $Y=1$  &  $Z=0$  and  $X=1$  which is inconsistent. The circuit is unstable (or) astable.



- Q. Ben Bitdiddle designed a new D latch claiming to be better than the one in the figure because it has fewer gates. The truth table is given, based on which he derived boolean equations.  $Q_{prev}$  is obtained by feeding the output  $Q$ .

| CLK | D | $Q_{prev}$ | $Q$ |
|-----|---|------------|-----|
| 0   | 0 | 0          | 0   |
| 0   | 0 | 1          | 1   |
| 0   | 1 | 0          | 0   |
| 0   | 1 | 1          | 1   |
| 1   | 0 | 0          | 0   |
| 1   | 0 | 1          | 0   |
| 1   | 1 | 0          | 1   |
| 1   | 1 | 1          | 1   |



$$Q = \text{CLK} \cdot D + \overline{\text{CLK}} \cdot Q_{prev}$$



Figure shows that the circuit has race condition that causes it to fail when certain gates are slower than others.

A.

If  $\text{CLK} = D = 1$ , Latch is transparent & passes D through to make  $Q = 1$ .

Now  $\text{CLK}$  falls, Latch should remember its old value, Keeping  $Q = 1$ .

However, suppose delay through inverter from  $\text{CLK}$  to  $\overline{\text{CLK}}$  is rather long compared to delays of AND and OR.

Then  $N_1$  &  $Q$  may both fall before  $\overline{\text{CLK}}$  rises, Then  $N_2$  will never rise &  $Q$  becomes stuck at 0.

## Synchronous Sequential Circuits

- Loops called cyclic paths are sequential circuits, in which outputs are fed back to inputs
- Usually they have undesirable races or unstable behaviour & analyzing them is very time consuming
- To avoid such problems, designers break cyclic paths by inserting registers which contain the state of system only at clock edge, so we say state is synchronized to clock
- If clock is sufficiently slow, so that inputs to all registers settle before next clock edge, all races are eliminated
- This way of adopting usage of registers is the proper definition of synchronous sequential circuits

## Rules of Synchronous Sequential Circuit

- 1) Every circuit element is either a register or combinational circuit
  - 2) At least one circuit element is a register
  - 3) All registers receive same clock signal
  - 4) Every cyclic path contains atleast 1 register
- Simplest synchronous sequential circuit = flip-flop → 1 input (D), 1 clock (CLK), 1 output (Q), 2 states (0,1)
- 2 common synchronous sequential circuits ⇒ Finite State Machines & Pipelines

Q. Which of the following are synchronous sequential circuit



- A.
- Combinational (No registers)
  - Simple Sequential
  - Neither Combinational nor Synchronous Sequential (Because Latch isn't register or combinational)
  - Synchronous Sequential (FSM)
  - Synchronous Sequential (FSM)
  - Neither Combinational nor Synchronous Sequential (Cyclic path without any register in path)
  - Synchronous Sequential (Pipeline)
  - Not Synchronous Sequential (2nd register receives different clock signal than 1st cuz of inverter delays)

## Finite State Machines

- A circuit with  $K$  registers can be in one of a finite number ( $2^K$ ) of unique states
- FSM has  $M$  inputs,  $N$  outputs &  $K$  bits of state
- Also receives a clock & optionally, reset signal
- FSM consists 2 blocks of combinational logic
  - next state logic
  - register
- On each clock edge, FSM advances to next state
- There are 2 classes of FSM's :
  - Moore Machines - Output depends only on current state of machines
  - Mealy Machines - Output depends on both current state & current inputs



Q. Consider a traffic light intersection, where a traffic light must be inverted. The rule is that they can't go in perpendicular paths and can only go straight from the signal.

A. We can install 2 traffic sensors,  $T_A$  &  $T_B$ . They must indicate TRUE if traffic is present FALSE if traffic is absent

We can also install 2 traffic lights,  $L_A$  &  $L_B$  which receive input to tell whether it should be green, yellow, red

So FSM has 2 inputs  $\rightarrow T_A$  &  $T_B$   
2 outputs  $\rightarrow L_A$  &  $L_B$



Let's provide a clock with 5-second period, lights change for every rising edge

Also a RESET button so a physical technician can control it to a known initial state

When system is RESET, lights are GREEN at  $L_A$  & RED at  $L_B$

It is repeated till there's no traffic detected at  $T_A$ .

When  $T_A$  is FALSE,  $L_A$  is YELLOW, and then from S0 to S1.

After S1, it goes to S2,  $L_A$  is red,  $L_B$  is green,

$T_B$  is passed till it is FALSE to which it goes to S3 where  $L_B$  is yellow, and finally returning to S0



| Current State S | Inputs T <sub>A</sub> | T <sub>B</sub> | Next State S'  |
|-----------------|-----------------------|----------------|----------------|
| S <sub>0</sub>  | 0                     | X              | S <sub>0</sub> |
| S <sub>0</sub>  | 1                     | X              | S <sub>1</sub> |
| S <sub>1</sub>  | X                     | X              | S <sub>2</sub> |
| S <sub>2</sub>  | X                     | 0              | S <sub>3</sub> |
| S <sub>2</sub>  | X                     | 1              | S <sub>2</sub> |
| S <sub>3</sub>  | X                     | X              | S <sub>0</sub> |



The state transition diagram is abstract,  
it uses {S<sub>0</sub>, S<sub>1</sub>, S<sub>2</sub>, S<sub>3</sub>} and {red, yellow, green}

But to build real circuit, we must use binary encodings

Like

| State          | Binary Encoding |
|----------------|-----------------|
| S <sub>0</sub> | 00              |
| S <sub>1</sub> | 01              |
| S <sub>2</sub> | 10              |
| S <sub>3</sub> | 11              |

  

| Output | Binary Encoding |
|--------|-----------------|
| Green  | 00              |
| Yellow | 01              |
| Red    | 10              |

→ Updated State Transition State using binary encodings

| Current State S <sub>1</sub><br>S <sub>0</sub> | Inputs T <sub>A</sub> T <sub>B</sub> | Next State S' <sub>1</sub><br>S' <sub>0</sub> |
|------------------------------------------------|--------------------------------------|-----------------------------------------------|
| 0 0                                            | 1 X                                  | 0 0                                           |
| 0 0                                            | 0 X                                  | 0 1                                           |
| 0 1                                            | X X                                  | 1 0                                           |
| 1 0                                            | X 0                                  | 1 1                                           |
| 1 0                                            | X 1                                  | 1 0                                           |
| 1 1                                            | X X                                  | 0 0                                           |

$$\Rightarrow S'_1 = \overline{S}_1 S_0 + S_1 \overline{S}_0 \overline{T}_B + S_1 \overline{S}_0 T_B = S_1 \overline{S}_0 + \overline{S}_1 S_0 = S_1 \oplus S_0$$

$$S'_0 = \overline{S}_1 \overline{S}_0 \overline{T}_A + S_1 \overline{S}_0 \overline{T}_B$$

→ Truth Table for Outputs

| Current State S <sub>1</sub> S <sub>0</sub> | Outputs L <sub>A1</sub> L <sub>A0</sub> L <sub>B1</sub> L <sub>B0</sub> |
|---------------------------------------------|-------------------------------------------------------------------------|
| 0 0                                         | 0 0 1 0                                                                 |
| 0 1                                         | 0 1 1 0                                                                 |
| 1 0                                         | 1 0 0 0                                                                 |
| 1 1                                         | 1 0 0 1                                                                 |

$$\Rightarrow L_{A1} = S_1 \overline{S}_0 + S_1 S_0 = S_1$$

$$L_{A0} = \overline{S}_1 S_0$$

$$L_{B1} = \overline{S}_1 \overline{S}_0 + \overline{S}_1 S_0 = \overline{S}_1$$

$$L_{B0} = S_1 S_0$$

## → Diagrammatic Representation



## → Timing Diagram



## State Encodings

- Like the previous example where we used arbitrary encodings, different choices would give different outputs.
- But we can get the set of possible encodings & select a reasonable one by using Computer-Aided Design
- There are 2 choices of state encoding

Binary Encoding ↘ ↗ One-hot Encoding

Each state is represented as binary number

Separate bit of state is used for each state & one bit is hot or TRUE at any time hence, the name

FSM with 3 states can be stored as 001, 010, 100 but this requires more flip-flops but output logic is simpler, hence less gates

Q. Design a counter that doesn't take any input, but has 1 output, such that it counts upto 3 and resets with binary & one-hot encoding



A

| Current State | Next State |
|---------------|------------|
| S0            | S1         |
| S1            | S2         |
| S2            | S0         |

| Current State | Output |
|---------------|--------|
| S0            | 1      |
| S1            | 0      |
| S2            | 0      |

| State | OHC      | BC    |
|-------|----------|-------|
| S0    | S2 S1 S0 | S1 S0 |
| S1    | S0 1 0   | 0 1   |
| S2    | 1 0 0    | 1 0   |

| $s'_2$ | $s'_1$ | $s'_0$ |
|--------|--------|--------|
| 0      | 1      | 0      |
| 1      | 0      | 0      |
| 0      | 0      | 1      |

$$\begin{aligned}
 s'_2 &= s_1 \\
 s'_1 &= s_0 \\
 s'_0 &= s_2 \\
 y &= s_0
 \end{aligned}
 \quad
 \begin{aligned}
 s'_1 &= \bar{s}_1 s_0 \\
 s'_0 &= \bar{s}_1 \bar{s}_0 \\
 y &= \bar{s}_1 \bar{s}_0
 \end{aligned}
 \quad
 \begin{array}{|c|c|} \hline s'_1 & s'_0 \\ \hline 0 & 1 \\ 1 & 0 \\ 0 & 0 \\ \hline \end{array}$$



### Moore & Mealy Machines

- In state transition diagrams for Moore machines, outputs labeled in circles
- In state transition diagrams for Mealy machines, outputs labeled on arcs

Q. Alyssa P Hacker owns a pet robotic snail with an FSM brain. The snail crawls from left to right along a paper tape containing a sequence of 0's & 1's. On each clock cycle, the snail crawls to the next bit. The snail smiles when the last 2 bits that it has crawled over are 01. Design the FSM to compute when the snail should smile. The input A is the bit underneath the snail's antennae. The output Y is TRUE when the snail smiles. Compare Moore & Mealy state machine designs. Sketch a timing diagram for each machine showing the input, states, & output as Alyssa's snail crawls along the sequence 010001101111:

A.

Moore:



| Current State<br>S | Input<br>A | Next State<br>S' |
|--------------------|------------|------------------|
| S0                 | 0          | S1               |
| S0                 | 1          | S0               |
| S1                 | 0          | S1               |
| S1                 | 1          | S2               |
| S2                 | 0          | S1               |
| S2                 | 1          | S0               |

In terms of state encodings,

| Current State<br>S <sub>1</sub> S <sub>0</sub> | Input<br>A | Next State<br>S' <sub>1</sub> S' <sub>0</sub> |
|------------------------------------------------|------------|-----------------------------------------------|
| 0 0                                            | 0          | 0 1                                           |
| 0 0                                            | 1          | 0 0                                           |
| 0 1                                            | 0          | 0 1                                           |
| 0 1                                            | 1          | 1 0                                           |
| 1 0                                            | 0          | 0 1                                           |
| 1 0                                            | 1          | 0 0                                           |



| Current State<br>S | Input<br>A | Next State<br>S' | Output<br>Y |
|--------------------|------------|------------------|-------------|
| S0                 | 0          | S1               | 0           |
| S0                 | 1          | S0               | 0           |
| S1                 | 0          | S1               | 0           |
| S1                 | 1          | S0               | 1           |

In terms of state encodings

| Current State<br>S | Input<br>A | Next State<br>S' | Output<br>Y |
|--------------------|------------|------------------|-------------|
| S0                 | 0          | S1               | 0           |
| S0                 | 1          | S0               | 0           |
| S1                 | 0          | S1               | 0           |
| S1                 | 1          | S0               | 1           |



## Procedure to design FSM

- 1) Identify inputs & outputs
- 2) Sketch state transition diagram
- 3) For Moore machine, write state transition table & write output table
- 4) For Mealy machine, write combined state transition & output table
- 5) Select state encodings
- 6) Write boolean equations for the next state & output logic
- 7) Sketch the circuit schematic

## Deriving an FSM from a Schematic

- Deriving the state transition diagram from a schematic follows nearly the reverse process of FSM design.
- This process can be necessary  
ex: reverse engineering somebody else's system
- Procedure to derive an FSM from its circuit
  - 1) Examine circuit by stating inputs, outputs & state bits
  - 2) Write next state & output tables
  - 3) Create next state & output tables
  - 4) Reduce the next state table to eliminate unreachable states
  - 5) Assign each valid state bit combination a name
  - 6) Rewrite next state & output tables with state names
  - 7) Draw state transition diagram
  - 8) State in words what the FSM does

**Q.** Alyssa P Hacker arrives home, but her keypad lock has been rewired and her old code no longer works. A piece of paper is taped to it showing the circuit diagram in Figure. Alyssa thinks the circuit could be a finite state machine & decides to derive the state transition diagram to see if it helps her get in the door.

**A.** input is  $A_1, A_0$ , output is Unlock  
Moore machine (Output depends on state bits)  
Next state & output equations  $\Rightarrow$

$$S'_1 = S_0 \bar{A}_1 A_0$$

$$S'_0 = \bar{S}_1 \bar{S}_0 A_1 A_0$$

Unlock =  $S_1$

Output Table  $\Rightarrow$

| $S_1$ | $S_0$ | Output |
|-------|-------|--------|
| 0     | 0     | 0      |
| 0     | 1     | 0      |
| 1     | 0     | 1      |
| 1     | 1     | 1      |



Next State table  $\Rightarrow$

| $S_1$ | $S_0$ | $A_1$ | $A_0$ | $S'_1$ | $S'_0$ |
|-------|-------|-------|-------|--------|--------|
| 0     | 0     | 0     | 0     | 0      | 0      |
| 0     | 0     | 0     | 1     | 0      | 0      |
| 0     | 0     | 1     | 0     | 0      | 0      |
| 0     | 0     | 1     | 1     | 0      | 1      |
| 0     | 1     | 0     | 0     | 0      | 0      |
| 0     | 1     | 0     | 1     | 1      | 0      |
| 0     | 1     | 1     | 0     | 0      | 0      |
| 0     | 1     | 1     | 1     | 0      | 0      |
| 1     | 0     | 0     | 0     | 0      | 0      |
| 1     | 0     | 0     | 1     | 0      | 0      |
| 1     | 0     | 1     | 0     | 0      | 0      |
| 1     | 0     | 1     | 1     | 0      | 0      |
| 1     | 1     | 0     | 0     | 0      | 0      |
| 1     | 1     | 0     | 1     | 1      | 0      |
| 1     | 1     | 1     | 0     | 0      | 0      |
| 1     | 1     | 1     | 1     | 0      | 0      |



Since  $S_{1:0} = 11$  can never come from previous states, it is unused & unreachable because of the FSM's design. So we can ignore it  
And for  $S_{1:0} = 10$ , next state is always  $S_{1:0} = 00$ , so don't cares are inserted

| $S_1$ | $S_0$ | $A_1$ | $A_0$ | $S'_1$ | $S'_0$ |
|-------|-------|-------|-------|--------|--------|
| 0     | 0     | 0     | 0     | 0      | 0      |
| 0     | 0     | 0     | 1     | 0      | 0      |
| 0     | 0     | 1     | 0     | 0      | 0      |
| 0     | 0     | 1     | 1     | 0      | 1      |
| 0     | 1     | 0     | 0     | 0      | 0      |
| 0     | 1     | 0     | 1     | 1      | 0      |
| 0     | 1     | 1     | 0     | 0      | 0      |
| 1     | 0     | x     | x     | 0      | 0      |

| $S_1$ | $S_0$ | O/P |
|-------|-------|-----|
| 0     | 0     | 0   |
| 0     | 1     | 0   |
| 1     | 0     | 1   |

Reduced Output Table



Reduce Next State Table

## Timing of Sequential Logic

- Sampling is the process where a flip-flop copies input to output on rising edge of clock
- A sequential element has an aperture time (some period of time) around the clock edge during which input must be stable for flip-flop to produce well-defined output.
- The aperture of sequential element is defined by setup time & hold time before & after clock edge

Static discipline limits to use logic levels within forbidden zone

Dynamic discipline limits to use logic levels outside forbidden zone

↳ When clock rises, the output may start to change after clock-to-Q contamination delay & must settle to final value within clock-to-Q propagation delay.

But for circuit to operate properly, there must be stabilized at least for some setup time before rising edge of clock.

Even after rising edge of clock, input must remain stable for some hold time

So, Dynamic Discipline states "inputs of synchronous sequential circuit must be stable during setup & hold aperture time around clock edge"

## System Timing

- Clock period (or) Cycle Time,  $T_c$ , is time b/w rising edges of a repetitive clock signal
- $f_c = \frac{1}{T_c}$  ⇒ Clock frequency

## Setup time Constraint

- Minimum Clock period  $\Rightarrow T_c \geq \text{Clock to Q Propagation Delay} + \text{Propagation Delay} + \text{setup delay} \rightarrow t_{\text{pd}} + t_{\text{pcq}} + t_{\text{setup}}$
- $t_{\text{pd}} \leq T_c - (t_{\text{pcq}} + t_{\text{setup}})$

↳ Setup Time Constraint

## Hold Time Constraint

- Clock to Q Contamination Delay + Contamination Delay  $\geq$  Hold Delay
  - $\hookrightarrow t_{\text{ccq}}$
  - $\hookrightarrow t_{\text{cd}}$
  - $\hookrightarrow t_{\text{hold}}$
- $\Rightarrow t_{\text{cd}} \geq t_{\text{hold}} - t_{\text{ccq}}$

↳ Hold Time Constraint

Q. Ben Bitdiddle designed the following circuit. According to the datasheets, Flip flops have  $t_{cq} = 30\text{ps}$ ,  $t_{pd} = 80\text{ps}$ ,  $t_{setup} = 50\text{ps}$ ,  $t_{hold} = 60\text{ps}$ . Each logic gate has  $t_{pd} = 40\text{ps}$  &  $t_{cd} = 25\text{ps}$ . Help Ben determine max clock frequency & whether any hold time violations could occur?

A.



Figure: Timing diagram:  
(a) general case,  
(b) critical path,  
(c) short path



$$T_c \geq t_{cq} + (\text{overall } t_{pd}) + t_{setup}$$

$$T_c \geq 80 + 3(40) + 50$$

$$T_c \geq 250\text{ ps}$$

$$f_c = \frac{1}{T_c} = \frac{1}{250\text{ p}} = 4 \times 10^9$$

$$t_{hold} \leq t_{cq} + t_{cd} = 30 + 25 = 55\text{ ps}$$

but given  $t_{hold} = 60\text{ ps}$ , Hence there is hold time variation

Q. How can the previous circuit be fixed?

A. We can add a buffer to increase gate delays

The critical path will be unaffected, So,  $f_c = 4\text{GHz}$

Short path will change,  $t_{hold} \leq t_{cq} + 2t_{cd} = 30 + 2(25) = 80\text{ps}$

This is after 60ps hold time has elapsed, so circuit operates correctly now.



# ~~~ CADD ~ U-3 ~~~

## Introduction

- For higher productivity, designers used a lower level of abstraction by using a CAD Tool to produce optimized gates
- Specifications are given in Hardware description Language

↓  
System Verilog      VHDL  
↓

## Bitwise Operators

- They act on single-bit signals or multi-bit buses

ex: input logic a;

ex: input logic [3:0] a;

- INVERTERS

System verilog ⇒

```
module inv(input logic [3:0] a,
            output logic [3:0] y);
    assign y = ~a;
endmodule
```

4 bit bus ⇒ Little Endian Order  
 3 2 1 0  
 ↑  
 MSB      LSB



→ Synthesized Circuit

- Usually endianness is irrelevant. It matters only for operators like addition etc.,

- LOGIC GATES

System verilog ⇒

```
module gates(input logic[3:0] a, b,
              output logic [3:0] y1, y2,
              y3, y4, y5);

    /* five different two-input logic
       gates acting on 4-bit buses */
    assign y1=a & b; // AND
    assign y2=a | b; // OR
    assign y3=a ^ b; // XOR
    assign y4=-(a & b); // NAND
    assign y5=-(a | b); // NOR
endmodule
```

&, |, ^, ~ are operators

a, b, y1, y2, y3, y4 are operands



## Comments and White Spaces

- Both the languages aren't worried about spaces, tabs, line breaks but for readability we use proper indenting
- For both languages,

Multiline Comments  $\Rightarrow$  `/* */` (Just like in C!)

Single Line Comments  $\Rightarrow$  `//`

- SystemVerilog is CASE-SENSITIVE
- VHDL is NOT CASE-SENSITIVE

## Reduction Operators

- Implies a multiple-input gate acting on a single bus
- EIGHT INPUT AND

System verilog  $\Rightarrow$

```
module and8(input logic [7:0] a,
             output logic y);
    assign y = a[7];
    // a[7] is much easier to write than
    // assign y = a[7] & a[6] & a[5] & a[4] &
    //         a[3] & a[2] & a[1] & a[0];
endmodule
```



- Analogous reduction operator exists for OR, XOR, NAND, NOR and XNOR

## Internal Variables

- Usually its better to write complex functions into intermediate steps

- For a FULL ADDER,

$$\left. \begin{array}{l} S(\text{sum}) = A \oplus B \oplus C \\ C(\text{carry}) = AB + AC_{in} + BC_{in} \end{array} \right\} \quad \begin{array}{l} P = A \oplus B \\ G = AB \end{array} \quad \begin{array}{l} \Rightarrow S = P \oplus C_{in} \\ C = G + PC_{in} \end{array}$$

internal variables

System verilog  $\Rightarrow$

```
module fulladder(input logic a, b, cin,
                  output logic s, cout);
    logic p, g;
    assign p = a & b;
    assign g = a & cin;
    assign s = p ^ cin;
    assign cout = a | (p & cin);
endmodule
```

Synthesized Circuit



## Operator Precedence

→ System Verilog precedence order (highest to lowest)

↓

- NOT ( $\sim$ )
- MUL (\*), DIV (/), MOD (%)
- PLUS (+), MINUS (-)
- Logical Left Shift (<<), Logical Right Shift (>>)
- Arithmetic Left Shift (<<<), Arithmetic Right Shift (>>>)
- Relative Comparison (<, <=, >, >=)
- Equality Comparison (==, !=)
- AND (&), NAND ( $\sim \&$ )
- XOR (^), XNOR ( $\sim ^$ )
- OR (!), NOR ( $\sim !$ )
- Conditional (?:)

## Numbers

→ Numbers can be binary, octal, decimal or hexadecimal

→ The format for declaring constants in System Verilog is

$\frac{N}{\downarrow}$   $\frac{B}{\downarrow}$   $\frac{\text{Value}}{\downarrow}$   
Size Base Value

ex:  $3'b110 \Rightarrow 3$  bits, base 2, value = 6

$8'b1010_1111 \Rightarrow 8$  bits, base 2, value = 171 (underscores are ignored, to break values into small chunks)

$8'b11 \Rightarrow 8$  bits, base 2, value = 3

$'b11 \Rightarrow ?$  bits, base 2, value = 3 (value stored = 000...0011)

$3'd6 \Rightarrow 3$  bits, base 10, value = 6

$6'o42 \Rightarrow 6$  bits, base 8, value = 34

$8'hAB \Rightarrow 8$  bits, base 16, value = 171

$42 \Rightarrow ?$  bits, base 10, value = 42

Read All Examples G



## Z's and X's

→ Z indicates floating value (Used in Tristate Buffer)  
X indicates invalid logic level

If bus is simultaneously driven to 0

& 1 by 2 enabled Tristate buffers

resulting X, indicating contention

| c | a | f |
|---|---|---|
| 0 | 0 | z |
| 0 | 1 | z |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

System verilog ⇒

```
module tristate(input logic [3:0] a,
                 input logic en,
                 output tri [3:0] y);
    assign y = en ? a : 4'bzzz;
endmodule
```

z = floating value

Synthesized Circuit



Logic signals can have only single driver, Tristate buses have multiple drivers, so declared as net  
1 driver is active & net takes that value tri trireg  
But when none are active, tri takes z, trireg retains previous value

system writes it as 4'bzzzz  
padding

→ How SystemVerilog deals with X's & Z's

| & | A | values of A |   |   |   |
|---|---|-------------|---|---|---|
| B |   | 0           | 1 | z | x |
| 0 |   | 0           | 0 | 0 | 0 |
| 1 |   | 0           | 1 | x | x |
| z |   | 0           | x | x | x |
| x |   | 0           | x | x | x |

Values of B

### Bit Swizzling

- It refers to manipulating the bits of a signal or group of signals for purposes of combining, selecting or rearranging them.
- We must operate on a subset of a bus (or) concatenate multiple signals together to form a wider bus

→  $\{n\}$  is used to repeat a signal/value n times

System verilog  $\Rightarrow$  `assign y = {c[2:1], {3{d[0]}}, c[0], 3'b101};`

→  $\{ \}$  concatenates busses

assume  $c = 3'b110$  &  $d = 1'b1$

then,  $y = 11-111-0-101$

$\downarrow \quad \downarrow \quad \downarrow \quad \downarrow$

$c[2:1] \quad 3\{d[0]\} \quad c[0] \quad 3'b101$

$y = 9'b111110101$

### Delays

- Delays can be specified in HDL to control the timing during simulation
  - Delays help in
    - Predicting Circuit Behaviour (we can simulate & find how fast a circuit will operate)
    - Debugging (Help understand cause & effect in complex designs)
  - Delays are ignored during synthesis of circuit
- Actual Delays in hardware depends on  $t_{pd}$  and  $t_{cd}$  specifications, not on numbers in HDL code

ex: take  $y = \bar{a}\bar{b}\bar{c} + a\bar{b}\bar{c} + a\bar{b}c$

→ Waveform with delays

System Verilog  $\Rightarrow$

```
timescale 1ns/1ps
module example(input logic a, b, c,
                output logic y);
    logic ab, ac, bc, abc;
    assign #1 ab=ab, ac=ac, bc=bc;
    assign #2 y=(ab&bc)&abc;
    assign #3 y=(ab&bc)&c;
    assign #4 y=x#1 | y#1 | y#1;
endmodule
```



→ timescale unit / precision  
↳ 1ns ↳ 1ps

# is used to indicate no. of units of delays

## Truth Tables with Don't Cares

→ They are used for logic simplification

System Verilog →

logic is evaluated whenever  
any of its input changes  
(always combinational)

```
module priority_casez(input logic[3:0] a,
                      output logic[3:0] y);
  always_comb
    casez(a)
      4'b1???: y = 4'b1000;
      4'b01???: y = 4'b0100;
      4'b001?: y = 4'b0010;
      4'b0001: y = 4'b0001;
      default: y = 4'b0000;
    endcase
endmodule
```

casez statement acts like a case  
statement except that it also recognizes  
? as 'Don't care'

So, by logic,

if  $a[3] = 1 \Rightarrow y = 4'b1000$   
else if  $a[3:2] = 01 \Rightarrow y = 4'b0100$   
else if  $a[3:1] = 001 \Rightarrow y = 4'b0010$   
else if  $a[3:0] = 0001 \Rightarrow y = 4'b0001$   
else  $y = 4'b0000$

## Blocking and Nonblocking Assignments

→ Blocking and Nonblocking Assignments Guidelines

1) Use always-ff @ (posedge clk) and non-blocking assignments to model synchronous sequential logic

```
always_ff@(posedge clk)
begin
  #1 void; // nonblocking
  q = #1 z; // nonblocking
end
```

( $\leq \rightarrow$  non-blocking)

2) Use continuous assignments to model simple combinational logic

```
assign y = s ? d1 : d0;
```

3) Use always-comb & blocking assignments to model more complicated combinational logic where the always statement is helpful

```
always_comb
begin
  p = a * b; // blocking
  g = a & b; // blocking
  s = p ^ cin;
  cout = g | (p & cin);
end
```

( $= \rightarrow$  blocking)

4) Don't make assignments to the same signal in more than one always statement or continuous assignment statement

- Combinational circuits are modelled better using blocking assignments
- Sequential circuits are modelled better using non-blocking assignments

### CORRECT MODELLING USING BLOCKING ASSIGNMENT

ex:

```
module fulladder(input logic a, b, cin,
                  output logic s, cout);
    logic p, g;
    always_comb
    begin
        p = a & b; // blocking
        g = a & b; // blocking
        s = p ^ cin; // blocking
        cout = g || (p & cin); // blocking
    end
endmodule
```

Initially everything is 0  
After some time,  $a$  changes to 1,  $always$  gets triggered

Order of Evaluation :

- 1)  $p \leftarrow 1 \oplus 0 = 1$
- 2)  $g \leftarrow 1 \cdot 0 = 0$
- 3)  $s \leftarrow 1 \oplus 0 = 1$
- 4)  $cout \leftarrow 0 + 1 \cdot 0 = 0$

### NON-RECOMMENDED MODELLING USING NON-BLOCKING ASSIGNMENT

```
// nonblocking assignments (not recommended)
module fulladder(input logic a, b, cin,
                  output logic s, cout);
    logic p, g;
    always_comb
    begin
        p = a & b; // nonblocking
        g = a & b; // nonblocking
        s = p ^ cin;
        cout = g || (p & cin);
    end
endmodule
```

In contrast if we use non-blocking for combinational,  
let  $a$  rise from 0 to 1,  $b \& cin = 0$

- 1)  $p \leftarrow 1 \oplus 0 = 1$
- 2)  $g \leftarrow 1 \cdot 0 = 0$
- 3)  $s \leftarrow 0 \oplus 0 = 0$
- 4)  $cout \leftarrow 0 + 0 \cdot 0 = 0 \rightarrow \text{which is NOT TRUE!}$

But  $p$  changes from 0 to 1, triggering  $always$  and evaluates second time,

- 5)  $p \leftarrow 1 \oplus 0 = 1$
- 6)  $g \leftarrow 1 \cdot 0 = 0$
- 7)  $s \leftarrow 1 \oplus 0 = 1$

8)  $cout \leftarrow 0 + 1 \cdot 0 = 0 \rightarrow \text{Eventually reaching TRUE ans.}$

Even though same hardware,  $always$  is evaluated twice slowing down the simulation  
Also, HDL produces wrong result if intermediate values are forgotten in sensitivity list.

### CORRECT MODELLING USING NON-BLOCKING ASSIGNMENT

ex:

```
module sync(input logic clk,
            input logic d,
            output logic q);
    logic n1;
    always_ff @(posedge clk)
    begin
        n1 <= d; // nonblocking
        q <= n1; // nonblocking
    end
endmodule
```

On rising edge of  $clk$ ,  $d$  is copied to  $n1$  at same time as  $n1$  is copied to  $q$

Initially  $\Rightarrow d = 0, n1 = 1, q = 0$

Later  $\Rightarrow n1 \leftarrow d = 0$

$q \leftarrow n1 = 0$

### NON-RECOMMENDED MODELLING USING BLOCKING ASSIGNMENT

```
// Bad implementation of a synchronizer using blocking
// assignments
module sync(input logic clk,
            input logic d,
            output logic q);
    logic n1;
    always_ff @ (posedge clk)
    begin
        n1 = d; // blocking
        q = n1; // blocking
    end
endmodule
```

On rising edge of  $clk$ ,  $d$  is copied to  $n1$  but new value of  $n1$  is copied to  $q$

Initially  $\Rightarrow d = 0, n1 = 1, q = 0$

Later  $\Rightarrow n1 \leftarrow d = 0$

$q \leftarrow n1 = 0$

## Finite State Machines

→ FSM's consists of state register & 2 blocks of combinational logic to compute the next state & output, given current state & input

```
module divideby3FSM(input logic clk,
                      input logic reset,
                      output logic y);
    typedef enum logic [1:0] {S0, S1, S2} statetype;
    statetype [1:0] state, nextstate;

    // state register
    always_ff @(posedge clk, posedge reset)
        if (reset) state <= S0;
        else state <= nextstate;

    // next state logic
    always_comb
        case (state)
            S0:   nextstate <= S1;
            S1:   nextstate <= S2;
            S2:   nextstate <= S0;
            default: nextstate <= S0;
        endcase

    // output logic
    assign y = (state == S0);
endmodule
```

divide-by-3 FSM

→ **typedef** - statement defines **statetype** to be a two-bit **logic** value with 3 possibilities : S0, S1 or S2

S0: 00, S1: 01, S2: 10

→ Encodings must be set by user

In the example, we can use 3-bit one-hot values

**typedef enum logic [2:0] {S0 = 3'b001, S1 = 3'b010, S2 = 3'b100}**

→ **case** is used to define state transition table

→ Output y is 1 when state is S0

The equality comparison  $a == b$  evaluates to 1 if  $a = b$  & 0 otherwise

The inequality comparison  $a != b$  does the inverse, evaluating to 1 if  $a != b$ , 0 otherwise

→ Synthesis tools produce a block-diagram & state transition diagram for state machines  
It doesn't show logic gates, inputs or outputs on the arcs & states



## PATTERN RECOGNIZER MOORE FSM

```
module patternMoore(input logic clk,
                     input logic reset,
                     input logic a,
                     output logic y);

    typedef enum logic [1:0] {S0, S1, S2} statetype;
    statetype state, nextstate;

    // state register
    always_ff @(posedge clk, posedge reset)
        if (reset) state <= S0;
        else      state <= nextstate;

    // next state logic
    always_comb
        case (state)
            S0: if (a) nextstate=S0;
                  else nextstate=S1;
            S1: if (a) nextstate=S2;
                  else nextstate=S1;
            S2: if (a) nextstate=S0;
                  else nextstate=S1;
            default: nextstate=S0;
        endcase

    // output logic
    assign y=(state==S2);
endmodule
```

*Non-blocking for sequential (state register)*

*Blocking for combinational (next state)*

## PATTERN RECOGNIZER MEALY FSM

```
module patternMealy(input logic clk,
                     input logic reset,
                     input logic a,
                     output logic y);

    typedef enum logic [S0, S1] statetype;
    statetype state, nextstate;

    // state register
    always_ff @(posedge clk, posedge reset)
        if (reset) state <= S0;
        else      state <= nextstate;

    // next state logic
    always_comb
        case (state)
            S0: if (a) nextstate=S0;
                  else nextstate=S1;
            S1: if (a) nextstate=S0;
                  else nextstate=S1;
            default: nextstate=S0;
        endcase

    // output logic
    assign y=(a & state==S1);
endmodule
```



## Data Types

- Verilog primarily had 2 types : **reg** and **wire**
  - ↳ Ultimately caused lot of confusion
- In verilog, if a signal appears on LHS of **=** or **<=**, it must be declared **reg**  
others are declared as **wire**

```
module flop(input
            input [3:0] d,
            output reg [3:0] q);
    always @ (posedge clk)
        q <= d;
endmodule
```

By default, input & output ports are declared **wire**  
Unless mentioned that they are **reg**

- SystemVerilog introduced **logic** which is a synonym for **reg** & avoids misleading users about whether it is actually a flip-flop
- SystemVerilog also relaxes rules on **assign** statements & hierarchical port instantiations so, **logic** is used even outside **always** block where **wire** would have been required
- But signals with multiple drivers are declared as **net** which allows SystemVerilog to generate error message instead of **x** when **logic** is accidentally connected to multiple drivers
- Most common type of **net** is called **wire** or **tri**
  - ↳ Used for single drivers
  - ↳ Used for multiple drivers
- But we already use **logic** for single drivers, So, **wire** is obsolete
- When net is driven to a certain value by 1 or more drivers, it takes up that value  
When undriven, it floats (**z**)  
When driven to different value (**0, 1, x**) by multiple drivers, it is in contention (**x**)

| Net Type | No Driver      | Conflicting Drivers  |
|----------|----------------|----------------------|
| tri      | <b>z</b>       | <b>x</b>             |
| trireg   | previous value | <b>x</b>             |
| tril     | <b>z</b>       | 0 if there are any 0 |
| trior    | <b>z</b>       | 1 if there are any 1 |
| trio     | <b>0</b>       | <b>x</b>             |
| trit     | <b>1</b>       | <b>x</b>             |

How net resolves differently when driven/undriven by multiple sources.

## Parameterized Modules

→ HDL permits usage of variable bit widths using parameterized modules.

ex: Parameterized N-Bit 2:1 Multiplexers

```
module mux2
  #(parameter width=8)
  (input logic [width-1:0] d0, d1,
   input logic [1:0] s,
   output logic [width-1:0] y);
  assign y = s ? d1 : d0;
endmodule
```

$\#(\text{parameter variable} = \text{x})$  allows to define parameters

```
module mux4_8(input logic [7:0] d0, d1, d2, d3,
               input logic [1:0] s,
               output logic [7:0] y);
  logic [7:0] low, hi;
  mux2 lowmux(d0, d1, s[0], low);
  mux2 himux(d2, d3, s[0], hi);
  mux2 outmux(low, hi, s[1], y);
endmodule
```

8 bit 4:1 MUX instantiates 3 2:1 MUX using their default widths

```
module mux4_12(input logic [11:0] d0, d1, d2, d3,
                input logic [1:0] s,
                output logic [11:0] y);
  logic [11:0] low, hi;
  mux2 #(12) lowmux(d0, d1, s[0], low);
  mux2 #(12) himux(d2, d3, s[0], hi);
  mux2 #(12) outmux(low, hi, s[1], y);
endmodule
```

But for 12-bit 4:1 MUX, we override the existing parameter value using  $\#()$  before instance name

→ NOTE: # ⇒ Delays

# (...) ⇒ Parameter Defining & Overriding



SYNTHESIZED 12-BIT 4:1 MUX

### ex: PARAMETERIZED N: $2^N$ DECODER

```
module decoder
#(parameter N=3)
  (input logic [N-1:0] a,
   output logic [2**N-1:0] y);
  always_comb
    begin
      y=0;
      y[a]=1;
    end
  endmodule
```

A larger DECODER would have to specify with cases

But parameterized code sets appropriate output bit to 1.  
(Decoder uses blocking assignments to set all bits to 0,  
then changes to appropriate bit to 1)

- HDL also provides `generate` Statement to produce variable amount of hardware depending on value of a parameter.

### ex: PARAMETERIZED N-INPUT AND GATE

```
module andN
#(parameter width=8)
  (input logic [width-1:0] a,
   output logic           y);
  genvar i;
  logic [width-1:0] x;
  generate
    assign x[0]=a[0];
    for(i=1; i<width; i=i+1) begin:forloop
      assign x[i]=a[i] & x[i-1];
    end
  endgenerate
  assign y=x[width-1];
endmodule
```

for statement loops through  $i = 1, 2, \dots, \text{width}-1$   
to produce many consecutive AND gates  
The begin in generate for loop must be  
followed by ":" and an arbitrary label  
ex: forloop



## Testbenches

- HDL module that is used to test another module called device under test (DUT)
- It contains statements to apply inputs to DUT & check that correct outputs are produced
- Input & desired output patterns are called test vectors
- Testbenches are simulated but not synthesizable

ex-

```
module testbench();
    logic a, b, c, y;
    // instantiate device under test
    sillyfunction dut(a, b, c, y);
    // apply inputs one at a time
    initial begin
        a=0; b=0; c=0; #10;
        c=1; #10;
        b=1; c=0; #10;
        c=1; #10;
        a=1; b=0; c=0; #10;
        c=1; #10;
        b=1; c=0; #10;
        c=1; #10;
    end
    endmodule
```

→ Initiate which device to test

→ Must be used only for testbenches & nowhere else.

→ After initial begin starts executing statements in its body

So, first is 000, wait 10 units of time, then 001, again wait and so on till all inputs are applied

- But checking for correct outputs is tedious & error prone. Its easy to determine the correct output now, but later on, even if minor changes are made, determining correct output becomes a hassle. Instead we can use self-checking testbench

```
module testbench2();
    logic a, b, c, y;
    // instantiate device under test
    sillyfunction dut(a, b, c, y);
    // apply inputs one at a time
    // checking results
    initial begin
        a=0; b=0; c=0; #10;
        assert(y === 1) else $error("000 failed.");
        c=1; #10;
        assert(y === 0) else $error("001 failed.");
        b=1; a=0; #10;
        assert(y === 0) else $error("010 failed.");
        c=1; #10;
        assert(y === 0) else $error("011 failed.");
        a=1; b=0; c=0; #10;
        assert(y === 1) else $error("100 failed.");
        c=1; #10;
        assert(y === 2) else $error("101 failed.");
        b=1; c=0; #10;
        assert(y === 0) else $error("110 failed.");
        c=1; #10;
        assert(y === 0) else $error("111 failed.");
    end
    endmodule
```

→ assert verifies if a specified condition is true.

If not, executes else.

\$error prints an error message describing assertion failure  
assert is ignored during synthesis

$==$ ,  $\neq$  effective for signals without  $x$  and  $z$

$==>$ ,  $\neq$  effective for signals with  $x$  and  $z$

$$(\text{sillyfunction} = \bar{a}\bar{b}\bar{c} + a\bar{b}\bar{c} + a\bar{b}c)$$

→ Writing code for each test vector also becomes tedious, especially for modules that require large number of vectors. Better approach is to place test vectors in a separate file

```
module testbench3();
    logic      clk, reset;
    logic      a, b, c, y, yexpected;
    logic[31:0] vectornum, errors;
    logic[3:0]  testvectors[10000:0];

    // instantiate device under test
    sillyfunction dut(a, b, c, y);

    // generate clock
    always
    begin
        clk=1#0; clk=0#0;
    end

    // at start of test, load vectors
    // and pulse reset
    initial
    begin
        $readmemb("example.tv", testvectors);
        vectornum=0; errors=0;
        reset=1#27; reset=0;
    end

    // apply test vectors on rising edge of clk
    always@(posedge clk)
    begin
        #1; {a, b, c, yexpected} = testvectors[vectornum];
    end

    // check results on falling edge of clk
    always@(negedge clk)
    begin
        if (~reset) begin // skip during reset
            if (y != yexpected) begin // check result
                $display("Error: Inputs=%b", {a, b, c});
                $display(" Outputs=%b (%b expected)", y, yexpected);
                errors=errors+1;
            end
            vectornum=vectornum+1;
            if (testvectors[vectornum]==4'bxx) begin
                $display("%d tests completed with %d errors",
                        vectornum, errors);
                $finish;
            end
        end
    end
endmodule
```

→ always/process generates a clock without sensitivity list which is continuously reevaluated

It reads test vectors from text file & pulses reset for 2 cycles.

example.tv  
000\_1  
001\_0  
010\_0  
011\_0  
100\_1  
101\_1  
110\_0  
111\_0

Overkill for such a simple circuit 😞  
But better for complex circuits

## Parallelism

→ The speed of a system is characterized by latency & throughput of information moving through it.



Q. Ben Bitdiddle is throwing a milk & cookies party to celebrate the installation of his traffic light controller. It takes 5 minutes to roll cookies & place them on his tray. It then takes 15 minutes for the cookies to bake in the oven. Once the cookies are baked, he starts another tray. What is Ben's throughput & latency for a tray?

A. Latency =  $5\text{min} + 15\text{min} = 20\text{ min} = \frac{1}{3}\text{ hour}$

Throughput =  $1\text{ tray}/20\text{ min} = 3\text{ trays/hr}$

→ Throughput can be improved by processing several tokens at same time, this is called parallelism. It comes in 2 forms :

**Spatial (parallelism)**  
Multiple copies of the hardware are provided so multiple tasks can be done at same time

& **temporal (pipelining)**

Task is broken into stages & Multiple tasks can be spread across stages Each task must pass through all stages but can go in any order, hence multiple tasks can overlap

Q. Ben Bitdiddle has 100 friends coming to his party & needs to bake cookies faster. He is considering using spatial and/or temporal parallelism

A. Spatial Parallelism : Ben asks Alyssa P Hacker to help & she has her own tray & oven

Latency = 20 mins

Throughput = 6 trays/hr

Temporal Parallelism : Ben gets another tray & starts rolling cookies on it after he puts the first tray in the oven

Throughput = 4 trays/hr



If Ben & Alyssa uses both techniques, they can bake 8 trays/hour

→ Consider a task with Latency  $L$

If there wasn't any parallelism, throughput is  $\frac{1}{L}$

→ For spatially parallel system with  $N$  copies of hardware, throughput is  $\frac{N}{L}$

→ For temporally parallel system, task ideally broken into  $N$  stages of equal length, throughput is also  $\frac{N}{L}$  & only 1 copy is required

But finding  $N$  steps of equal length is difficult, so, if longest step has latency  $L_1$ , the pipelined throughput is  $\frac{1}{L_1}$

Q. Assume a circuit with no pipelining. Critical path passes through 2, 3, 4.

Assume register has clock-to-Q propagation delay of 0.3ns & setup time 0.2ns

A.



$$T_c = 0.2 + 3 + 2 + 4 + 0.3 = 9.5 \text{ ns} \Rightarrow \text{latency}$$

$$\text{Throughput} = \frac{1}{9.5 \text{ ns}} = 105 \text{ MHz}$$

Q. Now assume same circuit partitioned into 2 stage pipeline by adding register

A.



$$\text{First stage} \Rightarrow 0.3 + 3 + 2 + 0.2 = 5.5 \text{ ns}$$

$$\text{Second stage} \Rightarrow 0.3 + 4 + 0.2 = 4.5 \text{ ns}$$

$$\text{Latency} \Rightarrow 2 \text{ cycles} \Rightarrow 2 \times 5.5 = 11 \text{ ns}$$

$$\text{Throughput} = \frac{1}{5.5 \text{ ns}} = 182 \text{ MHz}$$

Q. Now assume same circuit partitioned into 3 stage pipeline by adding register

A.



$$\text{First stage} \Rightarrow 0.2 + 3 + 0.3 = 3.5 \text{ ns}$$

$$\text{Second stage} \Rightarrow 0.2 + 2 + 0.3 = 2.5 \text{ ns}$$

$$\text{Third stage} \Rightarrow 0.2 + 4 + 0.3 = 4.5 \text{ ns}$$

$$\text{Latency} \Rightarrow 3 \text{ cycles} \Rightarrow 3 \times 4.5 = 13.5 \text{ ns}$$

$$\text{Throughput} = \frac{1}{4.5 \text{ ns}} = 222 \text{ MHz}$$

→ In conclusion, adding pipeline stage improves throughput in expense for latency

# Unit-4 Digital Building Blocks

## Arithmetic Circuits

- They are the central building blocks of computers
- They perform arithmetic functions like addition, subtraction, comparision, shift, multiplication, and division.

## Adders

### → Half adder :

→ Here we add 1 bit binary numbers

→ It has 2 inputs ( $A \& B$ ) and 2 outputs ( $S \& C_{out}$ )

$S = \text{Sum of } A \& B$

$C_{out} = \text{Carry Out}$

ex :  $1 + 1 = 2 \Rightarrow \text{can't be represented on binary as } 2, \text{ so } S = 0 \& C = 1$

→ Half adder is made of XOR and AND gate



| A | B | $C_{out}$ | S |
|---|---|-----------|---|
| 0 | 0 | 0         | 0 |
| 0 | 1 | 0         | 1 |
| 1 | 0 | 0         | 1 |
| 1 | 1 | 1         | 0 |

### → Full Adder :

→ In a single bit half adder,  $C_{out}$  can be used to display second bit

But for multi-bit addition, Half adder can't accept a  $C_{in}$  to perform addition in second column

So we use Full adder which accepts  $C_{in}$



1 bit Full Adder

| $C_{in}$ | A | B | $C_{out}$ | S |
|----------|---|---|-----------|---|
| 0        | 0 | 0 | 0         | 0 |
| 0        | 0 | 1 | 0         | 1 |
| 0        | 1 | 0 | 0         | 1 |
| 0        | 1 | 1 | 1         | 0 |
| 1        | 0 | 0 | 0         | 1 |
| 1        | 0 | 1 | 1         | 0 |
| 1        | 1 | 0 | 1         | 0 |
| 1        | 1 | 1 | 1         | 1 |

$$S = A \oplus B \oplus C_{in}$$

$$C_{out} = AB + AC_{in} + BC_{in}$$

### → Carry Propagate Adder (CPA)

- N-bit adder adds 2 N-bit inputs A & B and  $C_{in}$  to produce N-bit result S and carry out  $C_{out}$  which propagates into next bit.
- It is similar to full adder but A, B, S are buses rather than single bits
- Common Implementations are :
  - i) Ripple-Carry adders
  - ii) Carry-Lookahead adders
  - iii) Prefix Adders



### → Ripple Carry Adder

- Ripple carry adder is built by chaining together N full adders
- It carries ripples through the carry chain creating a delay
- ex: For a 32-bit addition,  $S_{31}$  depends on  $C_{30}$ ,  $S_{30}$  depends on  $C_{29}$  and so on



$$t_{\text{ripple}} = N \cdot t_{\text{FA}}$$

delay of each F.A  
↳ N-bits

### → Carry-Lookahead Adder (CLA)

- CLA solves the problem of ripple carry adder by dividing the adder into blocks & providing circuitry to quickly determine  $C_{out}$  as soon as  $C_{in}$  is known.
- CLA's use generate (G) and propagate (P) signals that describe how a column or block determines the  $C_{out}$ .

→ Generate : Carry will definitely be produced in that column regardless of previous carries

Propagate : Carry is received from previous bit, it will pass through to the next bit.

- $i^{th}$  column of adder will generate carry if it produces  $C_{out}$  independent of  $C_{in}$  ( $A_i = B_i = 1$ )
- $i^{th}$  column of adder will propagate carry if it produces  $C_{out}$  when there is  $C_{in}$  ( $A_i = 1$  or  $B_i = 1$ )
- $i^{th}$  column of adder will generate carry out  $C_i$  if it generates carry  $G_i$

or propagates carry in  $P_i C_{i-1}$

$$\text{So, } C_i = A_i B_i + (A_i \oplus B_i) C_{i-1} = G_i + P_i C_{i-1}$$

- Block will generate carry if it produces carry out independent of carry in to block  $G_i$
- Block will propagate carry if it produces carry out whenever there is carry in to block  $P_i$

$$\text{So, } G_{3:0} = G_3 + P_3 (G_2 + P_2 (G_1 + P_1 G_0))$$

$$\text{and, } P_{3:0} = P_3 P_2 P_1 P_0$$

- Carry out can be computed as

$$C_i = G_{i:j} + P_{i:j} C_{j-1}$$





- All CLA blocks generate & propagate signals simultaneously
- N-bit adder divided into K-bit blocks has a delay

$$t_{CLA} = t_{pg} + t_{pg\_block} + \left(\frac{N}{K} - 1\right) t_{AND/OR} + K t_{FA}$$

delay of the individual generate/propagate gates to generate  $P_i:j$  &  $G_i:j$   
Single AND/or gates

Delay to find the generate/propagate signals  $P_i:j$  &  $G_i:j$  for K-bit block

Delay from  $Cin$  to  $Cout$  through final AND/OR logic of K-bit CLA block

Q. Compare the delays of a 32-bit ripple-carry adder and a 32-bit carry-lookahead adder with 4-bit blocks. Assume that each 2-input gate delay is 100ps & full adder delay is 300 ps

$$t_{ripple} = N t_{FA}$$

$$= 32 \times 300 \times 10^{-12} = 9.6 \text{ ns}$$

$$t_{CLA} = t_{pg} + t_{pg\_block} + \left(\frac{N}{K} - 1\right) t_{AND/OR} + K t_{FA}$$

$$= \left(100 + 600 + \left(\frac{32}{4} - 1\right) 200 + (4 \times 300)\right) \times 10^{-12} = 3.3 \text{ ns}$$

For 4 input AND/OR gate, 3 AND and 3 OR Gates according to this diagram

```
module adder #(parameter N=8)
  (input logic [N-1:0] a, b,
   input logic cin,
   output logic [N-1:0] s,
   output logic cout);
  assign cout, s = a+b+cin;
endmodule
```



→ Prefix Adder

- Prefix adder extends generate & propagate signals of carry-lookahead adder to perform faster addition
- Uses  $P_i$  and  $G_i$  signals to calculate carries more quickly by combining columns (bits) into groups and doubling the size of the groups in each step ( $2, 4, 8, \dots$ )
- The circuit grows logarithmically in complexity
- $C_i = G_{i-1} + P_{i-1} C_{i-1}$
- $S_i = (A_i \oplus B_i) \oplus C_{i-1}$

$$\rightarrow \begin{aligned} G_{i-1} &= C_{in} \\ P_{i-1} &= 0 \\ C_0 &= G_{i-1} \\ S_i &= (A_i \oplus B_i) \oplus C_{i-1} \end{aligned} \Rightarrow \begin{aligned} G_{i:j} &= G_{i:n} + P_{i:n} G_{n-1:j} \\ P_{i:j} &= P_{i:n} P_{n-1:j} \end{aligned}$$



$$\rightarrow t_{PA} = t_{pg} + \log_2 N \underbrace{(t_{pg-prefix})}_{\text{delay of black prefix cell}} + t_{XOR}$$

Q. Compute delay of 32-bit prefix adder. Assume 2-input gate delay = 100ps

A.  $t_{pg-prefix} = 200\text{ps}$  (2 gate delays in each black prefix cell)

$$t_{pg} = 100\text{ps}$$

$$N = 32 \Rightarrow \log_2 32 = 5$$

$$t_{XOR} = 100\text{ps}$$

$$t_{PA} = 100 + (5 \times 200) + 100 = 1200\text{ps}$$

## Subtractors

- Adders can add +ve & -ve numbers using 2's complement number representation
- $y = A - B$
- $-B = \bar{B} + 1$
- $\Rightarrow y = A + \bar{B} + 1$

This can be performed by carry propagate adder ( $A + \bar{B}$  with  $C_{in} = 1$ )



```
module subtractor #(parameter N=8)
  (input logic [N-1:0] a, b,
   output logic [N-1:0] y);
  assign y = a - b;
endmodule
```



## Comparators

- Determines if 2 binary numbers are = or > or < to each other
- It receives 2 N-bit binary numbers A and B
- 2 Types of Comparators
  - i) Equality Comparator - Produces single output indicating whether  $A = B$  by using XNOR gates to verify
  - ii) Magnitude Comparator - Produces one or more outputs indicating relative values of A & B by computing  $A - B$  and looking at sign (MSB) of result ( $1 = \text{Negative}$ ) ( $0 = \text{Positive}$ )



but functions incorrectly upon overflow

```
module comparator #(parameter N=8)
  (input logic [N-1:0] a, b,
   output logic eq, neq, lt, lte, gt, gte);
  assign eq = (a == b);
  assign neq = (a != b);
  assign lt = (a < b);
  assign lte = (a <= b);
  assign gt = (a > b);
  assign gte = (a >= b);
endmodule
```

## Arithmetic Logic Unit

- Combines variety of mathematical & logical operations into single unit
- Typical ALU can perform addition, subtraction, AND, OR operations
- ALU forms heart of most computer systems
- ALU receives control signal, F that specifies which function to perform
  - ↳ usually in blue to distinguish from data
- Some ALUs produce extra outputs called flags

ex: Overflow flag - indicates adder overflowed

Zero flag - ALU output is 0

C flag - When adder produces  $C_{out}$  & ALU is performing addition/subtract<sup>n</sup>



↗ indicate info abt ALU

| $F_{2:0}$ | Function                   |
|-----------|----------------------------|
| 000       | A AND B                    |
| 001       | A OR B                     |
| 010       | A + B                      |
| 011       | not used                   |
| 100       | A AND $\bar{B}$            |
| 101       | A OR $\bar{B}$             |
| 110       | A - B                      |
| 111       | SLT ↗ Magnitude Comparison |

## Shifters and Registers

- Move bits & multiply or divide by powers of 2
- Shifters ⇒ Shifts binary number left / right by specified no. of posns
- Logical Shifter - Shifts number to left or right and fills empty spots with 0's

$$11001 \text{ LSR } 2 \Rightarrow 00110 \Rightarrow >>$$

$$11001 \text{ LSL } 2 \Rightarrow 00100 \Rightarrow << \quad (\text{Special Case of Multiplication by } 2^N \rightarrow \frac{000011 \ll 4}{3 \times 2^4} = 110000)$$

- Arithmetic Shifter - Same as logical shifter but right shift fills MSB with copy of old MSB  
(useful for multiplying & dividing signed numbers) ↴ Only for ASR
- ASL = LSL (Fills with 0)

$$11001 \text{ ASR } 2 \Rightarrow 11110 \Rightarrow >> \quad (\text{Special case of Division by } 2^N \rightarrow \frac{11100 \ggg 2}{-4/2^2} = -1)$$

$$11001 \text{ ASL } 2 \Rightarrow 00100 \Rightarrow <<$$

- Rotators ⇒ Rotates numbers in a circle such that empty spots are filled with bits shifted off the other end

$$11001 \text{ ROR } 2 \Rightarrow 01110$$

$$11001 \text{ RDL } 2 \Rightarrow 00111$$

- N-bit shifter built from N 'N:1 MUX'

Input is shifted by 0 to N-1 bits depending on  $\log_2 N$ -bit select lines



## Multiplication

- Similar to decimal multiplication but involves 1's & 0's
- $N \times N$  multiplier multiplies 2  $N$ -bit numbers and produces a  $2N$ -bit result



$\rightarrow$   $N$  partial products  
 $\rightarrow$   $N-1$  stages of 1-bit adders



Q.

$$\begin{array}{r} 0 \ 1 \ 0 \ 1 \\ \times 0 \ 1 \ 1 \ 1 \\ \hline \end{array}$$

A.

$$\begin{array}{r} 0 \ 1 \ 0 \ 1 \\ 0 \ 1 \ 0 \ 1 \\ 0 \ 1 \ 0 \ 1 \\ 0 \ 0 \ 0 \ 0 \\ \hline 0 \ 1 \ 0 \ 0 \ 0 \ 1 \ 1 \end{array}$$

→ This operation also called MACs (used in DSP)

```
module multiplier #(parameter N=8)
  input logic [N-1:0] a, b;
  output logic [2*N-1:0] y;
begin
  assign y = a * b;
endmodule
```



## Division

→ Algorithm for  $N$ -bit unsigned in the range  $[0, 2^{N-1}]$ ;

$K = 0$

for  $i = N-1$  to 0

$$R' = \{R' \ll 1, A_i\}$$

$D = R' < 0$

else

$$R = R'$$

→ Divider computes  $\frac{A}{B}$  and produces quotient  $Q$  and remainder  $R$

→ Each row calculates  $D = R - B$

signal 'N' indicates whether  $D$  is -ve (row's MUX selects MSB = 1)

$Q_i = 0$  if  $D = -ve$  else 1 otherwise

MUX passes  $R$  to next row if difference is -ve &  $D$  otherwise



## Number System

→ Now we must consider fixed & floating point number system (Before only signed & unsigned integers)

### Fixed Point Number System

→ It has a decimal point between integer & fraction bits  
similar to decimal point between integer & fraction digits of decimal no.

ex: 01101100 → 0110.1100  
 (integer bits) high word      ↓  
 ↓ low word (fractional bits)  
 $2^2 + 2^1 + 2^0 + 2^{-1} = 6.75$

→ Signed fixed-point numbers can use either 2's complement or sign/magnitude notation

ex: 0010.0110  
 2's complement      ↓ sign & magnitude  
 1010.0110

Q. Compute  $0.75 - 0.625$  using fixed point numbers

A.  $0.625 \Rightarrow (0000.1010)_2$

Applying 2's complement,

$$(1111.0101) + 0.0001$$

$$-0.625 = (1111.0110)_2$$

$$0.75 = (0000.1100)_2$$

$$0.75 \quad 0000.1100$$

$$\begin{array}{r} +(-0.625) \\ \hline 0.125 \end{array} \quad \begin{array}{r} 1111.0110 \\ 10000.0010 \end{array}$$

But 1 overflows

$$\text{So, } 0000.001$$

### Floating Point Number System

→ It is similar to scientific notation  
and avoids the limitation of having constant no. of integer & fractional bits

→ It has  $\pm M \times E^{\text{Base}}$   
 Sign      ↓ Exponent  
 Mantissa

Q. Represent  $228_{10}$  in floating point representation

A.  $228_{10} = 11100100_2 = 1.1100100 \times 10^7$

|      |          |                          |
|------|----------|--------------------------|
| 0    | 00000111 | 110010000000000000000000 |
| Sign | Exponent | Mantissa                 |

We make 2 modifications

i) Since first bit of mantissa is always 1, we don't have to store it (implicit leading one)  
So we can improve efficiency

|   |          |                          |
|---|----------|--------------------------|
| 0 | 00000111 | 110010000000000000000000 |
|---|----------|--------------------------|

ii) The exponent needs to represent +ve & -ve exponents, Hence we used a biased exponent

$$(\text{Biased exponent} = \text{Original Exponent} + \text{Constant Bias } (127)) \Rightarrow 7 + 127 = 134 = 10000110$$

|   |          |                          |
|---|----------|--------------------------|
| 0 | 10000110 | 110010000000000000000000 |
|---|----------|--------------------------|

↳ IEEE 754 Standard

### Converting Decimal to Binary & Vice-versa for Fraction

ex: 3.125

→ Write 3 in binary as usual  $= (011)_2$

→ Take fractional part & keep multiplying with 2 till you get 1

$$\begin{aligned} 0.125 \times 2 &= 0.25 \\ 0.25 \times 2 &= 0.5 \\ 0.5 \times 2 &= 1 \end{aligned}$$

→ Write together  
 $(3.125)_{10} = (011.001)_2$

ex: 011.001  
 → Multiply by  $2^N$   
 N is position

$$\begin{array}{r} 0 \ 1 \ 1 \ . \ 0 \ 0 \ 1 \\ \downarrow \uparrow \downarrow \uparrow \downarrow \uparrow \\ 2 \ 1 \ 0 \ -1 \ -2 \ -3 \\ 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 + 0 \times 2^{-1} \\ + 0 \times 2^{-2} + 0 \cdot 2^{-3} \\ = 0 + 2 + 1 + 0 + 0 + 0 + 0.125 \\ = 3.125 \\ (011.001)_2 = (3.125)_{10} \end{array}$$

→ But all numbers can't be represented like some special cases which are given special notations

| No. | Sign | Exponent | Fraction                 |
|-----|------|----------|--------------------------|
| 0   | x    | 00000000 | 000000000000000000000000 |
| ∞   | 0    | 11111111 | 000000000000000000000000 |
| -∞  | 1    | 11111111 | 000000000000000000000000 |
| NaN | x    | 11111111 | Non-zero                 |

→ We have 2 types of precision

- i) Single-precision / Single / Float → Sign Bits = 1, Exponent Bits = 8, Fraction Bits = 23
- ii) Double-precision / Double → Sign Bits = 1, Exponent Bits = 11, Fraction Bits = 52

$$\rightarrow \pm 1.175494 \times 10^{-38} \text{ to } \pm 3.402824 \times 10^{38}$$

$$\rightarrow \pm 2.22507385850720 \times 10^{-308} \text{ to } \pm 1.79769313486232 \times 10^{308}$$

Q. Express the base 10 Numbers in IEEE 754 single precision floating-point format

i) -13.5625

ii) 42.3125

iii) -17.15625

A. i)  $-13.5625 = -(13.5625) = -(1101.1001) = -(1.1011001 \times 10^3)$

$$3 + 127 = 130 \Rightarrow 10000010$$

|   |          |                          |
|---|----------|--------------------------|
| 1 | 10000010 | 101100100000000000000000 |
|---|----------|--------------------------|

ii)  $42.3125 = 101010.0101 = 1.0101000101 \times 10^5$

$$5 + 127 = 132 \Rightarrow 10000100$$

|   |          |                          |
|---|----------|--------------------------|
| 0 | 10000100 | 010100100000000000000000 |
|---|----------|--------------------------|

iii)  $-17.15625 = -(17.15625) = -(10001.00101) = -(1.000100101 \times 10^4)$

$$4 + 127 = 131 \Rightarrow 10000011$$

|   |          |                          |
|---|----------|--------------------------|
| 1 | 10000011 | 000100101000000000000000 |
|---|----------|--------------------------|

## Floating Point Addition

→ Steps to follow:

- i) Extract exponent & fraction bit
- ii) Add leading 1 to form mantissa
- iii) Compare Exponents
- iv) Shift smaller mantissa if necessary
- v) Add mantissas
- vi) Normalize mantissa & adjust exponent if necessary
- vii) Round result
- viii) Assemble exponent & fraction back into floating point number

## Rounding

- Arithmetic results that fall outside of available precision must round to a neighboring number
  - Rounding modes - round down, round up, round towards 0, round to nearest
  - Overflow - Number is too large to be represented
  - Underflow - Number is too tiny to be represented

Q. Solve  $7.875 + 0.1875$  by floating point addition

A.

|    |        |   |   |           |                               |
|----|--------|---|---|-----------|-------------------------------|
| i) | 7.875  | = | 0 | 1000 0001 | 1111 100 000 0000 00000000 00 |
|    | 0.1875 | = | 0 | 0111 1100 | 1000 0000 000 000 000 0000 00 |

$$3) \quad 7.875 = 1.11111 \times 2^2$$

$$0.1875 = 1.1 \times 2^{-3}$$

$$\text{Powers difference} = 2 - (-3) = (5)_2 = (101)_2$$

Shift amount = 101

|    |           |                                                                                                                                                                                           |   |           |                                  |   |           |                                |        |
|----|-----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|-----------|----------------------------------|---|-----------|--------------------------------|--------|
| 4) |           | <table border="1"> <tr> <td>0</td><td>1000 0001</td><td>1.1111 100 000 00000 00000000 00</td></tr> <tr> <td>0</td><td>0111 1100</td><td>0.00000 11 000000000000 000000</td></tr> </table> | 0 | 1000 0001 | 1.1111 100 000 00000 00000000 00 | 0 | 0111 1100 | 0.00000 11 000000000000 000000 | 000000 |
| 0  | 1000 0001 | 1.1111 100 000 00000 00000000 00                                                                                                                                                          |   |           |                                  |   |           |                                |        |
| 0  | 0111 1100 | 0.00000 11 000000000000 000000                                                                                                                                                            |   |           |                                  |   |           |                                |        |

|    |  |   |           |                                 |
|----|--|---|-----------|---------------------------------|
| 5) |  | 0 | 1000 0001 | 1.1111 1 0000000000 000 0000 00 |
|    |  | 0 | 1000 0001 | 0.00000 11 000000000000 0000000 |

|    |                                                                                                                                                                                    |   |           |                                       |
|----|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|-----------|---------------------------------------|
| 6) | <table border="1" style="border-collapse: collapse; width: 100%; text-align: center;"> <tr> <td style="padding: 2px;">0</td><td style="padding: 2px;">1000 0001</td></tr> </table> | 0 | 1000 0001 | $10.0000001\ 000000000\ 000000000$ >> |
| 0  | 1000 0001                                                                                                                                                                          |   |           |                                       |
|    | <table border="1" style="border-collapse: collapse; width: 100%; text-align: center;"> <tr> <td style="padding: 2px;">0</td><td style="padding: 2px;">1000 0010</td></tr> </table> | 0 | 1000 0010 | $1.0000001\ 000000000\ 000000000$     |
| 0  | 1000 0010                                                                                                                                                                          |   |           |                                       |

7) No Rounding Required

8)

Answer      0 100000010 00000011000000000 000000000

$$1.0000001 \times 2^3 = 1000.0001 \Rightarrow 8.0625$$

## Sequential Building Blocks

### → Counters

- It is a sequential arithmetic circuit with CLK and RESET as inputs & N-bit output Q.
- After every cycle, counter adds 1 to value stored in register



```
module counter #(parameter N=8)
    (input logic clk,
     input logic reset,
     output logic [N-1:0] q);
    always_ff @(posedge clk, posedge reset)
        if (reset) q <- 8'b0;
        else q <- q + 1;
endmodule
```



### → Shift Registers

- It has CLK, S<sub>in</sub>, S<sub>out</sub>, N parallel outputs Q<sub>N-1:0</sub>
- On rising edge of CLK, new bit is shifted from S<sub>in</sub> & subsequent contents are shifted forward & last bit in shift register is available at S<sub>out</sub>



- They are also called **serial-to-parallel** converters because input is given at S<sub>in</sub> (1 bit at a time) and after N cycles, the past N inputs are in parallel at Q.



- Shift register can also be modified to perform serial-to-parallel & parallel-to-serial operations by adding parallel input D<sub>N-1:0</sub> & control signal 'Load'

```
module shiftreg #(parameter N=8)
    (input logic clk,
     input logic reset, load,
     input logic sin,
     input logic [N-1:0] d,
     output logic [N-1:0] q,
     output logic sout);
    always_ff @(posedge clk, posedge reset)
        if (reset) q <- 8'b0;
        else if (load) q <- d;
        else q <- {q[N-2:0], sin};
    assign sout = q[N-1];
endmodule
```



## Scan Chains

- Shift registers test sequential circuits using 'scan chains'
- It is relatively difficult compared to combinational circuits testing
- Designer must directly observe & control all states of machine by adding a test mode in which contents of all flip-flops can be read out or loaded with desired values.
- i) In normal mode, flip-flops load data from their D input & ignore S<sub>in</sub>
- ii) In test mode, flip-flops serially shift contents out & shift in new content using S<sub>in</sub> & S<sub>out</sub> respectively



Q. Develop verilog model for the circuit that counts 16 clock cycles & produces control signal ctrl, that is 1 during every 8<sup>th</sup> & 12<sup>th</sup> cycle

```

A.
1 // Verilog model for a circuit that counts in clock cycles and produces a control signal 'ctrl'
2 `timescale 1ns/1ps
3
4 module clock_counter (
5   input clk,      // Clock signal
6   input rst,      // Reset signal
7   output reg ctrl // Control signal
8 );
9
10 // 4-bit counter to count up to 16
11 reg [3:0] count;
12
13 // Sequential logic: Counter increments on every clock edge
14 always @(posedge clk or posedge rst) begin
15   if (rst) begin
16     count <- 4'd0; // Reset counter to 0
17   end else begin
18     count <- count + 4'd1; // Increment counter
19     if (count == 4'd15) begin
20       count <- 4'd0; // Wrap around after 16
21     end
22   end
23 end
24
25 // Combinational logic: Set ctrl high during the 8th and 12th cycles
26 always @(*) begin
27   if (count == 4'd7 || count == 4'd11) begin
28     ctrl = 1'b1;
29   end else begin
30     ctrl = 1'b0;
31   end
32 end
33
34 endmodule
35

```

## Memory Arrays

→ Memory is organized as 2D array of memory cells

Memory reads/writes contents of one of the rows of the arrays

→ Row is specified by an **address**

Value read/written is called **Data**

Array with N-bit addresses & M-bit data has  $2^N$  rows & M columns

Each row of data is called **word**

So, array contains  $2^N M$ -bit words

ex:  $2^2 \times 3$ -bit array

$$\text{No. of words} = 2^N = 2^2 = 4$$

$$\text{Width} = 3 \text{ bits}$$



ex:  $2^{10} \times 32$ -bit array

$$\text{Size} = 32 \text{ Kb}$$



## Bit Cells

→ Memory arrays are built as an array of bit cells each of which stores 1 bit of data and each bit cell is connected to a wordline & bitline

→ For each combination of address bits, memory asserts a single wordline that activates the bitcell in me in that row

→ Operation:

i) Read Operation - To write a bit cell, bitline is initially left floating ( $Z$ )

wordline is turned ON, allowing stored value to drive the bitline to 0 or 1



ii) Write Operation - To write a bit cell, bitline is strongly driven to desired value, then

wordline is turned ON, connecting bitline to stored bit



## Organization of 4x3 Memory Array

- Read: a wordline is asserted & corresponding row of bit cells drives the bitlines HIGH or LOW  
to read 10, bitlines are left floating, decoder asserts wordline2 & then data stored in bit cells is read out to Data bitlines
- Write: Bitlines are driven HIGH or LOW & then wordline is asserted, allowing bitline values to be stored in that row of bit cells  
To write 001 to address 11, bitlines are driven to 001 & wordline3 is asserted & value of 001 is stored



## Memory Ports

- All memories have 1 or more ports and each port gives read and/or write access to one memory address
- Multiported memories can access several addresses simultaneously



CLEAN COPY

FOR REFERENCE



## Memory Types

→ Classified based on how bits are stored in bit cell

### Random Access Memory

- Volatile (Loses data when turned OFF)
- Read & written quickly
- Main memory (DRAM) in computer

### Read-Only Memory

- Non-volatile (Retains data when turned OFF)
- Reads quickly but writing is impossible / slow
- Flash memory in devices

### DRAM

- Stores data bits on capacitors
- Dynamic because data needs to be rewritten periodically



- NMOS behave like switch & when wordline is asserted, NMOS is ON & stored bit value transfers to / from bitline
- As time passes, noise degrades value of stored bit

### SRAM

- Static, cuz data doesn't need to be refreshed
- Data is stored on cross-coupled inverters
- Each cell has 2 outputs, bitline & bitline
- When wordline is asserted both NMOS are ON & data values transferred to / from bitlines
- Unlike DRAM, value is restored in cross-coupled inverter



- Stores a bit as presence / absence of transistor
- To read the cell, bitline is weakly pulled HIGH & wordline is turned ON
- If transistor is present, pulls bitline LOW. If absent, bitline remains HIGH



| Memory Type | Transistors per bit cell | Latency |
|-------------|--------------------------|---------|
| Flip-flop   | 20                       | fast    |
| RAM         | 6                        | medium  |
| RAM         | 1                        | slow    |

→ More area but less delay  
→ less area but more latency

| Type   | Name                          | Description                                                                                                                                                                                   |
|--------|-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ROM    | Read Only Memory              | Chip is hardwired with presence or absence of transistors. Changing requires building a new chip.                                                                                             |
| EPROM  | Programmable ROM              | Fuses in series with each transistor are blown to program bits. Can't be changed after programming.                                                                                           |
| EEPROM | Electrically Programmable ROM | Charge is stored on a floating gate to activate or deactivate transistor. Erasing requires exposure to UV light.                                                                              |
| Flash  | Flash Memory                  | Like EEPROM, but erasing can be done electrically. Like EEPROM, but erasing is done on large blocks to amortize cost of erase circuit. Low cost per bit, dominates nonvolatile storage today. |

### 4x3 ROM Dot Notation

- Contents of ROM is indicated using DOT Notation where DOT represents intersection of wordline (row) & bitline (column) that the data bit is 1

Q. Implement the following functions using a  $2^2 \times 3$ -bit ROM

$$X = AB$$

$$Y = A + B$$

$$Z = AB'$$

A.



Bit-cell for a fuse-programmable ROM (One-Time Programmable ROM)

- User programs ROM by applying high voltage to selectively blow fuses
- If fuse present, transistor connected to ground & cell holds 0.
- If fuse destroyed, transistor disconnected from ground & cell holds 1.



Logic using memory arrays

- Memory arrays can also perform combinational logic functions
- Like in the previous question, X is AND gate, Y is OR gate
- Memory arrays that perform logic are called lookup tables (LUTs)

Memory HDL

- $2^N$ -word x M-bit RAM

```
module ram #(parameter N=8, M=32)
  input logic [31:0] din;
  input logic [M-1:0] we;
  input logic [N-1:0] adr;
  input logic [M-1:0] dout;
  output logic [M-1:0] dout;
begin
  logic [N-1:0] mem [2**N-1:0];
  always_ff @(posedge clk)
    if (we) mem [adr] <- din;
  assign dout = mem [adr];
end
endmodule
```



- 4 word x 3-bit ROM

```
module rom(input logic [1:0] adr,
            output logic [2:0] dout);
  always_comb
    case(adr)
      2'b00: dout <= 3'b111;
      2'b01: dout <= 3'b110;
      2'b10: dout <= 3'b100;
      2'b11: dout <= 3'b011;
    endcase
  endmodule
```

## Logic Arrays

- Gates organized into regular arrays
- If connections are made programmable, then any function can be performed w/o specific ways to connect wires
- Logic arrays are also reconfigurable
- 2 Types :
  - i) Programmable Logic Arrays (PLA)
  - ii) Field Programmable Gate Arrays (FPGA)

## PLAs

- Implement 2-level combinational logic in SOP form



Q. Dot Notation for  $3 \times 3 \times 2$ -bit PLA performing  $X = A'B'C + ABC'$  &  $Y = AB'$

A.



$$\xrightarrow{\text{implicants}} \bar{A}\bar{B}C + A\bar{B}\bar{C}$$



To minimize size & cost, ROMs & PLAs use pseudo NMOS or dynamic circuits instead of conventional logic gates

## FPGAs

- Array of reconfigurable gates.
  - Software tools like Xilinx Vivado can be used to implement designs onto it using HDL or schematic
  - FPGAs are powerful & flexible than PLAs
  - They can implement both combinational & sequential logic
  - Can implement multi-level functions
  - Composed of Logic Elements , Input / Output Elements , Programmable Interconnection
- Perform Logic ↪      ↪ Interface with outside world      ↪ Connect LEs & IOEs



- Leading FPGA manufacturers - Altera & Xilinx



Cyclone IV LE  
 ↪ Only 1 logic element  
 Consists of  
 i) 1 4-input LUT  
 ii) 1 registered output  
 iii) 1 combinational output

Alyssa P. Hacker is building a finite state machine that must run at 200 MHz. She uses a Cyclone IV GX FPGA with the following specifications:  $t_{LE} = 381$  ps per LE,  $t_{setup} = 76$  ps, and  $t_{pcq} = 199$  ps for all flip-flops. The wiring delay between LEs is 246 ps. Assume the hold time for the flip-flops is 0. What is the maximum number of LEs her design can use?

**Solution:** Alyssa uses Equation 3.13 to solve for the maximum propagation delay of the logic:  $t_{pd} \leq T_c - (t_{pcq} + t_{setup})$ .

Thus,  $t_{pd} = 5 \text{ ns} - (0.199 \text{ ns} + 0.076 \text{ ns})$ , so  $t_{pd} \leq 4.725 \text{ ns}$ . The delay of each LE plus wiring delay between LEs,  $t_{LE+wire}$ , is  $381 \text{ ps} + 246 \text{ ps} = 627 \text{ ps}$ . The maximum number of LEs,  $N$ , is  $Nt_{LE+wire} \leq 4.725 \text{ ns}$ . Thus,  $N = 7$ .

### Example 5.6 FUNCTIONS BUILT USING LEs

Explain how to configure one or more Cyclone IV LEs to perform the following functions: (a)  $X = \overline{A}\overline{B}C + A\overline{B}\overline{C}$  and  $Y = A\overline{B}$  (b)  $Y = JKLM PQR$ ; (c) a divide-by-3 counter with binary state encoding (see Figure 3.29(a)). You may show interconnection between LEs as needed.

**Solution:** (a) Configure two LEs. One LUT computes X and the other LUT computes Y, as shown in Figure 5.59. For the first LE, inputs *data 1*, *data 2*, and *data 3* are A, B, and C, respectively (these connections are set by the routing channels). *data 4* is a don't care but must be tied to something, so it is tied to 0. For the second LE, inputs *data 1* and *data 2* are A and B; the other LUT inputs are don't cares and are tied to 0. Configure the final multiplexers to select the combinational outputs from the LUTs to produce X and Y. In general, a single LE can compute any function of up to four input variables in this fashion.

(b) Configure the LUT of the first LE to compute  $X = JKLM$  and the LUT on the second LE to compute  $Y = XPQR$ . Configure the final multiplexers to select the combinational outputs X and Y from each LE. This configuration is shown in Figure 5.60. Routing channels between LEs, indicated by the dashed blue lines, connect the output of LE 1 to the input of LE 2. In general, a group of LEs can compute functions of N input variables in this manner.

(c) The FSM has two bits of state ( $S_{1:0}$ ) and one output (Y). The next state depends on the two bits of current state. Use two LEs to compute the next state from the current state, as shown in Figure 5.61. Use the two flip-flops, one from each LE, to hold this state. The flip-flops have a reset input that can be connected to an external *Reset* signal. The registered outputs are fed back to the LUT inputs

| (A)           | (B)           | (C)           |               | (X)        |
|---------------|---------------|---------------|---------------|------------|
| <i>data 1</i> | <i>data 2</i> | <i>data 3</i> | <i>data 4</i> | LUT output |
| 0             | 0             | 0             | X             | 0          |
| 0             | 0             | 1             | X             | 1          |
| 0             | 1             | 0             | X             | 0          |
| 0             | 1             | 1             | X             | 0          |
| 1             | 0             | 0             | X             | 0          |
| 1             | 0             | 1             | X             | 0          |
| 1             | 1             | 0             | X             | 1          |
| 1             | 1             | 1             | X             | 0          |



| (A)           | (B)           |               | (Y)           |            |
|---------------|---------------|---------------|---------------|------------|
| <i>data 1</i> | <i>data 2</i> | <i>data 3</i> | <i>data 4</i> | LUT output |
| 0             | 0             | X             | X             | 0          |
| 0             | 1             | X             | X             | 0          |
| 1             | 0             | X             | X             | 1          |
| 1             | 1             | X             | X             | 0          |



| (J)    | (K)    | (L)    | (M)    | (X)        | (P)    | (Q)    | (R)    | (X)    | (Y)        |
|--------|--------|--------|--------|------------|--------|--------|--------|--------|------------|
| data 1 | data 2 | data 3 | data 4 | LUT output | data 1 | data 2 | data 3 | data 4 | LUT output |
| 0      | 0      | 0      | 0      | 0          | 0      | 0      | 0      | 0      | 0          |
| 0      | 0      | 0      | 1      | 0          | 0      | 0      | 0      | 1      | 0          |
| 0      | 0      | 1      | 0      | 0          | 0      | 0      | 1      | 0      | 0          |
| 0      | 0      | 1      | 1      | 0          | 0      | 0      | 1      | 1      | 0          |
| 0      | 1      | 0      | 0      | 0          | 0      | 1      | 0      | 0      | 0          |
| 0      | 1      | 0      | 1      | 0          | 0      | 1      | 0      | 1      | 0          |
| 0      | 1      | 1      | 0      | 0          | 0      | 1      | 1      | 0      | 0          |
| 0      | 1      | 1      | 1      | 0          | 0      | 1      | 1      | 1      | 0          |
| 1      | 0      | 0      | 0      | 0          | 1      | 0      | 0      | 0      | 0          |
| 1      | 0      | 0      | 1      | 0          | 1      | 0      | 0      | 1      | 0          |
| 1      | 0      | 1      | 0      | 0          | 1      | 0      | 1      | 0      | 0          |
| 1      | 0      | 1      | 1      | 0          | 1      | 0      | 1      | 1      | 0          |
| 1      | 1      | 0      | 0      | 0          | 1      | 1      | 0      | 0      | 0          |
| 1      | 1      | 0      | 1      | 0          | 1      | 1      | 0      | 1      | 0          |
| 1      | 1      | 1      | 0      | 0          | 1      | 1      | 1      | 0      | 0          |
| 1      | 1      | 1      | 1      | 1          | 1      | 1      | 1      | 1      | 1          |



|        |        |        |        | (S <sub>0</sub> ) | (S <sub>1</sub> ) | (S <sub>0</sub> ) | LUT output |
|--------|--------|--------|--------|-------------------|-------------------|-------------------|------------|
| data 1 | data 2 | data 3 | data 4 |                   |                   |                   |            |
| X      | X      | 0      | 0      |                   |                   | 1                 |            |
| X      | X      | 0      | 1      |                   |                   | 0                 |            |
| X      | X      | 1      | 0      |                   |                   | 0                 |            |
| X      | X      | 1      | 1      |                   |                   | 0                 |            |

|        |        |        |        | (S <sub>1</sub> ) | (S <sub>0</sub> ) | (S <sub>1</sub> ) | LUT output |
|--------|--------|--------|--------|-------------------|-------------------|-------------------|------------|
| data 1 | data 2 | data 3 | data 4 |                   |                   |                   |            |
| X      | X      | 0      | 0      |                   |                   | 0                 |            |
| X      | X      | 0      | 1      |                   |                   | 1                 |            |
| X      | X      | 1      | 0      |                   |                   | 0                 |            |
| X      | X      | 1      | 1      |                   |                   | 0                 |            |



using the multiplexer on *data 3* and routing channels between LEs, as indicated by the dashed blue lines. In general, another LE might be necessary to compute the output  $Y$ . However, in this case  $Y = S_0$ , so  $Y$  can come from LE 1. Hence, the entire FSM fits in two LEs. In general, an FSM requires at least one LE for each bit of state, and it may require more LEs for the output or next state logic if they are too complex to fit in a single LUT.