

# Overview

Goal: build a 4-bit CPU using logic gates, meaning MUX, adder, ALU, registers are all implemented using logic gates

Registers: 4-bits wide

Gate delays

and — 3 ns   or — 3 ns   nand — 2 ns   nor — 2 ns   not — 1 ns

xor — 4 ns   xnor — 5 ns

Note: no need for a dependency tree, easier to just do a linear search through all ~23 modules to fix dependencies.

# Instructions list

Instructions: memory for 16 instructions, each 11b wide  
instruction 3'b00000000 add



instruction 3'b00000001 nand



instruction 3'b00000000 move



instruction 3'b000000011 noop



instruction 3'b000000100 branch



if(\*reg-a == \*reg-b)

PC\_NEXT = PC\_CURR + JMP ;

else

PC\_NEXT = PC\_CURR + 1;

FA



critical path delay: 10ns

RCA



critical path delay = 44 ns

A

# Overflow part 1

1) If  $\text{sign}(A) \neq \text{sign}(B) \rightarrow \text{no overflow}$

- Proof:  $A+B$ 's valid range is limited by

$$\max = 0 + (2^{N-1}) \text{ dec. } 0 \text{ until extreme}$$

$$\min = 0 + -2^{N-1} \text{ inc. } 0 \text{ until extreme}$$

$$\text{extreme: } (2^{N-1}-1) + (-2^{N-1}) = -1 \in \text{range (for } N \geq 1)$$

2) If  $\text{sign}(A) = \text{sign}(B)$  AND

a.  $\text{sign}(A+B) \neq \text{sign}(A) \rightarrow \text{ovf bc result makes no sense}$

b.  $\text{sign}(A+B) = \text{sign}(A) \rightarrow \text{no ovf, see proof}$

Prove 2b: If you exceed range you flip signs once

Two max pos.  $\underbrace{0111\dots}_N + \underbrace{0111\dots}_N = \underbrace{111\dots10}_N$

Extreme



Ex.  $111\dots$  ← carries

$$\begin{array}{r} 0111\dots \\ + 0111\dots \\ \hline 1110 \end{array}$$



only 1 flip

Two -max neg.

$$\underbrace{1000\dots}_N + \underbrace{1000\dots}_N = \underbrace{10000\dots}_{N+1}$$

Let  $P = \text{sign}(A) \geq \text{sign}(B)$

∴ If you didn't flip signs you didn't exceed range

∴ 'exceed range' means ovf!

Let  $Q = \text{sign}(A+B) \geq \text{sign}(A)$

Let  $R = \text{overflow has occurred}$

From 1)  $\bar{P} \rightarrow \bar{R}$ , so  $R \rightarrow P$

From 2a)  $P \bar{Q} \rightarrow R$

2b)  $P Q \rightarrow \bar{R}$ , so  $R \rightarrow \bar{P} \bar{Q}$  but  $R \rightarrow P$  so  $R \rightarrow P \bar{Q}$

Then,  $R \leftrightarrow P \bar{Q}$ , now we derive an eq. using T T

## Overflow part 2

We know  $R \leftrightarrow P\bar{Q}$ , i.e. overflow iff  $\text{sign}(A) == \text{sign}(B)$   
and



$\text{sign}(A+B) != \text{sign}(A)$

| Inputs    |           |           | Outputs |           |     |
|-----------|-----------|-----------|---------|-----------|-----|
| $a_{n-1}$ | $b_{n-1}$ | $c_{n-1}$ | $c_n$   | $s_{n-1}$ | Ovf |
| 0         | 0         | 0         | 0       | 0         | 0   |
| 0         | 0         | 1         | 0       | 1         | 1   |
| 0         | 1         | 0         | 0       | 1         | 0   |
| 0         | 1         | 1         | 1       | 0         | 0   |
| 1         | 0         | 0         | 0       | 1         | 0   |
| 1         | 0         | 1         | 1       | 0         | 0   |
| 1         | 1         | 0         | 1       | 0         | 1   |
| 1         | 1         | 1         | 1       | 1         | 0   |

Ovf iff  $c_{n-1} \neq c_n$ ,  
i.e.  $\text{Ovf} = c_n \oplus c_{n-1}$

Overflow examples:

$$-8 + -3 = 5 \text{ (should be } -11)$$

$$\begin{array}{r}
 & 1 & 0 & 0 & 0 \\
 \pm & 1 & 1 & 0 & 1 \\
 \hline
 & 0 & 1 & 0 & 1
 \end{array}$$

$$5 + 5 = -6 \text{ (should be } 10)$$

$$\begin{array}{r}
 & 1 & & \\
 & 0 & 1 & 0 & 1 \\
 \hline
 + & 0 & 1 & 0 & 1 \\
 \hline
 & 1 & 0 & 1 & 0
 \end{array}$$

# ALU



Critical path delay: 51 ns

# MUX\_2\_1 (1b/4b)

1-bit multiplexer



critical path delay = 7 ns

| a     | b | sel | res        |  |
|-------|---|-----|------------|--|
| 0     | 0 | 0   | 0 //copy a |  |
| 0     | 1 | 0   | 0          |  |
| 1     | 0 | 0   | 1          |  |
| 1     | 1 | 0   | 1          |  |
| <hr/> |   |     | <hr/>      |  |
| 0     | 0 | 1   | 0 //copy b |  |
| 0     | 1 | 1   | 1          |  |
| 1     | 0 | 1   | 0          |  |
| 1     | 1 | 1   | 1          |  |

4-bit multiplexer (4 1-bit muxes controlled by the same sel)

partial testbench:  CPD = 7 ns

$$A[3] = 0 \quad B[3] = 0 \quad \text{sel} = 0 \quad \text{res}[3] = 0$$

$$A[2] = 0 \quad B[2] = 1 \quad \text{res}[2] = 0$$

$$A[1] = 1 \quad B[1] = 0 \quad \text{res}[1] = 1$$

$$A[0] = 1 \quad B[0] = 1 \quad \text{res}[0] = 1$$

# MUX\_4\_1\_1b



# reg-file Part 1

Let OUT-A return contents of SEL-A

let OUT-B return contents of SEL-B



Write to reg interface



# reg-file part 2



# reg-file part 3



$DATA\_IN = 4'b0101 \rightarrow$

decoder  $\rightarrow$  mux\_21\_1b  $\rightarrow$  mux\_21\_4b  $\rightarrow$  reg4 setup  $\rightarrow$  (4ns) (7ns) (7ns) (17ns)

clk  
0  $\rightarrow$  1

reg write setup (35 ns)

reg4 hold  $\rightarrow$  mux\_4\_1\_4b  $\rightarrow$  OUT\_A = 4'b0101  
(17ns) (14ns)

reg write hold (17ns) reg read (14ns)

Total delay = 66 ns

写入的时候等setup+hold，读取的时候立刻取值。

可以同时读  $\rightarrow$  write\_en

# decoder\_2\_4\_1b



| $w_1$ | $w_0$ | $d_3$ | $d_2$ | $d_1$ | $d_0$ |
|-------|-------|-------|-------|-------|-------|
| 0     | 0     | 0     | 0     | 0     | 1     |
| 0     | 1     | 0     | 0     | 1     | 0     |
| 1     | 0     | 0     | 1     | 0     | 0     |
| 1     | 1     | 1     | 0     | 0     | 0     |

critical path delay = 4 ns

# PC part |

PC init and branch implementation

PC keep increasing 0-15, then repeat

Calculate both RCAs in parallel, before

clk  
0 → 1



Cycle 0 (set\_pc=1):

$$\max(\text{not} + \text{and}, \text{rca}) + \text{mux\_2\_1\_4b} + \text{mux\_2\_1\_4b} + \text{reg\_set}$$

1ns 3ns 44ns

7ns

7ns

17ns

clk  
0 → 1

reg\_hold  
17ns

setup: 75 ns  
hold: 17 ns

## PC Part 2

Test PC loops around, OSC shown for ref



# branch.tass (program to test branching)

Pseudocode:

```
let stdout = r0, initialized as 0  
move r1, #1  
move r2, #2  
move r3, #3  
if (r1 + r2 == r3)  
    print 1
```

```
else  
    print 2
```

also try w/  
nand

add:

$$\begin{array}{r} 0001 \\ + 0010 \\ \hline 0011 \end{array}$$

nand:

$$\begin{array}{r} 0001 \\ \uparrow 0010 \\ \hline 1111 \end{array}$$

| Tom Assembly:

```
1. move r0, 4'b0000  
2. move r1, 4'b0001  
3. move r2, 4'b0010  
4. add r3, r1, r2 // add circled  
5. branch 4'b0011, r1, r3 // branch to 4'b0011 circled  
6. move r0, 4'b0010  
7. branch 4'b0010, r0, r0 // branch to 4'b0010 circled  
8. move r0, 4'b0001  
9. noop // end of program
```

# back .tass (backward branching)

Pseudocode:

let stdout = r0, initialized as 0

move r1, # -1

move r2, # 2

move r3, # 3

if (r1 == r2) // first time taken  
branch to end program

r1 = r1 + r3

branch back

add:

$$\begin{array}{r} 0001 \\ + 0010 \\ \hline 0011 \end{array}$$

nand:

$$\begin{array}{r} 0001 \\ \uparrow 0010 \\ \hline 1111 \end{array}$$

| Tom Assembly:

| 0. move r0, 4'b0000

| 1. move r1, 4'b1111

| 2. move r2, 4'b0010

| 3. move r3, 4'b0011

| 4. branch 4'b0011, r1, r2

| 5. add r1, r1, r3

| 6. branch 4'b1110, r0, r0

| 7. noop // end of program

# ins\_mem



# ins\_dec part 1

| instruction | OP2<br>INS[10] | OP1<br>INS[9] | OP0<br>INS[8] | sel_data | alu_op |
|-------------|----------------|---------------|---------------|----------|--------|
| add         | 0              | 0             | 0             | 0        | 0      |
| nand        | 0              | 0             | 1             | 0        | 1      |
| push        | 0              | 1             | 0             | 1        | X      |
| noop        | 0              | 1             | 1             | X        | X      |
| branch      | 1              | 0             | 0             | X        | X      |

sel\_data = op1 = INS[9]

write\_en = see next page

alu\_op = OP0 = INS[8]

SEL\_A = INS[3:2]

SEL\_B = INS[1:0]

SEL\_W = INS[5:4]

IMM = INS[3:0]

JMP = INS[7:4]

## ins-dec part 2

| instruction | op2 | op1 | opo | write_en |
|-------------|-----|-----|-----|----------|
| add         | 0   | 0   | 0   | 1        |
| nand        | 0   | 0   | 1   | 1        |
| push        | 0   | 1   | 0   | 1        |
| noop        | 0   | 1   | 1   | 0        |
| branch      | 1   | 0   | 0   | 0        |

write\_en sum of minterms =  $\overline{\text{op2}} \overline{\text{op1}} \overline{\text{opo}} + \overline{\text{op2}} \overline{\text{op1}} \text{opo} +$

$$\overline{\text{op2}} \text{op1} \overline{\text{opo}}$$

product of maxterms =  $\neg (\overline{\text{op2}} \text{op1} \text{opo} + \text{op2} \overline{\text{op1}} \overline{\text{opo}})$

$$= \neg (\overline{\text{op2}} \text{op1} \text{opo}) \cdot \neg (\text{op2} \overline{\text{op1}} \overline{\text{opo}})$$

$$= (\text{op2} + \overline{\text{op1}} + \overline{\text{opo}}) \cdot (\overline{\text{op2}} + \text{op1} + \text{opo})$$

[ ]

less terms



Critical path delay = 7 ns

# MUX\_8\_1\_1b

critical path delay = 21 ns



|  | S2 | S1 | S0 | X | Y | f. |
|--|----|----|----|---|---|----|
|  | 0  | 0  | 0  | a | e | a  |
|  | 0  | 0  | 1  | b | f | b  |
|  | 0  | 1  | 0  | c | g | c  |
|  | 0  | 1  | 1  | d | h | d  |
|  | 1  | 0  | 0  | a | e | e  |
|  | 1  | 0  | 1  | b | f | f  |
|  | 1  | 1  | 0  | c | g | g  |
|  | 1  | 1  | 1  | d | h | h  |

# MUX\_16-1-1b

Critical path delay = 28 ns



|   | S3 | S2 | S1 | SO | x | y | res |
|---|----|----|----|----|---|---|-----|
| a | 0  | 0  | 0  | 0  | a | i | a   |
| b | 0  | 0  | 0  | 1  | b | j | b   |
| c | 0  | 0  | 1  | 0  | c | k | c   |
| d | 0  | 0  | 1  | 1  | d | l | d   |
| e | 0  | 1  | 0  | 0  | e | m | e   |
| f | 0  | 1  | 0  | 1  | f | n | f   |
| g | 0  | 1  | 1  | 0  | g | o | g   |
| h | 0  | 1  | 1  | 1  | h | p | h   |
| i | 1  | 0  | 0  | 0  | g | i | i   |
| j | 1  | 0  | 0  | 1  | b | j | j   |
| k | 1  | 0  | 1  | 0  | c | k | k   |
| l | 1  | 0  | 1  | 1  | d | l | l   |
| m | 1  | 1  | 0  | 0  | e | m | m   |
| n | 1  | 1  | 0  | 1  | f | n | n   |
| o | 1  | 1  | 1  | 0  | g | o | o   |
| p | 1  | 1  | 1  | 1  | h | p | p   |

# MUX\_16-1-9b part 1

critical path delay = 28 ns



# MUX\_16-1\_qb part 2

|        | bit8 | bit7 | bit6 | bit5 | bit4 | bit3 | bit2 | bit1 | bit0 |                                 |
|--------|------|------|------|------|------|------|------|------|------|---------------------------------|
| INS 0  | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 1    | mux_16_1_qb testbench           |
| INS 1  | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 代码                              |
| INS 2  | 0    | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 0    | MUX_16_1_qb mux_16_1_qb_0[8:0]( |
| INS 3  | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 0    | 0    | .a(INS0) ] individual           |
| INS 4  | 0    | 0    | 0    | 0    | 1    | 0    | 0    | 0    | 0    | .b(INS1) inputs                 |
| INS 5  | 0    | 0    | 0    | 1    | 0    | 0    | 0    | 0    | 0    | :                               |
| INS 6  | 0    | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0    | .p(INS15)]                      |
| INS 7  | 0    | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | .sel3(sel3)]                    |
| INS 8  | 1    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | , sel2(sel2)] shared            |
| INS 9  | 1    | 1    | 1    | 1    | 1    | 1    | 1    | 1    | 0    | , sel1(sel1) inputs             |
| INS 10 | 1    | 1    | 1    | 1    | 1    | 1    | 1    | 0    | 1    | , sel0(sel0)]                   |
| INS 11 | 1    | 1    | 1    | 1    | 1    | 1    | 0    | 1    | 1    | , res(res));                    |
| INS 12 | 1    | 1    | 1    | 1    | 1    | 0    | 1    | 1    | 1    |                                 |
| INS 13 | 1    | 1    | 1    | 1    | 0    | 1    | 1    | 1    | 1    |                                 |
| INS 14 | 1    | 1    | 1    | 0    | 1    | 1    | 1    | 1    | 1    |                                 |
| INS 15 | 1    | 0    | 1    | 1    | 1    | 1    | 1    | 1    | 1    |                                 |



# sr\_latch



NOR gate:  
iff 任何输入是1，输出就是0

CASE 0: reset (delay = 4 ns)



CASE 1: set (delay = 4 ns)



↑ set/reset w/ X means most waiting ↑

CASE 2: hold 0 (delay = 0 ns)



CASE 3: hold 1 (delay = 0 ns)



; critical path delay = 4 ns

# d-latch



| $d$ | $g$ | $nd$ | $s$ | $r$ | action |
|-----|-----|------|-----|-----|--------|
| 0   | 0   | 1    | 0   | 0   | hold   |
| 0   | 1   | 1    | 0   | 1   | reset  |
| 1   | 0   | 0    | 0   | 0   | hold   |
| 1   | 1   | 0    | 1   | 0   | set    |

D-latch critical path = 8 ns because SR-Latch critical path delay  
 $= 4 \text{ ns}$

# dffP (rising edge)



reg4



# Oscillator part 1

1-stage ring oscillator (works in theory)



Ring oscillator w/ enable



| en | w0 | w1 |
|----|----|----|
| 0  | 0  | 0  |
| 0  | 1  | 0  |
| 1  | 0  | 0  |
| 1  | 1  | 1  |

Setting  $en=0$  makes  $w1=0$  after 3ns,  $w0=1$  after 4ns.  
At this point if  $en=1$  then  $w1=1$  and the oscillator works.

What if  $en=1$  after 3ns? 不要



There will always be x's appearing forever  
= metastability = verilog doesn't like this.

# Oscillator part 2

What if  $en=1$  after 4ns? We get a 4ns oscillator!

ns 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17



What if we add more oscillators ( $n=3$ )? Will signals ripple through in a deterministic manner?



ns 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17



For  $w_0$  to be defined when  $en=1$ , we should wait 1 period = 6 ns before  $en: 0 \rightarrow 1$

# Oscillator part 3

Oscillator w/ period  $\geq 2 * 139 \text{ ns}$ , 实际周期 =  $2 * 140 \text{ ns}$   
 Let  $n = \# \text{ not gates}$   $= 280 \text{ ns}$

$$\text{AND\_DELAY} + n * \text{NOT\_DELAY} \geq 139$$

$$3 + n * 1 \geq 139$$

$$n \geq 136, n \% 2 = 1, n = 137$$



Verilog generate



$$X 137 = w[136] \dots w[0]$$

[oscillator testbench]



从此刻开始采信号 (为了整齐)

# Datapath delay



$\boxed{\begin{array}{l} PC \text{ init} \\ PC = 0 \end{array}} \rightarrow clk=0$

为什么  $clk=0$  的时候算 ALU? 因为  
ALU result 来自  
ins 中的信息, 在  
 $clk 0 \rightarrow 1$  时会更新:  
 $reg = ALU \text{ result}$

$ins\_mem(28\text{ ns}) +$   
 $ins\_dec(4\text{ ns}) +$   
 $read\_reg(14\text{ ns}) +$   
 $MAX($   
 $alu\_eq(8\text{ ns}) +$   
 $PC\_setup(75\text{ ns}), ] 83\text{ ns}$   
 $ALU(51\text{ ns}) + ] 93\text{ ns}$   
 $MUX_2L-4b(7\text{ ns}) +$   
 $write\_reg\_setup(35\text{ ns}) + ]$

)

$= 139\text{ ns}$

$clk=1$   
 $\boxed{\begin{array}{l} PC \text{ hold}(17\text{ ns}), \\ write\_reg\_hold(17\text{ ns}) \end{array}}$   
 $= 17\text{ ns}$

$\therefore \text{need clock period} \geq 2(139\text{ ns}) = 278\text{ ns}$