

# ECE243

CPU

Prof. Enright Jerger

## IMPLEMENTING A SIMPLE CPU

- How are machine instructions implemented?
- What components are there?
- How are they connected and controlled?

# MINI ISA:

- every instruction is 1-byte wide
  - data and address values are also 1-byte wide
- address space
  - byte addressable (every byte has an address)
  - 8 addr bits => 256 byte locations
- 4 registers:
  - k0..k3
- PC (resets to ~~\$80~~<sup>0x</sup>)
- Condition codes:
  - Z (zero), N (negative)
  - these are used by branches

## Some Definitions:

- IMM3: a 3-bit signed immediate, 2 parts:
  - 1 sign bit: sign(IMM3)
  - 2 bit value: value(IMM3)
- IMM4: a 4-bit signed immediate
- IMM5: a 5-bit unsigned immediate
- R1, R2: registers variables
  - represent one of k0..k3
- SE8(X):

- means sign-extend value X to 8 bits
- NOTE: ALL INSTS DO THIS LAST:
  - $PC = PC + 1$   
 $= PC + \text{sizeof}(Instr)$

## Mini ISA Instructions

load R1 (R2):

$$R1 = \text{mem}[R2]$$

$$PC = PC + 1$$

store R1 (R2):

$$\text{mem}[R2] = R1$$

$$PC = PC + 1$$

add R1 R2

$$R1 = R1 + R2$$

~~Set cond. codes~~

$$\left\{ \begin{array}{l} \text{if } (R1 == 0) Z = 1, \text{ else } Z = 0 \\ \text{if } (R1 < 0) N = 1, \text{ else } N = 0 \end{array} \right.$$

$$PC = PC + 1$$

sub R1 R2

$$R1 = R1 - R2$$

Set cond. codes

$$PC = PC + 1$$

## nand R1 R2

$R1 = R1 \text{ bit-wise NAND } R2 \text{ Z}$

Set cond. codes

$PC = PC + 1$

## ori IMM5

$K1 = K1 \text{ bitwise -OR } IMM5^-$

Set cond. codes

$PC = PC + 1$

## shift R1 IMM3

if ( $\text{sign}(IMM3)$ )  $R1 = R1 \ll \text{val}(IMM3)$   
else  $R1 = R1 \gg \text{val}(IMM3)$

Set cond. codes

$PC = PC + 1$

## bz IMM4

if ( $Z == 1$ )  $PC = PC + SE8(IMM4)$

$PC = PC + 1$

## bnz IMM4

if ( $Z == 0$ )  $PC = PC + SE8(IMM4)$   
 $PC = PC + 1$

## bpz IMM4

$\text{if } (N == 0) \text{ PC} = \text{PC} + \text{SEG}/(\text{Imm4})$

$\text{PC} = \text{PC} + 1$

## ENCODINGS: Inst(opcode)

- Load(0000), store(0010), add(0100), sub(0110), nand(1000):



Ex: add k<sub>2</sub> k<sub>1</sub>

encoding : 1001 0100

Ex: load k<sub>3</sub> (k<sub>0</sub>)

encoding : 1100 0000

- Ori:



Ex: Ori 5  
encoding 0010 1111

- Shift:



Ex: Shift k3 2  
encoding 1101 0011

- BZ(0101), BNZ(1001), BPZ(1101):



Ex: bnz -2  
encoding: 1110 1001

# DESIGNING A CPU

- Two main components:
  - *datapath* and *control*
- datapath:
  - registers, functional units, muxes, *wires*
  - must be able to perform all steps of every inst
- control:
  - a finite state machine (FSM)
  - commands the datapath
  - performs: fetch, decode, read, execute, write, get next inst

## CPU: basic components

### REGISTERS



- **REGISTERS**

- can always read
- we assume falling-edge-triggered
- in is stored if REGWrite=1 on falling clock edge
- we won't normally draw the clock input

## MUXES



- 'select' signal chooses which input to route to output

## REGISTER FILE



- Out1 is the value of reg indexed by R1
- Out2 is the value of reg indexed by R2
- if REGWrite is 1 when clock goes low
  - then the value on 'in' is written to reg indexed by Rwrite

## ALU (arithmetic logic unit)

- ALUop:
  - add = 000
  - sub = 001
  - or = 010
  - nand = 011
  - shift = 100



- Z = nor(out7,out6,out5...out0)
- N = out bit 7 (implies negative---sign bit)

# MEMORY

- our CPU has two memories for simplicity:
  - instruction memory and data memory
  - known as a “Harvard architecture”

## INSTRUCTION MEM



- is read only
- Iout is set to the value indexed by the address

# DATA MEMORY



- can read or write
  - but only one in a given clock cycle
- on falling clock edge:
  - if  $\text{MEMWrite}==1$ : value on  $\text{Din}$  is stored at  $\text{addr}$
  - if  $\text{MEMRead}==1$ : value at  $\text{addr}$  is output on  $\text{Dout}$

## SE8(x): SIGN-EXTEND TO 8 BITS



- assuming 4-bit input
- Recall: want:
  - $\text{SE8}(0100) \rightarrow 00000100$
  - $\text{SE8}(\underline{1100}) \rightarrow 11111100$
- In bits  $i_3, i_2, i_1, i_0$ ; out bits  $o_7 \dots o_0$

## **ZE8(x): ZERO EXTEND TO 8 bits**



- assuming 5-bit input
- Recall: want
  - $\text{ZE8}(00100) \rightarrow 00000100$
  - $\text{ZE8}(11100) \rightarrow 00011100$
- In bits  $i_4, i_3, i_2, i_1, i_0$ ; out bits  $o_7 \dots o_0$

# CPU: Single Cycle Implementation

## SINGLE CYCLE DATAPATH



- each instruction executes entirely
  - in one cycle of the cpu clock
- registers are triggered by the falling edge
  - new values begin propagating through datapath
  - some values may be temporarily incorrect
- the clock period is large enough to ensure:
  - that all values correct before next falling edge

# FETCH



- needed by every instruction
    - i.e., every instruction must be fetched

PC = PC + 1



# BRANCHES: BZ IMM4

$$PC = PC + 1$$



# ADD add R1 R2



- does r1 = r1 + r2
- same datapath for sub and nand



# SHIFT: SHIFT R1 IMM3



|    |      |    |    |    |    |    |    |
|----|------|----|----|----|----|----|----|
| i7 | i6   | i5 | i4 | i3 | i2 | i1 | i0 |
| R1 | IMM3 | 0  | 1  | 1  |    |    |    |

# ORI: ORI IMM5



- does:  $k1 \leftarrow k1 \text{ bitwise-or IMM5}$



# Store: Store R1 (R2)



- does:  $\text{mem}[r2] = \underline{\text{r1}}$

| Inst: | i7 | i6 | i5     | i4 | i3 | i2 | i1 | i0 |
|-------|----|----|--------|----|----|----|----|----|
|       | R1 | R2 | opcode |    |    |    |    |    |

# Load: Load R1 (R2)



- does:  $r1 = \text{mem}[r2]$

| i7 | i6 | i5     | i4 | i3 | i2 | i1 | i0 |
|----|----|--------|----|----|----|----|----|
| R1 | R2 | opcode |    |    |    |    |    |

# Final Datapath!



# DESIGNING THE CONTROL UNIT



- **CONTROL SIGNALS TO GENERATE:**
  - PCsel, PCwrite, REGwrite, MEMread, MEMwrite, R1sel, ALUop, ALU2, RFin

# Control Signals



|      | INPUTS        |   |   | OUTPUTS |          |           |          |       |           |      |      |       |  |
|------|---------------|---|---|---------|----------|-----------|----------|-------|-----------|------|------|-------|--|
| INST | Inst bits 3-0 | N | Z | PC Sel  | PC Write | Reg Write | Mem Read | R1Sel | Mem Write | ALU2 | RFin | ALUop |  |
| LOAD | 0000          | X | X | 0       | 1        | 1         | 1        | 0     | 0         | XX   | 1    | XXX   |  |



|       | INPUTS        |   |   | OUTPUTS |          |           |          |       |           |      |      |       |  |
|-------|---------------|---|---|---------|----------|-----------|----------|-------|-----------|------|------|-------|--|
| INST  | Inst bits 3-0 | N | Z | PC Sel  | PC Write | Reg Write | Mem Read | R1Sel | Mem Write | ALU2 | RFin | ALUop |  |
| STORE | 0010          | X | X | 0       | 1        | 0         | 0        | 0     | 1         | XX   | XX   | XXX   |  |



|      | INPUTS        |   |   | OUTPUTS |          |           |          |       |           |      |      |       |  |
|------|---------------|---|---|---------|----------|-----------|----------|-------|-----------|------|------|-------|--|
| INST | Inst bits 3-0 | N | Z | PC Sel  | PC Write | Reg Write | Mem Read | R1Sel | Mem Write | ALU2 | RFin | ALUop |  |
| ADD  | 0100          | X | X | 0       | 1        | 1         | 0        | 0     | 0         | 66   | 0    | 066   |  |



sub R1 R2

|      | INPUTS        |   |   |        | OUTPUTS  |           |          |       |           |      |      |       |  |
|------|---------------|---|---|--------|----------|-----------|----------|-------|-----------|------|------|-------|--|
| INST | Inst bits 3-0 | N | Z | PC Sel | PC Write | Reg Write | Mem Read | R1Sel | Mem Write | ALU2 | RFin | ALUop |  |
| SUB  | 0110          | X | X | 0      | 1        | 1         | 0        | 0     | 0         | 00   | 0    | 061   |  |



nand R1 R2

|      | INPUTS        |   |   | OUTPUTS |          |           |          |       |           |      |      |       |  |
|------|---------------|---|---|---------|----------|-----------|----------|-------|-----------|------|------|-------|--|
| INST | Inst bits 3-0 | N | Z | PC Sel  | PC Write | Reg Write | Mem Read | R1Sel | Mem Write | ALU2 | RFin | ALUop |  |
| NAND | 1000          | X | X | 0       | 1        | 1         | 0        | 0     | 0         | 00   | 0    | 011   |  |



|      | INPUTS        |   |   |        | OUTPUTS  |           |          |       |           |       |         |       |  |
|------|---------------|---|---|--------|----------|-----------|----------|-------|-----------|-------|---------|-------|--|
| INST | Inst bits 3-0 | N | Z | PC Sel | PC Write | Reg Write | Mem Read | R1Sel | Mem Write | ALU2  | RFin    | ALUop |  |
| ORI  | X111          | X | X | 0      | 1        | 1         | 0        | 1     | 001       | 01010 | 0010010 | 01010 |  |



|       | INPUTS        |   |   | OUTPUTS |          |           |          |       |           |      |      |       |  |
|-------|---------------|---|---|---------|----------|-----------|----------|-------|-----------|------|------|-------|--|
| INST  | Inst bits 3-0 | N | Z | PC Sel  | PC Write | Reg Write | Mem Read | R1Sel | Mem Write | ALU2 | RFin | ALUop |  |
| SHIFT | X011          | X | X | 0       | 1        | 1         | 0        | 0     | 0         | 10   | 6    | 106   |  |



| INST | Inst bits 3-0 | N | Z | PC Sel | PC Write | Reg Write | Mem Read | R1 Sel | Mem Write | ALU2 | RFin | ALUop |
|------|---------------|---|---|--------|----------|-----------|----------|--------|-----------|------|------|-------|
| BZ   | 0101          | X | 0 | 0      | 1        | 0         | 0        | X      | 0         | XX   | ✓    | X✓X   |
|      | 0101          | X | 1 | 1      | 1        | 0         | 0        | X      | 0         | XX   | ✗    | X✗X   |



bnz IMM4

| INST | Inst bits 3-0 | N | Z | PC Sel | PC Write | Reg Write | Mem Read | R1 Sel | Mem Write | ALU2 | RFin | ALUop |
|------|---------------|---|---|--------|----------|-----------|----------|--------|-----------|------|------|-------|
| BNZ  | 1001          | X | 0 | 1      | 1        | 0         | 0        | X      | 0         | XX   | X    | XXX   |
|      | 1001          | X | 1 | 0      | 1        | 0         | 0        | X      | 0         | XX   | X    | XXX   |



| INST | Inst bits 3-0 | N | Z | PC Sel | PC Write | Reg Write | Mem Read | R1 Sel | Mem Write | ALU2 | RFin | ALUop |
|------|---------------|---|---|--------|----------|-----------|----------|--------|-----------|------|------|-------|
| BPZ  | 1101          | 0 | X | 1      | 1        | 0         | 0        | X      | 0         | x    | X    | XXX   |
|      | 1101          | 1 | X | 0      | 1        | 0         | 0        | X      | 0         | XY   | X    | XXY   |

# All Control Signals

|       | INPUTS        |   |   | OUTPUTS |          |           |          |        |           |      |      |       |  |
|-------|---------------|---|---|---------|----------|-----------|----------|--------|-----------|------|------|-------|--|
| INST  | Inst bits 3-0 | N | Z | PC Sel  | PC Write | Reg Write | Mem Read | R1 Sel | Mem Write | ALU2 | RFin | ALUop |  |
| LOAD  | 0000          | X | X | 0       | 1        | 1         | 1        | X      | 0         | X    | 1    | XXX   |  |
| STORE | 0010          | X | X | 0       | 1        | 0         | 0        | 0      | 1         | X    | X    | XXX   |  |
| ADD   | 0100          | X | X | 0       | 1        | 1         | 0        | 0      | 0         | 00   | 0    | 000   |  |
| SUB   | 0110          | X | X | 0       | 1        | 1         | 0        | 0      | 0         | 00   | 0    | 001   |  |
| NAND  | 1000          | X | X | 0       | 1        | 1         | 0        | 0      | 0         | 00   | 0    | 011   |  |
| ORI   | X111          | X | X | 0       | 1        | 1         | 0        | 1      | 0         | 01   | 0    | 010   |  |
| SHIFT | X011          | X | X | 0       | 1        | 1         | 0        | 0      | 0         | 10   | 0    | 100   |  |
| BZ    | 0101          | X | 0 | 0       | 1        | 0         | 0        | X      | 0         | X    | X    | XXX   |  |
|       | 0101          | X | 1 | 1       | 1        | 0         | 0        | X      | 0         | X    | X    | XXX   |  |
| BNZ   | 1001          | X | 0 | 1       | 1        | 0         | 0        | X      | 0         | X    | X    | XXX   |  |
|       | 1001          | X | 1 | 0       | 1        | 0         | 0        | X      | 0         | X    | X    | XXX   |  |
| BPZ   | 1101          | 0 | X | 1       | 1        | 0         | 0        | X      | 0         | X    | X    | XXX   |  |
|       | 1101          | 1 | X | 0       | 1        | 0         | 0        | X      | 0         | X    | X    | XXX   |  |

# Building Control Logic: MemRead

|                    | Load | Store | Add  | Sub  | Nand | Ori  | Shift | Bz   | Bnz  |      | BPZ  |      |
|--------------------|------|-------|------|------|------|------|-------|------|------|------|------|------|
| inst bits<br>i3-i0 | 0000 | 0010  | 0100 | 0110 | 1000 | X111 | X011  | 0101 | 0101 | 1001 | 1001 | 1101 |
| N                  | X    | X     | X    | X    | X    | X    | X     | X    | X    | X    | 0    | 1    |
| Z                  | X    | X     | X    | X    | X    | X    | X     | 0    | 1    | 0    | 1    | X    |
| Mem<br>Read        | 1    | 0     | 0    | 0    | 0    | 0    | 0     | 0    | 0    | 0    | 0    | 0    |

• MemRead = opcode (load)

$$= \text{!i3} \wedge \text{!i2} \wedge \text{!i1} \wedge \text{!i0}$$

Don't care about N & Z



# Building Control Logic: PCSel

|                    | Load | Store | Add  | Sub  | Nand | Ori  | Shift | Bz   | Bnz  |   | BPZ  |      |      |      |
|--------------------|------|-------|------|------|------|------|-------|------|------|---|------|------|------|------|
| inst bits<br>i3-i0 | 0000 | 0010  | 0100 | 0110 | 1000 | X111 | X011  | 0101 | 0101 | — | 1001 | 1001 | 1101 | 1101 |
| N                  | X    | X     | X    | X    | X    | X    | X     | X    | X    | X | X    | 0    | 1    |      |
| Z                  | X    | X     | X    | X    | X    | X    | X     | 0    | 1    | 0 | 1    | X    | X    |      |
| PCSel              | 0    | 0     | 0    | 0    | 0    | 0    | 0     | 0    | 1    | 0 | 0    | 1    | 0    |      |

$$\begin{aligned}
 \text{PCSel} &= (\text{op} = b_2 \wedge \neg z) / (\text{op} = b_{n2} \wedge \neg !z) / (\text{op} = b_{p2} \wedge \neg !n) \\
 &= (!i_3 \wedge i_2 \wedge \neg !i_7 \wedge i_0 \wedge \neg z) / (i_3 \wedge \neg i_2 \wedge \neg !i_7 \wedge i_0 \wedge \neg !z) / \\
 &\quad (i_3 \wedge i_2 \wedge \neg !i_7 \wedge i_0 \wedge \neg !n) \\
 &= \underline{\neg (i_1 \wedge i_0 \wedge (\neg i_3 \wedge \neg i_2 \wedge z \wedge i_3 \wedge \neg i_2 \wedge \neg z \wedge i_3 \wedge i_2 \wedge \neg n))} / i_3 \wedge i_2 \wedge i_1 \wedge i_0 \wedge \neg z \wedge \neg n
 \end{aligned}$$





# CPU: Multicycle Implementation

# A Multicycle Datapath



# Key Difference #1: Only 1 Memory



# Key Difference #2: Only 1 ALU



previously : 1 ALU + 2 Adders

# Key Difference #3: Temp Regs



What benefit are temp regs/multi/cycle?

# Key Difference #3: Temp Regs



critical path is long  $\rightarrow$  large clock period!

# Key Difference #3: Temp Regs



smaller critical path → shorter clk period!

# IR: Instruction Register



holds instruction encoding

# MDR: Memory Data Register



holds value returned from memory

# R1 and R2



hold values from reg file

# ALUout



holds result calculated by ALU

# Cycle by Cycle Operation

All Insts Cycle1:  
Fetch and Increment PC



$IR \leftarrow \text{mem}[PC]$

$PC \leftarrow PC + 1$

# All Insts Cycle2: Decoding Inst & Reading Reg File



$R1 \leftarrow K_x (IR[7..6])$

$R2 \leftarrow K_y (IR[5..4])$

Note: not all instr need  $R1$  &  $R2$

# Add, Sub, Nand Cycle3: Calculate



ALUout ← R1 op R2

# Add, Sub, Nand Cycle4: Write to Reg File



$R_x \leftarrow ALU.out$

# Shift Cycle3: Calculate



# Shift Cycle4: Write to Reg File



$Kx \leftarrow ALUout$   
 $[IR[7..6]]$

# ORI Cycle3: Read K1 from Reg File



# ORI Cycle4: Calculate



ALUout  $\leftarrow$  R1 op Imm5

# ORI Cycle5: Write to Reg File



$K1 \leftarrow ALUout$

# Load Cycle3: addr to Mem, value into MDR



$MDR \leftarrow mem[R2]$

# Load Cycle4: write value into reg file



# Store Cycle3: addr to Mem, value to Mem



mem[R2] ← R1

# Branches Cycle3



$PC \leftarrow PC + Imm^4$  (if branch taken)

# Summary

| Instructions                   | Single Cycle<br>Eg: 1 MHz | Multicycle<br>Eg: 4 MHz |
|--------------------------------|---------------------------|-------------------------|
| Store, BZ, BNZ,<br>BPZ         | 1 cycle                   | 3 cycles                |
| Add, Sub, Nand,<br>Load, Shift | 1 cycle                   | 4 cycles                |
| ORI                            | 1 cycle                   | 5 cycles                |

Example: total time to execute one of each instruction:

$$\begin{aligned} \text{Single cycle: } & 1 \times 4 + 1 \times 5 + 1 \times 1 = 10 \text{ cycles/1MHz} \\ & = 10 \mu\text{s} \end{aligned}$$

$$\begin{aligned} \text{Multicycle: } & 3 \times 4 + 4 \times 5 + 5 \times 1 = 37 \text{ cycles/4MHz} \\ & = 9.25 \mu\text{s} \end{aligned}$$

# Implementing Multicycle Control

|   | Add, Sub,<br>Nand            | Shift                              | Ori                               | Load                                       | Store                                   | Bnz, Bz,<br>Bpz |
|---|------------------------------|------------------------------------|-----------------------------------|--------------------------------------------|-----------------------------------------|-----------------|
| 1 |                              |                                    |                                   | $IR = Mem[PC]$<br>$PC = PC + 1$            |                                         |                 |
| 2 |                              |                                    |                                   | $R1 = RF[IR[7..6]]$<br>$R2 = RF[IR[5..4]]$ |                                         |                 |
| 3 | $ALUout = R1 \text{ op } R2$ | $ALUout = R1 \text{ Shift } Imm_3$ | $R1 = RF[i]$                      | $MDR = Mem[R2]$                            | $mem[R2] = R1$<br>$PC = PC + SE(Imm_4)$ |                 |
| 4 | $RF[IR[7..6]] = ALUout$      |                                    | $ALUout = R1 \text{ OR } Imm_5^-$ | $RF[IR[7..6]] = MDR$                       | X                                       | X               |
| 5 | X                            | X                                  | $RF[i] = ALUout$                  | X                                          | X                                       | X               |

# Control: An FSM

- need a state transition diagram
- how many states are there?

12

- how many bits to represent state?

4 State bits

# Multicycle Control as an FSM



# Fetch

# Decode



# Multicycle Control Hardware



## CPU: Adding a New Instruction

# EXAMPLE QUESTION: ADDING A NEW INSTRUCTION

- Implement a post-increment load:
- load r1, (r2)+

Does:  $RF[r1] = MEM[RF[r2]]$

$$RF[r2] = RF[r2] + 1$$

r2 is permanently changed to be r2+1

Implementing:

$RF[r1] = MEM[RF[r2]]$ ;  $RF[r2] = RF[r2] + 1$

Recall: load r1, (r2)

$IR = \text{mem}[PC]$ ,  $PC = PC + 1$  - Fetch (same)

$R1 = RF[r1]$ ,  $R2 = RF[r2]$  - Decode (same)

$MDR = \text{mem}[R2]$ ,  $ALUout = R2 + 1$

$RF[r1] = MDR$

$RF[r2] = ALUout$

# Modifying the Datapath

$$\underline{RF[r2] = RF[r2] + 1}$$

$$ALUout = R2 + 1 \quad \checkmark$$

$$RF[r2] = ALUout \quad \checkmark$$

