

# CS 110

## Computer Architecture

### *Single-Cycle CPU*

### *Datapath & Control*

Instructor:  
**Sören Schwertfeger**

<http://shtech.org/courses/ca/>

**School of Information Science and Technology SIST**

**ShanghaiTech University**

**Slides based on UC Berkley's CS61C**

# One-Instruction-Per-Cycle RISC-V Machine



- One clock tick => one instruction
- Current state outputs => inputs to combinational logic => outputs settle at the values of state before next clock edge
- Rising clock edge:
  - all state elements are updated with combinational logic outputs
  - execution moves to next clock cycle

# Stages of Execution on Datapath



# Datapath and Control

- Datapath designed to support data transfers required by instructions
- Controller causes correct transfers to happen



# Branches

- Different change to the state:
  - $\text{PC} = \begin{cases} \text{PC} + 4, & \text{branch not taken} \\ \text{PC} + \text{immediate}, & \text{branch taken} \end{cases}$
- Six branch instructions:  
BEQ, BNE, BLT, BGE, BLTU, BGEU
- Need to compute  $\text{PC} + \text{immediate}$  and to compare values of  $\text{rs1}$  and  $\text{rs2}$ 
  - But have only one ALU – need more hardware

# Adding Branches



# Branch Comparator



- $\text{BrEq} = 1$ , if  $A=B$
- $\text{BrLT} = 1$ , if  $A < B$
- $\text{BrUn} = 1$  selects unsigned comparison for  $\text{BrLT}$ ,  $0=\text{signed}$
- BGE branch:  $A \geq B$ , if  $\overline{\overline{A < B}} = !(A < B)$

$$\overline{\overline{A < B}} = !(A < B)$$

# Branch Immediates (In Other ISAs)

- 12-bit immediate encodes PC-relative offset of -4096 to +4094 bytes in multiples of 2 bytes
- Standard approach: Treat immediate as in range -2048..+2047, then shift left by 1 bit to multiply by 2 for branches



Each instruction immediate bit can appear in one of two places in output immediate value – so need one 2-way mux per bit

# RISC-V Branch Immediates

- 12-bit immediate encodes PC-relative offset of -4096 to +4094 bytes in multiples of 2 bytes
- RISC-V approach: keep 11 immediate bits in fixed position in output value, and rotate LSB of S-format to be bit 12 of B-format



Only one bit changes position between S and B, so only need a single-bit 2-way mux:

# RISC-V Immediate Encoding

| Instruction encodings, $\text{inst}[31:0]$ |            |            |               |                    |               |   |   |   |   |  |        |
|--------------------------------------------|------------|------------|---------------|--------------------|---------------|---|---|---|---|--|--------|
| 31                                         | 30         | 25 24      | 20 19         | 15 14              | 12 11         | 8 | 7 | 6 | 0 |  |        |
| <b>funct7</b>                              | <b>rs2</b> | <b>rs1</b> | <b>funct3</b> | <b>rd</b>          | <b>opcode</b> |   |   |   |   |  | R-type |
| <b>imm[11:0]</b>                           |            | <b>rs1</b> | <b>funct3</b> | <b>rd</b>          | <b>opcode</b> |   |   |   |   |  | I-type |
| <b>imm[11:5]</b>                           | <b>rs2</b> | <b>rs1</b> | <b>funct3</b> | <b>imm[4:0]</b>    | <b>opcode</b> |   |   |   |   |  | S-type |
| <b>imm[12 10:5]</b>                        | <b>rs2</b> | <b>rs1</b> | <b>funct3</b> | <b>imm[4:1 11]</b> | <b>opcode</b> |   |   |   |   |  | B-type |

| 32-bit immediates produced, $\text{imm}[31:0]$ |                  |                      |                      |                   |   |   |   |   |  |  |        |
|------------------------------------------------|------------------|----------------------|----------------------|-------------------|---|---|---|---|--|--|--------|
| 31                                             | 25 24            | 12                   | 11                   | 10                | 5 | 4 | 1 | 0 |  |  |        |
| - $\text{inst}[31]$ -                          |                  | $\text{inst}[30:25]$ | $\text{inst}[24:21]$ | $\text{inst}[20]$ |   |   |   |   |  |  | I-imm. |
| - $\text{inst}[31]$ -                          |                  | $\text{inst}[30:25]$ | $\text{inst}[11:8]$  | $\text{inst}[7]$  |   |   |   |   |  |  | S-imm. |
| - $\text{inst}[31]$ -                          | $\text{inst}[7]$ | $\text{inst}[30:25]$ | $\text{inst}[11:8]$  |                   |   |   |   | 0 |  |  | B-imm. |

Upper bits sign-extended from  $\text{inst}[31]$  always

Only bit 7 of instruction changes role in immediate between S and B

# Let's Add JALR (I-Format)



- JALR rd, rs, immediate
- Two changes to the state
  - Writes PC+4 to rd (return address)
  - Sets PC = rs + immediate
  - Uses same immediates as arithmetic and loads
    - **no** multiplication by 2 bytes
    - LSB is ignored

# Datapath So Far, with Branches



# Adding JALR



# Adding JAL



- JAL saves PC+4 in register rd (the return address)
- Set PC = PC + offset (PC-relative jump)
- Target somewhere within  $\pm 2^{19}$  locations, 2 bytes apart
  - $\pm 2^{18}$  32-bit instructions
- Immediate encoding optimized similarly to branch instruction to reduce hardware cost

# Datapath with JALR



# Adding JAL



# U-Format for “Upper Immediate” Instructions



- Has 20-bit immediate in upper 20 bits of 32-bit instruction word
- One destination register, rd
- Used for two instructions
  - LUI – Load Upper Immediate
  - AUIPC – Add Upper Immediate to PC

# Implementing LUI



# Implementing AUI PC



# Recap: Complete RV32I ISA

|                       |     |     |     |             |
|-----------------------|-----|-----|-----|-------------|
| imm[31:12]            |     |     | rd  | 0110111     |
| imm[31:12]            |     |     | rd  | 0010111     |
| imm[20:10:1 11 19:12] |     |     | rd  | 1101111     |
| imm[11:0]             | rs1 | 000 | rd  | 1100111     |
| imm[12:10:5]          | rs2 | rs1 | 000 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 001 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 100 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 101 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 110 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 111 | imm[4:1 11] |
| imm[11:0]             |     | rs1 | 000 | rd          |
| imm[11:0]             |     | rs1 | 001 | rd          |
| imm[11:0]             |     | rs1 | 010 | rd          |
| imm[11:0]             |     | rs1 | 100 | rd          |
| imm[11:0]             |     | rs1 | 101 | rd          |
| imm[11:5]             | rs2 | rs1 | 000 | imm[4:0]    |
| imm[11:5]             | rs2 | rs1 | 001 | imm[4:0]    |
| imm[11:5]             | rs2 | rs1 | 010 | imm[4:0]    |
| imm[11:0]             |     | rs1 | 000 | rd          |
| imm[11:0]             |     | rs1 | 010 | rd          |
| imm[11:0]             |     | rs1 | 011 | rd          |
| imm[11:0]             |     | rs1 | 100 | rd          |
| imm[11:0]             |     | rs1 | 110 | rd          |
| imm[11:0]             |     | rs1 | 111 | rd          |

|       |                |       |      |       |     |         |
|-------|----------------|-------|------|-------|-----|---------|
| LUI   | 0000000        | shamt | rs1  | 001   | rd  | 0010011 |
| AUIPC | 0000000        | shamt | rs1  | 101   | rd  | 0010011 |
| JAL   | 0100000        | shamt | rs1  | 101   | rd  | 0010011 |
| JALR  | 0000000        | rs2   | rs1  | 000   | rd  | 0110011 |
| BEQ   | 0100000        | rs2   | rs1  | 000   | rd  | 0110011 |
| BNE   | 0000000        | rs2   | rs1  | 001   | rd  | 0110011 |
| BLT   | 0000000        | rs2   | rs1  | 010   | rd  | 0110011 |
| BGE   | 0000000        | rs2   | rs1  | 011   | rd  | 0110011 |
| BLTU  | 0000000        | rs2   | rs1  | 100   | rd  | 0110011 |
| BGEU  | 0000000        | rs2   | rs1  | 101   | rd  | 0110011 |
| LB    | 0100000        | rs2   | rs1  | 101   | rd  | 0110011 |
| LH    | 0000000        | rs2   | rs1  | 110   | rd  | 0110011 |
| LW    | 0000000        | rs2   | rs1  | 111   | rd  | 0110011 |
| LBU   | 0000           | pred  | succ | 00000 | 000 | 0001111 |
| LHU   | 0000           | 0000  | 0000 | 00000 | 001 | 0001111 |
| SB    | 00000000000000 |       |      | 00000 | 000 | 00000   |
| SH    | 000000000001   |       |      | 00000 | 000 | 00000   |
| SW    | csr            |       |      | rs1   | 001 | rd      |
| ADDI  | csr            |       |      | rs1   | 010 | rd      |
| SLTI  | csr            |       |      | rs1   | 011 | rd      |
| SLTIU | csr            |       |      | zimm  | 101 | rd      |
| XORI  | csr            |       |      | zimm  | 110 | rd      |
| ORI   | csr            |       |      | zimm  | 111 | rd      |
| ANDI  | csr            |       |      |       |     | 1110011 |

Not in CA

- RV32I has 47 instructions
- 37 instructions are enough to run any C program

# Complete Single-Cycle RV32I Datapath!



# Register Transfer Level (RTL)

- RTL describes instructions in figures or text
- Can use C (or Verilog) to describe RTL

## Inst Register Transfers

```
add    R[rd] ← R[rs1] + R[rs2]; PC ← PC + 4
sub    R[rd] ← R[rs1] - R[rs2]; PC ← PC + 4
ori    R[rd] ← R[rs1] | Imm; PC ← PC + 4
jal    R[rd] ← PC + 4; PC ← R[rs1] + Imm
beq    if ( R[rs1] == R[rs2] )
            PC ← PC + Imm
        else PC ← PC + 4
```

# Our Single-Core Computer (without FPU, SIMD)



# Example: add



# add Execution



# Example: SW



# Example: beq



# Midterm I

- Date: Wednesday, Apr. 03
- Time: 08:15- 9:55 (normal lecture slot)
  - Be punctual – we start 8:15 sharp!
- Venue: Teaching Center 301 + 201
- Closed book:
  - You can bring one A4 page with notes (both sides; in English): Write your Chinese and **Pinyin** name on the top! **Handwritten** by you!
  - You will be provided with the RISC-V "green sheet"
  - No other material allowed!

# Midterm I

- Switch cell phones **off!** (not silent mode – off!)
  - Put them in your bags.
- Bags under the table. Nothing except paper, pen, 1 drink, 1 snack on the table!
- No other electronic devices are allowed!
  - No ear plugs, music, smartwatch...
- Anybody touching any electronic device will **FAIL** the course!
- Anybody found cheating (copy your neighbors answers, additional material, ...) will **FAIL** the course!











**Old School Machine Structures**

- C program starts
- C executable and is loaded into memory by OS (copying bytes)
- OS sets up stack, then tells sets up stack
- Run time first initializes memory and other libraries
- Then will your procedure run and main

**New School Machine Structures**

- Parallel Requests
- Parallel Threads
- Parallel Data
- Hardware descriptors

**6 Great Ideas in Computer Architecture**

1. Abstraction
2. Levels of Representation / Interpretation
3. Moore's Law
4. Decoupling through threads
5. Principle of Locality
6. Memory Hierarchy
7. Parallelism
8. Performance Measurement and Improvement
9. Dependability via Redundancy

**Two's Complement Representation**

- treats 0 as positive
- 32-bit word represents  $2^{32}$  states from  $-2^{31}$  to  $2^{31}-1$

**Components of a Computer**

**fw. c → cpp → fo.1 → compiler**

- cpp replaces comments with a single space
- cpp commands begins with '#'

**(standard) Type**

- (signed) char / signed char
- (unsigned) int 4
- (signed) short 2 + long × 2
- (unsigned) long 4
- float 4 byte
- double 8 byte

**Level of Representation / Interpretation**

10. 指令集 1. 指令
11. Q Q 是指与
12. II 逻辑或
13. ?: 条件左
14. = + = Q =
- = - = 1 = to → to
- <= <= 1 = to → to
- >= >= 1 = to → to
15. , to → to
- long long 32 8
- long long 64 8

**Valid Pointer Arithmetic**

- Add an integer to a pointer
- Subtract 2 pointers (in the same array)
- Compare two pointers ( $C, C+1, \dots$ )
- Run time first initializes memory and other libraries
- Run time first initializes memory and other libraries
- Then will your procedure run and main

**Six Fundamental Steps in Calling a Function**

1. Put parameters in a place where function can access them
2. Transfer control to function
3. Acquire local storage resources needed for function
4. Perform desired task of the function
5. Put result value in a place where calling code can access it and restore any registers you used
6. Return control to point of origin, since a function can be called from several parts of a program.

**\$fp**

|                             |
|-----------------------------|
| Saved argument register     |
| Saved return address        |
| Saved saved registers       |
| Local arrays and structures |

**sp**

|                             |
|-----------------------------|
| Stack                       |
| ↓                           |
| Dynamic data                |
| Saved static data           |
| Temporary                   |
| ↓                           |
| Local arrays and structures |

**bp**

|                             |
|-----------------------------|
| Stack                       |
| ↓                           |
| Dynamic data                |
| Saved static data           |
| Temporary                   |
| ↓                           |
| Local arrays and structures |

**Code**

**Stack**

- free when function returns
- Last in first out

**Format**

- used for instructions with immediates, lw and sw and branches (beq and bne)

**Common Memory Problems**

- Using uninitialized variables
- Using memory that you don't own
- Improper use of free / realloc by memory with the function handle returned by malloc / calloc
- Memory leaks

**Opcode rs rt rd shamt funct**

**opcode = 0**

**2 format: 5-bit field only represents numbers up to 31**

**1st instruction has immediate, then it uses at most 2 registers used for lw, sw, beq, bne branches and with immediate**

**Decoding with larger immediate**

- Label Upper lui
- add \$to \$to \$to \$to \$to \$to \$to
- lui \$tut \$tut \$tut \$tut
- ori \$tut \$tut \$tut \$tut
- addu \$to \$to \$to \$to

**Architectural Implementation**

**Large Constant Descriptor**

THE PURPOSE OF THIS REFERENCE GUIDE IS TO WALK THROUGH THE PROCESS OF BOOTING THE SIFT WORKSTATION, CREATING A TIMELINE ("SUPER" OR "MICRO") AND REVIEWING IT.



# Content

- Main topics
  - Number representation
  - C
  - RISC-V
- Plus general “Computer Architecture” knowledge
- Everything till lecture 8 CALL – including lecture 8

# Peer Instruction: Critical Path

Critical path for addi

$$R[rd] = R[rs1] + imm$$



- A.  $t_{clk-q} + t_{Add} + t_{IMEM} + t_{Reg} + t_{BComp} + t_{ALU} + t_{DMEM} + t_{mux} + t_{Setup}$
- B.  $t_{clk-q} + t_{IMEM} + \max\{t_{Reg}, t_{Imm}\} + t_{ALU} + 2t_{mux} + t_{Setup}$
- C.  $t_{clk-q} + t_{IMEM} + \max\{t_{Reg}, t_{Imm}\} + t_{ALU} + 3t_{mux} + t_{DMEM} + t_{Setup}$
- D. None of the above

# Peer Instruction: Critical Path

Critical path for addi

$$R[rd] = R[rs1] + imm$$



- A.  $t_{clk-q} + t_{Add} + t_{IMEM} + t_{Reg} + t_{BComp} + t_{ALU} + t_{DMEM} + t_{mux} + t_{Setup}$
- B.  $t_{clk-q} + t_{IMEM} + \max\{t_{Reg}, t_{Imm}\} + t_{ALU} + 2*t_{mux} + t_{Setup}$
- C.  $t_{clk-q} + t_{IMEM} + \max\{t_{Reg}, t_{Imm}\} + t_{ALU} + 3*t_{mux} + t_{DMEM} + t_{Setup}$
- D. None of the above

# Instruction Timing



| IF     | ID       | EX     | MEM    | WB     | Total         |
|--------|----------|--------|--------|--------|---------------|
| I-MEM  | Reg Read | ALU    | D-MEM  | Reg W  |               |
| 200 ps | 100 ps   | 200 ps | 200 ps | 100 ps | <b>800 ps</b> |

# Instruction Timing

| Instr | IF = 200ps | ID = 100ps | ALU = 200ps | MEM=200ps | WB = 100ps | Total |
|-------|------------|------------|-------------|-----------|------------|-------|
| add   | X          | X          | X           |           | X          | 600ps |
| beq   | X          | X          | X           |           |            | 500ps |
| jal   | X          | X          | X           |           | X          | 600ps |
| lw    | X          | X          | X           | X         | X          | 800ps |
| sw    | X          | X          | X           | X         |            | 700ps |

- Maximum clock frequency
  - $f_{\max} = 1/800\text{ps} = 1.25 \text{ GHz}$
- Most blocks idle most of the time
  - E.g.  $f_{\max, \text{ALU}} = 1/200\text{ps} = 5 \text{ GHz!}$

# Control Logic Truth Table

| Inst[31:0]          | BrEq | BrLT | PCSel | ImmSel | BrUn | ASel | BSel | ALUSel | MemRW | RegWEn | WBSel             |
|---------------------|------|------|-------|--------|------|------|------|--------|-------|--------|-------------------|
| <b>add</b>          | *    | *    | +4    | *      | *    | Reg  | Reg  | Add    | Read  | 1      | ALU               |
| <b>sub</b>          | *    | *    | +4    | *      | *    | Reg  | Reg  | Sub    | Read  | 1      | ALU               |
| <b>(R-R<br/>Op)</b> | *    | *    | +4    | *      | *    | Reg  | Reg  | (Op)   | Read  | 1      | ALU               |
| <b>addi</b>         | *    | *    | +4    | I      | *    | Reg  | Imm  | Add    | Read  | 1      | ALU               |
| <b>lw</b>           | *    | *    | +4    | I      | *    | Reg  | Imm  | Add    | Read  | 1      | Mem               |
| <b>sw</b>           | *    | *    | +4    | S      | *    | Reg  | Imm  | Add    | Write | 0      | *                 |
| <b>beq</b>          | 0    | *    | +4    | B      | *    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>beq</b>          | 1    | *    | ALU   | B      | *    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>bne</b>          | 0    | *    | ALU   | B      | *    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>bne</b>          | 1    | *    | +4    | B      | *    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>blt</b>          | *    | 1    | ALU   | B      | 0    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>bltu</b>         | *    | 1    | ALU   | B      | 1    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>jalr</b>         | *    | *    | ALU   | I      | *    | Reg  | Imm  | Add    | Read  | 1      | PC+4              |
| <b>jal</b>          | *    | *    | ALU   | J      | *    | PC   | Imm  | Add    | Read  | 1      | PC+4              |
| <b>auiopc</b>       | *    | *    | +4    | U      | *    | PC   | Imm  | Add    | Read  | 1      | ALU <sup>42</sup> |

# Control Realization Options

- ROM
  - “Read-Only Memory”
  - Regular structure
  - Can be easily reprogrammed
    - fix errors
    - add instructions
  - Popular when designing control logic manually
- Combinatorial Logic
  - Today, chip designers use logic synthesis tools to convert truth tables to networks of gates

# RV32I, a nine-bit ISA!

|                       |     |     |     |             |
|-----------------------|-----|-----|-----|-------------|
| imm[31:12]            |     |     | rd  | 0110111     |
| imm[31:12]            |     |     | rd  | 0010111     |
| imm[20:10:1 11 19:12] |     |     | rd  | 1101111     |
| imm[11:0]             | rs1 | 000 | rd  | 1100111     |
| imm[12:10:5]          | rs2 | rs1 | 000 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 001 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 100 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 101 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 110 | imm[4:1 11] |
| imm[12:10:5]          | rs2 | rs1 | 111 | imm[4:1 11] |
| imm[11:0]             |     | rs1 | 000 | rd          |
| imm[11:0]             |     | rs1 | 001 | rd          |
| imm[11:0]             |     | rs1 | 010 | rd          |
| imm[11:0]             |     | rs1 | 100 | rd          |
| imm[11:0]             |     | rs1 | 101 | rd          |
| imm[11:5]             | rs2 | rs1 | 000 | imm[4:0]    |
| imm[11:5]             | rs2 | rs1 | 001 | imm[4:0]    |
| imm[11:5]             | rs2 | rs1 | 010 | imm[4:0]    |
| imm[11:0]             |     | rs1 | 000 | rd          |
| imm[11:0]             |     | rs1 | 010 | rd          |
| imm[11:0]             |     | rs1 | 011 | rd          |
| imm[11:0]             |     | rs1 | 100 | rd          |
| imm[11:0]             |     | rs1 | 110 | rd          |
| imm[11:0]             |     | rs1 | 111 | rd          |

| inst[30]         |       |      | inst[14:12] inst[6:2] |     |           |
|------------------|-------|------|-----------------------|-----|-----------|
| 0000000          | shamt | rs1  | 001                   | rd  | 0010011   |
| 0000000          | shamt | rs1  | 101                   | rd  | 0010011   |
| 0100000          | shamt | rs1  | 101                   | rd  | 0010011   |
| 0000000          | rs2   | rs1  | 000                   | rd  | 0110011   |
| 0100000          | rs2   | rs1  | 000                   | rd  | 0110011   |
| 0000000          | rs2   | rs1  | 001                   | rd  | 0110011   |
| 0000000          | rs2   | rs1  | 010                   | rd  | 0110011   |
| 0000000          | rs2   | rs1  | 011                   | rd  | 0110011   |
| 0000000          | rs2   | rs1  | 100                   | rd  | 0110011   |
| 0000000          | rs2   | rs1  | 101                   | rd  | 0110011   |
| 0100000          | rs2   | rs1  | 101                   | rd  | 0110011   |
| 0000000          | rs2   | rs1  | 110                   | rd  | 0110011   |
| 0000000          | rs2   | rs1  | 111                   | rd  | 0110011   |
| 0000000          | pred  | succ | 00000                 | 000 | 0001111   |
| 00000            | 0000  | 0000 | 00000                 | 001 | 0001111   |
| 0000000000000000 |       |      | 00000                 | 000 | 000000011 |
| 0000000000000001 |       |      | 00000                 | 000 | 000000011 |
| csr              |       | rs1  | 001                   | rd  | 1110011   |
| csr              |       | rs1  | 010                   | rd  | 1110011   |
| csr              |       | rs1  | 011                   | rd  | 1110011   |
| zimm             |       | zimm | 101                   | rd  | 1110011   |
| zimm             |       | zimm | 110                   | rd  | 1110011   |
| zimm             |       | zimm | 111                   | rd  | 1110011   |

Not in CS61C

Instruction type encoded using only 9 bits  
**inst[30],inst[14:12], inst[6:2]**

# Combinational Logic Control

- Simplest example: BrUn

|              |     |     | inst[14:12] | inst[6:2] = Branch |          |      |
|--------------|-----|-----|-------------|--------------------|----------|------|
| imm[12 10:5] | rs2 | rs1 | 000         | imm[4:1 11]        | 11000 11 | BEQ  |
| imm[12 10:5] | rs2 | rs1 | 001         | imm[4:1 11]        | 11000 11 | BNE  |
| imm[12 10:5] | rs2 | rs1 | 100         | imm[4:1 11]        | 11000 11 | BLT  |
| imm[12 10:5] | rs2 | rs1 | 101         | imm[4:1 11]        | 11000 11 | BGE  |
| imm[12 10:5] | rs2 | rs1 | 110         | imm[4:1 11]        | 11000 11 | BLTU |
| imm[12 10:5] | rs2 | rs1 | 111         | imm[4:1 11]        | 11000 11 | BGEU |

- How to decode whether BrUn is 1?

- BrUn = Inst [13] • Branch

# ROM-based Control



# ROM Controller Implementation



# Controller: AND logic (add, sw)



- $\text{add} = \overline{\text{inst}[2]} \cdot \overline{\text{inst}[3]} \cdot \overline{\text{inst}[4]} \cdot \overline{\text{inst}[5]} \cdot \overline{\text{inst}[6]} \cdot \overline{\text{inst}[12]} \cdot \overline{\text{inst}[13]} \cdot \overline{\text{inst}[14]} \cdot \overline{\text{inst}[30]}$
- $\text{sw} = \overline{\text{inst}[2]} \cdot \overline{\text{inst}[3]} \cdot \overline{\text{inst}[4]} \cdot \overline{\text{inst}[5]} \cdot \overline{\text{inst}[6]} \cdot \overline{\text{inst}[12]} \cdot \overline{\text{inst}[13]} \cdot \overline{\text{inst}[14]}$

AND Controller Logic

|     | 2                                                                                                                                                        | 3 | 4 | 5 | 6 | 12 | 13 | 14 | 30 | unused |
|-----|----------------------------------------------------------------------------------------------------------------------------------------------------------|---|---|---|---|----|----|----|----|--------|
| add | xor(i[2], 1 ) • xor(i[3], 1 ) • xor(i[4], 0 ) • xor(i[5], 0 ) • xor(i[6], 1 ) • xor(i[12], 1 ) • xor(i[13], 1 ) • xor(i[14], 1 ) • (xor(i[30], 1 ) + 0 ) |   |   |   |   |    |    |    |    |        |
| sw  | xor(i[2], 1 ) • xor(i[3], 1 ) • xor(i[4], 1 ) • xor(i[5], 0 ) • xor(i[6], 1 ) • xor(i[12], 1 ) • xor(i[13], 1 ) • xor(i[14], 1 ) • (xor(i[30], 1 ) + 1 ) |   |   |   |   |    |    |    |    |        |
| ... |                                                                                                                                                          |   |   |   |   |    |    |    |    |        |

- $\text{bne\_t} = \text{bne} \cdot \overline{\text{BrEQ}}$
- $\text{bne\_f} = \text{bne} \cdot \text{BrEQ}$

# Controller: OR logic



- MemRW = sw + sh + sb
- WBSel[0] = add + (all reg. to reg.) + AUIPC + ...
- WBSel[1] = JALR + JAL
- PCSel = beq\_t + bne\_t + blt\_t + bltu\_t + jal + jalr
- ...



# Control Logic Truth Table

| Inst[31:0]          | BrEq | BrLT | PCSel | ImmSel | BrUn | ASel | BSel | ALUSel | MemRW | RegWEn | WBSel             |
|---------------------|------|------|-------|--------|------|------|------|--------|-------|--------|-------------------|
| <b>add</b>          | *    | *    | +4    | *      | *    | Reg  | Reg  | Add    | Read  | 1      | ALU               |
| <b>sub</b>          | *    | *    | +4    | *      | *    | Reg  | Reg  | Sub    | Read  | 1      | ALU               |
| <b>(R-R<br/>Op)</b> | *    | *    | +4    | *      | *    | Reg  | Reg  | (Op)   | Read  | 1      | ALU               |
| <b>addi</b>         | *    | *    | +4    | I      | *    | Reg  | Imm  | Add    | Read  | 1      | ALU               |
| <b>lw</b>           | *    | *    | +4    | I      | *    | Reg  | Imm  | Add    | Read  | 1      | Mem               |
| <b>sw</b>           | *    | *    | +4    | S      | *    | Reg  | Imm  | Add    | Write | 0      | *                 |
| <b>beq</b>          | 0    | *    | +4    | B      | *    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>beq</b>          | 1    | *    | ALU   | B      | *    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>bne</b>          | 0    | *    | ALU   | B      | *    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>bne</b>          | 1    | *    | +4    | B      | *    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>blt</b>          | *    | 1    | ALU   | B      | 0    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>bltu</b>         | *    | 1    | ALU   | B      | 1    | PC   | Imm  | Add    | Read  | 0      | *                 |
| <b>jalr</b>         | *    | *    | ALU   | I      | *    | Reg  | Imm  | Add    | Read  | 1      | PC+4              |
| <b>jal</b>          | *    | *    | ALU   | J      | *    | PC   | Imm  | Add    | Read  | 1      | PC+4              |
| <b>auiopc</b>       | *    | *    | +4    | U      | *    | PC   | Imm  | Add    | Read  | 1      | ALU <sup>50</sup> |

# ROM Controller Implementation



# Call home, we've made HW/SW contact!



# “And In conclusion...”

- We have built a processor!
  - Capable of executing all RISC-V instructions in one cycle each
  - Not all units (hardware) used by all instructions
  - Critical path changes
- 5 Phases of execution
  - IF, ID, EX, MEM, WB
  - Not all instructions are active in all phases
- Controller specifies how to execute instructions
  - Implemented as ROM or logic