

# Control and Status Registers

# Complete Single-Cycle RV32I Datapath!





# Control and Status Registers

- Control and status registers (CSRs) are separate from the register file (**x0-x31**)
  - Used for monitoring the status and performance
  - There can be up to 4096 CSRs
- Not in the base ISA, but almost mandatory in every implementation
  - ISA is modular
  - Necessary for counters and timers, and communication with peripherals

# CSR Instructions



| Register operand |            |            |           |            |
|------------------|------------|------------|-----------|------------|
| Instr.           | rd         | rs         | Read CSR? | Write CSR? |
| <b>csrrw</b>     | <b>x0</b>  | -          | no        | yes        |
| <b>csrrw</b>     | <b>!x0</b> | -          | yes       | yes        |
| <b>csrrs/c</b>   | -          | <b>x0</b>  | yes       | no         |
| <b>csrrs/c</b>   | -          | <b>!x0</b> | yes       | yes        |

# CSR Instructions



**source/dest**      **source**    **instr**      **rd**      **SYSTEM**  
uimm[4:0] ← Zero-extended to 32b

| Immediate operand |            |      |           |            |
|-------------------|------------|------|-----------|------------|
| Instr.            | rd         | uimm | Read CSR? | Write CSR? |
| <b>csrrwi</b>     | <b>x0</b>  | -    | no        | yes        |
| <b>csrrwi</b>     | <b>!x0</b> | -    | yes       | yes        |
| <b>csrrs/ci</b>   | -          | 0    | yes       | no         |
| <b>csrrs/ci</b>   | -          | !0   | yes       | yes        |

# CSR Instruction Example

- The CSRRW (Atomic Read/Write CSR) instruction ‘atomically’ swaps values in the CSRs and integer registers.
  - We will see more on ‘atomics’ later
- CSRRW reads the previous value of the CSR and writes it to integer register **rd**. Then writes **rs1** to CSR
- Pseudoinstruction **csrw csr, rs1** is  
**csrrw x0, csr, rs1**
  - **rd=x0**, just writes **rs1** to CSR
- Pseudoinstruction **csrwi csr, uimm** is  
**csrrwi x0, csr, uimm**
  - **rd=x0**, just writes **uimm** to CSR
- Hint: Use write enable and clock...

# System Instructions

- **ecall** – (I-format) makes requests to supporting execution environment (OS), such as system calls (**syscalls**)
- **ebreak** – (I-format) used e.g. by debuggers to transfer control to a debugging environment

| 31            | 20 19 | 15 14       | 12 11 | 7 6           | 0 |
|---------------|-------|-------------|-------|---------------|---|
| funct12       | rs1   | funct3      | rd    | opcode        |   |
| 12            | 5     | 3           | 5     | 7             |   |
| <b>ECALL</b>  | 0     | <b>PRIV</b> | 0     | <b>SYSTEM</b> |   |
| <b>EBREAK</b> | 0     | <b>PRIV</b> | 0     |               |   |

- **fence** – sequences memory (and I/O) accesses as viewed by other threads or co-processors

# Datapath Control

# Our Single-Core Processor



# Single-Cycle RV32I Datapath and Control



# Example: sw



## Example: beq



# Instruction Timing

# Example: add



# Add Execution



# Example: add timing



$$\begin{aligned} \text{Critical path} &= t_{\text{clk-q}} + \max \{t_{\text{Add}} + t_{\text{mux}}, t_{\text{IMEM}} + t_{\text{Reg}} + t_{\text{mux}} + t_{\text{ALU}} + t_{\text{mux}}\} + t_{\text{setup}} \\ &= t_{\text{clk-q}} + t_{\text{IMEM}} + t_{\text{Reg}} + t_{\text{mux}} + t_{\text{ALU}} + t_{\text{mux}} + t_{\text{setup}} \end{aligned}$$

## Example: lw



Critical path =  $t_{clk-q} + \max \{t_{Add} + t_{mux}, t_{IMEM} + t_{Imm} + t_{mux} + t_{ALU} + t_{DMEM} + t_{mux}, t_{IMEM} + t_{Reg} + t_{mux} + t_{ALU} + t_{DMEM} + t_{mux}\} + t_{setup}$

# Instruction Timing



| IF     | ID       | EX     | MEM    | WB     | Total  |
|--------|----------|--------|--------|--------|--------|
| I-MEM  | Reg Read | ALU    | D-MEM  | Reg W  |        |
| 200 ps | 100 ps   | 200 ps | 200 ps | 100 ps | 800 ps |

# Instruction Timing

| Instr | IF = 200ps | ID = 100ps | ALU = 200ps | MEM=200ps | WB = 100ps | Total |
|-------|------------|------------|-------------|-----------|------------|-------|
| add   | X          | X          | X           |           | X          | 600ps |
| beq   | X          | X          | X           |           |            | 500ps |
| jal   | X          | X          | X           |           | X          | 600ps |
| lw    | X          | X          | X           | X         | X          | 800ps |
| sw    | X          | X          | X           | X         |            | 700ps |

- Maximum clock frequency
  - $f_{max} = 1/800\text{ps} = 1.25 \text{ GHz}$
- Most blocks idle most of the time
  - E.g.  $f_{max,ALU} = 1/200\text{ps} = 5 \text{ GHz!}$

# Control Logic Design



# Control Logic Truth Table

| Inst[31:0]      | BrEq | BrLT | PCSel | ImmSel | BrUn | ASel | BSel | ALUSel      | MemRW | RegWEn | WBSel |
|-----------------|------|------|-------|--------|------|------|------|-------------|-------|--------|-------|
| <b>add</b>      | *    | *    | +4    | *      | *    | Reg  | Reg  | Add         | Read  | 1      | ALU   |
| <b>sub</b>      | *    | *    | +4    | *      | *    | Reg  | Reg  | Sub         | Read  | 1      | ALU   |
| <i>(R-R Op)</i> | *    | *    | +4    | *      | *    | Reg  | Reg  | <i>(Op)</i> | Read  | 1      | ALU   |
| <b>addi</b>     | *    | *    | +4    | I      | *    | Reg  | Imm  | Add         | Read  | 1      | ALU   |
| <b>lw</b>       | *    | *    | +4    | I      | *    | Reg  | Imm  | Add         | Read  | 1      | Mem   |
| <b>sw</b>       | *    | *    | +4    | S      | *    | Reg  | Imm  | Add         | Write | 0      | *     |
| <b>beq</b>      | 0    | *    | +4    | B      | *    | PC   | Imm  | Add         | Read  | 0      | *     |
| <b>beq</b>      | 1    | *    | ALU   | B      | *    | PC   | Imm  | Add         | Read  | 0      | *     |
| <b>bne</b>      | 0    | *    | ALU   | B      | *    | PC   | Imm  | Add         | Read  | 0      | *     |
| <b>bne</b>      | 1    | *    | +4    | B      | *    | PC   | Imm  | Add         | Read  | 0      | *     |
| <b>blt</b>      | *    | 1    | ALU   | B      | 0    | PC   | Imm  | Add         | Read  | 0      | *     |
| <b>bltu</b>     | *    | 1    | ALU   | B      | 1    | PC   | Imm  | Add         | Read  | 0      | *     |
| <b>jalr</b>     | *    | *    | ALU   | I      | *    | Reg  | Imm  | Add         | Read  | 1      | PC+4  |
| <b>jal</b>      | *    | *    | ALU   | J      | *    | PC   | Imm  | Add         | Read  | 1      | PC+4  |
| <b>auipc</b>    | *    | *    | +4    | U      | *    | PC   | Imm  | Add         | Read  | 1      | ALU   |

# Control Realization Options

- ROM
  - "Read-Only Memory"
  - Regular structure
  - Can be easily reprogrammed
    - fix errors
    - add instructions
  - Popular when designing control logic manually
- Combinatorial Logic
  - Today, chip designers use logic synthesis tools to convert truth tables to networks of gates

# RV32I, A Nine-Bit ISA!

|                        |       |      |       |             |        |
|------------------------|-------|------|-------|-------------|--------|
| imm[31:12]             |       |      | rd    | 011011      | LUI    |
| imm[31:12]             |       |      | rd    | 001011      | AUIPC  |
| imm[20:10:11 11:19:12] |       |      | rd    | 110111      | JAL    |
| imm[11:0]              |       | rs1  | 000   | imm[4:1 11] | JALR   |
| imm[12 10:5]           | rs2   | rs1  | 001   | imm[4:1 11] | BEQ    |
| imm[12 10:5]           | rs2   | rs1  | 100   | imm[4:1 11] | BNE    |
| imm[12 10:5]           | rs2   | rs1  | 101   | imm[4:1 11] | BLT    |
| imm[12 10:5]           | rs2   | rs1  | 110   | imm[4:1 11] | BGE    |
| imm[12 10:5]           | rs2   | rs1  | 111   | imm[4:1 11] | BLTU   |
| imm[11:0]              |       | rs1  | 000   | rd          | BGEU   |
| imm[11:0]              |       | rs1  | 001   | rd          | LB     |
| imm[11:0]              |       | rs1  | 010   | rd          | LH     |
| imm[11:0]              |       | rs1  | 100   | rd          | LW     |
| imm[11:0]              |       | rs1  | 101   | rd          | LBU    |
| imm[11:0]              |       | rs1  | 000   | imm[4:0]    | LHU    |
| imm[11:5]              | rs2   | rs1  | 001   | imm[4:0]    | SB     |
| imm[11:5]              | rs2   | rs1  | 010   | imm[4:0]    | SH     |
| imm[11:5]              | rs2   | rs1  | 010   | imm[4:0]    | SW     |
| imm[11:0]              |       | rs1  | 000   | rd          | ADDI   |
| imm[11:0]              |       | rs1  | 010   | rd          | SLTI   |
| imm[11:0]              |       | rs1  | 011   | rd          | SLTIU  |
| imm[11:0]              |       | rs1  | 100   | rd          | XORI   |
| imm[11:0]              |       | rs1  | 110   | rd          | ORI    |
| imm[11:0]              |       | rs1  | 111   | rd          | ANDI   |
| 000000                 | shamt | rs1  | 001   | rd          | SLLI   |
| 000000                 | shamt | rs1  | 101   | rd          | SRLI   |
| 000000                 | shamt | rs1  | 101   | rd          | SRAI   |
| 000000                 | rs2   | rs1  | 000   | rd          | ADD    |
| 000000                 | rs2   | rs1  | 000   | rd          | SUB    |
| 000000                 | rs2   | rs1  | 001   | rd          | SU     |
| 000000                 | rs2   | rs1  | 010   | rd          | SLT    |
| 000000                 | rs2   | rs1  | 011   | rd          | SLTU   |
| 000000                 | rs2   | rs1  | 100   | rd          | XOR    |
| 000000                 | rs2   | rs1  | 101   | rd          | SRL    |
| 000000                 | rs2   | rs1  | 101   | rd          | SRA    |
| 000000                 | rs2   | rs1  | 110   | rd          | OR     |
| 000000                 | rs2   | rs1  | 111   | rd          | AND    |
| fm                     | pred  | succ | rs1   | 000         | FENCE  |
| 000000000000           |       |      | 00000 | 00000       | ECALL  |
| 000000000001           |       |      | 00000 | 00000       | EBREAK |

- Instruction type encoded using only 9 bits:
- inst[30], inst[14:12], inst[6:2]**

**inst[6:2]**

**inst[14:12]**

**inst[30]**

# ROM-based Control



# ROM Controller Implementation





# Combinational Logic Control

- Simplest example: **BrUn**

| inst[14:12]  |     |     | inst[6:2] |             |         |       |
|--------------|-----|-----|-----------|-------------|---------|-------|
| imm[12:10:5] | rs2 | rs1 | 000       | imm[4:1 11] | 11000 1 | B EQ  |
| imm[12:10:5] | rs2 | rs1 | 001       | imm[4:1 11] | 11000 1 | B NE  |
| imm[12:10:5] | rs2 | rs1 | 100       | imm[4:1 11] | 11000 1 | BLT   |
| imm[12:10:5] | rs2 | rs1 | 101       | imm[4:1 11] | 11000 1 | B GE  |
| imm[12:10:5] | rs2 | rs1 | 110       | imm[4:1 11] | 11000 1 | BLTU  |
| imm[12:10:5] | rs2 | rs1 | 111       | imm[4:1 11] | 11000 1 | B GEL |

- # How to decode whether BrUn is 1?

## **BrUn = Inst [13] • Branch**

# Control Logic to Decode add

| inst[30] | inst[14:12] | inst[6:2] |          |
|----------|-------------|-----------|----------|
| 000000   | shamt       | rs1       | 001 SLLI |
| 000000   | shamt       | rs1       | 101 SRLI |
| 100000   | shamt       | rs1       | 101 SRAI |
| 000000   | rs2         | rs1       | 000 ADD  |
| 100000   | rs2         | rs1       | 000 SUB  |
| 000000   | rs2         | rs1       | 001 SLL  |
| 000000   | rs2         | rs1       | 010 SLT  |
| 000000   | rs2         | rs1       | 011 SLTU |
| 000000   | rs2         | rs1       | 100 XOR  |
| 000000   | rs2         | rs1       | 101 SRL  |
| 100000   | rs2         | rs1       | 101 SRA  |
| 000000   | rs2         | rs1       | 110 OR   |
| 000000   | rs2         | rs1       | 111 AND  |

$$\text{add} = \overline{i[30]} \cdot \overline{i[14]} \cdot \overline{i[13]} \cdot \overline{i[12]} \cdot \text{R-type}$$

$$\text{R-type} = \overline{i[6]} \cdot \overline{i[5]} \cdot \overline{i[4]} \cdot \overline{i[3]} \cdot \overline{i[2]} \cdot \text{RV32I}$$

$$\text{RV32I} = i[1] \cdot i[0]$$

**“And In  
Conclusion...”**

# Call home, we've made HW/SW contact!

High Level Language  
Program (e.g., C)

| Compiler

Assembly Language  
Program (e.g., RISC-V)

| Assembler

Machine Language  
Program (RISC-V)

```
temp = v[k];  
v[k] = v[k+1];  
v[k+1] = temp;
```

```
lw    x3, 0(x10)  
lw    x4, 4(x10)  
sw    x4, 0(x10)  
sw    x3, 4(x10)
```

```
1000 1101 1110 0010 0000 0000 0000 0000 0000  
1000 1110 0001 0000 0000 0000 0000 0000 0100  
1010 1110 0001 0010 0000 0000 0000 0000 0000  
1010 1101 1110 0010 0000 0000 0000 0000 0100
```

Hardware Architecture Description  
(e.g., block diagrams)

| Architecture Implementation

Logic Circuit Description  
(Circuit Schematic Diagrams)





# “And In conclusion...”

- We have built a processor!
  - Capable of executing all RISC-V instructions in one cycle each
  - Not all units (hardware) used by all instructions
  - Critical path changes
- 5 Phases of execution
  - IF, ID, EX, MEM, WB
  - Not all instructions are active in all phases
- Controller specifies how to execute instructions
  - Implemented as ROM or logic