

# Controller



Control Logic “Truth Table” (incomplete)

| Inst[31:0] | BrEq | BrLT | PCSel | ImmSel | BrUn | ASel | BSel | ALUSel | MemRW | RegWEn | WBSel |
|------------|------|------|-------|--------|------|------|------|--------|-------|--------|-------|
| add        | *    | *    | +4    | -      | -    | Reg  | Reg  | Add    | Read  | 1      | ALU   |
| sub        | *    | *    | +4    | -      | -    | Reg  | Reg  | Sub    | Read  | 1      | ALU   |
| (R-R Op)   | *    | *    | +4    | -      | -    | Reg  | Reg  | (Op)   | Read  | 1      | ALU   |
| addi       | *    | *    | +4    | I      | -    | Reg  | Imm  | Add    | Read  | 1      | ALU   |
| lw         | *    | *    | +4    | I      | -    | Reg  | Imm  | Add    | Read  | 1      | Mem   |
| sw         | *    | *    | +4    | S      | -    | Reg  | Imm  | Add    | Write | 0      | -     |
| beq        | 0    | *    | +4    | B      | -    | PC   | Imm  | Add    | Read  | 0      | -     |
| beq        | 1    | *    | ALU   | B      | -    | PC   | Imm  | Add    | Read  | 0      | -     |
| bne        | 0    | *    | ALU   | B      | -    | PC   | Imm  | Add    | Read  | 0      | -     |
| bne        | 1    | *    | +4    | B      | -    | PC   | Imm  | Add    | Read  | 0      | -     |
| blt        | *    | 1    | ALU   | B      | 0    | PC   | Imm  | Add    | Read  | 0      | -     |
| bltu       | *    | 1    | ALU   | B      | 1    | PC   | Imm  | Add    | Read  | 0      | -     |
| jalr       | *    | *    | ALU   | I      | -    | Reg  | Imm  | Add    | Read  | 1      | PC+4  |
| jal        | *    | *    | ALU   | J      | -    | PC   | Imm  | Add    | Read  | 1      | PC+4  |
| auipc      | *    | *    | +4    | U      | -    | PC   | Imm  | Add    | Read  | 1      | ALU   |

- `*` means “for all values”
- `-` means “don’t care, use any value”
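The rows of the truth table can be read as a lookup from instruction name and comparator outputs to control signals. The following is a minimal sketch of that reading, not the course's reference implementation. The names `BASE`, `FIELDS`, `pc_sel`, and `control` are made up for illustration, `None` stands for a don't-care entry, and the fall-through to `+4` for a `blt`/`bltu` that is not taken is an assumption, since the (incomplete) table does not list those rows.

```python
# Signals that depend only on the instruction. None encodes "-" (don't care).
FIELDS = ("ImmSel", "BrUn", "ASel", "BSel", "ALUSel", "MemRW", "RegWEn", "WBSel")
BASE = {
    "add":   (None, None, "Reg", "Reg", "Add", "Read",  1, "ALU"),
    "sub":   (None, None, "Reg", "Reg", "Sub", "Read",  1, "ALU"),
    "addi":  ("I",  None, "Reg", "Imm", "Add", "Read",  1, "ALU"),
    "lw":    ("I",  None, "Reg", "Imm", "Add", "Read",  1, "Mem"),
    "sw":    ("S",  None, "Reg", "Imm", "Add", "Write", 0, None),
    "beq":   ("B",  None, "PC",  "Imm", "Add", "Read",  0, None),
    "bne":   ("B",  None, "PC",  "Imm", "Add", "Read",  0, None),
    "blt":   ("B",  0,    "PC",  "Imm", "Add", "Read",  0, None),
    "bltu":  ("B",  1,    "PC",  "Imm", "Add", "Read",  0, None),
    "jalr":  ("I",  None, "Reg", "Imm", "Add", "Read",  1, "PC+4"),
    "jal":   ("J",  None, "PC",  "Imm", "Add", "Read",  1, "PC+4"),
    "auipc": ("U",  None, "PC",  "Imm", "Add", "Read",  1, "ALU"),
}


def pc_sel(inst, br_eq, br_lt):
    """PCSel is the only signal in the table that depends on BrEq / BrLT."""
    if inst in ("jal", "jalr"):
        return "ALU"
    taken = {
        "beq": br_eq == 1,
        "bne": br_eq == 0,
        "blt": br_lt == 1,   # not-taken case assumed to fall through to +4
        "bltu": br_lt == 1,  # not-taken case assumed to fall through to +4
    }
    return "ALU" if taken.get(inst, False) else "+4"


def control(inst, br_eq=0, br_lt=0):
    """Return all control signals for one instruction as a dict."""
    signals = dict(zip(FIELDS, BASE[inst]))
    signals["PCSel"] = pc_sel(inst, br_eq, br_lt)
    return signals
```

For example, `control("beq", br_eq=1)["PCSel"]` selects the ALU output as the next PC, while `control("add")["PCSel"]` selects PC + 4.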

Note: the instruction type is encoded using only 9 bits: inst[30], inst[14:12], and inst[6:2].
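As a small illustration of the note above, the sketch below pulls those 9 bits out of a 32-bit instruction word. The function name `type_bits` is hypothetical; the two example words are the standard RV32I encodings of `add x3, x1, x2` and `sub x3, x1, x2`, which differ only in inst[30].

```python
def type_bits(inst):
    """Return (inst[30], inst[14:12], inst[6:2]) of a 32-bit instruction word."""
    bit30 = (inst >> 30) & 0x1        # distinguishes e.g. add from sub
    funct3 = (inst >> 12) & 0x7       # inst[14:12]
    opcode_6_2 = (inst >> 2) & 0x1F   # inst[6:2]; inst[1:0] is 11 for all 32-bit instructions
    return bit30, funct3, opcode_6_2


ADD_X3_X1_X2 = 0x002081B3  # add x3, x1, x2
SUB_X3_X1_X2 = 0x402081B3  # sub x3, x1, x2

print(type_bits(ADD_X3_X1_X2))  # (0, 0, 12)
print(type_bits(SUB_X3_X1_X2))  # (1, 0, 12)
```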

| Instruction | inst[31:20]         | inst[19:15] | inst[14:12] | inst[11:7]    | inst[6:0] |
|-------------|---------------------|-------------|-------------|---------------|-----------|
| LUI         | imm[31:20]          | imm[19:15]  | imm[14:12]  | rd            | 0110111   |
| AUIPC       | imm[31:20]          | imm[19:15]  | imm[14:12]  | rd            | 0010111   |
| JAL         | imm[20\|10:1\|11]   | imm[19:15]  | imm[14:12]  | rd            | 1101111   |
| JALR        | imm[11:0]           | rs1         | 000         | rd            | 1100111   |
| BEQ         | imm[12\|10:5], rs2  | rs1         | 000         | imm[4:1\|11]  | 1100011   |
| BNE         | imm[12\|10:5], rs2  | rs1         | 001         | imm[4:1\|11]  | 1100011   |
| BLT         | imm[12\|10:5], rs2  | rs1         | 100         | imm[4:1\|11]  | 1100011   |
| BGE         | imm[12\|10:5], rs2  | rs1         | 101         | imm[4:1\|11]  | 1100011   |
| BLTU        | imm[12\|10:5], rs2  | rs1         | 110         | imm[4:1\|11]  | 1100011   |
| BGEU        | imm[12\|10:5], rs2  | rs1         | 111         | imm[4:1\|11]  | 1100011   |
| LB          | imm[11:0]           | rs1         | 000         | rd            | 0000011   |
| LH          | imm[11:0]           | rs1         | 001         | rd            | 0000011   |
| LW          | imm[11:0]           | rs1         | 010         | rd            | 0000011   |
| LBU         | imm[11:0]           | rs1         | 100         | rd            | 0000011   |
| LHU         | imm[11:0]           | rs1         | 101         | rd            | 0000011   |
| SB          | imm[11:5], rs2      | rs1         | 000         | imm[4:0]      | 0100011   |
| SH          | imm[11:5], rs2      | rs1         | 001         | imm[4:0]      | 0100011   |
| SW          | imm[11:5], rs2      | rs1         | 010         | imm[4:0]      | 0100011   |
| ADDI        | imm[11:0]           | rs1         | 000         | rd            | 0010011   |
| SLTI        | imm[11:0]           | rs1         | 010         | rd            | 0010011   |
| SLTIU       | imm[11:0]           | rs1         | 011         | rd            | 0010011   |
| XORI        | imm[11:0]           | rs1         | 100         | rd            | 0010011   |
| ORI         | imm[11:0]           | rs1         | 110         | rd            | 0010011   |
| ANDI        | imm[11:0]           | rs1         | 111         | rd            | 0010011   |

## Control Block Design



# Controller Realization Options

- ROM (Read-Only Memory)
  - Regular structure made from transistors
  - Can be easily reprogrammed during the design process to
    - fix errors
    - add instructions
  - Popular when designing control logic manually
- Combinatorial Logic
  - Today, chip designers often use logic synthesis tools to convert truth tables to networks of gates
  - Logic equation for each control signal (common sub-expressions shared among control signal equations)
  - Can exploit output “don’t cares” and input “for all values” to simplify circuit.
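To make the combinational-logic option concrete, here is a hedged sketch of a logic equation for a single control signal, RegWEn, written over inst[6:2]. The equation is one possible simplification that I checked only against the opcode groups listed in `OPCODES` below; every other opcode is treated as a don't-care, and a synthesis tool might well pick a different form. The names `OPCODES`, `WRITES_REG`, and `reg_wen` are hypothetical.

```python
# inst[6:2] for the RV32I opcode groups that appear in the truth table
OPCODES = {
    "R-type":  0b01100,
    "I-arith": 0b00100,
    "load":    0b00000,
    "store":   0b01000,
    "branch":  0b11000,
    "jalr":    0b11001,
    "jal":     0b11011,
    "auipc":   0b00101,
}

# RegWEn column of the truth table, per opcode group
WRITES_REG = {
    "R-type": 1, "I-arith": 1, "load": 1, "store": 0,
    "branch": 0, "jalr": 1, "jal": 1, "auipc": 1,
}


def reg_wen(op):
    """RegWEn = NOT(inst[5] AND NOT inst[4] AND NOT inst[2]); op holds inst[6:2]."""
    b5 = (op >> 3) & 1  # inst[5]
    b4 = (op >> 2) & 1  # inst[4]
    b2 = op & 1         # inst[2]
    return int(not (b5 and not b4 and not b2))


# The equation reproduces the RegWEn column for every listed opcode group.
assert all(reg_wen(OPCODES[k]) == WRITES_REG[k] for k in OPCODES)
```

Only three of the nine type bits are needed here because the unused opcodes are allowed to produce any output.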

## ROM (read only memory) Controller Implementation



# Instruction Timing

# Typical Approximate Worst-Case Instruction Timing



- How can we keep **datapath** resources (ALU, etc.) busy all the time?

# Performance Measures

$$\frac{\text{time}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \cdot \frac{\text{cycles}}{\text{instruction}} \cdot \frac{\text{time}}{\text{cycle}}$$

## Instructions per Program

$$\frac{\text{time}}{\text{program}} = \boxed{\frac{\text{instructions}}{\text{program}}} \cdot \frac{\text{cycles}}{\text{instruction}} \cdot \frac{\text{time}}{\text{cycle}}$$

- Determined by
  - Task specification
  - Algorithm, e.g. $O(N^2)$ vs $O(N)$
  - Programming language
  - Compiler
  - Instruction Set Architecture (ISA)

## (Average) Clock cycles per Instruction

$$\frac{\text{time}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \cdot \boxed{\frac{\text{cycles}}{\text{instruction}}} \cdot \frac{\text{time}}{\text{cycle}}$$

- Determined by
  - ISA (CISC versus RISC)
  - Processor implementation (or ***microarchitecture***)
    - E.g. for “our” single-cycle RISC-V design, CPI = 1
    - Pipelined processors, CPI > 1 (next lecture)
    - Superscalar processors, CPI < 1 (next lecture)

## Time per Cycle (1/Frequency)

$$\frac{\text{time}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \cdot \frac{\text{cycles}}{\text{instruction}} \cdot \boxed{\frac{\text{time}}{\text{cycle}}}$$

- Determined by
  - Processor microarchitecture (determines critical path through logic gates)
  - Technology (e.g. 5 nm versus 14 nm)
  - Supply voltage (lower voltage reduces transistor speed, but improves energy efficiency)

Speed tradeoff example: for some task (e.g. image compression), compare two processors.

|                | Processor A | Processor B |
|----------------|-------------|-------------|
| # Instructions | 1 Million   | 1.5 Million |
| Average CPI    | 2.5         | 1           |
| Clock rate $f$ | 2.5 GHz     | 2 GHz       |
| Execution time | 1 ms        | 0.75 ms     |

Processor B is faster for this task, despite executing more instructions and having a lower clock rate!
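The execution times in the table follow directly from the time-per-program equation. The sketch below recomputes them; the function name `execution_time` is made up for illustration.

```python
def execution_time(instructions, cpi, clock_hz):
    """time/program = instructions/program * cycles/instruction * time/cycle."""
    return instructions * cpi / clock_hz


t_a = execution_time(1_000_000, 2.5, 2.5e9)  # Processor A, in seconds
t_b = execution_time(1_500_000, 1.0, 2.0e9)  # Processor B, in seconds

print(f"A: {t_a * 1e3:.2f} ms, B: {t_b * 1e3:.2f} ms")  # A: 1.00 ms, B: 0.75 ms
```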

$$\frac{\text{energy}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \cdot \frac{\text{energy}}{\text{instruction}}$$

$$\frac{\text{energy}}{\text{program}} \propto \frac{\text{instructions}}{\text{program}} \cdot CV_{dd}^2$$

- $C$: “capacitance”, depends on technology, microarchitecture, and circuit details
- $V_{dd}$: supply voltage, e.g. 1 V
- Goal: reduce capacitance and supply voltage to reduce energy per task

## Energy Tradeoff Example

- For instance, a “next-generation” processor (Moore’s law):
  - Capacitance, $C$: reduced by 15 %
  - Supply voltage, $V_{dd}$: reduced by 15 %
  - Energy consumption: $(0.85\,C)(0.85\,V_{dd})^2 \approx 0.61\,E$, i.e. a **39 % reduction**
- Significantly improved energy efficiency thanks to
  - Moore’s Law **AND**
  - Reduced supply voltage
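The scaling arithmetic can be checked with a few lines. This is only a sketch of the proportionality energy ∝ C·V², with a hypothetical helper name `relative_energy`; it says nothing about absolute energy values.

```python
def relative_energy(c_scale, v_scale):
    """Energy per instruction relative to the old design, using E ∝ C * V^2."""
    return c_scale * v_scale ** 2


ratio = relative_energy(0.85, 0.85)  # both C and V reduced by 15 %
print(f"new/old energy: {ratio:.3f}, reduction: {(1 - ratio) * 100:.0f} %")
# new/old energy: 0.614, reduction: 39 %
```

Because voltage enters squared, the same 15 % cut contributes more when applied to the supply voltage than when applied to the capacitance.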

## Pipelining

- Option 1: **serial**



- Option 2: **pipelining**



- **Latency**
  - Time from entering college to graduation
  - Serial: 4 years
  - Pipelining: 4 years
- **Throughput**
  - Average number of students graduating each year
  - Serial: 1000
  - Pipelining: 4000
- **Pipelining**
  - Increases throughput (4x in this example)
  - But can **never improve** latency
    - Latency is sometimes worse (additional overhead)

# Pipelining with RISC-V

| Phase             | Pictogram | $t_{step}$ | Serial | $t_{cycle}$ | Pipelined |
|-------------------|-----------|------------|--------|-------------|-----------|
| Instruction Fetch | IM        | 200 ps     |        | 200 ps      |           |
| Reg Read          | Reg       | 100 ps     |        | 200 ps      |           |
| ALU               | ALU       | 200 ps     |        | 200 ps      |           |
| Memory            | DM        | 200 ps     |        | 200 ps      |           |
| Register Write    | Reg       | 100 ps     |        | 200 ps      |           |
| $t_{instruction}$ |           |            | 800 ps |             | 1000 ps   |

instruction sequence ↓

add t0, t1, t2  
or t3, t4, t5  
sll t6, t0, t3


# Pipelining with RISC-V

|                                     | Single Cycle                                                           | Pipelining                                             |
|-------------------------------------|------------------------------------------------------------------------|--------------------------------------------------------|
| Timing                              | $t_{step} = 100 \dots 200 \text{ ps}$<br>(Register access only 100 ps) | $t_{cycle} = 200 \text{ ps}$<br>All cycles same length |
| Instruction time, $t_{instruction}$ | $= t_{cycle} = 800 \text{ ps}$                                         | 1000 ps                                                |
| Clock rate, $f_s$                   | $1/800 \text{ ps} = 1.25 \text{ GHz}$                                  | $1/200 \text{ ps} = 5 \text{ GHz}$                     |
| Relative speed                      | 1 x                                                                    | 4 x                                                    |
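The numbers in this comparison can be reproduced from the per-stage delays. The sketch below assumes an ideal pipeline with no hazards or stalls, which is an assumption on my part and not something the table states; `STAGE_PS` and `total_time_ps` are hypothetical names.

```python
# Per-stage delays in picoseconds, from the phase table
STAGE_PS = {"IF": 200, "Reg read": 100, "ALU": 200, "Mem": 200, "Reg write": 100}

t_single_cycle = sum(STAGE_PS.values())              # 800 ps: one long clock cycle
t_cycle_pipelined = max(STAGE_PS.values())           # 200 ps: set by the slowest stage
t_instr_pipelined = t_cycle_pipelined * len(STAGE_PS)  # 1000 ps latency per instruction


def total_time_ps(n_instructions, pipelined):
    """Time to finish n instructions, assuming no stalls in the pipelined case."""
    if pipelined:
        # Fill the pipeline once, then one instruction completes per cycle.
        return (len(STAGE_PS) + n_instructions - 1) * t_cycle_pipelined
    return n_instructions * t_single_cycle


for n in (3, 1_000_000):
    speedup = total_time_ps(n, False) / total_time_ps(n, True)
    print(n, round(speedup, 2))
```

For the three-instruction sequence the pipeline has barely filled, so the speedup is well below 4x; it approaches the 4x ratio of the clock rates only for long instruction streams.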

# RISC-V Pipeline



## Pipelining Datapath

# Pipelining RISC-V RV32I Datapath



# Pipelined Control

- Control signals derived from instruction
  - As in single-cycle implementation
  - Information is stored in pipeline registers for use by later stages
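One way to picture this: the control bundle is produced once, in the decode stage, and then travels with the instruction through the pipeline registers until the stage that needs it. The following is a toy sketch under that reading, not a model of the actual RV32I datapath. The five stage names, the reduced `CONTROL` table, and the function `run` are assumptions made for illustration, and hazards are ignored.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Reduced control table: only the signals used by the later stages
CONTROL = {
    "add": {"ALUSel": "Add", "MemRW": "Read",  "RegWEn": 1},
    "lw":  {"ALUSel": "Add", "MemRW": "Read",  "RegWEn": 1},
    "sw":  {"ALUSel": "Add", "MemRW": "Write", "RegWEn": 0},
}


def run(program):
    """Advance instructions through the pipeline; record what MEM sees each cycle."""
    pipeline = [None] * len(STAGES)  # pipeline[i] is the instruction in STAGES[i]
    pending = list(program)
    mem_trace = []
    while pending or any(pipeline):
        fetched = {"inst": pending.pop(0), "ctrl": None} if pending else None
        pipeline = [fetched] + pipeline[:-1]  # clock edge: everything moves one stage
        if pipeline[1] is not None:
            # Decode: derive the control signals and latch them with the instruction.
            pipeline[1]["ctrl"] = CONTROL[pipeline[1]["inst"]]
        if pipeline[3] is not None:
            # MEM uses the MemRW value that was decoded two cycles earlier.
            mem_trace.append((pipeline[3]["inst"], pipeline[3]["ctrl"]["MemRW"]))
    return mem_trace


print(run(["lw", "sw", "add"]))
# [('lw', 'Read'), ('sw', 'Write'), ('add', 'Read')]
```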

