

# Chapter 4 The processor: Datapath & Control

- Two implementations of RISC-V ISA
  - Single cycle implementation
    - Datapath & Control
  - Pipeline implementation
    - Datapath & Control
    - Data dependence/Hazard
- Consider only the core of RISC-V inst. Set
  - memory-reference inst. ld, sd
  - ALU inst. add, sub, and, or -- R-type
  - branch inst. beq, jal

# The Performance Perspective

- Performance of a machine is determined by:

- Instruction count
  - cycle time
  - CPI



- Processor design (datapath and control) will affect:

- cycle time
  - CPI

- Single cycle processor:

- A simple start to understand more complicated pipeline implementation

# Functional Units for Implementing instructions



## RISC-V ALU Functions

| ALU control | Function |
|-------------|----------|
| 0000        | AND      |
| 0001        | OR       |
| 0010        | add      |
| 0110        | subtract |



Zero flag:

# Basic Idea of Datapath Design

- Design a datapath for each type of instruction
  - R
  - ld
  - sd
    - Combine ld and sd
    - beq
- Combine all
- Immediate Generation Unit

## Datapath for R-type

- Example: add x3, x4, x5
- Basic steps
  - fetch instruction
  - select registers (rs, rt),
  - ALU operation on two data, need ALU
  - write back registers

# Single-Cycle: Datapath for R-type



## Datapath for Load Word

- Basic steps: example: `ld x3 , 64(x4)`
  - fetch instruction
  - select a register (rs1)
  - calculate address, need ALU
  - access memory (read memory)
  - write register file (rd)

## Datapath for Store Word

- Basic steps: example: `sd x3 , 64(x4)`
  - fetch instruction
  - select two registers (rs1, rs2)
  - calculate address, need ALU
  - access memory (write memory)

## Combining datapaths of ld & sd

- Memory address
  - Ld: in funct7 and rs2
  - sw: in funct7 and rd
- Register write control signal.

## Datapath for beq

- Basic steps: beq x1, x2, offset
  - fetch the instruction
  - select registers
  - test condition, calculate branch address (need additional ALU)
- ALU for branch condition
- Adder for branch address
- Zero: control logic to decide if branch.

## Creating a Single Datapath for All Inst.

- Combine datapaths for R-type, memory inst. and branch inst.
  - using multiplexers
  - without duplicating common functional units.
- As an example, combine R-type and memory type datapaths
- Final goal: combine all types of datapaths



R-type data path



Memory type datapath

Key Differences:

2nd input to ALU

Value written into register file

# Simple Datapath for Single Cycle



Can execute basic instructions in a single clock cycle

No resource can be used more than once during a single cycle



# Single-Cycle: Control

- Main control:
  - input: 7-bit from opcode
  - output: 9+ control lines
- ALU control:
  - input: ALUop + 4-bit (funct7 + funct3)
  - output: 4 lines
  - for non R type, ALU control depends on only ALUop
- Control for imm gen ?



# Control



| Instruction | RegWrt | Memto-Reg | Brch | Mem Read | Mem Write | ALUSrc | ALUOp1 | ALUp0 | Gen |
|-------------|--------|-----------|------|----------|-----------|--------|--------|-------|-----|
| R-format    |        |           |      |          |           |        |        |       |     |
| ld          |        |           |      |          |           |        |        |       |     |
| sd          |        |           |      |          |           |        |        |       |     |
| beq         |        |           |      |          |           |        |        |       |     |

# Main Control Unit -- Definitions of Control Signals

| Signal Name | Effect when deasserted          | Effect when asserted         |
|-------------|---------------------------------|------------------------------|
| MemRead     | None                            | Data put on read dataoutput  |
| MemWrite    | None                            | Write data into memory       |
| RegWrite    | None                            | Write register               |
| ALUSrc      | 2nd ALUinput from register file | 2nd ALUinput from imm gen    |
| PCSrc       | $PC = PC + 4$                   | $PC = \text{branch address}$ |
| MemtoReg    | result of ALU is sent           | data in memory is sent       |
|             |                                 |                              |

$PCsrc = \text{branch AND zero}$

# Single cycle control



## ALU Control

- ALU used for
  - Load/Store:  $F = \text{add}$
  - Branch:  $F = \text{subtract}$
  - R-type:  $F$  depends on opcode

| ALU control | Function |
|-------------|----------|
| 0000        | AND      |
| 0001        | OR       |
| 0010        | add      |
| 0110        | subtract |

# ALU Control

- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

| <b>opcode</b> | <b>ALUOp</b> | <b>Operation</b> | <b>Funct7+funct3</b>        | <b>ALU action</b> | <b>ALU control</b> |
|---------------|--------------|------------------|-----------------------------|-------------------|--------------------|
| ld            | 00           | load register    | XXXXXXXX XXX                | add               | 0010               |
| sd            | 00           | store register   | XXXXXXXX XXX                | add               | 0010               |
| beq           | 01           | branch on equal  | XXXXXXXX XXX                | subtract          | 0110               |
| R-type        | 10           | add              | 0 <u>1</u> 00000 <u>000</u> | add               | 0010               |
|               |              | subtract         | 0 <u>1</u> 00010 <u>000</u> | subtract          | 0110               |
|               |              | AND              | 0 <u>1</u> 00100 <u>111</u> | AND               | 0000               |
|               |              | OR               | 0 <u>1</u> 00101 <u>110</u> | OR                | 0001               |

# ALU Control Truth Table

|    | <b>ALUOp</b> | <b>Inst[30][14:12]</b> | <b>ALU control output</b> |
|----|--------------|------------------------|---------------------------|
|    | 00           | X XXX                  | 0010                      |
|    | 00           | X XXX                  | 0010                      |
|    | 01           | X XXX                  | 0110                      |
| 10 | 10           | <u>1</u> <u>000</u>    | 0010                      |
|    |              | <u>1</u> <u>000</u>    | 0110                      |
|    |              | <u>1</u> <u>111</u>    | 0000                      |
|    |              | <u>1</u> <u>110</u>    | 0001                      |

# Datapath With Control



# R-Type Instruction



# Load Instruction



# BEQ Instruction



# Add jal to Datapath



| Instruction | RegWrt | Memto-Reg | Brch | Mem Read | Mem Write | ALUSrc | ALUOp1 | ALUp0 | Gen |
|-------------|--------|-----------|------|----------|-----------|--------|--------|-------|-----|
| R-format    |        |           |      |          |           |        |        |       |     |
| ld          |        |           |      |          |           |        |        |       |     |
| sd          |        |           |      |          |           |        |        |       |     |
| beq         |        |           |      |          |           |        |        |       |     |

## Add jal to the datapath

- Operation 1:  $\text{PC} = \text{jump address}$ 
  - Assume jump address is formed by imm gen
- Operation 2:  $x \text{ rd} = \text{PC}$

## Discussion on imm gen

- Combinational circuit
- Input: the whole instruction[31:0]
- Four possible outputs
  - Immediate for ld
  - Immediate for sd
  - Immediate for beq
  - Immediate for jal

## Single-Cycle: Summary

- How to construct datapath for R-type, ld/sd, beq, jal
- How to construct the control
  - ALU control
  - main control
  - imm gen
- How the datapath + control to execute instruction

# Execution (cycle) Time Analysis

Arithmetic & Logical



Load



*Critical Path*

Store



Branch



- Cycle time  $\geq$  the critical path length
- Long Cycle Time
  - All instructions take as much time as the slowest

Exercise: Can MemtoReg be eliminated and replaced by MemRead?

|            | Brnch | MemRd | MemtoRg | ALUop | MemWrt | ALUSrc | RegWrt | gen |
|------------|-------|-------|---------|-------|--------|--------|--------|-----|
| <i>ld</i>  | 0     | 1     | 1       | 00    | 0      | 1      | 1      |     |
| <i>sd</i>  | 0     | 0     | x       | 00    | 1      | 1      | 0      |     |
| <i>R</i>   | 0     | 0     | 0       | 10    | 0      | 0      | 1      |     |
| <i>beg</i> | 1     | 0     | x       | 01    | 0      | 0      | 0      |     |
| <i>Jal</i> | 0     | 0     | x       | xx    | 0      | x      | 0      |     |
|            | B     |       |         |       |        |        |        |     |

Exercise: What would be the cycle time for the following cases?

- Assume
  - ALU delay=2ns,
  - Adder (PC+4) delay=x ns,
  - Adder (branch address) delay = y ns
- X=3, y=3
- x=5, y=5
- x=1, y=8