

# CSE 112: Computer Organization

---

Instructor: Sujay Deb

## Lecture 13



INDRAPRASTHA INSTITUTE of  
INFORMATION TECHNOLOGY  
**DELHI**



# **Single-Cycle RISC-V Processor**

# Example Program

- Design datapath
- View example program executing

## Example Program:

| Address | Instruction       | Type | Fields                                                                                                         | Machine Language |
|---------|-------------------|------|----------------------------------------------------------------------------------------------------------------|------------------|
| 0x1000  | L7: lw x6, -4(x9) | I    | imm<sub>11:0</sub> = 111111111100, rs1 = 01001, f3 = 010, rd = 00110, op = 0000011                             | FFC4A303         |
| 0x1004  | sw x6, 8(x9)      | S    | imm<sub>11:5</sub> = 0000000, rs2 = 00110, rs1 = 01001, f3 = 010, imm<sub>4:0</sub> = 01000, op = 0100011       | 0064A423         |
| 0x1008  | or x4, x5, x6     | R    | funct7 = 0000000, rs2 = 00110, rs1 = 00101, f3 = 110, rd = 00100, op = 0110011                                 | 0062E233         |
| 0x100C  | beq x4, x4, L7    | B    | imm<sub>12,10:5</sub> = 1111111, rs2 = 00100, rs1 = 00100, f3 = 000, imm<sub>4:1,11</sub> = 10101, op = 1100011 | FE420AE3         |
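As a cross-check on the Machine Language column, the Python sketch below packs each field into its bit position. The helper names (`encode_i`, `encode_s`, `encode_r`, `encode_b`) and the `PROGRAM` list are our own, for illustration only; immediates are passed as signed integers and masked to their two's-complement widths.

```python
def encode_i(imm, rs1, funct3, rd, op):
    """I-type: imm[11:0] | rs1 | funct3 | rd | op."""
    imm &= 0xFFF  # 12-bit two's-complement immediate
    return (imm << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | op


def encode_s(imm, rs2, rs1, funct3, op):
    """S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | op."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | ((imm & 0x1F) << 7) | op


def encode_r(funct7, rs2, rs1, funct3, rd, op):
    """R-type: funct7 | rs2 | rs1 | funct3 | rd | op."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | op


def encode_b(offset, rs2, rs1, funct3, op):
    """B-type: imm[12,10:5] | rs2 | rs1 | funct3 | imm[4:1,11] | op."""
    imm = offset & 0x1FFF  # 13-bit two's-complement byte offset (bit 0 is always 0)
    imm_hi = ((imm >> 12) & 1) << 6 | ((imm >> 5) & 0x3F)  # imm[12,10:5]
    imm_lo = ((imm >> 1) & 0xF) << 1 | ((imm >> 11) & 1)   # imm[4:1,11]
    return (imm_hi << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (imm_lo << 7) | op


PROGRAM = [
    (0x1000, "L7: lw x6, -4(x9)", encode_i(-4, rs1=9, funct3=0b010, rd=6, op=0b0000011)),
    (0x1004, "sw x6, 8(x9)", encode_s(8, rs2=6, rs1=9, funct3=0b010, op=0b0100011)),
    (0x1008, "or x4, x5, x6", encode_r(0, rs2=6, rs1=5, funct3=0b110, rd=4, op=0b0110011)),
    # The beq offset is relative to the branch's own address: 0x1000 - 0x100C = -12
    (0x100C, "beq x4, x4, L7", encode_b(0x1000 - 0x100C, rs2=4, rs1=4, funct3=0b000, op=0b1100011)),
]

for addr, text, word in PROGRAM:
    print(f"0x{addr:04X}  {word:08X}  {text}")
```

Run as-is, it should print FFC4A303, 0064A423, 0062E233, and FE420AE3, matching the table. The beq offset is -12 bytes: the target L7 (0x1000) minus the branch's own address (0x100C).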

# Single-Cycle Datapath: PC Increment

**STEP 6:** Determine address of next instruction



| Address | Instruction       | Type | Fields                                                                             | Machine Language |
|---------|-------------------|------|------------------------------------------------------------------------------------|------------------|
| 0x1000  | L7: lw x6, -4(x9) | I    | imm<sub>11:0</sub> = 111111111100, rs1 = 01001, f3 = 010, rd = 00110, op = 0000011 | FFC4A303         |
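The sketch below shows which bit ranges the datapath routes where for this lw, and computes STEP 6 as PC + 4. It is written in Python with a hypothetical `bits(word, hi, lo)` helper standing in for a Verilog slice `instr[hi:lo]`, and it assumes 4-byte instructions with no compressed-instruction support.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer, like a Verilog slice."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


pc = 0x1000
instr = 0xFFC4A303            # L7: lw x6, -4(x9)

op = bits(instr, 6, 0)        # 0000011 (load)
rd = bits(instr, 11, 7)       # destination register
funct3 = bits(instr, 14, 12)  # 010 (word)
rs1 = bits(instr, 19, 15)     # base register
imm = bits(instr, 31, 20)     # raw 12-bit immediate, not yet sign-extended

pc_next = (pc + 4) & 0xFFFFFFFF  # STEP 6: next sequential instruction, 32-bit wraparound

print(f"op={op:07b} rd=x{rd} rs1=x{rs1} funct3={funct3:03b} imm=0x{imm:03X} PCNext=0x{pc_next:X}")
```

It prints `op=0000011 rd=x6 rs1=x9 funct3=010 imm=0xFFC PCNext=0x1004`; the immediate is still raw here, and sign-extending 0xFFC gives -4.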

# **Single-Cycle Datapath: Other Instructions**

# Single-Cycle Datapath: sw

- **Immediate:** now in {instr[31:25], instr[11:7]}
- **Add control signals:** ImmSrc, MemWrite



| Address | Instruction  | Type | Fields                                                                                                   | Machine Language |
|---------|--------------|------|----------------------------------------------------------------------------------------------------------|------------------|
| 0x1004  | sw x6, 8(x9) | S    | imm<sub>11:5</sub> = 0000000, rs2 = 00110, rs1 = 01001, f3 = 010, imm<sub>4:0</sub> = 01000, op = 0100011 | 0064A423         |

# Single-Cycle Datapath: Immediate

| ImmSrc | ImmExt                                                            | Instruction Type |
|--------|-------------------------------------------------------------------|------------------|
| 0      | $\{\{20\{\text{instr}[31]\}\}, \text{instr}[31:20]\}$                     | I-Type           |
| 1      | $\{\{20\{\text{instr}[31]\}\}, \text{instr}[31:25], \text{instr}[11:7]\}$ | S-Type           |
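A minimal Python model of this two-case extend unit, assuming a signed Python integer is an acceptable stand-in for the 32-bit ImmExt bus. `bits`, `sign_extend`, and `imm_ext` are hypothetical helper names. Replicating instr[31] into the upper 20 bits is exactly two's-complement sign extension of the 12-bit immediate, which is what `sign_extend` computes arithmetically.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer, like a Verilog slice."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Treat the low `width` bits of value as two's complement; return a Python int."""
    sign_bit = 1 << (width - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def imm_ext(instr, imm_src):
    """Extend unit for ImmSrc = 0 (I-type) and 1 (S-type) only."""
    if imm_src == 0:      # I-type: instr[31:20]
        raw = bits(instr, 31, 20)
    elif imm_src == 1:    # S-type: {instr[31:25], instr[11:7]}
        raw = (bits(instr, 31, 25) << 5) | bits(instr, 11, 7)
    else:
        raise ValueError("this sketch models only ImmSrc 0 and 1")
    # instr[31] is the immediate's sign bit in both formats
    return sign_extend(raw, 12)


print(imm_ext(0xFFC4A303, 0))   # lw x6, -4(x9)
print(imm_ext(0x0064A423, 1))   # sw x6, 8(x9)
```

This prints -4 and 8. To see the 32-bit bus value, mask with `0xFFFFFFFF`: -4 becomes 0xFFFFFFFC, i.e. twenty copies of instr[31] = 1 followed by 111111111100.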

(Figure panels: I-Type, S-Type)

# Single-Cycle Datapath: R-type

- Read from **rs1** and **rs2** (instead of imm)
- Write *ALUResult* to **rd**



| Address | Instruction   | Type | Fields                                                                         | Machine Language |
|---------|---------------|------|--------------------------------------------------------------------------------|------------------|
| 0x1008  | or x4, x5, x6 | R    | funct7 = 0000000, rs2 = 00110, rs1 = 00101, f3 = 110, rd = 00100, op = 0110011 | 0062E233         |
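A small Python sketch of the R-type path for `or x4, x5, x6`. The register contents (0xF0 in x5, 0x0F in x6) are made-up values for illustration, and `bits` is a hypothetical slicing helper; the point is that both operands come from the register file and the ALU result goes back to rd.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer, like a Verilog slice."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


regs = [0] * 32          # register file, x0..x31
regs[5] = 0x000000F0     # hypothetical contents of x5
regs[6] = 0x0000000F     # hypothetical contents of x6

instr = 0x0062E233       # or x4, x5, x6
rs1, rs2, rd = bits(instr, 19, 15), bits(instr, 24, 20), bits(instr, 11, 7)

src_a = regs[rs1]        # first ALU operand from rs1
src_b = regs[rs2]        # ALUSrc = 0: second operand from rs2, not ImmExt
alu_result = (src_a | src_b) & 0xFFFFFFFF   # funct3 = 110 selects OR

if rd != 0:              # x0 is hardwired to zero, so writes to it are ignored
    regs[rd] = alu_result  # RegWrite = 1; ALUResult is the value written back

print(f"x{rd} = 0x{regs[rd]:08X}")
```

With these values it prints `x4 = 0x000000FF`.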

# Single-Cycle Datapath: beq

Calculate **target address**: PCTarget = PC + imm



| Address | Instruction    | Type | Fields                                                                                                         | Machine Language |
|---------|----------------|------|----------------------------------------------------------------------------------------------------------------|------------------|
| 0x100C  | beq x4, x4, L7 | B    | imm<sub>12,10:5</sub> = 1111111, rs2 = 00100, rs1 = 00100, f3 = 000, imm<sub>4:1,11</sub> = 10101, op = 1100011 | FE420AE3         |
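A Python sketch of the beq target computation: extract the B-type immediate from FE420AE3, add it to the PC, and decide whether to branch. The helper names and the value placed in x4 are hypothetical; since both source registers are x4, the subtraction is always zero and the branch is always taken.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer, like a Verilog slice."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Treat the low `width` bits of value as two's complement."""
    sign_bit = 1 << (width - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def b_imm(instr):
    """B-type offset {instr[31], instr[7], instr[30:25], instr[11:8], 0}, sign-extended."""
    raw = ((bits(instr, 31, 31) << 12) | (bits(instr, 7, 7) << 11)
           | (bits(instr, 30, 25) << 5) | (bits(instr, 11, 8) << 1))
    return sign_extend(raw, 13)


pc = 0x100C
instr = 0xFE420AE3                       # beq x4, x4, L7
imm = b_imm(instr)
pc_target = (pc + imm) & 0xFFFFFFFF      # PCTarget = PC + imm

x4 = 42                                  # hypothetical value; rs1 and rs2 are both x4
taken = (x4 - x4) == 0                   # beq subtracts and tests for a zero result
pc_next = pc_target if taken else pc + 4
print(imm, hex(pc_target), taken)
```

It prints `-12 0x1000 True`: the target is L7 at 0x1000, as in the example program.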

# Single-Cycle Datapath: ImmExt

| ImmSrc <sub>1:0</sub> | ImmExt                                                                                                       | Instruction Type |
|-----------------------|--------------------------------------------------------------------------------------------------------------|------------------|
| 00                    | $\{\{20\{\text{instr}[31]\}\}, \text{instr}[31:20]\}$                                                              | I-Type           |
| 01                    | $\{\{20\{\text{instr}[31]\}\}, \text{instr}[31:25], \text{instr}[11:7]\}$                                          | S-Type           |
| 10                    | $\{\{19\{\text{instr}[31]\}\}, \text{instr}[31], \text{instr}[7], \text{instr}[30:25], \text{instr}[11:8], 1'b0\}$ | B-Type           |

(Figure panels: I-Type, S-Type, B-Type)

# Single-Cycle RISC-V Processor



# **Single-Cycle Control**

# Single-Cycle Control

(Figure panels: High-Level View, Low-Level View)

# Single-Cycle Control: Main Decoder

| op | Instr.     | RegWrite | ImmSrc | ALUSrc | MemWrite | ResultSrc | Branch | ALUOp |
|----|------------|----------|--------|--------|----------|-----------|--------|-------|
| 3  | <b>lw</b>  |          |        |        |          |           |        |       |
| 35 | <b>sw</b>  |          |        |        |          |           |        |       |
| 51 | R-type     |          |        |        |          |           |        |       |
| 99 | <b>beq</b> |          |        |        |          |           |        |       |



# Review: ALU

| ALUControl <sub>2:0</sub> | Function |
|---------------------------|----------|
| 000                       | add      |
| 001                       | subtract |
| 010                       | and      |
| 011                       | or       |
| 101                       | SLT      |
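A behavioral Python sketch of this ALU, assuming 32-bit operands passed as unsigned bit patterns. `alu` and `to_signed` are our own names; treating SLT as a signed comparison follows the RISC-V slt instruction, and ALUControl codes not in the table are rejected.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a, b, alu_control):
    """Return the 32-bit result for ALUControl[2:0] as listed in the table."""
    if alu_control == 0b000:
        result = a + b
    elif alu_control == 0b001:
        result = a - b
    elif alu_control == 0b010:
        result = a & b
    elif alu_control == 0b011:
        result = a | b
    elif alu_control == 0b101:
        result = int(to_signed(a) < to_signed(b))  # set less than, signed
    else:
        raise ValueError(f"ALUControl {alu_control:03b} is not in the table")
    return result & MASK32  # wrap to 32 bits, as the hardware would
```

For example, `alu(0xFFFFFFFC, 1, 0b101)` returns 1 because -4 < 1 when the operands are read as signed, and `alu(5, 7, 0b001)` returns 0xFFFFFFFE, the 32-bit pattern for -2.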





# Single-Cycle Control: ALU Decoder

| $ALUOp$ | $funct3$ | $op_5, funct7_5$ | Instruction   | $ALUControl_{2:0}$  |
|---------|----------|------------------|---------------|---------------------|
| 00      | x        | x                | <b>lw, sw</b> | 000 (add)           |
| 01      | x        | x                | <b>beq</b>    | 001 (subtract)      |
| 10      | 000      | 00, 01, 10       | <b>add</b>    | 000 (add)           |
|         | 000      | 11               | <b>sub</b>    | 001 (subtract)      |
|         | 010      | x                | <b>slt</b>    | 101 (set less than) |
|         | 110      | x                | <b>or</b>     | 011 (or)            |
|         | 111      | x                | <b>and</b>    | 010 (and)           |
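The ALU decoder table can be sketched as a Python function. The name `alu_decoder` and the helper `decode_fields` are hypothetical, and the instruction words for `sub x1, x2, x3` (0x403100B3) and `addi x1, x2, -1` (0xFFF10093) used below are our own hand-encoded examples, not from the slides.

```python
def alu_decoder(alu_op, funct3, op5, funct7_5):
    """Return ALUControl[2:0] following the ALU decoder table."""
    if alu_op == 0b00:          # lw, sw: add base + offset
        return 0b000
    if alu_op == 0b01:          # beq: subtract
        return 0b001
    if alu_op != 0b10:
        raise ValueError("ALUOp 11 is not used")
    if funct3 == 0b000:
        # sub only when op5 = 1 (R-type) and funct7[5] = 1; otherwise add
        return 0b001 if (op5, funct7_5) == (1, 1) else 0b000
    if funct3 == 0b010:
        return 0b101            # slt
    if funct3 == 0b110:
        return 0b011            # or
    if funct3 == 0b111:
        return 0b010            # and
    raise ValueError(f"funct3 {funct3:03b} is not covered by this decoder")


def decode_fields(instr):
    """Return (funct3, op5, funct7_5) = (instr[14:12], instr[5], instr[30])."""
    return (instr >> 12) & 0b111, (instr >> 5) & 1, (instr >> 30) & 1


print(f"{alu_decoder(0b10, *decode_fields(0x0062E233)):03b}")  # or x4, x5, x6
print(f"{alu_decoder(0b10, *decode_fields(0x403100B3)):03b}")  # sub x1, x2, x3
print(f"{alu_decoder(0b10, *decode_fields(0xFFF10093)):03b}")  # addi x1, x2, -1
```

This prints 011, 001, and 000. The addi case shows why op<sub>5</sub> appears in the table: an I-type immediate's bit 10 lands in instr[30], the same position as funct7<sub>5</sub>, so checking op<sub>5</sub> = 0 keeps a negative addi immediate from being decoded as sub.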



# Example: and

| op | Instruct. | RegWrite | ImmSrc | ALUSrc | MemWrite | ResultSrc | Branch | ALUOp |
|----|----------|----------|--------|--------|----------|-----------|--------|-------|
| 51 | R-type   | 1        | XX     | 0      | 0        | 0         | 0      | 10    |



# **Extending the Single-Cycle Processor**

# Extended Functionality: I-Type ALU

Enhance the single-cycle processor to handle **I-Type ALU instructions**: addi, andi, ori, and slti

- Similar to R-type instructions
- But second source comes from immediate
- Change **ALUSrc** to select the immediate
- And **ImmSrc** to pick the correct immediate

# Extended Functionality: I-Type ALU

| op | Instruct.     | RegWrite | ImmSrc    | ALUSrc   | MemWrite | ResultSrc | Branch   | ALUOp     |
|----|---------------|----------|-----------|----------|----------|-----------|----------|-----------|
| 3  | <b>lw</b>     | 1        | 00        | 1        | 0        | 1         | 0        | 00        |
| 35 | <b>sw</b>     | 0        | 01        | 1        | 1        | X         | 0        | 00        |
| 51 | R-type        | 1        | <b>XX</b> | <b>0</b> | 0        | 0         | 0        | 10        |
| 99 | <b>beq</b>    | 0        | 10        | 0        | 0        | X         | 1        | 01        |
| 19 | <b>I-type</b> | 1        | <b>00</b> | <b>1</b> | <b>0</b> | <b>0</b>  | <b>0</b> | <b>10</b> |

# Extended Functionality: addi

| op | Instruct. | RegWrite | ImmSrc | ALUSrc | MemWrite | ResultSrc | Branch | ALUOp |
|----|-----------|----------|--------|--------|----------|-----------|--------|-------|
| 19 | I-type    | 1        | 00     | 1      | 0        | 0         | 0      | 10    |



# Extended Functionality: jal

Enhance the single-cycle processor to handle jal

- **Similar to beq**
- But jump is **always taken**
  - PCSrc should be 1
- **Immediate format** is different
  - Need a new *ImmSrc* of 11
- And jal must **compute PC+4** and **store in rd**
  - Take PC+4 from adder through ResultMux
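A Python sketch of the two multiplexers these bullets describe. It assumes PCSrc = (Branch AND Zero) OR Jump, which is one way to force PCSrc to 1 for jal, and it uses the ResultSrc encoding from the jal control table (00 = ALUResult for R-type, 01 = ReadData for lw, 10 = PC+4 for jal). `next_pc`, `result_mux`, and the jal address and offset in the usage lines are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc, imm_ext, branch, zero, jump):
    """Assumed PC logic: PCSrc = (Branch AND Zero) OR Jump selects PCTarget over PC+4."""
    pc_plus4 = (pc + 4) & MASK32
    pc_target = (pc + imm_ext) & MASK32
    pc_src = (branch and zero) or jump
    return pc_target if pc_src else pc_plus4


def result_mux(result_src, alu_result, read_data, pc_plus4):
    """ResultSrc[1:0]: 00 -> ALUResult, 01 -> ReadData, 10 -> PC+4."""
    return {0b00: alu_result, 0b01: read_data, 0b10: pc_plus4}[result_src]


# Hypothetical jal at 0x1010 whose ImmExt is -16:
pc = 0x1010
print(hex(next_pc(pc, -16, branch=0, zero=False, jump=1)))                 # jump target
print(hex(result_mux(0b10, alu_result=0, read_data=0, pc_plus4=pc + 4)))  # value written to rd
```

The two prints should show 0x1000 (the jump target) and 0x1014 (the return address written to rd).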

# Extended Functionality: *ImmExt*

| ImmSrc <sub>1:0</sub> | ImmExt                                                                                                             | Instruction Type |
|-----------------------|--------------------------------------------------------------------------------------------------------------------|------------------|
| 00                    | $\{\{20\{\text{instr}[31]\}\}, \text{instr}[31:20]\}$                                                              | I-Type           |
| 01                    | $\{\{20\{\text{instr}[31]\}\}, \text{instr}[31:25], \text{instr}[11:7]\}$                                          | S-Type           |
| 10                    | $\{\{19\{\text{instr}[31]\}\}, \text{instr}[31], \text{instr}[7], \text{instr}[30:25], \text{instr}[11:8], 1'b0\}$ | B-Type           |
| 11                    | $\{\{12\{\text{instr}[31]\}\}, \text{instr}[19:12], \text{instr}[20], \text{instr}[30:21], 1'b0\}$                 | J-Type           |
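A Python sketch of the new ImmSrc = 11 case. `bits`, `sign_extend`, and `j_imm` are hypothetical helper names, and the two test words, 0x008000EF (`jal x1, 8`) and 0xFFDFF06F (`jal x0, -4`), are our own hand-encoded examples. Note that instr[31] serves as imm[20], which is why the table needs only 12 replicated copies to fill bits 31:20.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer, like a Verilog slice."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Treat the low `width` bits of value as two's complement."""
    sign_bit = 1 << (width - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def j_imm(instr):
    """ImmSrc = 11: {instr[31], instr[19:12], instr[20], instr[30:21], 1'b0}, sign-extended."""
    raw = ((bits(instr, 31, 31) << 20)    # imm[20]
           | (bits(instr, 19, 12) << 12)  # imm[19:12]
           | (bits(instr, 20, 20) << 11)  # imm[11]
           | (bits(instr, 30, 21) << 1))  # imm[10:1]; imm[0] is always 0
    return sign_extend(raw, 21)


print(j_imm(0x008000EF))   # jal x1, 8
print(j_imm(0xFFDFF06F))   # jal x0, -4
```

It prints 8 and -4.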



# Extended Functionality: jal

| op  | Instruct.  | RegWrite | ImmSrc | ALUSrc | MemWrite | ResultSrc | Branch | ALUOp | Jump |
|-----|------------|----------|--------|--------|----------|-----------|--------|-------|------|
| 3   | <b>lw</b>  | 1        | 00     | 1      | 0        | 01        | 0      | 00    | 0    |
| 35  | <b>sw</b>  | 0        | 01     | 1      | 1        | XX        | 0      | 00    | 0    |
| 51  | R-type     | 1        | XX     | 0      | 0        | 00        | 0      | 10    | 0    |
| 99  | <b>beq</b> | 0        | 10     | 0      | 0        | XX        | 1      | 01    | 0    |
| 19  | I-type     | 1        | 00     | 1      | 0        | 00        | 0      | 10    | 0    |
| 111 | <b>jal</b> | 1        | 11     | X      | 0        | 10        | 0      | XX    | 1    |
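A table-driven Python sketch of the main decoder: the opcode (instr[6:0]) indexes a dict of control signals transcribed from the table above, with `None` standing in for X (don't care). The names `SIGNALS`, `MAIN_DECODER_TABLE`, and `main_decoder` are our own choices, not from the slides.

```python
SIGNALS = ("RegWrite", "ImmSrc", "ALUSrc", "MemWrite", "ResultSrc", "Branch", "ALUOp", "Jump")
X = None  # don't care

# op: (RegWrite, ImmSrc, ALUSrc, MemWrite, ResultSrc, Branch, ALUOp, Jump)
MAIN_DECODER_TABLE = {
    3:   (1, 0b00, 1, 0, 0b01, 0, 0b00, 0),   # lw
    35:  (0, 0b01, 1, 1, X,    0, 0b00, 0),   # sw
    51:  (1, X,    0, 0, 0b00, 0, 0b10, 0),   # R-type
    99:  (0, 0b10, 0, 0, X,    1, 0b01, 0),   # beq
    19:  (1, 0b00, 1, 0, 0b00, 0, 0b10, 0),   # I-type ALU
    111: (1, 0b11, X, 0, 0b10, 0, X,    1),   # jal
}


def main_decoder(instr):
    """Return the control signals for the opcode in instr[6:0]."""
    op = instr & 0x7F
    if op not in MAIN_DECODER_TABLE:
        raise ValueError(f"opcode {op} is not handled by this decoder")
    return dict(zip(SIGNALS, MAIN_DECODER_TABLE[op]))


print(main_decoder(0xFFC4A303))  # lw x6, -4(x9)
```

Adding another instruction type in this sketch means adding one row to the dict, mirroring how each extension above adds one row to the table.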



# Processor Performance

## Program Execution Time

$$\begin{aligned} &= (\text{\#instructions})(\text{cycles/instruction})(\text{seconds/cycle}) \\ &= \# \text{ instructions} \times \text{CPI} \times T_C \end{aligned}$$

# Single-Cycle Processor Performance



$T_C$ is limited by the critical path (lw)

# Single-Cycle Processor Performance

- Single-cycle critical path:

$$T_{c\_single} = t_{pcq\_PC} + t_{mem} + \max[t_{RFread}, t_{dec} + t_{ext} + t_{mux}] + t_{ALU} + t_{mem} + t_{mux} + t_{RFsetup}$$

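- Typically, the limiting paths are through memory, the ALU, and the register file
- Assuming the register-file read is slower than the decode + extend + mux path, $\max[t_{RFread}, t_{dec} + t_{ext} + t_{mux}] = t_{RFread}$
- So,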
$$\begin{aligned} T_{c\_single} &= t_{pcq\_PC} + t_{mem} + t_{RFread} + t_{ALU} + t_{mem} + t_{mux} + t_{RFsetup} \\ &= t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{ALU} + t_{mux} + t_{RFsetup} \end{aligned}$$

# Single-Cycle Performance Example

| Element                | Parameter            | Delay (ps) |
|------------------------|----------------------|------------|
| Register clock-to-Q    | $t_{pcq\_PC}$        | 40         |
| Register setup         | $t_{\text{setup}}$   | 50         |
| Multiplexer            | $t_{\text{mux}}$     | 30         |
| AND-OR gate            | $t_{\text{AND-OR}}$  | 20         |
| ALU                    | $t_{\text{ALU}}$     | 120        |
| Decoder (Control Unit) | $t_{\text{dec}}$     | 25         |
| Extend unit            | $t_{\text{ext}}$     | 35         |
| Memory read            | $t_{\text{mem}}$     | 200        |
| Register file read     | $t_{RF\text{read}}$  | 100        |
| Register file setup    | $t_{RF\text{setup}}$ | 60         |

$$\begin{aligned}T_{c\_single} &= t_{pcq\_PC} + 2t_{\text{mem}} + t_{RF\text{read}} + t_{\text{ALU}} + t_{\text{mux}} + t_{RF\text{setup}} \\&= (40 + 2 \times 200 + 100 + 120 + 30 + 60) \text{ ps} = \mathbf{750 \text{ ps}}\end{aligned}$$
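A quick Python check of the arithmetic that keeps the max[] term from the general critical-path formula instead of assuming it away. The dict `t` simply transcribes the delay table (in ps); the key names are our own.

```python
# Delays in picoseconds, transcribed from the table above.
t = {
    "pcq_PC": 40, "setup": 50, "mux": 30, "AND-OR": 20, "ALU": 120,
    "dec": 25, "ext": 35, "mem": 200, "RFread": 100, "RFsetup": 60,
}

# Full single-cycle critical path (lw), keeping the max[] term.
tc_single = (t["pcq_PC"] + t["mem"]
             + max(t["RFread"], t["dec"] + t["ext"] + t["mux"])
             + t["ALU"] + t["mem"] + t["mux"] + t["RFsetup"])

print(t["dec"] + t["ext"] + t["mux"], tc_single)
```

It prints `90 750`: the decode + extend + mux path (90 ps) is shorter than the register-file read (100 ps), so the simplified formula and the full one agree at 750 ps.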

# Single-Cycle Performance Example

Program with 100 billion instructions:

$$\begin{aligned}\text{Execution Time} &= \# \text{ instructions} \times \text{CPI} \times T_C \\ &= (100 \times 10^9)(1)(750 \times 10^{-12} \text{ s}) \\ &= \text{75 seconds}\end{aligned}$$

# Class Interaction # 15

---

