

# RISC Design

---

Virendra Singh

Professor

Computer Architecture and Dependable Systems Lab

Department of Electrical Engineering, and  
Dept. of Computer Science & Engineering

Indian Institute of Technology Bombay

<http://www.ee.iitb.ac.in/~viren/>

E-mail: [viren@{ee, cse}.iitb.ac.in](mailto:viren@{ee, cse}.iitb.ac.in)

*EE-739: Processor Design @ IITB*

---



Lecture 13 (11 February 2022)

CADSL

# RISC Design

ISA

Simple instruction

& Fixed encoding ✓

(Boundary)

$S \xrightarrow{I} S'$

State transition:

Content of  
memory  
+  
contents of  
programmer's  
register

Combinational logic.

O.T + HKT.

Instruction set & DLX.

32 bit-formal

3 formats  $\begin{cases} R \\ I \\ J \end{cases}$



# Overview of DLX ISA

---

- ❖ simple instructions, all 32 bits wide
- ❖ very structured, no unnecessary baggage
- ❖ only three instruction formats

|   |    |                |     |                     |       |  |
|---|----|----------------|-----|---------------------|-------|--|
| R | op | rs1            | rs2 | rd                  | funct |  |
| I | op | rs1            | rd  | 16 bit address/data |       |  |
| J | op | 26 bit address |     |                     |       |  |

- ❖ rely on compiler to achieve performance



# Example Instruction Set: MIPS Subset

---

## MIPS Instruction – Subset

- ❖ Arithmetic and Logical Instructions (R type)
    - add, sub, or, and, slt Register Add. mod.
  - ❖ Memory reference Instructions (I-type.)
    - lw, sw → B+D
  - ❖ Branch
    - beq, j
      - (I) ↓ ↑ (J)
- 





# Overview of MIPS

---

- ❖ simple instructions, all 32 bits wide
- ❖ very structured, no unnecessary baggage
- ❖ only three instruction formats

|   |    |                |     |                     |      |       |  |
|---|----|----------------|-----|---------------------|------|-------|--|
| R | op | rs1            | rs2 | rd                  | shmt | funct |  |
| I | op | rs1            | rd  | 16 bit address/data |      |       |  |
| J | op | 26 bit address |     |                     |      |       |  |

- ❖ rely on compiler to achieve performance



# Datapath and Control

---

- **Datapath:** Memory, registers, adders, ALU, and communication buses. Each step (fetch, decode, execute) requires communication (data transfer) paths between memory, registers and ALU.
- **Control:** Datapath for each step is set up by control signals that set up dataflow directions on communication buses and select ALU and memory functions. Control signals are generated by a control unit consisting of one or more finite-state machines.



# Datapath

A LU add | sub | and | or | shift



Programmer's Register → Set of registers  
Register File.



# Datapath for Instruction Fetch



# Register File: A Datapath Component

---





# Multi-Operation ALU

| Operation select | ALU function     |
|------------------|------------------|
| 000              | AND              |
| 001              | OR               |
| 010              | Add              |
| 110              | Subtract         |
| 111              | Set on less than |



**zero = 1, when all bits of result are 0**

# R-Type Instructions

*add r<sub>1</sub>, r<sub>2</sub>, r<sub>3</sub>*  
A/L

- Also known as arithmetic-logical instructions



- Example: add \$t0, \$s1, \$s2

- Machine instruction word

000000 10001 10010 01000 00000 100000

opcode    \$s1    \$s2    \$t0                function

- Read two registers

- Write one register

- Opcode and function code go to control unit that generates RegWrite and ALU operation code.





✓ HKT. + OP  
 Instructions should  
 be fetch  
 Understand  
 Read operands  
 Execute  
 Update the state



# R-Type Instructions

HFC



$$\begin{array}{l} \text{add } r_1, r_2, r_3 \\ r_1 = r_2 + r_3 \end{array}$$

$$\underline{\text{I\_mem\_data}} = \underline{\text{base}}$$



# Datapath for R-Type Instruction



# Load and Store Instructions

- I-type instructions

- Iw

\$t0, 1200 (\$t1)

100011 01001 01000 0000 0100 1011 0000  
opcode \$t1 \$t0 1200

# incr. in bytes

- sw \$t0, 1200 (\$t1) # incr. in bytes

101011 01001 01000 0000 0100 1011 0000  
opcode \$t1 \$t0 1200

$$S_0 = M[r_1 + 200]$$

load \$t0, (\$t1) 200



# Load and Store Instructions

(LOAD)



# Datapath for lw Instruction

100011 01001 01000 0000 0100 1011 0000  
 opcode \$t1 \$t0 1200



# Load and Store Instructions



# Datapath for sw Instruction

101011 01001 01000 0000 0100 1011 0000  
 opcode \$t1 \$t0 1200



# Branch Instruction (I-Type)

100  $\frac{100+4+25 \times 4}{4} = 100+4+100=204$  104  
• beq  $\$s1, \$s2, 25$  # if  $\$s1 = \$s2$ ,  
advance PC through  
25 instructions

beq  $r_1, r_2, \#loc$



Note: Can branch within  $\pm 2^{15}$  words from the current instruction address in PC.



# Branch Instructions



# Datapath for beq Instruction



# Branch Instructions

---



# ✓ J-Type Instruction

PC+4 + hmmm26xy

✓ OPERATION

hmmm26

- j 2500 # jump to instruction 2,500

26-bits

000010 0000 0000 0000 0010 0111 0001 00

opcode

2,500

32-bit jump address

0000 0000 0000 0010 0111 0001 0000

bits 28-31 from PC+4



$PC \rightarrow I\text{-Mem-add}$

$PC \rightarrow alu1\text{-A}$

$\phi i_4 \rightarrow alu1\text{-B}$

—

~~Inst.~~

$alu1\text{-C} \rightarrow alu3\text{-A}$

$Inst_{25-0} \rightarrow SE26 \xrightarrow{2S} alu3\text{-B.}$

$alu3\text{-C} \rightarrow PC$



# Datapath for Jump Instruction





(D3)

alu2-C → D3 ]  
D-mem-data → D3 ]



# Control Logic



# Control Logic: Truth Table

| Instr type | Inputs: instr. opcode bits |    |    |    |    |    | Outputs: control signals |      |        |          |          |        |   |   | ALUOp2 | ALUOp1 |
|------------|----------------------------|----|----|----|----|----|--------------------------|------|--------|----------|----------|--------|---|---|--------|--------|
|            | 31                         | 30 | 29 | 28 | 27 | 26 | RegDst                   | Jump | ALUSrc | MemtoReg | MemWrite | Branch |   |   |        |        |
| R          | 0                          | 0  | 0  | 0  | 0  | 0  | 1                        | 0    | 0      | 0        | 1        | 0      | 0 | 0 | 1      | 0      |
| lw         | 1                          | 0  | 0  | 0  | 1  | 1  | 0                        | 0    | 1      | 1        | 1        | 1      | 0 | 0 | 0      | 0      |
| sw         | 1                          | 0  | 1  | 0  | 1  | 1  | X                        | 0    | 1      | X        | 0        | 0      | 1 | 0 | 0      | 0      |
| beq        | 0                          | 0  | 0  | 1  | 0  | 0  | X                        | 0    | 0      | X        | 0        | 0      | 0 | 1 | 0      | 1      |
| j          | 0                          | 0  | 0  | 0  | 1  | 0  | X                        | 1    | X      | X        | X        | X      | X | X | X      | X      |



# How Long Does It Take?

- Assume control logic is fast and does not affect the critical timing. Major time delay components are ALU, memory read/write, and register read/write.
  - Arithmetic-type (R-type)
    - Fetch (memory read)
    - Register read
    - ALU operation
    - Register write
    - Total
- 
- A hand-drawn timing diagram illustrating the execution times of various components for an R-type instruction. The diagram shows a vertical timeline with several time intervals labeled in nanoseconds (ns). A red bracket groups these intervals, and a red oval at the bottom encloses the total duration. Red annotations with arrows point to specific parts of the diagram:
- An arrow points from the label "I-Mem" to the first interval of 2ns.
  - An arrow points from the label "Memory Access" to the second interval of 1ns.
  - An arrow points from the label "RF access" to the third interval of 2ns.
  - An arrow points from the label "ALU" to the fourth interval of 1ns.
  - An arrow points from the label "1-ns" to the fifth interval of 2ns.
  - An arrow points from the label "2-ns" to the bottom of the red oval.
- | Component / Stage | Time (ns) |
|-------------------|-----------|
| I-Mem             | 2ns       |
| Memory Access     | 1ns       |
| RF access         | 2ns       |
| ALU               | 1ns       |
| 1-ns              | 2ns       |
| Total             | 6ns       |



# Time for lw and sw (I-Types)

- ALU (R-type)
- Load word (I-type)



- Fetch (memory read)
  - Register read
  - ALU operation
  - Get data (mem. Read)
  - Register write
  - Total
- Store word (no register write) 7ns ✓



# Time for beq (I-Type)

beq

- ALU (R-type) 6ns
- Load word (I-type) 8ns
- Store word (I-type) 7ns
- Branch on equal (I-type)
  - Fetch (memory read) 2ns
  - Register read 1ns
  - ALU operation 2ns
  - Total 5ns

?  
 $\tau_1 + \tau_2 = \tau_3$



# Time for Jump (J-Type)

- ALU (R-type)
- Load word (I-type)
- Store word (I-type)
- Branch on equal (I-type)
- Jump (J-type)
  - Fetch (memory read)
  - Total

|     |          |     |
|-----|----------|-----|
| 6ns | 50.      | 2   |
| 8ns | ✓ 10%    |     |
| 7ns |          | 1   |
| 5ns |          | 3   |
| 2ns | + 2 ns J | 41  |
| 2ns |          | 4ns |

195 MHz



# How Fast Can the Clock Be?

---

- If every instruction is executed in one clock cycle, then:
  - Clock period must be at least 8ns to perform the longest instruction, i.e.,  $lw$ . ✓
  - This is a single cycle machine ( $CPI = 1$ )
  - It is slower because many instructions take less than 8ns but are still allowed that much time.
- Method of speeding up: Use multicycle datapath. ↵

$$T_{\text{cycle}} = T_c \times CPI + t.$$



# Thank You

