

361-1-4201/381-1-0107

Computer Architecture

Intro to Microarchitecture: Single-Cycle

Dr. Guy Tel-Zur

Based on slides by Prof. Onur Mutlu

Carnegie Mellon University

Spring 2015, 1/26/2015

With Dr. Danny Seidner modifications

# Agenda for Today & Next Few Lectures

---

- Start Microarchitecture
  - Single-cycle Microarchitectures
  - Multi-cycle Microarchitectures
  - Microprogrammed Microarchitectures
  - Pipelining
  - Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, ...
- 
- The diagram illustrates the grouping of agenda items into three main sections. A large blue curly brace on the right side spans from the 'Start Microarchitecture' item to the 'Issues in Pipelining' item. This brace is divided into three horizontal segments by two smaller curly braces. The top segment covers 'Start Microarchitecture', 'Single-cycle Microarchitectures', and 'Multi-cycle Microarchitectures'. The middle segment covers 'Microprogrammed Microarchitectures'. The bottom segment covers 'Pipelining' and 'Issues in Pipelining'. To the right of each of these three segments is a label: 'Lectures 4-5', 'Lecture 6', and 'Lect 7' respectively.

# Recap of the Last Lectures

---

- Computer Architecture Today and Basics – Lec #1
- Fundamental Concepts – Lec #1
  - Computer Arch = ISA (the definition) + uArch (the implementation)
- ISA basics and tradeoffs – Lec #2
  - Instruction length
  - Uniform vs. non-uniform decode
  - Number of registers
  - Addressing modes
  - Aligned vs. unaligned access
  - RISC vs. CISC properties
- MIPS ISA – Lec #3
  - MIPS ISA Overview
- Datapath & Microarchitecture – Lec #4

# חומר למחשבה

---

- As you learn the MIPS ISA, think about what **tradeoffs** the designers have made
  - in terms of the ISA properties we talked about
- And, think about the **pros and cons** of design choices
  - In comparison to ARM, Alpha
  - In comparison to x86, VAX
- And, think about the **potential mistakes**
  - Branch delay slot? (stall in the pipeline – טרם למדנו –)
  - Load delay slot? (להבטיח שההוראה הבאה תבוצע ללא עיכוב)
  - No FP, no multiply, MIPS (initial)

# עד חומר למחשבה You

---

- How would you design a new ISA?
- Where would you place it?
- What design choices would you make in terms of ISA properties?
- What would be the first question you ask in this process?
  - “What is my **design point**?”
    - The 1<sup>st</sup> question to ask what is the Design Point / Use Case  
מה השאלה הראשונה שתרצו לשאול את המזמין?

**Look Forward & Up**

# Review: Other Example ISA-level Tradeoffs

- Condition codes vs. not – **ילמד בהמשך**
- VLIW vs. single instruction – **ילמד בהמשך**
- SIMD (single instruction multiple data) vs. SISD – **הטකסונומיה של פליין**
- Precise vs. imprecise exceptions – **ילמד בהמשך**
- Virtual memory vs. not
- Unaligned access vs. not
- Hardware interlocks vs. software guaranteed interlocking
- Software vs. hardware managed page fault handling
- Cache coherence (hardware vs. software)
- ...

**Think Programmer vs. (Micro)architect**

**היכן למקם את המעמסה?**



# Review: A Note on RISC vs. CISC

---

- RISC
  - Simple instructions
  - Fixed length
  - Uniform decode
  - Few addressing modes
  
- CISC
  - Complex instructions
  - Variable length
  - Non-uniform decode
  - Many addressing modes

# Now That We Have an ISA

---

- How do we implement it?
- i.e., how do we design a system that obeys the hardware/software interface?
- Aside: “System” can be solely hardware or a combination of hardware and software
  - Remember “Translation of ISAs”
  - A virtual ISA can be converted by “software” into an implementation ISA
- We will assume “hardware” for most lectures

# Implementing the ISA: Microarchitecture Basics

# How Does a Machine Process Instructions?

---

- What does processing an instruction mean?
- Remember the von Neumann model

AS = Architectural (programmer visible) state before an instruction is processed



Process instruction



AS' = Architectural (programmer visible) state after an instruction is processed

- Processing an instruction: Transforming AS to AS' according to the ISA specification of the instruction

# Remember: Programmer Visible (Architectural) State



Memory  
array of storage locations  
indexed by an address

## Registers

- given special names in the ISA (as opposed to addresses)
- general vs. special purpose

## Program Counter

memory address  
of the current instruction

Instructions (and programs) specify how to transform  
the values of programmer visible state

# The “Process instruction” Step

- ISA specifies abstractly what AS' should be, given an instruction and AS
  - It defines an abstract **finite state machine** where
    - State = programmer-visible state
    - Next-state logic = instruction execution specification
  - From ISA point of view, there are no “intermediate states” between AS and AS' during instruction execution
    - One state transition per instruction
- Microarchitecture implements how AS is transformed to AS'
  - There are many choices in implementation
  - We can have programmer-invisible state to optimize the speed of instruction execution: multiple state transitions per instruction
    - Choice 1: **AS → AS'** (transform AS to AS' in a single clock cycle)
    - Choice 2: **AS → AS+MS1 → AS+MS2 → AS+MS3 → AS'** (take multiple clock cycles to transform AS to AS')

Finite State Machine

AS = Architecture State  
MS=Mictoarchitecture State

# A Very Basic Instruction Processing Engine

---

- Each instruction takes a **single clock cycle** to execute
- Only **combinational logic** is used to implement instruction execution
  - *No intermediate, programmer-invisible state updates*

AS = Architectural (programmer visible) state

at the beginning of a clock cycle



Process instruction in one clock cycle



AS' = Architectural (programmer visible) state

at the end of a clock cycle

# A Very Basic Instruction Processing Engine

- Single-cycle machine



- What is the *clock cycle time* determined by? מישחו רוצה לענות?
- What is the *critical path* of the combinational logic determined by?

# Single-cycle vs. Multi-cycle Machines

- Single-cycle machines
  - Each instruction takes a single clock cycle
  - All state updates made at the end of an instruction's execution
  - Big disadvantage: The slowest instruction determines cycle time → long clock cycle time
- Multi-cycle machines
  - Instruction processing broken into multiple cycles (or stages)
  - State updates can be made during an instruction's execution
  - Architectural state(\*) updates made only at the end of an instruction's execution
  - Advantage over single-cycle: The slowest “stage” determines cycle time
- Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitecture level

Slowest  
combinational logic

(\*) "מצב" כפי שהוגדר בשקף הקודם

# Instruction Processing “Cycle”

- Instructions are processed under the direction of a “control unit” step by step.
- Instruction cycle: Sequence of steps to process an instruction
- Fundamentally, there are six phases (steps):
  - Fetch
  - Decode
  - Evaluate Address
  - Fetch Operands
  - Execute
  - Store Result
- Not all instructions require all six stages

יש להבחין בין מחזור של הוראה לבין מחזור של שעון!

# Instruction Processing “Cycle” vs. Machine Clock Cycle

---

- Single-cycle machine:
  - All six phases of the instruction processing cycle take a *single machine clock cycle* to complete
  
- Multi-cycle machine:
  - All six phases of the instruction processing cycle can take *multiple machine clock cycles* to complete
  - In fact, each phase can take multiple clock cycles to complete

למשל יתכן שביצוע FETCH יקח 5 מחזורי שעון...

□ האינטראקציה בין המעבד לזכרון במחשב מסווג **single cycle** היא חוויה מתסכמת...

# Instruction Processing Viewed Another Way

---

- Instructions transform **Data** (AS) to **Data'** (AS')
- This transformation is done by **functional units**
  - Units that “operate” on data
- These units need to be told what to do to the data
- An instruction processing engine consists of two components
  - **Datapath:** Consists of hardware elements that deal with and transform data signals
    - functional units that operate on data
    - hardware structures (e.g. wires and muxes) that enable the flow of data into the functional units and registers
    - storage units that store data (e.g., registers)
  - **Control logic:** Consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data

Data vs. Control separation

# Single-cycle vs. Multi-cycle: Control & Data

---

- Single-cycle machine:
  - Control signals are generated in the same clock cycle as the one during which data signals are operated on
  - Everything related to an instruction happens in one clock cycle (serialized processing)
- Multi-cycle machine:
  - Control signals needed in the next cycle can be generated in the current cycle
  - Latency of control processing can be overlapped with latency of datapath operation (more parallelism)
- We will see the difference clearly in *microprogrammed multi-cycle microarchitectures*

# Many Ways of Datapath and Control Design

---

- There are many ways of designing the data path and control logic
- Single-cycle, multi-cycle, pipelined datapath and control
- Single-bus vs. multi-bus datapaths
- Hardwired/combinational vs. microcoded/microprogrammed control
  - Control signals generated by combinational logic versus
  - Control signals stored in a memory structure
- Control signals and structure depend on the datapath design

# Flash-Forward: Performance Analysis

- Execution time of an instruction
  - $\{\text{CPI}\} \times \{\text{clock cycle time}\}$

$$\text{CPI} = 1 / \text{IPC}$$

- Execution time of a program
  - Sum over all instructions  $[\{\text{CPI}\} \times \{\text{clock cycle time}\}]$
  - $\{\#\text{ of instructions}\} \times \{\text{Average CPI}\} \times \{\text{clock cycle time}\}$

We have  
one degree of freedom  
to optimize

- Single cycle microarchitecture performance

- CPI = 1
- Clock cycle time = long

- Multi-cycle microarchitecture performance

- CPI = different for each instruction
  - Average CPI → hopefully small
- Clock cycle time = short

Now, we have  
two degrees of freedom  
to optimize independently

ניתן "לשחק" עם 2 גדרים:  
clock cycle time -I CPI

# A Single-Cycle Microarchitecture

## *A Closer Look*

# Remember...

---

- Single-cycle machine



# Let's Start with the State Elements

- Data and control inputs

נתחל עם כל מה שהוא  
programmer visible state



# For Now, We Will Assume



- “Magic” memory and register file
- Combinational read
  - output of the read data port is a combinational function of the register file contents and the corresponding read select port
- Synchronous write
  - the selected register is updated on the positive edge clock transition when write enable is asserted
    - Cannot affect read output in between clock edges



- **זהוי הנחה לשם הפסיות – Single-cycle, synchronous memory**
  - Contrast this with memory that tells when the data is ready
  - i.e., Ready bit: indicating the read or write is done



FIGURE 4.3 Combinational logic, state elements, and the clock are closely related. In a synchronous digital system, the clock determines when elements with state will write values into internal storage. Any inputs to a state element must reach a stable value (that is, have reached a value from which they will not change until after the clock edge) before the active clock edge causes the state to be updated. All state elements in this chapter, including memory, are assumed to be **positive edge-triggered**; that is, they change on the rising clock edge.

# Instruction Processing

- 5 generic steps (P&H book) –

אמנם דיברנו על 6 שלבים אך כאן נעשה מיזוג ושינוי קל לעומת מה שנכתב לפני 10 שקיים

- ❑ Instruction fetch (IF)
- ❑ Instruction decode and register operand fetch (ID/RF = Register Fetch)
- ❑ Execute/Evaluate memory address (EX/AG = Address Generate)
- ❑ Memory operand fetch (MEM)
- ❑ Store/writeback result (WB)



# What Is To Come: The Full MIPS Datapath



# Single-Cycle Datapath for *Arithmetic and Logical Instructions*

# R-Type ALU Instructions

- Assembly (e.g., register-register signed addition)

ADD rd<sub>reg</sub> rs<sub>reg</sub> rt<sub>reg</sub>

- Machine encoding



- Semantics

if MEM[PC] == ADD rd rs rt

GPR[rd]  $\leftarrow$  GPR[rs] + GPR[rt]

PC  $\leftarrow$  PC + 4

אם מה שנמצא בזכרון בכתב בכתובת  
PC שווה לתוך שבאגף ימין אז...

# להלן המימוש ALU Datapath



if  $\text{MEM}[\text{PC}] == \text{ADD}$   $\text{rd } \text{rs } \text{rt}$

$\text{GPR}[\text{rd}] \leftarrow \text{GPR}[\text{rs}] + \text{GPR}[\text{rt}]$   
 $\text{PC} \leftarrow \text{PC} + 4$



Combinational  
state update logic

# להלן המימוש R-Type ALU



$GPR[rd] \leftarrow GPR[rs] + GPR[rt]$   
 $PC \leftarrow PC + 4$



Combinational  
state update logic

The General-Purpose Register file (GPR file). This unit is required for the decode phase (and for the Write Back phase)

---

It includes the  
32 registers the  
programmer can  
access (\$0 - \$31)



Let's see how it is used:

# The General-Purpose Register file (GPR).

There are 3 ports in this box

If we want to read from \$7 via port 1

we set Read register1 to 7 = 00111

we then gets the contents of \$7 at the  
Read data1 output

we mark it as: GPR[7]

It could be 103 for example



# The General-Purpose Register file (GPR).

So if we have the instruction:

we connect the Rs field Read reg1

and get GPR[Rs] at the Read data1 output

In our case we assumed GPR[7]=103

add \$12, \$7, \$2  
Rs Rt

$00111 = 7 = \text{Rs}$

$00010 = 2 = \text{Rt}$

In our case we also assume  
assumed that GPR[2]=62

we connect the Rt field Read register2

and get GPR[Rt] ate the  
Read data2 output



# The General-Purpose Register file (GPR).

| R | opcode      | rs    | rt    | rd | shamt | funct |
|---|-------------|-------|-------|----|-------|-------|
|   | 31<br>26 25 | 21 20 | 16 15 | 11 | 6 5   | 0     |

So this is the real picture:



So what's next? We want to add the 103 & 62

$$103 + 62 = 165$$

Now we want to write this to Rd

# I-Type ALU Instructions

- Assembly (e.g., register-immediate signed additions)

ADDI  $rt_{reg}$   $rs_{reg}$  immediate<sub>16</sub>

- Machine encoding



- Semantics

if  $MEM[PC] == ADDI\ rt\ rs\ immediate$

$GPR[rt] \leftarrow GPR[rs] + \text{sign-extend (immediate)}$

$PC \leftarrow PC + 4$

# Datapath for R and I-Type ALU Insts.



if  $\text{MEM}[\text{PC}] == \text{ADDI } rt \text{ } rs \text{ } \text{immediate}$

$\text{GPR}[rt] \leftarrow \text{GPR}[rs] + \text{sign-extend}(\text{immediate})$   
 $\text{PC} \leftarrow \text{PC} + 4$



Combinational  
state update logic

# Datapath for R and I-Type ALU Insts.



$GPR[rt] \leftarrow GPR[rs] + \text{sign-extend (immediate)}$

$PC \leftarrow PC + 4$



# Single-Cycle Datapath for *Data Movement Instructions*

# Load Instructions

- Assembly (e.g., load 4-byte word)

LW rt<sub>reg</sub> offset<sub>16</sub> (base<sub>reg</sub>)

Cמִ rs

הוּא מַשְׁפְּחַת ה - Load  
I-type

- Machine encoding



- Semantics

if MEM[PC]==LW rt offset<sub>16</sub> (base)

$$EA = \text{sign-extend}(\text{offset}) + \text{GPR}[\text{base}]$$

$$\text{GPR}[rt] \leftarrow \text{MEM}[\text{translate}(EA)]$$

$$\text{PC} \leftarrow \text{PC} + 4$$

# LW Datapath



if  $\text{MEM}[\text{PC}] == \text{LW rt offset}_{16}(\text{base})$

$$\text{EA} = \text{sign-extend}(\text{offset}) + \text{GPR}[\text{base}]$$

$$\text{GPR}[rt] \leftarrow \text{MEM}[\text{translate}(\text{EA})]$$



Combinational  
state update logic

# LW Datapath



$$EA = GPR[rs] + \text{sign-extend}(\text{offset})$$

$$GPR[rt] \leftarrow \text{MEM}[EA]$$

$$PC \leftarrow PC + 4$$



Combinational  
state update logic

# Store Instructions

- Assembly (e.g., store 4-byte word)

SW rt<sub>reg</sub> offset<sub>16</sub> (base<sub>reg</sub>)

רְסֵוּסַתְּ

- Machine encoding



I-type

- Semantics

if  $\text{MEM}[\text{PC}] == \text{SW } \text{rt } \text{offset}_{16} (\text{base})$

$\text{EA} = \text{sign-extend}(\text{offset}) + \text{GPR}[\text{base}]$

$\text{MEM}[\text{translate}(\text{EA})] \leftarrow \text{GPR}[\text{rt}]$

$\text{PC} \leftarrow \text{PC} + 4$

# SW Datapath



עתה התהפכו האותות



if  $\text{MEM}[\text{PC}] == \text{SW rt offset}_{16} (\text{base})$

$\text{EA} = \text{sign-extend}(\text{offset}) + \text{GPR}[\text{base}]$

$\text{MEM}[\text{translate}(\text{EA})] \leftarrow \text{GPR}[\text{rt}]$

$\text{PC} \leftarrow \text{PC} + 4$



Combinational  
state update logic

# SW Datapath



$$EA = GPR[rs] + \text{sign-extend}(\text{offset})$$

$\text{MEM}[EA] \leftarrow GPR[rt]$

$PC \leftarrow PC + 4$



Combinational  
state update logic

# Load-Store Datapath



# Datapath for Non-Control-Flow Insts.





---

חזרה והסביר נוסף על ה- datapath עבור ההוראות שנלמדו עד-כה  
ניתנת מחדש ב-7 השקפים הבאים הלקוחים מ-

## **CSE378 University of Washington**

<https://courses.cs.washington.edu/courses/cse378/10sp/lectures.html>

אנו לא נתעכט עליהם כאן ומעבר על השקפים יעשה בלימוד עצמי  
בבית

## Encoding R-type instructions

- Last lecture, we saw encodings of MIPS instructions as 32-bit values.
- Register-to-register arithmetic instructions use the **R-type** format.
  - **op** is the instruction opcode, and **func** specifies a particular arithmetic operation (see textbook).
  - **rs**, **rt** and **rd** are source and destination registers.

| op     | rs     | rt     | rd     | shamt  | func   |
|--------|--------|--------|--------|--------|--------|
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

- An example instruction and its encoding:

add \$s4, \$t1, \$t2

|        |       |       |       |       |         |
|--------|-------|-------|-------|-------|---------|
| 000000 | 01001 | 01010 | 10100 | 00000 | 1000000 |
|--------|-------|-------|-------|-------|---------|

# Registers and ALUs

- R-type instructions must access registers and an ALU.
- Our **register file** stores thirty-two 32-bit values.
  - Each register specifier is 5 bits long.
  - You can read from two registers at a time.
  - **RegWrite** is 1 if a register should be written.



- Here's a simple **ALU** with five operations, selected by a 3-bit control signal **ALUOp**.

| ALUOp | Function |
|-------|----------|
| 000   | and      |
| 001   | or       |
| 010   | add      |
| 110   | subtract |
| 111   | slt      |



# Executing an R-type instruction

1. Read an instruction from the instruction memory.
2. The source registers, specified by instruction fields **rs** and **rt**, should be read from the register file.
3. The ALU performs the desired operation.
4. Its result is stored in the destination register, which is specified by field **rd** of the instruction word.



| op | rs | rt | rd | shamt | func        |
|----|----|----|----|-------|-------------|
| 31 | 26 | 25 | 21 | 20    | 15 10 6 5 0 |

# Encoding I-type instructions

- The lw, sw and beq instructions all use the **I-type** encoding.
  - rt** is the *destination* for lw, but a *source* for beq and sw.
  - address** is a 16-bit signed constant.



- Two example instructions:

lw \$t0, -4(\$sp)      

|        |       |       |                     |
|--------|-------|-------|---------------------|
| 100011 | 11101 | 01000 | 1111 1111 1111 1100 |
|--------|-------|-------|---------------------|

sw \$a0, 16(\$sp)      

|        |       |       |                     |
|--------|-------|-------|---------------------|
| 101011 | 11101 | 00100 | 0000 0000 0001 0000 |
|--------|-------|-------|---------------------|

# Accessing data memory

- For an instruction like `lw $t0, -4($sp)`, the base register `$sp` is added to the *sign-extended* constant to get a data memory address.
- This means the ALU must accept *either* a register operand for arithmetic instructions, *or* a sign-extended immediate operand for `lw` and `sw`.
- We'll add a multiplexer, controlled by `ALUSrc`, to select either a register operand (0) or a constant operand (1).



# MemToReg

- The register file's "Write data" input has a similar problem. It must be able to store *either* the ALU output of R-type instructions, or the data memory output for lw.
- We add a mux, controlled by MemToReg, to select between saving the ALU result (0) or the data memory output (1) to the registers.



# RegDst

- A final annoyance is the destination register of lw is in *rt* instead of *rd*.



lw \$rt, address(\$rs)

- We'll add one more mux, controlled by *RegDst*, to select the destination register from either instruction field rt (0) or field rd (1).



---

עד כאן תוספת שקיים מtower CSE378

# Single-Cycle Datapath for *Control Flow Instructions*

# Unconditional Jump Instructions

- Assembly

$J \text{ immediate}_{26}$

- Machine encoding



- Semantics

if  $\text{MEM}[\text{PC}] == J \text{ immediate}_{26}$

target = {  $\text{PC}[31:28]$ ,  $\text{immediate}_{26}$ ,  $2'b00$  }

$\text{PC} \leftarrow \text{target}$

Guy> This is the addressing mode convention:  
concatenate the  $\text{PC} + \text{imm} + 2'b00$

# זיכרון מפתח זיכרון ה- MIPS



לקריאה מה- MIPS reference card

## MEMORY ALLOCATION

$$16^7 = 2^{28}$$

\$sp → 7fff fffc<sub>hex</sub>

\$gp → 1000 8000<sub>hex</sub>

1000 0000<sub>hex</sub>

pc → 0040 0000<sub>hex</sub>

0<sub>hex</sub>



# Unconditional Jump Datapath



\*\*Based on original figure from [P&H CO&D, COPYRIGHT 2004  
Elsevier. ALL RIGHTS RESERVED.]

if  $\text{MEM}[\text{PC}] == \text{J}$  immediate26

$\text{PC} = \{ \text{PC}[31:28], \text{immediate26}, 2'b00 \}$

What about JR, JAL?

## H&P CO&D



FIGURE 4.24 The simple control and datapath are extended to handle the jump instruction. An additional multiplexor (at the upper right) is used to choose between the jump target and either the branch target or the sequential instruction following this one. This multiplexor is controlled by the jump control signal. The jump target address is obtained by shifting the lower 26 bits of the jump instruction left 2 bits, effectively adding 00 as the low-order bits, and then concatenating the upper 4 bits of  $PC + 4$  as the high-order bits, thus yielding a 32-bit address.

# Aside: MIPS Cheat Sheet

[http://www.ece.cmu.edu/~ece447/s15/lib/exe/fetch.php?media=mips\\_reference\\_data.pdf](http://www.ece.cmu.edu/~ece447/s15/lib/exe/fetch.php?media=mips_reference_data.pdf)

הזהיר:

Pseudo Instructions

| MIPS Reference Data                                                                                                                                                                                                                                                                                                                                                              |                                                                 | ① CORE INSTRUCTION SET    |           | ② ARITHMETIC CORE INSTRUCTION SET |        | ③ FLOW CONTROL INSTRUCTION SET                      |              | ④ DATA TRANSFER INSTRUCTION SET |        | ⑤ JUMP INSTRUCTION SET                              |                       | ⑥ PSEUDOINSTRUCTION SET          |        |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|---------------------------|-----------|-----------------------------------|--------|-----------------------------------------------------|--------------|---------------------------------|--------|-----------------------------------------------------|-----------------------|----------------------------------|--------|
|                                                                                                                                                                                                                                                                                                                                                                                  |                                                                 |                           |           |                                   |        |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| NAME, MNEMONIC                                                                                                                                                                                                                                                                                                                                                                   | FORMAT                                                          | OPERATION (in Verilog)    | OPCODE    | NAME, MNEMONIC                    | FORMAT | OPERATION                                           | OPCODE       | NAME, MNEMONIC                  | FORMAT | NAME, MNEMONIC                                      | FORMAT                | NAME, MNEMONIC                   | FORMAT |
| Add add                                                                                                                                                                                                                                                                                                                                                                          | R [rt] = R[rs] + R[rt]                                          | (1) 0 / 20 <sub>hex</sub> | 0 / FUNCT | Branch On FP True beq             | FI     | iif(FPcond[PC>=PC+4] & BranchAddr(4))               | 11/8/1~      | Load Byte Unsigned lhu          | I      | lui                                                 | 0 / 00 <sub>hex</sub> | Set Less Than sltu               | I      |
| Add Immediate addi                                                                                                                                                                                                                                                                                                                                                               | I R[rt] = R[rs] + SignExtImm                                    | (1,2) 8 <sub>hex</sub>    | / Hex     | Branch On FP False beq            | FI     | iif(FPcond[PC>=PC+4] & BranchAddr(4))               | 11/8/0~      | Load Halfword Unsigned lh       | I      | lo                                                  | 0/0~1/a               | Set Less Than or Equal sltiu     | I      |
| Add Imm Unsigned addu                                                                                                                                                                                                                                                                                                                                                            | I R[rt] = R[rs] + SignExtImm                                    | (2) 9 <sub>hex</sub>      |           | Divide div                        | R      | lo-R[rs](R[rt]); hi-R[rs]~&R[rt]                    | 0/0~1/a      | Load Word Unsigned lw           | I      | lh                                                  | 0/0~1/b               | Set Greater Than sgtr            | I      |
| Add Unsigned addu                                                                                                                                                                                                                                                                                                                                                                | R R[rt] = R[rs] + R[rt]                                         | 0 / 21 <sub>hex</sub>     |           | Divide Unsigned divu              | R      | lo-R[rs](R[rt]); hi-R[rs]~&R[rt]                    | 0/0~1/b      | Store Byte sb                   | I      | sh                                                  | 0/0~1/c               | Set Greater Than or Equal sgtriu | I      |
| And and                                                                                                                                                                                                                                                                                                                                                                          | R R[rd] = R[rs] & R[rt]                                         | 0 / 24 <sub>hex</sub>     |           | FP Add Single add.s               | FR     | Fifld] = F[fs] & F[fr]                              | 11/10~0      | Store Halfword Unsigned sh      | I      | sh                                                  | 0/0~1/d               | Set Less Than sgtriu             | I      |
| And Immediate andi                                                                                                                                                                                                                                                                                                                                                               | I R[rt] = R[rs] & ZeroExtImm                                    | (3) 0 <sub>hex</sub>      |           | FP Add Double add.d               | FR     | [F[fd],F[fd+1]] = [F[fs],F[fs+1]] + [F[fr],F[fr+1]] | 11/11~0      | Store Word Unsigned sw          | I      | sw                                                  | 0/0~1/e               | Set Greater Than sgtriu          | I      |
| Branch On Equal beq                                                                                                                                                                                                                                                                                                                                                              | I if(R[rs]==R[rt])                                              | 4 <sub>hex</sub>          |           | FP Compare Single c.e*            | FR     | FPcond = (F[fs].op[F[fr]]) ? 1 : 0                  | 11/10~0/y    | Subtract sub                    | I      | sf                                                  | 0/0~1/f               | Set Less Than sgtriu             | I      |
| Branch On Not Equalne                                                                                                                                                                                                                                                                                                                                                            | I if(R[rs]!=R[rt])                                              | 5 <sub>hex</sub>          |           | FP Compare Double c.e*            | FR     | FPcond = (F[fs].F[fs+1] op [F[fr],F[fr+1]]) ? 1 : 0 | 11/11~0/y    | Subtract Double sub.d           | FR     | [F[fd],F[fd+1]] = [F[fs],F[fs+1]] - [F[fr],F[fr+1]] | 11/11~0/y             | Set Greater Than sgtriu          | I      |
| Jump j                                                                                                                                                                                                                                                                                                                                                                           | J PC=JumpAddr                                                   | (5) 2 <sub>hex</sub>      |           | FP Divide Single div.s            | FR     | Fifld] = F[fs] / F[fr]                              | 11/10~0/3    | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Jump And Link jal                                                                                                                                                                                                                                                                                                                                                                | J PC[31]~PC[8];PC=JumpAddr                                      | (5) 3 <sub>hex</sub>      |           | FP Divide Double div.d            | FR     | [F[fd],F[fd+1]] = [F[fs],F[fs+1]] / [F[fr],F[fr+1]] | 11/11~0~3    | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Jump Register jr                                                                                                                                                                                                                                                                                                                                                                 | J PC=R[rs]                                                      | 0 / 08 <sub>hex</sub>     |           | FP Multiply Single mul.s          | FR     | Fifld] = F[fs] * F[fr]                              | 11/10~0/2    | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Load Byte Unsigned lhu                                                                                                                                                                                                                                                                                                                                                           | I R[rt] = ~2 <sup>16</sup> & M[R[rs]] + (SignExtImm) 7:0)       | (2) 2 <sub>hex</sub>      |           | FP Multiply Double mul.d          | FR     | [F[fd],F[fd+1]] = [F[fs],F[fs+1]] * [F[fr],F[fr+1]] | 11/11~0/2    | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Load Halfword Unsigned lhu                                                                                                                                                                                                                                                                                                                                                       | I R[rt] = (16'bo,M[R[rs]] + (SignExtImm) 15:0)                  | (2) 2 <sub>hex</sub>      |           | FP Subtract Single sub.s          | FR     | Fifld] = F[fs] - F[fr]                              | 11/10~0/1    | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Load Linked ll                                                                                                                                                                                                                                                                                                                                                                   | I R[rt] = M[R[rs]] & SignExtImm                                 | (2,7) 30 <sub>hex</sub>   |           | FP Subtract Double sub.d          | FR     | [F[fd],F[fd+1]] = [F[fs],F[fs+1]] - [F[fr],F[fr+1]] | 11/11~0/1    | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Load Upper Imm. lui                                                                                                                                                                                                                                                                                                                                                              | I R[rt] = (imm & 16'b0)                                         | 1 <sub>hex</sub>          |           | Load FP Single lwl                | I      | F[rt] = M[R[rs]] & SignExtImm                       | (2) 31/0~m/n | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Load Word lw                                                                                                                                                                                                                                                                                                                                                                     | I R[rt] = M[R[rs]] & SignExtImm                                 | (2) 2 <sub>hex</sub>      |           | Load FP Load lfd                  | I      | F[rt] = M[R[rs]] & SignExtImm;                      | (2) 35/0~m/n | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Nor nor                                                                                                                                                                                                                                                                                                                                                                          | R R[rd] = ~ (R[rs] & R[rt])                                     | 0 / 27 <sub>hex</sub>     |           | Double Move From Hi mfh           | R      | R[rd] = R[rs] & R[rt]                               | 0/0~1/10     | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Or or                                                                                                                                                                                                                                                                                                                                                                            | R R[rd] = R[rs] << shamt                                        | 0 / 25 <sub>hex</sub>     |           | Double Move From Lo mfl           | R      | R[rd] = R[rs] & R[rt]                               | 0/0~1/12     | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Or Immediate ori                                                                                                                                                                                                                                                                                                                                                                 | I R[rt] = R[rs] & ZeroExtImm                                    | (3) 0 <sub>hex</sub>      |           | Move From Hi mfh                  | R      | R[rd] = CR[rs]                                      | 10/0~0       | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Set Less Thanslt                                                                                                                                                                                                                                                                                                                                                                 | R R[rd] = (R[rs] < R[rt]) ? 1 : 0                               | 0 / 21 <sub>hex</sub>     |           | Move From Control mfc0            | R      | R[rd] = (Hi,Lo) = R[rs] & R[rt]                     | 0/0~1/18     | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Set Less Than Imm. slti                                                                                                                                                                                                                                                                                                                                                          | I R[rt] = (R[rs] < SignExtImm) 1 : 0 (2) 2 <sub>hex</sub>       |                           |           | Multiply Unsigned multu           | R      | R[rd] = R[rs] >> shamt                              | 0/0~1/19     | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Set Less Than Imm. sltiu                                                                                                                                                                                                                                                                                                                                                         | I R[rt] = (R[rs] < SignExtImm)                                  | (2,6) 2 <sub>hex</sub>    |           | Multiply Unsigned multu           | R      | M[R[rs]] & SignExtImm = F[rt]                       | 0/0~1/19     | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Set Less Than Unsigned sltu                                                                                                                                                                                                                                                                                                                                                      | R R[rd] = (R[rs] < R[rt]) ? 1 : 0                               | 0 / 20 <sub>hex</sub>     |           | Shift Right Arit. sra             | R      | R[rd] = R[rs] >> shamt                              | 0/0~1/3      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Shift Left Logical sll                                                                                                                                                                                                                                                                                                                                                           | R R[rd] = R[rs] << shamt                                        | 0 / 00 <sub>hex</sub>     |           | Shift Right Arit. sra             | R      | R[rd] = R[rs] >> shamt                              | 0/0~1/3      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Shift Right Logical srl                                                                                                                                                                                                                                                                                                                                                          | R R[rd] = R[rs] >> shamt                                        | 0 / 02 <sub>hex</sub>     |           | Store FP Single swcl              | I      | M[R[rs]] & SignExtImm = F[rt]                       | 0/0~1/2      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Store Byte sb                                                                                                                                                                                                                                                                                                                                                                    | I M[R[rs]] & SignExtImm] 7:0) = R[rt]; R[rt] = R[rt]            | (2) 2 <sub>hex</sub>      |           | Store FP Single swcl              | I      | M[R[rs]] & SignExtImm] = F[rt];                     | (2) 3d/0~m/n | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Store Conditional sc                                                                                                                                                                                                                                                                                                                                                             | I M[R[rs]] & SignExtImm] 7:0) = R[rt]; R[rt] = (atomic) ? 1 : 0 | (2,7) 3 <sub>hex</sub>    |           | Double Swap sp                    | I      | M[R[rs]] & SignExtImm] + F[rt+1]                    | 0/0~1/2      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Store Conditional sc                                                                                                                                                                                                                                                                                                                                                             | I M[R[rs]] & SignExtImm] 7:0) = R[rt]; R[rt] = (atomic) ? 1 : 0 | (2,7) 3 <sub>hex</sub>    |           | Double Swap sp                    | I      | M[R[rs]] & SignExtImm] + F[rt+1]                    | 0/0~1/2      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Store Halfword sh                                                                                                                                                                                                                                                                                                                                                                | I M[R[rs]] & SignExtImm] 15:0) = R[rt]; R[rt] = R[rt]           | (2) 2 <sub>hex</sub>      |           | Double Swap sp                    | I      | M[R[rs]] & SignExtImm] + F[rt+1]                    | 0/0~1/2      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Store Word sw                                                                                                                                                                                                                                                                                                                                                                    | I M[R[rs]] & SignExtImm] = R[rt]                                | (2) 2 <sub>hex</sub>      |           | Double Swap sp                    | I      | M[R[rs]] & SignExtImm] + F[rt+1]                    | 0/0~1/2      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| Subtract sub                                                                                                                                                                                                                                                                                                                                                                     | R R[rd] = R[rs] - R[rt]                                         | (1) 0 / 22 <sub>hex</sub> |           | Double Swap sp                    | I      | M[R[rs]] & SignExtImm] + F[rt+1]                    | 0/0~1/2      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Less Than sgtriu             | I      |
| Subtract Unsigned subu                                                                                                                                                                                                                                                                                                                                                           | R R[rd] = R[rs] - R[rt]                                         | 0 / 23 <sub>hex</sub>     |           | Double Swap sp                    | I      | M[R[rs]] & SignExtImm] + F[rt+1]                    | 0/0~1/2      | Swap sp                         | I      | sw                                                  | 0/0~1/g               | Set Greater Than sgtriu          | I      |
| (1) May cause overflow exception<br>(2) SignExtImm = { 16'imm immediate[15]}, immediate<br>(3) ZeroExtImm = { 16'bo imm}, immediate<br>(4) BranchAddr = { 14'imm immediate[15]}, immediate, 2'b0<br>(5) JumpAddr = { PC+4 11:28}, address, 2'b0<br>(6) Operands considered unsigned numbers (vs. 2's comp)<br>(7) Atoms test&set pair, R[rt] = 1 if pair atomic, 0 if not atomic |                                                                 |                           |           |                                   |        |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| <b>BASIC INSTRUCTION FORMATS</b>                                                                                                                                                                                                                                                                                                                                                 |                                                                 |                           |           |                                   |        |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| <b>R</b>                                                                                                                                                                                                                                                                                                                                                                         |                                                                 |                           |           |                                   |        |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| opcode                                                                                                                                                                                                                                                                                                                                                                           | rs                                                              | rt                        | rd        | shamt                             | funct  |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| 31 28 25                                                                                                                                                                                                                                                                                                                                                                         | 23 20                                                           | 21 18                     | 11 10     | 8 5                               | 6      |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| <b>I</b>                                                                                                                                                                                                                                                                                                                                                                         |                                                                 |                           |           |                                   |        |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| opcode                                                                                                                                                                                                                                                                                                                                                                           | rs                                                              | rt                        | immediate |                                   |        |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| 31 28 25                                                                                                                                                                                                                                                                                                                                                                         | 23 20                                                           | 18 15                     |           |                                   | 6      |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| <b>J</b>                                                                                                                                                                                                                                                                                                                                                                         |                                                                 |                           |           |                                   |        |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| opcode                                                                                                                                                                                                                                                                                                                                                                           | address                                                         |                           |           |                                   |        |                                                     |              |                                 |        |                                                     |                       |                                  |        |
| 31 28 25                                                                                                                                                                                                                                                                                                                                                                         |                                                                 |                           |           |                                   | 6      |                                                     |              |                                 |        |                                                     |                       |                                  |        |

Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.

Please download!  
מצא במודול

# Conditional Branch Instructions

- Assembly (e.g., branch if equal)

$\text{BEQ } \text{rs}_{\text{reg}} \text{ rt}_{\text{reg}} \text{ immediate}_{16}$

- Machine encoding



- Semantics (assuming no branch delay slot)

if  $\text{MEM}[\text{PC}] == \text{BEQ } \text{rs } \text{rt } \text{immediate}_{16}$

$\text{target} = \text{PC} + 4 + \text{sign-extend(immediate)} \times 4$

if  $\text{GPR}[\text{rs}] == \text{GPR}[\text{rt}]$  then  $\text{PC} \leftarrow \text{target}$

else  $\text{PC} \leftarrow \text{PC} + 4$

! פועלות השוואת נועשית ב-ALU

# Conditional Branch Datapath (for you to finish)



\*\*Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

# Conditional Branch Datapath (finished :-))



\*\*Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

We need a second adder, since the ALU is already doing subtraction for the beq.



# The Final Datapath (without the controls)



# Putting It All Together



# Single-Cycle Control Logic

נסביר את אותות הבקירה ששולטים על ה- datapath ב-10 השקפים  
הבאים של דני

# Control



## Let's explain all Control Signals:

ALUSrc – 0 will connect GPR[Rt] to the ALU B input [Rtype, beq]

1- will connect sext(imm) to the ALU B input [in lw or sw inst.]

ALUOP – Two bits: 00 force the ALU to add [lw & sw]. 01 force the ALU to sub [beq]. 10 force the ALU to look at the funct. Field [Rtype].

MemRead – 0 - nothing happens. 1- data is read from M[ALU output]

MemWrite – 0 - nothing happens. 1- GPR[Rt] is written into M[ALU output] at the next rising edge of the CK



RegWrite – 0 – if we do not write back to GPR [sw, beq].

1 – if we do write back to GPR [Rtype, lw]

MemToReg – 0 – The ALU output is fed to the GPR Write Data input [Rtype]. 1- Data read from Memory is fed to the GPR Write Data input [lw]

RegDst – 0 – The GPR Write Reg gets Rt field of the instruction [lw]. 1- The GPR Write Reg gets Rd field of instruction [Rtype]

Branch – 0 – ia all instructions except beq. 1- in beq. Selects whether to branch or not.

Inputs



# Control

We would like to build the Control Decoder together

| Instr. | ALU<br>Src | ALUOp<br>[1:0] | Mem<br>Read | Mem<br>Write | Mem<br>to<br>Reg | Reg<br>Dst | Reg<br>Write | Branch |
|--------|------------|----------------|-------------|--------------|------------------|------------|--------------|--------|
| R-type |            |                |             |              |                  |            |              |        |
| lw     |            |                |             |              |                  |            |              |        |
| sw     |            |                |             |              |                  |            |              |        |
| beq    |            |                |             |              |                  |            |              |        |



We can fill up columns

We only read from memory in lw instruction, so

We only write to memory in sw instruction, so

| Instr. | ALU Src | ALUOp [1:0] | Mem Read | Mem Write | Mem to Reg | Reg Dst | Reg Write | Branch |
|--------|---------|-------------|----------|-----------|------------|---------|-----------|--------|
| R-type |         |             |          | 0         | 0          |         |           |        |
| lw     |         |             |          | 1         | 0          |         |           |        |
| sw     |         |             |          | 0         | 1          |         |           |        |
| beq    |         |             |          | 0         | 0          |         |           |        |

A better way is to follow  
Each instruction and see  
What happens in the  
Data-Path during the  
Execution of the instruction



In Rtype we use  $GPR[Rs]$  op  $GPR[Rt]$

Thus,  $ALUSrc = 0$

In Rtype we must have  $ALUSOp = 10$  (function)

$MemRead=0$  and  $MemWrite=0$

$MemToReg=0$

$RegDst=1$  to select Rd

$RegWrite=1$  to write to the GPR

$Branch=0$  so we always get  
 $PC=PC+4$

| Instr. | ALU Src | ALUOp [1:0] | Mem Read | Mem Write | Mem to Reg | Reg Dst | Reg Write | Branch |
|--------|---------|-------------|----------|-----------|------------|---------|-----------|--------|
| R-type | 0       | 1 0         | 0        | 0         | 0          | 1       | 1         | 0      |
| lw     |         |             |          | 1         | 0          |         |           |        |
| sw     |         |             |          | 0         | 1          |         |           |        |
| beq    |         |             |          | 0         | 0          |         |           |        |



| <b>Instr.</b> | <b>ALU Src</b> | <b>ALUOp [1:0]</b> |          | <b>Mem Read</b> | <b>Mem Write</b> | <b>Mem to Reg</b> | <b>Reg Dst</b> | <b>Reg Write</b> | <b>Branch</b> |
|---------------|----------------|--------------------|----------|-----------------|------------------|-------------------|----------------|------------------|---------------|
| R-type        | 0              | 1                  | 0        | 0               | 0                | 0                 | 1              | 1                | 0             |
| lw            | <b>1</b>       | <b>0</b>           | <b>0</b> | 1               | 0                | <b>1</b>          | <b>0</b>       | <b>1</b>         | <b>0</b>      |
| sw            |                |                    |          | 0               | 1                |                   |                |                  |               |
| beq           |                |                    |          | 0               | 0                |                   |                |                  |               |

RegDst=0 to select Rt

RegWrite=1 to write to the GPR

Branch=0 so we always get  
 $PC = PC + 4$



In sw we use  $GPR[Rs] + \text{sext}(imm)$

Thus, ALUSrc = 1

In sw we must have ALUSOp=00 (add)

MemRead=0 and MemWrite=1

MemToReg=X. We do not write back!

| Instr. | ALU Src | ALUOp [1:0] |   | Mem Read | Mem Write | Mem to Reg | Reg Dst | Reg Write | Branch |
|--------|---------|-------------|---|----------|-----------|------------|---------|-----------|--------|
| R-type | 0       | 1           | 0 | 0        | 0         | 0          | 1       | 1         | 0      |
| lw     | 1       | 0           | 0 | 1        | 0         | 1          | 0       | 1         | 0      |
| sw     | 1       | 0           | 0 | 0        | 1         | X          | X       | 0         | 0      |
| beq    |         |             |   | 0        | 0         |            |         |           |        |

RegDst=X. We do not write back!

RegWrite=0. We do not write back!

Branch=0 so we always get  
 $PC = PC + 4$



| Instr. | ALU Src  | ALUOp [1:0] |          | Mem Read | Mem Write | Mem to Reg | Reg Dst  | Reg Write | Branch   |
|--------|----------|-------------|----------|----------|-----------|------------|----------|-----------|----------|
| R-type | 0        | 1           | 0        | 0        | 0         | 0          | 1        | 1         | 0        |
| lw     | 1        | 0           | 0        | 1        | 0         | 1          | 0        | 1         | 0        |
| sw     | 1        | 0           | 0        | 0        | 1         | X          | X        | 0         | 0        |
| beq    | <b>0</b> | <b>0</b>    | <b>1</b> | 0        | 0         | <b>X</b>   | <b>X</b> | <b>0</b>  | <b>1</b> |

RegDst= X. We do not write back!

RegWrite=0 We do not write back!

Branch=1 so we check Zero to decide whether to branch (PC= branch target) or not to branch (PC=PC+4)



So the final Control Decoder is as follows:

| Instr. | ALU Src | ALUOp [1:0] |   | Mem Read | Mem Write | Mem to Reg | Reg Dst | Reg Write | Branch |
|--------|---------|-------------|---|----------|-----------|------------|---------|-----------|--------|
| R-type | 0       | 1           | 0 | 0        | 0         | 0          | 1       | 1         | 0      |
| lw     | 1       | 0           | 0 | 1        | 0         | 1          | 0       | 1         | 0      |
| sw     | 1       | 0           | 0 | 0        | 1         | X          | X       | 0         | 0      |
| beq    | 0       | 0           | 1 | 0        | 0         | X          | X       | 0         | 1      |

# Control Decoder



| Instr. | ALU<br>Src | ALUOp<br>[1:0] |   | Mem<br>Read | Mem<br>Write | Mem<br>to<br>Reg | Reg<br>Dst | Reg<br>Write | Branch |
|--------|------------|----------------|---|-------------|--------------|------------------|------------|--------------|--------|
| R-type | 0          | 1              | 0 | 0           | 0            | 0                | 1          | 1            | 0      |
| lw     | 1          | 0              | 0 | 1           | 0            | 1                | 0          | 1            | 0      |
| sw     | 1          | 0              | 0 | 0           | 1            | X                | X          | 0            | 0      |
| beq    | 0          | 0              | 1 | 0           | 0            | X                | X          | 0            | 1      |

**סיום השקפים של דני** על אותות הבדיקה ששולטים על ה- datapath

# Single-Cycle Hardwired Control

- As combinational function of  $\text{Inst} = \text{MEM}[\text{PC}]$



- Consider – מילים נזקנויות ס
- All R-type and I-type ALU instructions
- LW and SW
- BEQ, BNE, BLEZ, BGTZ
- J, JR, JAL, JALR

# Putting It All Together



# Single-Bit Control Signals

|          | When De-asserted                                         | When asserted                                            | Equation                                                                     |
|----------|----------------------------------------------------------|----------------------------------------------------------|------------------------------------------------------------------------------|
| RegDest  | GPR write select according to $rt$ , i.e., $inst[20:16]$ | GPR write select according to $rd$ , i.e., $inst[15:11]$ | $opcode == 0$                                                                |
| ALUSrc   | $2^{nd}$ ALU input from $2^{nd}$ GPR read port           | $2^{nd}$ ALU input from sign-extended 16-bit immediate   | $(opcode != 0) \&\& (opcode != BEQ) \&\& (opcode != BNE)$                    |
| MemtoReg | Steer ALU result to GPR write port                       | Steer memory load to GPR Wr. port                        | $opcode == LW$                                                               |
| RegWrite | GPR write disabled                                       | GPR write enabled                                        | $(opcode != SW) \&\& (opcode != Bxx) \&\& (opcode != J) \&\& (opcode != JR)$ |

JAL and JALR require additional RegDest and MemtoReg options

# Single-Bit Control Signals

|                    | When De-asserted                | When asserted                                      | Equation                                                                        |
|--------------------|---------------------------------|----------------------------------------------------|---------------------------------------------------------------------------------|
| MemRead            | Memory read disabled            | Memory read port return load value                 | $\text{opcode} == \text{LW}$                                                    |
| MemWrite           | Memory write disabled           | Memory write enabled                               | $\text{opcode} == \text{SW}$                                                    |
| PCSrc <sub>1</sub> | According to PCSrc <sub>2</sub> | next PC is based on 26-bit immediate jump target   | $(\text{opcode} == \text{J}) \mid \mid (\text{opcode} == \text{JAL}) \mid \mid$ |
| PCSrc <sub>2</sub> | next PC = PC + 4                | next PC is based on 16-bit immediate branch target | $(\text{opcode} == \text{Bxx}) \ \& \ \& \ \text{bcond is satisfied}$           |

# ALU Control

## ■ case opcode

R-type: '0'  $\Rightarrow$  select operation according to funct

'ALUi'  $\Rightarrow$  selection operation according to opcode

'LW'  $\Rightarrow$  select addition

'SW'  $\Rightarrow$  select addition

'Bxx'  $\Rightarrow$  select bcond generation function

other inst.  $\Rightarrow$  don't care

## ■ Example of ALU CMD (operations)

- ADD, SUB, AND, OR, XOR, NOR, etc.
- bcond on equal, not equal, LE zero, GT zero, etc.



### MIPS ALU CMDs

- Here's a simple ALU with five operations, selected by a 3-bit control signal ALU CMD

| ALU CMD | Function |
|---------|----------|
| 000     | and      |
| 001     | or       |
| 010     | add      |
| 110     | subtract |
| 111     | slt      |



# R-Type ALU



# I-Type ALU



# LW





# Branch (Not Taken)

Some control signals are dependent on the processing of data



# Branch (Taken)

Some control signals are dependent on the processing of data



# Jump



# What is in That Control Box?

---

- Combinational Logic → Hardwired Control
  - Idea: Control signals generated combinationally based on instruction
  - Necessary in a single-cycle microarchitecture...
  
- Sequential Logic → Sequential/Microprogrammed Control
  - Idea: A memory structure contains the control signals associated with an instruction
  - Control Store

יוסבר במצגת נפרדת!

# Evaluating the Single-Cycle Microarchitecture

# A Single-Cycle Microarchitecture

---

- Is *this* a good idea/design?
- When is this a good design?
- When is this a bad design?
- How can we design a better microarchitecture?



# A Single-Cycle Microarchitecture: Analysis

---

- Every instruction takes 1 cycle to execute
  - CPI (Cycles per instruction) is strictly 1
- How long each instruction takes is determined by how long the slowest instruction takes to execute
  - Even though many instructions do not need that long to execute
- Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction
  - Critical path of the design is determined by the processing time of the slowest instruction

# What is the Slowest Instruction to Process?

- Let's go back to the basics
- All six phases of the instruction processing cycle take a *single machine clock cycle* to complete
  - Fetch
  - Decode
  - Evaluate Address
  - Fetch Operands
  - Execute
  - Store Result
- Do each of the above phases take the same time (latency) for all instructions?



1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)

סביר שהפקודה האיטית ביותר תזדקק לכל השלבים הללו

# דוגמה - Single-Cycle Datapath Analysis

## זיהוי רק דוגמה –

- memory units (read or write): 200 ps
- ALU and adders: 100 ps
- register file (read or write): 50 ps
- other combinational logic: 0 ps

| steps     | IF  | ID | EX  | MEM | WB | Delay      |
|-----------|-----|----|-----|-----|----|------------|
| resources | mem | RF | ALU | mem | RF |            |
| R-type    | 200 | 50 | 100 |     | 50 | 400        |
| I-type    | 200 | 50 | 100 |     | 50 | 400        |
| LW        | 200 | 50 | 100 | 200 | 50 | 600        |
| SW        | 200 | 50 | 100 | 200 |    | 550        |
| Branch    | 200 | 50 | 100 |     |    | 350        |
| Jump      | 200 |    |     |     |    | 200 $10^4$ |

# Let's Find the Critical Path



# R-Type and I-Type ALU → critical path







# Branch Taken



# Jump



# עד עכשיו עיכוביים רק בגל ה- datapath מה קורה עם עיכוביים הצד של הבקרה?

## What About Control Logic?

- How does that affect the critical path?
- Food for thought for you:
  - Can control logic be on the critical path? (\*)
  - A note on CDC 5600: control store access too long...(\*\*)



(\*) בהחלט כן. ראו לפני 2 שקיים עברו Branch Taken. אותן הבקירה תלויות בנתונים ונוצרות השהיות. דוגמה נוספת: אותן הבקירה הקשורות(cache hit/miss) ב- pipeline נמצאים על הנטיב הקרייטי.

(\*\*) Seymour Cray היה המהנדס הראשי ב-CDC. הם קיצרו את ה-pipeline כדי להתגבר על בעיית זמן הגישה הארון



# What is the Slowest Instruction to Process?

- Memory is not magic
- What if memory *sometimes* takes 100ms to access?
- Does it make sense to have a simple register to register add or jump to take {100ms+all else to do a memory operation}?
- And, what if you need to access memory more than once to process an instruction?
  - Which instructions need this?
  - Do you provide multiple ports to memory?



# Single Cycle μArch: Complexity

---

- Contrived (=מונען)
  - All instructions run as slow as the slowest instruction
- Inefficient
  - All instructions run as slow as the slowest instruction
  - Need to replicate a resource if it is needed more than once by an instruction during different parts of the instruction processing cycle
- Not necessarily the simplest way to implement an ISA
  - Single-cycle implementation of REP MOVS (x86) or INDEX (VAX)?
- Not easy to optimize/improve performance
  - Optimizing the common case does not work (e.g. common instructions)
  - Need to optimize the worst case all the time

# (Micro)architecture Design Principles

- Critical path design (Guy: optimize for performance)
  - Find and decrease the maximum combinational logic delay
  - Break a path into multiple cycles if it takes too long
- Bread and butter (common case) design
  - Spend time and resources on where it matters most
    - i.e., improve what the machine is really designed to do
  - Common case vs. uncommon case
- Balanced design
  - Balance instruction/data flow through hardware components
  - Design to eliminate bottlenecks: balance the hardware for the work

למרות הרצון לאופטימיזציה של  
הנתיב הקרייטי כדי שנוכל  
להעלות את התדר, לא תמיד  
הדבר רצוי בגלל צריכת ההספק



P is proportional to  $f^3$

# Single-Cycle Design vs. Design Principles

---

- Critical path design
- Bread and butter (common case) design
- Balanced design

*How does a single-cycle microarchitecture fare in light of these principles?*

# Aside: System Design Principles

---

- When designing computer systems/architectures, it is important to follow good principles
- Remember: “principled design” from our first lecture(\*)
  - Frank Lloyd Wright: “architecture [...] based upon principle, and not upon precedent”

(\*) השנה דילגנו על פרק לויד רייט.... ראו השקף הבא

Guy: **Patterson and Hennessy design principles:**

- (1) Simplicity favors regularity;
- (2) Make the common case fast;
- (3) Smaller is faster;
- (4) Good design demands good compromises;

# Aside: From Lecture 1

---

- “architecture [...] based upon principle, and not upon precedent”



# Aside: System Design Principles

---

- We will continue to cover key principles in this course
  - Here are some references where you can learn more
- 
- Yale Patt, “[Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution](#),” Proc. of IEEE, 2001. (Levels of transformation, design point, etc)
  - Mike Flynn, “[Very High-Speed Computing Systems](#),” Proc. of IEEE, 1966. (Flynn’s Bottleneck → Balanced design)
  - Gene M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities", AFIPS Conference, April 1967. (Amdahl’s Law → Common-case design)
  - Butler W. Lampson, “[Hints for Computer System Design](#),” ACM Operating Systems Review, 1983.
    - <http://research.microsoft.com/pubs/68221/acrobat.pdf>

# Aside: One Important Principle



- Keep it simple
- “Everything should be made as simple as possible, but no simpler.”
  - Albert Einstein
- And, do not forget: “An engineer is a person who can do for a dime what any fool can do for a dollar.”

הציטוט המדיוק:

“An engineer can do for a dollar  
what any fool can do for two”

| Arthur Mellen Wellington                                                            |                                                                                     |
|-------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
|  |  |
| Born                                                                                | December 20, 1847<br>Waltham, Massachusetts, US                                     |
| Died                                                                                | May 17, 1895 (aged 47)<br>Manhattan, New York, US                                   |
| Resting place                                                                       | Woodlawn Cemetery                                                                   |
| Occupation                                                                          | Civil Engineer                                                                      |
| Spouse(s)                                                                           | Agnes Bates                                                                         |

# Multi-Cycle Microarchitectures