

## • Topics

- Basic processing unit
- ALU operations
- Instruction execution
- Branch instruction
- Multiple bus organization
- Hardwired control
- Generation of control signals,
- Micro-programmed control
- Pipelining: Basic concepts of pipelining
- Performance
- Hazards-Data, Instruction and Control
- Influence on instruction sets.

## • Instruction Execution

1. Instruction Set processor (ISP) : The part CPU that reads n executes instructions
2. CPU : The main part
3. prog execution: progs run as a sequence of instructions
4. Execution : CPU executes

## • fundamental Concepts

1. Instruction fetch : The CPU fetches 1 instruc at a time
2. PC : prog. counter that holds the address of the next instruction
3. IR : holds instruc abt current execution

### Steps to execute :-

- ① Retrieve info in PC & load to IR
- ② PC increments by 4 bytes
- ③ Perform the op
- ④ CPU performs it using data transfer, ALU & mmry access

- Single bus organisation

### Parts

1. ALU
2. Regs - Store
3. digital circuits - gates
4. Internal path - to transfer data
5. Driver Circuits - transmit signal
6. Receiver Circuits - rec from external circuits

### Registers

- PC : Tracks add of next instruc
- MAR : Holds add of memy loc
- MDR : Holds data being transferred to memy
- Specific purpose reg : index / stack reg
- Temp reg : (y, z, temp)

Multiplexer : Select data to send to ALU , chooses b/w a reg's output & a const increment

Data path : refers to ALU , reg & bus together . It is the path data travels thru

- Four Steps of Instruction Execution

### 1. Register Transfer

Signals :

1.  $R_i$  in - Control loading data in  $R_i$  from bus
2.  $R_i$  out - " placing " in bus "  $R_i$

Clock cycles : All processor data transfers happen within defined clock cycles

multiple clock sig also used

data transfer : eg move data  $R_i$  to  $R_j$

$R_i$  set  $R_i$  out = 1  
then  $R_j$  set  $R_j$  in = 1

## 2. Performing ALU op

eg  $R_3 = R_1 + R_2$

S-1 :  $R_1$ 's Output &  $Y$ 's input are enabled to transfer

S-2 : Multiplexer selects signals

S-3 : Contents of  $Z$  in  $R_3$  as result

## 3. Fetching a word from memory

**MFC (Memory Func Completed)** : The processor waits for an MFC signal to indicate the completion of a memory op

eg : MOV (R1), R2

S-1 : Load  $R_1$ 's add to MAR

S-2 : Wait for MFC

S-3 : Then load data to MDR

S-4 : Transfer from MDR to  $R_2$

## 4. Storing a word to memory

eg : Move R2, (R1)

S-1 : Load MAR to  $R_1$ 's add

S-2 : Load MDR with  $R_2$ 's data & wait MFC

## • Branch Instruction

Updates the PC with a new add, directing the flow of the program

Offset  $\alpha$  : The difference b/w the target add & add next inst

eg : PC out, MAR in, Read, Add,  $Z_{ir}$   
 $Z_{out}$ , PC in,  $Y_{in}$   
IR out, Add,  $Z_{ir}$   
 $Z_{out}$ , PC in, EN $\#$

- Multiple Bus

3-bus setup : Uses 3 buses (A, B & C) for efficient data transfer

Regs : Stores all gen purpose regs with C input & A, B output

ALU : The ALU can directly pass one of its inputs (A/B) as specified by control signals

PC : It inc by 4 bytes after loading

| Component | I      | O       |
|-----------|--------|---------|
| PC        | PCin   | PCout   |
| Register  | R1 in  | R1 out  |
| ALU       | A, B4  | R = B   |
| IR        | IRin   | IR out  |
| MDR       | MDRin  | MDR out |
| MAR       | MAR in | MAR out |

### Hardwired Control

- Control Unit Signals

These sig control steps of instruction execution in the processor

i) Microprogrammed Control :-

- uses a seq of binary codes to gen sig
- Adaptable but slower eg (CISC)

ii) Hardwired Control

- They use dedicated circuits to gen sig
- finite state machine
- Fast but not flexible eg (RISC)

Here, Control sig are influenced by :-

- Step Counter : track current step
- IR : Holds active step
- Condition Flags : takes BRANCH decision

#### 4. External Sig : MFC type outside processor

- Each step completed in one clock cycle
- Counter keeps track of it

eg Add (R3), R1



##### Key Points

- PC: Holds the address of the next instruction.
- MAR: Holds the memory address for reading or writing data.
- MDR: Temporarily holds data read from or written to memory.
- IR: Holds the current instruction being executed.
- Y and Z: Temporary registers used for holding intermediate values.

#### • Logic function for Signals

##### 1. Zin

- It triggers for ADD & BRANC instruc
- Active based on Control steps

##### 2. END

- Marks completion of an instruc
- Triggered / Active in ADD & Unconditional branching

##### 3. RUN

- Advances the counter each clock cycle when set to 0
- Micro-programmed Control
  - Microprogrammed control generates control signals using a seq of bits similar to machine lang
  - Each bit indicates specific sig eg PC out at 1 or Pin at 0

- Control Word

eg Add ( $R_3$ ),  $R_1$

- Here Every step in an instruc execution requires generating a specific Control word (CW)
- each CW in this sequence is called a microinstruc
- Each CPU instruc has its own microroutine
- They are stored in the control store
- Basic Organisation of Microprog

- The microprog Counter (MPC) reads CWs from Control store
- When new instruc is loaded into IR
- Add generator provides starting add
- MPC inc with each clock cycle
- Basic microprog. lacks ability to check condition codes / external inputs.
- These microinstruc can branch to diff add based on external condition

- Branching in Microinstruc

1. Microinstruction can perform conditional jumps with microroutines
2. Control units are improved to handle these condtn and branch as needed

- Changes in starting add generator

They consider : IR

External Inputs

Condition codes

- Reducing Microinstruc size

Allocating 1 bit for each control sig would make microinstruc long & inefficient

To optimize : Group mutually exclusive sig  
Use binary coding

## • Pipelining

It's a technique used to improve processor performance by allowing simultaneous execution of multiple instructions

Utilizes faster hardware

Arranging hardware so mul ops can occur simultaneously

### Instruction pipelining

Here, multiple instruc executed in parallel  
So a current instruc doesn't wait for prev to finish

#### 2-stage pipeline

Cycle 1 : Fetch instruc  $I_1$  & store in buffer  $B_1$   
Cycle 2 : " "  $I_2$ , while  $I_1$  in "  $B_1$

This cycle continues

Here there are 2 units : fetch & execution

#### 4-stage pipeline

- 4 stages : 1. fetch  
2. decode  
3. execute  
4. write

Cycle 1 :  $I_1$  fetched

2 :  $I_2$  fetched,  $I_1$  decoded

3 :  $I_3$  " ,  $I_2$  " ,  $I_1$  executed

4 :  $I_n$  " ,  $I_3$  " ,  $I_2$  " ,  $I_1$  written

- Pipeline performance

- pipelining inc CPU instruc throughput
- 4-stage pipelining is 4 times faster

### Factors influencing performance :-

1. Cache miss
2. Hazard detection
3. execution time

### Benefits :-

1. Inc CPU throughput
2. reduces execution time

### Challenges

1. Handling stalls due to cache misses
2. Real-world factors often reduce actual gains
3. Manage sync

### • Hazards

- Hazards are situations that prevent the next instruc in a pipeline from executing on time
- They reduce performance

### • Types

#### 1. Structural

Happens during a resource conflict coz h/w can't handle multiple instruc

#### 2. Data

Occurs when curr step depends on result of prev step

#### 3. Control

Arise from branch causing delay

## • Structural Hazards

- They occur due to conflict in resource
- Single memory for 2 instruc
- Due to this pipeline waits



## • Data Hazards

Occurs when curr step depends on result of prev step

eg Add R<sub>2</sub>, R<sub>3</sub>, R<sub>1</sub> followed by SUB R<sub>4</sub>, R<sub>1</sub>, R<sub>5</sub>

Here SUB needs result of ADD

Stalls are necessary if instruc need data that isn't ready



- Branches

- Unconditional Branch

- Branch instruc can disrupt pipeline by changing PC
- Branch penalty : It's the time lost due to branch instruction

- Instruction Queue

- processors use instruction queues to minimize stalls
- The fetch unit fills queue with instruc
- Due to this pipeline is activi

- Branch Prediction

- It tries to guess whether a branch will be taken or not
- eg. if one branch not taken, then it shld correct mistake to avoid delay

Types :

1. Static : decides before prog runs
2. Dynamic : changes based on prog's history

- Algorithms

A simple prediction has 2 stats

LT (likely to be Taken)  
LNT ( " not "

- 4-state Algorithm

It's more advanced with 4 stats

1. ST : Strong likely to take
2. LT : "
3. LNT : " nor "
4. SNT : strongly not "

eg Initially LNT  
If taken then change to ST else SNT

- Addressing Modes

1. Complex AM

May save prog but cause pipeline stalls & harder for compilers

Adv : reduces no. of instruc  
disadv : causes pipeline stall

2. Simple AM

Faster & Easier to handle in a pipeline

- Conditional Codes

- Conditional code (flags) can limit the flexibility of instruction reordering
- To allow more reordering , it's better if flags are affected by few instruc as possible