

CSE340  
Chapter-4

4 Ishmael the  
cat

CPU time = Instruction count  $\times$  CPI  $\times$   
clock cycle time.

Two types of micro-V implementation:

- ① A simplified version (slow, too much time wasted)
- ② A more realistic pipelined version (optimized)

Instruction 3 category → Main memory associated

- ① memory reference : ld, sd.
- ② Arithmetic/logical : add, sub, and, or
- ③ Control transfer : beq.  
→ Branching.

## Instruction Execution.

PC → program counter register

→ Contains the address of the instruction in the program being executed.



# 8000 → add X<sub>20</sub>, X<sub>21</sub>, X<sub>0</sub>.

Fetch → Decode (source, destination).

|                                                    |                                              |                                              |                                        |                                                                                               |
|----------------------------------------------------|----------------------------------------------|----------------------------------------------|----------------------------------------|-----------------------------------------------------------------------------------------------|
| $\overline{\text{XXXX XXXO}}$<br>$\overline{f_2.}$ | $\overline{0101}$<br>$\overline{\text{rs2}}$ | $\overline{1010}$<br>$\overline{\text{rs4}}$ | $\overline{0XXX}$<br>$\overline{f_3.}$ | $\overline{0011}$<br>$\overline{0XXX}$<br>$\overline{\text{XXXX}}$<br>$\overline{\text{op.}}$ |
|----------------------------------------------------|----------------------------------------------|----------------------------------------------|----------------------------------------|-----------------------------------------------------------------------------------------------|

# Explanation of register file.



Reg write.

.....  
add  $x_{20}, x_{21}$ ,  $x_{20}$



## Explanation of ALU.



## Data memory & Immediate Generation unit



a. Data Memory Unit.



b. immediate gene. unit



## R-type format.

- Read two register operand
- Perform logical operation.
- write register result.

add  $X_{21}, X_{22}, X_{23}$ .



## Load/Store instruction

- Read register operands.
- Calculate address using 12-bit operand
  - \* USE ALU but sign extended oper. offset.
- Load: Read memory and update register
- Store: Write register value for memory.



LD  $x_{21}$ , 16 ( $x_{22}$ )  
SD  $x_{21}$ , 16 ( $x_{23}$ )

## Branch Instruction (BEQ).



Control unit

ALU cont. unit



ALU OP → 00 → ld / store / addi

01 → Bea

10 → R-type.

## ALU Control



অব্যাক হোলা 64 bit টি জ্বান মনে, তখন দুটি bit change হবে

## Datapath for Instruction Fetch.

### Building Datapaths.

#### Instruction Fetch:

- o Access the program counter.
- o Pass the address/location to 'inst mem.'
- o 'Ins' memory outputs the respective instruction.
- o Increment the PC.

Slow  
Ishmaan

## Datapath - R-type



## load instruction.

1d  $\times 21$



## Store Instruction

Lulinox®  
L'irriconoscibile INN

Azonox®  
Irriconoscibile USP



## Instruction







# Pipelining Analogy

Pipelined laundry:

Instruction :

- (I) Wash. (W)
- (II) dry. (D)
- (III) Fold. (F)
- (IV) store (S).

| Time | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 1 |   |   |
|------|---|---|---|---|----|----|----|---|---|---|
| A    | W | D | F | S |    |    |    |   |   |   |
| B    |   |   | W | D | F  | S  |    |   |   |   |
| C    |   |   |   |   | W  | D  | F  | S |   |   |
| D    |   |   |   |   |    |    | W  | D | F | S |
| Pipe |   |   |   |   |    |    |    |   |   |   |
| A    | W | D | F | S |    |    |    |   |   |   |
| B    |   | W | D | F | S  |    |    |   |   |   |
| C    |   |   | W | D | F  | S  |    |   |   |   |
| D    |   |   |   | W | D  | F  | S  |   |   |   |

## Clock period calc. for single cycle 8 pipeline.

(I) Instruction fetch  $\rightarrow$  10 ps

(II) Register file read/write  $\rightarrow$  5 ps.

(III) Memory read / write  $\rightarrow$  6 ps.

(IV) ALU operation  $\rightarrow$  10 ps.



$$\text{Add} \rightarrow \text{IF} + \text{RR} + \text{ALU} + \text{R.W} = 10 + 5 + 10 + 5 = 30 \text{ ps}$$

$$\text{SD} \rightarrow \text{IF} + \text{R.R} + \text{ALU} + \text{M.W} = 10 + 5 + 10 + 6$$

$$\text{LD} \rightarrow \text{IF} + \text{RR} + \text{ALU} + \text{MR} + \text{R.W} = 31 \text{ ps}$$

$$\text{ld } x_{21}, 34(x_{20}) \rightarrow \text{IF} + \text{RR} + \text{ALU} + \text{MR} + \text{R.W} = 10 + 5 + 10 + 6 + 5 = 36 \text{ ps.}$$

\* single cycle datapath  $\rightarrow$  instruction for run  
ব্যবহার করে সর্বোচ্চ সময় লাভ করে একটি ক্লক পরিয়ে.

Clock period 36 ps.

Pipelined  $\xrightarrow{\text{IF} \rightarrow \text{ID}}$

$$\text{IF} = 5$$

and

$$\text{ID} = 10$$

or

$$\text{EX} = 11$$

Add

$$\text{Mem} = 10$$

LD.

$$\text{WB} = 5$$

Clock period.

## Clock period calc. for single cycle 8 pipeline.

① Instruction fetch  $\rightarrow 10\text{ ps}$

(II) Register file read/write  $\rightarrow 5\text{ ps}$ .

(III) Memory read / write  $\rightarrow 6\text{ ps}$ .

(IV) ALU operation  $\rightarrow 10\text{ ps}$ .

$$\text{Add} \rightarrow \text{IF} + \text{RR} + \text{ALU} + \text{R.W} = 10 + 5 + 10 + 5 \\ = 30\text{ ps}$$

$$\text{SD} \rightarrow \text{IF} + \text{R.R} + \text{ALU} + \text{M.W} = 10 + 5 + 10 + 6 \\ = 31\text{ ps}$$

$$\text{LD} \rightarrow \text{IF} + \text{RR} + \text{ALU} + \text{MR} + \text{R.W} = \\ = 10 + 5 + 10 + 6 + 5$$

$$(\text{Id } x_{21}, 34(x_{20})) \\ = 36\text{ ps.}$$

\* single cycle datapath  $\Rightarrow$  instruction for run  
এবং highest time কাজের একটি হচ্ছে clock period.

Clock period 36ps.

Pipelined  $\xrightarrow{\text{IF, ID}}$

$$\text{IF} = 5$$

and

$$\text{ID} = 10$$

or

$$\text{Ex} = 11$$

Add

$$\text{Mem} = 10$$

LD.

$$\text{WB} = 5$$

Clock period.

## Structural Hazard

|   | cc1 | cc2 | cc3 | cc4 | cc5 | cc6 | cc7 | cc8 | cc9 | cc10                  |
|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----------------------|
| ① | IF  | ID  | Ex  | MEM | WB  |     |     |     |     |                       |
| ② |     | IF  | ID  | Ex  | MEM | WB  |     |     |     |                       |
| ③ |     |     | IF  | ID  | Ex  | MEM | WB  |     |     |                       |
| ④ |     |     |     | IF  | ID  | Ex  | MEM | WB  |     |                       |
| ⑤ |     |     |     |     | IF  | ID  | Ex  | MEM | WB  |                       |
|   |     |     |     |     |     |     |     |     |     | ↑                     |
|   |     |     |     |     |     |     |     |     |     | No structural hazard. |
|   |     |     |     |     |     |     |     |     |     |                       |
|   |     |     |     |     | IF  | ID  | Ex  | MEM | WB  |                       |
|   |     |     |     |     |     |     |     |     |     | ↑ Structural hazard   |

## Data Hazard

Add  $x_{21}, x_{22}, x_{23}$ .

Sub  $x_{24}, x_{21}, x_{20}$



⚠ Data Hazard.

## Resolving Data Hazard using stalling.

Id  $x_{20}, 40(x_{21})$

Add  $x_{22}, x_{20}, x_{21}$

sub  $x_{22}, x_{22}, x_{20}$

Add  $x_{22}, x_{22}, x_{21}$

slli  $x_{22}, x_{22}, 5$

srl  $x_{25}, x_{25}, 8$  → Independent



|     | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|-----|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|
| IF  | X |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |
| ID  |   | X | X | X | X | X | X | X | X | X  | X  | X  | X  | X  | X  | X  | X  |
| Ex  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |
| Mem |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |
| WB  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |

IF ID EX MEM WB (Because WB updates in first half  
and read in the second half).

Lorix® Plus

Sertacon®

Total 18 block cycle.

CPI = Clock cycle per instruction

$$= \frac{18}{6} = 3$$

→ 6 or instruction

## Resolving Data Hazard Using Forwarding

### Forwarding (Alka Bypassing)

- ① Use result when it is computed.
- ②
  - ③ Don't wait for it to be stored in a register
  - ④ Requires extra connection

R-type,      stall = 8  
Forward = 6

add  $x_1, x_2, x_3$ .



Sub  $x_4, \textcircled{x}_1, x_5$





So, only ~~stall~~ forwarding isn't enough.  
we need ~~=~~ stall + forwarding.

### Code scheduling

Rearranging the code sequence to overcome data hazard.

How? → If one line is independent  
 ↳ ~~JMP~~ stall ~~JNE~~ ~~CMP~~  
 ↳ ~~JNE~~ ~~CMP~~.