

# Lecture 3

R type: (Add, Sub) (rd, rs, rt)

$$A = B + C$$

$s_0 = 16$   
 $s_1 = 17$   
 $s_2 = 18$

<sup>32</sup> add \$5<sub>0</sub>, \$5<sub>1</sub>, \$5<sub>2</sub>;

| OP     | RS    | RT    | RD    | Shamt | funct  |
|--------|-------|-------|-------|-------|--------|
| 0      | 17    | 18    | 16    | 0     | 32     |
| 000000 | 10001 | 10010 | 10000 | 00000 | 100000 |

T type : (*lw*, *sw*, *addi*, *subi*, *beq*, *bne*) : (*rl*, offset(*rs*))

$$\frac{s_0 = 16}{s_0 = 16} \quad s_1 = 17$$

$$A[5] = A[10] + B$$

35  $\downarrow$  des( $nt^2$ )  $\nwarrow$  sourc  
lw \$to  $4 \times 10 (\$50)$

| OP     | RS     | RT   | Offset           |
|--------|--------|------|------------------|
| 35     | 16     | 8    | 40               |
| 100011 | 100000 | 1000 | 0000000000101000 |

add \$to, \$to, \$s1;

45 \$W \$10, 4x5(\$50);  
 source(πx) dest(πs)

| OP     | RS    | RT    | Offset          |
|--------|-------|-------|-----------------|
| 45     | 16    | 8     | 20              |
| 101101 | 10000 | 01000 | 000000000010100 |

## J type:

J Lab 1.100

|                    |                                                      |
|--------------------|------------------------------------------------------|
| OP.<br>2<br>000010 | offset/address<br>100<br>000000000000000000001100100 |
| 6 bit              | 26 bit                                               |

TOPIC NAME :

DAY:

TIME

DATE / /

$$\overset{s_0=16}{\tilde{A}[3]} = \overset{s_0}{\tilde{A}[2]} + \overset{s_1=17}{B} + 5$$

Assembly code:

35  $\text{lw } \$t_0, 2 \times 4(\$s_0)$   
 32  $\text{add } \$t_0, \$t_0, \$s_1$   
 8  $\text{addi } \$t_0, \$t_0, 5$   
 45  $\text{sw } \$t_0, 3 \times 4(\$s_0)$

machine Code:

| OP           | rs          | rt         | offset                |
|--------------|-------------|------------|-----------------------|
| 35<br>100011 | 16<br>10000 | 8<br>01000 | 8<br>0000000000001000 |

  

| OP            | rs         | rt          | rd         | shamt      | funct        |
|---------------|------------|-------------|------------|------------|--------------|
| 30<br>000 000 | 8<br>01000 | 17<br>10001 | 8<br>01000 | 0<br>00000 | 32<br>100000 |

| OP          | rs         | rt         | immediate             |
|-------------|------------|------------|-----------------------|
| 8<br>001000 | 8<br>01000 | 8<br>01000 | 5<br>0000000000000101 |

| OP           | rs          | rt         | offset                 |
|--------------|-------------|------------|------------------------|
| 45<br>101101 | 16<br>10000 | 8<br>01000 | 12<br>0000000000001100 |

Booth Algo:

$$2 \times (-3) \quad 2 \Rightarrow 0010$$

$$\downarrow \quad \downarrow$$

$$M_n \quad M_X \quad -3 \Rightarrow 1101$$

| $M_X$ | $B$                 |
|-------|---------------------|
| 0     | $0 \rightarrow NOP$ |
| 0     | $1 \rightarrow ADD$ |
| 1     | $0 \rightarrow SUB$ |
| 1     | $1 \rightarrow NOP$ |

| iteration | step                                                              | $M_n$ | $P(L)$       | $P(R)$       | $B$    |
|-----------|-------------------------------------------------------------------|-------|--------------|--------------|--------|
| 0         | init <sup>n</sup>                                                 | 0010  | 0000         | 1101         | 0      |
| 1         | 1. $P_0 B = 10$<br>1a. $P(L) = P(L) - M_n$<br>2. $\frac{ARS}{PB}$ |       | 1110<br>1111 | 1101<br>0110 | 0<br>1 |
| 2         | 1. $P_0 B = 01$<br>1a. $P(L) = P(L) + M_n$<br>2. $\frac{ARS}{PB}$ |       | 0001<br>0000 | 0110<br>1011 | 1<br>0 |
| 3.        | 1. $P_0 B = 10$<br>1b. $P(L) = P(L) - M_n$<br>2. $\frac{ARS}{PB}$ |       | 1110<br>1111 | 1011<br>0101 | 0<br>1 |
| 4.        | 1. $P_0 B = 11$<br>1e. NOP<br>2. $\frac{ARS}{PB}$                 |       |              | 1010         | 1      |





## Divide Version 2

$$D_n = 7 \Rightarrow 0111$$

$$D = 2 \Rightarrow 0010$$

| Iteration | Step                                                                                                                                                                       | Quotient | Divisor | Reminder                                         |
|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------|--------------------------------------------------|
| 0         | init                                                                                                                                                                       | 0 000    | 0010    | 0000 0111                                        |
| 1.        | 1. $\overleftarrow{R}$ 1bit<br>2. $R_{(L)} = R_{(L)} - D$<br>3b. $R_{(L)} = R_{(L)} + D$<br>$\overleftarrow{R}$ 1bit $R_0 = 0$<br>$\overleftarrow{Q}_U$ 1bit $Q_{U_0} = 0$ | 0 000    |         | 0000 1110<br>1110 1110<br>0000 1110<br>0001 1100 |
| 2.        | 2. $R_{(L)} = R_{(L)} - D$<br>3b. $R_{(L)} = R_{(L)} + D$<br>$\overleftarrow{R}$ 1bit $R_0 = 0$<br>$\overleftarrow{Q}_U$ 1bit $Q_{U_0} = 0$                                | 0 000    |         | 1111 1100<br>0001 1100<br>0011 1000              |
| 3.        | 2. $R_{(L)} = R_{(L)} - D$<br>3a. $\overleftarrow{R}$ 1bit $R_0 = 0$<br>$\overleftarrow{Q}_U$ 1bit $Q_{U_0} = 1$                                                           | 0 001    |         | 0001 1000<br>0011 0000                           |
| 4.        | 2. $R_{(L)} = R_{(L)} - D$<br>3a. $\overleftarrow{R}$ 1bit $R_0 = 0$<br>$\overleftarrow{Q}_U$ 1bit $Q_{U_0} = 1$                                                           | 0 011    |         | 0001 0000<br>0010 0000                           |
|           | $\overrightarrow{R}_{(L)}$ 1 bit                                                                                                                                           |          |         | 0001 0000                                        |

Dataflow diagram:



Hardware:



Divide Version 3 :

$$D_n = 7 \Rightarrow 0111$$

$$D = 2 \Rightarrow 0010$$

| Iteration | Step                                                                                                               | Divisor | Remainder                                        |
|-----------|--------------------------------------------------------------------------------------------------------------------|---------|--------------------------------------------------|
| 0         | init                                                                                                               | 0010    | 0000 0111                                        |
| 1.        | 1. $\overleftarrow{R}$ 1bit<br>2. $R(L) = R(L) - D$<br>3b. $R(L) = R(L) + D$<br>$\overleftarrow{R}$ 1bit $R_o = 0$ |         | 0000 1110<br>1110 1110<br>0000 1110<br>0001 1100 |
| 2.        | 2. $R(L) = R(L) - D$<br>3b. $R(L) = R(L) + D$<br>$\overleftarrow{R}$ 1bit $R_o = 0$                                |         | 1111 1100<br>0001 1100<br>0011 1000              |
| 3         | 2. $R(L) = R(L) - D$<br>3a. $\overleftarrow{R}$ 1bit $R_o = 1$                                                     |         | 0001 1000<br>0011 0001                           |
| 4.        | 2. $R(L) = R(L) - D$<br>3a. $\overleftarrow{R}$ 1bit $R_o = 1$                                                     |         | 0001 0001<br>0010 0011                           |
|           | $\overrightarrow{R}(L)$                                                                                            |         | 0001 0011                                        |

Dataflow diagram:



Hardware:



Read → lw  
Write → sw

TOPIC NAME:

Data path

DAY:

TIME:

DATE:

R-format Instructions:



I-format Instructions: (Load word)



Both R, I type



In MUX:  $R \rightarrow 0$

$I \rightarrow 1$

$\text{MemRead} = 1 \Rightarrow \text{lw}$

$\text{MemWrite} = 1 \Rightarrow \text{sw}$

### Hazards:

Structural hazards: different instructions, at different stages, in the pipeline want to use the same hardware resource.

Control hazards: succeeding instruction, to enter pipeline, depends on the outcome of a previous branch instruction, already in pipeline.

Data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline.

$$CPI = \text{total clock cycle} / \text{total \# of instructions}$$

$$\text{speed up} = \frac{\text{nonpipeline time}}{\text{pipeline time}}$$

$$\text{Efficiency} = \frac{\text{total block}}{\text{used block}}$$

Nonpipeline latency = sum of time of instruction

Pipeline latency = Max time of a instruction<sup>(stage)</sup> \* # of instruction<sup>(stage)</sup>

Non pipeline time:

latency \* # of instruction

Pipeline time:

latency + (Max time of instruction \* (# of instruction - 1))

$$T = \frac{N}{P} + \frac{S}{P} \cdot \text{Latency}$$

Latency = 100ns, Number of instructions = 1000ns  
Number of pipeline stages = 4

Latency = 100ns, Number of instructions = 1000ns

Latency = 100ns, Number of instructions = 1000ns

Latency = 100ns, Number of instructions = 1000ns

### Mapping functions:

1. Direct mapped: one unique cache location for each memory block.

$$\text{cache block address} = (\text{requested address}) \% (\text{cache size})$$

2. Set Associative: each memory block can place in a unique set of cache location

$$\# \text{ of set} = \frac{\text{cache size}}{\# \text{ of way}} = \frac{8}{2} = 4$$

$$\text{cache block address} = (\text{requested address}) \% (\# \text{ of set})$$

3. Fully Associative: each memory block can place anywhere in cache

Miss  $\Rightarrow$  Data not in the cache, add the data.

Hit  $\Rightarrow$  Data present in the cache

For overwrite use LRU

12, 8, 25, 3, 8, 12, 11

For 11 overwrite it over 25

Mapping function:

12, 8, 25, 3, 8, 12, 11

For a cache with four 1-word blocks

1. Direct mapped

4 block in cache

|                              |         |                   |
|------------------------------|---------|-------------------|
| MEM[12]<br>MEM[8]<br>MEM[12] | MEM[25] | MEM[11]<br>MEM[3] |
| 0                            | 1       | 2                 |

6 Misses

2. Set Associative (2-way):

$$\# \text{ of set} = \frac{\text{Cache size}}{\# \text{ of way}} = \frac{4}{2} = 2$$

| 0        |        | 1                  |        |
|----------|--------|--------------------|--------|
| MEM[12]  | MEM[8] | MEM[11]<br>MEM[25] | MEM[3] |
| 0        | 1      | 0                  | 1      |
| 5 Misses |        |                    |        |

3. Fully Associative:

|         |        |                    |        |
|---------|--------|--------------------|--------|
|         |        | MEM[11]<br>MEM[25] | MEM[3] |
| MEM[12] | MEM[8] |                    |        |

$12 \rightarrow 0$  Miss  
 $8 \rightarrow 0$  Miss  
 $25 \rightarrow 1$  Miss  
 $3 \rightarrow 3$  Miss  
 $8 \rightarrow 0$  Hit  
 $12 \rightarrow 0$  Miss  
 $11 \rightarrow 3$  Miss

$12 \rightarrow 0$  Miss  
 $8 \rightarrow 0$  Miss  
 $25 \rightarrow 1$  Miss  
 $3 \rightarrow 1$  Miss  
 $8 \rightarrow 0$  Hit  
 $12 \rightarrow 0$  Hit  
 $11 \rightarrow 1$  Miss

$12 \rightarrow$  Miss  
 $8 \rightarrow$  Miss  
 $25 \rightarrow$  Miss  
 $3 \rightarrow$  Miss  
 $8 \rightarrow$  Hit  
 $12 \rightarrow$  Hit  
 $11 \rightarrow$  Miss