

## \* Chapter 1:-

- ✓ Moore's prediction
- ✓ Computers are pervasive

### ✓ IC cost:-

$$\begin{aligned} \cdot \text{Dies per wafer} &= \frac{\text{Wafer area}}{\text{Die area}} \\ \cdot \text{Cost per die} &= \frac{\text{Cost per wafer}}{\text{Die per wafer} \times \text{Yield}} \quad \text{Proportion of working dies} \\ \cdot \text{Yield} &= \frac{1}{1 + \left( \frac{\text{Defects per area} \times \text{Die area}}{2} \right)^2} \end{aligned}$$

### ✓ Judge performance on the basis of Throughput (efficiency)

$$\checkmark \text{Performance} = \frac{1}{\text{Execution time}}$$

$$\checkmark n = \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A}$$

### ✓ CPU time = Burst time

### ✓ Elapsed time = CPU time + Others (idle, context switch, ...)

### ✓ CPU clock



✓ clock speed/rate

$$\checkmark \text{Clock cycle time} = \frac{1}{\text{clock rate}}$$

$$\checkmark \text{CPU time} = \text{No. of cycles required} \times \text{Clock cycle time} = \frac{\text{No of cycles required}}{\text{Clock rate}}$$

## ✓ CPI (Cost per instruction $\rightarrow$ R, I, J)

✓ Clock cycles required = Instruction Count  $\times$  CPI

✓ Avg CPI from an instruction mix

✓ Power = Capacitive load  $\times$  Voltage<sup>2</sup>  $\times$  Frequency

✓ Amdahl's Law:- Parallel Processing.

There will always be some series part along with parallel part.

$$\therefore T_{\text{new}} = \frac{T_{\text{parallel}}}{(\text{no. of cores})} + T_{\text{series}}$$

Improvement factor

$T_{\text{affected}} \rightarrow T_{\text{unaffected}}$

## W Chapter 2:-

- Addition, subtraction

add rd, rs1, rs2

Ex:- add a, b, c //  $a = b + c$

- Word = 32 bits

- 32 registers (64bit each)

- x0: the constant value 0
- x1: return address
- x2: stack pointer
- x3: global pointer
- x4: thread pointer
- x5 – x7, x28 – x31: temporaries
- x8: frame pointer
- x9, x18 – x27: saved registers
- x10 – x11: function arguments/results
- x12 – x17: function arguments



$f = (g + h) - (i + j); \quad \left. \begin{array}{l} \text{add } x5, x20, x21 \\ \text{add } x6, x22, x23 \\ \text{sub } x19, x5, x6 \end{array} \right\} f, \dots, j \text{ in } x19, x20, \dots, x23$

• Loading data from RAM :-  $ld \quad rd, \text{offset(Base)}$

• Storing data to RAM :-  $sd \quad rs, \text{offset(Base)}$

•  $A[12] = h + A[8]; \quad \left. \begin{array}{l} \text{ld } x5, 64(x22) \\ \text{add } x5, x21, x5 \\ \text{sd } x5, 96(x22) \end{array} \right\} h \text{ in } x21, \text{ base address of } A \text{ in } x22$

• Adding constants:-  $\text{addi } rd, rs, \underbrace{\text{const}}_{\text{immediate}}$

• R-format instruction:- (Arithmetic instruction)

| funct7 | rs2    | rs1    | funct3 | rd     | opcode |
|--------|--------|--------|--------|--------|--------|
| 7 bits | 5 bits | 5 bits | 3 bits | 5 bits | 7 bits |

• I-format instruction:- (constant operation, load)

| immediate | rs1    | funct3 | rd     | opcode |
|-----------|--------|--------|--------|--------|
| 12 bits   | 5 bits | 3 bits | 5 bits | 7 bits |

- S-format instruction :- (stone)

| imm[11:5] | rs2    | rs1    | funct3 | imm[4:0] | opcode |
|-----------|--------|--------|--------|----------|--------|
| 7 bits    | 5 bits | 5 bits | 3 bits | 5 bits   | 7 bits |

- Shift operation :-

| funct6 | immed  | rs1    | funct3 | rd     | opcode |
|--------|--------|--------|--------|--------|--------|
| 6 bits | 6 bits | 5 bits | 3 bits | 5 bits | 7 bits |

- Logical operations :-

#### ■ Instructions for bitwise manipulation

| Operation      | C  | Java | RISC-V     |
|----------------|----|------|------------|
| Shift left     | << | <<   | slli       |
| Shift right    | >> | >>>  | srli       |
| Bit-by-bit AND | &  | &    | and, andi  |
| Bit-by-bit OR  |    |      | or, ori    |
| Bit-by-bit XOR | ^  | ^    | xor, xorri |
| Bit-by-bit NOT | ~  | ~    |            |

} R-format  
→ NOT (all 1).  $1 \oplus 0 = 1$   
 $1 \oplus 1 = 0$

- Conditional / Branching :-

if  $x_5 = x_6$ :  
 $a += 1$   
else:  
 $b += 1$

bne  $x_5, x_6, \text{Else}$   
addi  $x_5, x_5, 1$   
beq  $x_0, x_0, \text{End}$   
Else:  
addi  $x_6, x_6, 1$

Endi.....

## • Looping:-

while (save[i] == k) i += 1;  
■ i in x22, k in x24, address of save in x25

Loop: sll i x5, x22, 3 //  $i \times 8$ ,  $\geq^3 = 8$   
add x5, x5, x25 // save +  $i \times 8$   
ld x9, 0(x5)  
bne x9, x24, Exit  
addi x22, x22, 1  
beav x0, x0, Loop

Exit: .....

## • More conditions:-

- blt (<)
- bge ( $\geq$ )

• Procedure call: jal x1, procedureLabel1  
" return: jalr x0, 0(x1)

## Leaf Procedure Example

### C code:

```
long long int leaf_example (  
    long long int g, long long int h,  
    long long int i, long long int j) {  
    long long int f;
```

### RISC-V code:

leaf\_example:

```
addi sp,sp,-24  
sd x5,16(sp)  
sd x6,8(sp)  
sd x20,0(sp)  
add x5,x10,x11
```

Save x5, x6, x20 on stack  
 $x5 = a + h$



```

long long int g, long long int h,
long long int i, long long int j) {
    long long int f;
    f = (g + h) - (i + j);
    return f;
}



- Arguments g, ..., j in x10, ..., x13
- f in x20
- temporaries x5, x6
- Need to save x5, x6, x20 on stack

```

```

sd x6,8(sp)
sd x20,0(sp)
add x5,x10,x11
add x6,x12,x1
sub x20,x5,x6
addi x10,x20,0
ld x20,0(sp)
ld x6,8(sp)
ld x5,16(sp)
addi sp,sp,24
jalr x0,0(x1)

```

x5 = g + h  
x6 = i + j  
f = x5 - x6  
copy f to return register  
Resore x5, x6, x20 from stack  
Return to caller

## Non-Leaf Procedure Example

### C code:

```

long long int fact (long long int n)
{
    if (n < 1) return 1
    else return n * fact(n - 1);
}

```

### fact:

|                     |                                          |
|---------------------|------------------------------------------|
| addi sp,sp,-16      | Save return address and n on stack       |
| sd x1,8(sp)         |                                          |
| sd x10,0(sp)        |                                          |
| addi x5,x10,-1      | x5 = n - 1                               |
| bge x5,x0,L1        | if n >= 1, go to L1                      |
| addi x10,x0,1       | Else, set return value to 1              |
| addi sp,sp,16       | Pop stack, don't bother restoring values |
| jalr x0,0(x1)       | Return                                   |
| L1: addi x10,x10,-1 | n = n - 1                                |
| jal x1,fact         | call fact(n-1)                           |
| addi x6,x10,0       | move result of fact(n - 1) to x6         |
| ld x10,0(sp)        | Restore caller's n                       |
| ld x1,8(sp)         | Restore caller's return address          |
| addi sp,sp,16       | Pop stack                                |
| mul x10,x10,x6      | return n * fact(n-1)                     |
| jalr x0,0(x1)       | return                                   |

### fact:

```

addi sp,sp,-16
sd x1,8(sp)           // return addr
sd x10,0(sp)          // n
addi x5,x0,1           // x5=1

```

bge x10, x5, Else // n ≥ 1

addi x10,x0,1 // return x10=1  
addi sp,sp,16  
jalr x0,0(x1)

### Else:

```

addi x10,x10,-1       // n = n - 1
jal x1,fact
addi x6,x10,0          // x6 = fact(n-1)

```

ld x10,0(sp) // x10 = caller's n

ld x1,8(sp) // caller's return address

addi sp,sp,16

mul x10,x10,x6 // n \* fact(n-1)

jalr x0,0(x1)