

## \* Fetch - Decode - Execute Cycle

↳ Fetch current instruction, execute it & fetch the next instruction and continue

## \* Hard Disc → is not memory

→ I/O device

input output

## \* Cache → 3 levels (Multilevel cache CPU)

Level 1 Cache → 2 MB  
(approx)

Level 2 Cache → 6 MB  
(<sup>11</sup>)

Level 3 Cache → 10 MB  
(<sup>11</sup>)



## \* Main Memory = DRAM

Secondary Memory = Floppy, Hard Disk, CD

Tertiary Memory = Tape, RAID

Cache " " = SRAM

## \* MIPS → unless otherwise mentioned, variables are always integers & signed.

→ comments start with #

→ Name of a register starts with '\$'

\* Processor → Memory : LOAD

\* Memory → Processor : STORE

Only operations  
in MIPS involving Memory

## \* Register numbers

$\$t_0 - \$t_7 \rightarrow \text{reg's } 8 - 15$

$\$t_8 - \$t_9 \rightarrow \text{reg's } 24 - 25$

$\$s_0 - \$s_7 \rightarrow \text{reg's } 16 - 23$

\*  $\$zero \rightarrow$  we can't ~~store~~ store anything  
↓ in it.

'0' is stored in it by default

\* Barrel shifter  $\rightarrow$  shift any no. of bits in  
registers in 'i' clock pulse

\* LOOP: ← Address B

Address A ← bne  $\$t_1, \$zero, \text{LOOP}$

In the encoding of this instruction,

bne  $\rightarrow$  6 bits

$\$t_1 \rightarrow$  5 bits

$\$zero \rightarrow$  5 bits

16 bits gone

⇒ Loop's address can't be  
stored in remaining 16 bits

Calcs displacement

$\therefore A + 4 - B$  is stored in the  
encoding

\*  $=, \neq$  are faster than  $<, >, \leq, \geq$  because to check if two numbers are equal, XOR them bitwise. To check if we got all 0's, ~~NOR~~ all the bits to see '1'.

\* `slt $rd, $rs, $rt`

If  $\$rs < \$rt \Rightarrow \$rd = 1$   
else  $\Rightarrow \$rd = 0$

\* procedure calling

$\$a0 - \$a3 \rightarrow$  arguments

$\$v0, \$v1 \rightarrow$  return values

$\$t0 - \$t9 \rightarrow$  temporaries

$\$s0 - \$s7 \rightarrow$  saved <sup>into memory</sup> (similar to static variables)

called callee saved registers

that which is called

Callee saves some values into memory. Before returning to the caller, it restores those values in these registers. [i.e. they are needed for caller & they're passed from caller]

$\$sp \rightarrow$  stack pointer

memory works like a stack which grows downward.

~~→ leaf procedure → no need of stack pointer~~



\* \$ra → return address

[address of the instruction in  
the caller after the call to  
callee].

main()

jal \$2  
add \$t1, \$t2, \$t3

\$2  
jr \$ra

\$SP → points to the last  
element inserted  
in stack.



\*



\* we cannot load a 32-bit constant in  
one instruction since the instruction  
itself is of 32 bits.

~~stop~~

\* LUI \$50, G1



\* ori \$50, \$50, 2304

decimal number

bitwise or of \$50, with (2304)<sub>2</sub> and store the result in \$50

\* ori \$50, \$51, 0x2121

0x is written to tell the assembler, it is hex notation.

\* MIPS → belongs to RISC Architecture

Reduced Instruction Set Computers

\* Response time (or) Latency

→ After start, the time after which results appear

\* throughput

→ Total work done per unit time

\* Instructions per Clock Cycle (IPC)

→ The average # of instructions executed per clock cycle.

For processors with a single execution unit,

$$|IPC|_{max} = 1$$

In practice, IPC is always < 1

$$CPI = \frac{1}{IPC}$$

CPI |<sub>ideal</sub> = 1,

Clock cycles Per Instruction. CPI |<sub>actual</sub> > 1

→ Goal of computer architecture, → IPC ↑ ⇔ CPI ↓

\* ~~I~~ CCT  $\leftarrow$  clock cycle Time  
IC  $\leftarrow$  no. of instructions in the program

Then, CPU-time = ~~IC  $\times$  CCT  $\times$  CPI~~  
 $= IC \times CPI \times CCT$

\* MIPS  $\rightarrow$  another MIPS  
(Millions of Instructions per Second)  
 $= \frac{\# \text{ of instructions executed}}{\text{Execution time of CPU}} \cdot \frac{1}{\text{Ex/CPU}} \times 10^6$

not used  
Since even if both instructions & Ex/CPU increases MIPS remains same

\*  $f \rightarrow$  clock

\* Amdahl's Law ("Law of Diminishing Returns"):

Time taken to execute a program using one CPU:  $T_1$

" "

'n' CPUs:  $T_n$

Then, speed up =  $\frac{T_1}{T_n}$

For one processor situation, let

Time taken to execute parallelizable part:  $P$

They can be run in parallel

non-parallelizable part:  $S$

$$* S+P = 1 \text{ (normalized)}$$

$$\therefore T_1 = P+S = 1$$

$$T_n = \frac{P}{n} + S$$

$$\therefore (\text{Speed up})_n = \frac{1}{\frac{P}{n} + S} = \frac{1}{S + \frac{P}{n}}$$

when  
 $P+S=1$   
(normalised)

$$\text{If } n \rightarrow \infty, \text{ Speedup}_n = \frac{1}{S}$$



\* SATA (Serial) Advanced Technology Attachment



↳ a fuzzy guide to the controller's instruction set in Hard disk

\* SCSI → Small Computer System Info.  
pronounced as  
skuzzy (or) Iscaze

\* Binary (Integer) Addition:



| $xy$ | $x+y$ | $x \oplus y$ |
|------|-------|--------------|
| 00   | 0     | 0            |
| 01   | 1     | 1            |
| 10   | 1     | 1            |
| 11   | 1     | 0            |

If we use  $Cout = xy + Cin(x \oplus y)$ ,



### \* FA with MUXes

→ MUXes can be implemented very efficiently using pass transistor logic (CMOS circuits).

$$\text{Sum} = (xy)Cin + (x\bar{y})\bar{Cin} + (\bar{x}y)\bar{Cin} + (\bar{x}\bar{y})Cin$$

$\downarrow$   
 $S_1, S_0$

optional for non-pass transistor style.



$$\text{Carry} = xy + cin(x \oplus y)$$

$$= xy \cdot 1 + Cin(x\bar{y} + \bar{x}y) + \bar{x}\bar{y} \cdot 0$$





### \* Overflow bits

→ For 2's complement number addition, usually  
Cout is discarded. However, Cout is useful in detecting overflows.

### \* Overflow

→ Two numbers of same sign are added but the result is of opposite sign.

$$\begin{array}{r}
 \text{A} \quad \begin{matrix} x_0 \\ 0x \end{matrix} \quad \begin{matrix} x_1 \\ x \end{matrix} \quad \begin{matrix} x_2 \\ x \end{matrix} \quad \dots \quad \begin{matrix} x_{n-1} \\ x \end{matrix} \quad \begin{matrix} x_n \\ x \end{matrix} \\
 \text{B} \quad \begin{matrix} x_0 \\ 0x \end{matrix} \quad \begin{matrix} x_1 \\ x \end{matrix} \quad \begin{matrix} x_2 \\ x \end{matrix} \quad \dots \quad \begin{matrix} x_{n-1} \\ x \end{matrix} \quad \begin{matrix} x_n \\ x \end{matrix} \\
 \hline
 \text{(1)} \quad \dots \quad \dots \quad \dots \quad \dots \quad \dots
 \end{array}$$

$\rightarrow OV \stackrel{\text{def}}{=} x_{n-1}y_{n-1}s_{n-1} + \bar{x}_{n-1}\bar{y}_{n-1}s_{n-1}$

$= C_{n-1} \oplus C_n$

Homework

### \* Carry Lookahead Adder (CLA):

→ Idea: Unroll  $C_i$  recurrence relation to remove dependency on  $C_1, C_2, \dots, C_{i-1}$

→ ideally, leads to  $O(1)$  delay addition of two n-bit numbers.

But not feasible practically.

→ carry generate:  $g_i \stackrel{\text{def}}{=} x_i y_i$

$g_i = 1 \Rightarrow i\text{-th stage generated a carry of its own.}$

→ carry propagate:  $P_i = x_i \oplus y_i$

$P_i = 1 \Rightarrow i^{\text{th}}$ -stage will propagate the i/p carry  $C_i$  to  $(i+1)$ -th stage.

$$C_{i+1} = g_i + C_i P_i$$

$$* C_1 = g_1 + P_1 C_0$$

$$C_2 = g_2 + P_2(g_1 + P_1 C_0)$$

$$C_4 = g_4 + P_4 [g_3 + P_3 \{g_2 + P_2(g_1 + P_1 C_0)\}]$$

\* carry absorb,  $a_i = \bar{x}_i \bar{y}_i$

## \* Carry-save Adders:

→ Goal → assuming I have a fast carry lookahead adder with logarithmic delay, which I call "Logarithmic carry (look ahead) Adder" (LCA)

gives  $O(\lg n)$  delay for  $n$ -bit addition.

→ I will add  $m$  numbers of  $n$ -bit in time  $\underline{O(m\lg n)}$ .



$x: 1 \ 2 \ 3 \ 4 \ 5$

$y: 3 \ 8 \ 1 \ 7 \ 2$

$$\begin{array}{r} + \\ z: 2 \ 0 \ 5 \ 8 \ 7 \\ \hline s: 6 \ 0 \ 9 \ 9 \ 4 \end{array}$$

not  
considering  
carry

$x: 1 \ 2 \ 3 \ 4 \ 5$

$y: 3 \ 8 \ 1 \ 7 \ 2$

$$\begin{array}{r} + \\ z: 2 \ 0 \ 5 \ 8 \ 7 \\ \hline c: 1 \ 0 \ 1 \ 1 \ 0 \leftarrow c_{in} \end{array}$$

$\therefore \text{sum} = s+c.$

$$s_i = (x_i + y_i + z_i) \% 10$$

$$c_{i+1} = (x_i + y_i + z_i)/10$$

$\left[ \frac{1}{2}, \frac{1}{2} \right]$   
for binary

\*  $\{c, s\}$  generation of 3 binary numbers with any no. of bits can be done in

$O(1)$  time  
delay of one full adder.

\*  $c+s = \text{sum}$  can happen in  $O(\lg n)$  time where 'n' is the no. of bits in each number.

⇒ 3 numbers, each 'n' bit long can be added in  $O(\lg n)$  time.



\* One way of implementation :-



\* Another way:-





### \* Unsigned Multiplication

$$X = x_{n-1}x_{n-2} \dots x_0 = \sum_{j=0}^{n-1} x_j 2^j$$

$$Y = y_{n-1}y_{n-2} \dots y_0 = \sum_{j=0}^{n-1} y_j 2^j$$

$x_j, y_j \in \{0, 1\}$

$$P = X * Y$$

$$= \left( \sum_{i=0}^{n-1} 2^i x_i \right) Y = \sum_{i=0}^{n-1} \left[ \sum_{j=0}^{n-1} x_i y_j 2^j \right] 2^i$$

$y_{n-1} y_{n-2} \dots y_0$

$x_{n-1} x_{n-2} \dots x_0$

$$\begin{array}{r} x_0 y_{n-1} \dots x_0 y_0 \\ x_1 y_{n-1} \dots x_1 y_0 \\ x_2 y_{n-1} \dots x_2 y_0 \\ \vdots \\ x_{n-1} y_{n-1} \dots x_{n-1} y_0 \\ \hline \end{array} \Rightarrow x_0 2^0$$

### \* Sign magnitude Mult.

$$X = \cancel{x_s} x_{n-1} \dots x_0$$

$\uparrow$  Sign bit       $\underbrace{\quad}_{(n-1)\text{ bit}}$  Magnitude

$$Y = y_s y_{n-1} \dots y_0$$

$$P = XY$$

$$P = \{P_S, P_M\} \rightarrow \text{concatenate}$$

$$\boxed{P_S = X_S \oplus Y_S}$$

$$P_M = X_M Y_M$$

magnitude

## \* Unsigned sequential Multiplication

### 1. Left-shift Version

~~In  $i$  th iteration~~

In  $(i+1)$  th iteration, calculate

$$P_{i+1} = P_i + x_i 2^i Y, P_0 = 0$$

then  $\boxed{P_n = P} \rightarrow \text{Prove}$

### 2. Right-shift Scheme

$$P_0 = 0, P_{i+1} = (P_i + x_i Y 2^n) 2^{-1}$$

with  $\boxed{P_n = P} \rightarrow \text{Prove}$

$x_0$  terms: LS by  $n$ , RS by  $n \Rightarrow$  zero shift

$x_1$  terms: LS by  $n$ , RS by  $n-1 \Rightarrow 1$  shift



$$P = XY$$

$$P = \{P_S, P_M\} \rightarrow \text{concatenate}$$

$$P_S = X_S \oplus Y_S$$

$$P_M = X_M Y_M$$

magnitude

### \* Unsigned sequential Multiplication

#### 1. Left-shift Version

~~In  $i^{\text{th}}$  iteration~~

In  $(i+1)^{\text{th}}$  iteration, calculate

$$P_{i+1} = P_i + x_i \cdot 2^i Y, P_0 = 0$$

$$\text{then } \boxed{P_n = P} \rightarrow \text{Prove}$$

#### 2. Right-shift Scheme

$$P_0 = 0, P_{i+1} = (P_i + x_i \cdot Y \cdot 2^n) \cdot 2^{-1}$$

$$\text{with } \boxed{P_n = P} \rightarrow \text{Prove}$$

$x_0$  terms: LS by  $n$ , RS by  $n \Rightarrow$  zero shift

$x_1$  terms: LS by  $n$ , RS by  $n-1 \Rightarrow 1$  shift

### \* Signed (2's complement) Integer Mult. (sequential)

→ Booth's multiplication Algo (1951)

→ Main idea is to reduce the no. of addition/subtraction steps needed to be performed in n-bit multiplication

↓ useful in circuits that cannot perform add/sub & shift in the same clock cycle

↓

In ckts. where this can be done, B.M. does not save anything.

- Skips over range of 0's or 1's.
- Uses right shifts
- $n \times n = n^2$  multiplication  $\Rightarrow n$  clock cycles.

Let,

$$\begin{aligned}\text{multiplier } X &= x_{n-1} x_{n-2} \dots x_0 \\ &= x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i\end{aligned}$$

both  
 $x, y$  in  
2's  
complement  
form

Multiplicand  $y = y_{n-1} y_{n-2} \dots y_0$

Consider bit strings / in  $X$

case (i):  $X^* = x_i x_{i-1} x_{i-2} \dots x_{i-k-1}$

Substring of  $X$  of length  $k+2$

$$= \underbrace{01111 \dots 10}_{k \text{ 1's flanked by two zeroes}}$$

case (ii):  $X^* = x_i x_{i-1} x_{i-2} \dots x_{i-k-1}$

$$= \underbrace{10000 \dots 01}_{k \text{ 0's flanked by two 1's}}$$

### \* Booth's Multiplication Rules

1. If  $x_s x_{s-1} = 01$ , add  $y$  to partial product & Right shift with sign extension

= 10, sub  $y$  from partial product (i.e add  $-y$ ) & Right shift with sign extension

= 00 or 11, only Right shift.

We need to justify the corrections of the rules for case(ii) and case(iii).

### Case(i):-

contribution of  $x^*$  in the product

$$= \sum_{j=i-k}^{i-1} 2^j y$$

According to Booth's rules, contribution of  $x^*$  to the product is

$$= 2^i y - 2^{i-k} \cdot y$$

$$= 2^{i-k} (2^k - 1) y$$

$$= 2^{i-k} \left( \sum_{j=0}^{k-1} 2^j \right) y$$

$$= \left( \sum_{j=0}^{k-1} (2^j + i-k) \right) y = \sum_{j=i-k}^{i-1} 2^j y$$

### Case(ii):-

contribution of  $x^*$  in the product

$$= -2^i y + 2^{i-k-1} y$$

$$= (-2^i + 2^{i-k-1}) y$$

According to Booth's rules, contribution of  $x^*$  to the product is

$$= -2^i y + 2^{i-k-1} y = (-2^i + 2^{i-k-1}) y$$

### Example:-

4-bit representation

$$x = -5 = 1011_1, y = 0111 = 7 \Rightarrow -y = 1001$$

Accumulator  $\rightarrow$  n-bit register

$\& \rightarrow (n+1)$ -bit register.

| <u>Step No.</u>               | <u>Action</u>        | <u>Accumulator (A)</u> | <u>Partial Product</u> | <u>Q</u>                        |
|-------------------------------|----------------------|------------------------|------------------------|---------------------------------|
| (init) 0                      | init                 | 0000                   |                        |                                 |
| 1                             | Sub $y$ from A & RS. | 1001<br>1100           | 10110                  | $Q[n]$<br>$10110 = \{x, 1b_0\}$ |
| 2                             | RS                   | 1110                   | 11011                  |                                 |
| 3                             | $A + y$<br>RS        | 0101<br>0010           | 01101                  |                                 |
| 4                             | $A - y$<br>RS        | 1011<br>1101           | 10110<br>11010         |                                 |
| $P = X * Y = \{A, Q[n+1:1]\}$ |                      |                        |                        |                                 |
| $= 11011101$                  |                      |                        |                        |                                 |

$$P = X * Y = \{A, Q[n+1:1]\} = 11011101$$

$$\therefore P = -35$$

$$*X = 2 = 0010, \quad Y = 6 = 0110 \Rightarrow -Y = 1010$$

| <u>Step No.</u> | <u>Action</u> | <u>Accumulator (A)</u> | <u>Q</u>       |
|-----------------|---------------|------------------------|----------------|
| 0               | init          | 0000                   | 00100          |
| 1               | RS            | 0000                   | 00010          |
| 2               | $A - y$<br>RS | 1010<br>1101           | 00010<br>00001 |
| 3               | $A + y$<br>RS | 0011<br>0001           | 00001<br>10000 |
| 4               | RS            | 0000                   | 11000          |

$$\therefore P = X * Y = 0000\ 1100 = 12$$

\*  $X = -77 = 1011\ 0011 \quad Y = -43 = 1101\ 0101$

$$P = 3311$$

$$= 0000\ 1100\ 1110\ 1111$$

→ Quiz-2 Syllabus

1. Computer Arithmetic

2. Hard disk performance

\*

### Computer Arithmetic



Q. Let the content of memory location  $X$  in MIPS architecture be  
 $0x8FFFC000$ . Who am I?

& In binary,  $1000\ 1111\ 1110\ 1111\ 1100\ 0000\ 0000$  ↗ 2's complement number

$X \Rightarrow -1880, 113, 152$  ↗ unsigned number

$X \Rightarrow 2,414, 854, 144$  ↗ 10-bit number

$X \Rightarrow \text{lw } \$t1, -16384(\$ra)$  ↗ Reg. 31

OP Code is 1000 11 ↗ Reg. 15

$$x \Rightarrow (1.873 \times 10^{-96})$$

## \* Representing real numbers in binary form

$$N = 17.25$$

$$= 172.5 \times 10^{-1}$$

$$= 1725 \times 10^{-2}$$

$$= 1.725 \times 10^3$$

$$= 1.725 \times 10^2$$

N may be very large  $+93 \times 10^{32}$

N maybe very small  $-93 \times 10^{-32}$

## IEEE 754 Single precision Floating point format

not stored  
↓  
implicit because it will always be '1' as binary has only 0's or 1's

implicit base

$$1.725 \times 10^3$$

exponent

sign (+/-)

Significand (or) mantissa



$$\therefore N = (-1)^S \cdot 1.M \times 2^{E-127}$$

In FP-arithmetic, no role of 2's complement

Q. Which decimal number is represented by:

$$N = \boxed{1 | 1000\ 0001 | 01000 \dots -11:00}$$

$$\therefore N = -1.01 \times 2^{129-127} = -1.01 \times 2^2$$

$$N = -101 = -5$$

Exponent E  $\Rightarrow$  8 bits floating point representation



$E = e + 127$   $\times$   $(2^{(2^{N-1}-1)})$

Floating point exponent

Actual exponent

Representation for

Q. Derive  $1.FFFF\ 754$

$$N = -575 \times 10^{-2}$$

6 steps

$N = -5.75 \times 10^0$  [ $1.000\ 0000/0$ ]

Convert to binary  $\rightarrow 0.111\ldots 111$

$N = -101.11$

Normalise  $\Rightarrow 1.0111 \times 2^2$

$N = -1.0111 \times 2^4$

Fp exponent  $E = 2 + 127 = 129$

$\rightarrow F = 1000\ 0001$

$1\ 1000\ 0001\ 011100\ \dots\ -0$

\* Distribution of exponents setting these values



Q. What is the largest positive normalized number representable in IEEE 754?

|               |               |
|---------------|---------------|
| E             | M             |
| 0   1111 1110 | 1111 ..... 11 |

$$\begin{aligned} \text{Max. +ve} &\Rightarrow 1.\underbrace{1111\dots11}_{23} \times 2^{254-127} \\ &= 1.\underbrace{1111\dots11}_{23} \times 2^{127} \\ &= (2 - 2^{-23}) \times 2^{127} \quad \left( \begin{array}{l} : 1.111\dots11 \\ + 1 \\ \hline 10.0\dots000 \\ \frac{1}{2} \end{array} \right) \end{aligned}$$

Min. +ve

|               |             |
|---------------|-------------|
| 0   0000 0001 | 000 ..... 0 |
|---------------|-------------|

$$\begin{aligned} \text{Min. +ve} &\Rightarrow +1.0 \times 2^{-127} \\ &= 2^{-126} \end{aligned}$$

\* Distribution of FP numbers on Real line



\*  $N_1$        $N_2$

two consecutive FP numbers

|              |              |
|--------------|--------------|
| 0   10000000 | 000000\dots0 |
|--------------|--------------|

|              |             |
|--------------|-------------|
| 0   10000000 | 0000\dots01 |
|--------------|-------------|

two consecutive +ve FP numbers

Because of the exponent, the separation b/w consecutive FP #'s increases.