

## Computer Organization and Design

D.A. Patterson

5th edition, Morgan Kaufmann

## • DLD

- Simplifications (not k-map; use Boolean Algebra)

## • MUX (multiplexers design)

## • ALU (arithmetic logic unit)

## • Adder

## • Registers / Shift registers

## • Internal Structure of Register File

## 32-bit Architecture

$$x = \underbrace{\$10}_{y} + \underbrace{\$11}_{z}$$

Add \$10, \$11, \$12 - R

D S<sub>1</sub> S<sub>2</sub>

| OP    | R <sub>s</sub> | R <sub>t</sub> | R <sub>d</sub> | Shft Fun          |
|-------|----------------|----------------|----------------|-------------------|
| 00000 | 01011          | 01100          | 01010          | 00000   x x x x x |

feed this and this is what the processor will understand

- convert into assembly instructions and identify types (R, I, J)
- each form has diff types of encoding
- all the instructions will be 32-bits.

38A + 58B + 38C + 38D + 38E + 38F = H

38A + 58B + 38C + 38D + 38E + 38F = P = F

(38A) 58B (38C) 58D (38E) 58F = B E

• (38A + B) • (38B + C) • (38C + D) • (38D + E) •

(38E + F) • (38F + A) =

$$Y = \overline{A \cdot B \cdot C} (A \cdot B + A \cdot C)$$

| A | B | C | $A \cdot B \cdot C$ | $\overline{A \cdot B \cdot C}$ | $A \cdot B$ | $A \cdot C$ | $A \cdot B + A \cdot C$ | Y                                  |
|---|---|---|---------------------|--------------------------------|-------------|-------------|-------------------------|------------------------------------|
| 0 | 0 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 0 | 0 | 1 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 0 | 1 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 0 | 1 | 1 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 1 | 0 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 1 | 0 | 1 | 0                   | 1                              | 0           | 1           | 1                       | $\rightarrow (A \cdot B \cdot C')$ |
| 1 | 1 | 0 | 0                   | 1                              | 1           | 0           | 1                       | $\rightarrow (A \cdot B \cdot C')$ |
| 1 | 1 | 1 | 1                   | 0                              | 1           | 1           | 1                       | 0                                  |



$$\bar{Y} = A'B'C' + A'B'C + ABC' + A'BC + AB'C + ABC$$

$$\bar{Y} = Y = \overline{A'B'C'} + \overline{A'B'C} + \overline{ABC'} + \overline{A'BC} + \overline{AB'C} + \overline{ABC}$$

(product of sum)

POS

$$\Rightarrow Y = \overline{(A'B'C')} \overline{(A'B'C)} \overline{(ABC')} \overline{(A'BC)} \overline{(AB'C)} \overline{(ABC)}$$

$$= (A+B+C) \cdot (A+B+C') \cdot (A'+B+C) + (A+B+C') \cdot (A'+B+C')$$

$$(A'+B+C) \cdot (A'+B'+C')$$

# If S

Then  $y := B$

ELSE  $y := A$



$$S = A + \bar{B}A = A$$

$$S = \bar{B} + BA = B$$

$$S = \bar{B}A + \bar{B}A = 0$$

Encoded result

in binary form S

is 00000000000000000000000000000000

00000000000000000000000000000000

8 nippled  
stage.



PLA → Programmable Logic Array

Functionality - Boolean expressions



$$y_1 = AB + \bar{A}B\bar{C}$$

$$y_2 = A\bar{B} + ABC$$

$$y_3 = \bar{A}B\bar{C} + ABC$$



3 combinations in

$y_1, y_2, y_3$ :

$A\bar{B}$ ,  $\bar{A}B\bar{C}$ , and  $ABC$



$$y_1 = A\bar{B} + \bar{A}B\bar{C}$$

$$y_2 = A\bar{B} + ABC$$

$$y_3 = AB' + B'C' + A$$

$$y_1 = \underline{\underline{ABC}} + \underline{\underline{ABC}}$$

$$y_2 = \underline{\underline{ABC'}} + \underline{\underline{ABC}}$$

$$y_3 = \underline{\underline{AB'C'}} + \underline{\underline{AB'C}}$$

6 AND  
3 OR

PLA is always  
one circuit

regardless of how many expressions,  
PLA is one circuit.

#

RCA

Ripple Adder (full adder)



Ripple Carry Adder  
(RCA)

not efficient

It is called a ripple carry adder because each carry-bit gets rippled into the next stage.

The carry has to propagate from LSB to MSB.



### Conditional Sum Adder / Fast Adder



$$A - B = A + \bar{B} + 1$$

$A + B$

Binary



00: Add/Sub

01: A and B

10: A or B

ALU - Arithmetic Logic Unit

(2 mux)



To build the register, we need flipflops.

## # Registers



32-bit register

- When we read, we're to read the entire registers.

\* exception: 1b (allows to access first 8-bit of the registers)  
byte load byte

- 1b → load byte (allows to access the first 8 bits of the register)

otherwise, we've to take /read values for the entire register

|           |         |
|-----------|---------|
| 1 0 0     | 1 0 0 0 |
| 1 0 0 0   | 0 1 0 0 |
| 1 0 0 0 0 | 0 0 1 0 |

left shift

L.S.

$x * 2^i$

right shift

R.S.

$x / 2^i$

i → no. of bit shift

left shift operation (mult)

R.S.

$x / 2^i$

right shift operation (div)

our notation



Q2 Q3 - 13

$$x = 3y + 5z$$

$$y = 4x + 3z$$

$$z = 5x - 6y$$

Add \$10, \$11, \$12 — R type ins

mult \$11, \$12 — R type ins

Result [Lo, hi]

### Designing Right Shift Registers



Right Shift Register

Working of RSR

F 00 : Feedback

random no.

R 01 : Reset

coming from some  
external source

10 : Set

11 : External data

82J

(shift right 28)

82M

(0 fid)

(IE fid)



Left Shift Registers

|   |                    |
|---|--------------------|
| 0 | 00 : External data |
| 1 | 01 : Set           |
| 2 | 10 : Reset         |
| 3 | 11 : FB            |



See:  
next page

|                    |
|--------------------|
| 00 : LS            |
| 01 : RS            |
| 11 : R S           |
| 01 : Reset         |
| 10 : External data |



00 : LS

11 : RS

01 : Reset

10 : External data

00 : RZ

01 : ZR



Chapter 1

- Moore's Law

- API

- Interface

- Parallel Processing



Application Software is written in high-level language.

Controller

Control Unit responsible for allowing CPU to carry out operations.

- datapath: components & path connecting CPU

Abstraction - hiding the detail from user

Computer Architecture

- designing, selecting, establishing interconnection
- interface to achieve some goals like functionalities.





\* definitionen schatte diagramm.

memory tends to get smaller higher-up. the memory hierarchy.



• multiprocessors share a RAM / memory.



We need trade-offs when considering performance.

• Response time and throughput.

✓ Relative performance

$$\frac{P_x}{P_y} = \frac{E_y}{E_x} = n$$

$P \rightarrow$  performance

$E \rightarrow$  execution time

standard for fastest with slowest times

$x$  is  $n$  times faster than  $y$



Q. If program has 8 clock cycles, then, CPU time?

Ans: CPU Time = 25s

for comp B; we need higher clocks i.e. higher GHz.  $\frac{\$}{GHz}$   $\frac{\$}{GHz} \approx \$28$

| T <sub>1</sub> | T <sub>2</sub> | T <sub>3</sub> | T <sub>4</sub> | T <sub>5</sub> | $R_E = 3 + 4 \times 2 + \text{others waiting}$ |
|----------------|----------------|----------------|----------------|----------------|------------------------------------------------|
| 1s             | 1s             | 1s             | 1s             | 1s             | $(3+4) - (4+3) = 6 \text{ cycles}$             |



RISC vs. CISC

longer programs  $\Rightarrow \$28 \approx \$8$

faster programs  $\Rightarrow \$28 \approx \$2$

most number  $\Rightarrow \$2 \approx \$8$

Computer Language

Machine Language → Assembly  
Instruction

- identify the registers which store the values of variable



(Aug 2019 notes)

$\$t_0 \sim \$t_7$  → standard argument registers used in assembly code  
 $\$S_0 \sim \$S_7$

$$S \rightarrow f = (g + h) - (l + j)$$

position create  $t_1 \times t_2 = L^A$



\*  $\$0 / \$2000$

\*  $\$ra / \$31$  → give return address to main func

$\$sp$  → points to stack

$\$fp$  → frame-pointer points to stack

$\$a_0 \sim \$a_3$  → argument registers used in assembly instruction

$\$v_0 \sim \$v_1$  → return func



load (memory  $\rightarrow$  register)

store (register  $\rightarrow$  memory)



$$\boxed{\begin{array}{ccc} \text{store} & & \text{load} \\ A[10] & = & y + A[6] \end{array}}$$

• memory is byte-addressed

Processor operations mostly involve processing data.

To speed up the processor operations, the processor includes some internal memory storage locations, called registers.

The registers store data elements for processing without having to access the memory.

A limited no. of registers are built into the processor chip.

## # Registers vs. Memory

- Load and store operations are Memory Operations.

32 registers with each 32-bit.

Some of these are reserved.

T-series (Temporary value) } available to us  
S-series (safe value) }

$$\begin{array}{c} \text{load} \quad \text{store} \\ [0]A + B = [01]A \end{array}$$

- spill

- optimization

$$\begin{array}{ccc} \$10 & \$11 & \$12 \\ x = y + 2; \end{array}$$

load, store, immediate

$$x = y + 10;$$

$$\text{Add } \$10, \$11, \$12 \rightarrow R\text{-type}$$

$$\text{Add } i: \$10, \$11, (10) \rightarrow I\text{-type}$$

$$x = y - 10;$$

~~x Subi \$10, \$11, 10 (doesn't exist; No subtract immediate instruction; Just use a negative constant)~~

~~$$\text{Addi } \$10, \$11, -10$$~~

Please Note:

Registers are processor components that hold data and address.

Fig: Registers file



ALUOP



Add \$10, \$11, \$12

D Rs

RT

Data-link to carry data

- Registers size on operation by



(no. of register  
changed,  
bit-length changed)

- \* Better to encode numbers rather than names.

| OP     | R <sub>s</sub> | R <sub>t</sub> | offset    |
|--------|----------------|----------------|-----------|
| 100110 | 01011          | 01010          | 0-0101000 |

I-format

lw, \$10(\$40(\$11))  
load  
D offset

lw \$10, 40(\$11)  
load

sw \$12, 44(\$11)  
store

sw \$12, 44(\$11)  
store

Address where the first element of array stays is the base element.

Base Address

Address where the array's first element resides.

$$y = \overline{A \cdot B \cdot C} (A \cdot B + A \cdot C)$$

| A | B | C | $A \cdot B \cdot C$ | $\overline{A \cdot B \cdot C}$ | $A \cdot B$ | $A \cdot C$ | $A \cdot B + A \cdot C$ | y |
|---|---|---|---------------------|--------------------------------|-------------|-------------|-------------------------|---|
| 0 | 0 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0 |
| 0 | 0 | 1 | 0                   | 1                              | 0           | 0           | 0                       | 0 |
| 0 | 1 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0 |
| 0 | 1 | 1 | 0                   | 1                              | 0           | 0           | 0                       | 0 |
| 1 | 0 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0 |
| 1 | 0 | 1 | 0                   | 1                              | 0           | 1           | 1                       | 1 |
| 1 | 1 | 0 | 0                   | 1                              | 1           | 0           | 1                       | 1 |
| 1 | 1 | 1 | 1                   | 0                              | 1           | 1           | 1                       | 1 |

$$y = AB'C + ABC'$$

SOP

OR (choice) ✓ T F

AND (no choice) T F

A B C



## Programmable Logic Array - PLA

PLA → You need SOP expressions only.

The only limitation is the number of AND gates that are available.

It is used to build reconfigurable digital circuit.

It is a combination of memory and logic.

PLA has programmable AND Gate array and programmable OR Gate array.

adv: flexibility

disadv: not easy to use

(All the minterms are not realized, but only reqd minterms are implemented)

PLA is used as a decoder, counter, used to provide control over datapath.



Q. Design a PLA for the expression below.

$$y_1 = A\bar{B} + \bar{A}B\bar{C}$$

(A-B)

$$y_2 = A\bar{B} + ABC$$

$$y_3 = \bar{A}B\bar{C} + ABC$$



Q. Design a ripple adder and explain its limitations.  
Design another adder that overcomes the limitations of a ripple adder.

### # Ripple Carry Adder (RCA)



Q. What slows down adder?

A: The 'curse of the carry'.

The most significant outputs of the adder depends on the least significant inputs.

The speed of the adder depends on how fast the carry can be propagated / pre calculated.

Adders differ only in the way the carry-propagation is calculated.

CLA

- adv:
  - reduced propagation delay
  - provides fastest addition logic

disadv:

- complicated circuit as no. of variables increase
- costlier circuit as more numbers of hardware involved.

Fast (Parallel) Adder  
# Conditional Sum Adder / CLA / Fast Adder

# A carry look-ahead adder (CLA) or fast adder improves speed by reducing the amount of time reqd to determine carry bits.

RCA is slower comparatively.

→ Here, carry bit calc-ed alongside sum bit, and each stage must wait until the prev. carry bit has been calc-ed.

CLA calculates one or more carry bits before the sum, which reduces the wait time to calc the result of the lower value bits of the adder.

- reduces propagation delay by more complex hardware, so costlier.

This method makes use of logic gates so as to look at the lower order bits of augend + addend to see whether higher order carry is to be generated or not.



G. Design an ALU with logical gates

### ALU - Arithmetic Logic Unit

ALU is a part of CPU that carries out arithmetic and logic operations on the operands in computers.

\* ALU is the building block of CPU.

$$A - B = A + \overline{B} + 1 \quad (\text{2's complement})$$



MUX is a combinational circuit that is mostly used to select path between multiple sources and a single destination.

Demux parallel + 3.25.2.3

source side digital multiplexer

(multiple outputs)

CB-controlled (Controlled)

multiple inputs

multiple outputs

single path to output stage

multiple paths output

(parallel outputs)

multiple paths output

(parallel outputs)

multiple paths

multiple parallel paths output

multiple paths

multiple paths output from different parallel sources to same output

CB control bus controls the output paths

multiple paths output

Sequential Circuit consists of a feedback path and employs some memory elements.

$$S.C. = C.C + \text{Memory elements}$$

Flip-flops: Bistable logic devices  
(2 stable states)



- \* Flip-flops are memory elements that change state on clock signals.

Memory element: a device which can remember value indefinitely, or change value on command from its inputs.

Srl (shift right logical)  
 Sra (shift right arithmetic)



sign of original sign not reserved by Srl.

Q.

\$11, \$10, \$11, 3

\$10

- identify type and then encode it.

$$\text{sll} = \$10, \$11, 3$$

R-type

op. code: 0

because shamt

| OP    | RS    | RT    | Rs    | Shamt | fun <sup>c</sup> |
|-------|-------|-------|-------|-------|------------------|
| 00000 | 01011 | xxxxx | 01010 | 00011 | xxxxx            |

don't put  
0 here,

because  
not a valid  
format

for srl,  
+ \$11, \$10 \$11, \$0, 3

shamt  
(shift amount = 3)

$$= 00011$$

|   |   |   |   |   |   |
|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 |

\*

beq ] I-type register  
bne ]

beq \$10, \$11, lb 1

| OP | Rs | Rt | Immediate |
|----|----|----|-----------|
| 6  | 5  | 5  | 16        |

bne \$10, \$11, lb 2

|   |      |    |           |
|---|------|----|-----------|
| J | lb 1 | OP | Immediate |
|---|------|----|-----------|

Unconditional

S. 111, 010 110

$$PC \rightarrow \{ \text{if } (i == j) \}$$

$$PC \leftarrow PC + 4 \rightarrow x = y + z;$$

$$PC \leftarrow PC + 4 \quad z = x - y;$$

$$y = x + z;$$

{ for loop

S = Unconditional

11000 { else }

|       |       |       |       |       |
|-------|-------|-------|-------|-------|
| 11000 | 11000 | 01010 | 11000 | 11000 |
|-------|-------|-------|-------|-------|

Target address

PC + 4

+

Sign extension of offset l.s. by 2

PC: 20

Offset: 8

What will be the memory address of statement?

0 | 0-0 | 000

→ offset

0 - 0 | 0 - 01000 | 00

left shift by 2

0 - 010100

0 - 0111000

Target address: 56

0032168 - 00111 = 09

Binary Addition

$$\begin{array}{r} 010100 \\ 000100 \\ \hline 011000 \\ 100000 \\ \hline 111000 \end{array}$$

(Add 1000)

## Lecture

I-type instruction

|       |        |   |
|-------|--------|---|
| OP    | 25     | 0 |
| Iexit | offset |   |

|     |       |
|-----|-------|
| 0 0 | 1 0 0 |
| 0 1 | 0 0   |
| 1 0 | 0 0   |
| 1 1 | 0 0   |

→ 32 bits  
L8B always zero

Q) msb 4-bit that is missing adds up with -28 and makes it 32 bit

Q) Concatenating



PC : 1100 - 0010100

28 address begin

\$20

| 0-010100 |

-

| 1100 | - - - 100 |

Concatenation → high address

0 0 1 0 1 0

0 0 1 0 0 0

0 0 0 1 1 0

0 0 0 0 0 1

0 0 0 1 1 1

#   
 while ( $\text{save}[5] == k$ )  $i++ = 1;$   
 MIPS code:  
 we add **loop** so we can branch back to that instruction at the end of the loop.  
 loop:  
 $lw \$t_0, 20(\$S_c) \# \$t_0 \leftarrow \text{Save}[5]$   
 $bne \$t_0, \$S_5, \text{Exit}$   
 $add i \$S_3, \$S_3, 1$   
 loop  
 Exit:  
 (multiply variable by 4; left-shift by 2-bit is equivalent to multiplying by 4. ( $\# i * 4$ ))

To go to exact address, add base address with  $i * 4$   
 remove base address.  $X \text{ lw } \$t_0, 4 * i(\$S)$   
 #   
 while ( $\text{save}[i] == k$ ).   
 $sw$   
 $A[B[i]] = B[i] + k;$   
 $\uparrow$   
 $\text{save}[i]$   $\uparrow$   
 $B[i]$   
 $\uparrow$   
 load  $\$S_2$  on  $\$t_1$  content  
 $\uparrow$   $\text{pab}_0$ , content tail  
 $\uparrow$   $\text{main memory address.}$   
 $\uparrow$   
 loop:  
 $sll \$t_0, \$S_1, 2 \# i * 4$   
 $\uparrow$   
 $\text{save}[i]$   
 $add \$t_0, \$t_0, \$S_6$   
 $lw \$t_0, 0(\$t_0)$   
 $bne \$t_0, \$S_2, \text{Exit}$   
 $sll \$t_0, \$S_1, 2$   
 $\uparrow$   
 $B[i]$   
 $add \$t_0, \$t_0, \$S_3$   
 $lw \$t_0, 0(\$t_0)$   
 $add \$t_0, \$t_0, \$S_2$   
 loop:  
 $sll \$t_1, \$t_1, \$S_2$   
 $add \$t_1, \$t_1, \$S_3$   
 $lw \$t_1, 0(\$t_1)$   
 $sll \$t_1, \$t_1, \$S_4$   
 $add \$t_1, \$t_1, \$S_4$   
 $lw \$t_1, 0(\$t_1)$   
 $sw \$t_0, 0(\$t_1)$   
 loop:  
 Exit

#

Basic Blocks

(no branch or jumping)

 $\$11 : 4$  $\$12 : 5$  $\$10 : 1$  $\$11 : 5$  $\$12 : 5$  $\$10 : 0$ Slt \$10, \$11, \$12 — Rstays one full float  
value remains

lower than this

Slti \$10, \$11, 100 — I

&lt;, ≤, &gt;, ≥

(implement)

expensive to branch more

→ intensive operation

We're working on embedded economical architecture.

Addl \$10, \$11, \$12

Addu \$10, \$11, \$12

2.22.118

118

22.118.118

(118)0.118.118

p# 2.22.118

022.011.0118

(118)0.0118

31x2.22.118

## # IEEE 754 Floating Point Representation

| S | E        | F          |
|---|----------|------------|
| 1 | 10000121 | 0110111011 |

1 8 23

$$-91.6875 = -1011011.1011$$

binary representation of 91

$$= 1011011$$

0.6875 (multiply by 2)

$$= 0.1011$$

Convert into normalized form

$$= -1.011011011 \times 2^6$$

$$\alpha = \textcircled{127} + 6 = 133 = 10000101$$

$$0.101 = 1.01 \times 2^{-1}$$

$$0.010 = 1.0 \times 2^{-2}$$

$$2^{E-1} - 1 \\ = 128$$

Q. Convert it into HEX value?

#

Binary to equivalent decimal value

$$\begin{array}{r}
 3 \quad C \quad C \quad 8 \quad 0 \\
 0011 \quad 1100 \quad 100 \quad 1000 \quad 0 \\
 -\underline{1100} \quad \underline{100} \quad \underline{1000} \quad \underline{0} \\
 \end{array}$$

$$01111001 = 121$$

$$1101.11001 =$$

$$E = 121 - 127 = -6$$

$$\begin{aligned}
 & 1 \cdot 1001 \times 2^{-6} \\
 & = 0.0000011001
 \end{aligned}$$

$$= 2^{-6} + 2^{-7} + 2^{-10} \approx 0.0244$$

$$1101.11001 =$$

(S per digit)

1101.11001 =

month basam

30 x 1000

10100010 = 89 =

1 - 1-35

89 =

$1 - 2 \times 10$

$-2 \times 0$

main () {

... . . . PC  
... . . . PC + 4

Test () : ← PC 0x1234

← (PC + 4) 0x1238

{ \$ra

JAL

Jump and Link

} Test () {

... . . .

... . . .

... . . .

... . . .

return () : ← PC 0x1444

{ PC + 4 0x1448

} \$ra (Jump return to address in register ra)

\$ra / \$31

→ func call<sup>return</sup>  
whether, to address to  
store home machinera register  
will hold that  
value.\* starting  
address after a  
func call, otherwise,  
infinite loop.\* Stack to handle  
more than one  
func\* Recursive/Cascading  
func call

## Procedure

leaf  $\rightarrow$  non-leaf

main fun.

calls a supporting

fun and

this returns

a value to

main fun;

no cascading  
here

A0 A3

(to carry/pas  
arguments  
values.)

preserve the value of

so for each program;  
save the prev. state

P<sub>1</sub> parameter

P<sub>2</sub> and P<sub>3</sub> (var address in members of pointer arrays)

P<sub>8</sub> return

Stack  $\rightarrow$  memory location

in the RAM

Stack

P<sub>i</sub>  $\Rightarrow$  \$0 = 5

\$V0  $\rightarrow$  into  
return register

P<sub>i+1</sub> \$S0 ?

P<sub>3</sub> float

next array

mb

subroutine

thus main

## # Non-leaf Procedure

↳ calls a cascading/recursive func/procedure  
for every step, maintain a stack.

$X = \underline{\hspace{1cm} \text{MSB } 16\text{-bit}} \quad \underline{\hspace{1cm} \text{LSB } 16\text{-bit}}$

$\$t_0$

can't store it at one go, so we use

$lui \quad \$t_0, \underline{\hspace{1cm} \text{MSB } 16\text{-bit}}$

$\downarrow$   $\$t_0 : \underline{\hspace{1cm} 110 \dots 0} \quad \underline{\hspace{1cm} 0 \dots 0}$

load upper immediate

$\text{MSB } 16\text{-bit} \quad \text{LSB } 16\text{-bit}$

$ori \quad \$t_0, \$t_0, \underline{\hspace{1cm} 00 \dots 110}$

$\downarrow$  otherwise on operation

$110 \dots 0 \quad | \quad 0 \dots 0$

$00 \dots 0 \quad | \quad 00 \dots 110$

$\$t_0 \leftarrow \underline{\hspace{1cm} 110 \dots 000 \dots 110}$

so now  $\$t_0$  contains the result of first iteration

now for second iteration,  $\$t_0$  contains the result of first iteration

so now  $\$t_0$  contains the result of first iteration

## # Computer Abstractions and Technology

### Lecture 01

- # Computer makes applications feasible
  - Computers in automobile, cellphones, search engines, Human genome project, World Wide Web.
- \* Computers are pervasive → super common.
- # Computers are used in 3 different classes of applications. (Classes of Computers)
  - (i) Desktop Computers
  - (ii) Servers
  - (iii) Embedded Computers
- general purpose
  - network-based
  - hidden as components of systems
  - processor
  - microprocessors in I/O capacity.
  - largest class of computers
- Super computers
- TV
- are a class of "computers" with the highest performance and cost; they're configured as servers.
- An embedded computer is a computer inside another device used for running one predetermined app./collection of software.
- Both software and hardware affect program performance
- Multicore processor - A microprocessor containing multiple cores (processors) in a single integrated circuit.



Shape indicates speed, cost and size.

Computers need to be fast and dependable.

Since any physical device can fail, we make systems dependable by including redundant components that can take over when a failure occurs and to help detect failures.

## # Understanding Performance

(How this affects performance)

- Algorithm determines the no. of operations executed.
- Programming language, compiler, and architecture determines the no. of machine instructions executed per operation
- Processor and memory system determines how fast instructions are executed.
- I/O system (hardware and OS) determines how fast I/O operations are executed

The performance of a program depends on a combination of the effectiveness of the algorithms used in the program, the software systems used to create and translate the program into machine instructions, and the effectiveness of the computer in executing those instructions, which may include I/O operations.

## # Below your Program



### 1. Application Software

(written in high-level language)

### 2. System Software

→ Compiler (translates HLL code to machine code)

→ OS (service code) → An OS is a system software

Software that provides services that are useful, including OS, compilers, loaders, and assemblers.

that acts as an interface between a user and a hardware, and provides a variety of services and supervisory functions.

\* imp func's of OS \*

- Handling input/output operations
- Allocating storage & memory
- Scheduling tasks, and sharing resources.

### 3. Hardware

Processor (CPU),

memory (RAM), I/O controllers

examples of OS: Linux, iOS, Windows.

### Q. What is a compiler?

HLL → Java, C, C++

Ans: A compiler is a program that translates high-level language code into assembly language statements/code.  
(into instructions that the hardware can execute)

stored-program : the idea that data and instructions can be stored in memory as numbers

concept

Binary digit / Bit : (0. or 1) that are the components of information

Instructions : A command that computer hardware understands and obeys.

We use numbers for instructions and data. } foundation of computing

Assembler : a program that translates a symbolic version of instructions into the binary version.

(one line  
every  
instruction)

Assembly language (Symbolic language) - A symbolic representation of machine instructions.

Machine Language (Binary Language) - A binary representation of machine instructions.

High-level Language → Compiler

HLL program compiled into assembly language and then assembled into machine language.

H.L.L are portable across multiple systems.

Assembly Language  
(for MIPS)

Assembly language is a low-level programming language specific to a particular comp. architecture.

Assembler

Binary Machine Language  
(for MIPS)

- HLL
  - natural language
  - productivity improved
  - conciseness (fewer lines to express idea)
  - allow programs to be independent of the computer on which they were developed.

# Basic functions of the underlying hardware  
inputting data, outputting data, processing data, and storing data.

### 5 computer components

- |             |                        |                                                                                                       |
|-------------|------------------------|-------------------------------------------------------------------------------------------------------|
| 1. Input    | (keyboard, microphone) | • input writes data to memory                                                                         |
| 2. Output   | (display, speakers)    | • output reads data from memory                                                                       |
| 3. Memory   |                        |                                                                                                       |
| 4. Datapath | Processors             | • Processor gets instructions and data from Memory.                                                   |
| 5. Control  | / CPU                  | • Control sends the signals that determine the operations of the datapath, memory, input, and output. |
- datapath is the component of processor that performs arithmetic operations.

**Cache memory:** small, fast, SRAM memory for immediate access to data.

Cache memory acts as a buffer for a slower, larger memory.

SRAM is faster, expensive but less dense than DRAM  
Static dynamic

- ISA (instruction set architecture) — software/hardware interface contains info to run ML correctly
- ABI (app. binary interface) — ISA + system software interface
- implementation — hardware that obeys the architecture abstraction

Architecture: RISC, CISC, VLIW, SIMD

Processor: RISC, CISC, VLIW, SIMD

Processor architecture

Processor architecture: RISC, CISC, VLIW, SIMD

Processor architecture

response time / execution time: Total time reqd on the computer to complete a task.

throughput / bandwidth: no. of tasks completed per unit time.

(total amount of work done in a given time)

Decreasing response time almost always improves throughput.

$$\text{Performance}_x = \frac{1}{\text{Execution time}_x} \quad (\text{reciprocals})$$

$$\frac{P_x}{P_y} = \frac{E_y}{E_x} = n$$

If  $X$  is  $n$  times as fast as  $Y$ , then the execution time on  $Y$  is  $n$  times as long as it is on  $X$ .

improve performance  $\rightarrow$  increase performance  
improve execution time  $\rightarrow$  decrease execution time

elapsed time

system performance

$$\text{CPU Time} = \frac{\text{no. of CPU clock cycles}}{\text{clock cycle time}}$$

CPU time

CPU performance

$$\text{CPU Time} = \frac{\text{no. of CPU clock cycles}}{\text{clock rate}}$$

$$\text{clock period} \propto \frac{1}{\text{clock rate}}$$

clock period - time for a complete clock cycle

clock cycle - time for one clock period (runs at a constant rate)

clock rate - cycles per unit second

unit Hz

$$\frac{\text{no. of CPU clock cycles}}{\text{Instructions run on a program}} = I \times CPI$$

$$\frac{\text{no. of CPU clock cycles}}{\text{Instructions run on a program}} = I \times \frac{\text{Average clock cycles per instruction}}{(CPI)}$$

# CPI → determined by CPU hardware

Clock Cycles per Instruction

• Average no. of clock cycles per instruction for a program.

Each computer executes the same number of instructions for the program.

Instruction count: the no. of instructions executed by the program.

(determined by  
program, ISA  
and compiler)

CPU time = Instruction Count  $\times$  CPI  $\times$  Clock cycle time

CPU time = Instruction Count  $\times$  CPI  
Clock rate

Processor clock rate

Performance → if Registers no. increased,  
Improved then fast logic.



Fig:  
Instruction  
Format



- we take address from memory,  
not directly from memory.

## MIPS (million instructions per second)

MIPS is an instruction execution rate.

- Faster computers have a higher MIPS rating.

$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution time} \times 10^6}$$

$$\left( \frac{1. \text{count}}{\text{clock rate}} \right) = \frac{\text{Clock Rate}}{\text{CPI} \times 10^6}$$

$$1. \text{count} \times \text{CPI} \times 10^6$$

clock rate

Clock Rate

CPI  $\times 10^6$



for enhancement

enhancing memory



Cache

program need a lot of data

memory unit priority

#

## Von Neumann's Architecture (Stored Memory Concept):

Memory  $\xrightarrow{\text{Program}}$  we keep  $\rightarrow$  (Data & Instructions)  $\xrightarrow{\text{Program}}$   
 we store in main memory



Q.) Why Registers, and not MM?  $\Rightarrow$  speed

Q.) Why Registers, and not MM?  $\Rightarrow$  speed

Q.) Why Registers, and not MM?  $\Rightarrow$  speed

Control Unit handles all the processor control signals.

It directs all input & output flow, fetches code or instructions, and controlling how data moves around the system.

(increase throughput)

Parallel Processing : a class of techniques which enables the system to achieve simultaneous data-processing tasks to increase the computational speed of a computer system.

A parallel processing system carries out data-processing to achieve faster execution time.

For e.g. while an instruction is being processed in the ALU component of the CPU, the next instruction can be read from memory.



Parallel processing with the following steps:  
1. Memory reading and processing with the parallel ALU.  
2. Data selection, with the ALU & Control the parallel ALU.  
3. At the same time read the parallel ALU, parallel ALU.

## Reduced Instruction Set Computer

### Category of Processor (ISA)

## Complex Instruction Set Computer

### RISC

(single-clock)

1. RISC processors have a smaller set of instructions with few addressing modes.

2. Fixed instruction format (Uniform)

3. Pipelining

4. More registers used

5. Emphasis on software

6. Simple addressing modes

7. Hardwired control

8. Complex compiler design

9. Calculations are faster and precise

10. Execution time is very less

11. No memory unit

### Uses

image processing,  
telecommunication  
(high-end apps)

Q. Compare bet'n RISC and CISC architecture

### Category of Processor (ISA)

## Complex Instruction Set Computer

### CISC

(memory to memory)

(multi-clock)

1. CISC processors have a larger set of instructions with many addressing modes.

2. Variable instruction format.

3. B) No pipelining / (sequential)

4. Less registers

5. Emphasis on hardware

6. Complex addressing modes

7. Microcode control

8. Simple compiler design

9. Calc's are slow and precise

10. Execution time is very high.

11. Has memory unit to implement complex instructions

(low-end apps)

home automation,  
security systems.

Q. Draw the diagram of Harvard model of computers  
(Ans)

### Harvard Architecture

A computer architecture with physically separate storage and signal pathways for instructions and data.

It contrasts with the von Neumann architecture, where program instructions and data share the same memory and pathways.

Fig: A harvard architecture has a separate bus for signals to address instruction and storage (memory).



Q. Draw the diagram of von Neumann model.

Fig: A von Neumann Architecture.



G. Compare between von Neumann and Harvard architecture. (3 marks)

| V.N. Architecture                                                                  | Harvard Architecture                                                 |
|------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| 1. It uses <u>same</u> physical memory address for data and instructions.          | It uses <u>separate</u> memory addresses for data and instructions   |
| 2. Processor needs two clock cycles to execute an instruction.                     | Processor needs one cycle to complete an instruction                 |
| 3. Simple CU (Control Unit) design; cheaper and faster development of CU.          | CU for two buses is more complicated; costlier development of CU.    |
| 4. Data fetches transfers & instruction fetches cannot be performed simultaneously | Data transfers and instruction fetches can be performed concurrently |
| 5. Used in laptops, PCs                                                            | Used in microcontrollers and signal processing.                      |

$$P \propto \frac{1}{E.t.}$$

$$C.\text{period} \propto \frac{1}{C.\text{rate}}$$

1.  $\text{CPU Time} = \frac{\text{no. of CPU clock cycles}}{(\text{E.t. time})} * \text{Clock Period}$  (CCT)  
→ Clock Cycle Time

2.  $\text{CPU Time} = \frac{\text{no. of CPU clock cycles}}{\text{clock rate (in Hz)}}$

3.  $\text{no. of CPU clock cycles} = I * \text{CPI}$  (avg. c. cycles per I)  
(instructions)

4.  $\text{CPU Time} = I * \text{CPI} * \text{CCT}$

5.  $\text{CPU Time} = \frac{I * \text{CPI}}{c.\text{rate}}$

6.  $\text{MIPS} = \frac{I}{E.t. \times 10^6}$

$= \frac{I}{\frac{I * \text{CPI}}{c.\text{rate}} \times 10^6}$

$= \frac{c.\text{rate}}{\text{CPI} \times 10^6}$

Q. A processor P has an 8 GHz clock rate and a CPI of 1.0. If P executes a program within 20s, then find (i) no. of cycles (ii) no. of instructions

(i) CPU Time = 20s  
(Execution time)

$$\text{Clock rate} = 8 \times 10^9 \text{ Hz}$$

$$\text{CPI} = 1 \Rightarrow \text{Time} = 19.2 \times I = \text{Execution time}$$

(Constitutive)

$$\text{no. of CPU cycles} = I$$

$$\text{CPU Time} = \frac{\text{no. of cycles}}{\text{clock rate}}$$

$$\Rightarrow \text{no. of cycles} = 20 \times 8 \times 10^9$$

then 0.9

$$= 1.6 \times 10^{11}$$

(ii) no. of cycles =  $I \times \text{CPI}$

$$\Rightarrow 1.6 \times 10^{11} = I \times 1$$

$$\Rightarrow I = 1.6 \times 10^{11}$$

I =  
201 x 19.2  
201 x 19

201 x 19.2  
201 x 19

# IEEE Standard 754 Floating Point Numbers

(most common representation for real numbers on computers)

- Single precision ✓ (32 bits)
- Double precision (64 bits)

F.P. is a representation for non-integral numbers (including very small and very large numbers)

## # Single Precision



## # Double Precision



### Sign of Mantissa

0 → +ve numbers

1 → -ve numbers

### Biased exponent

Exponent field

represents both  
+ve and -ve exponents.

A bias is added to the  
actual exponent in order  
to get the stored exponent

### Normalised Mantissa

- consists of 1 significant digit

→ Decimal to IEEE 754 Floating Point Representation

- repeated division by 2 (whole no.)

- repeated multiplication by 2 (for fractions)

Example  
1

263.3

|       |               |             |
|-------|---------------|-------------|
| 263   | $\Rightarrow$ | 2   263 rem |
|       |               | 2   131     |
|       |               | 2   65      |
|       |               | 2   32      |
| (754) |               | 2   16   0  |
|       |               | 2   8   0   |
|       |               | 2   4   0   |
|       |               | 2   2   0   |
|       |               | 2   1   0   |
|       |               | 0   1       |

\* Q. Rem

263  
= 100000111

repeated  
divide  
by 2

\* until Quotient  
is zero

end pd

order of dividend #

|     |               | Carry M | down | fraction =?     | pd                  |
|-----|---------------|---------|------|-----------------|---------------------|
| 0.3 | $\Rightarrow$ | 2 x 0.3 | 0    |                 |                     |
|     |               | 2 x 0.6 | 1.   |                 |                     |
|     |               | 0.2 x 2 | 0    |                 |                     |
|     |               | 0.4 x 2 | 0    |                 |                     |
|     |               | 0.8 x 2 | 1    |                 |                     |
|     |               | 0.6 x 2 | 1    | error in base 2 | positive M in orig. |
|     |               | 0.2 x 2 | 0    | left fraction   | maximum sv + 0      |
|     |               | 0.4 x 2 | 0    | Mod fraction    | maximum sv - 0      |
|     |               | 0.8 x 2 | 1    | error in base 2 | maximum sv - 1      |
|     |               | 0.6 x 2 | 1    | error in base 2 |                     |
|     |               | 0       |      |                 |                     |
|     |               | 0       |      |                 |                     |
|     |               | 1       |      |                 |                     |
|     |               | 0       |      |                 |                     |
|     |               | 1       |      |                 |                     |
|     |               | 1       |      |                 |                     |
|     |               | 1       |      |                 |                     |

Binary  
 $\pm 1.xxxxxx \times 2^{yyyy}$

263.3

Step 1

10.00001110100110011001... 2<sup>8</sup> + .19 -

Significand

Step 2

1.00000110100110011001...  $\times 2^8$

original mantissa

(scientific form)

Step 3 (IEEE format)

| S     | E      | M       |
|-------|--------|---------|
| 1 bit | 8 bits | 23 bits |

mantissa follows  
the dp in

scientific  
notation

+ve,  
so add to bias

Since 263.3 is a +ve no,

sign bit  $\Rightarrow 0$

exp. bias: 127

S E M  
0 127 + 8 0000011010011001100

-ve no  $\Rightarrow$  subtract  
to bias

= 135  
(decimal)

+ve no.  $\Rightarrow$  add to  
bias

↓  
(binary)  
10000111

1101.1101101 =

IEEE representation

10100011000001010010100100100100

10100011000001010010100100100100

10100011000001010010100100100100

Example  
2

-91.6875

91

int whole

1011011

0.6875 = 1011

Fraction

2 | 91 zoom up

2 | 45 1 ↑

2 | 22 1

2 | 11 0

2 | 5 1

2 | 2 1

2 | 1 0

0 1

+ 1 0 1 0 1 1

0.6875 × 2

0.375 × 2

0.75 × 2

0.5 × 2

Carry Down

1

0

1

1

1

1

0

1

0

-91.6875

= 1011011.1011

= 1.011011011 × 2<sup>6</sup>

+ve original bias = 6  
exponent

so add to bias

IEEE standard representation

Sign ⇒ 1

since -ve original value

|      |          |    |
|------|----------|----|
| S    | 8        | 23 |
| Empt | Mantissa |    |

|   |       |                                   |
|---|-------|-----------------------------------|
| 1 | 6+127 | 0110110110 01011100 00011100 0010 |
| 1 | =133  |                                   |

↓  
01111011

10000101

Step 3

IEEE format

1100 0010 1011 0111 011

C 2 B 7 3

(-882.0)

101110

= (P2)

Convert it into hex value

C2B73D

$2^{15} \times 0.8$

$2^0 \times 1.0 \times 2$

qu. part

P2

Hexadecimal numbers in computing is used for abbreviating lengthy binary representations.

1100 0010 0000 1110 0011 1010

$2^5 \times 10.0$

C 2 F 0

$2^5 \times 298.0$

$2^1 \times 298.0$

0110 0011 1110 0011 1010

$2^5 \times 821.0$

$2^0 \times 288.0$

$2^0 \times 282.0$

$2^1 \times 248.0$

$2^0 \times 88.0$

$2^1 \times 258.0$

$2^0 \times 228.0$

$2^1 \times 202.0$

$2^0 \times 800.0$

$2^0 \times 210.0$

Example  
3

Convert  $-29.632$  into IEEE 754 floating point representation.  
Show the hexadecimal representation of your conversion.

29

whole

= 11101 ✓

0.632

fraction

Carrying down

2 | 29 (rem-up)

2 | 14. 1

2 | 7 0

2 | 3 1

2 | 1 1

0 1

$$2 \times 0.632$$

$$2 \times 0.264$$

$$2 \times 0.528$$

$$0.056 \times 2$$

$$0.112 \times 2$$

$$0.224 \times 2$$

$$0.448 \times 2$$

$$0.896 \times 2$$

$$0.792 \times 2$$

$$0.584 \times 2$$

$$0.168 \times 2$$

$$0.336 \times 2$$

$$0.672 \times 2$$

$$0.344 \times 2$$

$$0.688 \times 2$$

$$0.376 \times 2$$

$$0.752 \times 2$$

$$0.504 \times 2$$

$$0.008 \times 2$$

$$0.016 \times 2$$

$$0.032 \times 2$$

$$0.064 \times 2$$

$$0$$

Step 1

Step 2

Step 3

Step 4

Step 5

Step 6

Step 7

Step 8

Step 9

Step 10

Step 11

Step 12

Step 13

Step 14

Step 15

Step 16

Step 17

Step 18

Step 19

Step 20

Step 1  
11101.1010000111000101011

Step 2  
S.F. w base 2  
1.11011010000111000101011  $\times 2^4$

positive exponent  
so add to bias 127

Step 3  
$$\begin{array}{r} S \quad 1 \\ 1 \end{array} \quad \begin{array}{r} E \\ 2 \times (127 + 4) \\ = 131 \end{array}$$

$\downarrow$   
10000011

M 11011010000111000101011

H . 00000000

FP Bim  
1100 0001 110 1101 0000 1110 0010 1011  $\times 1.1$

C 1 E D 0 E 2 B

C1ED0E2B

(F81+1-)

281

↓

0111110

0000 0000 0000 0000 0000 0010 1111 1101

Example

4

Convert  $-0.75$

0  
whole

Binary: 0000000

$0 \times 10^{-1} = 0.75$

fraction

|                 |   |   |            |
|-----------------|---|---|------------|
| $0.75 \times 2$ | 1 | 0 | carry down |
| $0.5 \times 2$  | 1 | 1 |            |
| 1000            |   |   |            |
| 1000001         |   |   |            |

Step 1:

0000000.11

Step 2:

$1.1 \times 2^{-1}$

3 2 1 0 3 2 1 0

Step 3

S

E

M

1 (-1 + 127)

126

10000000000000000000



0111110

1011 1111 0100 0000 0000 0000 0000 0000

IEEE 754 Floating Point to its Decimal

Scientific notation  
 $1.\underbrace{xxxxxx}_m \times 2^e$

0100 0011 0101 0100 0000 0000 0000 0000  
 exponent bits

+ve sign bit

10000110 (binary)

↓ decimal

so +ve decimal

184

$$\text{bias} + \text{exponent} = 127 + 134$$

$$127 + \text{expt} = 134$$

$$\Rightarrow \text{exponent} = 7$$

$$e = 7$$

$$1.1010100000000000000 \times 2^7$$

$$= 11010100.0000000000000$$

$$m = 1 \times 2^{-1} + 1 \times 2^{-3} + 1 \times 2^{-5}$$

$$= 0.65625$$

my way

$$212.0$$

$$2 \times 212.0 (-1)^s \times (1+m) \times 2^e$$

$$2 \times 212.0 = (-1)^0 \times (1+0.65625) \times 2^7$$

$$2 \times 212.0 = 1.65625 \times 2^7$$

$$2 \times 212.0 = 212$$

$$2 \times 212.0$$

$$2 \times 212.0$$

$$2 \times 212.0$$

Example  
5

-47.523

whole

47

 $= 101111$ 

fraction

0.523

Carry-Down

2 | 47 mem-up

2 | 23 1

2 | 11 1

2 | 5 1

2 | 2 1

2 | 1 0

0 1

Invert &amp;

P.S.

$$2 \times 0.523$$

$$0.092 \times 2$$

$$0.184 \times 2$$

$$0.368 \times 2$$

$$0.736 \times 2$$

$$0.472 \times 2$$

$$0.944 \times 2$$

$$0.888 \times 2$$

$$0.776 \times 2$$

$$0.552 \times 2$$

$$0.104 \times 2$$

$$0.208 \times 2$$

$$0.416 \times 2$$

$$0.832 \times 2$$

$$0.664 \times 2$$

$$0.328 \times 2$$

$$0.656 \times 2$$

$$0.312 \times 2$$

$$0.624 \times 2$$

$$0.248 \times 2$$

Step 1

~~101111.1000010111000110101~~

Step 2

~~1.01111000010111000110101 × 2<sup>5</sup>~~

base 2

S.H.

Step 3

$$\begin{array}{r} S \\ \hline 1 \end{array} \quad \begin{array}{r} E \\ (5+127) \end{array}$$

$$= 132$$

↓

~~10000100~~

M

~~011110000101110001101~~

~~1100 0010 0011 110 0001 0111 1000 1101~~

C 2 3 E 1 6 8 D

Hexadecimal: C23E168D

## Chapter 2 Instructions: Language of the Computer

Every computer must be able to perform arithmetic.

Each MIPS arithmetic instruction performs only one operation and must always have exactly 3 variables.

(Each line contains at most one instruction unlike other programming languages.)

Fast locations for data. In MIPS, data must be in registers to perform arithmetic.

register \$zero always equals to 0.

\$at register reserved by assembler to handle language constants.

Memory holds arrays, data structures, and spilled registers.

MIPS uses byte addresses, so sequential word addresses differ by 4.

|                      |                       |                              |
|----------------------|-----------------------|------------------------------|
| 01 > 224 = 132       | 01,224,132 332        | 72d, 43,00                   |
| 01 < 224 = 128       | 01,224,132 inc        | 72d, 43,00                   |
| 01 op (224 = 128) ?i | 22,224,128 and        | 72d, 43,00                   |
| 001 + 8 + 39         | 00,224,128 add        | 72d, 43,00                   |
| left leaves to u     | 01 op (224 = 128) ?i  | 72,224,128 and               |
| 001 + 8 + 39         | 00,224,128 add        | 72d, 43,00                   |
| and, or, or equal    | 0 = 124 (824 > 224)?i | 82,224,128 and, or, or equal |
| and, or, or equal    | 0 = 124 024           | 82,224,128 and, or, or equal |
| or not of quat       | 00001 or op           | 00001 or op                  |
| or not, relative add | 0001 or op            | 0001 or op                   |
| two branching info   | 0f op, f + 39 = 00f8  | 00001 or op                  |
|                      | 0001                  |                              |

# MIPS Assembly Language

from → [MR. load] → to

| Category           | Instruction           | Example              | Meaning                                     | Comments                            |
|--------------------|-----------------------|----------------------|---------------------------------------------|-------------------------------------|
| Arithmetic         | R add                 | add \$s1, \$s2, \$s3 | \$s1 = \$s2 + \$s3                          | Three register operands             |
|                    | R subtract            | sub \$s1, \$s2, \$s3 | \$s1 = \$s2 - \$s3                          | Three register operands             |
| Data Transfer      | I* add immediate      | addi \$s1, \$s2, 20  | \$s1 = \$s2 + 20                            | Used to add constants               |
|                    | I load word           | lw \$s1, 20(\$s2)    | \$s1 = Memory[\$s2 + 20]                    | Word from memory to register        |
|                    | I store word          | sw \$s1, 20(\$s2)    | Memory[\$s2 + 20] = \$s1                    | Word from registers to memory       |
|                    | I load byte           | lb \$s1, 20(\$s2)    | \$s1 = Memory[\$s2 + 20]                    | Byte from memory to register        |
| Logical            | R and                 | and \$s1, \$s2, \$s3 | \$s1 = \$s2 & \$s3                          | Three reg. operands; bit-by-bit AND |
|                    | R or                  | or \$s1, \$s2, \$s3  | \$s1 = \$s2   \$s3                          | Three reg. operands; bit-by-bit OR  |
|                    | R and immediate       | andi \$s1, \$s2, 20  | \$s1 = \$s2 & 20                            | Bit by bit AND with constant        |
|                    | R or immediate        | ori \$s1, \$s2, 20   | \$s1 = \$s2   20                            | " " OR reg with constant            |
| Conditional Branch | R shift left          | sll \$s1, \$s2, 10   | \$s1 = \$s2 << 10                           | Shift left by constant              |
|                    | R logical             |                      |                                             |                                     |
|                    | R shift right         | srl \$s1, \$s2, 10   | \$s1 = \$s2 >> 10                           |                                     |
|                    | R logical             |                      |                                             |                                     |
| Unconditional jump | I branch on equal     | beq \$s1, \$s2, 25   | if (\$s1 == \$s2) go to PC + 4 + 100        | Equal test                          |
|                    | I branch on not equal | bne \$s1, \$s2, 25   | if (\$s1 != \$s2) go to PC + 4 + 100        | Not equal test                      |
|                    | I set on less than    | slt \$s1, \$s2, \$s3 | if (\$s2 < \$s3) \$s1 = 1;<br>else \$s1 = 0 | Compare less than;<br>for beq, bne  |
|                    | I jump                | j 2500               | go to 10000                                 | Jump to target address              |
|                    | I jump register       | jr \$ra              | go to \$ra                                  | for switch; procedure return        |
|                    | I jump and link       | jal 2500             | \$ra = PC + 4; go to 1000                   | for procedure call                  |

\* There is no sub instruction in RISC which makes it more difficult to implement.

- Design Principle 1: Simplicity favors regularity
- ⇒ Hardware for a variable no. of operands is more complicated than hardware for a fixed number.
- Translation from C to MIPS assembly language instructions is performed by the compiler. MIPS code produced by a compiler.

Example 1

C

$$a = b + c;$$
  
$$d = a - e;$$

MIPS

add a, b, c

sub d, a, e

one destination operand

two source operands

Example 2

C

$$f = (g + h) - (i + j)$$

The compiler creates a temporary variable.

MIPS

add t0, g, h

5 variables: f, g, h, i, j

add t1, i, j

\* Only one operation

per MIPS instruction.\*

sub f, t0, t1

Temporary variable t0 is created to store the sum of g and h.

Temporary variable t1 is created to store the sum of i and j.

Temporary variable f is created to store the result of the subtraction.

The size of a register in the MIPS architecture is 32 bits.

An arithmetic operations occurs only on registers in MIPS instruction; so MIPS include data transfer instructions that moves data b/w memory and registers.

The operands of Arithmetic Instructions are restricted; so they must be from a limited number of special locations built directly in hardware called registers.

(Bricks of computer construction)

- Design Principle 2: Smaller is faster
- A very large no. of registers may increase the clock cycle time simply because it takes electronic signals longer when they must travel farther.  
→ reason for the limit of 32 registers.

Effective use of registers is critical to programs performance.

32 general-purpose registers

- \$0, \$1, \$2, ..., \$31

\$26 → hardwired to 0

\$t0 - \$t9 → temp

\$s0 - s7

\$s0 - s7 → save

\$ra → return address

The processor can keep only a small amount of data in registers, but computer memory contains billions of data elements.

Hence, data structures (arrays & structures) are kept in memory.

## MIPS Memory

MIPS uses byte-addressing,

with each word representing 4 bytes.

|    |                 |
|----|-----------------|
| 0  | 32 bits of data |
| 4  | 32 bits of data |
| 8  | 32 bits of data |
| 12 | 32 bits of data |

Registers hold 32 bits of data

In MIPS, a word is 32-bits or 4-bytes.

every instruction increments by 4-bytes.

### 3 instruction formats (all 32-bits wide)

| R-format (computational) |    |    |                     |                  |        |                                                       |
|--------------------------|----|----|---------------------|------------------|--------|-------------------------------------------------------|
| 6 bits                   | 5  | 5  | 5                   | 5                | 6 bits |                                                       |
| OP                       | rs | rt | rd                  | shamt            | funct  |                                                       |
|                          |    |    |                     |                  |        |                                                       |
| 6                        | 5  | 5  | 16 bits             |                  |        |                                                       |
| OP                       | rs | rt | immediate           | constant address |        | I-format (branching, immediate loading, storing word) |
|                          |    |    |                     |                  |        |                                                       |
| 6                        |    |    |                     |                  |        |                                                       |
| OP                       |    |    | jump target address | 0                |        | J-format (jumping)                                    |
|                          |    |    |                     |                  |        |                                                       |

op - opcode → The field that denotes the operation and format of an instruction

format of instruction

multiple instructions can be stored in memory and executed sequentially

multiple instructions can be stored in memory and executed sequentially

program processes multiple programs and can also execute multiple

Load word and store word are instructions that copy words between memory and registers in the MIPS architecture.

**Load word:** data transfer instruction that copies data from memory to a register.  
(lw)

**Store word:** copies data from a register to a memory location.  
(sw)

Example

$$g = A[5] + i$$

Array [something]

lw \$t0, 20(\$s3)

lw \$t0, something \* 4 (\$s3)

5x4

Example

$$A[12] = h + A[8]$$

lw \$t0, 32(\$s3) # load word

add \$t0, \$s2, \$t0 # add word

sw \$t0, 48(\$s3) # store word

Now to implement this, we have to take address of first byte of storage - go

Data accesses are faster if data is in registers instead of memory.

Memory is slower than registers.

Registers take less time to access and have higher throughput than memory, making data in registers both faster to access and simpler to use i.e. data is more useful when in a register.

Accessing registers also uses less energy than accessing memory.

(Cost from 2 to 8) + mem: 2 qsl

### # Constant / Immediate Operands

Load instruction: 2 qsl

We've to load a constant from memory to use one.

An alternative to this is the arithmetic instruction called add immediate add i

Constant operands occur frequently, so carrying out addi is much faster and uses less energy than if constants were loaded from memory. (Design Principle 3: common case fast)

Reg. Register file

\* # MIPS support negative constants, there is no need for subi in MIPS (subtract immediate)

Numbers are kept in hardware as a series of high and low electronic signals; → base 2 numbers / binary number.

| High | Low   |
|------|-------|
| ON   | OFF   |
| 1    | 0     |
| True | false |

## Two's Complement Representation

Convention to represent Signed Binary Numbers

Step 1 : invert ( $0 \rightarrow 1$  and  $1 \rightarrow 0$ )

Step 2 : increment by 1

00000000000000000000000000000000

2914 mit 3 bits mit boole von 21 nicht, 21 mit 20000 without for  
Gesamt 30 bits

and then add 30 zeros for 30 combined on top von  
and then parallel adder parallel adder von 21 nicht

was nicht

300 00000000000000000000000000000000

D

32bit 31

## MIPS Register File

Q. Describe the internal structure of Register file with diagram (7 marks)

MIPS register-file includes 32 32-bit general purpose registers.

\* A register-file is an array of processor registers in a CPU.  
A register-file is a small set of high-speed storage cells inside CPU.

MIPS is a load-store architecture, which means that only load and store instructions can access memory.

All other instructions (add, sub, mul, and, or, etc.) must get their operands from registers and store their results in a register.

Rig: Register file



Fig: Internal Structure of Register File



Fig: Block Diagram


 $S + P = 20$ 
 $128, 128, 028, 028$ 
 $S - P = 20$ 
 $128, 128, 028, 028$ 

$$S_P + [01]_x = [11]_x$$

Q. What is stored in the Register file?

Ans: The current data

The R. File store the pieces of data that are being used by the instructions.

We must use load/store instructions to access the memory, and it is much slower and uses much more energy than the Registers file

Because it must go through memory

Registers are fast

(R) Address + w)

Memory, +, + 112

Memory, +, + bba

(1), 0, d, address bus

② ↗

(F28) 21, 042, w

2, 028, 042, 112, 8

328, 042, 028, bba, 7

((P18)A, w, address bus))

(042) 0, 042, w, a

042, 128, 028, due

## MIPS Instructions

# 10

$$x = y + z$$

# \$s0 \$s1 \$s2

MIPS format

[op dest, snc1, snc2]

add \$t0, \$s0, \$s1, \$s2

# a = b + 12

add i a, b, 12

#

$$x = y - z$$

sub \$s0, \$s1, \$s2

#

$$\zeta x[11] = x[10] + y$$

lw \$t0, 40(\$s0)

add \$t1, \$t0, \$s1

sw \$t1, 44(\$s0)

offset \* 4

load

load temp, 4 \* offset (array variable)

word

store temp, 4 \* offset (array variable)

word

store

$$f = g - A[B[4]]$$

\* array thakle load into a temp registers.

lw \$t0, 16(\$s7)

sll \$t0, \$t0, 2  
power

②

next

(lw t, offset \* 4 (B[4])  
sll t, t, power  
add t, t, bather ghor  
load contents t, 0 (t))

\* add \$t0, \$t0, \$s6

lw \$t0, 0(\$t0)

/ load contents for A(B[4])

sub \$s0, \$s1, \$t0

#

for (int i=0; i<10; i++) {

$$a = a + b; \quad A[i] = A[i-1] + B[i];$$

}

Kontext

b

m

n

o

p

q

r

s

t

u

v

w

x

y

z

g50

g70

g51

g71

12

55

13

56

14

57

15

58

16

59

17

60

18

61

19

62

20

63

21

64

22

65

23

66

24

67

25

68

ψ

↓

→

2nd step

+207

Name

step 2 = 1st step

wall

split

borrow is one digit less

to size add 2 to borrow A

no add 28) one digit

(else P)

so add 28 blot last digit.

(borrow) note

borrow is to add &amp; tend \*

.00 is available



each register is 1 word.

- A word is the size of a register (32 bits or 4 bytes)
- Registers hold 32 bits of data (1 word)

\* Last 2 bits of a word address is 00.

shifting left by  $i$  bits gives the same result as multiplying by  $2^i$

If  
if statement with a goto (conditional branches) go to statement

beg  $m_1, m_2, l_1$

branches)

done  $m_1, m_2$

(branch if equal)

(branch if not equal)

beg  $m_1, m_2, l_1$

## Lecture

#

Branch Addressing

new and older

 $PC + 4 + (\text{sign ext?} \text{ and l. shift by } 2) \text{ offset}$ 


false

#

Jump AddressTarget Addressing

Instruction in Int. Instruction Memory.

(RAM &amp;��)

bne \$t0, \$s5, Exit

underlying arithmetic op: Subtraction

# Branching form away

bne \$10, \$11, ELSE

Q. How many bits of data this offset can take?

Ans : 16-bit offset

How does  
assembler  
handle this?



Assembler imposes the instruction

# 5 Addressing Modes in MIPS

Give diagrams w appropriate example

1. Immediate Addressing

Add i \$10, \$11, 20

2. Registers Addressing

Add \$10, \$11, \$12

### 3. Base Addressing

lw \$10, 40(\$11)

sw \$10, 44(\$11)

### 4. PC-relative addressing

lne \$10, \$11, ELSE

beg \$10, \$11, ELSE

### 5. Pseudo-direct Addressing

] exit

# ARM & MIPS

→ add. modes

32 registers

→ 15 registers

## # 8lide Lecture ③

$$x = 3.538$$

$$y = 5.319$$

Find  $x+y=?$

Using IEEE FP Rep^n

|            |                  |          |                       |
|------------|------------------|----------|-----------------------|
| $x$        | $0.538 \times 2$ | <u>1</u> |                       |
| $y = 0011$ | $0.076 \times 2$ | <u>0</u> | 0.076 + 0.152 = 0.228 |
| $0.538 =$  | $0.152 \times 2$ | <u>0</u> | 0.152 + 0.304 = 0.456 |
|            | $0.304 \times 2$ | <u>0</u> |                       |
|            | $0.608 \times 2$ | <u>1</u> |                       |
|            | $0.216 \times 2$ | <u>0</u> |                       |
|            | $0.432 \times 2$ | <u>0</u> |                       |
|            | $0.864 \times 2$ | <u>1</u> |                       |
|            | $0.728 \times 2$ | <u>1</u> |                       |
|            | $0.456 \times 2$ | <u>0</u> |                       |

$$x = 0011.1000100110$$

$$x = 1.110001 \times 2^1$$

y

$$y = 0101$$

$$0.319$$

|                  |          |
|------------------|----------|
| $0.319 \times 2$ | <u>0</u> |
| $0.638 \times 2$ | <u>1</u> |
|                  | <u>0</u> |
|                  | <u>1</u> |
|                  | <u>0</u> |

$$y = 1.010101 \times 2^2$$

\* make both expt equal

$$x = 1.110001 \times 2^1 = 0.1110001 \times 2^{(2)}$$

$$y = 1.010101 \times 2^{(2)}$$

$$x+y = (0.1110001 + 1.010101) \times 2^2$$

$$= 11.000110 \times 2^2 = 10.0011011 \times 2^2$$

$$= 1000.0110 \times 2^2 = 1000.011011 \times 2^2 = 10$$

$$= 8 + 2^{-1} + 2^{-2} + 2^{-4} + 2^{-5}$$

$$\approx 8.8437510 = B$$

$$OS = FS - FO = 3$$

$$\begin{array}{r} 1.0101010 \\ 0.1110001 \\ \hline 10.0011011 \end{array}$$

$$OS = FS - FO = 3$$

$$FS = 11100101 = A$$

$$FO = FS - FO = 3$$

$$FO = FS \times 101110.1 = B$$

$$OS = FS \times 100010.1 = D$$

$$OS = FS \times 1000101.0 - Q.0 =$$

$$x = \begin{array}{r} 3 \ 5 \ A \ 2 \ 0 \end{array} - 0$$

$2 \times 100011.0 + 110 = 2 \times 100011.1 = 8$

$$y = \begin{array}{r} 5 \ 8 \ BA \ 0 \end{array} - 0$$

$2 \times 101010.1 = 0$

$$x+y = ?$$

$$x = \begin{array}{r} 00110101000100100 \end{array} - 0 = y + 8$$

$$x = \begin{array}{r} 00110101010110100 \end{array} - 001000000 =$$

$$\alpha = 01101011 = 107$$

$$E = 107 - 127 = -20$$

$$x = 1.010001 \times 2^{-20}$$

$$\begin{array}{r} 0101010 \end{array} . \begin{array}{r} 0001110 \end{array}$$

$$\begin{array}{r} 1101100 \end{array} . \begin{array}{r} 01 \end{array}$$

$$y = \begin{array}{r} 01010011101110100 \end{array} - 0$$

$$\alpha = 10100111 = 167$$

$$E = 167 - 127 = 40$$

$$y = 1.011101 \times 2^{40}$$

$$x = 1.010001 \times 2^{-20}$$

$$= 0. \underbrace{0 - 0}_{59} 1010001 \times 2^{40}$$

$$x+y = (1.0111010 \overline{1} \quad \overline{01010001}) \times 2^4$$

$$= 10111010 \overline{1} \cdot \overline{0} - \overline{01010001}$$

$$= 2^{40} + 2^{38} + 2^{37} + 2^{36} + 2^{34} + 2^{-20} + 2^{-22} + 2^{-25}$$

$$\approx 1.597 \times 10^{12}$$

$$110001.101 = \text{FFC.3}$$

$$100001.001$$

$$110001101 = \text{E8A.1}$$

(erste Zahl ist negativ, zweite Zahl ist positiv)

+ zweite Zahl vorzeichen wechselt nicht, resultierende Zahl ist negativ

(daher ist die zweite Zahl negativ)

$$\text{Zweite Zahl: } 100001001$$

$$(100001101)$$

$$\text{Addiert: } 100001001 + 100001101$$

$$100001001$$

## # FP Adder Hardware \* (Imp)

- shifting opt. done on lower exponent.

$$x = 4.517$$

$$y = 5.547$$

$$x * y = ?$$

Step 1: Find out the normalized forms of  $x$  and  $y$  (0.1 × F.P. 1)

$$\begin{array}{|c|} \hline x \\ \hline \end{array}$$

$$\begin{array}{l} 4.517 = 100.100001 \\ = 1.00100001 \times 2^2 \end{array}$$

$$\begin{array}{|c|} \hline y \\ \hline \end{array}$$

$$\begin{array}{l} 5.547 = 101.100011 \\ = 1.01100011 \times 2^2 \end{array}$$

(both the exponents are same)

\* In multiplication, don't worry about Exponent because we add them.

\* Take care of exponents in Add<sup>n</sup>, Sub<sup>n</sup>.

$$x * y = ?$$

$$\begin{array}{r} 1.00100001 \\ \times 1.01100011 \\ \hline \end{array}$$

$$\begin{array}{r} 100100001 \\ \times 10010001 \\ \hline \end{array}$$

$$\begin{array}{r} 100100001 \\ \times 10010001 \\ \hline 0000000000 \\ 0000000000 \\ 0000000000 \\ 1001000001 \\ 1001000001 \\ \hline \end{array}$$

$$\begin{array}{r} 100100001 \\ \times 10010001 \\ \hline 11001000011 \times 2^4 \\ \checkmark \end{array}$$

$$= 11001.000011000011 \times 2^0$$

"test" relevant segment

$\text{Q} = 16 + 8 + 1 + 2^{-5} + 2^{-6} + 2^{-11} + 2^{-12} = 001.001_2 = 0.1$

$x = 16 + 8 + 1 + 2^{-5} + 2^{-6} + 2^{-11} + 2^{-12} = 25.047$

floating point representation notes

#  $x = 5A2C0 - 0$  floating point number 4.7 fixed point 508,000 8.000

$y = 3B5A0 - 0$  floating point number 2.8 floating point 158.000 8.000

$x * y = ?$  floating point number 9.7 floating point 98.000 8.000

\* Registers file is designed to hold integers values



Add. S \$f<sub>0</sub>, \$f<sub>1</sub>, \$f<sub>2</sub>

(single point operation)

Add. d \$f<sub>0</sub>, \$f<sub>1</sub>, \$f<sub>2</sub>

Add \$t<sub>0</sub>, \$t<sub>1</sub>, \$t<sub>2</sub>

Add. S \$f<sub>0</sub>, \$t<sub>0</sub>, \$f<sub>2</sub> X (Wrong)

floating point + integer evaluate later,

you must convert

\$f<sub>2</sub> = 5.04

+ 5 (integer)

\$t<sub>0</sub> = 5

5.00

\* Always convert Integers to F. Point.

## Memory Transfer Inst

l.s \$f0,100(\$t2); load word into \$f0 from \$t2 + 100  
 s.s \$f0,100(\$t2); store word from \$f0 into \$t2 + 100

## Data movement betn Registers

mov.s \$f0,\$t2 ; move betn F.P. registers conversion  
 mfs \$t2,\$f2 ; move from FP register (no conversion)

mfc1 \$t2,\$f2 ; move to FP register (no conversion)

## Data Conversion

cvt.w.s \$f2,\$f4 ; convert from single precision to integer

✓ original FP to Integer

cvt.s.w \$f2,\$f4 ; convert from Integer to single precision FP

W3 integer

5.1,5.1,5.1,5.1

2.32 32 3.32  
 (padding to 32 bits)

ptp,\$f2,\$f4 2.32 3.32

Convert X 2.32,3.32 2.32

float 32 bits register, it may result

overflow, underflow

W3 float

Convert float of register format should be

$$\begin{aligned}
 \$f_0 &\quad \$S_0 \quad \$Z \\
 (\text{float})x &= (\text{int})y + (\text{float})@z; \\
 (\text{int})y &= (\text{float})x + (\text{float})z; \\
 (\text{int})z &= (\text{float})x + (\text{float})y;
 \end{aligned}$$

Datapath  $\rightarrow$  functionalities & connectivity

(all those instructions executed over the datapath)

parity addition & opcodes  $(R, I, J)$

- fetch
- decode } 3 stages in the lifecycle of
- execute } an instruction



instruction / instruction



address  
provided by  
program counter  
inside the memory

Fig. Datapath



P Counter is the first  
in Datapath



AND \$10, \$11, \$12  
Note: except shifting operation

to deposit out in register & spaces  
operations seen in slides.

Computer architecture

Notes

# Chapter 4  
The Processor

Datapath is inside the processor

Program Counter need to fetch instruction from Instruction Memory.

→ (provides the address of 2M)

\* ALU is not needed for J-type instructions.

• Can't just join wires together. Use multiplexers

Add \$8, \$8

R → Add \$5, \$6, \$7

Addi \$5, \$6, 20

lw \$5, 40(\$6)

R → sub \$5, \$6, \$7

- two inputs, one of them selected based on operation



R → AND \$5, \$6, \$7

For these 3 R-type operations, they perform diff. operations.

SD State element

→ (flip flop, register)

SD state, state preserve kore rakhte pare unless asked to change.

(ALU is combinational so cannot hold data)

Single Cycle Datapath

- we set clock cycle by the load req'd.



• longest delay determines clock period

all instructions  
within this range

\* [load takes the longest]  
instruction  
for finishing execution

### \* Building a datapath

Datapath: Elements that process data and addresses in the CPU

First component needed for instruction fetch: Program Counter

- during decode, we understand the type & what the instruction is.

Add \$5, \$6, \$7



# Single cycle datapath

Read op  $\Rightarrow$  no need of signal

Write op  $\Rightarrow$  we need write op signal

ALU needs signal to know to perform what operation.

This datapath can execute almost all types of R-instructions, but not SLL.

Sub \$5, \$6, \$7  
Add \$5, \$6, \$7  
AND \$5, \$6, \$7

(more extended)

lw \$4, 40(\$5); sw \$6, 48(\$5)



I-type instruction

\* (backlink and Reg Write  
theakbera, see pic) \*

MEM write, not MEM read

For shift, we need a diff. datapath.

Designing a datapath for executing

#  $bne \$10, \$11, \text{ELSE}$



#

J Exit



26bit

4bit

26-bit

100

PC



# Design datapath for

Add i \$10, \$20, 100



\* datapath with control (Slide)

→ imp for finals

1. ADD  
2. Sub  
3. lw } R.F. used by all these instrn

one clock cycle → how much time load needs

|                | 1 | 2 | 3 | 4 | 5  | 6  | 7 | clock cycle |
|----------------|---|---|---|---|----|----|---|-------------|
| I <sub>1</sub> | F | D | E | M | WB |    |   |             |
| I <sub>2</sub> | F | D | O | E | M  | WB |   |             |

Instruction 1

2

.

.

|                | 1 | 2 | 3 | 4 | 5  | 6 | 7 | 8 | 9 |
|----------------|---|---|---|---|----|---|---|---|---|
| I <sub>1</sub> | F | D | E | M | WB |   |   |   |   |
| I <sub>2</sub> | F | D | E | M | WB |   |   |   |   |
| I <sub>3</sub> | F | D | E | M | WB |   |   |   |   |
| I <sub>4</sub> | F | D | E | M | WB |   |   |   |   |
| I <sub>5</sub> | F | D | E | M | WB |   |   |   |   |

Pipeline improves the

performance of

system and throughput

uses less width + less width

$A[\text{offset}]$

$$A[5] = A[6] + A[B[4]] + 5;$$
$$B[i] = A[5] - 6;$$

$$A = \$10$$
$$B = \$11$$
$$i = \$8$$

- $lw \$t0, 24(\$s10)$  //  $A[6]$
- $lw \$t1, 16(\$s11)$  //  $B[4]$
- $sl \$t1, \$t1, 2$  // shift 2 bits
- $add \$t1, \$t1, \$s10$
- $lw \$t1, 0(\$t1)$
- $add \$t2, \$t0, \$t1$  //  $A(6) + A(B(4))$
- $addi \$t2, \$t2, 5$  //  $A(6) + A(B(4)) + 5$
- $sw \$t2, 20(\$s10)$  //  $A[5] \dots \dots$
- ~~addi \$t~~
- $lw \$t0, 20(\$s10)$
- $addi \$t0, \$t0, -6$

## Process Synchronization



Bounded Buffer / Producers Consumers Problem



We use Semaphore to solve this

① Mutual Exclusion

S : 0 or 1 (mutex / Binary) [C.S]

E : size of the buffers [how many empty slots]

F : 0 [no. of filled slots]

Quiz 2 - 28<sup>th</sup>

Threading, CPU Scheduling

Create a class  
that will extend thread

Quiz 03 Deadlock, Semaphore

myclass

thread  
(overide)

concurrently

myclass.run(); // myclass.start();

Thread.currentThread.getId();

Get current thread.

sleep → synchronize  
(idle time like okhi)  
try catch  
button middle  
work

public

start;

synchronized

d. Thread T1, T2

Method { run model(); → gah, I'm in r. mode / sleep(1000)  
wake up(); → Emmm got up from sleep

Output:

## Computer Organization and Design

D.A. Patterson

5th edition, Morgan Kaufmann

## • DLD

- Simplifications (not k-map; use Boolean Algebra)

## • MUX (multiplexers design)

## • ALU (arithmetic logic unit)

## • Adder

## • Registers / Shift registers

## • Internal Structure of Register File

## 32-bit Architecture

$$x = \underbrace{y}_{\$10} + \underbrace{z}_{\$11} + \underbrace{w}_{\$12}$$

Add \\$10, \\$11, \\$12 - R

D S<sub>1</sub> S<sub>2</sub>

| OP    | R <sub>s</sub> | R <sub>t</sub> | R <sub>d</sub> | Shft Fun       |
|-------|----------------|----------------|----------------|----------------|
| 00000 | 01011          | 01100          | 01010          | 00000   xxxx++ |

feed this and this is what the processor will understand

- convert into assembly instructions and identify types (R, I, J)
- each form has diff types of encoding
- all the instructions will be 32-bits.

38A + 58B + 38C + 38D + 38E + 38F = H

38A + 58B + 38C + 38D + 38E + 38F - P = E

(38A) (58B) (38C) (38D) (38E) (38F) = E

(38A + 58B) . (38C + 58D) . (38E + 58F) =

(38A + 58B) . (38C + 58D) =

$$Y = \overline{A \cdot B \cdot C} (A \cdot B + A \cdot C)$$

| A | B | C | $A \cdot B \cdot C$ | $\overline{A \cdot B \cdot C}$ | $A \cdot B$ | $A \cdot C$ | $A \cdot B + A \cdot C$ | $Y$                                |
|---|---|---|---------------------|--------------------------------|-------------|-------------|-------------------------|------------------------------------|
| 0 | 0 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 0 | 0 | 1 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 0 | 1 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 0 | 1 | 1 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 1 | 0 | 0 | 0                   | 1                              | 0           | 0           | 0                       | 0                                  |
| 1 | 0 | 1 | 0                   | 1                              | 0           | 1           | 1                       | $\rightarrow (A \cdot B \cdot C')$ |
| 1 | 1 | 0 | 0                   | 1                              | 1           | 0           | 1                       | $\rightarrow (A \cdot B \cdot C')$ |
| 1 | 1 | 1 | 1                   | 0                              | 1           | 1           | 1                       | 0                                  |



$$\bar{Y} = A'B'C' + A'B'C + ABC' + A'BC + AB'C + ABC$$

$$\bar{Y} = Y = \overline{A'B'C'} + \overline{A'B'C} + \overline{ABC'} + \overline{A'BC} + \overline{AB'C} + \overline{ABC}$$

(product of sum)  
pos

$$\Rightarrow Y = \overline{(A'B'C')} \overline{(A'B'C)} \overline{(ABC')} \overline{(A'BC)} \overline{(AB'C)} \overline{(ABC)}$$

$$= (A+B+C) \cdot (A+B+C') \cdot (A'+B+C) + (A+B+C') \cdot (A'+B+C')$$

$$(A'+B+C) \cdot (A'+B'+C')$$

# If S

Then  $y := B$

ELSE  $y := A$



$$\bar{D}B A + \bar{D}A = B$$

$$D B + \bar{D} A = A$$

$$D B A + \bar{D} B \bar{A} = B C$$

Encoded term

in standard form S

$S = D B + \bar{D} A$

or  $D B + \bar{D} A + B C$

8 muddled  
stage.



PLA → Programmable Logic Array

Functionality - Boolean expressions



$$y_1 = A\bar{B} + \bar{A}B\bar{C}$$

$$y_2 = A\bar{B} + A\bar{B}C$$

$$y_3 = \bar{A}B\bar{C} + ABC$$



$$y_1 = A\bar{B} + \bar{A}B\bar{C}$$

$$y_2 = A\bar{B} + ABC$$

$$y_3 = AB' + B'C' + A$$

$$y_1 = \underline{\underline{ABC}} + \underline{\underline{ABC}}$$

$$y_2 = \underline{\underline{ABC'}} + \underline{\underline{ABC}}$$

$$y_3 = \underline{\underline{AB'C'}} + \underline{\underline{AB'C}}$$

6 AND  
3 OR

PLA is always  
one circuit

regardless of how many expressions,  
PLA is one circuit.

#

RCA

Ripple Adder (full adder)



Ripple Carry Adder  
(RCA)

not efficient

It is called a ripple carry adder because each carry-bit gets rippled into the next stage.

The carry has to propagate from LSB to MSB.



### Conditional Sum Adder / Fast Adder



$$A - B = A + \bar{B} + 1$$

$A + B$

Borrow



00: Add/Sub

01: A and B

10: A or B

ALU - Arithmetic Logic Unit  
(2 mux)



To build the register, we need flipflops.

## # Registers



32-bit register

- When we read, we're to read the entire registers.

\* exception: 1b (allows to access first 8-bit of the registers)  
byte load byte

- 1b → load byte (allows to access the first 8 bits of the register)

otherwise, we've to take /read values for the entire register

|           |         |
|-----------|---------|
| 1 0 0     | 1 0 0 0 |
| 1 0 0 0   | 0 1 0 0 |
| 1 0 0 0 0 | 0 0 1 0 |

left shift

L.S.

$x * 2^i$

right shift

R.S.

$x / 2^i$

i → no. of bit shift

left shift operation (multi)

R.S.

$x / 2^i$

right shift operation (div)

our notation

other friend

current bitwidth

MSB  
(bit 31)

(32 bits wide)

LSB  
(bit 0)

31 30 29 ... L ← R 28 27 26 25 24 23 22 21 20

3 bits latched = 11

i starts at 0 and increases from R to L

Q2 Q3 - 53

$$x = 3y + 5z$$

$$y = 4x + 3z$$

$$z = 5x - 6y$$

Add  $\$10, \$11, \$12$  — R type ins

mult  $\$11, \$12$  — R type ins

Result  $[L_0, R_1]$

### Designing Right Shift Registers



Right Shift Register

Working of RSR

F 00 : Feedback

random no.

R 01 : Reset

coming from some  
external source

10 : Set

11 : External data

82J

(Shift right 2S)

82M

(0 fid)

(IE fid)



Left Shift Registers

|   |                    |
|---|--------------------|
| 0 | 00 : External data |
| 1 | 01 : Set           |
| 2 | 10 : Reset         |
| 3 | 11 : FB            |



See:  
next page



00 : LS

11 : RS

01 : Reset

10 : External data

Chapter 1

- Moore's Law

- API

- Interface

- Parallel Processing



Application Software is written in high-level language.

Controller

Control Unit responsible for allowing CPU to carry out operations.

- datapath: components & path connecting CPU

Abstraction - hiding the detail from user

Computer Architecture

- designing, selecting, establishing interconnection
- interface to achieve some goals like functionalities.



set of instructions { Add, sub, br,  
beq, S11, }

$$x = y + z$$

$\downarrow \quad \downarrow \quad \downarrow$

$\$10 \quad \$11 \quad \$12$

{ MIPS }

Add \$10, \$11, \$12  
 $\downarrow \quad \downarrow$   
Destination Register  
 $S_1 \quad S_2$

\* definitionen schatte diagramm.

memory tends to get smaller higher-up. the memory hierarchy.



• multiprocessors share a RAM / memory.



We need trade-offs when considering performance.  
• Response time and throughput.

### Relative performance

$$\frac{P_x}{P_y} = \frac{E_y}{E_x} = n$$

$P \rightarrow$  performance

$E \rightarrow$  execution time

standard for fastest with slowest times

$x$  is  $n$  times faster than  $y$



Q. If program has 8 clock cycles, then,  
CPU time?

Ans: CPU Time = 25 s

for comp B; we need higher clocks i.e. higher GHz. \$ data  
\$8\$ ~ 28

| T <sub>1</sub> | T <sub>2</sub> | T <sub>3</sub> | T <sub>4</sub> | T <sub>5</sub> | $R_E = 3 + 4 \times 2 + \text{others waiting}$ |
|----------------|----------------|----------------|----------------|----------------|------------------------------------------------|
| 1s             | 1s             | 1s             | 1s             | 1s             | $(3+4) - (1+2) = 6 \text{ cycles}$             |

so program takes 5 seconds to complete



\$8\$ / \$28\$ \*

RISC vs. CISC

longer programs → \$8\$ ~ \$28

instructions in bus

→ fast memory → \$8\$ ~ \$28

Computer Language

Machine Language → Assembly Instruction

- identify the registers which store the values of variable



(Aug 2019 notes)

$\$t_0 \sim \$t_7$  → symbols represent base offset arguments  
 $\$S_0 \sim \$S_7$

$$S \rightarrow f = (g + h) - (l + j)$$

position offsets  $+ 1 \times 4 = L$



\*  $\$0 / \$2000$

\*  $\$ra / \$31$  → give return address to main func

$\$sp$  → points to stack

$\$fp$  → frame-pointer points to stack

$\$a_0 \sim \$a_3$  → argument registers used in assembly instruction

$\$v_0 \sim \$v_1$  → return func



load (memory  $\rightarrow$  register)

store (register  $\rightarrow$  memory)



$$\boxed{\begin{array}{ccc} \text{store} & & \text{load} \\ A[10] = y + A[6] & & \end{array}}$$

• memory is byte-addressed

Processor operations mostly involve processing data.

To speed up the processor operations, the processor includes some internal memory storage locations, called registers.

The registers store data elements for processing without having to access the memory.

A limited no. of registers are built into the processor chip.

Registers located on (processor) chip (or the board)  
Registers are fast (or the fastest)

## # Registers vs. Memory

- Load and store operations are Memory Operations.

32 registers with each 32-bit.

Some of these are reserved.

T-series (Temporary value) } available to us

S-series (Save value)

$$\text{load} \quad \text{store}$$

$$[0]A + B = [01]A$$

- spill

- optimization

\$10      \$11      \$12

$$x = y + 2;$$

$$x = y + 10;$$

Add \$10, \$11, \$12  $\rightarrow$  R-type ✓

Add i \$10, \$11, (10)  $\rightarrow$  I-type ✓

load, store,  
immediate

$$x = y - 10;$$

X Subi \$10, \$11, 10 (doesn't exist; No subtract immediate)

Instruction; Just use a negative

Addi \$10, \$11, -10 constant)

Please  
Note:

Registers are processor components that hold data and address.

$$2^6 = 32$$

Fig: Registers file



Source A 32 - 10

ALUOP

32

Add \$10, \$11, \$12

ALU

32

D

Rs

Rt

$$2^9 = 128$$

- Registers size on same operation by

Size



(no. of register

changed,

bit-length changed)

- \* Better to encode numbers rather than names.

| OP     | Rs    | Rt    | offset    |
|--------|-------|-------|-----------|
| 100110 | 01011 | 01010 | 0-0101000 |

I-format

lw, \$10, (\$40(\$11))  
load  
D offset

lw \$10, 40(\$11)  
load

sw \$12, 44(\$11)  
store

sw \$12, 44(\$11)  
store

offset \* 4

Address where the first element of array stays is the base element.

Base Address

Address where the array's first element resides.