

# Computer Architecture

Moore's law: Observation (Pseudo law) that number of transistors in an IC doubles every 2 years.



Processing power increased by 50% every year until 2002, then 20% because of heating (max power consumption limit).

## Hierarchy of Languages



## Von Neumann Architecture (Stored program concept)

5 components





### Secondary Memory

Magnetic disks, optical (CD/DVD), tape, flash-based  
USB pendrives, CF cards, floppies

### IC Technology

#### IC cost

$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Area of wafer}}{\text{Area of die}}$$

$$\text{Yield} = \frac{1}{[1 + (\text{Defects per area} \times \text{Area of die})]^2}$$

no. of transistors/chip increased

SOI, mSi, LSi, vLSI, uLSI  
small scale Integration      ultra large

unit cost of chip  
decreases with volume  
of production

→ Physical dimension limit  
→ Power consumption limit.

for sinks heat      freq of clk  
reduced base on usage

### Instruction Set design

Instruction set → interface between hardware and software.

↓  
defines a machine

Processor →  
 1) Fetches instruction from memory  
 2) Executes instruction from memory } loop

Eg:  $a := b + c$

a, b, c → operands

a addition (+) → operation

MIPS → 32 registers      each of 32 bits (word)

2 bytes → half word      4 bytes → word



# MIPS Instructions

1) add <res>, <op1>, <op2>  
Eg: add \$S3, \$S2, \$S1

2) addi <res>, <op1>, <const>

Eg: addi \$S2, \$S1, 100

3) lw <dst-reg>, <offset>(<base-reg>)

Eg: \$S1, 4(\$S0), 850 stores an address

4) sw <offset>(<base-reg>) <src-reg>

Eg: 12(\$S0), \$S5  
L must be multiple of 4  
multiple

5) sub

6) mul R0 Rs Rt

7) muli

8) sll (<<)

9) srl (>>)

10) and, andi (&)

11) or, ori (|)

12) nor (~not)

13) beq <reg1>, <reg2>, <branch-lgt>

(branch if equal to)

14) bne <reg1> <reg2> (branch if not equal)

15) j <jumptarget>

label in assembly  
offset in machine

16) slt <dst>, <reg1>, <reg2>  
(set less than) if reg1 < reg2 → dst = 1, else 0

17) slti

18) addu, addiu (unsigned addition), subu, sllu, slli

19) mult Rs Rt → need to use HI, LO, multu

20) MFLI, MFLO

div → R Q  
mul → higher 32 lower 32

21) jal, jalr, jtr, jr

v-return  
a-args  
sp-stack  
ptr  
gp-global  
ptr  
k-kernel  
ra-return  
address  
gp, sp, fp, ra

MIPS registers  
\$a0-\$a3, \$v0-\$v3 (\$1, 2, 3, 4-7)  
\$s0-\$s7 are 16-23 registers  
\$zero → 0th register (always 0)  
\$t0-\$t7 (tempregs) (8-15)  
\$t8, \$t9 (24, 25), \$t10-\$t13 (26, 27)  
\$t14-\$t17 (28, 29, 30, 31)

## Program Counter (PC)

L 32-bit reg pointing to curr ins.  
PC+4 (if no special instruction  
to jump)  
4 bits in 32 reg

## Pseudo instructions

move, li, la, bge, bgt, ble

lw can be a Pseudo inst if used as  
lw \$S1 LABEL

BGTZ, BGEZ, BLEZ, BLTZ, BNEZ  
if Z=0

## Load and Stores

LB, LBU, LH, LHU, LW

↑  
[ Unsigned ]

the MSB is extended for the higher bits

LSB, SB, SH

Eg: MFLI \$S2, \$S2 = \$HI



Week - 2

## Instruction Encoding

32 bit encoding

3 types

I

1) R-type (reg-reg operations)

|        |    |    |    |       |       |
|--------|----|----|----|-------|-------|
| opcode | rs | rt | rd | shamt | funct |
| 6      | 5  | 5  | 5  | 5     | 6     |

2) I-type :- loads, stores, add, multi, etc., bne, beq, j, jr

|        |    |    |                                |
|--------|----|----|--------------------------------|
| opcode | rs | rt | imm / const / offset rel to PC |
| 6      | 5  | 5  | 16                             |

3) J-type

|        |                      |
|--------|----------------------|
| opcode | offset rel to PC + 4 |
| 6      | 26                   |

j, jl, trap and return



## Function Call Support

jal → jump and link

jal F-LBL

↳ \$ra = PC+4

↳ One of MIPS reg  
↳ control to F-LBL

\$a0, \$a1, \$a2, \$a3 → args

\$v0, \$v1 → return

In the function T

:

:

jr \$ra

↳ Jump reg.

Nested calls → Use stack type memory

Memory (Stack goes down)



- Caller saved <sup>↑ unpreserved</sup> regs: \$a, \$v, \$at → saved by caller if cares
- Callee saved <sup>↑ preserved</sup> \$ra, \$s, \$t → must be saved restored by callee

Frame pointer (\$fp - MIPS32) + easier use



← \$FP → easier to use (offset)

well wrt fp → ref will be same

but sp → changes with change in sp

and easier to res' pop the stack

## Memory Organisation



↳ \$gp → points to global vars in data section

↳ \$apc (program memory)

## Pseudo Instructions

li - addi/ori

beqz - beq \$0 \$zero

bnez - bne \$zero

• Half words/bytes

lb, lh, lbu, lhu

sb, sh, but no sbu, shu

lb → sign extends the MSB, i.e. if  $1001001 \rightarrow$   $\underbrace{111}_{24} \underbrace{11001001}$

lbu → appends 0's →  $\underbrace{0000}_{24} 01001001$

\$at → assembler temporary → manages pseudo instructions

Linker:

HLL code to Running

Assembly Language (.S) → Object files (.O)

GNU <sup>gcc</sup> <sub>gt</sub> → compiler, assembler, linker

Machine language (.Object) → executable

Linker loaded → run  
Loading → brings program from secondary memory to main memory and ready for execution

Contents of Object file

1. Header

2. Text Segment

3. Static data

4. Relocation Table (list of unresolved inst's) / off

5. Symbol Table (table of unresolved symbols)

→ where vars are loaded into memory

6. Debugging info

Linker → generate exe files

- Organize text and data as it would appear in memory
- Resolve symbols (memory allocated)
- Rewrite instns referred in reloc table

No Order

- Place text and data in mem
- Init stack, other reg., including reg

→ main

Dynamic Linking

of loads

- Small exe files better memory
- new libs can be used easily
- $\text{fptr} \rightarrow$  function pointers supported

main.s → aux.s

data Y

data X

main:

F:

jal F - I<sub>1</sub>

Jal G

lw \$50x - I<sub>2</sub>

Lw \$51x - I<sub>3</sub>

G:

Sw \$4 \$53 - T

Lw \$51, Y - T

Reloc Table

I<sub>1</sub>, I<sub>2</sub>, I<sub>3</sub>

(no X) (no Y)

addresses of v

in main mem

not determined

Reloc Table

I<sub>1</sub>, I<sub>2</sub>, I<sub>3</sub>

(no X) (no Y)

Symbol Table

F, G, Y

main.f

Boot loader loads OS

OS loader

calls loader



## Arithmetic In MIPS

MIPS → ~~uses~~ 2's complement

unsigned representations.

add → addu, addi → addiu, sub → subu, slt → sltu, slti → sltiu,  
addu, addi → separate ~~if~~ branch conditions

mult, multu → hi, lo

Upper 32 Lower 32

div, divu → hi, lo

remainder Quotient

hi, lo → not part of 32 bits

mfhi <destreg> (moves from hi to destreg)

mflo

## Floating Point representation

• Sign, significand, exponent

↳ Single Precision →  $1 + 8 + 23$  significand

↳ Double Precision →  $1 + 11 + 52$  significand

Subnormal numbers → NAN

Arith

floating point

↳ represent out of bounds

add.s, add.d, sub.s, sub.d, mul.s/d, div.s/d

single prec double prec

Cmp eq, ne, lt, le, gt, ge

c.eq.s, c.c.le.d.

12 cmp instructions

sets a special bit to 1 if true

bcl t, bclf → next instruction

↳ takes branch

if

doesn't take branch

true  
tif  
false

tif

coprocessor

used to do FPA

lwcl, ldcl, swcl, sdcl

double

## Dynamic linking

↳ Instruction (function call) points to a dynamic linker  
that resolves the instruction  
jahr → the pointer to func is loaded in reg and  
jumped to add in reg



MIPS → ~~32~~ 32 × (32 bit regs) For FP

can be considered

16 × (64 bit regs) for double precision

### Memory operations

lwct, ldc1, swct, sdct

## Computer Performance

### → Execution Time

#### ① Computer Performance equation

$$\text{Prog. exec. time} = \# \text{instructions} \times [\# \text{cycles/instr}] \times \text{clock-cycle time}$$

can be got from only no. of exec programs

- Profiling
- simulators
- h/w counters

can be easily calc from calc from instruction mix clk freq

$$CPI_{avg} = \sum CPI \times f_i$$

#### Factors affecting performance

Types: arithmetic  
memory  
branch

#### Factor      Affected aspects

Algorithm      #ins, CPI

HLL      #ins, CPI

Compiler      #ins, CPI

ISA      #ins, CPI, cycle time

H/W impl      CPI, cycle time

Moore's Law: Performance improvement is limited by the fraction of program you are improving



## Benchmark, Workload ?

and computer performance

- ↳ a specific program that measures the CPI analysis
- SPEC Benchmarks → System Performance Evaluation Corporation
- SPEC-2000 → glibc, gcc
- ↳ collection of programs

## Hardware Implementation of MIPS

Sequential circuit



Seq + comb



Setup time (s) :- Time before active edge for which inputs must be held constant

Hold time (h) :- The min time inputs to be held const. after active edge

Delay time (d) :- time for seq circuit

Then  $C > (s+h+d)$

↳ clock cycle time

## Data and Control Path

↳ State element  
↳ Combinational  
↳ Interlock

↳ when to write  
↳ what to write  
↳ state transition

## Single Cycle Implementation

↳ CPI = 1

Subset of instructions ↳ add, sub, and, or, sll (R-type),  
↳ lw, sw (I-type)

↳ beq (I-type)

↳ j (J-type)

↳ reg-reg + opcode - 000000 differentiated by func

- Fetch (inst)
- Execute ↳ (1) read register  
(2) Instruction specific  
(3)  $PC = PC + 4$

## Fetching of Instruction



Sequential!

LW is only instruction, where only one reg is mod  
Instruction Execution



RegWR → Bit to determine if a register is needed to be written

## ALU: Arithmetic Logic Unit



## Data Memory



## PC computation



For beg exec :





11 bit Control lines in the above diagram

- **RegWr** (Should Reg be written)
- **Reg Dst** (Rd vs Rt)
- **Mem2 Reg** (ALUout vs MemOut?)
- **ALUSrc** (Imm vs Reg2 Val)
- **ALU Op(4)** (Type of ALU operations)
- **MemRd** (Mem be Read?)
- **MemWr** (Mem be Written?)
- **Branch** (Branch?)



To ~~use~~ reason for 2 control units to generate 1bit ctrl lines faster to reduce clk cycle

|         | RegDst | RegWr | PC Src<br>Registers | MemRd | Memory | MemWr | MemReg  | Branch ALUOp |
|---------|--------|-------|---------------------|-------|--------|-------|---------|--------------|
| Reg-Reg | Rd(0)  | 1     | 0                   | 0     | 0      | 0     | 0       | 10           |
| LW      | Rt(1)  | 1     | (SignExtend)        | 1     | 0      | 1     | Branch! | 00           |
| SW      | x      | 0     | 1                   | 0     | 1      | x     | 0       | 00           |
| beq     | x      | 0     | 0                   | 0     | 0      | x     | 1       | 01           |
|         |        |       |                     |       |        | x     | x       | x            |

PC Out → 10 → Depend on funct  
 Auxiliary → 00 → Add  
 → 01 → Subtract



jump(jr)



|    | RegDst | RegWr | PC Src | MemRd | Memory | MemWr | MemReg | Branch ALUOp | j jr |
|----|--------|-------|--------|-------|--------|-------|--------|--------------|------|
| JR | x      | 0     | x      | 0     | 1      | 0     | x      | x            | x    |

Alternate DP, CP, Modif to J, JR



The Cycle should be level triggered since there are dependencies.

PC-write is triggered with latter half of cycle to prevent PC writing multiple times in same clk cycle

Reg-write also on latter half  
Number also on latter half (efficiency improved only)

→ CLK cycle adjusted for longest instruction

Inefficiencies

→ Cost (lot of hardware)