

IT

Performance and NumericalChapter - 1 (Patterson)★ (1.6 Performance)1.1) Introduction:

Traditional classes of computer appl'n

1) PC (Personal Computer)

comp used under another  
device,

2) Servers

3) Supercomputer, Embedded computers.

Decimal term

Binary form.

Kilobyte KB  $10^3$  $2^{10}$ megabyte MB  $10^6$  $2^{20}$ Gigabyte GB  $10^9$  $2^{30}$ Terabyte TB  $10^{12}$  $2^{40}$ 

## \* 8 great ideas in computer design

1) Design for Moore's law.

2) Use abstraction to simplify design.

3) Make common case fast - enhance performance

4) Performance via parallelism - oper in parallel.

5) Performance via pipelining

6) Performance via prediction

7) Hierarchy of memory. - cheap at bottom level

8) Dependability via redundancy. - keeping backup.



| M         | T | W | T | F | S | S     |
|-----------|---|---|---|---|---|-------|
| Page No.: |   |   |   |   |   |       |
| Date:     |   |   |   |   |   | YOUVA |

Integrated circuit - also called chips - they do the advance technology.

Processor - operates at home clock rate.  
- also called as CPU.

5 part of computer - Input - Memory  
- Output - Data flow path  
- Control

Processor has two part, data path, control.

data path - performs the arithmetic opern.  
control - it tells datapath, memory etc. what to do.

Memory is build from DRAM chips. RAM provide access to any location in memory.

Processor has one more type of memory - Cache memory

Cache memory - consists of small, fast memory that acts as buffer for DRAM.

Cache is build using SRAM - static RAM) it is faster but less dense & is expensive.

#) Abstraction interface between Hardware and lowest-level software is Instruction Set Architecture.

it is all that programmer needs to know to run and write machine language programs.

- Page No. \_\_\_\_\_  
Date. \_\_\_\_\_ Yousuf
- \* Application Binary interface (ABI) - instruction set plus the operating system interface used by programmers is (ABI)
  - \*) each lower level hides details from above level called abstraction.
  - One key interface between levels of abstraction is IEN.

#] Volatile memory — Hold data when power prest.

Non volatile memory — Store data.

Primary memory — Volatile. — DRAM's

Secondary memory — Non Volatile.

↓  
magnetic disk, flash memory

#] Performance

Performance metrics are many like.

Response time also called Execution time.

Exn time. — total time required for computer to complete a task

Throughput: - Bandwidth = no of task completed per unit time.

To maximize performance we want to minimize the execution time.

of computer X.

$$\leftarrow \boxed{\text{Performance}_X = \frac{1}{\text{Execution time}_X}}$$

Performance of X > Performance of Y

$\therefore \text{Performance}_X > \text{Performance}_Y$

Execution time Y > Execution time X.

\* X is n times faster than Y

Exn time of Y is n times that of X.

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Exn time}_Y}{\text{Exn time}_X} = n.$$

Q Computer A run in 10s.  
Computer B run in 15 sec.

How much A is faster than B

$$\frac{\text{Perf A}}{\text{Perf B}} = \frac{\text{Exe time B}}{\text{Exe time A}}$$

$$= \frac{15}{10}$$

$$\text{Perf A} = 1.5 \text{ Perf B}$$

we can also say, computer B is 1.5 times slower than that of A.

#) improve performance or improve Exe time is to decrease execution time, increase performance.

wall clock time = exe time = response time.

a) Computer tries to optimise throughput.

#) CPU Execution time.

also called CPU time - actual time CPU spends in performing a task.

- User CPU time - CPU time spent in program
- System CPU time - CPU time spent in OS.

System performance = time spent on system.

CPU performance = user time spent.

- \* Computers are controlled using clock, that determines when event take place in Hardware.

These discrete time interval is called as  
clock cycle.

→ (clock cycles, clock ticks,  
clock periods.)  
each.

Clock period = length of clock cycle. ex.  
= clock cycle time, (250 picoseconds)

Clock period  $\Rightarrow$  Clock rate. (ex 4 GHz)

$$\boxed{\text{Clock rate} = \frac{1}{\text{Clock period}}}$$

- \*  $\boxed{\text{CPU exec time} = \text{CPU clock cycle for a program} \times \text{Clock cycle time for a program}}$

$$\boxed{\text{CPU exec time} = \frac{\text{CPU clock cycle for program}}{\text{clock rate}}}$$

$$\boxed{\text{Clock cycle time} = \frac{1}{\text{Clock rate}}}$$

\* 
$$\text{CPU clock} = \frac{\text{Instruction for program}}{\text{cycles}} \times \frac{\text{Average clock cycle per instruction}}{\text{per instruction}}$$

$$\text{clock cycle per instruction} = \text{CPI} = \frac{\text{Average clock cycle per instruction}}{\text{per instruction}}$$

CPI - average no. of clock cycles each instrn take to execute.

CPI is average of all instrn run in program.

\* 
$$Ex^n = \frac{\text{CPU clock cycles}}{\text{time}} \times \text{clock cycle time}$$

\* 
$$Ex^n = \frac{\text{Instrn for program}}{\text{time}} \times \text{CPI} \times \text{clock cycle time}$$

\* 
$$\frac{Ex^n}{\text{time}} = \frac{\text{Instrn count} \times \text{CPI}}{\text{clock rate}}$$

$$\text{CPU clock} = \sum_{i=1}^n C_{Pi} \times C_i$$

Time measured in sec.

$$\text{Time} = \frac{\text{Sec}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycle}}{\text{Instructions}} \times \frac{\text{Second}}{\text{Clock cycle}}$$

(2)

A

$$\text{EXN time}_A = 10^3 \cdot \\ \text{clock rate} = 2 \text{ GHz.}$$

B

$$\text{EXN time}_B = 6 \text{ s}$$

$$\frac{\text{Clock}}{\text{cycle B}} = 1.2 \frac{\text{Clock cycle}}{A}$$

$$\text{Clock rate B} = ?$$

$$E = \frac{\text{CPU clock cycle}}{\text{Clock cycle}}$$

$$E = \frac{\text{CPU Clock cycle.}}{\text{Clock rate.}}$$

$$\text{Clock cycle} = E \times \text{Rate.}$$

$$\frac{\text{Clock cycles}_B}{\text{Clock cycles}_A} = 1.2 = \frac{(EXR)_B}{(EXR)_A}$$

$$\frac{0.4}{1.2} = \frac{3 \times R}{10 \times 2 \text{ GHz.}}$$

$$R = 4 \text{ GHz clock.}$$

6

1

$$\text{Clock cycle time } t = 250 \text{ ps.}$$

n

$$CET - t = 500 \text{ ps}$$

$$CPI = 1.2$$

$$\frac{\text{Perf A}}{\text{Perf B}} = \frac{\frac{\text{Ex B}}{\text{Exn n}}}{\frac{\text{Exn n}}{\text{Instn cont}}} = \frac{(1.2 \times 500)}{(2 \times 250)} = 1.2$$

Q Both prag. run son.  
no. of instructions.

$n$  is 1.2 faster as of B.

① compiler designing trying to decide how few code  
seg. foll facts are present.

CPI for each instruction class -

|     | A | B | C |
|-----|---|---|---|
| CPI | 1 | 2 | 3 |

instruction code for two code sequences is,

Code Ser.

A B C  
2 1 2

7

$u$  ( )

instrn costs for each 'instr' day

| M         | T | W | T | F | S     | S |
|-----------|---|---|---|---|-------|---|
| Page No.: |   |   |   |   |       |   |
| Date:     |   |   |   |   | YOUVA |   |

which code seq executes more instrns.

Seq 1 :  $2 + 1 + 2 = 5$  instrns.

Seq 2 :  $4 + 1 + 1 = 6$  instrns

Seq 2 executes more instrns.

Q) Which will be faster.

finding total no of clock cycles in each sequence.

$$\text{clock cycles} = \sum (CPI_i \times C_i)$$

$$\begin{aligned} \text{Total clock cycles 1} &= 2 \times 1 + 1 \times 2 + 2 \times 2 \\ &= 10 \text{ cycles} \end{aligned}$$

To

$$\begin{aligned} \text{Total clock cycles 2} &= 4 \times 1 + 1 \times 2 + 1 \times 3 \\ &= 9 \text{ cycles.} \end{aligned}$$

as Seq 2 has less no of clock cycles.  
it will execute faster.

Q) What is CPI for each sequence.

Average CPI = Clock cycles per instrns

$$CPI = \frac{\text{clock cycle}}{\text{total instr}}$$

$$CPI_1 = \frac{10}{5} = 2.0$$

$$CPI_2 = \frac{9}{6} = 1.5$$

- Q) 400 MHz processor was used to execute branchless program with fall instrn.

Calculate CPI.

| Type | A | B | C | D |
|------|---|---|---|---|
|------|---|---|---|---|

|       |        |        |        |       |
|-------|--------|--------|--------|-------|
| Count | 450000 | 320000 | 150000 | 80000 |
|-------|--------|--------|--------|-------|

|             |   |   |   |   |
|-------------|---|---|---|---|
| clock cycle | 1 | 2 | 2 | 2 |
|-------------|---|---|---|---|

$$CPI = \frac{450000 + 2 \times 320000 + 2 \times 150000 + 2 \times 80000}{450000 + 320000 + 320000 + 80000}$$

$$CPI = 1.55$$

$$Ex^n \text{ time} = n \times CPI \times t \quad n = 1000000$$

$$= \frac{1000000 \times 1.55 \times}{400 \times 10^6}$$

$$Ex^n \text{ time} = 38.75 \times 10^{-6} \text{ sec}$$

#] MIPS million instrn per second.

It is a measurement of program execution speed. based on no. of million instruction.

MIPS is inversely proportion to Execution time.

Bigger MIPS means faster Computer.



$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution time} \times 10^6}$$



$$\text{MIPS} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

(Q)  $2 \times 10^6$  instructions in code at 400 MHz processor  
prog has 4 ports each port has separate CPI  
and its 90%

(a) Avg CPI

2) MIPS rate from above CPI

| IT | CPI | 1 m/s   |
|----|-----|---------|
| A  | 1   | 60 m/s  |
| B  | 2   | 18      |
| C  | 4   | 12      |
| D  | 8   | 10      |
|    |     | 100 m/s |

$$\text{Avg CPI} = \frac{1 \times 60 + 2 \times 18 + 4 \times 12 + 8 \times 10}{100}$$

$$\underline{\text{Avg CPI}} = \underline{2.24}$$

$$\begin{aligned} \text{MIPS} &= \frac{\text{clock rate}}{\text{CPI} \times 10^6} \\ &= \frac{400 \times 10^6}{2.24 \times 10^6} \end{aligned}$$

$$\underline{\text{MIPS.}} = \underline{178.57}$$

Q) 3 processors P<sub>1</sub>, P<sub>2</sub>, P<sub>3</sub> executing some instrn set

P<sub>1</sub> Has 3GHz clock rate. 1.5 CPI

P<sub>2</sub> 2.5 GHz 1.0 CPI

P<sub>3</sub> 4.0 GHz 2.2 CPI

→ 1) which processor has higher instrn per sec rate.

$$\text{MIPS} = \text{CPI} \frac{\text{clock rate}}{\text{CPI} \times 10^6}$$

$$\therefore \text{Instrr/sec} = \frac{CR}{CPI}$$

$$1) \frac{3}{1.5} = 2$$

$$3) \frac{4}{2.2} = 1.8$$

$$2) \underline{2.5} = 2.5$$

$\therefore P_2$  Has Higher Instruction per sec.

b) processor exec time to see find no of cycles &  
no of instrn.

$$ET = 10$$

$$\rightarrow ET = \frac{\text{clock cycle.}}{\text{clock rate}}$$

$$\text{clock cycle} = ET \times \text{clock rate}$$

$$ET = T_c \times CPI \times t$$

$$ET = CC \times t$$

$$ET = \frac{CC}{CR}$$

$$CC = T_c \times CPI$$

$$T_c = \frac{CC}{CPI}$$

$$T_c = \frac{ET \times CR}{CPI}$$

$$CPI = \frac{10 \times 3 \times 10^9}{1.5} \quad C_{P2} = \frac{10 \times 2.5 \times 10^9}{1}$$

$$C_{P3} = \frac{10 \times 4 \times 10^9}{2.2}$$

c) We are trying to reduce exn time by 30%  
which results in  $CPI \uparrow 20\%$ .  
What do we have to do.

→  $ET \downarrow 30\%$

∴ New  $ET = 7$  sec.

$$CPI_{new} = CPI \times 1.2 \\ =$$

$$ET = \frac{CC}{CR}$$

$$ET = \frac{TC \times CPI}{CR}$$

$$CR = \frac{TC \times CPI}{ET}$$

$$CPI = \frac{TC \times CPI \times 1.2}{7}$$

Some for 203.

Instruction Format General discussion.

|           |   |   |   |   |   |       |
|-----------|---|---|---|---|---|-------|
| M         | T | W | T | F | S | S     |
| Page No.: |   |   |   |   |   |       |
| Date:     |   |   |   |   |   | YOUVA |

Instruction Set architecture.

Memory locn & address.

group of  $n$  bits called word,

$n$  = word length.

many computer has word length  $16 \rightarrow 64$  bits.

if word length of computer is 32 bits, a 32 bit word is stored. or 4 ASCII encoded character stored 8 bits each.

memory stored at address locn.

0 to  $2^k - 1$  as address locn in memory.

∴ for  $k$  bit in address, there are  $2^k$  address line.

∴ 24 bit address generates  $2^{24}$  locations.

$$2^{24} = 2^4 \times 2^{20}$$
$$= 16 M \quad \underline{\hspace{10em}} \quad 1M = 2^{20}$$





$$\text{kilo } \times = 2^{10} (1024)$$

\* Word length vary from 16-64 bits.

\* Byte addressable memory

ex if word length is 32 bit.

each word is located successive at  
0, 4, 8, ...  
with each word consisting of 4 bytes.

#] Big-Endian. 2) Little-Endian Assignment

Big-endian assignment

lower byte address are used. for most significant ad bytes

Little-endian address.

lower byte address  
i.e. Rightmost used for least significant bytes

|               |           |           |           |           |           |
|---------------|-----------|-----------|-----------|-----------|-----------|
| word address. | 0         | 0         | 1         | 2         | 3         |
|               | 4         | 4         | 5         | 6         | 7         |
|               | :         | :         | :         | :         | :         |
|               | $2^k - 4$ | $2^k - 4$ | $2^k - 3$ | $2^k - 2$ | $2^k - 1$ |

Big-endian assignment

|   |           |           |           |           |           |
|---|-----------|-----------|-----------|-----------|-----------|
| 0 | 3         | 2         | 1         | 0         |           |
| 4 | 7         | 6         | 5         | 4         |           |
| 8 |           |           |           |           |           |
|   | :         | :         | :         | :         |           |
|   | $2^k - 4$ | $2^k - 1$ | $2^k - 2$ | $2^k - 3$ | $2^k - 0$ |

Little-endian assignment

This is Byte & Word addressing.

v) Word alignment

If word is 16 bit (2 bytes)  
aligned byte address is 0, 4, 8, ...

If word is ~~32~~ 32-bit (8 bytes)  
aligned byte address is 0, 8, 16, 24, ...

v) 2 Basic opn of memory are Read & Write,

#) Instrn & Instrn sequencing

4 tasks are part a computer must have  
instruction to perform.

- 1) Data transfer b/w memory & processor
- 2) Arithmetic & logical operat.
- 3) Prog Seq. & Control
- 4) I/O transfer.

Name rep addresses of memory loc is,  
LOC, PLACE, A, VAR2)

Name of registers are R0, R1, R2, R3, R4, R5

Register in I/O System is DATAIN, OUT STATUS

eg  $R_2 \leftarrow [LOC]$

data at locn Loc address is transferred to  $R_2$

eg  $R_4 \leftarrow [R_2] + [R_3]$ .

Register Transfer Notation.

add content at Register  $R_2$ ,  $R_3$  and store at  $R_4$

LHS = locn RHS = value.

### \* Assembly language Notation.

ex LOAD  $R_2, LOC$

loading data at memory location  
LOC into register  $R_2$

load is read from memory

locn and ~~with~~ pl load in DR<sub>2</sub>

ex Add  $R_4, R_2, R_3$

$R_2, R_3$  are source.

$R_4$  defn.

Load - LD

Store - STR / ST

### [ ] RISC and CISC instruction set

These are two approaches for the design of instruction set in computers.

RISC - Reduced instruction set computers.

CISC - Complex instruction set computers.

## : RISC

- 1) Based on the approach that each instruction should occupy exactly one word in the memory.
- 2) Operands needs to execute an arithmetic or logic operation specified by instruction already present in registers.
- 3) In this approach the processor needs to process a sequence of instructions in 'pipelined' manner.
- 4) The activities are overlapped and total execution time is reduced.
- 5) The condition that each instruction must fit into the single word reduces complexity and no of instructions that can be used.

## CISC

- 1) This is alternative to RISC and make a more complex instruction that may span more than one word.
- 2) In this more complicated operations are used.
- 3) CISC was used prior to the RISC approach.

## #] RISC instruction set

### characteristics.

- 1) each instruction fits in a single word.
- 2) Load / Store architecture is used.
  - Memory operands are accessed using load & store instructions.

- All operands in arithmetic or logic opn must either be present in registers, or one of the two operands given explicitly within instrn word.

\* Load destn, source.

Load processor-reg, memory-loc.

$$\text{or } C = A + B$$

Load R<sub>2</sub>, A

Load R<sub>3</sub>, B

Add R<sub>4</sub>, R<sub>2</sub>, R<sub>3</sub>.

Store R<sub>4</sub>, C

Store → Source, destn.

Add = destn, src1, src2

Address.                      Content.

i                      Load R<sub>2</sub>, A

i+4                      Load R<sub>3</sub>, B

i+8                      Add R<sub>4</sub>, R<sub>2</sub>, R<sub>3</sub>

i+12.                      Store R<sub>4</sub>, C.

A

B

C

data

for the  
program.

Program Counter (PC) Holds the address of the next instruction to be executed.

To begin the program execution the address of the first instruction (here (i)) is placed in the (PC).

At each step 'pc' is increased by '4' to next instruction.

#] Executing a given instruction is a two-phase procedure.

first phase

IF Instruction fetch : instrn is fetched from memory locn whose address is in PC.

IR Instruction Register : instrn is placed in the IR in processor.

Second phase

IE Instruction Execute : instrn in IR is examined to determine which oper<sup>n</sup> is to perform

FO fetch operand : fetching operand from memory or processor register for ex<sup>n</sup>.

nc STORE : Storing result in destn location.

After IE phase is done, the PC contains address of next instrn & new IF phase begins.

## #] Branching

Branching is to make loop.

(list of n numbers, rep as NUM1, NUM2, ..., NUMn)

R2 is used as counter to determine no of times loop is executed.

Subtract R2, R2, #1 is done after each loop.

conditional branch causes the branch if only specified condn is satis. followed,

if condition is not satisfied PC incremented in normal way.

Here it will be

branch - H - [R2] > 0.      loop

store

Store : Storing result in dest'n location.

After IE phase is done, the PC contains address of next instrn & new IF phase begins.

## #] Branching

Branching is to make loop.

(list of n numbers, rep as NUM<sub>1</sub>, NUM<sub>2</sub>, ..., NUM<sub>n</sub>)

R<sub>2</sub> is used as counter to determine no of times loop is executed.

Subtract R<sub>2</sub>, R<sub>2</sub>, #1 is done after each loop.

Conditional branch causes the branch if only specified condn is satis. followed,

if condition is not satisfied PC incremented in normal way.

Hence it will be

branch - H - [R<sub>2</sub>] > 0.      loop

Load R<sub>2</sub>, N  
 Clear R<sub>3</sub>  
 Load Load R<sub>5</sub>, [Num]  
 Add R<sub>3</sub>, R<sub>3</sub>, R<sub>5</sub>  
 Subtract R<sub>2</sub>, R<sub>2</sub>, #1  
 Branch if [R<sub>2</sub>] > 0 loop.  
 Store R<sub>3</sub>, sum.

ex Branch-if-[R<sub>4</sub>] > [R<sub>5</sub>] written as

Branch-greater-than R<sub>4</sub>, R<sub>5</sub>, LOOP.

BGT R<sub>4</sub>, R<sub>5</sub>, loop.

## #] Addressing modes.

Addressing modes in RISC.

| Name.             | Syntax                             | Addressing Fn.                              |
|-------------------|------------------------------------|---------------------------------------------|
| Immediate         | # Value                            | opand = value.                              |
| Register          | R <sub>i</sub>                     | EA = R <sub>i</sub>                         |
| Absolute          | LOC                                | EA = LOC                                    |
| Register indirect | (R <sub>i</sub> )                  | EA = [R <sub>i</sub> ]                      |
| Indexed           | X(R <sub>i</sub> )                 | EA = [R <sub>i</sub> ] + X                  |
| Base with index   | (R <sub>i</sub> , R <sub>j</sub> ) | EA = [R <sub>i</sub> ] + [R <sub>j</sub> ]. |

The absolute address can be given only in 16 bit value not more than a 32 bit word.

15 bits are used to store value &  
0-15 bits are used for 16 bit no.

16-82 are used to sign extension.

\* Load operation only works on memory operands.

To load a number into a register we use

MOVE R4, #NUM

program for adding n numbers.

|       |                         |
|-------|-------------------------|
| Load  | R2, H                   |
| Clear | R3                      |
| Move  | R4, #NUM                |
| Loop  | Load R5, (R4)           |
|       | Add R3, R3, R5          |
|       | Add R4, R4, #1          |
|       | Subtract R2, R2, #1     |
|       | Branch if [R2] > 0 loop |
|       | Store R3, sum.          |

indirect addressing used.

\* Indexed mode.

EA generated by adding a const val to  
content of register

Indexing also called indexed register

$x(R_i)$

$$EN = x + [R_i]$$

x defines an 'offset' i.e.  
displacement from this location to a  
location specified.

\* Mnemonics, ST R2 50H,

ST is an operation code opcode.

Assembler translates this code & computer  
recognizes.

#) Subroutine,

( Block of instruction is called - subroutine )

That is executed in new block of  
main program.

After a subroutine is called the program must return  
execution to after the instrn that called the  
subroutine.

The PC saves the by the call instruction to enable to correct return to calling program.

The way in which a computer makes it possible to call and return from subroutine is referred as subroutine linkage method.

Simpler method is, save the return address in specific locn also called link register.

### v) Subroutine Nesting

It is to have one subroutine call another one subroutine.

### # Additional Instructions.

i) Logical Instruction, AND, OR, & NOT.

ex And R<sub>1</sub>, R<sub>2</sub>, R<sub>3</sub>.

Bitwise and between R<sub>2</sub>, R<sub>3</sub> and store result in R<sub>4</sub>.

And R<sub>4</sub>, R<sub>2</sub>, #1 value.

ex And R<sub>2</sub>, R<sub>2</sub>, #0xFF  
 Move R<sub>3</sub>, #0x5A.  
 Branch if [R<sub>2</sub>] = [R<sub>3</sub>] Found 2.

\* ) Shift & Rotate instructions.

\*) Logical Shift.

shifting left - LShiftL

shifting right - LShiftR

general form LShiftL Ri, Rj, Count.

shifting  $R_j$  bits to left by Count gives

\* ) Rotate opn.

#) Multiplication and Division

Multiply  $R_k \underline{R_i, R_j}$ .

$$R_k \leftarrow [R_i] \times [R_j]$$

2 n bit words after multiplying generate  
as large as  $2n$  bit words  
store the both n bit words, a  
adjacent register  $R_{(k+1)}$  is created.

Divide  $R_k, R_i, R_j$

$$\underline{R_j \leftarrow [R_j] / [R_i]}$$

Have quoted stored in R<sub>K</sub> & remainder in R(K+1)

### \* CISC Instrn Set

Complex instruction set computers

CISC instruction are not constrained to load / store architecture, arithmetic & logical operations can be performed only on operands that are in processor registers.

Instructions do not necessarily need to be fit into single word.

CISC uses two address format unlike the RISC 3 address format.

Ex operation defn, source,

Add B, A

$B \leftarrow [A] + [B]$ .

Load B is both source &  
Defn.

Ex  $C = A + B$

Move C, B

Add C, A.

Sum of numbers in CISC architecture,

```

move R2, N
clear R3
move R4, #1000
loop Add R3, (R4)+ ← Autoincrement to
subtract R2, #1 next number.
Branch >0 LOOP.
move sum, R3.

```

## \*) RISC and CISC styles.

- 1) RISC
  - 1) simple addressing mode.
  - 2) All instrn fitting in single word
  - 3) few instrn in instrn set.
  - 4) arithmetic & logical operations can be performed only on operands in processor register.
  - 5) fast execution. - uses pipelining
  - 6) load / store instrn don't allow direct transfer of from one memory to another
  
- 2) CISC
  - 1) Complex addressing mode.
  - 2) Instrn may span in multiple words
  - 3) Instrn implement complex tasks.
  - 4) Arithmetic & logical operations can be performed in memory operators as well as operand in processor register.
  - 5) Transfer from one memory to other using simple move instrn.
  - 6) Complex instrn req for complex task.

Dot product of vectors.

$$\text{Dot product} = \sum_{i=0}^{n-1} A(i) \times B(i)$$

Move R<sub>2</sub>, #AVEC.

Move R<sub>3</sub>, #BVEC.

Load R<sub>4</sub>, N

Clear R<sub>5</sub>

Loop: Load R<sub>6</sub>, (R<sub>2</sub>)

Load R<sub>7</sub>, (R<sub>3</sub>)

Multiply R<sub>8</sub>, R<sub>6</sub>, R<sub>7</sub>

Add R<sub>5</sub>, R<sub>5</sub>, R<sub>8</sub>

Add R<sub>2</sub>, R<sub>2</sub>, #4

Add R<sub>3</sub>, R<sub>3</sub>, #4.

Sub R<sub>4</sub>, R<sub>1</sub>, #1

Branch if (R<sub>4</sub>) > 0 (loop)

Store R<sub>5</sub>, DOTPROD.

RISC style

Now CISC style

Move R<sub>2</sub>, #AVEC.

Move R<sub>3</sub>, #BVEC.

Load R<sub>4</sub>, N

Clear R<sub>5</sub>

Loop Move R<sub>6</sub>, (R<sub>2</sub>)f,

Multiply R<sub>6</sub>, (R<sub>3</sub>)f

Move R<sub>5</sub>, R<sub>6</sub>

Subtract R<sub>4</sub>, #1

Branch if 0 (loop)

Move DOTPROD, R<sub>5</sub>.

## 4) Instruction cycle, Subcycle.

Page No.  
Date:  
YUVVA

Processor does these things.

- 1) Fetch instn : read from memory
- 2) Interpret instn : instn is decoded.
- 3) Fetch data : reading data from memory or I/O.
- 4) Process data : perform arithmetic & logical operation on data.
- 5) Write data : exn reg writing data to memory.

processor needs small internal memory for temporary storage.

## ALU



Internal Str<sup>n</sup> of CPU.

## v) Register organization

Registers in processor perform two roles.

user-visible Registers.

Control & Status Registers.

x) referenced by machine language that processor executes

Registers employed to control the operation of processor.

general purpose  
Data register.  
Address.

Condition codes (Flags)

Program Counter (PC)

Instruction register (IR)

Memory Address Register (MAR)

Memory Buffer Register (MBR)

## v) Program Status Word.

Contains status information.

Common Hints / flags are,

- 1) Sign.
- 2) Zero.
- 3) Carry
- 4) Equal
- 5) Overflow

- 6) Interrupt
- 7) Supervisor.

## Instruction cycle.

An instn cycle includes foll stages.

- 1) Fetch: Read the next instn from memory
- 2) Execute: Interpret opcode & perform indicated operation
- 3) Interrupt: If interrupt is enabled and an interrupt has occurred, save the current process state and service the interrupt

one additional stage Indirect cycle.

while executing instn memory access are required & if indirect addressing is used then additional memory access are required.

fetching of indirect addresses is taken as one more stage in Instn cycle.

first, an instn is fetched & then it is examined to see if any indirect address is present. If so the key operands are fetched using Indirect cycle.

Then following execution, an interrupt may occur then it is serviced. Before next instn to be executed.



\* ) Instn cycle Make diagram.



\* ) Data flow, fetch cycle.



## #] Instruction pipelining

|           |   |   |   |   |       |
|-----------|---|---|---|---|-------|
| M         | T | W | F | S | S     |
| Page No.: |   |   |   |   |       |
| Date:     |   |   |   |   | YOUVA |

In pipelining new inputs are accepted at one end before previously accepted inputs appear at the output end.

Instruction has no. of stages, ex 2 stage process



Consider dividing the instn processing in two stages, fetch instn & execute instn.

There are times in exec stage when main memory is not being used. This time can be used to fetch a new instn while in parallel to exec of curr one.

In Fetch stage it fetches & give instn to exec, while it is executing a new instn is fetched & stored in buffer, when exec frees it is given to it.

This is called instruction prefetch.

## #] More stages in the instn cycle. helps in pipelining.

- 1) Fetch instrn (FI) : fetch next in buffer
- 2) Decode instrn (DI) : determine the opcode & operand specifier.
- 3) Calculate operands (CO) : calculate address of operands
- 4) Fetch operands (FO) : fetch operand from memory
- 5) Execute instrn (EI) : perform opern.
- 6) Write operand (WO) : store the result in memory

Timing diagram for Instrn pipelining operation.



The more stages, better pipelining but this is disturbed due to 2 point.

- 1) At each stage there is some overhead involving in moving data from buffer to buffer, thus extending the time.
- 2) Amount of logic req to handle memory and register dependency.

\*) 6 stage instrn pipelining.



$$\text{Speedup factor} = \frac{nK}{K + (n-1)}$$

K = no of stages  
n = total instrn.

## Pipelining Hazards.

it means some portion of pipeline stops because conditions do not permit. This is also called pipeline bubble.

### 3 types of Hazards.

Resource  
Data  
control

1) Resource Hazard - A Resource Hazard occurs when two or (more) instrn that are already in pipeline need the same resource result is instrn are executed in sequential rather than parallel. For some part of pipeline.

2) Data Hazard. A Data Hazard occurs when there is a conflict in access of operand location. If two instrn are executed in sequence & both req particular memory or register. If they are run in sequence, no problem occurs. If run in parallel, then actual result & expected result differ.

3) Control Hazard - A Control Hazard occurs when the pipelining makes the wrong decision on a branch prediction & brings instrn into pipelining that must subsequently be discarded.

Intel 80486 pipelining has 5 stages

⑦ Data path, control signal, Hardwired.  
<sup>161</sup> Control signal, Hardwired.  
<sup>172</sup> microprogrammed control,  
<sup>173</sup>  
<sup>183</sup>

\* IR Hold the instrn until it is executed, completely open specified in instrn can be carried out by performing one or more of following.

- 1) Read content from memory loc & load into a processor register
- 2) Read data from one or more processor register
- 3) Perform arithmetic or logical operation & place result into a processor register
- 4) Store data from processor register into memory loc

\* Hardware components of processor.



- Registers consist of edge triggered flip flops, now added at edge of clock
- Complex opern in block can be broken down into simple step, performed by sub circuit in stages
- Multi-stage structure is advantageous for pipelined opern in RISC-style instrn set.

## # Hardware components

All instrn in RISC style processor can be executed in Five-stages,  
Hence Hardware may be organised in 5 stages.

### 1\*) Register file

- It has general purpose registers in it, which is small & fast memory block.
- It has array of storage elements with access circuitry that enables data to be read or written in the register.
- Access circuitry is designed in such a way that it can read two registers at same time.
- In this way two separate output port are available A), B. for reading
- It has two address port for selecting the two ports to be read.
- These inputs are connected to the IR which specify the register to be read

- Register file has data input port C by which the data to be written in the register is entered.
- For this there is one address line (Address C) which tells where to write data.

This type of Rfile is called as, dual-ported.



## 2) ALU.

ALU is to perform arithmetic & logical operation. ALU takes output of Register file A & B as input to the ALU unit & output of ALU unit, in input data C is entered into the Register.



Immediate value is used to add any immediate value to the port B .  
 if B has a value then i.e. immediate then that value is taken.

## #] Data path.

Instr'n processing is two stage | phase,  
 fetch phase & execution phase

The section that fetches instr'n is also responsible to decode and generate control signals that cause appropriate action to be taken.



instrn fetched in IR Stage 1 is placed in IR.  
It is decoded and source register is read in Step 2

Info in IR is used to generate control signal in all steps.

It is necessary to have inter stage registers to store info intermediate.

The stage 2 to 5 is called as Datapath.

- Data from registers are placed in Register RA & RB.
- This RA & RB gives input to ALU.  
ALU solves the opern & store result in RZ.
- for action such as Add no m action is taken in Stage 4  
At this the muxy selects RZ & transfers data to RF & loaded into data register in Stage 5 to Register file
- for load / store instrn off address of memory operand is computed & stored in RZ by ALU  
In case of load instrn the memory data read from memory is selected by muxy & placed in Reg RF to transfer to Reg file.

The data to be stored is placed in Reg RM from RB

- Interrupt processing they return address to be saved, another Reg RA is use for this purpose,  
by this answ by prod intpt calls for return address, it can be sent to Register file thru RF.

## Data path in processor

| M        | T | W | T | F | S     | S |
|----------|---|---|---|---|-------|---|
| Page No. |   |   |   |   |       |   |
| Date:    |   |   |   |   | YOUVA |   |



## #7 Control Signal

### control signal overview

- Processor hardware component governed by control signal, determining action such as multiplexer being ALU oper.
- Data flow in stages via inter-stage registers.
- Multiplexer select data for oper. in each stage influenced by control signal.

### multiplexer control

- Multiplexers (muxB, muxY, muxMA, muxC) are controlled by signals (B-Sel- $\downarrow$ , Y Sel- $\downarrow$ , MA\_Sel- $\downarrow$ , C-Sel- $\downarrow$ ) to choose input data.
- Two bits are needed to control muxC & muxY as each selects from three inputs.

### ALU opern

specify 2 $\ell$  opern.

- ALU opern are determined by K-bit control word (ALU-op) specifying operation as (Add, Sub, AND, OR)
- Comparators generate conditional signal, examined during condition branch for determined branch cond $\downarrow$ .

### Memory - Interface

- MEM-read & MEM-write are used to initiate the memory opern.
- IR-enable is used to allow reading new instr into register.

- MTC ~~to~~ signal is used when required opn is executed.

### .) Instr add generator

- INC Selct is used to determine the value to be added to PC (Const or branch offset)
- PC Selct ~~to~~ Select signal Updatr address or content of register.

### ii) Hardwired control

There are 2 ways for generating control signals.  
Hardwired control, microprogrammed control.

Step counter is used to keep track of the process of execution.

### .) Setting of control signal depends on:

- 1) Content of step counter.
- 2) Content of instr register.
- 3) Result of computation or comparison opern.
- 4) External input signals such as interrupt req.

- The instr decoder interprets op code, & additly mode into setting corres M/S signal.
- Step counter increments at end clock cycle, indicates the current execution step ( $T_1 - T_5$ )

control signal generator generates necessary control signal based on inputs.

- ex) New instrn is fetched from memory identified by signal T,  
MA-select signal set to 1,  
Setting the PC as memory ad., source R  
MEMread is activated.

IR is loaded with data from memory  
PC incremented by 4 preparing for next instruction

\* Datapath Control Signal

- Instrn decoder interprets IR content & source register controls become available at A<sub>2</sub> B<sub>3</sub>
- Other stage registers are also controlled. like RA RB etc.

# 2] Microprogrammed Control

Microprogrammed control is an alternative to Hardwired approach

Control signals are determined by program stored in a special memory called microprogram memory.

- Each step task has its own microprogram represented by series of bits, called microinstruction

| M        | T | W | T | F | S | S |
|----------|---|---|---|---|---|---|
| Page No. |   |   |   |   |   |   |
| Date     |   |   |   |   |   |   |

YOUNA

- Microinstrn are step by step instrn to conduct

### Microprogram memory

- Microprogram are stored in small & fast memory called microprogram memory
- This memory has all instrns needed for diff steps in various tasks
- Execution of program depend the microinstrn
- Microinstrn address generator helps find the proper microinstrn in memory.
- Microroutine is the sequence of executing the instrn.
- This approach is slower than Handwired approach.  
It is more flexible than Handwired.

RISC-V uses Handwired control.

## Control unit operations.

on 20, 21

Staltry

### \* Micro operations.

Each instrn cycle is made up of smaller units, as fetch, indirect, execute, interrupt - These are the micro-operations.

each these steps involve processor registers, each instrn is executed in an instrn cycle, so each cycle has subcycles as fetch, indirect, etc.

### \* Fetch cycle

It is at the start of instrn to be executed.

It has 4 Registers.

1) MAR (Memory add. reg.) (connected to address lines)

2) MBR (Memory Buffer reg.) (connected to data lines)

3) Program Counter (PC) (Hold the address of next instrn to be fetched)

4) Instruction Register (IR) (Hold the last instrn fetched)

They work in 3 steps.

$$T_1 : \text{MAR} \leftarrow \text{PC}$$

$$T_2 : \text{MBR} \leftarrow \text{Memory}$$

$$\text{PC} \leftarrow (\text{PC}) + 1.$$

$$T_3 : \text{IR} \leftarrow (\text{MBR})$$

\*) Add of next instrn to be exerted is in PC  
Step 1 is to move the content of PC to MDR  
Step 2 One address in MAR is put on data lines, & instrn is fetched in MDR by Read. Instrn

- \* The content in memory appear on data lines and put on MBR
- \* Meanwhile the PC is increased for next instrn to execute
- \* Third step is move data from MBR to IR & set up the MBR.

## \* Indirect cycle.

If instrn specifies on indirect address then an indirect cycle runs before execute cycle

This indirect add is transferred to MBR, in T<sub>1</sub>. Then memory from memory is send in MBR in T<sub>2</sub>.

Now in T<sub>3</sub> the MBR address is stored in IR.

T<sub>1</sub>      MBR  $\leftarrow$  IR (Address)  
T<sub>2</sub>      MBR  $\leftarrow$  Memory  
T<sub>3</sub>      IR (Add)  $\leftarrow$  MBR (Add)

## \* Interrupt cycle.

- \* At end of execute cycle, a check is made to see if any intx. occurred,

it has 3 steps.

- T<sub>1</sub> : MBR  $\leftarrow$  (PC)  
T<sub>2</sub> : MAR  $\leftarrow$  save-Add.  
PC  $\leftarrow$  Routine-Add  
T<sub>3</sub> : Memory  $\leftarrow$  MBR

} Saving curr. state of prog  
↓ Story of interrup

## \* Execute cycle

EC can vary for instrn to instrn, many varieties

So, control unit examines the opcode of instrn to generate the sequence of micro-operations  
This is instrn decoding.

ex ADD R1, X

add content of locn X into Register R1

- T<sub>1</sub> : MAR  $\leftarrow$  IR (addr)  
T<sub>2</sub> : MBR  $\leftarrow$  Memory,  
T<sub>3</sub> : R<sub>1</sub>  $\leftarrow$  (R<sub>1</sub>) + (MBR)

## #) Instruction cycle code

(Instrn cycle code) It is a 2bit register which decides, in which sub cycle the instrn is,

00 Fetch.

01 Indirect

10 Execute

11 ?

## \* control of processor

functional requirement of the control unit are the fns that the CU must perform

3 step process leads to characterization of CU.

- 1) Define basic elemnts of processor
- 2) Describe micro operations
- 3) Determine fns of m cu.

1) Basic functional elements:

ALU, Register, datapath, CU.

2) micro opns.

- 1) Data transfer, register to another register to external, external to register.

Perform ALU opns. on data in registers

\* Control signals in m afe.

- 1) Clock,
- 2) IR
- 3) Flags
- 4) Control signal for control Bus.
- 5) Control signal within processor
- 6) Control signal to control Bus.

## a) Intel 8085 organization



- Studying DO
- \* Data path, control signal for RISC-V instrn  
 ADD, LOAD, etc.
- 4.72 - 236 - 262
- \* Introduction:
- Core RISC-V impl instrn & th:
- 1) memory reference instrn load doubleword (ld)  
store doubleword (sd)
  - 2) arithmetic logic opern add, sub, and, or.
  - 3) conditional branch instrn branch if equal (beq)
  - \* Every instrn has two basic steps.
    - 1) Send the PC to the memory that has code & return the instrn.
    - 2) Read one or two registers using fields of the instrn to select the registers.
  - \* After this there are 3 instruction classes.  
(memory - reference, Arithmetic - logic, branches)

Studying the datapath flow done previously for better grip

\* RISC-V instrn set architecture

DO Chapter

Patterson

4.1 - 4.4