

## CSE 340

Course Name : Computer Architecture.

Rejwan Shaf

23241108

Fall 2023

Faculty: Sumaiya Tanjil Khan (STK)

email : ext. sumaiya.tanjil @ bracu.ac.bd.

### Consultation :

• Sunday  $\Rightarrow$  12:30 - 01:50 pm

$\Rightarrow$  3:30 - 4:50 pm

UB80909

• Tuesday  $\Rightarrow$  9:30 - 11:00

3:30 - 5:00

• Thursday  $\Rightarrow$  11:00 - 02:00 pm

### Mark Distribution:

• Attendance  $\Rightarrow$  5%

• Quizzes  $\Rightarrow$  15% (best 3 out of 4/5)

• Assignment  $\Rightarrow$  15%

• Mid  $\Rightarrow$  25%

• Final  $\Rightarrow$  40%

Raw mark

• Course FE (Theory + Math) based

### Midterm Syllabus:

- Chapter- 1
- Chapter- 2
- Chapter- 3

### Final Syllabus:

- Chapter- 4

## Chapter - 1

### Computer Abstraction and Technology

#### The Computer Revolution:

##### • Progress in Computer technology.

■ Underpinned by Moore's Law \*

→ If you have a chip of small area and we have n transistors in the chip. Then in the next 2 years, same area of

chip  $\rightarrow 2^n$  transistor 2008 (fit 200 Mhz)

⇒ The number of chips present in a single chip would get double in every 2 years.

##### • Makes

feasible.

⇒ Computer in automobiles.

⇒ Cell phones.

⇒ Human Genome Project

⇒ World wide web

⇒ Search Engine

##### • Computers are pervasive

⇒ প্রাক্তনীর প্রযুক্তি যা, বর্তমানে carry করে এসে

## Different Types of Computers

### Desktop Computer:

- ⇒ General purpose, variety of software.
- ⇒ Subject to cost / performance trade off.
- ⇒ One machine, one user at a time

} Varies on the performance you want or on your budget.

### Server Computer:

- ⇒ Network based
- ⇒ High capacity, performance, reliability
- ⇒ Range from small servers to building sized
- ⇒ One machine, multiple user, at a time.

### Embedded Computer:

- ⇒ Hidden as component of systems.
- ⇒ Stringent power / performance / cost constraints.
- ⇒ Built for specific reason
- ⇒ Particular tasks meant to create I/O.  
Ex: micro-oven, washing-machine.

### Cloud Computer:

- ⇒ Server की तरफ से डेटा।
- ⇒ नेटवर्क उपयोग करते हैं, उदाहरण: Colab
- ⇒ Need internet.

### Single Core:

- One CPU
- One Chip



### Multi-Core

- Many CPU
- One chip



### Many-Core

- Many - Core
- Many CPU



• each proc sharing main memory & peripherals.

### Understanding Performance

Performance depends on three things:

- ✓ • Algorithms.
- ✓ • Programming Language, Compiler, Architecture.
- ✓ • Processor and memory systems.
- ✓ • I/O Systems

✓ Algorithms:  
⇒ Determines numbers of operations executed

✓ Programming Language, Compiler, Architecture:  
⇒ Determines numbers of machine instructions.

⇒ Determines numbers of machine instructions per operation

✓ Processor and memory Systems:  
⇒ Determines how fast Instructions are executed

## I/O System

⇒ Determines how fast I/O Operations are executed.

# I/O ~~ans~~ input / output.

## The Processor Market

- Traditional feature phone ↗ demand ~~2010~~
- Smartphones, Tablets ↗ demand ~~PC~~ ~~laptop~~
- ⇒ Smartphone ↗ demand ~~tablet~~ ~~laptop~~

## Learning Outcomes:

- Procedure of translating programs into machine code.  
⇒ And how hardware executes them.
- Hardware / Software interface.
- Determining factor of program performance, and how to improve.
- How hardware designers improve performance.
- What's parallel processing.

# "Increase CPU performance, decrease execution time."

~~Execution / CPU time  $\propto$   $\frac{1}{CPU}$  performance~~

Below your program. ★★

### ① Application Software :

⇒ Written in high level language.

### ② System Software.

- Compiler

⇒ Translates high level language to machine code.

- Operating System

⇒ Manages memory and storage scheduling tasks.

### ③ Hardware

⇒ Processor, memory, I/O controllers.



## Levels of Programme Code

### • High level language:

- ⇒ Almost similar to human language
- ⇒ Level of abstraction closer to problem domain.
- ⇒ Provides for productivity and portability.



### Assembly Language: (MIPS - 32 bit architecture)

- ⇒ Close to machine language.
- ⇒ Textual rep. of instructions.

$$c = a + b$$

$$\text{add } \$c, \$a, \$b$$

### Hardware representations,

- ⇒ Uses machine language
- ⇒ Binary digits (bits).
- ⇒ Encoded instructions and data.

### Components of Computers:

- Same component for all kind of computer.
  - ⇒ Desktop, Server, embedded
- Input / Output devices
  - ⇒ UI device.  
display, keyboard, mouse.
- ⇒ Storage devices,  
Hard disk, CD-DVD, SSD, flash.
- ⇒ Network adapters.

## Inside the Processor (CPU):

- Datapath
- Control
- Cache memory

### Datapath:

memory এবং নির্দিষ্ট data store করার্থী; input, output

বেসে, input-output যেকোন data memory তে যাবার

data display করতে পারে এবং memory যেকোন আউ

output device এ অন্যের, কোরা instruction

execute করতে পারে তাহলে ALU কা

অন্য উকুমা unit এ স্মাৰ্ট, ALU computer memory

যেকোন instruction পার্শ্বে,

data গুলো এবং এক তাফুণ যেকোন অন্য উকুমা যাবার,

এখনো ইন্ডেক্স datapath এর সাহিত রাখ, একক

data datapath এর সাথীর স্বতন্ত্র connected

প্রতিকোণী unit এর স্বতন্ত্র "data" pass হয়

data path এর সাথীরে,

Performs operation on data

⇒ Combination of devices and their connecting path.  
Devices communicate through this path.

## Control:

control:  
⇒ Controls the sequence of operations, sequence of instruction execution

[Which instruction should go where; how it should execute, which hardware it should access]

⇒ Control Unit controls any operation.

## Cache Memory :

```

graph TD
    CM[Cache Memory] --> CPU[CPU]
    CM --> RAM[RAM]
    CPU --> IE[instruction execute]
    IE --> D[data]
    D --> CM
    CM --> CPU
    CPU -- feedback --> IE
    CPU --> I[instruction]
    I --> EX[execute]
    EX --> D
    D --> SRAM[SRAM memory]
    SRAM --> A[for immediate access of data]
  
```

→ Very very small, very expensive bcz they  
are very fast you can access data

from the memory devices. That's why they are inside the processors. bcz processor operates very quickly

- The bigger the cache memory the more beneficial it is for the processor.

\* AMD Barcelona was a 4 Cores Processor.

### Abstractions: ★★

- Hides the lower level detail from the user and makes the higher level detail visible to the user.
- Helps us deal with complexity.
  - ⇒ Hides lower-level detail.

### Instruction Set Architecture (ISA)

- ↳ The hardware / software interface.
- ↳ instruction set for a particular architecture.
- ↳ allows high level programmer to be translated into assembly instruction and then translates into machine instruction

### Application binary Interface (ABI)

- The ISA plus system software interface
- Allows the system to identify where the data is going to be stored in the registers; how much data will be transferred

## • Implementation.

⇒ The details underlying and interface

### Definitions:

#### Computer architecture:

ज्ञान दर्शन के लिए नियम विषय.

- hardware.

- Software.

ज्ञान दर्शन के लिए नियम विषय.  
hardware, software ज्ञान दर्शन के लिए नियम विषय.  
ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.  
ज्ञान दर्शन के लिए नियम विषय.  
ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

ज्ञान दर्शन के लिए नियम विषय.

## ISA: \*\*\*

- ⇒ An abstract interface between hardware and software (lower level) of a machine.
- ⇒ That encompasses all the necessary information.
- ⇒ To write a machine language program.
- ⇒ That will run correctly, including instructions, registers, memory access, I/O and so on.
- ⇒ assembly language  $\rightarrow$  instruction set  $\rightarrow$  high level language  $\rightarrow$  hardware level  $\rightarrow$  ISA.
- ⇒ general architecture  $\rightarrow$  specific instruction set.

## Datapath: \*\*

- ⇒ datapath is a set of functional (ALU, Multiplier) unit, that carries out data processing.
- # Central part of many CPU.
- ⇒ datapath along with control unit, make up of the CPU, that largely regulates interaction between the datapath and data itself.
- ⇒ Datapaths are usually stored in registers or memory.

# We can create large datapath by joining more than one datapath.

⇒ CPU ଏବଂ ଅନ୍ୟ ସିଙ୍ଗର component ଏବଂ ଅନ୍ୟ data ଅଗାମ ପ୍ରକାର କଥାର ଅବଶ୍ୟକ ପଥ ନେତ୍ରର ଜ୍ଞାନ ଦ୍ୱାରା

datapath.

# datapath CPU ଏବଂ ଅନ୍ୟକୁ component ଜ୍ଞାନ ଦ୍ୱାରା connected

### Memory Hierarchy:

⇒ Used in computer architecture while discussing performance issues in computer architectural design, algorithm prediction and lower level programming construct.

⇒ memory hierarchy ରେଖା କଣ୍ଠେ response time ରେ parameter

ଫିଲ୍ଡର ଦେଖିବାରେ

⇒ "response time ରେ କାହାର memory କେଉଁ  
efficient"

⇒ ସମ୍ପଦ୍ୟ efficient memory ରେ  
hierarchy କେଉଁ କାହାର.

⇒ register and cache ସମ୍ପଦ୍ୟ କୋଣି.



- Reg
- Cache
- RAM
- USB, Flash
- Harddrive.
- Tape backup

efficiency वाढ़े  
CPU time घटें  
cost, size घटें,

# दूसरे memory को access करते समय कमीवन्न लाता

ज्यादा size के लिए चाहे यह, Cache बेसिंग expensive, Cache लिए होने better.

# main target इसे CPU efficiency बढ़ावा,

### Multiprocessor एवं जुड़विए

- Simultaneously कार्रा करते होते हैं.
- more efficient than single core processor.
- improves computing speed, performance and cost-effectiveness
- Multi processor:
  - ↳ tightly coupled computer system having two or more processors. each sharing main memory and peripherals.

# instruction दूसरे processor को भेज

କାହାର ଦ୍ୱାରା, କୌଣସି ଏବନାରେ କୌଣସି ଯାଇ ଯାଇ  
କାହାର କାହାର chunk କିମ୍ବା feature କିମ୍ବା information କିମ୍ବା chunk  
କିମ୍ବା dependent କିମ୍ବା

## Role of Computer Architect

### • look backward

- Analyze and evaluate the past.

### • look forward.

- Be the dreamer, be creative (Future design କରିବା)

### • look up:

- Understand important problem and their nature.

### • look down:

- Understanding the capabilities of underlying technology.

- future technology, or future

design କରିବାର ପାଇଁ

# execution time  $\rightarrow$   $\rightarrow$  CPU performance  $\rightarrow$   
 faster processor  $\rightarrow$  old processor replace  
 multiple processor add  $\rightarrow$   $\rightarrow$ ,  
 response time part of throughput  $\rightarrow$ ,  
 which is a good thing.

### Response time:

$\Rightarrow$  task complete  $\rightarrow$  time  $\rightarrow$ ,

### Throughput:

Total work done per unit time

# CPU time  $\rightarrow$  execution time same.

|                                       |                                                 |
|---------------------------------------|-------------------------------------------------|
| • Performance, $P = \frac{1}{ET}$     | Execution time.                                 |
| • $\frac{P_A}{P_B} = \frac{E_B}{E_A}$ | $= \frac{\text{CPU time}_B}{\text{CPU time}_A}$ |

• Time period,  $T = \frac{1}{f} \rightarrow$  frequency

• Frequency,  $f = \frac{1}{T}$

Time taken to run a program 10 s on A, 15 s on B.. Which processor is fast

$$\Rightarrow \frac{E_B}{E_A} = \frac{15}{5} = 1.5$$

value of denominator is 3 times.

∴ A is 1.5 times faster than B

Elapsed time is the total response time.

∴ Elapsed time = Response time.

CPU time.

↪ Total time spent on a given job.

# of clock cycles (Clock) complete work after time

↪ Time (Clock) period T.

↪ duration of a clock cycle.

Clock frequency → Cycles per second

$$f = \frac{1}{T}$$

### Formulas:

$$\text{CPU time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time}$$

$$= \frac{\text{CPU Clock Cycles}}{\text{Clock rate}}$$

$$\left| \begin{array}{l} \text{clock cycle time} \\ = \frac{1}{\text{Clock rate}} \end{array} \right.$$

Performance can be improved by,

- 1. Reducing number of clock cycles.
- 2. Increasing Clock rate (decreasing clock cycle time)
- 3. Must often trade off clock rate ~~vs~~ against cycle count.

Computer A: 2GHz clock, 10s CPU time.

Designing Computer B.

- Aim for 6s CPU time.
- Causing computer B to require 1.2 times as many clock cycles as computer A for this program.
- Can do faster clock, but causes 1.2 \* Clock Cycles

How fast Computer B's Clock would be?

(Clock rate  $\underline{\underline{5.76 \text{ GHz}}}$ )

$$\Rightarrow \text{Clock rate}_B = \frac{\text{number of Clock Cycles}_B}{\text{CPU time}_B}$$

$$= \frac{1.2 * \text{number of Clock cycles}_A}{\cancel{\text{CPU}}} \quad 6$$

$$\text{Clock cycle}_A = \frac{\text{CPU time}_A \times \text{clock rate}_A}{= 10 \text{ s} \times 2 \text{ GHz}}$$

$$= 20 \times 10^9$$

$$\therefore \text{Clock rate}_B = \frac{1.2 * 20 \times 10^9}{6} = 4 \text{ GHz}$$

"CPU time ~~is~~ ~~not~~ ~~so~~ better"

CPI  $\Rightarrow$  clock cycle per instruction  
 execute ~~not~~ ~~so~~ ~~so~~ clock  
 ↳ ~~one~~ instruction  
 cycle ~~one~~.

At "ISA same ~~as~~ instruction count same."

$$\text{clock cycles} = \text{instruction count} \times \text{CPI}$$

$$\text{CPU} = \text{Clock Cycles} * \text{Clock cycle time}$$

$$= \text{instruction count} \times \text{CPI} * \text{clock cycle time}$$

$$= \frac{\text{instruction count} \times \text{CPI}}{\text{clock rate}}$$

Computer A: Cycle time = 250 ps , CPI = 2.0

Computer B: Cycle time = 500 ps x CPI = 1.2

Same ISA.

Which is faster and by how much?

$$\Rightarrow \text{CPU time}_A = \text{instruction count} \times \text{CPI}_A \times \text{Clock cycle time}_A$$
$$= 1 \times 2.0 \times 250 \times 10^{-12} \text{ s}$$
$$= 500 \times 10^{-12} \text{ s}$$

$$\text{CPU time}_B = \text{instruction count} \times \text{CPI}_B \times \text{Clock cycle time}_B$$
$$= 1 \times 1.2 \times 500 \times 10^{-12}$$
$$= 600 \times 10^{-12} \text{ s}$$

$$\therefore \frac{\text{CPU time}_B}{\text{CPU time}_A} = \frac{600 \times 10^{-12}}{500 \times 10^{-12}} = 1.2$$

CPU A is 1.2 times faster than B.

$\Sigma$  different instruction classes  $\Sigma$  clock cycles.

$$\boxed{\text{Clock Cycles} = \sum_{i=1}^n (\text{CPI}_i \times \text{instruction count}_i)}$$

Weighted average CPI,

$$\boxed{\text{CPI} = \frac{\text{Clock Cycles}}{\text{instruction count}} = \sum_{i=1}^n (\text{CPI}_i \times \frac{\text{instruction count}_i}{\text{instruction count}} \times \text{Relative frequency})}$$

- Alternative compiled code sequences using instructions in classes A, B, C. Find out Clock cycles and Avg. CPI.

| class            | A | B | C |
|------------------|---|---|---|
| CPI for class    | 1 | 2 | 3 |
| IC in sequence 1 | 2 | 1 | 2 |
| IC in sequence 2 | 4 | 1 | 1 |

$\Rightarrow$  Sequence - 1:

$$\begin{aligned} \text{clock cycles} &= (1 \times 2) + (2 \times 1) + (3 \times 2) \\ &= 10 \end{aligned}$$

$$\therefore \text{Avg CPI} = \frac{10}{5} = 2.0$$

Sequence - 2

$$\begin{aligned} \text{clock cycles} &= (4 \times 1) + (1 \times 2) \\ &\quad + (3 \times 1) \\ &= 9. \end{aligned}$$

$$\therefore \text{Avg CPI} = \frac{9}{6} = 1.5$$

- Performance depends on:
  - ⇒ Algorithm: affects IC, Possibly CPI.
  - ⇒ Programming language: affects CPI, IC.
  - ⇒ Compiler: Affects CPI, IC
  - ⇒ ISA: affects, IC, CPI, T<sub>c</sub>

Transistor or Capacitive load

$$\boxed{\text{Power} = \underbrace{\text{Capacitive load}}_{\text{Transistor or Capacitive load}} \times \text{Voltage}^2 \times \text{Frequency}}$$

- # CPU time ~~is~~ better, CPU time ~~is~~ better; which is good.
- # Capacitive load ~~is~~ better.
- # CPU to multiprocessor ~~is~~ better; as, ~~one~~ time ~~is~~ processor switch ~~one~~ ~~time~~ simultaneously ~~one~~ ~~time~~.
- # multiprocessor  $\cup$  CPU time ~~is~~ ~~not~~ ~~good~~, Processing time ~~is~~ ~~not~~ ~~good~~.

Suppose a new CPU has, 85% of ~~capabilities~~ capacitive load of old CPU and 15% voltage and 15% frequency reduction?

$$\Rightarrow \frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C_{\text{old}} \times 85\% \times (V_{\text{old}} \times 85\%)^2 \times (F \times 85\%)}{C_{\text{old}} \times V_{\text{old}}^2 \times F_{\text{old}}}$$

$$= (0.85)^4 = 0.52.$$

∴ There is 42% power reduction after.

(P.T.O)

## RISC

- Reduced Instruction Set Computer
- Instruction execute in only one clock cycle.
- Variable instruction size
  - RISC vs CISC
- Requires Pipelining
- Hardware Controlled.
- Complexity pushed to the compiler.
- Instruction set is orthogonal (little overlapping of instruction functionality)
- Instruction interface with memory via fixed mechanism

## CISC

- Complex Instruction Set Computer.
- Instruction take more than one clock cycle to execute.
- Fixed instruction format and uniformed length instruction
- No pipelining.
- Micro code Controlled.
- Work well with simpler compiler
- Upward compatibility within a family.
- Instruction interface with memory in multiple ways mechanisms with complex addressing modes.

# MIPS architecture RISC vs under ⤵

For a specific machine, total instruction count =  $2 \times 10^9$ , Avg CPI = 3 cycles/instruction, Clock rate = 100 MHz. We want to improve the compiler to have instruction count =  $10^9$ , new CPI = 4 cycles/instruction, clock rate = 250 MHz. By what percentage is the new compiler improved?

⇒ Initially,

$$\text{CPU time} = \frac{ic * \text{CPI}}{\text{clock rate}} = \frac{2 * 10^{-9} * 3}{100 * 10^6} = \cancel{600} * 10^{-20} \quad 60$$

After,

$$\text{CPU time} = \frac{10^9 * 4}{250 * 10^6} = 16$$

$$\therefore \% \text{ of improvement} = \left( \frac{\text{CPU time for comp}_A - \text{CPU time for comp}_B}{\text{CPU time for comp}_A} \right) \times 100 \\ = 73.33\%$$

Assume, a new CPU has 6x less capacitive load compared to an existing CPU and has an 2% reduction in voltage ~~and frequency~~. Now calculate the reduction in overall power use.

$$\Rightarrow \frac{P_{\text{new}}}{P_{\text{old}}} = \frac{(0.94 \times \text{capacitive load}) \times (0.98 \times \text{voltage})^2 \times \text{frequency}}{\text{capacitive load} \times \text{voltage}^2 \times \text{frequency}}$$

$$= 0.902.$$

$P_{\text{new}}, P_{\text{old}}$  ~~is~~ is 0.902 times improve  
power reduce 26.8%,

$$\therefore \text{Reduction in power} = (1 - 0.902) \times 100\% \\ = 9.8\%.$$

Spec Ratios

|                  |      |                        |      |
|------------------|------|------------------------|------|
| # Reference time | 0.12 | current execution time | 0.12 |
| ratio            | 1    | spec ratio.            |      |

$$\text{Spec Ratio} = \frac{\text{Reference Time}}{\text{Execution Time}}$$

$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^n \text{Spec ratio}}$$

$$\therefore \sqrt[2]{SR_1 \times SR_2}$$

$$\sqrt[3]{SR_1 \times SR_2 \times SR_3}$$

Geometric Mean

# Multiple processor এবং সিঙ্গেল-processor কে ৩টা  
ক্ষেত্রের প্রযোজন করা হচ্ছে।  
প্রথম ক্ষেত্রের স্পেস রেটি সমান।

# execution time ratio এবং Spec ratio same.

Given 2 programs,

Program A, instruction count = 10, CPI = 3, clock cycle = 3s,

Reference time = 500 s

Program B, instruction count = 50, CPI = 7, clock cycle = 3s

Reference time = 2000 s

Find spec ratio for each program individually? also

find the geometric mean.

Program A

CPU time = IC \* CPI \* Clock cycle

$$= 10 \times 3 \times 3$$

$$= 90 \text{ s}$$

$$\therefore \text{Spec ratio} = \frac{500}{90}$$

$$= 5.556$$

Program B

CPU time = IC \* CPI \* Clock cycle

$$= 50 \times 7 \times 3$$

$$= 1050 \text{ s}$$

$$\text{Spec ratio} = \frac{2000}{1050}$$

$$= 1.9047$$

$$\therefore \text{Geometric mean} = \sqrt[2]{\text{Spec ratio}_A \times \text{Spec ratio}_B}$$

$$= \sqrt[2]{5.556 \times 1.9047}$$

$$= 3.2537.$$

Am dahl's Law:

⇒ improving an aspect of a computer - and expecting a proportional improvement in overall performance.

"**ବ୍ୟକ୍ତିଗତ ଏବେ ପରିବାରକାରୀ** specific feature ~~software~~ ପରିଯୋଜନ -  
improve କାରି ପାଇଁ overall program ପାଇଁ  
କାରଣ୍ଡାରୀ improvement ହେଲା ଅଛି ଅନୁମତି ଆମ୍ବଧାଳୀ's law  
କାରାରୀ calculate କାରି,

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improving factor}} + T_{\text{unaffected}}$$

multiply accounts for  $\frac{80}{100}$ .  
 active / affected time  
 Total time.

How much improvement in multiply performance to get  
 5x overall?

$$\Rightarrow \frac{100}{5} = \frac{80}{n} + 20$$

$$\Rightarrow 20 = \frac{80}{n} + 20$$

$\Rightarrow$  can't be done → program is

by the way, improvement factor n is value  $\neq$  negative,  $\infty$ , or 0  $\exists$  all

5x improve  $\Rightarrow$  impossible

$\Rightarrow$  100s execution time is after 80s 80s 80s  
 after idle after 20 second idle after 20s.

instead of 5x improvement  $\Rightarrow$  2x improvement

260,

$$\frac{100}{2} = \frac{80}{n} + 20$$

$$\Rightarrow 50 = \frac{80}{n} + 20$$

$$\Rightarrow \frac{80}{n} = 30$$

$$\Rightarrow n = 2.67.$$

$\therefore$  improvement factor = 2.67.

Now assume that, a particular operation takes  $2.5\% \text{ of}$   
 the total execution time. What improvement is required  
 if we want 2.5 times speedup in that operation,  
 where  $\alpha$  is equal to 10.

$$\Rightarrow \frac{100}{2.5} = \frac{(100 \times 2.5 \times 10)}{n} + 75$$

$$\Rightarrow 40 = \frac{25}{n} + 75$$

$$\Rightarrow n = -0.7142$$

value negative error  
 → improve error  
 not fit.

∴ Since the improvement factor  
 is negative, we can't bring this  
 improvement.

Suppose you are training a face recognition model,  
 which is heavily dependent on a process (80%). So,  
 you installed a graphics card with to speed up  
 that process. Now, you observe that it is taking  
 only 3 days to execute, as opposed to 6 days  
 before installing the card. What is the  
 improvement?

$$\Rightarrow \text{improved time} = 3 \text{ days}$$

$$\text{Affected time} = 6 \times 0.8 = 4.8$$

$$\text{Unaffected time} = 6 - 4.8 = 1.2$$

percentage fixed  $\rightarrow 100 = 80 + 20$   
 $\Rightarrow \frac{100}{2} = \frac{80}{n} + 20$   
 $\Rightarrow n = 2.67.$

$$\therefore 3 = \frac{4.8}{n} + 1.2$$

$$\Rightarrow 1.8n = 4.8$$

$$\Rightarrow n = 2.67.$$


---

MIPS as a performance Metric.

- MIPS  $\Rightarrow$  Millions of instructions per second
- 1 MIPS  $\Rightarrow$  1 million instruction execute ~~20ns~~.
- $2^{20}$  MIPS  $\Rightarrow$  ~~20ns~~ unit.
- doesn't account for:
  - different ISA
  - difference in complexity between instructions

$$\text{MIPS} = \frac{\text{instruction count}}{\text{Execution time} \times 10^6}$$

$$= \frac{\text{instruction count}}{\text{instruction count} \times \text{CPI} \times 10^6}$$

clock rate

$$= \frac{\text{clock rate}}{\text{CPI} \times 10^6}$$

# The Von Neumann Model ★★★

- Hardware based architecture.

• Data memory & instruction memory same memory <sup>to</sup>  
मेमोरी, As a result, <sup>जैसे</sup> time <sup>में</sup> simultaneously  
data डाटा instruction access करते हैं। <sup>मिलते हैं।</sup>

- Also called "stored program computer." (instruction in memory).

• Sequentially programme run  
<sup>जैसे</sup>

## ■ Stored program:

- ⇒ instruction stored in a linear memory array
- ⇒ Memory is unified between instruction and data

The interpolation of a stored value depends  
on the controlling signals

## ■ Sequential instruction processing.

- ⇒ One instruction ~~referred~~ processed at a time.
- ⇒ Program counter (PC) identifies the current instruction that is being executed

⇒ PC is advanced sequentially except for.  
control transfer instruction.

if else condition  $\Rightarrow$  ZTS100 check za.  
 1. if : if condition satisfy  $ZT^P$  1st MB 2  
 2. add then 3  $ZT^M$  ---  
 3. Z  
 4. else: for 3, if condition satisfy  $ZT^P$   
 5. multiplication then 4  $ZT^M$  else 5  $ZT^P$  ON2R  
 $ZT^M$ ,  
 $ZT^P$



Figure: Von Neumann Architecture.

## The Harvard Architecture

- Data memory  $\Rightarrow$  instruction memory  $\Rightarrow$  access  
(shared  $\Rightarrow$ ).  
↳  $\Rightarrow$   $\Rightarrow$  data  $\Rightarrow$  instruction  $\Rightarrow$  access  
রয়েছে মানুষ, কিন্তু কোথা Hazard থাকে না.
- This architecture have separate physical storage  
and signal pathways for instruction and data.



Figure: Harvard Architecture.

- Harvard architecture: computer can thus be faster for a given circuit complexity because instruction fetcher and data access do not contend for a single memory pathway.
- ~~Data~~ ~~data~~ memory to data share, characteristics share not mandatory.
  - 1. (Word width, timing, implementation, technology and memory access structure can differ).
- Instruction memory is often wider than data memory.
- In some system instruction can be stored in read-only memory while data memory generally requires read-write memory.

Calculate the execution time of the programs and SPEC ratio from table below as well as the summary benchmark point using geometric mean?

| Benchmark Program | Instruction count $\times 10^9$ | CPI  | Clock rate (GHz) | Reference time (seconds) |
|-------------------|---------------------------------|------|------------------|--------------------------|
| G0                | 1970                            | 1.10 | 3.2              | 604.90                   |
| Hmmer             | 2910                            | 0.6  | 3.2              | 933.0                    |

⇒

G0

$$\text{execution time} = \frac{1970 \times 10^9 \times 1.10}{3.2 \times 10^9}$$

$$= 677.1875$$

$$\therefore \text{Spec ratio}_{G0} = \frac{10490}{677.1875}$$

$$= 15.49053$$

Hmmer

$$\text{execution time} = \frac{2910 \times 10^9 \times 0.6}{3.2 \times 10^9}$$

$$= 545.625$$

$$\text{Spec ratio}_{Hmmer} = \frac{9330}{545.625}$$

$$= 17.09965$$

$$\therefore \text{Geometric mean} = \sqrt[2]{15.49053 \times 17.09965}$$

$$= 16.2752$$

## ★ Chapter - 1 Formula's ★

- $P \propto \frac{1}{\text{Execution time}}$

- $\frac{P_A}{P_B} = \frac{E_B}{E_A} = \frac{\text{CPU time}_B}{\text{CPU time}_A}$

- CPU time or execution time =  $\text{CPU clock cycles} * \text{Clock Cycle time}$
- $= \frac{\text{CPU clock cycles}}{\text{clock rate}}$
- $= \frac{\text{instruction count} * \text{CPI}}{\text{clock rate}}$
- $= \text{instruction count} * \text{CPI} * \text{Clock Cycle time.}$

- Clock Cycle time =  $\frac{1}{\text{clock rate}}$

- Same ISA ~~26m~~ instruction count same.

- Clock Cycles =  $\sum_{i=1}^n (\text{CPI}_i * \text{instruction count}_i)$

- Weighted average CPI.

$$\begin{aligned} \text{CPI} &= \frac{\text{Clock Cycles}}{\text{instruction count}} \\ &= \sum_{i=1}^n (\text{CPI}_i * \underbrace{\frac{\text{instruction count}_i}{\text{instruction count}}}_{\text{Relative frequency}}) \end{aligned}$$

- Power = Capacitive load  $\times$  Voltage<sup>2</sup>  $\times$  Frequency

- Spec ratio =  $\frac{\text{reference time}}{\text{execution time}}$

- Geometric mean =  $\sqrt[n]{\prod_{i=1}^n \text{Spec ratio}_i}$

- Amrdhal's law,

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor, } n} + T_{\text{unaffected}}$$

- MIPS =  $\frac{\text{instruction count}}{\text{execution time} \times 10^6}$

$$= \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

- Cost per die =  $\frac{\text{cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$

- Dies per wafer  $\approx$  Wafer area / Die area

- Yield =  $\frac{1}{(1 + (\text{Defects per area} \times \text{Die area}/2))^2}$

Integrated Circuit cost