

Comp Arch 3 - InnoClassify Computers:

We Want

|                             |
|-----------------------------|
| PayLynch.000.webhostapp.com |
| User: ft2814                |
| Pass: paul2814              |



① How do we minimize response time?

- Topics:
- i. Internal CPU Design
  - ii. Memory Design: Caches & DRAM
  - iii. Parallel Processing - Multicore
  - iv. Storage disks, etc.

Computer Performance Between Computers:Performance M/C  $x = n \cdot M/C y$ Take a program: measure execution time of  $x$  &  $y$ Higher Performance  $\rightarrow$  Shorter execution time

$$\text{Performance } x = \frac{1}{\text{Execution time}} \rightarrow \frac{\text{Perf } x}{\text{Perf } y} = \frac{\text{Exec } y}{\text{Exec } x} = n$$

Prog takes: 10s to run on A  
15s on B

$$\frac{15_0}{10_A} = 1.5 \quad A \text{ is 1.5 times faster than B}$$

Measure execution time of program?

- Governed by current activity on multitasking mcu
- Execution time governed by: CPU Exec t, I/O, other

### CPU Time:

Ray.c

```
main
{
    int x,y;
    x=y;
    cout<<x;
}
```



→ Ideally would like to measure execution time of

- User code (Usually only able to measure this)
- OS code

→ User code is easy to access, O/P of PC - MCU code

code instructions (assembly)

→ Have figures for how long every instruction takes

→ Program composed of MCU code instructions

→ Exec Time = CPU Time Executing Instructions

→ Instruction Execution measured by no. of clock cycles / instructions

→ Execution time of program = no. of clock cycles  $\times T_{clock}$

\* Average no. of cycles/instr = no. of prog instr  $\times \frac{\text{Avg}(cycles)}{\text{instr}} \times T_{clock}$   
 can vary instruction to instruction (CPI)  $\times T_{clock} \times T_{clock}$ \*

### Analyse Computer Program:

Execution Time of program: No. of Instr  $\times CPI \times T_{clock}$

$$\boxed{\text{Exec Time} = \frac{\text{Instr Count} \times CPI}{T_{clock} \text{ CLK Rate}}}$$

| MCU A | Cycle Time | CPI |
|-------|------------|-----|
|       | 250ps      | 2   |
| MCU B | 500ps      | 1.2 |

which one is faster? Assuming they have identical Instruction Set of Architecture (ISA)

$$\text{Exec A} = \text{ISA} \times 2 \times 250\text{ps} = 500 \times \text{ISA}$$

$$\text{Exec B} = \text{ISA} \times 1.2 \times 500\text{ps} = 600 \times \text{ISA}$$

$$\frac{\text{Exec B}}{\text{Exec A}} = \frac{600}{500} = 1.2 \quad \text{A is 20% faster than B}$$

∴ Performance depends upon

Performance depends on: Algorithm  $\rightarrow$  ISA + CPI  
Language & Compiler  $\rightarrow$  ISA + CPI

ISA = N, CPI, T<sub>Clock</sub>

Amdahl's Law:

- Applies to making changes to hardware to improve overall performance
- Improve Multiplier Area of CPU: Works faster but ~~can't~~ change other parts of CPU

$$T_{\text{overall perf}} = \frac{T_{\text{multiplier}}}{\text{Improvement}} + T_{\text{unmodified}}$$

In a sample program multipliers take 80s for 100s of program

How must we improve multiplication perf % to get the MCU to run 5 times faster?

Moore's Law: Density of transistors will double every 18 months  $\approx$  CPU power doubles every 18 months



### CPU Breakdown:



→ Improvements made by Hardware engineers w/ extra transistors → no work for software

### Instruction Level Parallelisation (ILP):

→ Execute multiple instructions in parallel at same time



ILP improvements slowed down  
because control size >> data processing  
→ power consumption becomes issue

→ As transistors are reduced in size



Transistors smaller: problem switching fully off



- ILP Wall

- Power Wall

- Memory Wall

→ Brick wall to traditional unicore CPUs

→ Must move to multicore

→ Use simpler internal design, but use CPUs in parallel

→ Up to software engineers to update design programs  
for multicore platform - multithreaded program