

# Computer Organization and Operating System

# Computer Performance

Akharin Khunkitti

KMITL

# Topic

- Computer Performance Overview
- Performance Techniques
- Clock Speed
  - Overview, Effects
- Pipeline
  - Overview, Effects
- Parallel Processing
- Conclusion



# Computer Performance Overview

- Need for higher speed of processing
  - High Performance
- Performance Measurements
  - Instructions Per Second
    - MIPS – Millions of Instructions Per Second
  - Operations Per Second
    - FLOPS – Floating-Point Operations Per Second
  - **Cycles Per Instruction (CPI)**
  - **Instructions Per Cycle (IPC)**
- Performance Effects
  - Clock Speed
  - Instruction Execution Time
  - Instructions in Program
- Benchmarks / Standards
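The measurements and effects listed above are tied together by the usual relation: execution time = instruction count × CPI / clock rate. The sketch below works through it in Python; the function names and the sample program (2 million instructions, CPI of 2.0, 1 GHz clock) are illustrative assumptions, not figures from the lecture.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """Execution time = instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz


def mips(instruction_count, exec_time_s):
    """Millions of instructions executed per second."""
    return instruction_count / (exec_time_s * 1e6)


def ipc(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


# Hypothetical program and machine (assumed values for illustration only)
n_instr = 2_000_000
exec_time = cpu_time_seconds(n_instr, cpi=2.0, clock_hz=1e9)
rate = mips(n_instr, exec_time)

print(f"time = {exec_time:.4f} s, MIPS = {rate:.1f}, IPC = {ipc(2.0):.2f}")
```

Halving CPI or doubling the clock rate halves the execution time in this model, which is why the slides list both clock speed and instruction execution time as performance effects.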



# Performance Techniques

- Clock Speed
- Pipelining
- Parallel Processing
  - Multi-Core
  - Clustering



# Clock Speed

- CPU synchronized to clock
- Increase Clock Speed – Hz
- Physical Limits
  - Electronics
  - Size / Distance
- Clock Speed vs Power Consumption
  - Heat
- CPU-RAM Speed difference
  - Cache Memory
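To illustrate why cache memory narrows the CPU-RAM speed gap, the sketch below uses the standard average-access-time estimate: hit time plus miss rate times miss penalty. The numbers (1 ns cache hit, 100 ns RAM access, 5% miss rate) are assumed for illustration and do not come from the slides.

```python
def average_access_time_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time with a single cache level."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Assumed timings: cache hit 1 ns, going to RAM costs 100 ns, 5% of accesses miss
with_cache = average_access_time_ns(hit_time_ns=1.0, miss_rate=0.05, miss_penalty_ns=100.0)
without_cache = 100.0  # every access goes to RAM

print(f"with cache: {with_cache:.1f} ns, without cache: {without_cache:.1f} ns")
```

Under these assumptions the CPU waits far less on average, even though RAM itself is no faster.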



# Pipelining

- Instruction Cycle => Multiple Stages
- Execute Multiple Instructions Simultaneously
  - Instruction Stages Overlapping
- Number of Instruction Stages => Number of Parallel Instructions in Pipeline
- Increase Instructions Per Cycle (IPC)
  - Each Clock Cycle => Multiple Instructions in Progress, One per Stage
- Pre-Execution



Timing diagram: five instructions executing over eight clock cycles (t1 to t8) in a four-stage pipeline. Rows are instructions and columns are clock cycles. Stages: IF (Instruction Fetch), ID (Instruction Decode), IE (Instruction Execute), RW (Result Writing). Dashes ('--') mark cycles in which the instruction occupies no stage. The original figure also carries the annotations 'No instruction' and 'Number of stages engaged'.

|          | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|----------|----|----|----|----|----|----|----|----|
| Instr. 1 | IF | ID | IE | RW | -- | -- | -- | -- |
| Instr. 2 | -- | IF | ID | IE | RW | -- | -- | -- |
| Instr. 3 | -- | -- | IF | ID | IE | RW | -- | -- |
| Instr. 4 | -- | -- | -- | IF | ID | IE | RW | -- |
| Instr. 5 | -- | -- | -- | -- | IF | ID | IE | RW |
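The timing table above can be regenerated with a short sketch. It assumes every stage takes exactly one clock cycle and that no hazards occur; the function names are illustrative. With k stages and n instructions, the pipeline finishes in k + n - 1 cycles instead of n × k.

```python
STAGES = ["IF", "ID", "IE", "RW"]  # stage names taken from the diagram


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c."""
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles
        # Instruction i enters the pipeline in cycle i and advances one stage per cycle
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def pipelined_cycles(n, k):
    return k + n - 1


def unpipelined_cycles(n, k):
    return n * k


rows = pipeline_schedule(5)
for i, row in enumerate(rows, start=1):
    print(f"Instr. {i}: " + " ".join(row))

speedup = unpipelined_cycles(5, 4) / pipelined_cycles(5, 4)
print(f"cycles: {pipelined_cycles(5, 4)} pipelined vs {unpipelined_cycles(5, 4)} sequential")
```

For the five instructions shown, that is 8 cycles instead of 20. As n grows, the speedup approaches the number of stages, provided the pipeline never stalls.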

# Pipeline Problems / Hazards

- Multiple Instructions execute simultaneously
  - Data Hazard
    - Multiple instructions access the same data
  - Control Hazard
    - Order of instruction execution
    - Branch
  - Structural Hazard
    - Resource conflicts between multiple instructions
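As a rough illustration of a data hazard, the sketch below scans a small program for cases where an instruction reads a register written by an instruction shortly before it (a read-after-write dependency). The `Instr` type, the register names, and the two-instruction window are all assumptions made for this example; real pipelines differ in how far apart dependent instructions must be, and this simplified check ignores intervening overwrites.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, for display only
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def raw_hazards(program: List[Instr], window: int = 2):
    """Return (producer_index, consumer_index, register) for each dependency
    where the consumer follows the producer within `window` instructions."""
    hazards = []
    for i, producer in enumerate(program):
        last = min(i + 1 + window, len(program))
        for j in range(i + 1, last):
            if producer.dest in program[j].srcs:
                hazards.append((i, j, producer.dest))
    return hazards


# Hypothetical three-instruction program
program = [
    Instr("ADD", "r1", ("r2", "r3")),  # writes r1
    Instr("SUB", "r4", ("r1", "r5")),  # reads r1 right away: data hazard
    Instr("AND", "r6", ("r7", "r8")),  # independent of both
]

print(raw_hazards(program))
```

Here only the ADD/SUB pair is flagged: SUB needs `r1` before ADD has written it back, so the pipeline must stall or supply the value some other way.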



## Stall cycles



## Instruction pipelining Problems – Control hazard



## Instruction pipelining Problems – Data hazard

### Data hazard

An instruction may produce data that is needed by a later instruction

Examples:



(No bubble, why?)



# Pipeline Hazard Solutions

- Branch Prediction
  - Misprediction
- Delayed Execution
  - Added Clock Cycle (Stall)
- ...
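The cost of a misprediction can be estimated with a simple model: each mispredicted branch adds a fixed number of wasted cycles to the base CPI. The sketch below uses assumed figures (20% of instructions are branches, 10% of those are mispredicted, 3 cycles lost per misprediction); none of these values come from the slides.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Base CPI plus the average stall cycles per instruction caused by mispredictions."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: ideal pipeline CPI of 1.0, 20% branches,
# 10% of branches mispredicted, 3 cycles flushed per misprediction
cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.2, mispredict_rate=0.1, penalty_cycles=3)
perfect = effective_cpi(base_cpi=1.0, branch_fraction=0.2, mispredict_rate=0.0, penalty_cycles=3)

print(f"CPI with mispredictions: {cpi:.2f}, with perfect prediction: {perfect:.2f}")
```

Better prediction accuracy or a shorter penalty both pull the effective CPI back toward the ideal value.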



| State    | Instruction Cycle |    |     |    |    |     |     |    |
|----------|-------------------|----|-----|----|----|-----|-----|----|
| Fetch    | I1                | ST | ADD | I3 | I4 | I4  | I5  | I6 |
| Decode   |                   |    |     | I1 | ST | ADD | I3  | I5 |
| Operands |                   |    |     |    | I1 | ST  | ADD | I4 |
| Execute  |                   |    |     |    |    | I1  | ST  | -  |
|          |                   |    |     |    |    |     | ADD | I4 |

Extra cycle added



## Pipeline Flush on a Misprediction



# Parallel Processing

- Multiple Executing Units Simultaneously
- Multiple Cores
  - Core => CPU (REG+ALU+CU)
- Multiple Threads
  - Threads – Virtual CPU
    - Executing Status
    - Registers
  - Shared CPU-Components
    - ALU, Control Unit
- Multi-Processors
  - Multiple CPUs in multiple sockets
  - Each CPU may be multi-core
  - Shared Memory
- Multi-Computers
  - Multiple sets of CPU-Memory
  - Shared I/O
  - Separate I/O => Computer Cluster



# Conclusion

- Computer performance has increased continuously through many techniques
- Increasing clock speed is the basic approach
  - Limited by physics
  - Components run at different speeds
    - For main memory, addressed by cache memory
- Pipelining increases parallelism by executing instructions in overlapping stages
  - Hazards between simultaneously executing instructions reduce performance
- Increasing the number of execution units working in parallel
  - Multiple executors at the same time
  - Multiple Cores/Threads/Processors/Computers
- In short, computer performance can be improved by increasing
  - clock speed
  - parallel instructions and execution units

## Factors affecting CPU performance



# END

Questions?