

## Flynn's Taxonomy for Classification of Computers

### SISD (Single Instruction, Single Data)

- > A traditional sequential computer architecture where a single processor executes a single instruction stream on a single data stream.
- > Characteristics:
  - One control unit (CU)
  - One processing unit (PU)
  - Executes one instruction at a time
  - Processes one data item at a time
- > e.g. classical uniprocessor systems
- > Uses: embedded systems, simple tasks.

### SIMD (Single Instruction, Multiple Data)

- > A parallel architecture where multiple processing units execute the same instruction on different pieces of data simultaneously.
- > Characteristics:
  - One CU
  - Multiple PUs
  - Executes one instruction on many pieces of data
  - Suited to vector and matrix operations
- > e.g. GPUs, Intel AVX
- > Uses: image processing, ML training


### MISD (Multiple Instruction, Single Data)

- > An uncommon architecture where multiple processors execute different instructions on the same data stream.
- > Characteristics:
  - Multiple control units
  - Multiple PUs
  - Limited applicability
- > e.g. fault-tolerant systems
- > Uses: real-time systems, safety-critical systems (aerospace)



### MIMD (Multiple Instruction, Multiple Data)

- > A highly flexible architecture where multiple processors execute different instructions on different data independently.
- > Characteristics:
  - Multiple CUs
  - Multiple PUs
  - Asynchronous parallelism
  - Shared-memory or distributed-memory based
- > e.g. multicore CPUs, clusters / cloud computing, modern supercomputers
- > Uses: web servers, databases




Comparison of the four Flynn classes:

|                    | SISD                               | SIMD                                        | MISD                   | MIMD                            |
|--------------------|------------------------------------|---------------------------------------------|------------------------|---------------------------------|
| Instruction stream | Single                             | Single                                      | Multiple               | Multiple                        |
| Data stream        | Single                             | Multiple                                    | Single                 | Multiple                        |
| Parallelism        | None                               | Data parallelism                            | Instruction redundancy | Task parallelism                |
| Usage              | Legacy systems<br>Embedded systems | GPUs<br>Scientific computing<br>ML training | Fault-tolerant systems | Multiprocessor CPUs<br>Clusters |
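The classification depends only on whether there is one or more than one instruction stream and data stream. A minimal sketch of that rule follows; the helper name `flynn_class` is made up for illustration and is not part of the notes.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class name for the given stream counts.

    Hypothetical helper: 1 stream maps to "S" (single), more than 1 to "M" (multiple).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


print(flynn_class(1, 1))  # classical uniprocessor
print(flynn_class(1, 8))  # one instruction applied to 8 data lanes
print(flynn_class(3, 1))  # redundant units checking the same data
print(flynn_class(4, 4))  # 4 cores, each with its own instructions and data
```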

## Task Parallelism

- > Involves distributing different tasks across multiple processors.

> Features:

- Parallel execution of different tasks.
- Each task may be independent or may require coordination.
- Heterogeneous (diverse) workloads.
- Efficiency ↑, flexible.

> e.g.

- Multithreaded systems
- OS / cloud computing / clusters
- Real-time systems (RTS)
- > Typical hardware: multicore CPUs
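As a rough illustration of task parallelism, the sketch below submits three different functions to a thread pool. The function names and inputs are invented for the example. In CPython, threads mainly help with I/O-bound work, so this shows the structure rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text: str) -> int:
    return len(text.split())


def checksum(values: list[int]) -> int:
    return sum(values) % 256


def longest_word(text: str) -> str:
    return max(text.split(), key=len)


def run_tasks(text: str, values: list[int]) -> dict:
    # Each submit() hands a *different* function to the pool: task parallelism.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "word_count": pool.submit(word_count, text),
            "checksum": pool.submit(checksum, values),
            "longest_word": pool.submit(longest_word, text),
        }
        # result() is the coordination point: it blocks until that task is done.
        return {name: future.result() for name, future in futures.items()}


results = run_tasks("parallel programs decompose work", [10, 20, 300])
print(results)
```

The three tasks here are independent; tasks that share state would need explicit coordination (locks, queues), which is the "may require coordination" case in the notes.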

## Data Parallelism

- > Involves distributing the same task across multiple processors.
- > Each processor works on a different subset of the data.

> Features:

- Parallel execution of the same task.
- Efficient for large datasets.
- Homogeneous workloads.

> e.g. image processing: each processor applies the same filter to a different part of the image. Also ML training and graphics.

- Scalable
- Performance ↑, at the cost of communication overhead
- Simple, but limited in use
- > Typical hardware: GPUs
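The image-filter example can be sketched as follows: the image is split into blocks of rows and the same function is applied to each block. The `brighten` filter, the tiny 4x2 "image", and the worker count are assumptions chosen for illustration. Real image pipelines would typically use a GPU or vectorized library instead of Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(rows: list[list[int]], amount: int = 40) -> list[list[int]]:
    # The same filter runs on whatever block of rows it is given.
    return [[min(255, px + amount) for px in row] for row in rows]


def split_rows(image: list[list[int]], parts: int) -> list[list[list[int]]]:
    size = max(1, -(-len(image) // parts))  # ceiling division, at least 1
    return [image[i:i + size] for i in range(0, len(image), size)]


def parallel_brighten(image: list[list[int]], workers: int = 2) -> list[list[int]]:
    blocks = split_rows(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() applies the same function to every block and keeps block order.
        results = list(pool.map(brighten, blocks))
    # Stitch the processed blocks back into one image.
    return [row for block in results for row in block]


image = [[0, 100], [200, 250], [30, 60], [255, 10]]
print(parallel_brighten(image))
```

The split and the final stitch are the communication overhead mentioned in the notes: they grow with the number of blocks even though the filter itself scales well.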


## Bit-Level Parallelism

- > Parallelism inside a processor, obtained by increasing the word size.
- > Word size of computers has grown: $8 \rightarrow 16 \rightarrow 32 \rightarrow 64$ bits.
- > Benefits:
  - Faster arithmetic operations
  - Reduces the number of instructions required (a wider word does in one instruction what a narrower word needs several for)
  - Common in modern processors
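To make the "fewer instructions" benefit concrete, here is a small sketch that adds two 32-bit numbers using word-sized add steps with carry propagation, and counts how many steps are needed. The function name and the 32-bit operand width are assumptions for illustration; real hardware does this in the ALU, not in software.

```python
def add_in_chunks(a: int, b: int, word_bits: int, total_bits: int = 32) -> tuple[int, int]:
    """Add two unsigned total_bits-wide integers using word_bits-wide add steps.

    Returns (sum modulo 2**total_bits, number of word-sized additions).
    """
    if total_bits % word_bits != 0:
        raise ValueError("total_bits must be a multiple of word_bits")
    mask = (1 << word_bits) - 1
    result, carry, steps = 0, 0, 0
    for shift in range(0, total_bits, word_bits):
        # Add one word-sized slice of each operand plus the incoming carry.
        part = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (part & mask) << shift
        carry = part >> word_bits
        steps += 1
    return result, steps


a, b = 0x12345678, 0x0FEDCBA9
for width in (8, 16, 32):
    total, steps = add_in_chunks(a, b, width)
    print(f"{width:2d}-bit words: sum={total:#010x}, add steps={steps}")
```

The sum is the same for every word size; only the number of add steps changes.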

## Instruction-Level Parallelism

- > Executing multiple instructions simultaneously within a processor.
- > Sequential → concurrent execution of instructions.
- > Benefits: better utilization of processor resources through parallelism.
- > Techniques:
  - Pipelining (Fetch $\rightarrow$ Decode $\rightarrow$ Execute): stages of different instructions overlap, like an assembly line.
  - Superscalar execution: multiple execution units allow more than one instruction per cycle.
  - Out-of-order execution:
    - ✓ Instructions execute as resources become available
    - ✗ Not strictly in program order
  - Speculative execution and branch prediction: execute instructions ahead of time, before the branch outcome is known.
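The assembly-line effect of pipelining can be sketched with a simple cycle count. This assumes an idealized pipeline: every stage takes one cycle and there are no stalls or hazards, which real processors do not guarantee. The function names are invented for the example.

```python
def sequential_cycles(n_instructions: int, n_stages: int = 3) -> int:
    # Without pipelining, each instruction passes through all stages alone.
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 3) -> int:
    # The first instruction fills the pipeline; each later one finishes 1 cycle after.
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_schedule(n_instructions: int, stages=("F", "D", "E")) -> list:
    """Return, per cycle, the (instruction index, stage) pairs that are active."""
    schedule = []
    for cycle in range(pipelined_cycles(n_instructions, len(stages))):
        active = []
        for stage_index, name in enumerate(stages):
            instr = cycle - stage_index  # instruction currently in this stage
            if 0 <= instr < n_instructions:
                active.append((instr, name))
        schedule.append(active)
    return schedule


print(sequential_cycles(4), pipelined_cycles(4))
for cycle, active in enumerate(pipeline_schedule(4)):
    print(cycle, active)
```

From the third cycle on, all three stages are busy with different instructions, which is where the better utilization comes from.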

HPC systems aim to solve large, complex problems by using multiple computing resources in parallel. The key concepts used for this are:

1. Concurrency
2. Decomposition

Note: concurrency and decomposition together form the foundation of the parallel programming model.

## Concurrency

- > Multiple tasks are all in progress at the same time.
- > Concurrency $\rightarrow$ parallelism: a higher degree of concurrency ($\uparrow$) gives more work that can run in parallel.
- > Levels:
  - ILP (instruction level)
  - TLP (thread level)
  - Process level
  - DLP (data level)
  - Task-level concurrency
- > Concurrent computing $\rightarrow$ threading + asynchrony + preemptive multitasking
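Concurrency does not require parallel hardware: the sketch below uses asynchrony on a single thread, with two tasks that are both in progress at once and take turns. The worker names and step counts are arbitrary; this illustrates the asynchrony ingredient only, not threading or preemptive multitasking.

```python
import asyncio


async def worker(name: str, steps: int, log: list) -> None:
    for step in range(steps):
        log.append((name, step))
        # Yield control so the other task can make progress.
        await asyncio.sleep(0)


async def main() -> list:
    log: list = []
    # Both workers are started together and interleave on one thread.
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


def run_demo() -> list:
    return asyncio.run(main())


print(run_demo())
```

Neither worker runs to completion before the other starts, so both are "in progress at the same time" even though only one executes at any instant.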

## Decomposition

- > The process of breaking a large problem into smaller subproblems, which exposes concurrency.
- > Types:
  - Domain decomposition (data)
  - Functional decomposition (task)
  - Hybrid decomposition
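The difference between the first two types can be sketched on one small problem. Domain decomposition splits the data into blocks that all get the same operation; functional decomposition splits the work into different operations on the same data. All names below are hypothetical, and the subproblems run one after another here just to show how the problem is cut up.

```python
def block_ranges(n_items: int, n_parts: int) -> list[range]:
    """Domain decomposition: split indices 0..n_items-1 into contiguous blocks."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)  # spread the remainder
        ranges.append(range(start, start + size))
        start += size
    return ranges


def domain_decomposed_sum(data: list[int], n_parts: int) -> int:
    # Same operation (sum) on each block, then combine the partial results.
    partial = [sum(data[i] for i in block) for block in block_ranges(len(data), n_parts)]
    return sum(partial)


def functional_decomposition(data: list[int]) -> dict:
    # Different operations on the same data; each entry is a separate subtask.
    tasks = {"total": sum, "largest": max, "smallest": min}
    return {name: fn(data) for name, fn in tasks.items()}


data = list(range(1, 11))
print(block_ranges(len(data), 3))
print(domain_decomposed_sum(data, 3))
print(functional_decomposition(data))
```

A hybrid decomposition would combine the two, for example by running each functional subtask over its own set of data blocks.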



## PRAM (Parallel Random Access Machine)

- A model assumed by most parallel algorithms.
- Multiple processors are attached to a single block of memory through a memory access unit (MAU).
- A PRAM model contains:
  - A set of processors of the same type.
  - A common shared memory unit.
  - A memory access unit (MAU).

n processors can perform independent operations on n data items, which may result in simultaneous access to the same memory location. To resolve this, PRAM variants impose the following constraints:

- EREW: exclusive read, exclusive write
- ERCW: exclusive read, concurrent write
- CREW: concurrent read, exclusive write
- CRCW: concurrent read, concurrent write
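The four access constraints can be sketched as a check on one PRAM step. The representation of an access as a `(processor, "R" or "W", address)` tuple and the name `step_allowed` are assumptions for illustration. The sketch also assumes that reads and writes happen in separate phases of a step, so a read and a write to the same cell do not conflict.

```python
from collections import Counter

# (concurrent reads allowed, concurrent writes allowed) for each variant
RULES = {
    "EREW": (False, False),
    "ERCW": (False, True),
    "CREW": (True, False),
    "CRCW": (True, True),
}


def step_allowed(model: str, accesses: list[tuple[int, str, int]]) -> bool:
    """Check whether one step's memory accesses are legal under the given variant."""
    concurrent_read_ok, concurrent_write_ok = RULES[model]
    reads = Counter(addr for _, op, addr in accesses if op == "R")
    writes = Counter(addr for _, op, addr in accesses if op == "W")
    if not concurrent_read_ok and any(count > 1 for count in reads.values()):
        return False  # two processors read the same cell
    if not concurrent_write_ok and any(count > 1 for count in writes.values()):
        return False  # two processors write the same cell
    return True


shared_read = [(0, "R", 5), (1, "R", 5), (2, "W", 7)]
shared_write = [(0, "W", 3), (1, "W", 3)]
for model in RULES:
    print(model, step_allowed(model, shared_read), step_allowed(model, shared_write))
```

A CRCW machine additionally needs a rule for which concurrent write wins; that policy is not modelled here.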

Methods to implement PRAM

- ① Shared memory model
- ② Message passing model
- ③ Data parallel model

