



Mosser's law: No. of transistors on a microchip doubles [1985] every two years.

- ↳ It claims that speed & capability of our computers to increase every two yrs. & cost would be halved.
- ↳ Growth is exponential.
- ↳ with time the machinery size will be decreased, faster & cheaper with time.

contains carbon & silicon which runs electricity faster.



→ Ending of Mosser's law? ←

- ① By 2020, it was believed that the physical limits of computer will come.
- ② Cooling of transistors requires more energy



# The Computational Law Argument or

→ Moore's Law 1975 ←

In this law, Moore revised the time frame from 2 yrs to 18 months.

↳ The computational power argument says that

the logical resources rely on both

implicit parallelism

↳ Separate function calls  
are not required to  
implement parallelism,

explicit parallelism

↳ extra programming  
efforts are needed  
to tell the compiler to  
use the resources parallelly

↳ Ex: Pipelining

Vectorization.

Multithreading

↳ Parallel loops

↳ Message passing

↳ Data parallelism

↳ Lang:

↳ Lang

① Occam

② Erlang

③ Java.

① AIXM

② BMDFM (Binary  
modular Dataflow machine)

③ HPF : High Performance  
Fortran.

④ SAC, Id, SFSAL

Serial processes delays on implicit parallelism



Annexure No :

We can achieve parallelism in two ways



→ Memory Risk Argument →

Clock Rates → 40% ↑ / year while.

DRAM → 10% ↑ / year  
Results in performance issues.

Here principle of locality works

↳ frequently access the same set of memory  
at same location

## → Data Communication Argument ←

→ in databases & data mining.

- ↳ Large Volume of Data is not easy to move from one point to another
- ↳ Thus the movement of data is need to done with parallelism.
- ↳ Thus SETI@ home, folding @home uses this technique to large computing platforms.

} distributed computing project for disease research using molecular dynamics through volunteer computing





Annexure No :

## conventional Architecture



h/w responsible to perform ALU.

→ Pipelining ←

- ① Connected in series .
- ② Data elements are operated .
- ③ In OP/IP → O/P of Elem → I/P of elem .

Process in 5 phases

- IF → instruction fetch .
- ID → instruction decode
- OF → operand fetch
- OE → operand execute .
- OS → operand store .

|    |    |    |    |    |
|----|----|----|----|----|
| IF | ID | OF | OE | OS |
|----|----|----|----|----|

Reason to implement Pipeline.

- ① To improve the performance of CPU improving faster circuits
- ② arrange the hardware in such a way that instruction executes at same time .

## Non pipelining structure



## Pipelining structure



Pipelining

Arithmetic

Pipelining

↳ 4 stages

Instruction

Pipelining

↳ 5 stages

↳ 6 stages

Intermediate Registers are used to store  
the intermediate outputs -

Intermediate Register / Latch / Buffer.

- ① Instruction Fetch  $\rightarrow$  CPU reads  $\rightarrow$  memory  $\rightarrow$  value is
- ② Int. Decod : Register file is accessed to get the values from Program Counter
- ③ Instr. Execute : ALU operations are performed.



Annexure No :

$$\text{Pipelining} = (K \times n - 1) T_p$$

where  $n$  = no. of inst.

$K$  = no. of stages

$T_p$  : clock cycle.

$$\text{Non-Pipelining} = \text{Non} \times K \times T_p$$

$$\begin{aligned}\text{Speed Up} &= \frac{\text{Perf. of NP}}{\text{Perf. of P-}} = \frac{n \times K \times T_p}{(K+n-1) T_p} \\ &= \frac{n \times K}{(K+n-1)}\end{aligned}$$

$\therefore n \ggg k$ :

$$\therefore \frac{n \times k}{n} \\ \boxed{s=k}$$

## Super Scalar Processor

### ① Uses Instruction level parallelism

- ↳ There is single thread of execution of a process - with its own set of resources.
- ↳ Two approaches of Instruction level parallelism
  - ↳ Hardware : Dynamic Parallelism  
↓  
processor decide at run time that which instruction to execute in parallel
  - ↳ Software : static Parallelism
    - ↳ Compiler decide which instruction to execute in parallel.

- ② To improve the rate of more than one instruction per clock cycle for a single sequential program
- ③ on a single processor



Instruction level  
Parallelism

Sequential



Independence

Dependency

Annexure No :

- Superscalar in works in single precision.
- Executes more than one instruction during clock cycle by dispatching multiple instruction at same time.



Adv. : Can make the optimize use of resources.

Dis : Detimental Effect of on performance  
 ↗ Harmful effect-

Superscalar Archi., CPU manages several Instructions pipeline to perform multiple Inst. at same time

Types of super scalar

- ↳ Intel core i7
- ↳ Intel Pentium
- ↳ IBM Power PC 601

## → Instruction Stream ←

### Control Structure of Parallel Platforms

Parallel computing provide more resources & memory in order to tackle the problems that could not be solved through serial computing.

- Serial computing follows von Neuman architecture
- Parallel computing follows Flynn's classification



Fig (a) Von Neuman

SISD : Traditional computers.

SIMD : Image processing & signal processing

MISD : Not Applicable.

MIMD : Distributed Computing & parallel processing & high performance computing



Annexure No. :

## Parallelism

- ① Implementation Property
  - ② Task run / execute simultaneously.
  - ③ Multi processor
  - ④ Sub class of concurrency
  - ⑤ Current task can be parallelized
- " Parallel is doing lot of things at once

## Concurrency

- ① Semantic Property
- ② Task run in overlapping periods of time.
- ③ Single processor
- ④ Order of the processes doesn't matter.
- ⑤ Concurrency is managing dealing with lots of things at once
- ⑥ For single core you have concurrency but not parallelism.

## Superscalar

- ① Use instruction level parallelism
- ② Several ins. simultaneously / multiple pipeline.  
 ↗ Duplicate resource exec
- ③ Spatial Parallelism      multiple ins.
- ④ Run con. on several resources
- ⑤ Duplicating Hardware Resources. like

Enrollment No. :

## Pipelining

- ① Overlapped within the execution
- ② Single pipeline
- ③ Temporal Parallelism
- ④ Several operations on common h/w.
- ⑤ Executing units pipelined more deeply with very fast LLC

Page No. :



Annexure No :

- Very Long Instruction Word (VLIW) Architecture
- It optimizes the ILP. for improved performance.
- It is simpler way to build a superscalar microprocessor.
- CPU processor allows program to execute in sequence but VLIW processor allows to explicit specify instruction to use execute in parallel.
- It allows high performance without complexity.
- It is much more advanced to superscalar but the functioning is just like the superscalar.

Examp:

intel i860, NXP (Previous Philips Semiconductors)  
 CG1000 digital signal processor.

- The architecture is depending upon the compiler.
- The program decides the parallel flow of the instructions.
- This increases the compiler complexity but decreases the hardware complexity.

Word = Register size = sequence of bits  
 that can be transformed from the working memory to the register of processor.

$$1 \text{ Word} = 16 \text{ bits} / 2 \text{ bytes.}$$

- Multiple independent operations are grouped together in single VLIW instruction
- Each operation has independent functional unit.
- Functional unit shares common register file.
- Vms-word: 64-1024 bits : depend upon no. of execution unit & code word length.
- Instruction scheduling & dispatching of word is done statistically by the compiler.
- Compiler checks for dependencies before scheduling parallel execution of the instructions.





Annexure No :



→ Advantages ←

- Reduce H/w complexity
- Reduce power consumption
- Simplifies decoding & instruction issues.

Disadvantage :

- ① Complex compiler required
- ② Increased program code size.

Applications

- ① Digital Signal Processing
- ② Multimedia Processing
- ③ Scientific computing
- ④ Embedded system.