

9/12/2017 :- Prof. Banerjee (soft computing)

Threshold logic

Fuzzy logic :- Data preprocessing

optimization :-

Hybrid Techniques. (Combination of different techniques)

Supervised learning :-

→ not good in generalizing.

Unsupervised learning :-

→ good at generalizing

→ not good in performance v/s. supervised.

Criticism

Soft computing (Characteristics)   
Simon Haykin

↳ chapter - ①

v R. J. Granahan  
↳ chapter - ①

{ James A. Freeman

① - ③ } 2

David RP

9/12/2017 :- Prof. Banerjee (soft computing)

Threshold logic

Fuzzy logic :- Data preprocessing

optimization :-

Hybrid Techniques. (Combination of different techniques)

Supervised learning :-

→ not good in generalizing.

Unsupervised learning :-

→ good at generalizing  
→ not good in performance v/s. supervised.

Criticism

Soft computing (Characteristics)   
Simon Haykin

↳ chapter - ①

v R. J. Granahan  
↳ chapter - ①

{ James A. Freeman

① - ③ } 2  
David RP

## Evolution Algorithms

DATE \_\_\_\_\_  
PAGE NO. \_\_\_\_\_

→ D.E. Goldberg. (1-3 chapters)

1. M.O.E.A (Multi objective evolutionary ~~algo~~)

2. L.K. Deb (NSGA II)

(Chapter 1-6)

Soft computing

Rough sets.

Hybrid Techniques.

AN  
Evo algo  
fuzzy logic

### Soft computing

- (1) Basic decision making model for human mind.
- (2) soft computing is foundation of computational intelligence in machine.
- (3) Treatment of imprecision, uncertainty, partial and approximate knowledge representation and probability.

### Hard Computing vs. Soft Computing

- (1) Hard computing is based on precise modelling and analysing to a accurate results.
- (2) works well for simple problem but bounded to NP-completeness.

?-class problem:- Problem that is solved

NP-hard problem - can solve in polynomial

### Area of soft computing

- (1) Subfield to hard computing.
- (2)
- (3) soft robotics
- (4)

KNN

ANN

v Difference b/w Hard computing and soft computing

univ features of soft computing

Basic components of soft computing

- fuzzy logic

- evaluation algorithm

Biological vs. conventional

speed

robustness

Flexibility / adaptability.

ANN  
i. defined by known outputs

Neuron (biological)

ANN  
Architecture  
Training & weights / connections.  
Testing / performance  
XOR problem

Boolean  
And

| $x_1$ | $x_2$ | Output |
|-------|-------|--------|
| 0     | 0     | 0      |
| 0     | 1     | 0      |
| 1     | 0     | 0      |
| 1     | 1     | 1      |

Date \_\_\_\_\_  
Page No. \_\_\_\_\_

Two input XOR problem can't solve in XOR logic.

Genetic Algorithms

1. encoded form (chromosome)
2. population
3. Genetic operators
4. algo.

5. Basic considerations

Initial population



Exploration & Exploitation Dilemma

Notes

### Introduction :-

Page No.  
Date : 9/12/2019

- \* Hard computing / conventional computing
- \* Soft computing : explorative search is used. (not exhaustive)  
↳ gives acceptable result / solution.
- \* Fuzzy logic : degree of membership (measured with value b/w 0 & 1).
- \* Squared Error : Difference between desired and actual result i.e.  
$$(D - A)^2$$
for 1 result set :  $(D_1 - A_1)^2$   
for 2 result set :  $(D_1 - A_1)^2 + (D_2 - A_2)^2$
- why squared error? → bcoz if small error can be further reduced & higher error can be further amplified  
Also, squared error will take care of both +ve & -ve differences.

### Learning →

1. Teacher

2. Learning Element / learning system.

Scalability :- If input increases & system can be balanced with visibility to user then it is said to be scalable system.

### Types of Learning →

- 1. Supervised Learning
- 2. Unsupervised Learning
- 3. critic (?)

- Page No. \_\_\_\_\_  
Date: / /
- 2. Hard comp works well for simple problems but bounded for NP completeness.
  - Explanation →
  - \* Handling data situation or imprecise data situation is not precise.
  - \* uncertainty is concerned with probability (may or may not happen).

$f(x_1, x_2, \dots, x_n) =$

$x_i \in [a_i]$

for  $x_1 \in [a_1]$  ] for 2 nested loops  
 for  $x_2 \in [a_2]$  and 2 nested loops  
 for  $\dots$

but a computer can't allow many nested loops (e.g. a compiler will allow max 10 loops).

|                    |                                                     |
|--------------------|-----------------------------------------------------|
| for n variables    | $f_1, f_2, f_3, \dots$                              |
| ans 2 <sup>n</sup> | new many functions will be possible!                |
| $2^n \rightarrow$  | for $2^n$ inputs there will be $2^{2^n}$ functions. |
| space              | $\vdots$                                            |

- a soft computing is well suited for real world problems
- + it can't give best results but it can give acceptable results.
- a conventional paradigm shift
- \* if model is not present, then also soft computing can be a good solution

- \* Adaptability  $\rightarrow$  it is adaptive & flexible.
- \* contact sensitivity

### # Artificial Neural Network $\rightarrow$

- highly parallel & distributive in nature
- having natural capability for storing potential knowledge.
- It is similar to human brain in 2 aspects:-
- i. knowledge is captured through learning process.

### 1. Neuron ( Biological Network )

### 2. Artificial Network

### 3. Architecture Networks

### 4. Training & weight / connections.

### 5. Testing / Performance

### 6. XOR Problem.



### hard limitting



Sigmoidal activation

change in weight

$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$  determined by learning rate (step number)  $\eta$  &  $\alpha$  (adjustment)  $\alpha$  ( $0 < \alpha < 1$ )

$w_{ij}(t)$ : current weight

| $y_j$<br>change in weight | $w_{ij}$<br>weight | $c_{ij}$<br>computation | $x_i$<br>input | $w_{ij}(t)$<br>initial weight |
|---------------------------|--------------------|-------------------------|----------------|-------------------------------|
| $+1$                      | $+1$               | $+1$                    | $+1$           | $+1$                          |
| $-1$                      | $-1$               | $-1$                    | $-1$           | $-1$                          |
| $0$                       | $0$                | $0$                     | $0$            | $0$                           |
| $1$                       | $1$                | $1$                     | $1$            | $1$                           |

no training set

configuration state

|     | $C$  | $P$  |
|-----|------|------|
| $C$ | 0.00 | 0.00 |
| $P$ | 0.00 | 0.00 |

$$\text{The output (desired)} = \frac{\text{sum}}{2} \times 100\%$$

Desired output =  $d_{ij} \times 100\%$

Initial weight =  $w_{ij} = 1.00$

### Biolar AND

Page No. \_\_\_\_\_  
Date : / /

| $x_1$ | $x_2$ | $L$ |
|-------|-------|-----|
| 0     | 0     | 0   |
| 0     | 1     | 1   |
| 1     | 0     | 0   |
| 1     | 1     | 1   |

$$b_{1,0} \text{ bias} = 1.$$

$$2 - 3 \cdot 0 + 1 \cdot 1 = 0.$$

$$3 - 3 \cdot 0 + 1 \cdot 1 = 1.$$

$$4 - 3 \cdot 0 + 1 \cdot 1 = 1.$$



"Decision boundary"  
→ can be separated by a line.

### XOR



↓  
can't be solved using  
linear boundary.

### # Genetic Algorithm

1. Encoded sol'n (chromosome)
2. Population chromosome
3. Genetic operators
4. Mgo.
5. Basic considerations



- \* Exploration  $\rightarrow$  search as many new points as possible to find best sol?
- \* Exploitation  $\rightarrow$  whatever best soln we have so far, we search nearby it to look for any better soln.
- \* Intensification  $\rightarrow$  means exploitation
- \* Diversification  $\rightarrow$  means exploration

(Follow slides provided by sir)

16/12/17

\* Antenna

Page No.  
Date: / /

Omnidirectional antenna



feeding point

Directional :-

1. Dipole Antenna



Radiation pattern.

feeding point : from where we put/feed electrical wave  
& from where we are collecting electrical signals from antenna.

2. Yagi Uda



when  
Antenna is step  
with this po  
it is receiving  
signal.

3. Horn Antenna



4. Parabolic reflector ( like dish TV, Telsky )



Gain factor of Antenna

$$G = \frac{4\pi A_e}{\lambda^2}$$

$$\lambda = \frac{c}{f}$$

↑  
carrier wavelength

$G=1$  for only omnidirectional antenna.

#### # Propagation methods

- Ground wave propagation ( below 2 MHz )
- Sky wave " ( 1400 - 2 - 30 MHz )
- Line of sight " ( e.g. light wave beat its freq. is above 30 MHz )

Ground wave :-



follows curvature of earth :-

For optical wave :- LOS (line of sight comm)  
 For radio wave :- reflection/refraction/diffraction anything may happen



### # Attenuation :-

- \* use amplifier to transmit signal appropriately

o  $\int L_0$



$$\text{free space, } \frac{P_t}{P_r} = \frac{(4\pi d)^2}{\lambda^2}$$

Loss

$$L_{dB} = 20 \log(f) + 20 \log(d) - 147.56 \text{ dB}$$

$$G_t = \frac{4\pi}{\lambda^2} A_t \text{ effective Area} \quad P_t = \frac{(4\pi d)^2}{G_t G_r \lambda^2} = \frac{(\lambda d)^2}{A_t A_r}$$

for directive antenna, we can compensate  $\downarrow$  this loss by -

$$L_{dB} = -20 \log(f) + 20 \log(d) - 10 \log(A_t A_r) + 16.9$$

$\downarrow$  compensate

## # Noise

- Thermal noise
- Intermodulation noise:- signals with diff. frequency are transmitted in a shared medium



so, mixing of signals will produce energy at freq.  $(f_1 + f_2)$  or their multiples in freq.

- Crosstalk noise is unintentionally selecting  $f_2$  freq. instead of  $f_1$ .

- Impulse noise  $\rightarrow$  very random in nature



↑ Impulse noise has washed out these 3 bits.

## # Multipath

original main pulse  
reflected/reflected  
directed  
scattered

Fading channel

$K = 0 \rightarrow$  Rayleigh

$K = \infty \rightarrow$  AWGN

- \* Error correction
- \* Adaptive Equalization

# Delay = Transmission time + Propagation time +  
processing time + Queuing time.

Transmission time =  $\frac{\text{Message size}}{\text{Bandwidth}}$

Propagation time =  $\frac{\text{Distance}}{\text{Propagation speed}}$

Energy consumption =  $P_t \times \frac{\text{No. of bits transmitted}}{\text{Bit Rate}}$



IDM  $\rightarrow f$  is fixed  
FDM  $\rightarrow t$  is fixed

(Numericals  $\rightarrow$  photos).

17/12/2017  
CELLULAR NETWORKS



BTS (Best Transceiving station)



D = min. distance b/w centre of cells  
that use same frequency band. called co-channels.

R = Radius of a cell

d = distance b/w centres of adjacent cell.  $= \sqrt{3} R$

N = No. of cells in a repeater's pattern  
(as each cell in pattern use a unique set of  
frequency bands) termed the reuse factor.

$$N = I^2 + J^2 + (I \times J) \text{ where } I, J = 0, 1, 2, \dots$$

$$\therefore N = 1, 3, 4, 7, 9, \dots$$

(Graph coloring problem)  
Page No. \_\_\_\_\_!  
Date: 11



$$\frac{D}{R} = \sqrt{3} N \quad (\text{bcz } d = \sqrt{3} R)$$

$$\therefore \left[ \frac{D}{d} = \sqrt{N} \right]$$

Q.  $K = 395$ ,  $N=7$ . what is max. freq. which can be achieved  
for individual cells?

$$\therefore K = \frac{395}{7} \approx 57$$

1. Adding new channels. (can add only ~~five~~<sup>no of</sup> channels, not possible everytime)
2. Frequency borrowing
3. cell splitting

inter-channel interface.



E D  
F E

F/E

\* Macro cell

|                    | Macro cell | Micro cell   |
|--------------------|------------|--------------|
| Cell radius        | 1 to 20 km | 0.1 to 1 km  |
| Transmission Power | 1 to 10 W  | 0.1 to 1 W   |
| Average delay      | 1 to 10 ms | 10 to 100 ms |
| Bit Rate           | 0.3 Mbps   | 1 Mbps       |



For optical wave :- LOS (line of sight communication)  
 For radio wave :- reflection refraction diffraction anything may happen



# Attenuation :-

\* use amplifier to transmit signal appropriately

$0 \leq 10$



$$L_{dB} = 20 \log(f) + 20 \log(d) - (4\pi d)^2 / (\lambda^2 * A_r * A_t)$$

$$G_r = \frac{4\pi}{\lambda^2} \cdot A_r$$

$$G_t = \frac{4\pi}{\lambda^2} \cdot A_t$$

$$\text{effective area } \frac{P_t}{P_r} = \frac{(4\pi d)^2}{(4\pi d)^2} = \frac{A_t}{A_r}$$

for directive antenna we can compensate  $\downarrow$  this loss by  $\uparrow$   
 $L_{dB} = 20 \log(f) + 20 \log(d) - 10 \log(A_t A_r) + 16$   
 compensate

C - The loss  
K -

### # Noise

- Thermal noise
- Intermodulation noise :-  
signals with diff. frequency are transmitted in a shared medium

$$\square \rightarrow f_1$$

$$\square \rightarrow f_2$$

so, mixing of signals will produce energy at freq.  $(f_1 + f_2)$   
or their multiplicative bits

- Cross talk noise  
~~is unwantedly~~ selecting  $f_2$  freq. instead of  $f_1$ .

- Impulse noise  $\rightarrow$  very random in nature



Impulse noise has washed out these 3 bits.

### # Multipath

original main pulse  
reflected/refracted  
diffracted  
scattered.

### Fading channel

$K = 0 \rightarrow$  Rayleigh

$K = \infty \rightarrow$  AWGN

- \* Error correction
- \* Adaptive Equalization

# Delay = Transmission time + Propagation time +  
processing time + Queuing time.

$$\text{Transmission time} = \frac{\text{Message size}}{\text{Bandwidth}}$$

$$\text{Propagation time} = \frac{\text{Distance}}{\text{Propagation speed}}$$

$$\text{Energy consumption} = P_t \times \frac{\text{No. of bits transmitted}}{\text{Bit Rate}}$$



IDM  $\rightarrow f$  is fixed  
FOM  $\rightarrow t$  is fixed

(Numericals  $\rightarrow$  photos).

17/12/2017

### CELLULAR NETWORKS



BTS (Best Transceiving station)



$D$  = min. distance b/w centre of cells  
that use same frequency band, called co-channels.

$R$  = Radius of a cell

$d$  = distance b/w centres of adjacent cell.  $= \sqrt{3} R$

$N$  = No. of cells in a repeater's pattern  
(each cell in pattern uses a unique set of frequency bands) termed the reuse factor).

$$N = I^2 + J^2 + (I \times J) \text{ where } I, J = 0, 1, 2, \dots$$

$$N = 1, 3, 4, 7, 9, \dots$$

$$\sum dx$$

(B)

(Graph coloring problem)  
Page No. \_\_\_\_\_  
Date : 11



7 cell structure.

$$\frac{D}{R} = \sqrt{3} N \quad (\text{bcz } d = \sqrt{3} R)$$

$$\therefore \boxed{\frac{D}{d} = \sqrt{N}}$$

Q.  $K = 395$ ,  $N = 7$ . what is max. freq. which can be achieved for individual cells?

$$f = \frac{K}{N} = \frac{395}{7} \approx 57$$

1. Adding new channels. (can add only finite channels, not possible everytime)
2. Frequency reusing
3. cell splitting.



inter-channel interface.

a E D  
d →

E/E

\* Macro cell

Macro cell

Micro cell

| Cell radius        | 1 to 20 km | Radius | 0.1 to 1 km  |
|--------------------|------------|--------|--------------|
| Transmission Power | 1 to 10 w  | Power  | 0.1 to 1 w.  |
| Average delay      | 1 to 10 us | Delay  | 1 to 100 ms. |
| Bit Rate           | 0.3 mps    | Rate   | 1 Mbps       |



Page No. \_\_\_\_\_  
Date: / /

Landline  
Public  
Switching  
Telephone  
Network.



for local calls:-

- \* one MTSD is enough.  
(from MTSD request to MTSO)



(paging message is broadcasted to a  
BTS)  
to search for subscriber

for STD / International calls :-



{for information only → not for exam:-  
Hata Model.

1968

↓

1980.

$$L_{dB} = 69.55 + 26.16 \log f_t - 13.82 \log h_t - A(h_t) + (44.9 - 6.55)$$

\* For closed loop power control, signal power level, received signal to noise ratio, received bit rate are reqd.

# Traffic Engineering  $\Rightarrow$   
 $L_2$  potential subscriber or mobile units.  
 $N$  = no. of simultaneous user or channel capacity.  
 $\lambda$  = min. rate of calls attended per unit time.  
 $b$  = min. holding time per successful call.

$$A = \lambda b$$

$\uparrow$

"Traffic Intensity"  
 unit is "erlang"

$$A = \lambda b = \rho N$$

where  $\rho$  : is server utilization or fraction of time server is busy.

1. LCD : Lost calls Delayed  
 Blocked calls can be put in a queue waiting for a free channel is called "Lost calls delayed".
2. LCC : Lost calls cleared  
 If the user hangs up & wait some random time interval before another call attempt this is known as "lost calls cleared".
3. LCH : Lost calls Held  
 If the user repeatedly attempts calling, it is known as "lost calls held".

$\lambda$  = number of calls per sec  
 hr. The mean holding time per sec



$\lambda \rightarrow$  Birth rate of cell

Note :  $\sum \lambda_i \Leftrightarrow \sum u_i$  (but no. of channel is fixed)

Q. At room temperature, what is thermal noise power density?

Ans:  $N_0 = k \times T \rightarrow$  Temperature

$\downarrow$

Boltzmann constant

$$\text{Room Temp} = T = 17^\circ\text{C} = 290 \text{ K}$$

$$N_0 = 1.38 \times 10^{-23} \times 290 \text{ W/Hz} \quad 10 \log_{10} N_0 \rightarrow \text{unit (dB)}$$

Bandwidth  
 $B = 10 \text{ MHz}$

$$N_0 = KTB = 1.38 \times 10^{-23} \times 290 \times 10$$

Q. Suppose a signal encoding technique requires that bit error rate of  $10^{-4}$ . If

$E_b/N_0 = 8.4 \text{ dB}$  for a temp. is  $290 \text{ K}$  & data rate is  $2400 \text{ Bps}$ . what received signal level is reqd. to overcome thermal noise?

Take,  $\log_{10}$  on RHS both sides to convert to dB

$$\frac{E_b}{N_0} = \frac{S}{KTR} \Rightarrow 8.4 = 10 \log_{10} S + 10 \log_{10} K - 10 \log_{10} T - 10 \log_{10}$$

$$8.4 = 10 \log_{10} (1.38 \times 10^{-23}) - 10 \log_{10} (290) - 10 \log_{10}$$

Wireless LAN -

1. limited by signal range.
  2. ISM band  
conducting Scientific Medicine →
  3. 802.11 (WLAN)
  4. 802.16 (WiMAX)
  5. 802.15 (personal area networks)  
Bluetooth
- SOM - (space division multiplex.)

2. Modes of operations.

- presence of controlled module (CM)  $\xrightarrow{\text{BS}} \xrightarrow{\text{AP}} \xrightarrow{\text{802.11}}$   
Control mobile stations
- 0 Adhoc comm' - Peer to Peer communication where there is no control module. (CM)

Applications -

or - LAN extension  
cross building interconnection

- ⇒ WLAN Requirement (high throughput  
each one of bandwidth)

- Date: \_\_\_\_\_  
Page: \_\_\_\_\_
2. Number of nodes - may be large ~ 100's
  3. good connection to LAN background NW
  4. good service (Range)
  5. Minimum battery power consumption
  6. Transmission security & robustness
  7. Coordinated NW operation
  8. licence free operation (2.4 GHz)
  9. Cell handoff / NW roaming
    - ↳ dynamic management
    - ↳ adaptive MAC
    - ↳ address management
    - ↳ automatic addition
    - ↳ deletion
    - ↳ relocation
  10. choice of physical solution

(1997 - 802.11)  
FHSS → 1 or 2 Mbps  
WLLS

2000 - 802.11 b  
HRL - WLLS - 11 Mbps

2001

802.11 g,  
OFDM, 54Mbps.



Simple LAN architecture:



- 802.11 access control
- avoid collision
- CSMA
- No collision detection

⇒ 802.11 protocol Architecture -  
Two modes

1) DCF (mandatory) (mandatory  
(distributed co-ordinate function)

↳ best effort service.

↳ CSMA/CA

2) PCF (point co-ordination function)

↳ optional

↳ base station controls access to the  
medium  
using polling mechanism with  
high priority.  
Vto the  
access.

Three type of frame

↳ data

↳ control

↳ Management (provide  
services,  
(QoS) priority)

↳ Quality of service.



⇒ Collision Avoidance

RTS - CTS.



⇒ The use of virtual channel sensing using CSMA/CA





↳ when longer frames divided into multiple frames



SC = sequence control

FC = frame control

D/I = Duration / connection ID

(destination Mac add) Adr1 = MAC address of wireless host or AP for receiving this frame

Adr2 = MAC address of wireless host or AP transmitting this frame

Mac add

Add 3 = Mac address of router interface to which access point is attached.  
 Add 4 = used only in Adhoc mode.

⇒ **Scalable** = automatic ~~selection~~ of see. handing of ARQs

802.16

frame - format

a) data frame DLM - 16  
 b) control. OPSK

(bandwidth request frame)



connection binds the end points through the system

bits

- EO - encrypted payload
- CI - check sum input
- EC - encryption technique .



### Service classes

- Constant bit rate service (voice)
- Real time variable bit rate service (video)
- Non real time variable bit rate service (high quality data)
- Best effort service (data, http etc)



802.15

PAN (personal area Netw.)



SIM → use extra register.

\* Data Hazard

|       |      |                                                 |                |                 | 1  | 2   | 3  | 4        | 5        | 6        |
|-------|------|-------------------------------------------------|----------------|-----------------|----|-----|----|----------|----------|----------|
| Ex :- | DADD | (R <sub>1</sub> ) R <sub>2</sub> R <sub>3</sub> | - IF           | ID              | EX | MEM | WB |          |          |          |
|       | DSUB | R <sub>4</sub>                                  | R <sub>1</sub> | R <sub>5</sub>  |    |     |    | IF stall | IF stall | IF stall |
|       | AND  | R <sub>6</sub>                                  | R <sub>1</sub> | R <sub>7</sub>  |    |     |    |          |          |          |
|       | OR   | R <sub>8</sub>                                  | R <sub>1</sub> | R <sub>9</sub>  |    |     |    |          |          |          |
|       | XOR  | R <sub>10</sub>                                 | R <sub>1</sub> | R <sub>11</sub> |    |     |    |          |          |          |

Data Hazard is that value of R<sub>1</sub> is available after 5th cycle  
 but in subtract operation it uses "old" value of R<sub>1</sub> in  
 the 4th cycle to execute subtract instruction. This is incorrect.  
 ∴ It is a data hazard.

Solution :- Data Forwarding



Speed up pipelining = A-Ex Time unpipelined

A-Ex Time pipelining

$$= \frac{I_1 \text{ unpipelined}}{I_1 \text{ pipelined}} \times \frac{C_2 \text{ unpipelined}}{C_2 \text{ pipelined}} \times \frac{C_3 \text{ unpipelined}}{C_3 \text{ pipelined}}$$

$$= \frac{C_1 \text{ unpipelined}}{C_1 \text{ pipelined} + ?} = \frac{C_2 \text{ unpipelined}}{C_2 \text{ pipelined} + ?} \approx \text{pipeline depth}$$

↑                      ↑  
with                with  
constant            constant  
is called - Hazards

CPI pipeline with hazards > 1

#### \* Types of Hazards →

- structural Hazards
- Data Hazard
- Control Hazard

$$\text{speed up pipelining with Hazards} = \frac{\text{pipeline depth}}{1 + \text{scale due to hazards}}$$

| the value of pipelined with some Hazards |                                    |                        |                |                |                |
|------------------------------------------|------------------------------------|------------------------|----------------|----------------|----------------|
| I <sub>1</sub>                           | -                                  | I <sub>1</sub>         | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |
| IP                                       | IP                                 | ID                     | EX             | MEM            | WB             |
| I <sub>1</sub>                           | I <sub>2</sub>                     | I <sub>3</sub>         | I <sub>4</sub> | MEM            | WB             |
| I <sub>1</sub>                           | I <sub>2</sub>                     | I <sub>3</sub>         | I <sub>4</sub> | WB             |                |
| I <sub>1</sub>                           | I <sub>2</sub>                     | I <sub>3</sub>         | I <sub>4</sub> |                |                |
| X                                        | all                                | all                    | all            | X              | all            |
| →                                        | use same register for 2 operations | use different          | use different  | use different  | use different  |
| without structural hazard                | →                                  | with structural hazard |                |                |                |

Ans:

$$\text{CPU time} = \sum_{i=1}^n T_{C_i} \times \text{CPI}_i \times CCT$$

$$n = 3$$

$$\begin{aligned} 1 &\rightarrow 50\% & \text{CPI}_1 &= 3 \\ 2 &\rightarrow 30\% & \text{CPI}_2 &= 4 \\ 3 &\rightarrow 20\% & \text{CPI}_3 &= 5 \end{aligned}$$

$$\therefore \text{CPU time} = T_C \times \text{CPI}_{\text{overall}} \times CCT$$

$$= 10^9 \times \text{CPI}_{\text{overall}} \times \frac{1}{2.94 \mu s}$$

$$= 10^9 \times \text{CPI}_{\text{overall}} \times 0.5 \times 10^{-9}$$

$$\begin{aligned} \text{CPI overall} &= 0.5 \times 3 + 0.3 \times 4 + 0.2 \times 5 \\ &= 3.7 \end{aligned}$$

$$\text{CPU time} = 10^9 \times 3.7 \times 0.5 \times 10^{-9}$$

$$\frac{\text{CPU time old}}{\text{CPU time new}} = \frac{T_{CPI_{\text{old}}}}{T_{CPI_{\text{new}}}} \times \frac{\text{CPI}_{\text{old}}}{\text{CPI}_{\text{new}}} \times \frac{CCT_{\text{old}}}{CCT_{\text{new}}}$$

(of same program)

e.g.

Raw material

$$\begin{aligned} 5 \times 10^9 &= 500 & \text{CPI} &= \frac{500}{100} = 5 \\ (\text{hrs}) & & & \\ \text{Pipeline factor} &= 10^4 & \text{CPI}_{\text{new}} &= \frac{10^4}{100} = 100 \approx 1 \end{aligned}$$

$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{\text{CPU Time old}}{\text{CPU time new}}$$

~~$$\frac{f_{\text{old}}}{f_{\text{new}}} \cdot E_{\text{old}} \cdot C_{\text{CPU}}$$~~

5

Power  $\downarrow$  Energy  
Watt Joules.

$$P = DP + LP$$

↳ Leakage  
Dynamic Power

DP  $\propto$  activity  $\times$  capacitance  $\times$  (voltage)<sup>2</sup>  $\times$  frequency

$$LP = f(V, I) \rightarrow LP = 25\% \times \text{Total Power}$$

↳ Current  
Voltage

LP  $\propto$  V.

Ex :- A program runs for 3 seconds on 100 watt processor.

Energy consumption

$$E = P \times \text{Time}$$

↑  
Power

$$= 100 \times 3 = 300 \text{ Joules.}$$

How to reduce DP, LP & Energy?

There are 2 techniques:-

1. DFS (Dynamic Frequency Scaling)

### Load store Architecture

$$x = a + b$$

Load  $\rightarrow$  LD      R<sub>1</sub>      O(R<sub>x</sub>)  $\leftarrow$  from this address  
Load  $\rightarrow$  LD      R<sub>2</sub>      O(R<sub>y</sub>)  
Add  $\rightarrow$  ADD      R<sub>0</sub>      R<sub>1</sub>      R<sub>2</sub>  
store  $\rightarrow$  SD      R<sub>0</sub>      O(R<sub>z</sub>)

Car Analogy  
stage1    stage2  
C<sub>1</sub>

C<sub>2</sub>    C<sub>1</sub> — — —

C<sub>3</sub>    C<sub>2</sub>    C<sub>1</sub> — —  
Latency for each car = 5  
but throughput is increasing  
i.e. more no. of cars (instructions)  
per second.

MIPS

$\hookrightarrow$  Multiprocess or without interlocked pipeline stages

- Five stages :-

- IF (Instruction Fetch)
- ID (Instruction Decode)
- EX (Execute)
- MEM (Memory)
- WB (Write Back)

\* Processor Performance Equation

$$\text{CPU time} = (\text{CPU clock cycles of a program}) \times (\frac{\text{clock cycle time}}{\text{Instruction count (I.C.)}})$$

Q3:

$$\text{CPI} = \frac{\text{CPU clock cycle of a program}}{\text{Instruction count (I.C.)}}$$

↑  
Clocks per  
Instruction

Ans

$$\text{CPU clock cycle} = \text{I.C.} \times \text{CPI}$$

of a program

$$\therefore \boxed{\text{CPU Time} = \text{I.C.} \times \text{CPI} \times \text{CCT}} \text{ ns}$$



$$T \propto \frac{1}{F}$$

- # Q.4: Suppose a ~~processor~~<sup>program</sup> takes 1 billion instructions to execute on a processor running at 2 GHz and 50% of instructions whose CPI is 3 clock cycles per instruction and 30% of instruction is 4 clock cycles per instruction & 20% of instruction is 5 clock cycles per instruction. what is the CPU Time?

## ① DF Scaling

Example



$$DP \propto \sqrt{f}$$

## DVFS



$$CPU\ Time = IC \times CPI \times CCT$$

IC  $\rightarrow$  no. of instructions in a program.

Lower IC, Lower CPU Time.

IC  $\rightarrow$  job of compiler or user to minimize the IC.

CPI  $\rightarrow$  no. of clocks per instruction.  
this is minimized by Computer Architecture.

CCT  $\rightarrow$  by H/W Technology.

$$\begin{aligned} CPU\ Time &= \frac{\text{Instructions}}{\text{program}} \times \frac{\text{clock cycles}}{\text{Instruction}} \times \frac{\text{seconds}}{\text{clock cycle}} \\ &= \frac{\text{seconds}}{\text{program}} \end{aligned}$$

# Different types of Instructions

type - i

i+1

i+2

!

n

we have 'n' types of instructions

$$CPU\ clock\ cycles = \sum_{i=1}^n I.C_i \times CPI_i$$

$$CPU\ Time = \sum_{i=1}^n I.C_i \times CPI_i \times CCT$$

$$\text{overall CPI} = \frac{\sum I.C_i \times CPI_i}{IC} = \sum_{i=1}^n \frac{I.C_i \times CPI_i}{IC}$$

Example:

$$\begin{aligned} \text{Program} &= P \\ X &= 100 \text{ W} \\ \text{Efficiency} &= 100 \times 100 \\ &= 10000 \end{aligned}$$



$$P_{AP} \propto V^2 \times f \downarrow$$

$$\begin{aligned} \text{If } f &= 1 \text{ GHz} & P_{AP} &= 25 \text{ W} & \text{LP} &= 25 \text{ W} \\ \text{we change frequency,} & \downarrow & & \downarrow & & \downarrow \\ & 0.5 \text{ GHz} & 37.5 \text{ watts.} & 2.5 \text{ W} & & \end{aligned}$$

*(No change in LP)*

$$T = 100 \text{ sec}$$

$$\begin{aligned} \downarrow & & P = 37.5 + 2.5 = 62.5 \\ \text{200 sec.} & & \end{aligned}$$

$$E_{max} = P \times t = 62.5 \times 200 = 12500$$

$$E_{max} = 12500$$

$$E_{old} = 10000$$

$$\Rightarrow E_{max} = 1.25 \times E_{old}$$

## 2. DVFS

(Dynamic Voltage Frequency Scaling)

$$P = \frac{V}{100 \text{ mV}} T = \frac{100 \text{ sec}}{1 \text{ ms}}$$



$$0.9V = \frac{100 \text{ mV}}{1 \text{ ms}}$$

If voltage decreases, clock cycle-time increases.

$$P = \frac{100 \text{ mV}}{100 \text{ sec}} T = 100 \text{ sec}$$

$$E_{avg} = P \times T = 100 \times 100 = 10000$$

$$75W \rightarrow 25 \times 0.9 = 22.5W$$

$$\begin{aligned} V &= 0.9 \\ f &= 1.1 \text{ ms} \\ &= \frac{1}{1.1 \times 10^{-3}} = 0.9 \times 10^4 \text{ Hz} = 0.9 \text{ kHz} \end{aligned}$$

$$\begin{aligned} 75 \times (0.9)^2 \times 0.9 &= P_{avg} \\ 2 \times 55W &= 77.5W \approx 77 \\ E_{new} &= P \times t = 77 \times 110 = 8470. \end{aligned}$$

4. Memory operation :- Loading & storing takes place.

LD  $R_o \leftarrow O(R_x)$

SD  $R_o \leftarrow O(R_y)$

5. WB (write back) :-

Addition is performed in 3rd stage but its value is reflected only after 5th stage.

ADD  $R_o = R_1 + R_2$

fetch using  
PC

ADD  
arithmetic  
In ID  $\rightarrow$  Destination Source

we come to know  
source & dest. registers  
& that it is arithmetic  
operation

1cc 2cc 3cc 4cc 5cc  
IF ID EX MEM WB.

It takes 5 clock cycles.

e.g. :-

LD  $R_o \leftarrow O(R_x)$  1cc 2cc 3cc 4cc  
Load IF ID EX MEM

It takes 4 clock cycles.

$(O+R_x)$   
address  
is found

LD  $R_o \leftarrow O(R_x)$

$$0 + 108 \\ = 108.$$



SD R<sub>y</sub> O(R<sub>y</sub>) → also takes +cc's.  
store

Ex: BNE Z  
↓

Branch instruction with Jump or No.Jump

if(x == 0) go to L

↑  
we require 2 clock cycles to  
detect whether it is a  
branch instruction with jump or  
no jump.



Buffer is required bcoz if suppose  
instruction come out of IF stage &  
ID is busy, then it can be  
stored in buffer.

at this clock cycle  
& diff. instructions  
are at 4 diff. stages.  
This is called ILP.

Speedup · pipelining = A · Ex. Time unpipelining

A · Ex. Time pipelining .

$$= \frac{I \times \text{unpipelined}}{I \times \text{pipelined}} \times \frac{CPI \times \text{unpipelined}}{CPI \times \text{pipelined}} \times \frac{CCT \times \text{unpipelined}}{CCT \times \text{pipelined}}$$

$$= \frac{CPI \times \text{unpipelined}}{CPI \times \text{pipelined} + ?} = \frac{CPI \times \text{unpipelined}}{1 + ?} \approx \text{Pipeline depth}$$

↑                              ↑  
with overhead            with overhead  
↳ called "hazards"

$CPI_{\text{pipeline with hazards}} > 1$ .

### # Types of Hazards →

- Structural Hazards
- Data Hazard
- Control Hazard.

$$\text{speed up pipelining with Hazards} = \frac{\text{pipeline depth}}{1 + \text{stalls due to Hazards}}$$

(?)\*

| #              | Structural Hazards                      | IF & MEM opr. performed with same register |
|----------------|-----------------------------------------|--------------------------------------------|
| i <sub>1</sub> | - (IF) Hazards<br>IF    ID    EX    MEM | IF & MEM opr. performed with same register |
| i <sub>2</sub> | IF    ID    EX    MEM    WB             |                                            |
| i <sub>3</sub> | IF    ID    EX    MEM    WB             | IF stall                                   |
| i <sub>4</sub> |                                         | IF stall                                   |



$$E_{old} = 10000$$

$$E_{new} = 8470$$

$$E_{new} = 0.84 \times E_{old}$$

07/01/2018

### Instruction Level Parallelism (ILP)

- Pipelining (best technique to achieve ILP)
  - ↓

Allow instructions to execute in overlap manner.

Instructions

Ex.

ADD       $R_0 \leftarrow R_1 + R_2$

RISC  
|

$R_0, R_1, R_2$  are registers on which ADD operation is performed.

sum of  $R_1$  &  $R_2$  is stored in  $R_0$ .

Full form:-

Instructions

Two ways

32-bit Instruction  $\Rightarrow$  word  $\Rightarrow$  ADD  $R_0 \leftarrow R_1 + R_2$   
 called example:-

64-bit Instruction  $\Rightarrow$  double  $\Rightarrow$  DADD  $R_0 \leftarrow R_1 + R_2$

(double add)

direct address, indirect address, register address  $\Rightarrow$  refer to these diff. addressing modes from book.

| clock cycle |     |     |     |     |     |        |
|-------------|-----|-----|-----|-----|-----|--------|
| 1cc         | 2cc | 3cc | 4cc | 5cc | 6cc | 7cc    |
| PC = i      | IF  | ID  | Ex  | MEM | WB  |        |
| PC=PC+4 i+1 |     | IF  | ID  | EX  | MEM | WB     |
| PC=PC+4 i+2 |     |     | IF  | ID  | EX  | MEM WB |
| PC=PC+4 i+3 |     |     |     |     |     |        |

In every clock cycle, one instruction enters a pipeline.

In every clock cycle, one instruction will get completed.

\* The objective of pipeline is  $\boxed{\text{CPI} = 1}$

Instruction fetch:

- 1. IF: Job is to fetch instruction by using program counter (PC).

$$\text{PC} = \text{PC} + 4$$

↑  
+ bytes i.e. 32 bits (1 word).

- 2. ID (Instruction Decode)

ext ADD R<sub>0</sub> R<sub>1</sub> R<sub>2</sub>

→

ID stage checks whether values are present or not.

It decodes the instruction that what type of instruction it is (Arithmetic instruction etc.)

- 3. Ex (Execution) : performing integer operations or floating point operations.

ADD R<sub>0</sub> R<sub>1</sub> R<sub>2</sub>

⊕

↓ ALU (Arithmetic Logical Unit)

FU

## 2. D VFS (Dynamic Voltage Frequency Scaling)

$$P = \frac{1}{100 \omega} \frac{T}{100 \text{ sec}}$$



$$0.9V = \frac{1}{1.1ms}$$

If voltage decreases, clock cycle-time increases.

$$P = \frac{1}{100 \omega} \frac{T}{100 \text{ sec}}$$

$\downarrow$   
110 sec.

$$E_{avg} = P \times T$$

$$= 100 \times 100 = 10000$$

$$75W \quad 25\omega \quad 25 \times 0.9 = 22.5W$$

$$\begin{aligned} \sqrt{V} &= 0.9 \\ f_2 &= 1.1 \text{ ns} \\ &= \frac{1}{1.1 \times 10^9} = 0.9 \times 10^9 \text{ Hz} = 0.9 \text{ GHz} \end{aligned}$$

$$\begin{aligned} 75 \times (0.9)^2 \times 0.9 &= P_{new} \\ 2 \quad 55W &= 47.5W \approx 47 \\ &= 47.5W \approx 47 \end{aligned}$$

$$E_{new} = P \times t = 47.5 \times 110 = 8470.$$

4. Memory operation :- Loading & storing takes place.

LD  $R_o \leftarrow O(R_x)$

SD  $R_o \leftarrow O(R_y)$

5. WB ( write back) :-

Addition is performed in 3rd stage but its value is reflected only after 5th stage.

ADD  $R_o \leftarrow R_1 + R_2$

fetching  
PC  $\downarrow$  ADD  $R_o \leftarrow R_1 + R_2$  1cc 2cc 3cc 4cc 5cc  
 arithmetic      Destination      Source      IF      ID      EX      MEM      WB.

In ID  $\rightarrow$        $\uparrow$       we come to know  
 source & dest. registers  
 & that it is arithmetic  
 operation

It takes 5 clock cycles.

eg:- LD  $R_o \leftarrow O(R_x)$  1cc 2cc 3cc 4cc  
 Load IF ID EX MEM

It takes 4 clock cycles.

$(O+R_x)$   
 address  
 is found

