

9/12/2017 :- Prof. Banerjee (soft computing)

Threshold logic.

Fuzzy logic :- Data preprocessing

optimization :-

Hybrid Techniques. (combination of different techniques)

Supervised learning :-  
→ not good in generalizing.

unsupervised learning :-  
→ good at generalizing

→ not good in performance v/s. unperformed.

Unicoin

Soft computing (characteristics)   
AN  
Simon Haykin  
↳ chapter - 1

v R. J. Granahan  
↳ chapter - 1

{ James A. Freeman

① - ③ } 2

David SP

9/12/2017 :- Prof. Banerjee (soft computing)

Threshold logic.

Fuzzy logic :- Data preprocessing

optimization :-

Hybrid Techniques. (combination of different techniques)

Supervised learning :-  
→ not good in generalizing.

unsupervised learning :-  
→ good at generalizing

→ not good in performance v/s. unperformed.

Unicoin

Soft computing (characteristics)   
AN  
Simon Haykin  
↳ chapter - 1

v R. J. Granahan  
↳ chapter - 1

{ James A. Freeman

① - ③ } 2  
David SP

## Evolution Algorithms

→ D.E. Goldberg. (1-3 chapters)

1. M.O.E.A (Multi objective evolutionary ~~algo~~)

L.K. Deb (NSGA II)

(Chapter 1-5)

Soft computing → Hybrid Techniques.

Rough sets.

AN  
Evo algo  
fuzzy logic

### Soft computing

- (1) Basic deterministic model for human mind.
- (2) soft computing is foundation of computational intelligence in machine.
- (3) Tolerance of imprecision, Uncertainty, partial and approximate computation based on probability.

### Hard Computing vs. soft computing

- (1) Hard computing is based on precise modelling and analysing to a accurate results.
- (2) Worst worse for simple problem but bounded to NP-completeness.

P-class Problem:- Problem that is solved

NP-class Problem can solve in polynomial

### Area of soft computing

- (1) Subsequent to hard computing.
- (2)
- (3) mfractals
- (4)

Review

▼ Difference b/w Hard computing and soft computing

Distinct features of soft computing

Basic components of soft computing

- fuzzy logic
- evaluation algorithm

Biological vs. conventional

- ) speed
- robustness
- Flexibility / adaptability.

ANN  
is defined by known targets

Neuron (biological)

- ANN
- Architecture
- Training & weight / connections.
- Testing / performance
- XOR problem

Breakdown  
AND

| $x_1$ | $x_2$ | Output |
|-------|-------|--------|
| 0     | 0     | 0      |
| 0     | 1     | 0      |
| 1     | 0     | 0      |
| 1     | 1     | 1      |

DATE \_\_\_\_\_  
PAGE NO. \_\_\_\_\_

Two layer neural problem can't solve via XOR table.

Genetic Algorithms

1. encoded rep (chromosome)
2. population
3. genetic operators
4. algo.

5. Basic considerations



QUESTION

### Introduction :-

Page No.  
Date : 9/12/2017

- \* Hard computing / conventional computing
- \* Soft computing : explorative search is used (not exhaustive)  
↳ gives acceptable result/solution.
- \* Fuzzy logic : degree of membership (measured with values b/w 0 & 1).
- \* Squared Error : Difference between desired and actual result i.e.  
 $(D - A)^2$   
for 1 result set :  $(D_1 - A_1)^2$   
for 2 result set :  $(D_1 - A_1)^2 + (D_2 - A_2)^2$

why squared error?  $\rightarrow$  bcoz if small error can be further reduced  $\rightarrow$  higher error can be further amplified  
Also, squared error will take care of both +ve & -ve differences.

### Learning $\Rightarrow$

1. Teacher

2. Learning Element / learning system.

Scalability : If input increases & system can be balanced with visibility to user then, it is said to be scalable system.

### Types of Learning $\Rightarrow$

- 1. Supervised Learning
- 2. Unsupervised Learning
- 3. critic(?)

2. Hard comp works well for simple problems but bounded for NP completeness.

Explanation →

+ Ambiguous data situations or imprecise data situations → data is not precise

+ uncertainty → concerned with probability (may or may not happen)

$$f(x_0, x_1, \dots, x_n) = \dots$$

$$x_i \in [a, b]$$

for  $x_0 = a, b$  ] for 2 variables there  
 for  $x_1 = a, b$  ] and 2 nested loops  
 for  $x_2 = a, b$

But a computer can't allow many nested loops (e.g. C compiler will allow max 10 loops).

| for 'n' variables          | $f_1, f_2, f_3, \dots$                                                                       |
|----------------------------|----------------------------------------------------------------------------------------------|
| there will be $2^n$ inputs | How many functions will be possible?<br>for $2^n$ inputs, there will be $2^{2^n}$ functions. |

- a soft computing is well suited for real world problems
- + It can't give best results but it can give acceptable results.
- + conventional paradigm shift
- + If model is not present, then also soft computing can be used to generate solution

- \* Adaptability  $\rightarrow$  it is adaptive & flexible
- \* contact sensitivity

# Artificial Neural Network  $\rightarrow$

highly parallel & distributive in nature  
having natural capability for storing potential knowledge.

It is similar to human brain in 2 aspects:-  
1. knowledge is captured through learning process.

1. Neuron (Biological Network)

2. Artificial <sup>Neuron</sup> Network

3. Architecture Network

4. Training & weight / connections

5. Testing / Performance

6. XOR Problem.



Input pattern layer



Input pattern original conditions

$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}(k)$  Determined by learning rule  
 (in step number)  $\eta$  ( $\eta < 1$ ) adjustment  
 $\Delta w_{ij}(k) = (\alpha_i(k) - \alpha_d(k)) \cdot \text{output}_j(k)$

$w_{ij}(k)$ : current weight

| $w_{ij}$ | change or progress | weight |
|----------|--------------------|--------|
| $w_{11}$ | 0.1                | 0.1    |
| $w_{12}$ | 0.2                | 0.2    |
| $w_{13}$ | -                  | -      |
| $w_{14}$ | 0.15               | 0.15   |

→ training set

configuration matrix

|   | C    | P    |
|---|------|------|
| C | 0.11 | 0.11 |
| P | 0.11 | 0.11 |

$$\text{The cost function} = \frac{1}{n_{\text{tot}}} \cdot \text{J}(x)$$

$$\text{J}(x) = \sum_{i=1}^n (y_i - \hat{y}_i)^2$$

$$y_i \text{ with input } x_i \text{ and output } \hat{y}_i$$

### Boolean AND

Page No.  
Date : / /

| $x_1$ | $x_2$ | $f(x_1, x_2)$ |
|-------|-------|---------------|
| 0     | 0     | 0             |
| 0     | 1     | 0             |
| 1     | 0     | 0             |
| 1     | 1     | 1             |

$\sum b_i x_i + b_0 = 1.$

$b_1 = 1, b_2 = -1, b_0 = 1.$

$1 - x_1 - x_2 + 1 = 0 \Rightarrow x_1 + x_2 = 2.$

$0.5x_1 + 0.5x_2 = 1.$



"Decision boundary"  
→ can be separated by a line

### XOR



↓  
can't be solved using  
linear boundary.

### # Genetic Algorithm

1. Encoded soln (chromosome)
2. Population chromosome
3. Genetic operators
4. Algo.
5. Basic considerations



child population finally ↴ mutation



- \* Exploration → search as many new points as possible to find best sol?
- \* Exploitation → whatever best sol we have so far, we search nearby it to look for any better sol?

- \* Intensification → means exploitation
- \* Diversification → means exploration

(Follow slides provided by Sir)

16/12/17

\* Antenna

Page No. \_\_\_\_\_  
Date : / /

Omnidirectional antenna



feeding point

Directional :-

1. Dipole Antenna



feeding point : from where we put/ feed electrical wave & from where we are collecting electrical signals from antenna.

2. Yagi Uda



3. Horn Antenna



4. Parabolic reflector ( like dish TV, Take sky )



Gain factor of Antenna

$$G = \frac{4\pi A_e}{\lambda^2}$$

$$\lambda = \frac{c}{f}$$

↑  
carrier wavelength

$G=1$  for only omnidirectional antenna.

#### # Propagation methods

- Ground wave propagation ( below 2 MHz )
- Sky wave " ( b/w 2 - 30 MHz )
- Line of sight " ( e.g. light wave has its freq. is above 30 MHz )

Ground wave :-



follows curvature of earth :-

For optical wave :- LOS (line of sight comm.)  
 For radio wave :- reflection/refraction/diffraction anything may happen



### # Attenuation :-

- \* use amplifier to transmit signal appropriately.

$$0 \frac{1}{2} 0$$



Free space,  $\frac{P_r}{P_t} = \frac{(4\pi d)^2}{\lambda^2}$

$$L_{dB} = 20 \log(f) + 20 \log(d) - 147.56 \text{ dB}$$

$$G_t \frac{4\pi}{\lambda^2} \cdot A_t^{\text{eff}} \quad P_t = \frac{(4\pi d)^2}{G_t G_r \lambda^2} = \frac{(\lambda d)^2}{A_t A_r}$$

for directive antenna, we can compensate  $\downarrow$  this loss by -

$$L_{dB} = -20 \log(f) + 20 \log(d) - 10 \log(A_t A_r) + 16.9$$

$\downarrow$  compensate

## # Noise

- Thermal noise
- Intermodulation noise :-  
signals with diff. frequency are transmitted in a shared medium



so, mixing of signals will produce energy at freq. ( $f_1 + f_2$ )  
or their multiples i.e.  $5f_1, 5f_2$

- Crosstalk noise  
is unintentionally selecting  $f_2$  freq. instead of  $f_1$ .

- Impulse noise  $\rightarrow$  very random in nature



## # Multipath

original main pulse  
refracted/reflected  
detected  
scattered

Fading channel

$K = 0 \rightarrow$  Rayleigh

$K = \infty \rightarrow$  AWGN

- \* Error correction  
 \* Adaptive Equalization.

$$\# \text{ Delay} = \frac{\text{Transmission time} + \text{Propagation time}}{\text{processing time} + \text{Queuing time.}}$$

$$\text{Transmission time} = \frac{\text{Message size}}{\text{Bandwidth}}$$

$$\text{Propagation time} = \frac{\text{Distance}}{\text{Propagation speed}}$$

$$\text{Energy consumption} = P_t \times \frac{\text{No. of bits transmitted}}{\text{Bit Rate}}$$



TDM  $\rightarrow$  f is fixed

FDM  $\rightarrow$  t is fixed.

(Numericals  $\rightarrow$  photos).

17/12/2017  
CELLULAR NETWORKS



BTS (Best Transceiving station)



D = min. distance b/w centre of cells  
that use same frequency band, called co-channels.

R = Radius of a cell

d = distance b/w centres of adjacent cell.  $= \sqrt{3} R$

N = No. of cells in a repeater's pattern  
(each cell in pattern use a unique set of  
frequency bands) termed the reuse factor.

$$N = I^2 + J^2 + (I \times J) \text{ where } I, J = 0, 1, 2, \dots$$

$$\therefore N = 1, 3, 4, 7, 9, \dots$$

(Graph coloring problem)  
Page No. ?  
Date: / /



$$\frac{D}{R} = \sqrt{3} N \quad (\text{bcz } d = \sqrt{3} R)$$

$$\therefore \boxed{\frac{D}{d} = \sqrt{N}}$$

Q.  $K = 395$ ,  $N = 7$ . what is max. freq. which can be achieved  
for individual cells?

$$f_2 \approx \frac{K}{N} = \frac{395}{7} \approx 57$$

1. Adding new channels. (can add only finite no. of channels, not possible everywhere)
2. Frequency borrowing
3. cell splitting.

inter-channel interference.



$E \geq D$

$E \geq d$

\* Macro cell

|                    | Macro cell | Micro cell   |
|--------------------|------------|--------------|
| cell radius        | 1 to 20 km | 0.1 to 1 km  |
| Transmission Power | 1 to 10 W  | 0.1 to 1 W   |
| Average delay      | 1 to 10 ms | 10 to 100 ms |
| Bit Rate           | 0.3 Mbps   | 1 Mbps       |



|   |           |
|---|-----------|
| M | Mobile    |
| T | Telecommu |
| S | Switching |
| O | office.   |

For optical wave :- LOS (Line of sight communication)  
For radio wave :- reflection refraction diffraction anything may happen



### # Attenuation :-

\* use amplifier to transmit signal appropriately.

0  $\int$   $\infty$



$$\text{free space} \rightarrow \frac{P_r}{P_t} = \frac{(4\pi d)^2}{\lambda^2}$$

base

$$L_{dB} = 20 \log(f) + 20 \log(d) - 147.56 \text{ dB}$$

$$G_r \frac{4\pi}{\lambda^2} \cdot A_e \frac{P_t}{P_r} \frac{(4\pi d)^2}{(4\pi \lambda)^2} \approx \frac{(4\pi d)^2}{A_e \lambda^2} \frac{P_t}{P_r}$$

for directive antenna we can compensate  $\Downarrow$  this loss by :-  
 $L_{dB} = -20 \log(f) + 20 \log(d) - 10 \log(A_t A_r) + 16$   
 compensate

C = The loss  
here

## # Noise

- Thermal noise
- Intermodulation noise :-  
signals with diff. frequency are transmitted in a shared medium.



so, mixing of signals will produce energy at freq.  $(f_1 + f_2)$   
or their multiples in  $f_{1+2}$

- Crosstalk noise  
↳ accidentally selecting  $f_2$  freq. instead of  $f_1$ .

- Impulse noise  $\rightarrow$  very random in nature



Impulse noise has washed out these 3 bits.

## # Multipath

original main pulse

↗ reflected/reduced  
refracted/scattered  
→ n scattered.

## Fading channel

$K = 0 \rightarrow$  Rayleigh

$K = \infty \rightarrow$  AWGN

- \* Error correction
- \* Adaptive Equalization.

# Delay = Transmission time + Propagation time + processing time + Queuing time.

Transmission time =  $\frac{\text{Message size}}{\text{Bandwidth}}$

Propagation time =  $\frac{\text{Distance}}{\text{Propagation speed}}$

Energy consumption =  $P_t \times \frac{\text{No. of bits transmitted}}{\text{Bit Rate}}$



TDM  $\rightarrow f$  is fixed

FDM  $\rightarrow t$  is fixed.

(Numericals  $\rightarrow$  photos).

17/12/2017

### CELLULAR NETWORKS



BTS (Best Transceiving station)



$D$  = min. distance b/w centre of cells  
that use same frequency band, called co-channels.

$R$  = Radius of a cell

$d$  = distance b/w centres of adjacent cell.  $= \sqrt{3} R$

$N$  = No. of cells in a repeater's pattern

(each cell in pattern uses a unique set of frequency bands) termed the reuse factor).

$$N = I^2 + J^2 + (I \times J) \text{ where } I, J = 0, 1, 2, \dots$$

$$N = 1, 3, 4, 7, 9, \dots$$

$$\sum d_i$$

(B)

(Graph coloring problem)  
Page No. ?  
Date : / /

↑  
7 cell structure.

$$\frac{D}{R} = \sqrt{3} N \quad (\text{bcz } d = \sqrt{3} R)$$

$$\therefore \boxed{\frac{D}{d} = \sqrt{N}}$$

Q.  $K = 395$ ,  $N = 7$ . what is max. freq. which can be achieved  
for individual cells?

$$f_2 = \frac{K}{N} = \frac{395}{7} \approx 57$$

1. Adding new channels. ( can add only finite no. of channels, not possible everytime)
2. Frequency reusing
3. cell splitting.

inter-channel interface.



$c \rightarrow E \rightarrow D$

Freq

\* ~~macro cell~~

|                    | Macro cell | Micro cell   |
|--------------------|------------|--------------|
| Cell radius        | 1 to 20 km | 0.1 to 1 km  |
| Transmission Power | 1 to 10 W  | 0.1 to 1 mW  |
| Average delay      | 1 to 10 us | 10 to 100 us |
| Bit Rate           | 0.3 Mbps   | 1 Mbps       |





Network

1. mobile → BTS → MTS → BTS → mobile.
2. Landline → BTS → MTS → PSTN → Landline.

for local calls:-

\* one MTSD is enough.  
(from MTSD request to MTSD)



For STD / International calls :-



For information only → not for Exam:-  
Hata Model

1968  
↓  
1980.

$$L_{dB} = 69.55 + 26.16 \log f_t - 13.82 \log h_t - A(h_t) + (44.9 - 6.55)$$

\* In closed loop power control, signal power level, received signal to noise ratio, received bit rate are reqd.

### # Traffic Engineering $\Rightarrow$

L = potential subscriber or mobile units.

N = no. of simultaneous users or channel capacity.

$\lambda$  = min. rate of calls attended per unit time.

$t_h$  = min. holding time per successful call.

$$A = \lambda t_h$$

"Traffic intensity"

unit is "erlang".

$$A = \lambda t_h = \rho N$$

where  $\rho$  : is server utilization or fraction of time server is busy.

### 1. LCD : Lost calls Delayed

Blocked calls can be put in a queue waiting for a free channel is called "lost calls delayed".

### 2. LCC : Lost calls Cleared

If the user hangs up & waits some random time interval before another call attempt this is known as "lost calls cleared".

### 3. LCH : Lost calls Held

If the user repeatedly attempts calling, it is known as "lost calls held".

$$\begin{aligned} d &= \text{min. holding time per user} \\ h &= \text{The mean holding time per user} \end{aligned}$$



$\lambda \rightarrow$  Birth rate of cell

Note :  $\sum \lambda_i \Leftrightarrow \sum u_i$  ( bcz no. of channel is fixed)

Q.1 At room temperature, what is thermal noise power density?

$$N_0 = k \times T \rightarrow \text{Temperature}$$

$\downarrow$   
Boltzmann constant

$$\text{Room temp} = T = 17^\circ\text{C} = 290 \text{ K}$$

$$N_0 = 1.38 \times 10^{-23} \times 290 \text{ W/Hz}$$

$10 \log_{10} N_0 \Rightarrow \text{units (dB)}$

? Bandwidth  
 $B = 10 \text{ MHz}$

$$N_0 = KTB = 1.38 \times 10^{-23} \times 290 \times 10$$

Q2. Suppose a signal encoding technique requires that bit error rate of  $10^{-4}$ . If

$E_b/N_0 = 8.4 \text{ dB}$  for a bit error rate of  $10^{-4}$ . If the effective noise temp. is  $290 \text{ K}$  & data rate is  $2400 \text{ Bps}$ . what received signal level is reqd. to overcome thermal noise?

RHS  
 $\downarrow$  Take  $10 \log_{10}$  on both sides to convert to dB

$$\frac{E_b}{N_0} = \frac{S}{KTR} \Rightarrow 8.4 = 10 \log_{10} S + 10 \log_{10} K - 10 \log_{10} T - 10 \log_{10}$$

$$8.4 = 10 \log_{10} S - 10 \log_{10} (1.38 \times 10^{-23}) - 10 \log_{10} (290) - 10 \log_{10}$$

Wireless LAN -

1. limited by signal range.
2. ISM band  
conducting Scientific Medicine

3. 802.11 (WLAN)

1. 802.16 (WMAN)

5. 802.15 (personal area networks)  
Bluetooth

SOM - (space division multiplex.)

2. Modes of operations

→ presence of controlled module (CML) BS → AP 802.11

→ Adhoc comm<sup>n</sup> - Peer to Peer communication where there is no control module. (CML)

Applications -

CR - LAN extension  
cross building interconnection

⇒ WOLAN Requirement (high throughput  
cross use of bandwidth)

- Page No. 11  
Date
- 2. Number of nodes - may be large = 100's
  - 3. good connection to LAN background NW
  - 4. good service (Range)
  - 5. Minimum battery power consumption
  - 6. Transmission security & robustness
  - 7. Coordinated NW operation
  - 8. licence free operation (2.4 GHz)
  - 9. Cell handoff / NW roaming
    - ↳ dynamic management
    - ↳ adaptive MAC
    - ↳ address management
    - ↳ automatic addition
    - ↳ deletion
    - ↳ relocation
  - 10. choice of physical solution

(1997 - 802.11) 1 or 2 Mbps

IEEE

802.11

2000 - 802.11 b

IEEE - 802.11 - 11 Mbps

8001

802.11 g,  
OFDM, 54 Mbps.



Simple LAN architecture.



- 802.11 access control
- avoid collision.
- CSMA
- No collision detection.

⇒ 802.11 protocol Architecture -  
Two modes

1) DCF (mandatory) (mandatory)  
(distributed co-ordinate function)

↳ best effort service

↳ CSMA/CA

2) PCF (Point co-ordination function)

↳ optional

↳ base station control users to the  
medium

using polling mechanism with  
high priority.

Up to the  
users.

Three type of frame

↳ data

↳ control

↳ Management (provide  
services,  
(QoS) priority)

Quality of service.

containing location info.  
also can



802.11 sender

- ⇒ 6) if sends the channel if it is idle. atleast for (DIFS) then transmit the entire frame (no co)  
(with contention service)

- 7) if sends the channel busy then start random back off time while channel transmit when time expires, if no ACK, increas random back off interval repeat 2.

⇒ 802.11 receiver

- if frame received OK return ACK after SIFS.



⇒ Collision Avoidance.

RTS - CTS.



⇒ The use of virtual channel sensing using CSMA/CA





↳ when single frame divided into multiple frames



5.



⇒

802.11 MAC frame fragment



SC = sequence control

FC = frame control

D/I = Duration / connection ID

(destination Mac add) Add1 = MAC address of wireless host or AP for receiving this frame

dot11 frame

1. Source Mac add  
Add2 = MAC address of wireless host or AP transmitting this frame

Add 3 = Mac address of router interface to which access point is attached.

Add 4 = used only in Adhoc mode.

⇒ Selectable = automatic selection of see how many ARQs

802.16

frame - format

a) data frame SAW - 16  
b) control. OPSK

(band width request frame)

|   |   |   |   |      |   |    |        |                |      |     |
|---|---|---|---|------|---|----|--------|----------------|------|-----|
| 1 | 1 | 6 | 1 | 1    | 2 | 11 | 16     | 8              | 11   | 4   |
| 0 | 1 | 5 | 0 | type | c | ER | length | content header | data | CRC |

|               |   |   |      |      |            |        |        |         |      |     |
|---------------|---|---|------|------|------------|--------|--------|---------|------|-----|
| data frame    | 1 | 0 | type | type | connection | header | length | content | data | CRC |
| control frame | 1 | 1 | 6    | 11   | 16         | 8      |        |         |      |     |

connection links the end points through the system

bits

- EO - encrypted payload
- CS - check sum input
- CK - encryption technique.

⇒ Service classes

- Constant bit rate service (voice)
- Real time variable bit rate service (video)
- Non real time variable bit rate service (high quality data)
- Best effort service (data, http etc)

802.15

PAN (personal area Netw.)



Solution → use extra register.

value of  $R_1$  will be available after 5<sup>th</sup> cycle.

### \* Data Hazard

Ex :-

|      |          |       |          | 1 | 2  | 3  | 4        | 5     | 6     |
|------|----------|-------|----------|---|----|----|----------|-------|-------|
| DADD | $R_1$    | $R_2$ | $R_3$    | - | IF | ID | EX       | MEM   | WB    |
| DSUB | $R_4$    | $R_1$ | $R_5$    |   |    |    | IF stall | stall | stall |
| AND  | $R_6$    | $R_1$ | $R_7$    |   |    |    |          |       | ID    |
| OR   | $R_8$    | $R_1$ | $R_9$    |   |    |    |          |       |       |
| XOR  | $R_{10}$ | $R_1$ | $R_{11}$ |   |    |    |          |       |       |

Data Hazard is that value of  $R_1$  is available after 5<sup>th</sup> cycle but in subtract operation it uses "old" value of  $R_1$  in the 4<sup>th</sup> cycle to execute subtract instruction. This is incorrect.  
 $\therefore$  It is a data Hazard.

### Solution :- Data Forwarding



Speed-up pipelining  $\geq$  A.Ex. Time unpipelined

I Kungpahmed x CPZ pahmed x CCF wahpahmed  
I Kungpahmed CPZ pahmed CCF pahmed

$$= \frac{CPI_{\text{adjusted}}}{CPI_{\text{unadjusted}}} = \frac{\text{positive defl.}}{1 + (\%)}$$

$\uparrow$        $\uparrow$

with      with

invoiced    invoiced

↳ called "hazards"

Chopping with karashi > 1

## Type of Hazards

#### Structural Hazards

Date Hand

### Control Hazard

$$\text{speed of pipetting with} \quad = \quad \frac{\text{pipeline depth}}{1 + \text{stalls due to seconds}}$$

8cc → 11cc  
without directional hazard with directional hazard

Ansikha

$$\text{CPU time} = \sum_{i=1}^n T_{C_i} \times CPI_i \times CCT$$

$$n = 3$$

$$\begin{aligned} 1 &\rightarrow 80\% & CPI_1 &= 3 \\ 2 &\rightarrow 30\% & CPI_2 &= 4 \\ 3 &\rightarrow 20\% & CPI_3 &= 5 \end{aligned}$$

$$\therefore \text{CPU time} = IC \times CPI_{\text{overall}} \times CCT$$

$$= 10^9 \times CPI_{\text{overall}} \times \frac{1}{2.6Hr}$$

$$= 10^9 \times CPI_{\text{overall}} \times 0.5 \times 10^{-9}$$

$$\begin{aligned} CPI_{\text{overall}} &= 0.5 \times 3 + 0.3 \times 4 + 0.2 \times 5 \\ &= 3.7 \end{aligned}$$

$$\text{CPU time} = 10^9 \times 3.7 \times 0.5 \times 10^{-9}$$

$$\frac{\text{CPU Time old}}{\text{CPU Time new}} \times \frac{IC_{old}}{IC_{new}} \times \frac{CPI_{old}}{CPI_{new}} \times \frac{CCT_{old}}{CCT_{new}}$$

(of same  
program)

e.g.

Raw material

$$\begin{aligned} 5 \times 10^9 &= 500 & CPI &= \frac{500}{100} = 5 \\ \text{hrs} & & & \\ \text{Pipeline factor} &= 10^4 & CPI_{\text{new}} - \frac{10^4}{100} &= 1.04 \approx 1 \end{aligned}$$

$$\frac{P_{\text{New}}}{P_{\text{Old}}} = \frac{CPU \text{ Time Old}}{CPU \text{ Time New}} \times \frac{\frac{f_{\text{Old}}}{f_{\text{New}}}}{\frac{f_{\text{Old}}}{f_{\text{New}}} + \frac{f_{\text{New}}}{f_{\text{Old}}}}$$

2

5

Power  $\downarrow$  Energy  
watt Joulies.

$$P = DP + LP$$

$\downarrow$  Dynamic Power       $\hookrightarrow$  Leakage Power

$DP \propto$  activity  $\times$  capacitance  $\times$  (voltage) $^2$   $\times$  frequency

$$LP = f(V, I) \rightarrow LP = 25\% \times \text{Total Power}$$

$\downarrow$  I current       $\downarrow$  voltage

$LP \propto V$ .

Ex :- A program runs for 3 seconds on 100 watt processor.

Energy consumption

$$E = P \times \frac{T}{\text{Power}}$$

$$= 100 \times 3 = 300 \text{ Joules.}$$

How to reduce  $DP$ ,  $LP$  & Energy?

There are 2 techniques:-

1. DFS (Dynamic Frequency Scaling).

### Load store Architecture

$$x = a + b$$

Load  $\hookrightarrow$  LD       $R_1 \quad O(R_x)$   $\leftarrow$  from this address  
Load  $\hookrightarrow$  LD       $R_2 \quad O(R_y)$   
Add  $\rightarrow$  ADD       $R_0 \quad R_1 \quad R_2$   
store  $\rightarrow$  SD       $R_0 \quad O(R_z)$

Car Analogy  
stage1    stage2  
 $C_1$  — — — —

$C_2$      $C_1$  — — — —

$C_3$      $C_2$      $C_1$  — — — —

Latency for each car = 5  
but throughput is increasing  
i.e. more no. of cars (instructions)  
per second.

MIPS

$\hookrightarrow$  Multiprocess or without interlocked pipeline stages

- Five stages :-

- | IF (Instruction fetch)
- | ID (Instruction Decode)
- | EX (Execute)
- | MEM (Memory)
- | WB (Write Back)

\* Processor Performance Equation

$$\text{CPU time} = (\text{CPU clock cycles of a program}) \times (\text{clock cycle time})$$

Q.3:

$$\text{CPI} = \frac{\text{CPU clock cycle of a program}}{\text{Instruction count (I.C.)}}$$

↑  
clocks per  
instruction

Mrs

$$\text{CPU clock cycle} = \text{I.C.} \times \text{CPI}$$

of a program

$$\therefore \boxed{\text{CPU Time} = \text{I.C.} \times \text{CPI} \times \text{CCT}} \text{ ns.}$$



$$T \propto \frac{1}{F}$$

# Q.4: Suppose a ~~processor~~ takes 1 billion instructions to execute on a processor running at 2 GHz and 50% of instructions whose CPI is 3 clock cycles per instruction and 30% of instruction is 4 clock cycles per instruction 20% of instruction is 5 clock cycles per instruction what is the CPU Time?

## ① DF Scaling

Example



$$DP \propto \sqrt{f}$$

$$\begin{aligned} T &= 100 \text{ secs.} \\ f &= 0.5 \text{ GHz} \\ T &= 200 \text{ secs.} \end{aligned}$$
$$E = P \times t = 100 \times 100 = 10,000$$

$$\begin{aligned} E_{\text{new}} &= 12500 \\ E_{\text{old}} &= 10000 \\ E_{\text{new}} &= 1.25 \times E_{\text{old}} \end{aligned}$$

## DVFS



$$\text{CPU Time} = \text{IC} \times \text{CPI} \times \text{CCT}$$

IC  $\rightarrow$  no. of instructions in a program.

Lower IC, Lower CPU Time.

IC  $\rightarrow$  job of compiler or user to minimize the IC.

CPI  $\rightarrow$  no. of clock cycles per instruction.  
this is minimized by Computer Architecture

CCT  $\rightarrow$  by H/W Technology.

$$\begin{aligned}\text{CPU Time} &= \frac{\text{Instructions}}{\text{program}} \times \frac{\text{clock cycles}}{\text{Instruction}} \times \frac{\text{seconds}}{\text{clock cycle}} \\ &= \frac{\text{seconds}}{\text{program}}\end{aligned}$$

# Different types of Instructions:

type - i

i+1

i+2

⋮

n

we have 'n' types of instructions

$$\text{CPU clock cycles} = \sum_{i=1}^n I.C_i \times \text{CPI}_i$$

$$\text{CPU Time} = \sum_{i=1}^n I.C_i \times \text{CPI}_i \times \text{CCT}$$

$$\text{overall CPI} = \frac{\sum I.C_i \times \text{CPI}_i}{I.C} = \sum_{i=1}^n \frac{I.C_i \times \text{CPI}_i}{I.C}$$

Example:

Program X

100 W

$$T = \frac{P}{100 \text{ sec.}} = \frac{P}{t}$$

$$E_{dd} = 100 \times 100$$

$$= 10000$$



$$D.P. \downarrow \propto V^2 \propto f \downarrow$$

$$\begin{array}{l} \text{if } f = 1 \text{ GHz} \\ \text{we change frequency,} \\ \quad \downarrow \\ \quad 0.5 \text{ GHz} \end{array}$$

$$\begin{array}{l} D.P. = 25 \text{ W} \\ \downarrow \\ 37.5 \text{ watts.} \end{array}$$

(No change in E.P.)

$$T = 100 \text{ sec.}$$

$$\begin{array}{l} \downarrow \\ 200 \text{ sec.} \\ P = 37.5 + 25 = 62.5 \end{array}$$

$$E_{max} = P \times t = 62.5 \times 200 = 12500$$

$$E_{new} = 12500$$

$$E_{old} = 10000$$

$$\Rightarrow [E_{new} = 1.25 \times E_{old}]$$

## 2. DVFS (Dynamic Voltage Frequency Scaling)

$$P = \frac{V}{100 \text{ mV}} \times T = \frac{0.9 \sqrt{2}}{100 \text{ sec}}$$



$$0.9 \sqrt{2} = \frac{1}{100 \mu\text{s}} \times 100 \mu\text{s}$$

If voltage decreases, clock cycle-time increases.

$$P = \frac{V}{100 \text{ mV}} \times T = \frac{0.9 \sqrt{2}}{100 \text{ sec}}$$

$$E_{dd} = P \times T = 100 \times 100 = 10000$$

$$75W \rightarrow 25W \rightarrow 25 \times 0.9 = 22.5W$$

$$\begin{aligned} V &= 0.9 \\ f &= 1.1 \text{ ns} \\ &= \frac{1}{1.1 \times 10^9} = 0.9 \times 10^9 \text{ Hz} = 0.9 \text{ GHz} \end{aligned}$$

$$75 \times (0.9)^2 \times 0.9 = P_{new} = 77.5W \approx 77$$

$$2 \times 55W = E_{new} = P \times T = 77 \times 110 = 8470.$$

4. Memory operation :- Loading & storing takes place.

LD  $R_0 \leftarrow O(R_x)$

SD  $R_0 \leftarrow O(R_y)$

5. WB (write back) :-

Addition is performed in 3rd stage but its value is reflected only after 5th stage.

ADD  $R_0 = R_1 + R_2$

fetch using  
PC  
 $\downarrow$  ADD  $R_0 = R_1 + R_2$   
 arithmetic      Destination      Source  
 In ID  $\rightarrow$        $\downarrow$        $\downarrow$

1cc    2cc    3cc    4cc    5cc  
 IF    ID    EX    MEM    WB.

we come to know  
 source & dest. registers  
 & that it is arithmetic  
 operation

It takes 5 clock cycles.

eg :- LD  $R_0 \leftarrow O(R_x)$     1cc    2cc    3cc    4cc  
 Load                                    IF    ID    EX    MEM

$\downarrow$   
 It takes 4 clock cycles.

$(O+R_x)$   
 address  
 is found

LD  $R_0 \leftarrow O(R_x)$

$$0 + 108 \\ = 108.$$



SD R<sub>y</sub> O(R<sub>y</sub>) → also takes +cc's.  
Store

Ex :- BNE Z  
↓

Branch instruction with Jump or No Jump

if(x == 0) go to L

L: - - -

↑  
we require 2 clock cycles to  
detect whether it is a  
branch instruction with jump or  
no jump.

1cc 2cc 3cc 4cc 5cc  
IP ID EX MEM WR



3. "Buffer"

4. called as  
Pipeline  
Registers.

Buffer is required bcoz if suppose  
instruction come out of IF stage &  
ID is busy, then it can be  
stored in buffer.

[MR] - - - .

↑  
at this clock cycle  
4 diff. instructions  
are at 4 diff. stages.  
This is called ILP.

Speedup · pipelining = A · Ex. Time unpipelining

A · Ex. Time pipelining .

$$= \frac{I/C \text{ unpipelined}}{I/C \text{ pipelined}} \times \frac{CPI \text{ unpipelined}}{CPI \text{ pipelined}} \times \frac{CCT \text{ unpipelined}}{CCT \text{ pipelined}}$$

$$= \frac{CPI_{\text{unpipelined}}}{CPI_{\text{pipelined}} + ?} = \frac{CPI_{\text{unpipelined}}}{1 + ?} \approx \text{pipeline depth}$$

↑                              ↑  
with overhead            with overhead  
↳ called "hazards"

$CPI_{\text{pipeline with hazards}} > 1$

# Types of Hazards →

- Structural Hazards
- Data Hazard
- Control Hazard.

$$\text{speed up pipelining with Hazards} = \frac{\text{pipeline depth}}{1 + \text{stalls due to Hazards}}$$

(?)\*

| #              | Structural Hazards |      |    |    | IF & MEM opr. performed with same register |    |  |
|----------------|--------------------|------|----|----|--------------------------------------------|----|--|
| i <sub>1</sub> | -                  | (IF) | IF | IF | IF                                         | WB |  |
| i <sub>2</sub> |                    | ID   | ID | EX | MEM                                        | WB |  |
| i <sub>3</sub> |                    | IF   | ID | EX | MEM                                        | WB |  |
| i <sub>4</sub> |                    | IF   | ID | EX | IF                                         |    |  |

not allowed to use same register for 2 operations in same clock cycle.  
so, it is stalled.



$$E_{old} = 10000$$

$$E_{new} = 8470$$

$$E_{new} = 0.84 \times E_{old}$$

07/01/2018

### Instruction Level Parallelism (ILP)

- Pipelining (best technique to achieve ILP)
  - ↓

Allow instructions to execute in overlap manner.

Instructions

Ex:

ADD       $R_0 \leftarrow R_1 + R_2$

RISC  
|

$R_0, R_1, R_2$  are registers.  
on which ADD operation is performed.  
sum of  $R_1$  &  $R_2$  is stored in  $R_0$ .

Full form:-

Instructions

Two ways

32-bit Instruction  $\Rightarrow$  word  
64-bit Instruction  $\Rightarrow$  double

called example:-

ADD  $R_0 \leftarrow R_1 + R_2$

DADD  $R_0 \leftarrow R_1 + R_2$

(double add)

direct address, indirect address, register address  $\rightarrow$  refer to these diff. addressing modes from book.

| clock cycle |     |     |     |     |     |        |
|-------------|-----|-----|-----|-----|-----|--------|
| 1cc         | 2cc | 3cc | 4cc | 5cc | 6cc | 7cc    |
| PC = i      | IF  | ID  | Ex  | MEM | WB  |        |
| PC=PC+4 i+1 |     | IF  | ID  | EX  | MEM | WB     |
| PC=PC+4 i+2 |     |     | IF  | ID  | EX  | MEM WB |
| PC=PC+4 i+3 |     |     |     |     |     |        |

In every clock cycle, one instruction enters a pipeline.

In every clock cycle, one instruction will get completed.

\* The objective of pipeline is  $CPI = 1$

Instruction fetch:

1. IF: Job is to fetch instruction by using program counter (PC).

$$PC = PC + 4$$

↑  
4 bytes i.e. 32 bits (1 word).

2. ID (Instruction Decode)

~~ex-~~ ADD R<sub>0</sub> R<sub>1</sub> R<sub>2</sub>



ID stage checks whether values are present or not.

It decodes the instruction that what type of instruction it is (e.g. arithmetic instruction etc.)

3. Ex (Execution) : performing integer operations or floating point operations.

ADD R<sub>0</sub> R<sub>1</sub> R<sub>2</sub>



ALU (Arithmetic Logical Unit)

FU

## 2. D VFS (Dynamic Voltage Frequency Scaling)

$$P = \frac{V}{100 \omega} T = \frac{V}{100 \text{ sec}}$$



Ins.

$$0.9V = \frac{V}{1.1\text{ms}}$$

If voltage decreases, clock cycle-time increases.

$$P = \frac{V}{100 \omega} T = \frac{V}{100 \text{ sec}}$$

110 sec.

$$E_{old} = P \times T = 100 \times 100 = 10000$$

$$75W \rightarrow 25W \quad 25 \times 0.9 = 22.5W$$

$$\begin{aligned} V &= 0.9 \\ f &= 1.1 \text{ ns} \\ &= \frac{1}{1.1 \times 10^9} = 0.9 \times 10^9 \text{ Hz} = 0.9 \text{ GHz} \end{aligned}$$

$$75 \times (0.9)^2 \times 0.9 = P_{new}$$

$$= 47.5W \approx 47$$

$$E_{new} = P \times t = 47.5 \times 110 = 8470.$$

4. Memory operation :- Loading & storing takes place.

LD  $R_o \leftarrow O(R_x)$

SD  $R_o \leftarrow O(R_y)$

5. WB (Write Back) :-

Addition is performed in 3rd stage but its value is reflected only after 5th stage.

ADD  $R_o \leftarrow R_1 + R_2$

fetching using  
PC  $\downarrow$  ADD  $R_o \leftarrow R_1 + R_2$  1cc 2cc 3cc 4cc 5cc  
 arithmetic      Destination      Source      JF      ID      EX      MEM      WB.

In ID  $\rightarrow$   $\uparrow$   $\uparrow$   
 we come to know  
 source & dest. registers  
 & that it is arithmetic  
 operation

It takes 5 clock cycles.

eg:- LD  $R_o \leftarrow O(R_x)$  1cc 2cc 3cc 4cc  
 Load IF ID Ex MEM  
 $\downarrow$

It takes 4 clock cycles.

$(O+R_x)$   
 address  
 is found

