

## Computer Architecture

Regd.  
Date 27/05/24  
Page

Q1 What are the diff. parameters used to measure CPU performance?

→ To estimate the CPU performance, the measure most generally most imp. is execution time,  $T$ . because, Performance =  $\frac{1}{\text{Execution time}}$ .

So, if Execution time  $\uparrow$  CPU performance  $\downarrow$ .

- 3 parameters → (i) Speedup
- (ii) Efficiency
- (iii) Throughput

Speed up → Effect of improvement expressed in terms of speedup,  $S$

→ Ratio of execution time without improvement ( $T_w$ ) to the execution time with improvement ( $T_u$ )

$$S = T_w / T_u$$

Speed-up as a direct percent can be represented as:

$$S = ((T_w - T_u) / T_w) \times 100$$

Efficiency → Ratio of Speed-up to the no. of processor used.

$$E = S/p$$

Throughput → measure of no. of computation over a unit time.  $W = \frac{n}{t}$ .

Q2 "Instruction execution throughput  $\uparrow$  in proportion with the no. of pipeline stages." Is it true? Justify.

→ Pipeline refers to the technique in which a given task is divided into a no. of subtasks that need to be performed in sequence.

→ Each subtask is performed by a given functional unit.

→ The units are connected in a serial fashion & all of them operate simultaneously.



Regist  
Date \_\_\_\_\_  
Page \_\_\_\_\_  
27/05/24

measure CPU

measure exec-  
time. T.

i.

in terms of

improvement  
improvement( $T_1$ )

resulted as:-

processor

over a

tion with  
fy.

given task

need to be

nal unit

f all



→ The use of pipelining improves the performance compared to the traditional sequential execution of tasks.

→ Considering the execution of  $m$  tasks (instructions) using  $n$ -stages (units) pipeline. We assume that the unit time  $T = t$  units.

Then the throughput  $U(n)$  is,  $m/(n+m-1)t$  i.e., no. of tasks executed per unit time.

So, if we ↑ the no. of stages in a pipeline ↑ also ↑ the throughput of the pipeline.

### (3) Discuss data hazards.

- (i) If data hazards is any condition in which the source or the destination operands of an instruction are not available @ the time expected in the pipeline
- (ii) As a result of which some operation has to be delayed & pipeline ~~hazard stalls~~. Whenever there are 2 instr. one of which depends on the data obtained from other.

$$A = 3 + A$$

$$B = A * 4$$

Types:- (i) RAW (ii) WAR (iii) WAW.

#### Read After Write :- (True dependency)

This occurs when an instr. depends on the result of a prev. instr. that has not yet completed.

For ex:- if instr. B needs to read a value that instr. A is currently writing.

#### Write After Read (WAR) (anti-dependency)

This happens when an instr. needs to write a value after a prev. instr. has read it, but the read occurs before the write operation completes.

Write after Write (WAW) (Output dependency)  
This happens when 2 instruc'. write to the same location, and the 2nd write operation occurs before the 1st write completes.  
example with an ox.

- Q) Briefly describe cache coherence problem with an ex.  
Suggest one software protocol for this.

→ Memory coherence properly can manage the memories of a multiprocessor system so that no data is lost or overwritten before the data is transferred from a cache to the target memory.

→ Memory caching is eff. bcz. most programs access the same data or instructions over & over.

→ Cache coherence is a concept in CA that ensures that multiple copies of data in diff. caches remain consistent.

→ It is critical for the correctness & performance of multi-core processors & multiprocessor systems.

→ It ensures that all processors have a consistent view of memory, which is essential for the correct execution of parallel programs.

(5) a) Define pipeline hazard? Discuss its types.

→ Pipeline hazard:

- 5) a) Define pipeline hazard :-

  - Also known as pipeline conflicts or stalls, occur in a pipelined processor architecture when the next instruction can't execute in the following clock cycle.
  - These hazards can cause delays in the instruction pipeline, reducing the overall performance and efficiency of the CPU.

Types: → Struc  
→ Obj  
→ Proc

## Structural Hints

$\rightarrow$  Sr. hazards & hardware reuse

bcz. the ha  
Operations ne

Example :-

If a process  
instruc. fl.  
may occur  
access n

Dala Ha

→ Arise in  
close to

incorrect

→ Types :-

depe  
p. 1

~~not~~ AD

Su

2. WF

~~it~~ to it.

三

6. 3. 14

to

100

Types:-

- Structural Hazards
- Data Hazards
- Control Hazards.

Structural Hazards :-

→ Str. hazards occur when two or more instr. req. the same hardware resource at the same time. This conflict arises bcz. the hardware can't support all the concurrent operations needed by the pipeline.

→ Example:-

If a processor has a single memory unit for both instruc. fetch & data read/write, a str. hazard may occur if an instr. fetch & a data memory access need to happen simultaneously.

Data Hazards :-

→ Arise when instr. may exhibit data dependencies are close together in the pipeline. This dependency can cause incorrect results or the use of stale data.

→ Types:-

1. RAW (Read After Write): Occurs when an instruction depends on the result of a prev. instruc. that has not yet completed.

ADD R1, R2, R3 //instruc. 1: R1 = R2 + R3  
SUB R4, R1, RS //instruc. 2: R4 = R1 - RS (yet writing)

2. WAR (Write After Read): Occurs when an instruc. writes to a destination before a prev. instr. has read from it.

MOV R1, R4 //instr. 1: R4 = R1  
ADD R4, R2, R3 //instr. 2: R4 = R2 + R3 (R4 read before written)

3. WAW (Write After Write): Occurs when 2 instr. write to the same destination in a diff. order than programmed.

ADD R1, R2, R3 //Instr. 1: R1 = R2 + R3  
ADD R1, R4 //Instr. 2: R1 = R4 (instr. 1 written last)

Control Hazards :-

Control Hazards occur due to the pipelining of branch instructions, it may fetch the wrong set of instr. if the branch decision is not yet resolved.

Example :-

BEQ R1, R2, Label // Branch if  $R1 = R2$  to label  
 ADD R3, R4, R5 // This instr. might be fetched before the branch resolves.

(Q) What is SPEC rating?  
 $\Rightarrow$  SPEC (Standard Performance Corporation)

(Q) Use Bernstein's conditions for determining the max. parallelism in the following sequence of instr.

$$A = B \times C$$

$$B = D + E$$

$$C = A + B$$

$$E = F - D$$

$$I_1 : A = B \times C$$

$$I_2 : B = D + E$$

$$I_3 : C = A + B$$

$$I_4 : E = F - D$$

Now, read set and write set of  $I_1, I_2, I_3, I_4$  are as follows:-

$$R1 = \{B, C\}$$

$$W1 = \{A\}$$

$$R2 = \{D, E\}$$

$$W2 = \{B\}$$

$$R3 = \{A, B\}$$

$$W3 = \{C\}$$

$$R4 = \{F, D\}$$

$$W4 = \{E\}$$

Now, let's find out whether  $I_1$  and  $I_2$  are parallel or not:-

$$R1 \cap W2 \neq$$

$$R2 \cap W1 \neq$$

$$W1 \cap W2 =$$

Similarly,

$$R1 \cap W3 \neq$$

$$R3 \cap W1 \neq$$

$$W1 \cap W3 =$$

Hence II a

Similarly

$$R1 \cap W1$$

$$R4 \cap W1$$

$$W1 \cap W1$$

For  $I_2$

$$R2 \cap W1$$

$$R3 \cap W1$$

$$W2 \cap W1$$

For  $I_2$

$$R2 \cap W1$$

$$R4 \cap W1$$

$$W2 \cap W1$$

For  $I_1$

$$R3 \cap W1$$

$$R4 \cap W1$$

$$W3 \cap W1$$

$$R_1 \cap W_2 = \emptyset$$

$$R_2 \cap W_1 = \emptyset$$

$$W_1 \cap W_2 = \emptyset$$

That means  $I_1$  and  $I_2$  are not independent of each other.

Similarly, for  $I_1 \parallel I_3$ ,

$$R_1 \cap W_3 = \emptyset$$

$$R_3 \cap W_1 = \emptyset$$

$$W_1 \cap W_3 = \emptyset$$

Hence  $I_1$  and  $I_3$  are not independent of each other.

Similarly for  $I_1 \parallel I_4$ ,

$$R_1 \cap W_4 = \emptyset$$

$$R_4 \cap W_1 = \emptyset$$

$$W_1 \cap W_4 = \emptyset$$

Hence  $I_1$  and  $I_4$  are independent of each other.

For  $I_2 \parallel I_3$ ,

$$R_2 \cap W_3 = \emptyset$$

$$R_3 \cap W_2 \neq \emptyset$$

$$W_2 \cap W_3 = \emptyset$$

Hence,  $I_2$  and  $I_3$  are not independent of each other.

For  $I_2 \parallel I_4$ ,

$$R_2 \cap W_4 \neq \emptyset$$

$$R_4 \cap W_2 = \emptyset$$

$$W_2 \cap W_4 = \emptyset$$

Hence,  $I_2$  and  $I_4$  are not independent of each other.

For  $I_3 \parallel I_4$ ,

$$R_3 \cap W_4 = \emptyset$$

$$R_4 \cap W_3 = \emptyset$$

$$W_3 \cap W_4 = \emptyset$$

Hence,  $I_3$  and  $I_4$  are independent of each other.



⑦ Show that the max. Speedup of a pipeline is eq. to its stages.

→ Suppose:-  $k = \text{no. of stages in pipeline}$ .

$n = \text{no. of processes to be execute}$ .

$\tau = \text{time delay for each stage of pipeline}$ .  
and  $S = \text{Speed-up}$

then,

$S = \frac{\text{Time req. for non-pipeline process}}{\text{Time req. for pipeline process}}$

$$= \frac{n \cdot k \cdot \tau}{(k + (n - 1))\tau} = \frac{n \cdot k}{k + (n - 1)}$$

Max. Speed-up is,  $S_k \Rightarrow k$  when  $n \gg k$ .

- The max. speed-up is never fully achievable bcz. of data-dependencies b/w instruc., interstage prog. branches, etc. So, many pipeline cycles may be wasted on a waiting state caused by out-of sequence instr. executions.

⑧ Draw pipeline execution diagram during the execution of the following instructions:

MUL R1, R2, R3

ADD R2, R3, R4

INC R4

SUB R6, R3, R7

Find out the delay in pipeline execution due to data dependency of the above instructions.

⑨ Above pipeline technique is a 5 Stage pipeline.  
fetch, decode, read, execution, write back.



operation of the instructions are,

$$R1 \leftarrow R2 \cdot R3$$

$$R2 \leftarrow R3 + R4$$

$$R4 \leftarrow R4 + 1$$

$$R6 \leftarrow R3 - R7$$

So, No data hazard will occur in the pipeline.  
Instructions are not dependent to each other. So,  
normal pipeline execution will occur.

clock

1 → MUL R1, R2, R3    ADD R2, R3, R4    INC R4    SUB R6, R3, R7

2 → MUL R1, R2, R3    ADD R2, R3, R4    INC R4    SUB R6, R3, R7

3 → MUL R1, R2, R3    ADD R2, R3, R4    INC R4    SUB R6, R3, R7

4 → MUL R1, R2, R3    ADD R2, R3, R4    INC R4    SUB R6, R3, R7

5 → MUL R1, R2, R3    ADD R2, R3, R4    INC R4    SUB R6, R3, R7

Q) How reservation table helps to study the performance of pipeline?

i) There are 2 types of pipeline - Static and Dynamic.

Static → can perform one function at a time.

Dynamic → can perform more than one function at a time.

ii) A pipeline reservation table shows when stages of a pipeline are in use for a particular function. Each stage is represented by a row in reservation table. Each row of the reserv. table is in turn broken into columns, one per clock cycle.



(iii) When scheduling a static pipeline, only collisions b/w diff. input data for a particular func. had to be avoided.

With a dynamic pipeline, it is possible for diff. input data ref. diff. func. to be present in the pipeline at the same time.  
∴ collisions b/w these data must be considered as well.

⑩ What is cache coherence problem? Describe one method to remove this problem and its limitations.

→ Cache coherence problem arises in multiprocessor system where each processor has its own cache memory. When one processor updates a memory location, other processors' caches might still have outdated copies, leading to inconsistency.  
Resolving method → MESI protocol.

Modified, Exclusive, Shared, Invalid.

In this protocol, each cache line is marked with one of these states, indicating the status of data in that cache line; it means

When a processor wants to modify a cache line, it must gain exclusive access, ensuring no other processor has a copy. Once done, it updates the data & marks it as Modified.

Limitations of MESI:

It requires significant overhead to track & manage cache coherence, which can affect overall system performance.

⑪ a) Write down  
→ Amdahl's law  
improvement  
in a system.

S → Theoretical  
P → proper  
N → NO.

b) What is used to handle  
→ Branch history  
processors  
on branch  
incorrect  
pipeline  
correct  
process  
Method

i. Branch

ii. Delay

Both  
impa

11) a) Write down Amdahl's law of parallel processing.  
→ Amdahl's law is a formula used to find the max. improvement possible by improving a particular part of a system.  $S = \frac{P}{(1-P) + \frac{P}{N}}$

$S$  → Theoretical speedup of the execution of the task.  
 $P$  → proportion of the program that can be parallelized.  
 $N$  → No. of processors.

b) What is branch hazard? Briefly discuss 2 methods to handle branch hazards.  
→ Branch hazard (control hazards) - occur in pipelined processors when the pipeline makes the wrong decision on branch prediction, leading to no fetching of incorrect instructions. This can cause delays as the pipeline has to discard these instructions & fetch the correct ones, thus reducing the efficiency of the processor.

Methods to handle Branch Hazards:

1. Branch prediction → Static branch prediction - "backward branches are taken" (common in loops).  
→ Dynamic Branch prediction - 2-bit predictor or more advanced techniques.
2. Delayed Branching → Rearranges the instruction seq. so that the instr. following the branch is not affected by the branch decision.

Both methods aim to minimize the performance impact of branch hazards.

be one method  
tations.  
processor syst.  
memory.  
location,  
outdated  
valid.

bed with one  
data in that

cache line,  
ing no other  
updates tho

f manage  
all system

● ● ○ ○

RECOMMENDED  
SCANDONIA MATCHUR

2024.6.4 11:14

- (12) What is the drawback of direct mapped cache?  
 How is it resolved in set associative cache?  
 → A drawback of direct-mapped cache is the issue of conflict misses.

This occurs when multiple memory addresses map to the same cache line, causing the cache to constantly evict & replace the data, even if there is available space in other parts of the cache.

#### Resolution:-

Set associative cache helps to resolve this problem by allowing each memory address to map to multiple possible cache lines within a "Set", rather than a single cache line.

How it works:-

- ① Multiple ways per set-
- ② Flexible Placement
- ③ Reduced conflict misses

- (13) Explain the main factors that can influence the performance of interconnection networks.

- i) Topology                          iv) Switching Technique  
 ii) Routing Algorithm                v) Bandwidth  
 iii) Flow control Mechanism      vi) Latency  
 vii) Scalability                      viii) Fault Tolerance  
 ix) Traffic patterns                x) Implementation factors.

#### 1. Topology

• Direct VS. Indirect → Direct topologies (e.g. meshes, tori) connect nodes directly, while indirect topologies (e.g. crossbars) use intermediate nodes or switches.

Deg  
impr

2. Rou

• De  
fix  
le  
A

3. La

• Pro  
sol  
me

• Su  
for

• Qu  
de

(14) D

Ex

→ Ve

a

per

mu

• Degree of connectivity - The no. of connections per node impacts bandwidth & fault tolerance.

### 2. Routing Algorithm:-

- Deterministic vs. Adaptive - Deterministic routing uses fixed paths, which simplifies implementation but can lead to congestion.
- Adaptive routing can dynamically change the paths based on network conditions, improving performance but ↑ complexity.

- Minimal vs. Non-minimal: Minimal routing uses the shortest path, which minimizes latency, while non-minimal routing can balance load better, potentially avoiding congestion at the cost of longer paths.

### 3. Latency:-

- Propagation delay → Time taken for a signal to travel from source to destination, influenced by physical distance  $L$  & medium.
- Switching delay → Time taken by switches to process & forward data.
- Queuing delay → Time data spends waiting in buffers due to congestion.

(14) Define vector chaining? How can it speed up the processing?

Explain with suitable ex.

→ Vector Chaining, also known as <sup>chaining</sup> operations, is a technique used in vector processors to enhance performance by efficiently overlapping the execution of multiple vector instructions.

Benefits of vector chaining →

1. Reduced latency → time req. to complete a seq. of operations is reduced.
2. Improved Resource utilization → reduced idle times b/w operations.
3. Increased Throughput → overall rate of computation ↑.

Ex. of vector chaining :

1.  $C = A + B$
  2.  $D = C \times E$
- without vector chaining :
1. Perform the addition  $C = A + B$ .
  2. Wait for the entire result of  $C$  to be available.
  3. Perform the multiplication  $D = C \times E$ .

Illustration:-

Assume vectors A, B and E each have 4 elements.

- $A = [a_1, a_2, a_3, a_4]$
- $B = [b_1, b_2, b_3, b_4]$
- $E = [e_1, e_2, e_3, e_4]$

without chaining :-

1. Compute  $C = [a_1+b_1, a_2+b_2, a_3+b_3, a_4+b_4]$ .  
This step takes one unit of time.

2. Compute  $D = [c_1 \times e_1, c_2 \times e_2, c_3 \times e_3, c_4 \times e_4]$ .  
This step takes another unit of time after the 1st step is completed.

Total time = 2 units.

With chaining:-

1. Start computing  $C = [a_1+b_1, a_2+b_2, a_3+b_3, a_4+b_4]$ .

2.  $A \leftarrow$   
 $[c_1, c_2, c_3, c_4]$

(5) Explain  
of wrap  
around  
classification  
of concurrent  
operations.

1. SIMD.

instruction.

• Ex. -

→ **SIMD**

Simultaneous  
operations.

Ex. -

- Ques. No. 2. As soon as  $C_1 = a_1 + b_1$  is available, start  $D = [C_1 \times P_1, C_2 \times P_2, C_3 \times P_3, C_4 \times P_4]$ .  
No. of steps of effective time  $\approx 1$  unit.

- (15) Explain in brief with neat diagram the Flynn's classification of computers.  
 → Flynn's taxonomy by Michael J. Flynn in 1966, is a classification system for F.A. based on the no. of concurrent instruction streams. Many can handle 4 categories → SISD (Single Instruction, Single Data)  
 → SIMD (Single Instruction, Multiple Data)  
 → MISD (Multiple Instruction, Single Data)  
 → MIMD (Multiple Instruction, Multiple Data).

1. [SISD]. Description - A single processor executes a single instruction stream & operates on a single data stream.  
 • ex. - Traditional single-core processors.  
 elements.



2. [SIMD]. Description - A single processor executes a single instruction stream but operates on multiple data streams simultaneously.  
 • ex. - Vector processors & GPUs.

6064 6.4 [16/11/15]



3. **MISD** • Description: Multiple processors execute diff. instruc. streams on same data stream.  
• Instruc. streams on practical applications, theoretically used for fault-tolerant systems.



4. **MIMD** • Description: Multiple processors execute diff. instruction streams on diff. data streams.  
• Example: multi-core processors & distributed systems



## Regd. No. \_\_\_\_\_ Date \_\_\_\_\_

- Q. Explain the C-access and S-access memory organization for vector accesses  
→ In context of vector processors, memory access pattern can significantly impact performance.  
Two common patterns are :- C-access (Column access)  
S-access (Stride access)

**C-Access**  
Elements of a vector are stored in consecutive memory locations.

**Layout :-** If you have a vector  $V$  with elements  $v_1, v_2, \dots, v_n$ , they are stored in contiguous memory addresses.

**Access Pattern :-** Accessing elements sequentially, i.e., one after another.

**Example :-** for a vector  $V = [v_1, v_2, v_3, \dots, v_n]$ , the memory addresses might look like :-

Address 0:  $v_1$   
Address 1:  $v_2$   
Address 2:  $v_3$   
...

Elements

Address n-1:  $v_n$

**S-access**

Vector elements are accessed with a fixed stride, meaning there is a regular interval (or gap) b/w the memory addresses of consecutive element.

**Layout :-** if the stride  $S$  is 2, the elements might be stored in memory locations 0, 2, 4, etc.

**Access Pattern :-** Accessing elements with a stride  $S$ .

**Example :-** for a vector  $V = [v_1, v_2, v_3, \dots, v_n]$  and a stride  $S=2$ , the memory address might look like :-

Address 0: V\_1  
Address 2: V\_2  
Address 4: V\_3  
Address 6: V\_4

Address 8: V\_5  
Address 10: V\_6

Comparison - S-access

e-access → can be general.

→ Typically faster  
→ common in applications with dense data & where elements are accessed sequentially.

$\rightarrow A = [a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}, a_{31}, a_{32}, a_{33}]$

Access pattern: (C-access)

$a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}, a_{31}, a_{32}, a_{33}$

Access pattern: (S-access)

$a_{11}, a_{21}, a_{31}, a_{12}, a_{22}, a_{32}, a_{13}, a_{23}, a_{33}$

(17) Diff. b/w 3-address, 2-address, 1-address, and 0-add.

Instruction with available op.

1. Three-Address Instructions - To

each instru: specifies 3-addresses. These are used for specifying 2 source Operands & one destination Operand.

Ex - assembly lang. for modern CPUs.

⑧  $\text{ADD R1, R2, R3}$

Add contents of registers 'R2' and 'R3'

• Store the result in 'R1'.

2. Two-Address Instructions -

each instru: specifies 2 addresses. One address serves as both a source & a destination, while the other is just a source.

ADD R1, R2  
R1 is both destination & one of source registers.  
R2 is other source register.

3. One - Address instructions: in two address instructions,  
these is only one explicit address in the instruction.

These is both operand & accumulator  
which usually acts as both operand & accumulator.

ADD R1  
Acc = Acc + R1  
Acc → accumulator register  
R1 → source register.

4. Zero - address: These instructions  
Don't contain any explicit address field. These instructions  
typically operate on a stack-based architecture, where  
operands are implicitly assumed to be on top of stack.

ADD  
 $TOS = (TOS - 1) + TOS'$   
TOS → Top of Stack  
TOS → Top of stack from stack, addressed + and  
The 2 operands are popped from stack, addressed + and  
the result is pushed back onto the stack.

and 0 add.  
used for  
non operand

(Q) What is Instruction cycle?

Ans: Micro programmed control unit.

→ Helps in the process through which a command is given. Helps in the process through which a command is given.

→ retrieves, decodes, and execute a command.

Thus people can be broken down into several phases.

1. Fetch - Control unit retrieves instruction from memory.

2. Decode - Instruction is decoded to understand what action is required.

Ques 6.4 N.15  
Ans

g. Execute - Decoded instruction: its carried out, which may for  
involve arithmetic operations, memory access +  
input / output operations.

Regd's  
Date \_\_\_\_\_  
Page \_\_\_\_\_

|                                                                           |                                                                       |
|---------------------------------------------------------------------------|-----------------------------------------------------------------------|
| 4. Whitebox: The structure of the program are written back to the system. |                                                                       |
| Features:                                                                 | Microprogrammed C.P.U.                                                |
| Speed:                                                                    | Faster due to direct address.                                         |
| Flexibility:                                                              | Slow due to sequential microinstruction.                              |
| Complexity:                                                               | High.                                                                 |
| Implementation:                                                           | Moderate complexity uses log. of microinstructions in control memory. |
| Logic Circuits:                                                           | Used in CISC architecture.                                            |
| Native set:                                                               | Used in RISC                                                          |

Q19) What is Von-Neumann Architecture? What is a bottleneck?

Von-Neumann bottleneck? How can this be reduced?  
Also called Von Neumann model or Princeton model.  
→ Architecture, in a computer architecture design model proposed by John Von Neumann.

1. CPU → executes instruction & processes data.
2. ALU: - performs arithmetic & logic operation.
3. CU: - Directs operation of the processor.
4. Memory → Stores data & instructions.
5. Input / Output devices → Interfaces for data input and output.
6. Bus System → Communicate between different components.

Von Neumann Bottleneck:-

Refers to the limitation inherent in Von Neumann architecture, where the throughput of the system

## Registers

is limited by the rate at which data of instructions can be fetched from memory.

### Reducing Von Neumann Bottleneck :-

1. Cache Memory :-
  - L1, L2, L3 caches are small, fast memory storage areas located closer to CPU.
  - They temporarily hold frequently accessed data & instructions to reduce the time the CPU spends waiting for data from main memory.

2. Cache Hierarchy - Diff. levels of cache (L1 being the fastest & smallest, L3 being the largest & slowest) provide a balance b/w Speed & Size.

### SC architecture

Complexity of microcontroller memory.

be reduced

in address

design

of instruction

processor

for

peripherals

and

memory

transfers

to input

from output

peripherals

and memory

transfers

between

peripherals

and memory

transfers

programmed C.P.U.  
to microinstruction  
by specific address

Complexity  
of microcontroller  
memory.

be reduced

in address

design

of instruction

processor

for

peripherals

and

memory

transfers

to input

from output

peripherals

and memory

transfers

between

peripherals

and memory

transfers

to input

from output

peripherals

and memory

transfers

between

peripherals

and memory

transfers

### Reducing Von Neumann Bottleneck :-

1. Cache Memory :-
  - L1, L2, L3 caches are small, fast memory storage areas located closer to CPU.
  - They temporarily hold frequently accessed data & instructions to reduce the time the CPU spends waiting for data from main memory.

2. Cache Hierarchy - Diff. levels of cache (L1 being the fastest & smallest, L3 being the largest & slowest) provide a balance b/w Speed & Size.

3. Pipelining Instruction Pipeline :- Divides the process of executing instructions into separate stages, allowing multiple instructions to be processed simultaneously at diff. stages of execution.

4. Direct Memory Access (DMA) :-
  - DMA controllers :- Allow peripheral devices to access the main memory directly without involving the CPU, reducing the CPU's workload & improving overall system performance.

5. Compare RISC and CISC architecture in brief.

Q) Compare RISC and CISC architecture in brief.

Ans :-

RISC :-

CISC :-

Complexity :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

Instructions :-

Execution :-

Performance :-

Power consumption :-

Cost :-

Speed :-

|                        |                                       |                                         |
|------------------------|---------------------------------------|-----------------------------------------|
| Features               | RISC/Reduced Instruction Set Computer | CISC (Complex Instruction Set Computer) |
| Instruction set        | Small, highly optimized set.          | → Large set of instructions             |
| Instruction complexity | Simple instruction.                   | → Complex Instructions                  |
| Performance            | Packing 1 clock cycle.                | Spending multiple clock cycles.         |
| Code size              | High                                  | → Slower.                               |
| Control unit           | Larger                                | → Smaller                               |
| Registers              | Handwired                             | → Microprogrammed.                      |
| Example                | ARM, MIPS, SPARC                      | X86, VAX, System/360                    |
| User core              | Smartphones, Tablets.                 | → Workstations, Servers, etc.           |

- i) Suppose that a clock cycle has three phases:
- ii) A clock phase that contains:
  - iii) Data transfer between registers and memory.
  - iv) Pipelining instructions.

v) The

- vi) The

- vii) What

- viii) What

2023

{ Date \_\_\_\_\_  
Page \_\_\_\_\_ }

REQUIRING POWER  
SOUND AND MASTERY

i) Superscalar processor is a processor architecture that executes multiple instructions in a single clock cycle by using multiple functional units.

ii) Cluster computer is a group of interconnected comp. (computer Network).

iii) Distributed architecture is a type of parallel architecture that emphasizes data flow and emphasizes communication b/w nodes rather than a shared memory space.

1360

etc.

v) Pipelining is a technique for increasing instruction-level parallelism by simultaneously executing multiple instructions.

v) The goal of exception handling is to avoid the crashing of system.

vi) The FIFO policy in virtual memory is used to decide which page to remove from memory to provide a new page needs to be loaded into memory.

vii) What is superscalar processor?

→ It is a processor that divides its pipeline into more stages, allowing higher clock speeds & multiple instructions to be processed simultaneously.

viii) What is synchronization in a centralized architecture?

6064 6.4 11:16

- vii) What is VLI  
 → In a centralized architecture, synchronizers or protocols to coordinate of uncoordinated & efficiently operate multiple resources to ensure they operate correctly & shared consistency by controlling access & ensure data consistency prevent conflicts.
- viii) What is memory? → Cache  
 → Computer frequent requests.
- ix) What is exception handling in CA?  
 → refers to the management of instructions within a computer's CPU execution of instructions.
- x) What is a memory replacement policy?  
 → are algorithms used in operating systems to decide which memory pages to swap out when new pages need to be loaded into memory.
- xii) What are memory replacement policies?  
 → FIFO → Older data block in the cache is removed.  
 It is simple to implement but may not always provide the best performance.
- xiii) What is LRU (Least Recently Used)?  
 → Data block that has not been used for the longest period is replaced.  
 This policy assumes that data used recently will likely be used again soon.
- (3) Who implemented LRU?
- LRU (Least Recently Used) → This can be useful in certain scenarios where the most recently used data is unlikely to be used again soon.
  - LFU (Least Frequently Used)
  - Random Replacement
  - Clock (Second chance)

vii) What is a VLIW processor?

→ VLIW (Very Long Instruction Word) is a processor architecture where a single instruction can execute multiple operations simultaneously.

② What is cache memory, how it is diff. from main memory?

→ Cache memory is a small, high-speed type of volatile computer memory that provides quick access to frequently used sets of instructions. Located on CPU chip or very close to it.

1. Size: Cache memory is much smaller in size compared to main memory.
2. Speed: Cache memory has faster access speed than main memory.
3. Proximity to CPU: Cache memory is located closer to CPU, whereas main memory is typically located away on the motherboard.
4. Cost: Cache memory is more expensive per unit of storage compared to main memory due to its faster speed of proximity to the CPU.

③ What are some of the design trade-offs involved in implementing a superscalar processor?

1. Instruction-level parallelism (ILP) vs Hardware complexity.
2. Resource sharing vs. latency.
3. Instruction fetch bandwidth vs. Branch Prediction Accuracy.
4. Instruction Dispatch & Issue logic.
5. Dynamic Power consumption vs. Performance.

- 
- 4) What is cache miss, and what are the causes of cache misses in a computer system?  
→ A cache miss occurs when the data requested by the processor is not found in the cache memory, leading to a larger access time as it's fetched from a slower main memory.
- 5) Role of memory management unit (MMU) in virtual memory management.
- 6) a) Describe virtual memory.  
b) Explain how it is implemented in modern computer systems.
- 7) Compare & contrast Superscalar & VLIW processor architecture.
- 8) A system has 64-bit Virtual Address Space and 4KB page size. The page table is stored in Main memory, which has a memory access time of 100ns. The TLB has a hit rate of 90% & a TLB miss takes 50ns to service. What is the effective memory access time?

● ● ○ ○

REDDING POWER  
SCANDONAL MATCHUR

2024.6.4 11:16