

**2022**

**COMPUTER SCIENCE AND ENGINEERING**

**Paper : CSCL-504**

**(Computer Architecture)**

**Full Marks : 70**

*The figures in the margin indicate full marks.*

*Candidates are required to give their answers in their own words  
as far as practicable.*

Answer **question no. 1, question no. 2** and **any four** from the rest.

1. Answer **any five** questions : 2×5
  - (a) What is a ‘systolization’ procedure?
  - (b) What are the desirable features of a processor to be a member of any multiprocessor architecture?
  - (c) What are full-map and chained-map directory structures?
  - (d) How will you justify the usage of cache memory? Express your claim in terms of the relevant parameter set (size, access time, cost per bit).
  - (e) What are XOR and destination-tag routing schemes?
  - (f) How many states are there in  $4\times 4$  switch, including broadcasting and permutation? Comment on the control line format.
  - (g) What do you mean by barrier synchronization?
  
2. Answer **any five** questions : 4×5
  - (a) Write an algorithm to find the test vector for the switch-1 of stage-1 with cross-status in a MICN, whose size is  $8\times 8$ .
  - (b) Justify the statement - “MICN with  $\log_2 N$  stages and each stage with  $N/2$  number of  $2\times 2$  switches is blocking”. How can you convert the blocking MICN to a non-blocking MICN?
  - (c) What do you mean by ‘Parallelism by message passing’? Comment on the number and type of process states of any message-passing system in a multiprocessor system.
  - (d) What is Cache Coherence Problem? What are the primary reasons for the Cache Coherence Problem?
  - (e) What is the necessary and sufficient condition that a given task initiation cycle is allowed by a pipeline with a forbidden latency set F and also for a constant initiation cycle with period ‘p’?
  - (f) What is a super-scalar-pipeline architecture, and how is it different from a vector processor?
  - (g) What are the different types of hazards for pipeline architecture? Briefly discuss the corresponding resolution techniques.

**Please Turn Over**

3. (a) How is a systolic array of processors different from a SIMD environment? Explain briefly with an example.
- (b) Modify the uni-processor matrix multiplication algorithm for two two-dimensional matrices to incorporate temporal and spatial localities. Suggest a suitable systolic array of processors for the said problem. Split out the computations on processors at successive clock cycles. 4+6
4. (a) For the following reservation table, find the initial collision vector, collision, and state transition diagram.

|    | 0 | 1 | 2  | 3 | 4 |
|----|---|---|----|---|---|
| S1 | A | B |    | A | B |
| S2 |   | A |    | B |   |
| S3 | B |   | AB |   | A |

- (b) What lower and upper bounds of minimal average latency correspond to a static pipeline architecture reservation table? Justify your answer. 6+4
5. (a) Write an algorithm to add data items for a shuffle-exchange single-stage interconnection network of N nodes. Split out the steps with an example for an 8-node S-E MICN for 15 data items.
- (b) What are the structural differences between a message-passing module and an interrupt-service-subroutine? 7+3
6. (a) What is snoopy cache coherence protocol?
- (b) Briefly describe techniques to reduce cache-miss rates and cache-miss penalties. 5+5
7. (a) What are different message flow control strategies? What are its limitations?
- (b) Briefly explain the addressing techniques of cache memories with suitable examples.
- (c) A computer has a 256 KB, 4-way set-associative write-back data cache with a block size of 32 bytes. The processor sends a 32-bit address to the cache controller. Each cache tag directory entry contains 2 valid bits in addition to the address tag. Determine the number of bits in the tag field of an address and the size of the cache tag directory. 2+4+4

8. (a) For the following C code :

```
for (int i=0; i<N; i++) {  
    a[i]=b[i]+c[i];  
    d[i]=e[i]+f[i];}  
}
```

Assume the following :

- The arrays are integer arrays, where an integer is 1 word long.
  - N is very large.
  - The loop counter  $i$ , the array size  $N$ , and the base addresses of the six arrays are always kept in registers.
  - The code runs on a machine whose  $L1$  cache is fully associative and consists of four 4-word blocks.
    - (i) What is the cache hit rate of this code, assuming that the cache is initially empty?
    - (ii) Restructure the code to increase the cache hit rate.
    - (iii) What is this optimization called?
    - (iv) What is the cache hit rate of the modified code?
- (b) What are the alternative forms of ‘Data Flow Architecture’? Critically comment on dataflow architecture.

6+4