

Oishe

**University of Rajshahi**  
**Department of Computer Science & Engineering**  
**B.Sc. Engineering Part IV Odd Semester Examination 2020**  
**Course Code: CSE 4111**  
**Course Title: Parallel Processing and Distributed System**

**Full Marks: 52.5**

**Time: 3 Hours**

[Answer any SIX (06) questions taking THREE (03) from each section]

**Section A**

1. a) Why are we developing parallel systems instead of sequential systems? 3  
b) Why do we need to write parallel programs? 2.75  
c) How can we exploit instruction level parallelism and loop level parallelism? 3
2. a) If we fetch a cache line from main memory, where in the cache should it be placed? 3.25  
b) What is the bottleneck of Von Neumann architecture? 1  
c) Explain which function is **thread safe** and which is not. 2  
d) What are the limitations of MPI\_Scatter and MPI\_Gather functions? 2.50
3. a) What are the key characteristics of a vector processor? 2.75  
b) Does false sharing generate incorrect result? Explain the effects of false sharing. 3  
c) What is cache coherence problem? Write a program to illustrate the cache coherence problem. 3
4. a) How can the receiver find out the following values in MPI? 3.75  
    1. The amount of data in the message.  
    2. The sender of the message.  
    3. The tag of the message.  
b) Why do we need MPI\_Reduce and MPI\_Allreduce function? Show the communication pattern of MPI\_Allreduce. 3  
c) Write short notes on GPU. 2

**Section B**

5. a) Why do we use MPI\_Ssend instead of MPI\_Send? 2  
b) A set of numbers (15, 11, 9, 16, 3, 14, 8, 7, 4, 6, 12, 10, 5, 2, 13, 1) and four processes are given. Show how the processes can sort the numbers using parallel odd-even transposition sort. 3.25  
c) Write a program in MPI to sort a set of numbers using parallel odd-even transposition sort algorithm. 3.50
6. a) Define grain size and latency. 2  
b) Define different types of data dependence. 3.25  
c) What are the steps of grain packing? Explain with an example. 3.50
7. a) Can multiple threads simultaneously insert new nodes into a linked list without any problem? If not, explain how to insert the nodes and write a code segment to insert the nodes by multiple threads. 3.25  
b) How can we synchronize threads? 3  
c) What is mutex? Write a program using mutex. 2.50

8. a) What are the problems of busy-waiting? 2.50  
b) Define critical section and race condition. 2.50  
c)  $\text{fibo}[0] = \text{fibo}[1] = 1;$  3.75  

```
# pragma omp parallel for num_threads(thread_count)
for (i = 2; i < n; i++)
    fibo[i] = fibo[i-1] + fibo[i-2];
```

What will be the results if we try running the above code with more than one thread?

[N.B. Answer any Six questions taking Three from each section]

### Section-A

1. (a) Why are we developing parallel systems instead of sequential systems? 2  
 (b) If a fraction  $r$  of our serial program remains unparallelized, then Amdahl's law says 3  
 we can't get a speedup better than  $1/r$ . Illustrate with an example.  
 (c) Suppose  $S$  is a shared variable that is initialized to 3.  $A_0, B_0$  and  $C_0$  are private 3.75  
 and owned by core 0, on the other hand  $A_1, B_1, C_1$  and  $A_2$  are private and owned  
 by core 1. What will be the value of  $A_2$  after executing the following statements?  
 How to solve the problem (if you find any)?

| Time | Core 0       | Core 1       |
|------|--------------|--------------|
| 0    | $A_0=S;$     | $A_1=3*S;$   |
| 1    | $S=5;$       | $B_1=C_1+1;$ |
| 2    | $B_0=C_0+2;$ | $A_2=4*S;$   |

2. (a) What happens when multiple processes first call MPI\_Send simultaneously and 3  
 then call MPI\_Recv? How can you make your program safe if it is unsafe due to  
 the use of MPI\_Send and MPI\_Recv?  
 (b) Explain why we use MPI\_Allgather. 2  
 (c) Write an MPI program that implements multiplication of a vector by a scalar and 3.75  
 dot product. The user should enter two vectors and a scalar, all of which are taken  
 by process 0 and distributed among the processes. The results are calculated and  
 collected from process 0, which prints them. You can assume that  $n$ , the order of  
 the vectors, is evenly divisible by comm\_sz.
3. (a) Suppose you have an array {12, 14, 6, 8, 10, 16, 2, 4}, show all the phases of 4.75  
 sorting the array using odd-even transposition sort algorithm.  
 (b) Write an OpenMP program to implement the odd-even transposition sort 4  
 algorithm.
4. (a) Discuss the efficiency of pipelined processor over non pipelined processor for a 3  
 single instruction?  
 (b) What is pipelining? To perform pipelining in a processor what steps must be 2  
 followed.  
 (c) Draw and explain the structure of a linear pipeline for floating-point 3.75  
 multiplication.

## Section-B

5. (a) Define software parallelism and hardware parallelism. 1.75  
 (b) Is there any relation between software parallelism and hardware parallelism? 3  
 Explain.
- (c) Perform a data dependence analysis on each of the following two program segments. Show the dependence graphs among the statements with justification. 4
- |                   |                      |
|-------------------|----------------------|
| (i) S1: $A=B+D$   | (i) S1: $X=\sin(Y)$  |
| (ii) S2: $C=Ax3$  | (ii) S2: $Z=X+W$     |
| (iii) S3: $A=A+C$ | (iii) S3: $Y=-25xW$  |
| (iv) S4: $E=A/2$  | (iv) S4: $X=\cos(Z)$ |
6. (a) How to eliminate communication delays between processors through node duplication? Illustrate with an example. 3  
 (b) Suppose two  $2 \times 2$  matrices A and B are multiplied to compute the sum of the four elements in the resulting product matrix  $C=AxB$ . Draw a fine-grain program graph assuming each node requires 101 CPU cycles for multiplication, 8 CPU cycles for addition and 212 CPU cycles for inter-processor communication latency. Perform grain packing and show sequential schedule (before grain packing) and parallel schedule for 4 and 6 processors (after grain packing). 5.75
7. (a) Consider the following reservation table 5
- |    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----|---|---|---|---|---|---|---|---|
| S1 | X |   |   |   |   | X |   |   |
| S2 |   | X |   | X |   |   |   | X |
| S3 |   |   | X |   | X |   | X |   |
- (i) List the set of forbidden latencies and the collision vector  
 (ii) Draw a state transition diagram showing all possible initial sequences (cycles) without causing a collision in the pipeline  
 (iii) List all the simple cycles from the state diagram
- (b) What is the function of scoreboard in dynamic instruction scheduling? 2.75  
 (c) How to calculate throughput of a pipeline? 1
8. (a) Define distributed system? Why distributed systems to be needed? 3  
 (b) Differentiate between parallel and distributed systems. 2  
 (c) What kind of naming services do you know? Why do distributed systems need this function? 2  
 (d) What is distributed deadlock and why are they hard to detect? 1.75

**Answer any three questions from each part.**

### **Part-A**

- |    |                                                                                                                                                |      |
|----|------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 1. | (a) What is meant by the term 'Parallel Processing'? Can you discuss the general trends in parallel processing?                                | 4    |
|    | (b) What is 'Vector Processor'? Why do we need such processors?                                                                                | 2.75 |
|    | (c) Can you give us a comparative study between vector and scalar processors?                                                                  | 2    |
| 2. | (a) Discuss different types of data dependence.                                                                                                | 2.5  |
|    | (b) What are the steps of grain packing? Discuss with an example.                                                                              | 3.75 |
|    | (c) What is data-flow computer? Write a program and draw its dataflow graph.                                                                   | 2.5  |
| 3. | (a) Define Whetstone results, TPS and KLIPS ratings.                                                                                           | 3    |
|    | (b) Explain harmonic mean speedup performance for a multiprocessor operating in $n$ execution modes with respect to probability distributions. | 4    |
|    | (c) Define average parallelism.                                                                                                                | 1.75 |
| 4. | (a) What are the reasons behind the development of reduced instruction sets.                                                                   | 2    |
|    | (b) Draw the architecture of MC68040 microprocessor and discuss its major units.                                                               | 3.5  |
|    | (c) Explain the concept of using overlapped register windows introduced by the Berkeley RISC architecture.                                     | 3.25 |

### **Part-B**

- |    |                                                                                 |      |
|----|---------------------------------------------------------------------------------|------|
| 5. | (a) What do you know about the distributed database?                            | 1    |
|    | (b) Describe the horizontal and vertical fragmentation of distributed database. | 3.75 |
|    | (c) Describe the failures handling techniques in two-phase commit protocol.     | 4    |
| 6. | (a) Define reservation table.                                                   | 1    |
|    | (b)                                                                             | 4.75 |

|    | 1 | 2 | 3 | 4 | 5 | 6 |
|----|---|---|---|---|---|---|
| S1 | X |   |   |   |   | X |
| S2 |   | X |   |   | X |   |
| S3 |   |   | X |   |   |   |
| S4 |   |   |   | X |   |   |
| S5 |   | X |   |   |   | X |

Considering the above reservation table:

- (a) List the set of forbidden latencies and the collision vector.
- (b) Draw the state transition diagram.
- (c) Determine the minimum average latency of this pipeline.
- (c) Explain how to reduce minimal average latency.

3

7. (a) What are the challenges in the development of future general-purpose supercomputers? 2
- (b) Discuss different types of vector instructions. 3
- (c) Draw the architecture of Fujitsu VP2000 series supercomputer and discuss its major units. 3.75
8. (a) Consider the following pure scalar loop: 3

```
Do K = 1, 2048
  A(K) = A(K) + S
Enddo
```

Explain how to reduce the execution time by vector, scalar-concurrent, vector-concurrent and concurrent outer/vector inner (COVI) modes.

- (b) Consider the following loop: 2

```
Do I = 1, N
  A(I) = B(I)
  C(I) = A(I) + B(I)
  E(I) = C(I+1)
Enddo
```

Show how to vectorize the code.

- (c) Consider a basic block of bubble sort program. 3.75

|        |               |         |                  |
|--------|---------------|---------|------------------|
| t8     | $\Rightarrow$ | j - 1   | temp := A[j]     |
| t9     | $\Leftarrow$  | 4 * t8  |                  |
| temp   | $\Leftarrow$  | A[t9]   |                  |
| t10    | $\Leftarrow$  | j + 1   | A[j + 1]         |
| t11    | $\Leftarrow$  | t10 - 1 |                  |
| t12    | $\Leftarrow$  | 4 * t11 |                  |
| t13    | $\Leftarrow$  | A[t12]  | A[j] := A[j + 1] |
| t14    | $\Leftarrow$  | j - 1   |                  |
| t15    | $\Leftarrow$  | 4 * t14 |                  |
| A[t15] | $\Leftarrow$  | t13     | A[j] := A[j + 1] |
| t16    | $\Leftarrow$  | j + 1   |                  |
| t17    | $\Leftarrow$  | t16 - 1 |                  |
| t18    | $\Leftarrow$  | 4 * t17 | A[j + 1] := temp |
| A[t18] | $\Leftarrow$  | temp    |                  |

Show its corresponding DAG representation.

### Part-A

[Answer any THREE questions.]

1. (a) What do you mean by parallel processing? Discuss the major challenges in the development of future general-purpose supercomputers. 2
  - (b) Draw the architecture of a vector super computer and discuss its major functional units. 3
  - (c) Define implicit and explicit parallelism 1.75
  - (d) Derive MIPS rate and throughput rate. 2
  
  2. (a) Discuss different types of data dependence. 3
  - (b) Consider the following assembly language code. Exploit the maximum degree of parallelism among the 16 instructions, assuming no resource conflicts and multiple functional units are available simultaneously. For simplicity, no pipelining is assumed. All instructions take one machine cycle to execute. Ignore all other overhead. Draw a program graph with 16 nodes to show the flow relationships among the 16 instructions. 5.75
- |     |                  |                                    |
|-----|------------------|------------------------------------|
| 1:  | Load R1, A       | $/R1 \leftarrow \text{Mem}(A)/$    |
| 2:  | Load R2, B       | $/R2 \leftarrow \text{Mem}(B)/$    |
| 3:  | Mul R3, R1, R2   | $/R3 \leftarrow (R1) \times (R2)/$ |
| 4:  | Load R4, D       | $/R4 \leftarrow \text{Mem}(D)/$    |
| 5:  | Mul R5, R1, R4   | $/R5 \leftarrow (R1) \times (R4)/$ |
| 6:  | Add R6, R3, R5   | $/R6 \leftarrow (R3) + (R5)/$      |
| 7:  | Store X, R6      | $/\text{Mem}(X) \leftarrow (R6)/$  |
| 8:  | Load R7, C       | $/R7 \leftarrow \text{Mem}(C)/$    |
| 9:  | Mul R8, R7, R4   | $/R8 \leftarrow (R7) \times (R4)/$ |
| 10: | Load R9, E       | $/R9 \leftarrow \text{Mem}(E)/$    |
| 11: | Add R10, R8, R9  | $/R10 \leftarrow (R8) + (R9)/$     |
| 12: | Store Y, R10     | $/\text{Mem}(Y) \leftarrow (R10)/$ |
| 13: | Add R11, R6, R10 | $/R11 \leftarrow (R6) + (R10)/$    |
| 14: | Store U, R11     | $/\text{Mem}(U) \leftarrow (R11)/$ |
| 15: | Sub R12, R6, R10 | $/R12 \leftarrow (R6) - (R10)/$    |
| 16: | Store V, R12     | $/\text{Mem}(V) \leftarrow (R12)/$ |
- 
3. (a) Give the equation for pipeline CPI. How is it related to ideal pipeline CPI? Explain the terms involved in it. 3
  - (b) What are pipeline registers, what sorts of information do they carry, and why are they needed? 2
  - (c) Illustrate the operational principles to design of a pipeline floating-point adder with the inputs of  $A = a \times 2^p$  and  $B = b \times 2^q$ . 3.75

4. (a) Define network diameter, bisection width and degree of parallelism (DOP). 3  
 (b) How to compute average parallelism? 2.5  
 (c) Ruby Lee has defined several parameters for evaluating parallel computations. 3.25  
 Derive the parameters.

### Part-B

[Answer any THREE questions.]

5. (a) Explain in a VLIW architecture. 2  
 (b) Compare the PRAM and VLSI models of parallel computers and mention how these models facilitate the study of asymptotic behavior of algorithms implementable on parallel computers. 3.75  
 (c) Differentiate the three different shared memory system classification: UMA, NUMA, and COMA. 3
6. (a) Define distributed system. Why distributed systems to be needed? 2  
 (b) Differentiate between parallel and distributed systems. 2  
 (c) What are the common characteristics that can be used to assess distributed system? 1  
 (d) Discuss the transparency of distributed system. 2  
 (e) What are the advantages and disadvantages of distributed system? 1.75

7. (a) 6

|    | 1 | 2 | 3 | 4 | 5 | 6 |
|----|---|---|---|---|---|---|
| S1 | X |   |   |   |   | X |
| S2 |   | X |   |   | X |   |
| S3 |   |   | X |   |   |   |
| S4 |   |   |   | X |   |   |
| S5 |   | X |   |   |   | X |

Considering the above reservation table

- (a) Draw the state transition diagram  
 (b) Determine the simple cycles from the state transition diagram  
 (c) Determine the minimum average latency of this pipeline  
 (b) Show how to reduce minimal average latency by inserting monocompute delay. 2.75
8. (a) Explain how to improve instruction scheduling in the CDC 6600 processor 3  
 (b) Discuss different types of vector instructions. 3  
 (c) Discuss s-access memory organization, c-access memory organization and c/s access memory organization. 2.75

**University of Rajshahi**  
**Department of Computer Science and Engineering**  
**B.Sc. Engg. Part 4 Odd Semester Examination 2015**  
**(Session: 2010-11, 2011-12)**

**Course No. : CSE4111 (Parallel Processing and Distributed System)**  
**Marks: 52.5** Time: 3 Hours

**(Answer any three questions from each Part)**

**Part-A**

1. a) Explain s-access memory organization, c-access memory organization and c/s access memory organization. 3  
 b) What is the relation between major cycle and minor cycle in accessing pipelined memory modules? 2.75  
 c) Explain how to obtain optimal number of pipeline stages to maximize the performance/cost ratio. 3
  
2. a) Explain how to improve the timing after the instruction issuing order is changed. 3  
 b) Explain how internal data forwarding can improve the throughput of a pipelined processor. 3  
 c) Derive the performance degradation factor for the analysis of branching effects. 2.75
  
3. a) Consider the following pipeline reservation table 5
 

|    |   |   |   |   |
|----|---|---|---|---|
|    | 1 | 2 | 3 | 4 |
| S1 | X |   |   | X |
| S2 |   | X |   |   |
| S3 |   |   | X |   |

(i). What are the forbidden latencies?  
 (ii). Draw the state transition diagram.  
 (iii). List all the simple cycles and greedy cycles.

- b) What is multiport memory? 2  
 c) What are the limitations of crossbar network? 1.75

4. a) Define the following terms: (i) Speedup, (ii) Efficiency, (iii) Throughput. 3  
 b) A five stage linear pipeline has clock period of  $t=100\text{ns}$ , and  $n=50$  tasks. What is its throughput, efficiency and speedup? 2  
 c) A CPU of modern computer is a good example of linear pipeline. Explain. 3.75

**Part-B**

5. a) Define critical section found in shared-variable model. 1.75  
 b) What are the patterns of parallelism found in practice of concurrent object oriented programming? Explain with examples. 3  
 c) Discuss the major phases of parallelizing compiler. 4
  
6. a) Explain vectorization ratio in user code, vector loops and pipeline chaining. 2  
 b) Convert the following sequential code to vector code:  

```
DO 20 I=1, N
      A(I)=B(I-1)
      20   B(I)=2*B(I)
    
```

 c) Describe the following optimization of vector functions: 2.75  
 (i). Redundant expression elimination. +2  
 (ii). Constant folding at compile time.
  
7. a) What do you mean by distributed database? Differentiate between centralized and distributed database system. 3  
 b) Differentiate between horizontal fragmentation and vertical fragmentation with suitable example. 3.75  
 c) What is meant by local transaction and global transaction? 2
  
8. a) What is software path length? And teleprocessing? 2  
 b) What are the trends are to distributed processing? 3.75  
 c) What are the technical reasons for using hierarchical distributed processing? 3

**University of Rajshahi**  
**Department of Computer Science and Engineering**  
**B.Sc. Engg. Part 4, Odd Semester, Examination 2014**  
**Course No. : CSE4111 (Parallel Processing and Distributed Systems)**  
**Marks: 52.5** **Time: 3 Hours**

**(Answer any three questions from each Part)**

**Part-A**

- |                                                                                                                                                                                                                                                                                                                                         |                     |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|
| 1. (a) What do you mean by parallel processing? What are the applications of it?<br>(b) Describe the various programmatic levels of parallel processing.<br>(c) Explain the use of an array processor.                                                                                                                                  | 2.75<br>4<br>2      |
| 2. (a) Define the following terms: (i) Reservation table, (ii) Latency, (iii) Collision, (iv) State diagram.<br>(b) Compare the main features of RISC and CISC processors.<br>(c) What is superscalar processor? Briefly discuss pipelining in superscalar processor.                                                                   | 2<br>4<br>2.75      |
| 3. (a) Define vector processing. Discuss the principles of vector processing.<br>(b) Give the basic structure of a vector-register architecture.<br>(c) Explain about the components of vector processor<br>(d) Discuss the advantages and disadvantages of vector processing.                                                          | 3<br>0.75<br>3<br>2 |
| 4. (a) What is pipelining? To perform pipelining in a processor what steps must be followed?<br>(b) What are the differences between scalar and vector pipelines?<br>(c) Draw and explain the structure of a linear pipeline for floating-point multiplication.<br>(d) Write down Handler classification scheme of pipeline processors. | 3<br>2<br>3<br>0.75 |

**Part-B**

- |                                                                                                                                                                                                                                                                                                                                                                                  |                     |   |   |   |   |   |   |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|---|---|---|---|---|---|---|----------------|--|---|--|---|--|--|--|----------------|--|--|---|--|---|--|--|--|
| 5. (a) What is non-linear pipeline? Explain reservation table with example.<br>(b) What do you mean by forbidden latencies and collision vector?<br>(c) What are the forbidden latencies and collision vector of the following reservation table?                                                                                                                                | 1.75<br>2<br>2      |   |   |   |   |   |   |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |
| <table border="1" style="margin: auto;"> <tr> <td>S<sub>1</sub></td> <td>X</td> <td></td> <td></td> <td></td> <td>X</td> <td></td> <td>X</td> </tr> <tr> <td>S<sub>2</sub></td> <td></td> <td>X</td> <td></td> <td>X</td> <td></td> <td></td> <td></td> </tr> <tr> <td>S<sub>3</sub></td> <td></td> <td></td> <td>X</td> <td></td> <td>X</td> <td></td> <td></td> </tr> </table> | S <sub>1</sub>      | X |   |   |   | X |   | X | S <sub>2</sub> |  | X |  | X |  |  |  | S <sub>3</sub> |  |  | X |  | X |  |  |  |
| S <sub>1</sub>                                                                                                                                                                                                                                                                                                                                                                   | X                   |   |   |   | X |   | X |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |
| S <sub>2</sub>                                                                                                                                                                                                                                                                                                                                                                   |                     | X |   | X |   |   |   |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |
| S <sub>3</sub>                                                                                                                                                                                                                                                                                                                                                                   |                     |   | X |   | X |   |   |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |
| (d) Define different types of pipeline hazards. Explain data hazard.                                                                                                                                                                                                                                                                                                             | 3                   |   |   |   |   |   |   |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |
| 6. (a) What is parallel programming model? What are the different types of parallel programming models? Explain.<br>(b) Explain shared variable parallel model.<br>(c) Explain language features for parallelism.                                                                                                                                                                | 2.75<br>3<br>3      |   |   |   |   |   |   |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |
| 7. (a) Data should be centralized or decentralized. Explain.<br>(b) Differentiate between dependent hierarchical data and independent hierarchical data. Explain with suitable example.<br>(c) What are the technical reasons for using hierarchical distributed processing?                                                                                                     | 3<br>3<br>3         |   |   |   |   |   |   |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |
| 8. (a) What is middleware? Give an example.<br>(b) What is the purpose of a middleware system?<br>(c) What are the basic design issues of distributed systems? Explain the need for naming services in distributed systems.<br>(d) Explain the concept and implementation ideas of Remote Method Invocation.                                                                     | 1.75<br>2<br>3<br>2 |   |   |   |   |   |   |   |                |  |   |  |   |  |  |  |                |  |  |   |  |   |  |  |  |