



# Exercise Sheet 2

## Parallel Processing Models

Lecture *Parallel Computing Systems*, Winter semester 2025/2026

Prof. Dr.-Ing. Mladen Berekovic

A discussion forum for the exercise can be found at: <https://moodle.uni-luebeck.de>

### Submission Guidelines

Please follow the instructions in the “Submission Guidelines” document published in this course.

### LogP

This exercise evaluates two different network topologies using the LogP model.

Each network contains 6 arithmetic units, which are connected according to the following topologies:

1. Bus network
2. Star network

Network operation: $P_0$ sends a request to all other units, and each unit sends its answer back to $P_0$.

For the models, assume the following parameters:

- $L$ : 3 clock cycles
- $o_s$ : 1 clock cycle
- $o_r$ : 1 clock cycle
- $g$ : 2 clock cycles

Illustrate the message flow and determine the total communication time $T_{\text{run}}$ for each network.
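As a starting point for the timing analysis, the following Python sketch computes two generic LogP quantities with the parameters given above: the end-to-end time of a single message, and the arrival time of the last of `k` back-to-back messages from one sender. The function names are hypothetical. The sketch assumes a contention-free point-to-point network in which the sender may inject a new message every $\max(g, o_s)$ cycles. It does not model the bus or star topology and is not a solution to the exercise.

```python
# LogP parameters from the exercise (all in clock cycles).
L = 3      # network latency
O_S = 1    # send overhead
O_R = 1    # receive overhead
G = 2      # gap between consecutive messages at one processor


def single_message_time(latency=L, o_s=O_S, o_r=O_R):
    """Cycles from the start of the send until the receiver has the message."""
    return o_s + latency + o_r


def last_arrival_time(k, latency=L, o_s=O_S, o_r=O_R, gap=G):
    """Cycles until the last of k back-to-back messages has been received.

    Assumption: one sender, k distinct receivers, no contention in the network.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    send_interval = max(gap, o_s)  # the sender is limited by the gap or its overhead
    return (k - 1) * send_interval + single_message_time(latency, o_s, o_r)


if __name__ == "__main__":
    print(single_message_time())   # 5
    print(last_arrival_time(5))    # 13
```

Shared media such as a bus serialize transfers, so topology-specific waiting times have to be added on top of these idealized values.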

## BSP (Bulk Synchronous Parallel)

The figure below illustrates a task graph of a computation across three processors ($p = 3$). The graph ends at computation step J. Based on Example 1 from the BSP model lecture, construct the corresponding sequence graph and determine the costs for $g = 3 \text{ flops}$ and $L = 10 \text{ flops}$.



*Note: Each node displays the computational cost, while each edge indicates the amount of data transferred - both expressed in flops.*
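The cost of a BSP program is commonly written as the sum over all supersteps of $w + h \cdot g + L$, where $w$ is the largest local computation and $h$ is the largest amount of data any processor sends or receives in that superstep. The Python sketch below evaluates this formula for the given $g$ and $L$. The superstep values are made up for illustration and are not taken from the task graph, and Example 1 from the lecture may use a slightly different convention for $h$.

```python
G = 3    # cost per data unit (flops), from the exercise
L = 10   # synchronization cost (flops), from the exercise


def superstep_cost(work, sent, received, g=G, l=L):
    """Cost of one superstep: max work + g * h + l.

    work, sent, received: one entry per processor.
    h is the largest amount of data any processor sends or receives.
    """
    h = max(max(sent), max(received))
    return max(work) + g * h + l


def program_cost(supersteps, g=G, l=L):
    """Total cost: sum of the individual superstep costs."""
    return sum(superstep_cost(w, s, r, g, l) for w, s, r in supersteps)


if __name__ == "__main__":
    # Hypothetical supersteps for p = 3 (NOT the values from the task graph).
    example = [
        ([4, 2, 3], [1, 0, 2], [0, 2, 1]),   # 4 + 3*2 + 10 = 20
        ([5, 5, 1], [0, 0, 0], [0, 0, 0]),   # 5 + 3*0 + 10 = 15
    ]
    print(program_cost(example))  # 35
```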

## Multi-BSP

In this section, we apply the Multi-BSP theoretical model to the AMD Ryzen Core Complex Die (CCD) architecture.



Figure 1: AMD Ryzen CCD Architecture

This figure illustrates the memory hierarchy of an AMD Ryzen processor with 2 CCDs (the processing cores are not shown). Each CCD contains 4 Core Complexes (CCX) that share one large L3 cache. Each CCX has one processing core, an L1 instruction cache, an L1 data cache, and an L2 cache.

1. CCD #1: CCX #1, #2, #3, #4
2. CCD #2: CCX #5, #6, #7, #8

Based on the example from the Multi-BSP model lecture and the tasks given in Figure 2:

1. Illustrate the model as a hierarchy tree.
2. Illustrate the processing flow and time sequence when all the given tasks execute on the first CCD.
3. Illustrate the processing flow and time sequence when the first 2 tasks execute on the first CCD and the third task executes on the second CCD.




Figure 2: Multi-BSP Tasks

Table 1 presents the additional parameters used to model Multi-BSP for this Ryzen CPU.

| Level | $p_i$ | $g_i$ [B/cycle] | $L_i$ [cycles] | $m_i$  |
|-------|-------|-----------------|----------------|--------|
| 1     | 1     | 32              | 0              | 32 KB  |
| 2     | 1     | 32              | 20             | 512 KB |
| 3     | 4     | 32              | 40             | 8 MB   |
| 4     | 2     | 16              | 300            | 16 GB  |

Table 1: Multi-BSP parameters for the AMD Ryzen 7 1800X
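To check how the $p_i$ column relates to the hardware description, the following Python sketch encodes the table and counts the components per level. It assumes the usual Multi-BSP reading in which a level-$i$ component consists of $p_i$ components of level $i-1$. The function names and the binary size prefixes are assumptions made for this illustration.

```python
from math import prod

# Rows of Table 1: (level, p_i, g_i in B/cycle, L_i in cycles, m_i in bytes).
KB, MB, GB = 1024, 1024**2, 1024**3  # assumption: binary prefixes
LEVELS = [
    (1, 1, 32, 0, 32 * KB),
    (2, 1, 32, 20, 512 * KB),
    (3, 4, 32, 40, 8 * MB),
    (4, 2, 16, 300, 16 * GB),
]


def components_at_level(level, levels=LEVELS):
    """Number of level-`level` components in the whole machine."""
    return prod(p for lvl, p, _, _, _ in levels if lvl > level)


def total_processors(levels=LEVELS):
    """Number of processing cores: product of all p_i."""
    return prod(p for _, p, _, _, _ in levels)


if __name__ == "__main__":
    for lvl, *_ in LEVELS:
        print(lvl, components_at_level(lvl))
    print(total_processors())  # 8
```

Under this reading the machine has 8 components at levels 1 and 2 (one per CCX), 2 at level 3 (one per CCD), and 1 at level 4.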

#### Notes:

- The run times of the tasks (nodes) are specified in units of 100 clock cycles, and the sizes of the transmitted data packets (edges) in KB.
- The AMD Ryzen runs at a base clock of 3.6 GHz, and the memory controller runs synchronously with the memory clock. The values of $g_i$ have already been normalized accordingly.
- The original values are unfavorable for a manual analysis, so the values in the figure have been adjusted.
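The unit conventions in these notes can be sketched in Python as follows. The transfer cost used here, data volume divided by $g_i$ plus $L_i$, is an assumption about how the lecture combines the table parameters, as is the choice of 1 KB = 1024 B. All function names are hypothetical.

```python
CLOCK_HZ = 3.6e9             # base clock from the notes
CYCLES_PER_TASK_UNIT = 100   # task run times are given in units of 100 cycles
BYTES_PER_KB = 1024          # assumption: 1 KB = 1024 B


def task_cycles(units):
    """Convert a node weight from the task graph into clock cycles."""
    return units * CYCLES_PER_TASK_UNIT


def transfer_cycles(kilobytes, g, l):
    """Assumed transfer cost in cycles: data volume / g_i + L_i."""
    return kilobytes * BYTES_PER_KB / g + l


def cycles_to_ns(cycles):
    """Convert clock cycles to nanoseconds at the base clock."""
    return cycles / CLOCK_HZ * 1e9


if __name__ == "__main__":
    print(task_cycles(3))               # 300
    print(transfer_cycles(4, 32, 40))   # 168.0 (level 3 parameters)
    print(transfer_cycles(4, 16, 300))  # 556.0 (level 4 parameters)
```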