

CSCI 6461  
**Computer System Architecture**  
FALL 2023

**PROJECT 2**

**GROUP MEMBERS**  
ADONIA SEQUEIRA G31345330  
SHIVAM CHAVAN G22366287  
VINAYAK DUBEY G47804140

**PROF DR. LEI HE**

## Task 01

Influence of the Cache Size on the Miss Rate - Study the influence of the cache size on the miss rate during the execution of a parallel program in a SMP (symmetric multiprocessor). This project also allows us to show that all the previous uniprocessor projects can be performed in a similar way for multiprocessor systems (multiprocessor traces).

### Configuration

Processors in SMP = 8.

Cache coherence protocol = MESI.

Scheme for bus arbitration = LRU.

Word wide (bits) = 16.

Words by block = 32 (block size = 64 bytes).

Blocks in main memory = 524288 (main memory size = 32 MB).

Mapping = Set-Associative.

Cache sets = They will vary depending on the number of blocks in cache, but you must always have four-way set associative caches (remember: Number\_of\_ways = Number\_of\_blocks\_in\_cache / Number\_of\_cache\_sets).

Replacement policy = LRU.

Configure the blocks in cache using the following configurations: 16 (cache size = 1 KB), 32, 64, 128, 256, 512, 1024, and 2048 (cache size = 128 KB). For each of the configurations, obtain the global miss rate for the system using the trace files: FFT, Simple, Speech and Weather.

## Screenshots

### FFT



**SMPCache - Cache evolution**

Information View cache P0-P7

| Number of blocks: 64 |                 | Cache of node: P0 |         |             |
|----------------------|-----------------|-------------------|---------|-------------|
| Mapping:             | Set-Associative | Cache level:      | 1/1     |             |
| Replacement requ:    | LRU             | Cache type:       | Unified |             |
| Cache sets:          | 16              | Cache address:    | 0       |             |
| Address              | Set/Address     | State             | Block   | Last Access |
| 0000 (0)             | 0/0             | Invalid (I)       | 86048   | 931459      |
| 0001 (1)             | 0/1             | Invalid (I)       | 0       | 931469      |
| 0002 (2)             | 0/2             | Shared (S)        | 43024   | 931451      |
| 0003 (3)             | 0/3             | Invalid (I)       | 28688   | 931419      |
| 0004 (4)             | 1/0             | Invalid (I)       | Empty   | 0           |
| 0005 (5)             | 1/1             | Invalid (I)       | Empty   | 0           |
| 0006 (6)             | 1/2             | Invalid (I)       | Empty   | 0           |
| 0007 (7)             | 1/3             | Invalid (I)       | Empty   | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                |
|----------------|----------------|
| Access number: | 931469         |
| Access type:   | Data reading   |
| Address:       | 00000000 (Hex) |
| Block:         | 0              |
| Word:          | 0              |

**Data concerning cache 0**

|               |        |        |
|---------------|--------|--------|
| Accesses      | Actual | Total  |
| Instructions  | 390534 | 390534 |
| Data readings | 367258 | 367258 |
| Data writings | 173677 | 173677 |

**Execute**

Continue

Stop

Complete

Exit

Simulation steps: 931469

**SMPCache - Cache evolution**

Information View cache P0-P7

| Number of blocks: 128 |                 | Cache of node: P0 |         |             |
|-----------------------|-----------------|-------------------|---------|-------------|
| Mapping:              | Set-Associative | Cache level:      | 1/1     |             |
| Replacement requ:     | LRU             | Cache type:       | Unified |             |
| Cache sets:           | 32              | Cache address:    | 0       |             |
| Address               | Set/Address     | State             | Block   | Last Access |
| 0000 (0)              | 0/0             | Invalid (I)       | 2432    | 931388      |
| 0001 (1)              | 0/1             | Invalid (I)       | 0       | 931469      |
| 0002 (2)              | 0/2             | Invalid (I)       | 86048   | 931459      |
| 0003 (3)              | 0/3             | Invalid (I)       | 100384  | 931380      |
| 0004 (4)              | 1/0             | Invalid (I)       | Empty   | 0           |
| 0005 (5)              | 1/1             | Invalid (I)       | Empty   | 0           |
| 0006 (6)              | 1/2             | Invalid (I)       | Empty   | 0           |
| 0007 (7)              | 1/3             | Invalid (I)       | Empty   | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                |
|----------------|----------------|
| Access number: | 931469         |
| Access type:   | Data reading   |
| Address:       | 00000000 (Hex) |
| Block:         | 0              |
| Word:          | 0              |

**Data concerning cache 0**

|               |        |        |
|---------------|--------|--------|
| Accesses      | Actual | Total  |
| Instructions  | 390534 | 390534 |
| Data readings | 367258 | 367258 |
| Data writings | 173677 | 173677 |

**Execute**

Continue

Stop

Complete

Exit

Simulation steps: 931469

**SMPCache - Cache evolution**

Information View cache P0-P7

|                     |                 |                |         |
|---------------------|-----------------|----------------|---------|
| Number of blocks:   | 256             | Cache of node: | P0      |
| Mapping:            | Set-Associative | Cache level:   | 1       |
| Replacement requir. | LRU             | Cache type:    | Unified |
| Cache sets:         | 64              | Cache address: | 0       |

| Address  | Set/Address | State        | Block  | Last Access |
|----------|-------------|--------------|--------|-------------|
| 0000 (0) | 0/0         | Modified (M) | 264576 | 929907      |
| 0001 (1) | 0/1         | Modified (M) | 262528 | 929842      |
| 0002 (2) | 0/2         | Modified (M) | 0      | 931469      |
| 0003 (3) | 0/3         | Modified (M) | 2432   | 931388      |
| 0004 (4) | 1/0         | Invalid (I)  | Empty  | 0           |
| 0005 (5) | 1/1         | Invalid (I)  | Empty  | 0           |
| 0006 (6) | 1/2         | Invalid (I)  | Empty  | 0           |
| 0007 (7) | 1/3         | Invalid (I)  | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Hits-Misses**

|                        |         |         |
|------------------------|---------|---------|
| <b>Global readings</b> |         |         |
|                        | Hits    | Misses  |
| Number                 | 565797  | 191995  |
| Rate                   | 74.664% | 25.336% |

|                       |         |         |
|-----------------------|---------|---------|
| <b>Local readings</b> |         |         |
|                       | Hits    | Misses  |
| Number                | 565797  | 191995  |
| Rate                  | 74.664% | 25.336% |

|                        |         |         |
|------------------------|---------|---------|
| <b>Global writings</b> |         |         |
|                        | Hits    | Misses  |
| Number                 | 66230   | 107447  |
| Rate                   | 38.134% | 61.866% |

|                       |         |         |
|-----------------------|---------|---------|
| <b>Local writings</b> |         |         |
|                       | Hits    | Misses  |
| Number                | 66230   | 107447  |
| Rate                  | 38.134% | 61.866% |

|                               |         |         |
|-------------------------------|---------|---------|
| <b>Global hits and misses</b> |         |         |
|                               | Hits    | Misses  |
| Number                        | 632027  | 299442  |
| Rate                          | 67.853% | 32.147% |

|                              |         |         |
|------------------------------|---------|---------|
| <b>Local hits and misses</b> |         |         |
|                              | Hits    | Misses  |
| Number                       | 632027  | 299442  |
| Rate                         | 67.853% | 32.147% |

Total replacements: 59391

**Actual access**

|                |                |
|----------------|----------------|
| Access number: | 931469         |
| Access type:   | Data reading   |
| Address:       | 00000000 (Hex) |
| Block:         | 0              |
| Word:          | 0              |

**Data concerning cache**

|               |        |
|---------------|--------|
| Actual        | Total  |
| Accesses      | 931469 |
| Instructions  | 390534 |
| Data readings | 367258 |
| Data writings | 173677 |

Final writebacks

Execute Continue Stop Complete Exit

Simulation steps: 931469

**SMPCache - Cache evolution**

Information View cache P0-P7

|                     |                 |                |         |
|---------------------|-----------------|----------------|---------|
| Number of blocks:   | 512             | Cache of node: | P0      |
| Mapping:            | Set-Associative | Cache level:   | 1       |
| Replacement requir. | LRU             | Cache type:    | Unified |
| Cache sets:         | 128             | Cache address: | 0       |

| Address  | Set/Address | State       | Block  | Last Access |
|----------|-------------|-------------|--------|-------------|
| 0000 (0) | 0/0         | Invalid (I) | 0      | 931469      |
| 0001 (1) | 0/1         | Invalid (I) | 264576 | 929907      |
| 0002 (2) | 0/2         | Invalid (I) | 262528 | 929842      |
| 0003 (3) | 0/3         | Invalid (I) | 2432   | 931388      |
| 0004 (4) | 1/0         | Invalid (I) | Empty  | 0           |
| 0005 (5) | 1/1         | Invalid (I) | Empty  | 0           |
| 0006 (6) | 1/2         | Invalid (I) | Empty  | 0           |
| 0007 (7) | 1/3         | Invalid (I) | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Hits-Misses**

|                        |         |         |
|------------------------|---------|---------|
| <b>Global readings</b> |         |         |
|                        | Hits    | Misses  |
| Number                 | 566023  | 191769  |
| Rate                   | 74.694% | 25.306% |

|                       |         |         |
|-----------------------|---------|---------|
| <b>Local readings</b> |         |         |
|                       | Hits    | Misses  |
| Number                | 566023  | 191769  |
| Rate                  | 74.694% | 25.306% |

|                        |         |         |
|------------------------|---------|---------|
| <b>Global writings</b> |         |         |
|                        | Hits    | Misses  |
| Number                 | 66242   | 107435  |
| Rate                   | 38.141% | 61.859% |

|                       |         |         |
|-----------------------|---------|---------|
| <b>Local writings</b> |         |         |
|                       | Hits    | Misses  |
| Number                | 66242   | 107435  |
| Rate                  | 38.141% | 61.859% |

|                               |         |         |
|-------------------------------|---------|---------|
| <b>Global hits and misses</b> |         |         |
|                               | Hits    | Misses  |
| Number                        | 632265  | 299204  |
| Rate                          | 67.878% | 32.122% |

|                              |         |         |
|------------------------------|---------|---------|
| <b>Local hits and misses</b> |         |         |
|                              | Hits    | Misses  |
| Number                       | 632265  | 299204  |
| Rate                         | 67.878% | 32.122% |

Total replacements: 59353

**Actual access**

|                |                |
|----------------|----------------|
| Access number: | 931469         |
| Access type:   | Data reading   |
| Address:       | 00000000 (Hex) |
| Block:         | 0              |
| Word:          | 0              |

**Data concerning cache**

|               |        |
|---------------|--------|
| Actual        | Total  |
| Accesses      | 931469 |
| Instructions  | 390534 |
| Data readings | 367258 |
| Data writings | 173677 |

Final writebacks

Execute Continue Stop Complete Exit

Simulation steps: 931469

**SMPCache - Cache evolution**

Information View cache P0-P7

|                          |                     |             |        |             |
|--------------------------|---------------------|-------------|--------|-------------|
| Number of blocks: 1024   | Cache of node: P0   |             |        |             |
| Mapping: Set-Associative | Cache level: 1/2    |             |        |             |
| Replacement requ. LRU    |                     |             |        |             |
| Cache sets: 256          | Cache type: Unified |             |        |             |
| Address                  | Set/Address         | State       | Block  | Last Access |
| 0000 (0)                 | 0/0                 | Invalid (I) | 0      | 931469      |
| 0001 (1)                 | 0/1                 | Invalid (I) | 135168 | 474736      |
| 0002 (2)                 | 0/2                 | Invalid (I) | 393216 | 475760      |
| 0003 (3)                 | 0/3                 | Invalid (I) | 391168 | 475392      |
| 0004 (4)                 | 1/0                 | Invalid (I) | Empty  | 0           |
| 0005 (5)                 | 1/1                 | Invalid (I) | Empty  | 0           |
| 0006 (6)                 | 1/2                 | Invalid (I) | Empty  | 0           |
| 0007 (7)                 | 1/3                 | Invalid (I) | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

Access number: 931469  
Access type: Data reading  
Address: 00000000 (Hex)  
Block: 0  
Word: 0

**Data concerning cache** 0

**Final writebacks**

**Execute**  
Continue Stop

1 Complete

Simulation steps: 931469

Total replacements: 59392

Cache level 1

**SMPCache - Cache evolution**

Information View cache P0-P7

|                          |                     |             |        |             |
|--------------------------|---------------------|-------------|--------|-------------|
| Number of blocks: 2048   | Cache of node: P0   |             |        |             |
| Mapping: Set-Associative | Cache level: 1/4    |             |        |             |
| Replacement requ. LRU    |                     |             |        |             |
| Cache sets: 512          | Cache type: Unified |             |        |             |
| Address                  | Set/Address         | State       | Block  | Last Access |
| 0000 (0)                 | 0/0                 | Invalid (I) | 0      | 931469      |
| 0001 (1)                 | 0/1                 | Invalid (I) | 135168 | 474736      |
| 0002 (2)                 | 0/2                 | Invalid (I) | 393216 | 475760      |
| 0003 (3)                 | 0/3                 | Invalid (I) | 391168 | 475392      |
| 0004 (4)                 | 1/0                 | Invalid (I) | Empty  | 0           |
| 0005 (5)                 | 1/1                 | Invalid (I) | Empty  | 0           |
| 0006 (6)                 | 1/2                 | Invalid (I) | Empty  | 0           |
| 0007 (7)                 | 1/3                 | Invalid (I) | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

Access number: 931469  
Access type: Data reading  
Address: 00000000 (Hex)  
Block: 0  
Word: 0

**Data concerning cache** 0

**Final writebacks**

**Execute**  
Continue Stop

1 Complete

Simulation steps: 931469

Total replacements: 59392

Cache level 1

## Simple



**SMPCache - Cache evolution**

Information View cache P0-P7

|                          |                     |             |       |             |
|--------------------------|---------------------|-------------|-------|-------------|
| Number of blocks: 64     | Cache of node: P0   |             |       |             |
| Mapping: Set-Associative | Cache level: 1/1    |             |       |             |
| Replacement requir: LRU  |                     |             |       |             |
| Cache sets: 16           | Cache type: Unified |             |       |             |
| Address                  | Set/Address         | State       | Block | Last Access |
| 0000 (0)                 | 0/0                 | Invalid (I) | 22544 | 3378818     |
| 0001 (1)                 | 0/1                 | Invalid (I) | 14352 | 3378738     |
| 0002 (2)                 | 0/2                 | Invalid (I) | 24592 | 3378826     |
| 0003 (3)                 | 0/3                 | Shared (S)  | 47120 | 3378834     |
| 0004 (4)                 | 1/0                 | Invalid (I) | Empty | 0           |
| 0005 (5)                 | 1/1                 | Invalid (I) | Empty | 0           |
| 0006 (6)                 | 1/2                 | Invalid (I) | Empty | 0           |
| 0007 (7)                 | 1/3                 | Invalid (I) | Empty | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                     |
|----------------|---------------------|
| Access number: | 3378834             |
| Access type:   | Instruction reading |
| Address:       | c8170200 (Hex)      |
| Block:         | 47120               |
| Word:          | 0                   |

Final writebacks

**Data concerning cache 0**

|               |         |         |
|---------------|---------|---------|
| Accesses      | Actual  | Total   |
| Instructions  | 1451341 | 1451341 |
| Data readings | 1444689 | 1444689 |
| Data writings | 482804  | 482804  |

Execute Continue Stop Complete Exit

Simulation steps: 3378834

**SMPCache - Cache evolution**

Information View cache P0-P7

|                          |                     |             |        |             |
|--------------------------|---------------------|-------------|--------|-------------|
| Number of blocks: 128    | Cache of node: P0   |             |        |             |
| Mapping: Set-Associative | Cache level: 1/1    |             |        |             |
| Replacement requir: LRU  |                     |             |        |             |
| Cache sets: 32           | Cache type: Unified |             |        |             |
| Address                  | Set/Address         | State       | Block  | Last Access |
| 0000 (0)                 | 0/0                 | Invalid (I) | 0      | 3378714     |
| 0001 (1)                 | 0/1                 | Invalid (I) | 2432   | 3377874     |
| 0002 (2)                 | 0/2                 | Shared (S)  | 344096 | 3377373     |
| 0003 (3)                 | 0/3                 | Invalid (I) | 350240 | 3377858     |
| 0004 (4)                 | 1/0                 | Invalid (I) | Empty  | 0           |
| 0005 (5)                 | 1/1                 | Invalid (I) | Empty  | 0           |
| 0006 (6)                 | 1/2                 | Invalid (I) | Empty  | 0           |
| 0007 (7)                 | 1/3                 | Invalid (I) | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                     |
|----------------|---------------------|
| Access number: | 3378834             |
| Access type:   | Instruction reading |
| Address:       | c8170200 (Hex)      |
| Block:         | 47120               |
| Word:          | 0                   |

Final writebacks

**Data concerning cache 0**

|               |         |         |
|---------------|---------|---------|
| Accesses      | Actual  | Total   |
| Instructions  | 1451341 | 1451341 |
| Data readings | 1444689 | 1444689 |
| Data writings | 482804  | 482804  |

Execute Continue Stop Complete Exit

Simulation steps: 3378834



SMPCache - Cache evolution

Information View cache P0-P7

|                        |                     |             |        |             |
|------------------------|---------------------|-------------|--------|-------------|
| Number of blocks: 1024 | Cache of node: P0   |             |        |             |
| Mapping:               | Set-Associative     |             |        |             |
| Replacement requ.: LRU | Cache level: 1/2    |             |        |             |
| Cache sets: 256        | Cache type: Unified |             |        |             |
| Address                | Set/Address         | State       | Block  | Last Access |
| 0 0000 (0)             | 0/0                 | Shared (S)  | 479744 | 3295692     |
| 0 0001 (1)             | 0/1                 | Shared (S)  | 414208 | 3295627     |
| 0 0002 (2)             | 0/2                 | Invalid (I) | 0      | 3378714     |
| 0 0003 (3)             | 0/3                 | Shared (S)  | 471552 | 3295620     |
| 0 0004 (4)             | 1/0                 | Invalid (I) | Empty  | 0           |
| 0 0005 (5)             | 1/1                 | Invalid (I) | Empty  | 0           |
| 0 0006 (6)             | 1/2                 | Invalid (I) | Empty  | 0           |
| 0 0007 (7)             | 1/3                 | Invalid (I) | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

Actual access

|                |                     |
|----------------|---------------------|
| Access number: | 3378834             |
| Access type:   | Instruction reading |
| Address:       | c8170200 (Hex)      |
| Block:         | 47120               |
| Word:          | 0                   |

Hits-Misses

Global readings

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 2054756 | 841274  |
| Rate   | 70.951% | 29.049% |

Local readings

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 2054756 | 841274  |
| Rate   | 70.951% | 29.049% |

Global writings

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 200713  | 282091  |
| Rate   | 41.572% | 58.428% |

Local writings

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 200713  | 282091  |
| Rate   | 41.572% | 58.428% |

Global hits and misses

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 2255469 | 1123365 |
| Rate   | 66.753% | 33.247% |

Local hits and misses

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 2255469 | 1123365 |
| Rate   | 66.753% | 33.247% |

Total replacements: 531023

Cache level 1

Data concerning cache 0

Final writebacks

Execute Continue Stop Complete Exit

Simulation steps: 3378834

SMPCache - Cache evolution

Information View cache P0-P7

|                        |                     |             |        |             |
|------------------------|---------------------|-------------|--------|-------------|
| Number of blocks: 2048 | Cache of node: P0   |             |        |             |
| Mapping:               | Set-Associative     |             |        |             |
| Replacement requ.: LRU | Cache level: 1/4    |             |        |             |
| Cache sets: 512        | Cache type: Unified |             |        |             |
| Address                | Set/Address         | State       | Block  | Last Access |
| 0 0000 (0)             | 0/0                 | Shared (S)  | 471552 | 3295620     |
| 0 0001 (1)             | 0/1                 | Shared (S)  | 414208 | 3295627     |
| 0 0002 (2)             | 0/2                 | Shared (S)  | 479744 | 3295692     |
| 0 0003 (3)             | 0/3                 | Invalid (I) | 0      | 3378714     |
| 0 0004 (4)             | 1/0                 | Invalid (I) | Empty  | 0           |
| 0 0005 (5)             | 1/1                 | Invalid (I) | Empty  | 0           |
| 0 0006 (6)             | 1/2                 | Invalid (I) | Empty  | 0           |
| 0 0007 (7)             | 1/3                 | Invalid (I) | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

Actual access

|                |                     |
|----------------|---------------------|
| Access number: | 3378834             |
| Access type:   | Instruction reading |
| Address:       | c8170200 (Hex)      |
| Block:         | 47120               |
| Word:          | 0                   |

Hits-Misses

Global readings

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 2056241 | 839789  |
| Rate   | 71.002% | 28.998% |

Local readings

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 2056241 | 839789  |
| Rate   | 71.002% | 28.998% |

Global writings

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 202136  | 280668  |
| Rate   | 41.867% | 58.133% |

Local writings

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 202136  | 280668  |
| Rate   | 41.867% | 58.133% |

Global hits and misses

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 2258377 | 1120457 |
| Rate   | 66.839% | 33.161% |

Local hits and misses

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 2258377 | 1120457 |
| Rate   | 66.839% | 33.161% |

Total replacements: 526978

Cache level 1

Data concerning cache 0

Final writebacks

Execute Continue Stop Complete Exit

Simulation steps: 3378834

## Speech



**SMPCache - Cache evolution**

Information View cache P0-P7

Number of blocks: 64 Cache of node: P0  
 Mapping: Set-Associative Cache level: 1/1  
 Replacement requ. LRU Cache type: Unified Cache address: 0  
 Cache sets: 16

| Address    | Set/Address | State         | Block  | Last Access |
|------------|-------------|---------------|--------|-------------|
| 0 0000 (0) | 0/0         | Exclusive (E) | 466192 | 1469539     |
| 0 0001 (1) | 0/1         | Exclusive (E) | 519488 | 1469540     |
| 0 0002 (2) | 0/2         | Exclusive (E) | 222496 | 1469544     |
| 0 0003 (3) | 0/3         | Exclusive (E) | 73232  | 1469543     |
| 0 0004 (4) | 1/0         | Invalid (I)   | Empty  | 0           |
| 0 0005 (5) | 1/1         | Invalid (I)   | Empty  | 0           |
| 0 0006 (6) | 1/2         | Invalid (I)   | Empty  | 0           |
| 0 0007 (7) | 1/3         | Invalid (I)   | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Hits-Misses**

|                 | Hits    | Misses  |
|-----------------|---------|---------|
| Global readings | 267935  | 882016  |
| Number          | 267935  | 882016  |
| Rate            | 23.300% | 76.700% |

|                 | Hits    | Misses  |
|-----------------|---------|---------|
| Global writings | 220554  | 99039   |
| Number          | 220554  | 99039   |
| Rate            | 69.011% | 30.989% |

|                        | Hits    | Misses  |
|------------------------|---------|---------|
| Global hits and misses | 488489  | 981055  |
| Number                 | 488489  | 981055  |
| Rate                   | 33.241% | 66.759% |

Total replacements: 958375

**Actual access**

| Access number: | 1469544             |
|----------------|---------------------|
| Access type:   | Instruction reading |
| Address:       | 476ca400 (Hex)      |
| Block:         | 222496              |
| Word:          | 0                   |

**Data concerning cache** 0

|               | Actual  | Total   |
|---------------|---------|---------|
| Accesses      | 1469544 | 1469544 |
| Instructions  | 1       | 1       |
| Data readings | 1149950 | 1149950 |
| Data writings | 319593  | 319593  |

**Final writebacks**

Execute Continue Stop Complete Exit

Simulation steps: 1469544

**SMPCache - Cache evolution**

Information View cache P0-P7

Number of blocks: 128 Cache of node: P0  
 Mapping: Set-Associative Cache level: 1/1  
 Replacement requ. LRU Cache type: Unified Cache address: 0  
 Cache sets: 32

| Address    | Set/Address | State         | Block  | Last Access |
|------------|-------------|---------------|--------|-------------|
| 0 0000 (0) | 0/0         | Exclusive (E) | 222496 | 1469544     |
| 0 0001 (1) | 0/1         | Invalid (I)   | 519488 | 1469540     |
| 0 0002 (2) | 0/2         | Invalid (I)   | 311872 | 1469538     |
| 0 0003 (3) | 0/3         | Invalid (I)   | 481856 | 1469534     |
| 0 0004 (4) | 1/0         | Invalid (I)   | Empty  | 0           |
| 0 0005 (5) | 1/1         | Invalid (I)   | Empty  | 0           |
| 0 0006 (6) | 1/2         | Invalid (I)   | Empty  | 0           |
| 0 0007 (7) | 1/3         | Invalid (I)   | Empty  | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Hits-Misses**

|                 | Hits    | Misses  |
|-----------------|---------|---------|
| Global readings | 391740  | 758211  |
| Number          | 391740  | 758211  |
| Rate            | 34.066% | 65.934% |

|                 | Hits    | Misses  |
|-----------------|---------|---------|
| Global writings | 254632  | 64961   |
| Number          | 254632  | 64961   |
| Rate            | 79.674% | 20.326% |

|                        | Hits    | Misses  |
|------------------------|---------|---------|
| Global hits and misses | 646372  | 823172  |
| Number                 | 646372  | 823172  |
| Rate                   | 43.985% | 56.015% |

Total replacements: 798518

**Actual access**

| Access number: | 1469544             |
|----------------|---------------------|
| Access type:   | Instruction reading |
| Address:       | 476ca400 (Hex)      |
| Block:         | 222496              |
| Word:          | 0                   |

**Data concerning cache** 0

|               | Actual  | Total   |
|---------------|---------|---------|
| Accesses      | 1469544 | 1469544 |
| Instructions  | 1       | 1       |
| Data readings | 1149950 | 1149950 |
| Data writings | 319593  | 319593  |

**Final writebacks**

Execute Continue Stop Complete Exit

Simulation steps: 1469544



**SMPCache - Cache evolution**

Information View cache P0-P7

|                          |                     |               |       |             |  |
|--------------------------|---------------------|---------------|-------|-------------|--|
| Number of blocks: 1024   | Cache of node: P0   | 1/2           |       |             |  |
| Mapping: Set-Associative | Cache level: 1      |               |       |             |  |
| Replacement requir. LRU  | Cache type: Unified |               |       |             |  |
| Cache sets: 256          | Cache address: 0    |               |       |             |  |
| Address                  | Set/Address         | State         | Block | Last Access |  |
| 0000 (0)                 | 0/0                 | Exclusive (E) | 12800 | 88          |  |
| 0001 (1)                 | 0/1                 | Shared (S)    | 8704  | 1469495     |  |
| 0002 (2)                 | 0/2                 | Shared (S)    | 10752 | 1469010     |  |
| 0003 (3)                 | 0/3                 | Invalid (I)   | Empty | 0           |  |
| 0004 (4)                 | 1/0                 | Invalid (I)   | Empty | 0           |  |
| 0005 (5)                 | 1/1                 | Invalid (I)   | Empty | 0           |  |
| 0006 (6)                 | 1/2                 | Invalid (I)   | Empty | 0           |  |
| 0007 (7)                 | 1/3                 | Invalid (I)   | Empty | 0           |  |

Events visor Node inspector Hits-Misses Bus States Configuration

**Hits-Misses**

**Global readings**

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 601342  | 548609  |
| Rate   | 52.293% | 47.707% |

**Local readings**

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 601342  | 548609  |
| Rate   | 52.293% | 47.707% |

**Global writings**

|        | Hits    | Misses |
|--------|---------|--------|
| Number | 294620  | 24973  |
| Rate   | 92.186% | 7.814% |

**Local writings**

|        | Hits    | Misses |
|--------|---------|--------|
| Number | 294620  | 24973  |
| Rate   | 92.186% | 7.814% |

**Global hits and misses**

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 895962  | 573582  |
| Rate   | 60.969% | 39.031% |

**Local hits and misses**

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 895962  | 573582  |
| Rate   | 60.969% | 39.031% |

**Cache level** 1

**Actual access**

|                |                     |
|----------------|---------------------|
| Access number: | 1469544             |
| Access type:   | Instruction reading |
| Address:       | 476ca400 (Hex)      |
| Block:         | 222496              |
| Word:          | 0                   |

**Data concerning cache** 0

|               |         |         |
|---------------|---------|---------|
| Accesses      | Actual  | Total   |
| Instructions  | 1       | 1       |
| Data readings | 1149950 | 1149950 |
| Data writings | 319593  | 319593  |

**Execute**

**Continue**

**Stop**

**Complete**

**Exit**

**Simulation steps:** 1469544

||...||

**SMPCache - Cache evolution**

Information View cache P0-P7

|                          |                     |               |       |             |  |
|--------------------------|---------------------|---------------|-------|-------------|--|
| Number of blocks: 2048   | Cache of node: P0   | 1/4           |       |             |  |
| Mapping: Set-Associative | Cache level: 1      |               |       |             |  |
| Replacement requir. LRU  | Cache type: Unified |               |       |             |  |
| Cache sets: 512          | Cache address: 0    |               |       |             |  |
| Address                  | Set/Address         | State         | Block | Last Access |  |
| 0000 (0)                 | 0/0                 | Exclusive (E) | 12800 | 88          |  |
| 0001 (1)                 | 0/1                 | Shared (S)    | 8704  | 1469495     |  |
| 0002 (2)                 | 0/2                 | Shared (S)    | 10752 | 1469010     |  |
| 0003 (3)                 | 0/3                 | Invalid (I)   | Empty | 0           |  |
| 0004 (4)                 | 1/0                 | Invalid (I)   | Empty | 0           |  |
| 0005 (5)                 | 1/1                 | Invalid (I)   | Empty | 0           |  |
| 0006 (6)                 | 1/2                 | Invalid (I)   | Empty | 0           |  |
| 0007 (7)                 | 1/3                 | Invalid (I)   | Empty | 0           |  |

Events visor Node inspector Hits-Misses Bus States Configuration

**Hits-Misses**

**Global readings**

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 604975  | 544976  |
| Rate   | 52.609% | 47.391% |

**Local readings**

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 604975  | 544976  |
| Rate   | 52.609% | 47.391% |

**Global writings**

|        | Hits    | Misses |
|--------|---------|--------|
| Number | 295017  | 24576  |
| Rate   | 92.310% | 7.690% |

**Local writings**

|        | Hits    | Misses |
|--------|---------|--------|
| Number | 295017  | 24576  |
| Rate   | 92.310% | 7.690% |

**Global hits and misses**

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 899992  | 569552  |
| Rate   | 61.243% | 38.757% |

**Local hits and misses**

|        | Hits    | Misses  |
|--------|---------|---------|
| Number | 899992  | 569552  |
| Rate   | 61.243% | 38.757% |

**Cache level** 1

**Actual access**

|                |                     |
|----------------|---------------------|
| Access number: | 1469544             |
| Access type:   | Instruction reading |
| Address:       | 476ca400 (Hex)      |
| Block:         | 222496              |
| Word:          | 0                   |

**Data concerning cache** 0

|               |         |         |
|---------------|---------|---------|
| Accesses      | Actual  | Total   |
| Instructions  | 1       | 1       |
| Data readings | 1149950 | 1149950 |
| Data writings | 319593  | 319593  |

**Execute**

**Continue**

**Stop**

**Complete**

**Exit**

**Simulation steps:** 1469544

||...||

## Weather

**SMPCache - Cache evolution**

Information View cache P0-P7

| Number of blocks: 16 |                 | Cache of node: P0 |         |             |
|----------------------|-----------------|-------------------|---------|-------------|
| Mapping:             | Set-Associative | Cache level:      | 1       |             |
| Replacement requ.    | LRU             | Cache type:       | Unified |             |
| Cache sets:          | 4               | Cache address:    | 0       |             |
| Address              | Set/Address     | State             | Block   | Last Access |
| 0000 (0)             | 0/0             | Shared (S)        | 358416  | 3970483     |
| 0001 (1)             | 0/1             | Invalid (I)       | 503840  | 3970459     |
| 0002 (2)             | 0/2             | Invalid (I)       | 14592   | 3970467     |
| 0003 (3)             | 0/3             | Invalid (I)       | 333840  | 3970475     |
| 0004 (4)             | 1/0             | Invalid (I)       | Empty   | 0           |
| 0005 (5)             | 1/1             | Invalid (I)       | Empty   | 0           |
| 0006 (6)             | 1/2             | Invalid (I)       | Empty   | 0           |
| 0007 (7)             | 1/3             | Invalid (I)       | Empty   | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                     |
|----------------|---------------------|
| Access number: | 3970483             |
| Access type:   | Instruction reading |
| Address:       | 80af0200 (Hex)      |
| Block:         | 358416              |
| Word:          | 0                   |

Final writebacks

**Data concerning cache** 0

|               |         |         |
|---------------|---------|---------|
| Accesses      | Actual  | Total   |
| Accesses      | 3970483 | 3970483 |
| Instructions  | 1700791 | 1700791 |
| Data readings | 1944234 | 1944234 |
| Data writings | 325458  | 325458  |

**Execute**

Continue

Stop

1

Complete

Exit

**Simulation steps:** 3970483

---

**SMPCache - Cache evolution**

Information View cache P0-P7

| Number of blocks: 32 |                 | Cache of node: P0 |         |             |
|----------------------|-----------------|-------------------|---------|-------------|
| Mapping:             | Set-Associative | Cache level:      | 1       |             |
| Replacement requ.    | LRU             | Cache type:       | Unified |             |
| Cache sets:          | 8               | Cache address:    | 0       |             |
| Address              | Set/Address     | State             | Block   | Last Access |
| 0000 (0)             | 0/0             | Shared (S)        | 358416  | 3970483     |
| 0001 (1)             | 0/1             | Invalid (I)       | 503840  | 3970459     |
| 0002 (2)             | 0/2             | Invalid (I)       | 14592   | 3970467     |
| 0003 (3)             | 0/3             | Invalid (I)       | 333840  | 3970475     |
| 0004 (4)             | 1/0             | Invalid (I)       | Empty   | 0           |
| 0005 (5)             | 1/1             | Invalid (I)       | Empty   | 0           |
| 0006 (6)             | 1/2             | Invalid (I)       | Empty   | 0           |
| 0007 (7)             | 1/3             | Invalid (I)       | Empty   | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                     |
|----------------|---------------------|
| Access number: | 3970483             |
| Access type:   | Instruction reading |
| Address:       | 80af0200 (Hex)      |
| Block:         | 358416              |
| Word:          | 0                   |

Final writebacks

**Data concerning cache** 0

|               |         |         |
|---------------|---------|---------|
| Accesses      | Actual  | Total   |
| Accesses      | 3970483 | 3970483 |
| Instructions  | 1700791 | 1700791 |
| Data readings | 1944234 | 1944234 |
| Data writings | 325458  | 325458  |

**Execute**

Continue

Stop

1

Complete

Exit

**Simulation steps:** 3970483







## Questions

**Does the global miss rate increase or decrease as the cache size increases? Why?**

As more data may be stored in the cache at the same time, the global miss rate decreases as cache size increases. There will definitely be an influence of cache size on hit and miss rates because larger cache block size could increase the hit rate when adjacent memory blocks are accessed whereas larger cache block size i.e. if we increase the size of cache then fragmentation may occur and there could be false sharing in multiprocessor system. So, the foregoing reason concludes that increasing cache size increases hit ratio while decreasing miss ratio.

**Does this increment or decrement happen for all the benchmarks or does it depend on the different locality grades?**

When the cache size was increased, the miss rate was reduced in all benchmarks. The miss rate is affected by the locality grade since larger cache sizes benefit from spatial locality.

**What does it happen with the capacity and conflict (collision) misses when you enlarge the caches? And, what does it happen with the compulsory and coherence misses when you enlarge the caches? Are there conflict misses in these experiments? Why?**

Capacity and conflict misses were decreasing. The compulsory miss rate component remains constant as cache size is increased since it is unaffected by cache size.

**In these experiments, it may be observed that for great cache sizes, the miss rate is stabilized. Why? We can also see great differences of miss rate for a concrete increment of cache size. What do these great differences indicate? Do these great differences of miss rate appear at the same point for all the programs? Why?**

Since the compulsory miss is independent of cache size, the degree of misses stabilizes at large cache sizes.

**Compare these results with the results obtained in the Task 2 of Project 1. You can observe that the miss rates are higher for multiprocessor traces than for uniprocessor traces. Do you think that, in general, parallel programs exhibit more or less spatial and temporal locality than serial programs? Why? Is it due to the shared data? In conclusion, does the increase of cache size improve the multiprocessor system performance?**

The performance of multiprocessor systems is not entirely enhanced by increasing the cache size. The total cache size rises as we increase the number of processors. Consequently, it typically results in a decrease in capacity misses.

## Task 02

Influence of the Cache Size on the Bus Traffic - Show the influence of the cache size on the bus traffic during the execution of a parallel program in a SMP.

### Configuration

Processors in SMP = 8.

Cache coherence protocol = MESI.

Scheme for bus arbitration = LRU.

Word wide (bits) = 16.

Words by block = 32 (block size = 64 bytes).

Blocks in main memory = 524288 (main memory size = 32 MB).

Mapping = Set-Associative.

Cache sets = They will vary depending on the number of blocks in cache, but you must always have four-way set associative caches (remember: Number\_of\_ways = Number\_of\_blocks\_in\_cache / Number\_of\_cache\_sets).

Replacement policy = LRU.

Configure the blocks in cache using the following configurations: 16 (cache size = 1 KB), 32, 64, 128, 256, 512, 1024, and 2048 (cache size = 128 KB). For each of the configurations, obtain the bus traffic (in bytes per memory access) for the system using the trace files: FFT, Simple, Speech and Weather. In order to compute the bus traffic, assume that cache block transfers move 64 bytes (the block size) on the bus data lines, and that each bus transaction involves six bytes of command and address on the bus address lines. Therefore, you can compute the address traffic (including command) by multiplying the obtained bus transactions by the traffic per transaction (6 bytes). In the same way, you can compute the data traffic by multiplying the number of block transfers by the traffic per transfer (64 bytes).

The total bus traffic, in bytes per memory access, will be the addition of these two quantities divided by the number of memory accesses (references) in the trace (see Table 2).

## Screenshots

### FFT



**SMPCache - Cache evolution**

Information View cache P0-P7

|                     |                 |                |         |             |
|---------------------|-----------------|----------------|---------|-------------|
| Number of blocks:   | 64              | Cache of node: | P0      |             |
| Mapping:            | Set-Associative | Cache level:   | 1       |             |
| Replacement requir. | LRU             | 1/1            |         |             |
| Cache sets:         | 16              | Cache type:    | Unified |             |
| Cache address:      | 0               |                |         |             |
| Address             | Set/Address     | State          | Block   | Last Access |
| 0 0000 (0)          | 0/0             | Modified (M)   | 86048   | 931459      |
| 0 0001 (1)          | 0/1             | Exclusive (E)  | 0       | 931469      |
| 0 0002 (2)          | 0/2             | Exclusive (E)  | 43024   | 931451      |
| 0 0003 (3)          | 0/3             | Exclusive (E)  | 28688   | 931419      |
| 0 0004 (4)          | 1/0             | Invalid (I)    | Empty   | 0           |
| 0 0005 (5)          | 1/1             | Invalid (I)    | Empty   | 0           |
| 0 0006 (6)          | 1/2             | Invalid (I)    | Empty   | 0           |
| 0 0007 (7)          | 1/3             | Invalid (I)    | Empty   | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                |
|----------------|----------------|
| Access number: | 931469         |
| Access type:   | Data reading   |
| Address:       | 00000000 (Hex) |
| Block:         | 0              |
| Word:          | 0              |

**Final writebacks**

**Data concerning cache** 0

|               |        |        |
|---------------|--------|--------|
| Accesses      | Actual | Total  |
| Accesses      | 931469 | 931469 |
| Instructions  | 390534 | 390534 |
| Data readings | 367258 | 367258 |
| Data writings | 173677 | 173677 |

**Execute**

Continue

Stop

1 Complete

Exit

**Simulation steps:** 931469

**SMPCache - Cache evolution**

Information View cache P0-P7

|                     |                 |                |         |             |
|---------------------|-----------------|----------------|---------|-------------|
| Number of blocks:   | 128             | Cache of node: | P0      |             |
| Mapping:            | Set-Associative | Cache level:   | 1       |             |
| Replacement requir. | LRU             | 1/1            |         |             |
| Cache sets:         | 32              | Cache type:    | Unified |             |
| Cache address:      | 0               |                |         |             |
| Address             | Set/Address     | State          | Block   | Last Access |
| 0 0000 (0)          | 0/0             | Invalid (I)    | 2432    | 931388      |
| 0 0001 (1)          | 0/1             | Invalid (I)    | 0       | 931469      |
| 0 0002 (2)          | 0/2             | Invalid (I)    | 86048   | 931459      |
| 0 0003 (3)          | 0/3             | Invalid (I)    | 100384  | 931380      |
| 0 0004 (4)          | 1/0             | Invalid (I)    | Empty   | 0           |
| 0 0005 (5)          | 1/1             | Invalid (I)    | Empty   | 0           |
| 0 0006 (6)          | 1/2             | Invalid (I)    | Empty   | 0           |
| 0 0007 (7)          | 1/3             | Invalid (I)    | Empty   | 0           |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                |
|----------------|----------------|
| Access number: | 931469         |
| Access type:   | Data reading   |
| Address:       | 00000000 (Hex) |
| Block:         | 0              |
| Word:          | 0              |

**Final writebacks**

**Data concerning cache** 0

|               |        |        |
|---------------|--------|--------|
| Accesses      | Actual | Total  |
| Accesses      | 931469 | 931469 |
| Instructions  | 390534 | 390534 |
| Data readings | 367258 | 367258 |
| Data writings | 173677 | 173677 |

**Execute**

Continue

Stop

1 Complete

Exit

**Simulation steps:** 931469





## Simple









## Speech





**SMPCache - Cache evolution**

Information View cache P0-P7

| Number of blocks: 256 |                                                        | Cache of node: P0 |         | Cache level: 1/1 |
|-----------------------|--------------------------------------------------------|-------------------|---------|------------------|
| Mapping:              | Set-Associative <th>Cache type:</th> <td>Unified </td> | Cache type:       | Unified |                  |
| Replacement requ:     | LRU                                                    | Cache address:    | 0       |                  |
| Cache sets:           | 64                                                     |                   |         |                  |
| Address               | Set/Address                                            | State             | Block   | Last Access      |
| 0000 (0)              | 0/0                                                    | Shared (S)        | 307776  | 1469530          |
| 0001 (1)              | 0/1                                                    | Invalid (I)       | 311872  | 1469538          |
| 0002 (2)              | 0/2                                                    | Invalid (I)       | 481856  | 1469534          |
| 0003 (3)              | 0/3                                                    | Invalid (I)       | 519488  | 1469540          |
| 0004 (4)              | 1/0                                                    | Invalid (I)       | Empty   | 0                |
| 0005 (5)              | 1/1                                                    | Invalid (I)       | Empty   | 0                |
| 0006 (6)              | 1/2                                                    | Invalid (I)       | Empty   | 0                |
| 0007 (7)              | 1/3                                                    | Invalid (I)       | Empty   | 0                |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                     |
|----------------|---------------------|
| Access number: | 1469544             |
| Access type:   | Instruction reading |
| Address:       | 476ca400 (Hex)      |
| Block:         | 222496              |
| Word:          | 0                   |

Final writebacks

**Data concerning cache** 0

|               |         |         |
|---------------|---------|---------|
| Accesses      | Actual  | Total   |
| Instructions  | 1       | 1       |
| Data readings | 1149950 | 1149950 |
| Data writings | 319593  | 319593  |

**Execute**

**Continue**

**Stop**

**Complete**

**Exit**

Simulation steps: 1469544

**SMPCache - Cache evolution**

Information View cache P0-P7

| Number of blocks: 512 |                                                       | Cache of node: P0 |         | Cache level: 1/1 |
|-----------------------|-------------------------------------------------------|-------------------|---------|------------------|
| Mapping:              | Set-Associative <th>Cache type:</th> <td>Unified</td> | Cache type:       | Unified |                  |
| Replacement requ:     | LRU                                                   | Cache address:    | 0       |                  |
| Cache sets:           | 128                                                   |                   |         |                  |
| Address               | Set/Address                                           | State             | Block   | Last Access      |
| 0000 (0)              | 0/0                                                   | Exclusive (E)     | 12800   | 88               |
| 0001 (1)              | 0/1                                                   | Shared (S)        | 8704    | 1469495          |
| 0002 (2)              | 0/2                                                   | Shared (S)        | 10752   | 1469010          |
| 0003 (3)              | 0/3                                                   | Invalid (I)       | Empty   | 0                |
| 0004 (4)              | 1/0                                                   | Invalid (I)       | Empty   | 0                |
| 0005 (5)              | 1/1                                                   | Invalid (I)       | Empty   | 0                |
| 0006 (6)              | 1/2                                                   | Invalid (I)       | Empty   | 0                |
| 0007 (7)              | 1/3                                                   | Invalid (I)       | Empty   | 0                |

Events visor Node inspector Hits-Misses Bus States Configuration

**Actual access**

|                |                     |
|----------------|---------------------|
| Access number: | 1469544             |
| Access type:   | Instruction reading |
| Address:       | 476ca400 (Hex)      |
| Block:         | 222496              |
| Word:          | 0                   |

Final writebacks

**Data concerning cache** 0

|               |         |         |
|---------------|---------|---------|
| Accesses      | Actual  | Total   |
| Instructions  | 1       | 1       |
| Data readings | 1149950 | 1149950 |
| Data writings | 319593  | 319593  |

**Execute**

**Continue**

**Stop**

**Complete**

**Exit**

Simulation steps: 1469544



## Weather





**SMPCache - Cache evolution**

Information View cache P0-P7

| Number of blocks: 256 |                 | Cache of node: P0 |         | Cache level: 1/1 | Cache address: 0 |
|-----------------------|-----------------|-------------------|---------|------------------|------------------|
| Mapping:              | Set-Associative | Cache type:       | Unified |                  |                  |
| Cache sets: 64        |                 | Cache level: 1/1  |         |                  |                  |
| Address               | Set/Address     | State             | Block   | Last Access      |                  |
| 0000 (0)              | 0/0             | Invalid (I)       | 0       | 3969678          |                  |
| 0001 (1)              | 0/1             | Invalid (I)       | 14592   | 3970467          |                  |
| 0002 (2)              | 0/2             | Invalid (I)       | 12544   | 3970419          |                  |
| 0003 (3)              | 0/3             | Shared (S)        | 370752  | 3969057          |                  |
| 0004 (4)              | 1/0             | Invalid (I)       | Empty   | 0                |                  |
| 0005 (5)              | 1/1             | Invalid (I)       | Empty   | 0                |                  |
| 0006 (6)              | 1/2             | Invalid (I)       | Empty   | 0                |                  |
| 0007 (7)              | 1/3             | Invalid (I)       | Empty   | 0                |                  |

Events visor Node inspector Hits-Misses Bus States Configuration

**Bus**

**Number of bus transactions**

|        | Number |
|--------|--------|
| BusRd  | 930265 |
| BusRdX | 245435 |
| BusWb  | 199384 |
| BusUpd | 0      |

Bus arbitration: LRU

Block transfers: 1219438

Bus transactions: 1375084

**Actual access**

Access number: 3970483  
Access type: Instruction reading  
Address: 80af0200 (Hex)  
Block: 358416  
Word: 0

**Data concerning cache** 0

|               | Actual  | Total   |
|---------------|---------|---------|
| Accesses      | 3970483 | 3970483 |
| Instructions  | 1700791 | 1700791 |
| Data readings | 1944234 | 1944234 |
| Data writings | 325458  | 325458  |

**Execute**

Continue

Stop

Complete

Exit

Simulation steps: 3970483

**SMPCache - Cache evolution**

Information View cache P0-P7

| Number of blocks: 512 |                 | Cache of node: P0 |         | Cache level: 1/1 | Cache address: 0 |
|-----------------------|-----------------|-------------------|---------|------------------|------------------|
| Mapping:              | Set-Associative | Cache type:       | Unified |                  |                  |
| Cache sets: 128       |                 | Cache level: 1/1  |         |                  |                  |
| Address               | Set/Address     | State             | Block   | Last Access      |                  |
| 0000 (0)              | 0/0             | Invalid (I)       | 14592   | 3970467          |                  |
| 0001 (1)              | 0/1             | Invalid (I)       | 0       | 3969678          |                  |
| 0002 (2)              | 0/2             | Invalid (I)       | 12544   | 3970419          |                  |
| 0003 (3)              | 0/3             | Shared (S)        | 460032  | 3343785          |                  |
| 0004 (4)              | 1/0             | Invalid (I)       | Empty   | 0                |                  |
| 0005 (5)              | 1/1             | Invalid (I)       | Empty   | 0                |                  |
| 0006 (6)              | 1/2             | Invalid (I)       | Empty   | 0                |                  |
| 0007 (7)              | 1/3             | Invalid (I)       | Empty   | 0                |                  |

Events visor Node inspector Hits-Misses Bus States Configuration

**Bus**

**Number of bus transactions**

|        | Number |
|--------|--------|
| BusRd  | 903505 |
| BusRdX | 243072 |
| BusWb  | 194088 |
| BusUpd | 0      |

Bus arbitration: LRU

Block transfers: 1186694

Bus transactions: 1340665

**Actual access**

Access number: 3970483  
Access type: Instruction reading  
Address: 80af0200 (Hex)  
Block: 358416  
Word: 0

**Data concerning cache** 0

|               | Actual  | Total   |
|---------------|---------|---------|
| Accesses      | 3970483 | 3970483 |
| Instructions  | 1700791 | 1700791 |
| Data readings | 1944234 | 1944234 |
| Data writings | 325458  | 325458  |

**Execute**

Continue

Stop

Complete

Exit

Simulation steps: 3970483





## Questions

### **Does the global bus traffic increase or decrease as the cache size increases?**

1. FFT - According to the findings, as cache size increases, bus traffic drops.
2. Speech - The outcome demonstrates that when cache size increases, bus traffic reduces.
3. Simple - As the cache size grows, bus traffic declines.
4. Weather - The findings indicate that as cache size increases, bus traffic reduces.

### **Why (give two reasons, one for the data traffic and another for the address+command bus traffic)?**

1. FFT - The link between bus traffic, address and data traffic explains why bus travel is declining. According to the findings, bus traffic is impacted by a decrease in bus traffic when there is less data traffic flowing. Similar to address traffic, bus traffic is impacted by a decrease in address traffic when it is present.
2. Speech – As per the findings, when there is less data traffic flowing, it influences bus traffic by reducing bus traffic. In the same way that address traffic influences bus traffic, a decrease in address traffic also affects bus traffic.
3. Simple - According to the observation, bus traffic declines as a result of a fall in data traffic flow. Furthermore, when there is a decrease in the flow of address traffic, this observation also applies to the address traffic due to a bus traffic drop
4. Weather - According to the findings, a decrease in data traffic has an impact on bus traffic by reducing it. In the same way that address traffic influences bus traffic, a decrease in address traffic also affects bus traffic.

As the cache size grows, global bus traffic will decrease. This is conceivable due to the following factors:

1. As cache size rises, the time required to access data decreases, and data accessibility increases as more data may be stored beside the CPU. As the size increases, more room is available to store data, lowering traffic because the data required by the CPU is mostly already in the cache.
2. The miss-hit and miss ratio will decrease with an increase in cache size. Global bus traffic is often divided into address and data traffic. The amount of address bus traffic will decrease as the cache size increases because more data can be stored there. The amount of misses decreases as a result of less bus traffic, which also results in less transfer between main memory and the cache and less address+command bus traffic.

**Does this increment or decrement happen for all the benchmarks?**

Yes, there was a drop in bus traffic across all benchmarks.

**In these experiments, it may be observed that for great cache sizes, the bus traffic is stabilized. Why?**

The reason why bus traffic stabilizes at large cache sizes is that the cache can hold more data, which minimizes the frequency of data transfers between the cache and main memory. Less bus transactions follow, which stabilizes the volume of bus traffic. The cache is a high-speed data storage layer in a computer system that holds a portion of data, usually temporary, in order to expedite the processing of subsequent requests for that data. It has the capacity to store more data when the cache size is raised. This indicates that the system is more likely to locate the necessary data in the cache than in main memory, which results in a quicker operation. This is referred to as a cache hit. However, the system must extract the necessary data from main memory if it cannot be found in the cache. Cache misses occur when this happens and necessitate a bus transaction, which raises bus traffic. There are therefore more cache hits and fewer cache misses when the cache size is large. As a result, fewer bus transactions are required, which eventually leads to a decrease and stabilization of bus traffic.

**We can also see great differences in bus traffic for a concrete increment of cache size. What do these great differences indicate?**

1.FFT - The discrepancies show that bus traffic slowly declines as cache size rises. This can mean that fewer data transfers are occurring. Also, because miss rates reduce as cache size increases, the difference suggests that because there are fewer misses, there will be fewer bus transactions and hence less bus traffic.

2.Weather - The variances show that bus traffic slowly declines as cache size rises. This can mean that fewer data transfers are occurring. The difference also shows that since there are fewer misses, there will be fewer bus transactions, which will reduce bus traffic, as the miss rates likewise fall as the cache size increases.

3.Simple- The difference shows that bus traffic significantly decreases as cache size advances. Furthermore, this pattern suggests that fewer data transfers are occurring. On the other hand, when cache capacity grows, the miss rate falls. Last but not least, less miss rates translate into fewer bus transactions, which in turn reduces bus traffic.

4.Speech- The same finding was noted for the speech; the difference shows that bus traffic decreases drastically as cache size increases. Furthermore, this trend suggests that fewer data transfers are occurring. On the other hand, when cache size grows, the miss rate reduces. Last but not least, less miss rates translate into fewer bus transactions, which in turn reduces bus traffic.

**Do these great differences of bus traffic appear at the same point for all the programs? Why?**

Whether the significant variations in bus traffic occur at the same time for every program is not made clear by the context. Nevertheless, it may be assumed that depending on the program, different points at which these disparities manifest might occur. This is due to the fact that different programs deal with different data and traffic patterns, which may have an impact on how quickly bus traffic declines as cache size grows. The program's complexity, volume of data processed, and cache usage are some of the variables that can affect when these variations in bus traffic become apparent.

**In conclusion, does the increase of cache size improve the multiprocessor system performance? Are the conclusions you obtain similar to the previous ones for the miss rate (Task 1)?**

We can conclude that the multiprocessor system has improved when the cache size rises based on the results, which show that the bus transaction rate, address, data, bus traffic, and miss rate have all dropped as the cache size increases.

## Task 03

Influence of the Cache Coherence Protocol on the Miss Rate - Study the influence of the cache coherence protocol on the miss rate during the execution of a parallel program in a symmetric multiprocessor.

### Configuration

Processors in SMP = 8.

Scheme for bus arbitration = LRU.

Word wide (bits) = 16.

Words by block = 32 (block size = 64 bytes).

Blocks in main memory = 524288 (main memory size = 32 MB).

Blocks in cache = 256 (cache size = 16 KB).

Mapping = Set-Associative.

Cache sets = 64 (four-way set associative caches).

Replacement policy = LRU.

Configure the cache coherence protocol using the following configurations: MSI, MESI, and DRAGON. For each of the configurations, obtain the global miss rate for the system using the memory traces: FFT, Simple, Speech and Weather.

## Screenshots

### FFT:

MSI



### MESI



## DRAGON

**Simple:****MSI**

## MESI



## DRAGON



**Speech:****MSI****MESI**

## DRAGON



## Weather

### MSI



## MESI



## DRAGON



## Questions

**Do all the protocols have the same miss rate? Which is the coherence protocol with the best miss rate? And which does it have the worst? In particular, is the miss rate the same for the MSI and MESI protocols? Why?**

Not all protocols exhibit the same miss rate; rather, the miss rate of a protocol is contingent upon its design and implementation. The protocol boasting the optimal miss rate may differ based on the specific workload and system configuration. Generally, the MESI (Modified, Exclusive, Shared, Invalid) protocol tends to outperform other protocols, such as MSI (Modified, Shared, Invalid), by incorporating an additional state (Exclusive) that mitigates unnecessary invalidations, thereby lowering the miss rate.

Conversely, the protocol with the highest miss rate can vary, but simpler protocols like the invalidation-based protocol typically experience elevated miss rates. This is due to the practice of invalidating a cache line whenever another cache writes to it, resulting in more frequent cache misses.

It's important to note that the miss rates differ between the MSI and MESI protocols. As previously mentioned, the MESI protocol tends to exhibit a lower miss rate, thanks to its inclusion of an extra state that aids in reducing unnecessary invalidations.

**Do you observe any difference between the update-based protocol and the invalidation-based protocols? Which? Why? Are the coherence misses the same for these two kinds of protocols?**

A distinction exists between the update-based protocol and invalidation-based protocols. In the update-based protocol, the value of a cache line is updated in all caches whenever any cache writes to it. This practice can decrease the occurrence of cache misses since the most recent value remains consistently available in the cache. Conversely, the invalidation-based protocol invalidates a cache line whenever another cache writes to it, resulting in a higher frequency of cache misses.

The coherence misses differ for these two types of protocols. The update-based protocol typically experiences fewer coherence misses because it updates the value of a cache line in all caches upon being written to. In contrast, the invalidation-based protocol tends to incur more coherence misses due to its approach of invalidating a cache line whenever written to by another cache.

**Do you think that the results and conclusions obtained with these experiments are of general application or they may change depending on the used benchmarks? In conclusion, does the use of a concrete cache coherence protocol improve the multiprocessor system performance? Why?**

The outcomes and inferences drawn from these experiments may lack universal applicability, as they are subject to change based on the benchmarks employed. Varied benchmarks exhibit distinct memory access patterns, influencing both miss rates and the efficacy of diverse coherence protocols. Consequently, it becomes crucial to utilize a diverse array of benchmarks to attain a comprehensive comprehension of the performance associated with different coherence protocols.

In conclusion, the use of a specific cache coherence protocol can indeed impact multiprocessor system performance. The outcomes of the experiments highlight that not all protocols exhibit the same miss rate, and their effectiveness depends on various factors, including design, implementation, workload, and system configuration.

The MESI (Modified, Exclusive, Shared, Invalid) protocol, in particular, tends to demonstrate a lower miss rate compared to alternatives like MSI (Modified, Shared, Invalid). This advantage is attributed to the MESI protocol's incorporation of an additional state (Exclusive), which helps reduce unnecessary invalidations, consequently lowering the miss rate. On the other hand, simpler protocols, such as the invalidation-based protocol, generally exhibit higher miss rates due to the frequent invalidation of cache lines upon being written to by another cache.

Moreover, the coherence misses, which impact the synchronization of data among caches, also differ between update-based and invalidation-based protocols. The update-based protocol tends to incur fewer coherence misses because it updates the value of a cache line in all caches whenever it is written to, ensuring that the most recent value is consistently available. Conversely, the invalidation-based protocol, by invalidating cache lines upon being written to by another cache, tends to result in more coherence misses.

While these results provide valuable insights, it is crucial to note that the effectiveness of a cache coherence protocol is context-dependent. The choice of benchmarks significantly influences the memory access patterns and, consequently, the miss rate and performance of different coherence protocols. Therefore, to obtain a comprehensive understanding of performance implications, it is essential to use a variety of benchmarks. In summary, the selection and implementation of a concrete cache coherence protocol can indeed enhance multiprocessor system performance, but its effectiveness is contingent on various factors and should be evaluated within the specific context of the system and workload.

## Task 04

Influence of the Cache Coherence Protocol on the Bus Traffic Analyze the influence of the cache coherence protocol on the bus traffic during the execution of a parallel program in a SMP.

### Configuration

Processors in SMP = 8.

Scheme for bus arbitration = LRU.

Word wide (bits) = 16.

Words by block = 32 (block size = 64 bytes).

Blocks in main memory = 524288 (main memory size = 32 MB).

Blocks in cache = 256 (cache size = 16 KB).

Mapping = Set-Associative.

Cache sets = 64 (four-way set associative caches).

Replacement policy = LRU.

Configure the cache coherence protocol using the following configurations: MSI, MESI, and DRAGON. For each of the configurations, obtain the bus traffic (in bytes per memory access) for the system using the trace files: FFT, Simple, Speech and Weather. In order to compute the bus traffic, assume that cache block transfers move 64 bytes (the block size) on the bus data lines, and that each bus transaction involves six bytes of command and address on the bus address lines. Therefore, you can compute the address traffic (including command) by multiplying the obtained bus transactions by the traffic per transaction (6 bytes). In the same way, you can compute the data traffic by multiplying the number of block transfers by the traffic per transfer (64 bytes). The total bus traffic, in bytes per memory access, will be the addition of these two quantities divided by the number of memory accesses (references) in the trace (see Table 2).

## Screenshots

### FFT:

MSI



### MESI



## DRAGON

**Simple:****MSI**

## MESI



## DRAGON



## Speech:

### MSI



### MESI



## DRAGON



## Weather

### MSI



## MESI



## DRAGON



## Calculations Table

|         | Snoopy Protocol | Bus Transactions | Address Traffic    | Block Transfers | Data Traffic        | Address Traffic + Data Traffic | Number of memory references | Total Bus Traffic |
|---------|-----------------|------------------|--------------------|-----------------|---------------------|--------------------------------|-----------------------------|-------------------|
| FFT     | MSI             | 3430243.00       | <b>20581458.00</b> | 2944993.00      | <b>188479552.00</b> | 209061010.00                   | 7451717.00                  | <b>28.06</b>      |
|         | MESI            | 3273837.00       | <b>19643022.00</b> | 2741568.00      | <b>175460352.00</b> | 195103374.00                   | 7451717.00                  | <b>26.18</b>      |
|         | DRAGON          | 1690864.00       | <b>10145184.00</b> | 813596.94       | <b>52070204.00</b>  | 62215388.00                    | 7451717.00                  | <b>8.35</b>       |
| Simple  | MSI             | 12641064.00      | <b>75846384.00</b> | 11168605.00     | <b>714790720.00</b> | 790637104.00                   | 27029752.00                 | <b>29.25</b>      |
|         | MESI            | 12539908.00      | <b>75239448.00</b> | 11035219.00     | <b>706254016.00</b> | 781493464.00                   | 27029752.00                 | <b>28.91</b>      |
|         | DRAGON          | 11503193.00      | <b>69019158.00</b> | 8353913.66      | <b>534650474.00</b> | 603669632.00                   | 27029752.00                 | <b>22.33</b>      |
| Speech  | MSI             | 9229455.00       | <b>55376730.00</b> | 9000294.00      | <b>576018816.00</b> | 631395546.00                   | 11771664.00                 | <b>53.64</b>      |
|         | MESI            | 8190260.00       | <b>49141560.00</b> | 7960387.00      | <b>509464768.00</b> | 558606328.00                   | 11771664.00                 | <b>47.45</b>      |
|         | DRAGON          | 8234730.00       | <b>49408380.00</b> | 7945684.06      | <b>508523780.00</b> | 557932160.00                   | 11771664.00                 | <b>47.40</b>      |
| Weather | MSI             | 10982265.00      | <b>65893590.00</b> | 9768844.00      | <b>625206016.00</b> | 691099606.00                   | 31764036.00                 | <b>21.76</b>      |
|         | MESI            | 10859541.00      | <b>65157246.00</b> | 9646703.00      | <b>617388992.00</b> | 682546238.00                   | 31764036.00                 | <b>21.49</b>      |
|         | DRAGON          | 9550397.00       | <b>57302382.00</b> | 7588793.53      | <b>485682786.00</b> | 542985168.00                   | 31764036.00                 | <b>17.09</b>      |

## Questions

**Do all the protocols have the same bus traffic? Which is the coherence protocol with the best bus traffic? And which does it have the worst? In particular, is the bus traffic the same for the MSI and MESI protocols? Why (for this answer, remember how the miss rate was for these two protocols – Task 3)?**

As per the information in the table above, it's evident that not all protocols exhibit the same level of bus traffic. The DRAGON memory traces emerge as the most favorable for coherence protocol, while the least favorable ones are associated with MSI and MESI memory traces.

The bus traffic for MSI and MESI protocols is nearly identical, attributable to both being invalidation-based protocols. This similarity in bus traffic is a consequence of the higher miss rate inherent in invalidation-based protocols. With protocols such as MSI and MESI, as the number of processors increases, multiple caches end up sharing the same block. Consequently, during a write operation, a cache compels other caches to invalidate that particular block, leading to new misses and a rise in the count of block transfers.

In essence, in invalidation-based protocols like MSI and MESI, an escalation in the number of processors results in several caches sharing blocks, leading to more frequent invalidations during write operations. This, in turn, increases both the miss rate and bus traffic. Simply put, as the number of processors grows for a given problem size, the working set initially fitting in the

cache gives way to a scenario where coherence misses dominate, necessitating a greater number of bus transactions to maintain cache coherence.

**Do you observe any difference between the update-based protocol and the invalidation based protocols? Which? Why (give at least two reasons)?**

Certainly. The DRAGON protocol is an example of an update-based protocol, while the MSI and MESI protocols are instances of invalidation-based protocols.

**Update-Based Protocols:** These protocols, such as the DRAGON protocol, eradicate all coherence misses by updating all copies of a memory block with a new value instead of invalidating them when a write request to a shared block is made. The advantage of eliminating coherence misses comes at the cost of an increased number of global write actions.

**Invalidate-Based Protocols:** Invalidation-based protocols like MSI and MESI exhibit lower traffic due to a sequence of writes from the same processor with no intervening access from other processors. Only the first write in this sequence triggers global traffic. However, the drawback is that the invalidation of remote copies leads to coherence misses.

**Do you think that the results and conclusions obtained with these experiments are of general application or they may change depending on the used benchmarks? Indicate other scenario in which the invalidation protocol does much better than the update protocol. In conclusion, does the use of a concrete cache coherence protocol improve the multiprocessor system performance? Why? Are the conclusions you obtain similar to the previous ones for the miss rate (Task 3)?**

We have compiled the results displayed in the table above. The miss rate for both the MSI and MESI protocols is identical, aligning with the theoretical expectation since MESI is essentially an enhancement of MSI aimed at reducing bus transactions in the coherence protocol. The table highlights that, despite the nearly equivalent miss rates of MSI and MESI, the MESI protocol results in less bus traffic.

Additionally, the DRAGON protocol exhibits the lowest miss rate and bus traffic. This is attributed to the fact that DRAGON is an update-based protocol, in contrast to MSI and MESI, which are invalidation-based protocols. In update-based protocols like DRAGON, when a processor writes to a shared location, the value is updated in the caches of all other processors holding that memory block. Moreover, the DRAGON protocol refines updates to single-word writes instead of full cache block transfers.

On the contrary, invalidation-based protocols, such as MSI and MESI, set the cache state of the memory block to invalid in all other processors' caches during a write operation. Consequently, those processors need to obtain the block through a miss, resulting in significant bus traffic. It's important to note that various scenarios can be devised where the invalidation protocol outperforms the update protocol.

In summary, we infer that an increase in the number of processors for a parallel application leads to higher miss rates and bus traffic. This is particularly evident in invalidation-based protocols

like MESI, where an escalation in processors increases the likelihood of multiple caches sharing the same block. As a result, during a write operation, one cache compels others to invalidate that block, leading to new misses and an increase in block transfers. Furthermore, a greater number of processors necessitates more bus transactions to maintain cache coherence. In essence, as the number of processors grows for a given problem size, the working set initially fitting in the cache shifts from being dominated by local misses to being dominated by coherence misses.

## Task 05

Influence of the Number of Processors on the Miss Rate - Study the influence of the number of processors on the miss rate during the execution of a parallel program in a symmetric multiprocessor.

### Configuration

Cache coherence protocol = MESI.

Scheme for bus arbitration = LRU.

Word wide (bits) = 16.

Words by block = 32 (block size = 64 bytes).

Blocks in main memory = 524288 (main memory size = 32 MB).

Blocks in cache = 256 (cache size = 16 KB).

Mapping = Set-Associative.

Cache sets = 64 (four-way set associative caches).

Replacement policy = LRU.

Configure the number of processors in the SMP using the following configurations: 1, 2, 4, and 8. For each of the configurations, obtain the global miss rate for the system using the three traces that were generated by the post-mortem scheme: FFT, Simple and Weather.

## Screenshots

### FFT:

#### 1 Processors



#### 2 Processors



## 4 Processors



## 8 Processors



**Simple:****1 Processors****2 Processors**

## 4 Processors



## 8 Processors



## Weather

### 1 Processors



### 2 Processors



## 4 Processors



## 8 Processors



**Table**

| Processors | FFT   | Simple | Weather |
|------------|-------|--------|---------|
| 1          | 8.1%  | 14.6%  | 15.2%   |
| 2          | 20.9% | 23.9%  | 20.0%   |
| 4          | 26.8% | 30.9%  | 22.4%   |
| 8          | 32.5% | 36.3%  | 26.9%   |

**Questions**

**Does the global miss rate increase or decrease as the number of processors increases? Why?**

The worldwide miss rate rises with an escalation in the number of processors. This is due to the heightened likelihood of two or more processors attempting to access identical data concurrently as the number of processors grows. In such instances, a cache miss is triggered.

**Does this increment or decrement happen for all the benchmarks or does it depend on the different locality grades, shared data...?**

The rise in the overall miss rate occurs across all benchmarks. This is attributed to the fact that these benchmarks collectively access shared data. As multiple processors access the same data, the likelihood of a cache miss increases.

**What happens with the capacity and coherence misses when you enlarge the number of processors?**

With an escalation in the number of processors, both capacity misses and coherence misses witness an increase. Capacity misses occur when a processor attempts to access data not present in its cache, while coherence misses arise when a processor tries to access data residing in another processor's cache.

**Are there conflict misses in these experiments? Why?**

The observed trends in increasing miss rates with a higher number of processors, variations across traces, and potential saturation effects strongly suggest the presence of conflict misses in these experiments. The contention for cache resources, particularly in set-associative caches, is a key factor contributing to the observed miss rate patterns. Further investigation into cache utilization, access patterns, and specific cache sets experiencing contention would provide more detailed insights into the nature of conflict misses in this symmetric multiprocessor system.

**Do you think that the results and conclusions obtained with these experiments are of general application or they may change depending on the cache coherence protocol? Why?**

The results and conclusions obtained from these experiments may indeed change depending on the cache coherence protocol used. The choice of cache coherence protocol is a crucial factor that can influence the generalizability of results. The conclusions drawn from experiments using one coherence protocol may not directly apply to systems employing a different protocol. To make broad generalizations about the impact of the number of processors on miss rates and conflict misses, it's essential to consider the specific characteristics of the coherence protocol in use. Researchers and practitioners should carefully choose coherence protocols based on the requirements of their applications and consider how these choices may influence system behavior.

The results and conclusions derived from these experiments may vary depending on the cache coherence protocol utilized. Cache coherence protocols, such as MESI (Modified, Exclusive, Shared, Invalid), play a critical role in managing the consistency of shared data among multiple processors. The specific actions taken during read and write operations, the handling of invalidations, and updates can differ across coherence protocols, impacting the occurrence of conflict misses. The sensitivity of a coherence protocol to access patterns, the management of concurrent accesses, and the associated bus utilization and overhead can all influence the observed performance characteristics of a multiprocessor system. Consequently, conclusions about the impact of the number of processors on miss rates and conflict misses should be interpreted in the context of the specific coherence protocol employed, and generalizability to systems with different protocols may be limited. Careful consideration of coherence protocol choices is crucial for understanding and applying findings to diverse multiprocessor environments.

**In conclusion, does the increase of the number of processors improve the multiprocessor system performance? Why and in what sense?**

In summary, augmenting the number of processors has the potential to enhance multiprocessor system performance. Nevertheless, the effectiveness of this improvement hinges on the availability of sufficient bus bandwidth. If the multiprocessor system possesses ample bus bandwidth, performance will likely see an uptick; otherwise, the system may experience slowdowns.

## Task 06

Influence of the Number of Processors on the Bus Traffic - Evaluate the influence of the number of processors on the bus traffic during the execution of a parallel program in a symmetric multiprocessor.

### Configuration

Cache coherence protocol = MESI.

Scheme for bus arbitration = LRU.

Word wide (bits) = 16.

Words by block = 32 (block size = 64 bytes).

Blocks in main memory = 524288 (main memory size = 32 MB).

Blocks in cache = 256 (cache size = 16 KB).

Mapping = Set-Associative.

Cache sets = 64 (four-way set associative caches).

Replacement policy = LRU.

Configure the number of processors in the SMP using the following configurations: 1, 2, 4, and 8. For each of the configurations, obtain the bus traffic (in bytes per memory access) for the system using the three traces that were generated by the post-mortem scheme: FFT, Simple and Weather. Remember: In order to compute the bus traffic, assume that cache block transfers move 64 bytes (the block size) on the bus data lines, and that each bus transaction involves six bytes of command and address on the bus address lines. Therefore, you can compute the address traffic (including command) by multiplying the obtained bus transactions by the traffic per transaction (6 bytes). In the same way, you can compute the data traffic by multiplying the number of block transfers by the traffic per transfer (64 bytes). The total bus traffic, in bytes per memory access, will be the addition of these two quantities divided by the number of memory accesses (references) in the trace (see Table 2).

## Screenshots

### FFT:

#### 1 Processors



#### 2 Processors



## 4 Processors



## 8 Processors



## Simple:

### 1 Processors



### 2 Processors



## 4 Processors



## 8 Processors



## Weather

### 1 Processors



### 2 Processors



## 4 Processors



## 8 Processors



## Calculation Table

|         | Processors | Bus Transactions | Address Traffic | Block Transfers | Data Traffic | Address Traffic + Data Traffic | Number of memory references | Total Bus Traffic |
|---------|------------|------------------|-----------------|-----------------|--------------|--------------------------------|-----------------------------|-------------------|
| FFT     | 1          | 928043.00        | 5568258.00      | 928043.00       | 59394752.00  | 64963010.00                    | 7451717.00                  | 8.72              |
|         | 2          | 665631.00        | 3993786.00      | 538970.00       | 34494080.00  | 38487866.00                    | 1863029.00                  | 20.66             |
|         | 4          | 3051536.00       | 18309216.00     | 2591185.00      | 165835840.00 | 184145056.00                   | 7451717.00                  | 24.71             |
|         | 8          | 3283206.00       | 19699236.00     | 2813616.00      | 180071424.00 | 199770660.00                   | 7451717.00                  | 26.81             |
| Simple  | 1          | 5357756.00       | 32146536.00     | 5357756.00      | 342896384.00 | 375042920.00                   | 27030092.00                 | 13.88             |
|         | 2          | 9674938.00       | 58049628.00     | 8564862.00      | 548151168.00 | 606200796.00                   | 27030092.00                 | 22.43             |
|         | 4          | 3610539.00       | 21663234.00     | 2844622.00      | 182055808.00 | 203719042.00                   | 27030092.00                 | 25.32             |
|         | 8          | 12503960.00      | 75023760.00     | 10996624.00     | 703783936.00 | 778807696.00                   | 27030092.00                 | 28.81             |
| Weather | 1          | 7020187.00       | 42121122.00     | 7020187.00      | 449291968.00 | 491413090.00                   | 31764036.00                 | 15.47             |
|         | 2          | 8662499.00       | 51974994.00     | 7761028.00      | 496705792.00 | 548680786.00                   | 31764036.00                 | 17.27             |
|         | 4          | 9463257.00       | 56779542.00     | 8480669.00      | 542762816.00 | 599542358.00                   | 31764036.00                 | 18.87             |
|         | 8          | 1085641.00       | 6513846.00      | 9646703.00      | 617388992.00 | 623902838.00                   | 31764036.00                 | 19.64             |

## Questions

**Does the global bus traffic increase or decrease as the number of processors increases? Why (give two reasons, one for the data traffic and another for the address+command bus traffic)? Does this increment or decrement happen for all the benchmarks or does it depend on the different locality grades, shared data, ...? In general, which is the step with more bus traffic: from 1 to 2 processors, from 2 to 4 processors, or from 4 to 8 processors? Why?**

The global bus traffic generally increases with an increasing number of processors due to heightened parallelism and coordination demands. Two main reasons contribute to this rise: Firstly, increased data traffic results from more processors accessing data concurrently. Secondly, the need for enhanced coordination and synchronization among processors, including cache coherence and arbitration, leads to elevated address and command bus traffic. The increment in bus traffic may vary among benchmarks, contingent on factors such as locality grades and shared data characteristics. In general, the step with the most bus traffic occurs from 1 to 2 processors, attributed to initial overhead and heightened coordination requirements during the transition from a single processor to two processors.

**Do you think that the results and conclusions obtained with these experiments are of general application or they may change depending on the cache coherence protocol? Why?**

The results and conclusions obtained from these experiments may not be universally applicable and could vary depending on the cache coherence protocol. The choice of cache coherence protocol plays a crucial role in determining how multiple processors interact and coordinate their access to shared data. Different cache coherence protocols, such as MESI in this case, introduce distinct mechanisms for maintaining cache consistency. Therefore, the impact of the number of processors on bus traffic, as well as the overall system behavior, can be influenced by the specific characteristics and functionalities of the chosen cache coherence protocol. As a result, the findings from these experiments may not be directly transferable to systems employing alternative cache coherence protocols, and the generalizability of the conclusions should be approached with consideration for the specific protocol in use.

**Change the results obtained with these experiments in order to measure the bus traffic in bytes per instruction (you must divide by the number of instruction captures -not memory accesses- in the traces). Supposing that these applications run at a sustained 100 MIPS per processor (and they are integer-intensive applications), what is the address and data bus bandwidth requirement for each application and configuration of the number of processors?**

Example: FFT For 2 processor:

$$\text{Total bus traffic} = \text{Data traffic} + \text{Address traffic} = \text{Block transfers} * 64 \text{ bytes} + \text{Bus transactions} * 6 \text{ bytes} = 538970 * 64 + 665631 * 6 = 38487866$$

$$\text{Instructions executed} = 781142$$

$$\text{Bus traffic per instr} = 38487866 / 781142 = 49.27$$

$$\text{Address BW} = 49.27 * 2 * 100,000,000 * (6/70) = 844.628 \text{ MB/s}$$

$$\text{Data BW} = 49.27 * 2 * 100,000,000 * (64/70) = 9009.371429 \text{ MB/s}$$

**Table:**

|         | Processors | Bus Transactions | Address Traffic | Block Transfers | Data Traffic | Address Traffic + Data Traffic | Instructions | Total Bus Traffic | Address BW | Data BW  |
|---------|------------|------------------|-----------------|-----------------|--------------|--------------------------------|--------------|-------------------|------------|----------|
| FFT     | 1          | 928043           | 5568258         | 928043          | 59394752     | 64963010                       | 3124454      | 20.8              | 178.8      | 1900.4   |
|         | 2          | 665631           | 3993786         | 538970          | 34494080     | 38487866                       | 3124454      | 38.5              | 661.3      | 7028.7   |
|         | 4          | 3051536          | 18309216        | 2591185         | 165835840    | 184145056                      | 3124454      | 58.9              | 2027.4     | 21547.3  |
|         | 8          | 3283206          | 19699236        | 2813616         | 180071424    | 199770660                      | 3124454      | 63.9              | 4398.9     | 46751.3  |
| Simple  | 1          | 5357756          | 32146536        | 5357756         | 342896384    | 375042920                      | 11594172     | 32.3              | 278.2      | 2956.6   |
|         | 2          | 9674938          | 58049628        | 8564862         | 548151168    | 606200796                      | 11594172     | 52.3              | 899.3      | 9557.7   |
|         | 4          | 3610539          | 21663234        | 2844622         | 182055808    | 203719042                      | 11594172     | 61.4              | 2111.8     | 22444.2  |
|         | 8          | 12503960         | 75023760        | 10996624        | 703783936    | 778807696                      | 11594172     | 67.2              | 4621.5     | 49116.4  |
| Weather | 1          | 7020187          | 42121122        | 7020187         | 449291968    | 491413090                      | 1700791      | 288.9             | 2484.8     | 26408.4  |
|         | 2          | 8662499          | 51974994        | 7761028         | 496705792    | 548680786                      | 1700791      | 322.6             | 5548.8     | 58971.9  |
|         | 4          | 9463257          | 56779542        | 8480669         | 542762816    | 599542358                      | 1700791      | 352.5             | 12126.3    | 128876.9 |
|         | 8          | 1085641          | 6513846         | 9646703         | 617388992    | 623902838                      | 1700791      | 366.8             | 25238.0    | 268226.8 |

**The computation in the question above gives the average bandwidth requirement under the assumption that the bus bandwidth is enough to allow the processors to execute at full speed. This computation provides a useful basis for sizing the system. For example, on a machine with 4.8 GB/s of data bandwidth, how many processors will the bus be able to support without saturating for each benchmark? Remember: If the bandwidth is not sufficient to support the application, the application will slow down.**

Based on the earlier calculations, the bus exhibits the capability to support around 5.2 processors (with a bandwidth requirement per processor of 0.92) before reaching saturation in the FFT benchmark, 1.6 processors (with a bandwidth requirement per processor of 2.99) for the simple benchmark, and 2.1 processors (with a bandwidth requirement per processor of 2.25) for the weather benchmark. It is essential to underscore that these figures are approximations, and the actual number of processors the bus can accommodate will vary based on several factors, including the specific architecture of the multiprocessor system and the characteristics of the application.

**In conclusion, does the increase of the number of processors improve the multiprocessor system performance? Why and in what sense? Are the conclusions you obtain similar to the previous ones for the miss rate (Task 5)?**

Across all benchmarks, a consistent trend emerges wherein global bus traffic increases with a higher number of processors. Variations in the magnitude of traffic escalation or reduction emphasize that the impact on bus traffic depends on the unique attributes of each benchmark and the quantity of involved processors. These findings highlight the importance of considering benchmark-specific behaviors when assessing the influence of the number of processors on global bus traffic. In essence, the effects of increasing the number of processors on system performance are subject to diverse factors, including the specific workload, locality, and design choices such as cache coherence protocols.

The conclusions for Task 6, focusing on the influence of the number of processors on global bus traffic, and Task 5, which addresses the miss rate and bus bandwidth in the context of multiprocessor system performance, share some similarities. Both conclusions emphasize the impact of the number of processors on system performance, highlighting the importance of factors such as bus bandwidth and the unique characteristics of benchmarks. They both underscore the need to consider specific behaviors of benchmarks and system configurations when assessing the implications of increasing the number of processors. However, Task 5 specifically focuses on the potential enhancement of system performance through increased processors, contingent upon sufficient bus bandwidth, while Task 6 delves into the nuanced variations in bus traffic across benchmarks and processor quantities. In essence, both conclusions contribute to a comprehensive understanding of the intricate relationship between the number of processors and multiprocessor system performance, each offering insights into different facets of the system's behavior.