

Chengchen Zhang  
CSE237A Individual Project Part 1

Looking first at the single-core execution times of the three workloads, lowering the frequency roughly doubles the execution time for workloads 1 and 2, while workload 3 sees a smaller increase of about 1.78x. This makes sense given that the high frequency is 1.2 GHz and the low frequency is 600 MHz, which is exactly half of the high frequency. If the core runs at half the speed, it follows that it takes about twice as long to finish the workload.
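As a quick check of this scaling, the following Python sketch recomputes the low/high execution-time ratios from the single-core table. The dictionary and variable names are illustrative only, and the times are taken to be in microseconds, which is an assumption inferred from the `us*mW` unit on the power row.

```python
# Single-core execution times copied from the single-core table
# as (high frequency, low frequency).
# Assumption: the unit is microseconds, inferred from the "us*mW" power row.
exec_time_us = {
    "WL1": (974676, 1980596),
    "WL2": (1091695, 2221509),
    "WL3": (864368, 1536434),
}

# Ratio of the two clock frequencies: 1.2 GHz / 600 MHz.
freq_ratio = 1.2e9 / 600e6

# Slowdown of each workload when moving from high to low frequency.
slowdown = {name: low / high for name, (high, low) in exec_time_us.items()}

for name, ratio in slowdown.items():
    print(f"{name}: {ratio:.2f}x slower at low frequency "
          f"(clock ratio {freq_ratio:.1f}x)")
```

Workloads 1 and 2 land very close to the 2x clock ratio, while workload 3 falls short of it.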

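Looking at the power consumption, though, running at low frequency used about 1.75x as much energy as running at high frequency for workloads 1 and 2 (about 1.53x for workload 3). Note that the Power row in the tables, in us\*mW, is power multiplied by time, that is, energy. This follows from the fact that the core draws 580 mW at 1.2 GHz and 500 mW at 600 MHz: although the low frequency is half the speed, it draws 86% of the high-frequency power, so the roughly doubled execution time results in more total energy.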
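The arithmetic behind this can be sketched in Python for workload 1. The two power figures come from the text and the times from the single-core table; the variable names are illustrative, and the time unit is assumed to be microseconds.

```python
# Average power at each frequency, as stated in the text (mW).
power_mw = {"high": 580, "low": 500}

# Single-core execution times for workload 1 (assumed to be microseconds).
time_us = {"high": 974676, "low": 1980596}

# Energy is power multiplied by time, which gives the table's us*mW unit.
energy = {f: power_mw[f] * time_us[f] for f in power_mw}

power_ratio = power_mw["low"] / power_mw["high"]    # about 0.86
time_ratio = time_us["low"] / time_us["high"]       # about 2.03
energy_ratio = energy["low"] / energy["high"]       # product of the two above

print(f"power {power_ratio:.2f}x, time {time_ratio:.2f}x, "
      f"energy {energy_ratio:.2f}x")
```

The computed energies match the Power(us*mW) entries for WL1 in the single-core table, which suggests that row was derived the same way.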

Looking at the workload characteristics, workload 1 had very few L1 cache misses (well under 0.1%), while workload 2 averaged around 20% and workload 3 averaged around 29% to 30%; these percentages did not vary noticeably with frequency. The differences could be due to the structure of the workloads. Workload 1 performs sequential accesses on an array that is being sorted, which makes it highly predictable which memory should be cached, while workload 2 performs a random access at each step of its loop using `srand()`, leading to a high number of L1 misses. Workload 3 is presumably also performing non-sequential accesses, which would explain its high L1 miss percentage. While workload 2 has a low LLC miss percentage, this is offset by over 2 million TLB misses, which can also be attributed to the random access, because the page of the random element it is trying to look up is often not in the TLB. In contrast, workload 3 had almost no TLB misses, which seems to indicate that its working set of pages fits within the TLB. The three workloads have comparable execution times, but workload 3 is a bit faster than the first two.
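To illustrate why the access pattern matters, here is a toy Python model of a cache fed with a sequential and a random address stream. It is only a sketch: the 32 KiB direct-mapped geometry with 64-byte lines, the 4-byte elements, and the 1 MiB array are all assumptions chosen for illustration, not the configuration of the real hardware or the actual workload code, so the absolute numbers will not match the measured ratios.

```python
import random


def miss_ratio(addresses, num_sets=512, line_bytes=64):
    """Miss ratio of a toy direct-mapped cache over a list of byte addresses."""
    tags = [None] * num_sets          # line currently held in each set
    misses = 0
    for addr in addresses:
        line = addr // line_bytes     # cache line the byte belongs to
        index = line % num_sets       # set that line maps to
        if tags[index] != line:       # set is empty or holds another line
            misses += 1
            tags[index] = line
    return misses / len(addresses)


ELEM_BYTES = 4                        # assumed 4-byte array elements
N = 1 << 18                           # 262144 elements, a 1 MiB array

# Sequential walk over the array versus uniformly random element accesses.
sequential = [i * ELEM_BYTES for i in range(N)]
rng = random.Random(0)                # fixed seed so the run is repeatable
scattered = [rng.randrange(N) * ELEM_BYTES for _ in range(N)]

seq_ratio = miss_ratio(sequential)
rand_ratio = miss_ratio(scattered)
print(f"sequential: {seq_ratio:.4f}, random: {rand_ratio:.4f}")
```

In this model the sequential walk misses only once per 64-byte line, while the random stream misses on most accesses because the array is far larger than the cache.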

There does not seem to be an observable change in PMU characteristics when running two workloads together, except for the last case of running workload 3 on both core 0 and core 1. For the other cases, this seems to be because the workloads are independent of one another, so the separate cores can execute them in parallel without interfering with each other. This makes sense when looking at the source code for workloads 1 and 2: the sorting and the random access do not depend on one another and can run at the same time on different cores with no ill effect on speed.

The one case where the PMU characteristics differ when using two cores is when both core 0 and core 1 run workload 3 at the same time. Since the source code for workload 3 is unknown, I can only speculate that running workload 3 on two cores requires more memory than the LLC can hold for both cores' data, leading to an increase in the LLC miss ratio from about 5% to about 26%, which ultimately led to execution times roughly 2.3x to 2.9x longer than the single-core runs (per the tables below) in both the high and low frequency cases.

There does not seem to be a relationship between execution times and miss ratios when comparing the three workloads. However, when running workload 3 on both cores 0 and 1, the LLC miss ratio increased from about 5% to about 26% and the execution time increased by roughly 2.3x to 2.9x relative to the single-core runs. If the cache miss ratio increased on either workload 1 or 2, the execution times would also increase, since cache misses mean the processor stalls for many cycles waiting for the data to be fetched.

The relationship between the execution time and the cycle count is a direct function of the frequency at which the workloads were run, which makes sense since the CPU frequency defines how many cycles the processor executes per second. Thus cycle count = frequency \* execution time.
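As a worked instance of this formula, the Python sketch below computes the cycle counts implied by the single-core execution times of workload 1. These are derived values, not measured PMU counts; the function name is illustrative and the times are assumed to be in microseconds.

```python
def expected_cycles(freq_hz, exec_time_us):
    """Cycle count implied by cycle count = frequency * execution time."""
    return freq_hz * exec_time_us * 1e-6   # convert microseconds to seconds


# Workload 1 on a single core, times taken from the single-core table.
high = expected_cycles(1.2e9, 974676)      # 1.2 GHz run
low = expected_cycles(600e6, 1980596)      # 600 MHz run

print(f"high: {high:.4g} cycles, low: {low:.4g} cycles, "
      f"ratio {low / high:.3f}")
```

Because halving the frequency roughly doubles the execution time, the two implied cycle counts come out within about 2% of each other.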

The number of instructions did not vary much for any given workload between runs. This can be attributed to the workloads having straightforward instruction streams without much control flow that varies from run to run.

Tables:

Workloads on a Single Core at High and Low Frequency

|                     | WL1 - H   | WL1 - L   | WL2 - H   | WL2 - L    | WL3 - H   | WL3 - L   |
|---------------------|-----------|-----------|-----------|------------|-----------|-----------|
| Instructions        | 809200389 | 818572552 | 325514058 | 311566460  | 96636094  | 95830391  |
| Execution Time      | 974676    | 1980596   | 1091695   | 2221509    | 864368    | 1536434   |
| L1 Cache Miss Ratio | 0.000262  | 0.000401  | 0.200037  | 0.197798   | 0.303332  | 0.287753  |
| LLC Miss Ratio      | 0.120816  | 0.100803  | 0.00106   | 0.001078   | 0.049015  | 0.055663  |
| Power(us*mW)        | 565312080 | 990298000 | 633183100 | 1110754500 | 501333440 | 768217000 |
| TLB Misses          | 8250      | 6881      | 2587653   | 2376543    | 1415      | 1370      |

Workload 1 on Core 0 and Workload 2 on Core 1

|                     | WL1-0H    | WL2-1H    | WL1-0L     | WL2-1L     |
|---------------------|-----------|-----------|------------|------------|
| Execution Time      | 973693    | 1069324   | 2021889    | 2164910    |
| L1 Cache Miss Ratio | 0.00253   | 0.205015  | 0.000752   | 0.201651   |
| LLC Miss Ratio      | 0.1278    | 0.000148  | 0.121852   | 0.000419   |
| Power(us*mW)        | 564741940 | 620207920 | 1172695620 | 1082455000 |
| TLB Misses          | 454       | 25775092  | 15894      | 25798308   |

Workload 1 on Core 0 and Workload 3 on Core 1

|                     | WL1-0H    | WL3-1H    | WL1-0L    | WL3-1L    |
|---------------------|-----------|-----------|-----------|-----------|
| Execution Time      | 980701    | 694390    | 1987917   | 1568750   |
| L1 Cache Miss Ratio | 0.00232   | 0.311743  | 0.000325  | 0.313661  |
| LLC Miss Ratio      | 0.158     | 0.029     | 0.126944  | 0.062839  |
| Power(us*mW)        | 568806580 | 402746200 | 993958500 | 784375000 |
| TLB Misses          | 6148      | 1482      | 6928      | 15        |

Workload 2 on Core 0 and Workload 3 on Core 1

|                     | WL2-0H    | WL3-1H    | WL2-0L     | WL3-1L    |
|---------------------|-----------|-----------|------------|-----------|
| Execution Time      | 1456687   | 1086311   | 2720012    | 1953405   |
| L1 Cache Miss Ratio | 0.198972  | 0.31075   | 0.19       | 0.31      |
| LLC Miss Ratio      | 0.024274  | 0.076     | 0.023      | 0.094     |
| Power(us*mW)        | 844878460 | 630060380 | 1360006000 | 976702500 |
| TLB Misses          | 25850957  | 1370      | 25918421   | 5         |

Workload 1 on Core 0 and Workload 1 on Core 1

|                     | WL1-0H    | WL1-1H    | WL1-0L     | WL1-1L    |
|---------------------|-----------|-----------|------------|-----------|
| Execution Time      | 975533    | 950063    | 19850540   | 1929412   |
| L1 Cache Miss Ratio | 0.000255  | 0.000037  | 0.00037    | 0.000183  |
| LLC Miss Ratio      | 0.013     | 0.026     | 0.13       | 0.07      |
| Power(us*mW)        | 565809140 | 551036540 | 9925270000 | 964706000 |
| TLB Misses          | 7535      | 2306      | 9691       | 1072      |

Workload 2 on Core 0 and Workload 2 on Core 1

|                     | WL2-0H    | WL2-1H    | WL2-0L     | WL2-1L     |
|---------------------|-----------|-----------|------------|------------|
| Execution Time      | 1151728   | 1111979   | 2287243    | 2204477    |
| L1 Cache Miss Ratio | 0.2       | 0.2       | 0.2        | 0.2        |
| LLC Miss Ratio      | 0.004     | 0.002     | 0.002      | 0.0015     |
| Power(us*mW)        | 668002240 | 644947820 | 1143621500 | 1102238500 |
| TLB Misses          | 25840515  | 25766746  | 25897720   | 25793671   |

Workload 3 on Core 0 and Workload 3 on Core 1

|                     | WL3-0H     | WL3-1H     | WL3-0L     | WL3-1L     |
|---------------------|------------|------------|------------|------------|
| Execution Time      | 2406467    | 2026876    | 4467314    | 3574213    |
| L1 Cache Miss Ratio | 0.28       | 0.31       | 0.27       | 0.3        |
| LLC Miss Ratio      | 0.25       | 0.22       | 0.32       | 0.26       |
| Power(us*mW)        | 1395750860 | 1175588080 | 2233657000 | 1787106500 |
| TLB Misses          | 9109       | 654        | 1577       | 46         |

## **Graphs:**

Single-core workload graphs at high and low frequencies:





Graphs comparing PMU characteristics across workloads on two cores. Each pair of adjacent bars comes from two workloads that were run together. WL1-0H = workload 1 on core 0 at high frequency.



