

# Mtech thesis presentation

## Stage-1

Presented by: Shikhar Jain

CS22MTECH02002

Advisor: Dr. Ramakrishna Upadrashta

# CONTENT

- *Dynamic voltage and frequency scaling (DVFS)*
  - Motivation
  - Related Work
  - Research questions & results
    - Platform & Tools
    - Framework Schema & Challenges faced
    - Results
      - Intel-Rocketlake
      - ARM+External Power Meter
  - Key Observations
  - Future Work
- *IR2Vec Pip package*
- *GeMS*
- *Contributions Summary*

# DVFS

## (Dynamic Voltage and Frequency Scaling)

# Motivation: DVFS as a solution for Green Computing

$$P = c \cdot V^2 \cdot f + P_S$$

Core Voltage      Core Frequency

*But Voltage can't be controlled by user*



**Energy Consumed :  $E = P \cdot T$**   
F(Core-frequency)

**Objective to Optimize for**  
 **$EDP = E \cdot T$**

# **RELATED WORK**

# Fix the Code Don't tweak hardware [CGO'14]

- Study decoupling of execute (compute) and access (mem) phases
  - Target: affine (polyhedral) and non-affine code
  - Improvements
    - EDP reduced upto 25%
    - Performance
- Platform: Intel Haswell
- Limitations
  - Software prefetching
  - Optimal frequency selection by profiling



# PMU-Events-Driven DVFS [ACM TOMPECS '22]



- Platform used: Coffee-Lake (Core i7-8700K)
- Limitations
  - Heuristics, Not program property guided
  - Algorithm invocation overhead

# RAPL in Action [ACM TMPEC'18]

- Low performance overhead
- Correlation(Package power, Core temperature)
- MSR overflow
- Per Core measurement not supported



# **Research questions & Results**

# Some Research Questions and Solutions

## Research Questions:

1. Study DVFS for new architectures
2. Study mutual effects of frequency and temperature
  - o Frequency
    - EDP, Core-temperature
    - For Arm - Power, Current, Voltage
  - o Temperature
    - Core-frequency modulation
    - Kernel energy consumption
3. Validate Experimental Results
  - o Across machines
  - o Across softwares (Intel RAPL)
  - o Vendor solutions (Intel P\_state)

## Some Solutions:

- [ARM] Implement an automated server client framework
- [Intel] Implement an experimental framework

# **Experimental Setup**

# Platforms & Tools

## Intel

- Intel Rocketlake (11th Gen) i5 11400 x2

## Arm

- Quad core Cortex-A72 (ARM v8) 64-bit SoC x2
- UM25C External Power Meter

## Intel

- Performance monitoring
  - PAPI-7.2
- Core-temperature
  - lm-sensors
- Prefetcher Control
  - Likwid-features
- Msr tools

## Common

- Perf
- Core isolation/affinity
  - Isolcpus, taskset
- Power governors
  - Cpupower/Cpufreq-utils

## Arm

- Communication
  - Pybluez
  - Sockets
- Core- temperature
  - Gpizero package

# Common framework schema



# Problems & solutions

## MSR overflow detection and correction

- MSR Range 0 - 262143 unit
- Examples
  - 3mm , ~68k iterations, 8.8 hrs
  - jacobi-2d , ~46k iterations, 11.7 hrs

$(N*262143 + (\text{MSR value})) * \text{energy unit}$ ; N=no.of overflows

## Loss of statistical soundness

- Data Cache Flushing
- Disable prefetchers
- Usage of standalone threads
- Prevent core overheating
- Ensure correct bios configuration

# **Results for Intel (x86)**

# Results - Energy, EDP and Execution time on PolyBench: Jacobi-2d



# Challenge - Dissimilarity between same machines: gemm

Machine 1



Machine 2



# Challenge - Dissimilarity between same machines: atax

Machine 1



Machine 2



# Static frequency selection vs Intel P\_state



# **Results for ARM-64**

# Power Measurement: Problems & proposed solutions



# Server-Client & Power Meter Communication Framework



# Server & Client State Diagram



# Results- Energy, EDP and Execution time : Jacobi-2D



# Key Observations

- Core-frequency effects
  - EDP, Energy Consumption, Execution Time, Core-temperature
- No modulation in assigned core frequency
- “Buckets” of optimal frequency, different across similar machines
- Intel
  - HWP is comparable to static frequency selection
  - Validation across similar machine isn’t feasible
- ARM
  - Core temperature effects optimal frequency
  - Volts  $\propto$  core-frequency, Current  $\propto$  core-frequency, Power  $\propto$  Current

# Future Work

- **Idea: DVFS over uncore & off-core domains**
  - Research Question:
    - “*Proposing Compiler driven approach for uncore DVFS*”
  - Related Work: [1] SysScale
- **Idea: Don’t “conquer” HWP or intel pstate, add software solutions on top**
  - Research Question:
    - “*Explore relevant alternative power saving techniques*”
  - Related Work: [2] CoPPer

[1] Jawad Haj-Yahya, Mohammed Alser, Jeremie Kim, A. Giray Yağlıkçı, Nandita Vijaykumar, Efraim Rotem, and Onur Mutlu. 2020. SysScale: exploiting multi-domain dynamic voltage and frequency scaling for energy efficient mobile processors. In Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA '20)

[2] C. Imes, H. Zhang, K. Zhao and H. Hoffmann, "CoPPer: Soft Real-Time Application Performance Using Hardware Power Capping," 2019 IEEE International Conference on Autonomic Computing (ICAC)

# **IR2Vec Python Package**

# Highlights

- **Aim:** To deploy **IR2Vec** as a pip installable package
  - Output
    - Many-linux 2014 compliant wheel files (Python 3.6-3.11)
    - Package Source-Code
  - **Tools:** Cpython, LLVM-12, Docker, Many-linux 2014
  - Artifacts were pushed to **IR2Vec** repo.

# Python-3.10 \*\*

```
import IR2Vec_pkg as ir2vec
d = ir2vec.generateEmbeddings("/home/shikharj/trisolv.ll", "fa", "f")
if d["Status"]:
    for x in d["Function_Dict"]:
        print("key: ", x)
        print("Value: ", d["Function_Dict"][x])
    print("\n\n")
    print(d["Instruction_Dict"].keys())
else:
    print(d["Message"])
```

300\*1 distributed representation

# GeMS(Generating Million Scopes)

*Presented at: IMPACT 2023 (13th Workshop)*

*Venkatakeerthy S, Nilesh Shah, Anilava Kundu, Shikhar Jain and Ramakrishna Upadrashta.*

# GeMS : Beyond SCOP generation

```
for(int i = 0;i<1000;i++){  
    C[184*i]=A[184*i]*B[184*i]-E[176*i];  
    D[184*i]=D[192*i]+E[192*i]*E[184*i];  
}
```

A Generated SCOP



- **Reuse** is “control” knob
- $IPC \propto Reuse$
- Reuse Mem Bound
- Reuse Compute Bound
- Enforcing Compulsory misses [Spatial Locality]

IPC < 1.0 (or Machine Balance)

# Contributions & Future Work

- Contributions
  - DVFS
    - Study of
      - New architectures, Intel's proprietary p\_state governor
      - Polybench-suite on intel & arm
      - Empirical relationship among core temperature, frequency, etc.
    - Tools/framework
      - PAPI 7.2 native events for intel 11th gen (Rocketlake)
      - Server client communication framework, automated setup
  - IR2Vec Manylinux2014 compliant python package
- Publication

**GeMS: Towards Generating Millions of SCoPs.**  
**Accepted in IMPACT 2023 Workshop, Jan 2023.**  
Authors: Venkatakeerthy S, Nilesh Shah, Anilava Kundu, **Shikhar Jain** and Ramakrishna Upadrasta.
- Future Work
  - Alternative power saving techniques
  - Formulation of static model guided by compiler , LLVM support

# THANK YOU

# Motivation : Feasibility of DVFS towards energy conservation

Power  $\propto$  Performance (~MIPS,FLOPS)

$$P = c \cdot V^2 \cdot f + P_S$$

Core Voltage  
Core Frequency

$$P_{\text{static}} \sim N * V * e^{-Vt}$$

*But Voltage can't be controlled by user*

$$P_{\text{dynamic}} \sim N * C * V^2 * f * A$$

Execution Time  $\propto$  Core-frequency

Energy Consumed during kernel Execution:  $P \cdot T \rightarrow \text{Energy} \propto F(\text{Core-frequency})$

Objective to Optimize for  
 $EDP = E \cdot T$



# Points to be understood

- Study DVFS for :
  - New architectures like intel 11th gen & cortex-A72
  - Vendor specific solutions (Intel P\_state)
  - Results validation across similar machines
- Core frequency effects on properties :
  - EDP, Core-temperature
  - For Arm - Power, Current, Volts
- Effects of high core temperature on :
  - Core-frequency modulation
  - Kernel's energy consumption
- Intel RAPL as “internal powermeter”
  - Implement Missing tools/frameworks for both Intel & Arm

# AIM

- Study of recent (SOTA) static & dynamic techniques.
- Prospects of DVFS across
  - Multiple architectures (Workstation [Intel] & Embedded [ARM] )
  - Newer architectures (ex Intel 11th gen -Rocketlake)
- Correlation of Core frequency with
  - Execution time, Energy consumed
  - EDP, Temperature
  - Power, Volt, current (ARM)
- Vendor provided solutions vs Static DFS (Intel)
- Parameters effecting actual kernel execution
  - Temperature
  - Architecture constraints (Intel)
- Implement Missing tools/frameworks (ARM)

# AIM

- Study DVFS
  - Vendor provided solutions vs Static DFS (Intel)
  - Across diverse & recent architectures
- Correlate core frequency with
  - Execution time, Energy consumed
  - EDP, Temperature
  - Power, Volt, current (ARM)
- Parameters effecting actual kernel execution
  - Temperature
  - Architecture constraints (Intel)
- Implement Missing tools/frameworks (ARM)

# AIM

- Prospects of DVFS?
  - Vendor provided solutions vs Static DFS (Intel)
- Correlate core frequency with
  - Execution time, Energy consumed
  - EDP, Temperature
  - Power, Volt, current (ARM)
- Parameters effecting actual kernel execution
  - Temperature
  - Architecture constraints (Intel)
- Implement Missing tools/frameworks (ARM)

# Final Steps

1. Check drop in assigned core freq. due to high temperature

## Correct Bios Configuration

2. Result Collation

- Race to Halt: off
- Intel Turbo Boost: off
- 3. Ground Truth generation (Min-EDP)

4. Plots Generation

# Platforms description

## Intel Workstation

- Intel Rocketlake (11th Gen): i5 11400 x2
- Tools: cpupower, taskset, isolcpus, Likwid, lm-sensors, msr-tools, perf, papi

## ARM

- ARM-64 (Raspberry-Pi4) x2
- UM25C External Power Meter
- Tools: cpupower, gpiozero, PyBluez, perf



# Intel RAPL (Running Average power Limit)



## Model Specific Registers

- Control registers provided by Processor vendors
- Get or manipulate Core/Off-core features



# RAPL in Action

- Low performance overhead
- Correlation between Package power and Core temperature
- Fixed refresh rate(~1ms)
- Model specific register(MSR) overflow
- Non-Atomic MSR updates
- Individual core measurement not supported

How to read MSRs in C

```
uint64_t msr_value;
/* Haswell: units of 61 microjoules */
/* MSR_PKG_ENERGY_STATUS is at address 0x611 */

double energy_units = pow(0.5, 14);
int fd = open("/dev/cpu/0/msr", O_RDONLY);
if (fd < 0) {
    perror("open");
    return -1;
}
if (pread(fd, &msr_value, 8, 0x611) < 0) {
    perror("pread");
    return -1;
}
double energy = msr_value * energy_units;
printf("%f\n", energy);
```

# Observations & solutions

## MSR overflow detection and correction

- Energy Status MSR
- Range 0 - 262143 unit
- Examples
  - 3mm ,~68k iterations ,8.8 hrs
  - jacobi-2d , ~46k iterations, 11.7 hrs

## Loss of statistical soundness

- Data Cache Flushing
- Disable prefetchers
- Usage of standalone threads
- Isolcpus, taskset

## Some precautions

- **Uniform Core temperature**
  - Low sampling Frequency
  - Low core-utilization
- **Correct Bios Configuration**
  - Race to Halt: off
  - Intel Turbo Boost: off
  - Intel Hyper-Threading: off

# RAPL in Action

- Low performance overhead
- Correlation between Package power & Core temperature
- Fixed refresh rate(~1ms)
- Model specific register(MSR) overflow
- Individual core measurement not supported

## How to read MSRs in C

```
uint64_t msr_value;  
/* Haswell: units of 61 microjoules */  
/* MSR_PKG_ENERGY_STATUS is at address 0x611 */  
  
double energy_units = pow(0.5, 14);  
int fd = open("/dev/cpu/0/msr", O_RDONLY);  
if (fd < 0) {  
    perror("open");  
    return -1;  
}  
if (pread(fd, &msr_value, 8, 0x611) < 0) {  
    perror("pread");  
    return -1;  
}  
double energy = msr_value * energy_units;  
printf("%f\n", energy);
```



# Important Tools & Interfaces

## Performance Monitoring

- Perf (Arm & Intel)
- PAPI-7.2 (Intel)

## Communication

- Pybluez package (Arm)
- Sockets package (Arm)

## Core Temperature Measurement

- lm-sensors(Intel) &
- Gpizero package(Arm)

## Core Isolation and affinity

- isolcpus
- Taskset

## Miscellaneous

- Likwid-features (Intel)
- Cpupower/Cpufrequtils
- Msr tools(Intel)

# Important Tools & Interfaces

## Performance Monitoring

- Perf (Arm & Intel)
- PAPI-7.2 (Intel)

## Communication

- Pybluez package (Arm)
- Sockets package (Arm)

## Core Temperature Measurement

- lm-sensors(Intel) &
- Gpizero package(Arm)

## Core Isolation and affinity

- isolcpus
- Taskset

## Miscellaneous

- Likwid-features (Intel)
- Cpupower/Cpufrequtils
- Msr tools(Intel)

Get IPC



# BACKUP



# Experiences with Experiments

- MSR overflow detection and correction
  - Max 262143 units can be measured
  -
- Correct Bios Configuration
  - Race to Halt : off
  - Intel EIST : off/on
  - Intel Turbo Boost : off
  - Intel Hyper-Threading: off
- Maintain Similar core temp for each kernel run
  - Enforcing Sleep
  - Low sampling Frequency ; Helps in low core-utilization
  - Better kernel execution schedule
- Ensuring minimal overhead of context-switch
  - IsolCpus
  - Taskset
- For Better sampling accuracy
  - Data Cache Flushing
  - Disable prefetchers
  - Kernel-execution, sampling [Core-Freq ,Core-temp] as standalone threads

# Key Observations

- DFS as a software solution for DVFS on Intel Rocketlake
  - Intel Rocketlake implements **Hardware P-state** based DVFS
  - Only software DFS doesn't lead to significant EDP improvements
- DFS approach on ARM leads to fixed bins of core frequency
  - Higher Core-frequency better the EDP, Energy Consumption, Execution Time
  - Core temperature plays significant role in deciding optimal frequency

# TIMELINE



# CPU Power Management for linux systems

- FS-PS : Pipeline slot stall
- FS-TS : Total cycle stall
- FS-MS : Memory cycle stall
- FS-LLCM : Off-core requests
- FS-CPI : Cycles per instruction



Separate analysis and optimizations for access & execute phase

# Observations

## MSR overflow detection and correction

- Energy Status MSR
- Max measurable energy (uJ) 262143 unit
- Example
  - 3mm ,~68k iterations ,8.8 hrs
  - jacobi-2d ,~46k iterations, 11.7 hrs

## Correct Bios Configuration

- Race to Halt: off
- Intel Turbo Boost: off
- Intel Hyper-Threading: off
- Speed Step: on/off

## Minimal overhead of context-switch

- Isolcpus
- Taskset

## Uniform core temperature

- Enforcing Sleep
- Low sampling Frequency, low core-utilization
- Better kernel execution schedule

## For Better sampling accuracy

- Data Cache Flushing
- Disable prefetchers
- Kernel-execution, Core-Freq ,Core-temp as standalone threads

# Tasks Performed

## Contributed

- Energy Efficient Application
- An Application Framework



RAPL PPP0/PP1 ENERGY STATUS Model Specific Register (MSR)

Anilava Kundu [CS20MTECH01002]

## New Contributions

- Enabled PAPI 7.2 native events support on rocketlake
- DVFS study on Intel Rocketlake
- Fully Automated framework for ARM experiments



# Important Tools & Interfaces

- Likwid-features
  - Disable Prefetchers HW\_PREFETCHER,CL\_PREFETCHER,DCU\_PREFETCHER,IP\_PREFETCHER
- Cpupower/Cpufrequtils
  - To change core power governor and policy
- Msr-tools
  - read/write to model specific registers as per there address & design
- Performance monitoring
  - Perf : cli tool to get various HW stats(eg LLC miss, CPU-Cycle Stalls) for an application
  - PAPI: Provide library support to instrument the code.
  - **Enabled native events support on Rocketlake for PAPI 7.2**
- Lm-sensors & gpiozero package(ARM)
  - To read Core temperature
- Taskset & Isolcpus
  - To isolate physical cores from OS scheduling(except kernel interfence)
  - Set and retrieve CPU affinity of a running process
- PyBluez
  - To handle Bluetooth sockets and associated data transfer

# An idea about C-states



IA32\_PM\_ENABLE (770H)



IA32\_HWP\_CAPABILITIES (771H)

## CONTENT



Lorem Ipsum

Lorem Ipsum

Lorem Ipsum

Lorem Ipsum

### Avg Energy and [EDP , Avg Exec. Time vs Freq | For EDP & Avg.Exec time Correlation : 1.0 with p-value: 0.0



### Avg Energy and Temperature(Max,Avg) in Celsius vs Freq



Previous  
Work

# ARM Setup: Server-Client & Power Meter Communication Framework



- Check drop in assigned core freq. due to high temperature
- Result Collation
- Groundtruth generation (Min-EDP)
- Plots Generation

# Intel P states imp links

<https://vstinner.github.io/intel-cpus.html>

<https://www.kernel.org/doc/Documentation/cpu->



Fig. 2. Hierarchy of power management components.

# CONTENT

- Dynamic voltage
  - Motivation
  - Literature Survey
    - PMU based
    - Fix the clock
  - Experiments
    - Importance
    - Intel-R
    - ARM+
  - Problems faced
  - Key Observations
  - Possible directions
- IR2Vec Pip package
- GeMS



# Clairvoyance [CGO '17]



# Results- Max & Avg Temperature vs Core frequency



# Results-Power, Volts, Current vs Core frequency: ARM





Fig. 3. The *ondemand* governor operation: an illustration.

- Intel Hyper-Threading: off
- Speed Step: on/off

## Minimal overhead of context-switch

Isolcpus  
Taskset

# Related Works: HW and SW techniques

- H/W: Edinburg paper etc.

- Intel P-state
  - ARM?

How to read MSRs

```
uint64_t msr_value;
/* Haswell: units of 61 microjoules */
/* MSR_PKG_ENERGY_STATUS is at address 0x611 */

double energy_units = pow(0.5, 14);
int fd = open("/dev/cpu/0/msr", O_RDONLY);
if (fd < 0) {
    perror("open");
    return -1;
}
if (pread(fd, &msr_value, 8, 0x611) < 0) {
    perror("pread");
    return -1;
}
double energy = msr_value * energy_units;
printf("%f\n", energy);
task SKELETONS;
```

DAE architectures discuss ??

First paper in this series to tackle dvfs v

DAE{compiler-side}

Decoupling the access and execute pha

is better than running in coupled way . a

DAE under min-max freq selection or

optimal EDP corresponding to min edp

better than coupled

Turning off/on HW prefetcher actually eff

results

Lesser pipeline stalls in execute phase

Sometimes total stalls are lesser as

compared to CAF



# Important Tools

- Tools for performance analysis
  - Perf
  - PAPI
  - Likwid
  - CpuFreq and Cpupower
- Hardware interfaces
  - Intel P-state
  - Intel RAPL



# Energy Efficient Compilation

- Explain plots from jack dongra's paper
- This will help in explaining the aim/relevance of this problem
- Plots clearly highlights need of energy conservation methods for mem-bound kernels
- Also they show effect of core frequency accordingly

## Compute Bound(dgemm)



## Mem Bound(dgemm)



# Experimental Setup

## Workstation/Desktop

- Intel RocketLake S
- Two desktops
  - 6 physical, 1 virtual
- Tools: cpupov
- Made PAPI-7



## Embedded Board

- 2 ARM-64 (Raspi 4)
- Quad core Cpu
- Tools: cpupov

# Power Plane-0 Energy MSR Overflow Detection:

- Both Intel
  - Has energy
  - Can overflow
  - MSR
- Rocketlake
  - Power plane 0
  - Power plane 1
- Rocketlake
  - Power plane 0
  - Power plane 1



samples per sec

```
5 262142 , 262142,1679819196  
5 262143 , 262143,1679819196  
5 262143 , 262143,1679819196  
0 , 0,1679819196  
0 , 0,1679819196
```

time stamps

```
262143,262143,1679833190  
262143,262143,1679833190  
0,0,1679833190  
0,0,1679833190
```

Can mention methodology used to capture this fact?

# Experiments: Rocket Lake

- Two same configuration machines are not comparable for energy measurements



Dvfstest1



Dvfstest2

## Experiments - Rocket Lake



} DVFS techniques



Loss due to Intel P\_state

%EDP lost-Intel P\_state

226.56

Line Plot

-200

# Precautions taken for experiments

- Race to Halt, Intel Speed Shift, Thermal Efficiency Boost disabled
- Enforcing Sleep to maintain similar Temp.
- Better Schedule {Interleaving of Kernel Execution}
- Data Cache Flushing
- IsolCpus{kernel} & taskset
- Disable H/W prefetchers
- Kernel-execution, sampling [Core-Freq ,Core-temp] as standalone threads
- Low sampling frequency to keep core-utilization low

# Key Observations

- DVFS modell



# Pos

- 
- 



# **IR2Vec as a Pip-installable Python Package**

# Contributions & Future Work

- DVFS using compilers and hardware techniques
  - Related work: H/W, S/W and related power optimizations
  - Experimental Setup:
    - Intel
    - ARM
  - MSR overflow detection
  - Core-temperature relation with Core-frequency
  - Study of Intel P-state w.r.t static *dvfs* techniques
- IR2Vec as a pip installable package
- GeMS: Presented in IMPACT 2023
- Future Work
  - Exploration of techniques apart from DVFS towards Energy optimizations

| PolyBench-Kernel | Optimal EDP Userspace (Js) | EDP HWP Enabled (Js) |
|------------------|----------------------------|----------------------|
| Jacobi-2d        | 216727.53                  | 216190.41            |
| Seidel-2d        | 641133.54                  | 624895.51            |
| gemm             | 60970.99                   | 60644.51             |
| atax             | 0.0087 <b>THANKS</b>       | 0.0086               |
| bicg             | 0.013                      | 0.014                |
| jacobi-1d        | 0.0046                     | 0.0044               |

| MONTH/TASK      |                                        | JAN                                                                                                 | FEB                                  | MARCH                                                                                                                                                                    | APRIL                                                                                                                                                      | MAY |
|-----------------|----------------------------------------|-----------------------------------------------------------------------------------------------------|--------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Literacy Survey | CARTAD,<br>ROOFLINE MODEL              | CARM,<br>INTEL ADVISOR                                                                              | RAPL IN ACTION,<br>PMU Driven EVENTS | ENERGY AWARE ROOFLINE,<br>COPPer                                                                                                                                         |                                                                                                                                                            |     |
| Benchmark       | Polybench                              | Polybench                                                                                           | Polybench                            | Polybench                                                                                                                                                                | mibench                                                                                                                                                    |     |
| Architecture    | Intel I5 11th Gen                      | Intel I5 11th Gen                                                                                   | Intel I5 11th Gen                    | ARM Cortex A72                                                                                                                                                           | ARM Cortex A72                                                                                                                                             |     |
| Custom Tools    | Intel Setup                            | Intel Experiments rounds                                                                            | ARM SETUP                            | Overflow-detection, Temperature Affects etc.                                                                                                                             | LLVM Pass                                                                                                                                                  |     |
| Tools           | ADVISOR,SDE,VTUNE                      | ADVISOR,SDE,VTUNE                                                                                   | PERF                                 | PERF                                                                                                                                                                     | PERF,<br>PAPI                                                                                                                                              |     |
| Results         | Exploration beyond IPC,<br>ROOFLINE+OI | Data collection,understanding, improved ways of experiments<br>Measured current,voltage,Energy,E DP | ARM vs Intel                         | Temperature measurement & its effects .<br>Understanding "joint play" of Arch. & Compiler Techn.<br>Role of Power-Capping and specific "problem" on intel DVFS via "OI". | Static OI estimation using FLops LLVM PASS & BULLSEYE, Issues with PAPI and perf, Discussion of modelling technique & accordingly studying few more papers |     |

# ARM (64-Raspberry-Pi4)

- Discuss Communication framework which is used to get the plots
- Maybe explain it using DFA

# AGL Builds (Comparative Study)

## Timeline

