



Berkeley Architecture Research

# Activity-based Power Modeling for Arbitrary RTL with Automatic Signal Selection

Donggyu Kim, Vighnesh Iyer

EE241B Spring 2017

5/1/2017



# Simple Analytic Power Model

$$\begin{aligned} P_{total} &= P_{dyn} + P_{leak} \\ &= \alpha C_L V_{DD}^2 + I_{leak} V_{DD} \end{aligned}$$

CAD tools can statically determine $C_L$, $V_{DD}$, and $I_{leak}$.

But what is the activity factor $\alpha$ of an arbitrary design?





# In Fact, Dynamic Power Is

$$P_{dyn} = \frac{1}{2} V_{DD}^2 \left( \sum_{i \in \text{all signals}} C_i D_i \right) \quad \text{[1]}$$

$C_i$: capacitance driven by signal $i$

$D_i$: toggles per cycle of signal $i$

We hope that a small subset of the signals is enough:

$$P_{dyn} \approx \frac{1}{2} V_{DD}^2 \left( \sum_{k \in \text{some signals}} C_k D_k \right)$$

[1] Najm. "A Survey of Power Estimation Techniques in VLSI Circuits", IEEE Trans on VLSI, 1994
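As a rough illustration of the sum above and of the hoped-for approximation, the following Python sketch evaluates the expression over all signals and over a hand-picked subset. The signal names, capacitances, and toggle rates are hypothetical values chosen only for illustration; the result is in the slide's per-cycle units.

```python
def dynamic_power(vdd, caps, toggles, signals=None):
    """Return 0.5 * vdd^2 * sum(C_i * D_i) over the chosen signals.

    caps and toggles map a signal name to C_i (farads) and D_i (toggles
    per cycle). With signals=None every signal is summed (exact expression).
    """
    names = caps.keys() if signals is None else signals
    return 0.5 * vdd ** 2 * sum(caps[n] * toggles[n] for n in names)


# Hypothetical numbers, chosen only to illustrate the approximation.
caps = {"alu_out": 40e-15, "pc": 25e-15, "stall": 2e-15, "dbg": 1e-15}
toggles = {"alu_out": 0.30, "pc": 0.50, "stall": 0.05, "dbg": 0.01}

exact = dynamic_power(1.0, caps, toggles)
approx = dynamic_power(1.0, caps, toggles, signals=["alu_out", "pc"])
coverage = approx / exact  # fraction of the exact sum captured by the subset
```

In this made-up case two of the four signals account for almost all of the sum, which is the situation the approximation relies on.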



# What Are *Some Signals*?

- Architects' Approach
  - Intuitively select signals / events

$$\frac{1}{2} V_{DD}^2 \left( \sum_{k \in \text{some signals}} C_k D_k \right)$$



- How to get $C_k$?
  - Existing hardware: regression with event counters [1]
  - RTL designs: regression with signal activities [2]
  - Pre-RTL: analytic modeling [3]
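The regression idea can be sketched as an ordinary least-squares fit: given toggle rates $D_k$ of the selected signals over several windows and a reference power for each window, solve for one weight per signal. This is a minimal pure-Python sketch, not the method of any cited paper; the function names and the synthetic data are assumptions. Each fitted weight absorbs the $\frac{1}{2} V_{DD}^2 C_k$ factor.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(activity, power):
    """Least-squares fit of power ~ w0 + sum_k w_k * D_k.

    activity: one row of toggle rates D_k per training window.
    power:    reference power for each window (e.g. from a gate-level tool).
    """
    rows = [[1.0] + list(r) for r in activity]  # leading 1 -> constant term
    n = len(rows[0])
    # Normal equations: (A^T A) w = A^T p
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * p for r, p in zip(rows, power)) for i in range(n)]
    return solve(ata, atb)


# Synthetic data generated from known weights (0.5, 2.0, 3.0).
activity = [[0.1, 0.4], [0.3, 0.2], [0.5, 0.5], [0.2, 0.9], [0.8, 0.1]]
power = [0.5 + 2.0 * a + 3.0 * b for a, b in activity]
weights = fit_weights(activity, power)
```

Because the synthetic data is noise-free, the fit recovers the generating weights up to rounding; real reference power would leave a residual.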

[1] Isci et al. "Runtime power monitoring in high-end processors: methodology and empirical data", MICRO 2003

[2] Sunwoo et al. "PrEsto: An FPGA-accelerated Power Estimation Methodology for Complex Systems", FPL 2010

[3] Li et al. "McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures", MICRO 2009



# What About Your Novel Designs?

- Signal selection is not easy
  - Processors: $\sim O(100)$ papers
- Collecting all signal activities is extremely expensive
  - Can we boot Linux on gate-level simulation?
- Even existing macro models are expensive [1] [2]
  - Can we run SPEC benchmarks on RTL software simulation?
  - There are still too many signals to watch on FPGA



[1] Gupta et al. "Power modeling for high-level power estimation", IEEE Trans on VLSI, 2000

[2] Bogliolo et al. "Regression-based RTL power modeling", ACM Trans on Design Automation of Electronic Systems, 2000



# What Signals for Power Modeling?

- Observation
  - ***State*** determines the signal activities of the whole system.
  - ***A small number of control signals and data busses*** mainly determine the next state of the system.





## Example: GCD

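```scala
import chisel3._

// GCDIO is not shown on the slide; from its use below it declares
// inputs a, b (operands) and e (load enable), and outputs z and v.
class GCD extends Module {
    val io = IO(new GCDIO)
    val x  = Reg(UInt())
    val y  = Reg(UInt())
    // Subtract the smaller register from the larger one each cycle.
    // (.otherwise replaces `unless`, which current Chisel no longer provides.)
    when (x > y) { x := x - y } .otherwise { y := y - x }
    // Loading new operands takes priority: the last connection wins.
    when (io.e) { x := io.a; y := io.b }
    io.z := x
    io.v := y === 0.U  // result is valid once y reaches zero
}
```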
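To see how the state registers drive activity in this example, the following Python sketch steps a software model of the same datapath and counts bit flips in `x` and `y`. It is a behavioral illustration only, not the instrumentation used in the project; the function names are hypothetical and positive operands are assumed.

```python
def toggles(old, new):
    """Number of bit flips between two register values (Hamming distance)."""
    return bin(old ^ new).count("1")


def gcd_activity(a, b):
    """Step the GCD datapath until y == 0.

    Returns (result, cycles, x_toggles, y_toggles). Assumes a > 0 and b > 0.
    """
    x, y = a, b  # state right after the load cycle (io.e asserted)
    cycles = x_tog = y_tog = 0
    while y != 0:
        if x > y:
            nx, ny = x - y, y
        else:
            nx, ny = x, y - x
        x_tog += toggles(x, nx)
        y_tog += toggles(y, ny)
        x, y = nx, ny
        cycles += 1
    return x, cycles, x_tog, y_tog


result, cycles, x_tog, y_tog = gcd_activity(12, 8)
```

Different operand pairs take different paths through the `(x, y)` state space, so their toggle counts differ even though the logic is the same.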



# Power Training Tool Flow





# Results



- 20 sets for training, 20 sets for testing
- GCD, Parity: quadratic models
- Stack, Risc, RiscSRAM: cubic models

| Design                                | GCD  | Parity | Stack | Risc | RiscSRAM |
|---------------------------------------|------|--------|-------|------|----------|
| Number of Selected Signals and Busses | 7    | 3      | 8     | 11   | 15       |
| Normalized RMS error (%)              | 29.3 | 1.9    | 31.9  | 4    | 10       |
| Average error (%)                     | 5.1  | 0.9    | 26.9  | 2.3  | 6.5      |
| Maximum error (%)                     | 16.5 | 2.4    | 153.7 | 7.4  | 20.5     |
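The slides do not define the three error rows, so the sketch below uses assumed definitions: normalized RMS error as the RMS residual divided by the mean reference power, and average and maximum error as statistics of the per-sample relative error. The function name and sample values are hypothetical.

```python
import math


def error_metrics(predicted, reference):
    """Return (normalized RMS, average, maximum) error in percent.

    Assumed definitions: NRMSE is the RMS of the residuals divided by the
    mean reference power; average and maximum are taken over the
    per-sample relative errors |pred - ref| / ref.
    """
    n = len(reference)
    rms = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
    nrmse = rms / (sum(reference) / n)
    rel = [abs(p - r) / r for p, r in zip(predicted, reference)]
    return 100 * nrmse, 100 * sum(rel) / n, 100 * max(rel)


# Made-up predictions against made-up reference values.
nrmse, avg, worst = error_metrics([1.1, 1.8, 3.0], [1.0, 2.0, 3.0])
```

Under these definitions the normalized RMS error can exceed the maximum relative error when the largest residuals fall on samples well above the mean power.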



# How Many Signals for RISCV-mini?

- 3-stage pipeline, caches, exceptions



- 108 signals / busses
- 50% duplicates



# Power Modeling for Complex Combinational Logic Blocks



# Power Macromodels for Complex Combinational Logic



- Linear power models require full instrumentation
- Macromodels reduce and average circuit stimulus
- More accurate for complex logic (ALU, SEC/DED), faster for real-time online power estimation



# Macromodel Variables



- 4-variable macromodel proposed [1]
  - $P_{in}$: average input signal probability
  - $D_{in}$: average input transition density
  - $SC_{in}$: average input spatial correlation coefficient
  - $D_{out}$: average output transition density

- Provides sufficient predictive power for estimation
- Easy to implement for FPGA sample capture

[1] Gupta et al. "Power modeling for high-level power estimation", IEEE Trans on VLSI, 2000
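A minimal sketch of how the macromodel variables could be computed from a captured trace of bit vectors. The definition of the spatial correlation term used here (probability that two distinct bits are both 1, averaged over all bit pairs) is an assumption, and the function name and sample trace are hypothetical.

```python
from itertools import combinations


def macromodel_stats(vectors):
    """Return (P, D, SC) for a sequence of equal-width bit vectors.

    P:  fraction of bits that are 1, averaged over all bits and cycles.
    D:  fraction of bits that toggle between consecutive cycles.
    SC: probability that two distinct bits are both 1, averaged over all
        bit pairs and cycles (assumed definition).
    Needs at least two cycles and at least two bits per vector.
    """
    width, cycles = len(vectors[0]), len(vectors)
    p = sum(sum(v) for v in vectors) / (width * cycles)
    flips = sum(a != b for u, v in zip(vectors, vectors[1:]) for a, b in zip(u, v))
    d = flips / (width * (cycles - 1))
    pairs = list(combinations(range(width), 2))
    sc = sum(v[i] & v[j] for v in vectors for i, j in pairs) / (len(pairs) * cycles)
    return p, d, sc


# Four cycles of a 3-bit input bus (made-up trace).
inputs = [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 0)]
p_in, d_in, sc_in = macromodel_stats(inputs)
```

Applying the same function to the output trace and taking its second element gives $D_{out}$. All three statistics are running sums, which is consistent with the point above that they are cheap to capture.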



# How Variables Track Power





# How Variables Track Power: Our Experimental Observations



c6288: 16x16 multiplier



c7552: 32-bit adder



# Methodology



- Check the feasibility and accuracy of the macromodel for our process, including glitching power
- Generate training and test sequences
  - Build the power model from the training set
  - Predict (interpolate) the test set and measure the error
- Use PrimeTime PX for gate-level power estimates
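The slides do not say how the training and test sequences were generated. One common way to hit a target signal probability $p$ and transition density $d$ is a two-state Markov chain per input bit, sketched below; the function name and parameters are hypothetical.

```python
import random


def bit_stream(p, d, length, rng):
    """Generate a 0/1 sequence with signal probability p and transition density d.

    Two-state Markov chain: P(1->0) = d / (2p), P(0->1) = d / (2(1-p)).
    Requires 0 < p < 1 and d <= 2 * min(p, 1 - p).
    """
    if not (0 < p < 1 and 0 <= d <= 2 * min(p, 1 - p)):
        raise ValueError("infeasible (p, d) pair")
    fall, rise = d / (2 * p), d / (2 * (1 - p))
    bit = 1 if rng.random() < p else 0
    out = [bit]
    for _ in range(length - 1):
        if rng.random() < (fall if bit else rise):
            bit ^= 1  # toggle with the state-dependent probability
        out.append(bit)
    return out


rng = random.Random(0)  # seeded for repeatability
stream = bit_stream(0.3, 0.2, 50000, rng)
p_hat = sum(stream) / len(stream)
d_hat = sum(a != b for a, b in zip(stream, stream[1:])) / (len(stream) - 1)
```

The chain's stationary probability of a 1 is $p$ and its expected toggle rate is $d$, so long sequences approach both targets. Independent bits generated this way carry no spatial correlation, so sweeping $SC_{in}$ would need a different generator.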





# Types of Models



- 4D table: a map from ($P_{in}$, $D_{in}$, $SC_{in}$, $D_{out}$) tuples to a power estimate
  - How do we extrapolate beyond the convex hull? Are sharp discontinuities an issue?
- Continuous models, linear in their coefficients (linear, quadratic, cubic terms)
  - constant / linear / cross / triple-cross / self-squared / etc. coefficients
  - easy to construct (least squares) and to evaluate (matrix multiply)
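To make the continuous models concrete, the sketch below expands the four macromodel variables into all monomials up to a chosen degree and evaluates a model as a dot product with a coefficient vector. It illustrates the general polynomial form and is not taken from the project's code; the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def poly_features(x, degree):
    """All monomials of the entries of x up to the given degree, constant first."""
    feats = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            term = 1.0
            for i in combo:
                term *= x[i]
            feats.append(term)
    return feats


def predict(coeffs, x, degree):
    """Evaluate the model: dot product of coefficients and features."""
    return sum(c * f for c, f in zip(coeffs, poly_features(x, degree)))


# (P_in, D_in, SC_in, D_out) for one sample; values are made up.
sample = [0.5, 0.2, 0.3, 0.1]
n_quadratic = len(poly_features(sample, 2))
n_cubic = len(poly_features(sample, 3))
```

With four variables the quadratic model has 15 coefficients and the cubic model 35, so fitting needs at least that many training points to be well determined.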





# Results



- Why does the cubic model perform better than the 4D table?



# Results



- How can we reduce max error? Is the distribution of error more important than max error?



# Future Work

- Realistic Target Designs
  - Rocket-chip, Hwacha, BOOM
- Train & Test Inputs
  - Assembly tests / $\mu$benchmarks
  - Random instruction streams
  - Real applications through Strober [1]
- FPGA Simulation
  - Automatic activity counter instrumentation

[1] Kim et al. "Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL", ISCA 2016



# Q & A

