



# Introduction to Low-Power Design Techniques for Digital Systems

Prof. Chien-Nan Liu  
Institute of Electronics  
National Chiao-Tung Univ.

Tel: (03)5712121 ext:31211  
E-mail: jimmyliu@nctu.edu.tw  
<http://www.ee.ncu.edu.tw/~jimmy>

## Heterogeneous Integration

Advisory Office, Ministry of Education  
SoC Talent Cultivation Program



## Outline

- **Introduction**
- **Multiple Power Domains**
- **Dynamic Voltage and Frequency Scaling**
- **Clock Gating**
- **Dual Threshold Voltages**
- **Power Gating**
- **Conclusion**


# Motivation

- Demand for portability is increasing
  - Longer battery life with increased functionality
- Power-related issues are becoming serious
  - Heat dissipation requirements (power density approaching that of a rocket nozzle?)
  - Electromigration
  - Power/ground bounce due to IR drop
- Power is as important as performance
  - Should be considered during the design phase



Source: Courtesy, Intel

## Serious Power-Related Issues

IC / ASIC Designs Requiring Re-Spins by Flaw



Source: Courtesy, Synopsys

# Where is Power Consumed?

- **Dynamic power (switching power)**

- Power consumed when the output of a gate changes value
- Quadratically dependent on voltage

$$P = C_L V_{dd}^2 f$$

$C_L$: load capacitance; $V_{dd}$: supply voltage; $f$: transition frequency



- **Static power (leakage power)**: consumed whenever the supply is connected

- Power consumed by each element at all times
  - From several sources
- Grows exponentially as the threshold voltage is reduced
- Increases as transistor size shrinks
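The quadratic dependence of dynamic power on $V_{dd}$ can be checked numerically. The sketch below simply evaluates $P = C_L V_{dd}^2 f$; the capacitance, voltage, and frequency values are illustrative assumptions, not figures from these slides.

```python
def dynamic_power(c_load, vdd, freq):
    """Dynamic (switching) power P = C_L * Vdd^2 * f, in watts."""
    return c_load * vdd ** 2 * freq


# Illustrative values (assumptions): 1 nF of total switched capacitance, 100 MHz.
C_LOAD = 1e-9   # farads
FREQ = 100e6    # hertz

p_nominal = dynamic_power(C_LOAD, 1.2, FREQ)  # nominal supply
p_lowered = dynamic_power(C_LOAD, 0.9, FREQ)  # supply lowered by 25%

# A 25% lower supply leaves (0.9 / 1.2)^2 = 56.25% of the dynamic power.
ratio = p_lowered / p_nominal
print(f"{p_nominal:.3f} W -> {p_lowered:.3f} W (ratio {ratio:.4f})")
```

Leakage power is not covered by this formula and has to be estimated separately.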




## Power Vs. Energy

$$\text{Power} = \frac{\text{Energy}}{T}$$



- The two approaches require the same energy
- A lower-power design could simply be the same design slowed down?


# Power Saving Opportunities

The earlier power is optimized, the larger the fraction that can be saved,  
but the earlier the design stage, the harder it is to estimate power.

- A power-conscious design methodology addresses power at every level of the design hierarchy
- Applying low-power techniques at higher design levels can reduce more power
- These slides focus on low-power techniques at RTL and higher design levels



Ref: [1]

# Power Reduction Techniques

- Reduce supply voltage ( $V_{dd} \downarrow$ )
  - Multiple power domains: partition the circuit into blocks and supply each with its own voltage
    - Lower  $V_{dd}$  value for non-critical blocks
- Reduce signal transitions ( $f \downarrow$ )
  - Dynamic voltage and frequency scaling (DVFS)
    - Reduce frequency for lightly loaded blocks
  - Clock gating
    - Freeze the clock signal
- Use high-Vt cells for lower leakage power (leakage  $\downarrow$ )
  - Dual threshold voltages: use low-Vt cells where speed matters and high-Vt cells where low leakage matters
    - Replace the cells on non-critical paths with high-Vt cells
- Shut down the power supply when not in use ( $V_{dd} \downarrow$ , leakage  $\downarrow$ )
  - Power gating (MTCMOS): cut the supply to blocks that are not in use


# Outline

---

- Introduction
- **Multiple Power Domains**
- Dynamic Voltage and Frequency Scaling
- Clock Gating
- Dual Threshold Voltages
- Power Gating
- Conclusion


## Multiple Power Domain Concept

---

- Different components run with different voltages
- Also known as multi-supply-voltage (MSV) designs



Source: © 2008 IBM Corporation

# Advantages of Multiple Power Domains

- Since dynamic power is proportional to  $V_{dd}^2$ , lowering  $V_{dd}$  on selected blocks helps reduce power significantly.
- Unfortunately, lowering the voltage also increases the delay of the gates in the design
- Level shifters are used to propagate signals between different domains: with several voltage regions, they convert the signal voltage so that the regions can communicate correctly




# Basic MSV Design Principles

- Adopt multiple supply voltages (MSV) to trade off performance against power savings
- Assign a higher VDD to timing-critical cells for timing optimization
  - GOOD for performance
  - BAD for power consumption
- Assign a lower VDD to non-timing-critical cells for power saving
  - BAD for performance
  - GOOD for power consumption


# Physical Implementation Requirements

- Devices with the same supply voltage are placed together
  - They form a “voltage island”: during placement, cells of the same voltage domain are kept together as far as possible, which adds placement constraints
- Creation of voltage islands
  - During high-level synthesis or physical synthesis (the islands can already be created at high-level synthesis)
- Physical implementation
  - Floorplanning of voltage islands
  - Power network planning
- As few voltage levels as possible
  - Reduces the types of level shifters and eases the power/ground distribution
  - A level shifter itself occupies area and consumes power
  - A level shifter takes time to switch → increases delay
  - More supply voltages also need more level shifters, so the area and power of the level shifters themselves must be weighed against the savings



Three supply voltages and five voltage islands


## Low-to-High Level Shifter – Case 1

- A 1 V swing may not reach the switching threshold in the 3 V domain  
→ Add level shifters

If a logic "1" from the low-voltage domain is sent directly into the high-voltage domain, it may be interpreted as a "0".


# Low-to-High Level Shifter – Case 1

- “Up-shifting” level converters require two supply rails and share a common ground




# Low-to-High Level Shifter – Case 2

- A 0.9V signal driving a 1.2V gate will turn on both the NMOS and PMOS networks, causing short-circuit currents



Even if the low-domain "1" is read as a "1" in the high-voltage domain,  
it may not be high enough to turn off the PMOS in the high-voltage domain,  
which causes short-circuit current.




## Low-to-High Level Shifter – Case 2

- A low voltage swing input signal would not necessarily be strong enough to turn the NMOS input transistor fully on.
  - Internal signal cannot be discharged to 0V




## High-to-Low Level Shifter

- A CMOS gate input can be driven above its power supply voltage without problems, up to the gate breakdown voltage
  - An additional level shifter is not required

A high-domain "1" driven into the low-voltage domain could damage the devices there,  
but when the voltage difference is small, a high-to-low level shifter is usually omitted.
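The level-shifter cases above can be summarized as a rule of thumb for each domain crossing. The following is a minimal sketch only: the function name, the input-high threshold fraction (0.7 of the receiver's VDD), and the tolerated overdrive (0.5 V) are hypothetical values chosen for illustration. In a real flow these limits come from the characterized cell library.

```python
def crossing_requirement(v_src, v_dst, vih_fraction=0.7, max_overdrive=0.5):
    """Classify a signal crossing from a v_src domain into a v_dst domain.

    vih_fraction and max_overdrive are illustrative assumptions, not library data.
    """
    if v_src < v_dst:
        if v_src < vih_fraction * v_dst:
            # Case 1: the driver's "1" may not even be read as a "1".
            return "level shifter required (logic level not reached)"
        # Case 2: read as "1", but the receiver PMOS is not fully off.
        return "level shifter required (short-circuit current)"
    if v_src - v_dst > max_overdrive:
        # A large overdrive approaches the gate breakdown limit.
        return "level shifter recommended (overdrive too large)"
    # Equal or slightly higher driver voltage: CMOS inputs tolerate it.
    return "no level shifter needed"


for src, dst in [(1.0, 3.0), (0.9, 1.2), (1.2, 0.9), (1.0, 1.0)]:
    print(f"{src} V -> {dst} V: {crossing_requirement(src, dst)}")
```

With these assumed thresholds, the 1 V to 3 V crossing falls under Case 1 and the 0.9 V to 1.2 V crossing under Case 2, matching the two slide examples.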




# Timing Issues

- Gate delay is different with different supply voltages
  - Clock arrival time may be quite different, too
  - Level shifters will induce extra delay as well
  - Timing analysis becomes more complex
- Optimization with accurate timing analysis




# Other Issues in MSV Approach

- Level shifter design
  - Reduce its power, area, and delay as far as possible
- Library characterization
  - Supporting every possible supply voltage in a library requires too much characterization effort
  - When a cell runs at a voltage that is not characterized in the library, timing analysis becomes much more complex
- Floorplanning, power planning, power grids
  - Voltage-island awareness is a new consideration in physical design; partitioning the floorplan into voltage islands is itself not easy
  - Reducing the number of required level shifters is an additional target
- Power-up and power-down sequencing
  - A pre-designed sequence for powering up the design may be required to avoid deadlock: the order in which the voltage blocks are switched on and off matters


# Outline

---

- Introduction
- Multiple Power Domains
- **Dynamic Voltage and Frequency Scaling**
- Clock Gating
- Dual Threshold Voltages
- Power Gating
- Conclusion


## Dynamic Voltage & Frequency Scaling

---

- Dynamically control the voltage and frequency
  - Increase the voltage when high performance is required
  - Reduce the voltage when it is not needed
  - Unlike voltage islands, DVFS adjusts voltage and frequency at run time instead of partitioning the design into fixed voltage domains
- DVFS is a method to reduce the energy for a task by scaling the operating voltage/frequency
  - Power consumption of a CMOS-based circuit is

$$P = \alpha \cdot C \cdot V^2 \cdot f$$

*$\alpha$: switching factor  
$C$: effective capacitance  
$V$: operating voltage  
$f$: operating frequency*

  - Energy required to run a task during $T$ is

$$E = P \cdot T \propto V^2$$

Assuming  $V \propto f$ ,  $T \propto f^{-1}$

- By lowering the CPU frequency (and the voltage with it), CPU energy can be reduced
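The proportionality $E \propto V^2$ follows in one step once the execution time is written in terms of the workload. The symbol $W$ (the total number of clock cycles of the task, so that $T = W / f$) is introduced here for the derivation:

$$E = P \cdot T = \left(\alpha \cdot C \cdot V^2 \cdot f\right) \cdot \frac{W}{f} = \alpha \cdot C \cdot V^2 \cdot W \propto V^2$$

The frequency cancels, so slowing the clock alone does not change the energy of the task. The saving comes from the lower voltage that the lower frequency permits, which is where the assumption $V \propto f$ enters.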


# Energy Reduction with DVFS

- Ex: a task with workload W should be completed by a deadline, D



- DVFS reduces the CPU energy by providing “just enough” computation power; estimating the operating point that lets the CPU finish exactly on time is difficult in practice


## Choosing a Frequency in DVFS

- Workload of a task,  $W_{task}$ , is defined as the total number of CPU clock cycles required to finish the task.

$$W_{task} = \sum_{i=1}^{N} CPI_i$$

*$N$: total number of instructions in a task; CPI: clock cycles per instruction*

- Task execution time,  $T_{task}$ , is a function of the CPU frequency,  $f_{CPU}$

$$T_{task} = \frac{W_{task}}{f_{CPU}}$$

- Lowering the voltage increases gate delay, so logic that is still clocked at the original speed may capture wrong values → the voltage cannot be lowered alone; the frequency must be adjusted as well

- Given a deadline $D$,  $f_{target}$  denotes the CPU frequency that results in  $T_{task}$  closest to $D$

$$f_{target} = \frac{W_{task}}{D} \Rightarrow T_{task} = D$$
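The formulas for $W_{task}$ and $f_{target}$ can be turned into a small frequency-selection routine. This is a sketch under stated assumptions: the task (instruction count, CPI values, deadline) and the list of supported frequencies are hypothetical, and a real governor would also have to estimate the workload at run time.

```python
def workload_cycles(cpi_per_instruction):
    """W_task: total clock cycles, the sum of CPI over all instructions."""
    return sum(cpi_per_instruction)


def target_frequency(w_task, deadline):
    """f_target = W_task / D, the frequency at which T_task equals D."""
    return w_task / deadline


def pick_operating_point(f_target, supported_freqs):
    """Lowest supported frequency that still meets the deadline."""
    candidates = [f for f in supported_freqs if f >= f_target]
    if not candidates:
        raise ValueError("deadline cannot be met at any supported frequency")
    return min(candidates)


# Hypothetical task: 2 million instructions with an average CPI of 1.5,
# and a 10 ms deadline.
cpi = [1, 2] * 1_000_000
w_task = workload_cycles(cpi)                # 3,000,000 cycles
f_target = target_frequency(w_task, 10e-3)   # about 300 MHz

# Hypothetical operating points offered by the clock generator (Hz).
supported = [100e6, 200e6, 400e6, 800e6]
f_chosen = pick_operating_point(f_target, supported)
print(f"f_target = {f_target / 1e6:.0f} MHz, chosen = {f_chosen / 1e6:.0f} MHz")
```

Because only discrete operating points exist, the chosen frequency is rounded up to the next supported value, so the task finishes somewhat before the deadline.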


# Dynamic Operating Voltage Range

- The best operating region is the range over which frequency increases monotonically with voltage
  - This ensures that the circuits operate reliably and that the path delays vary monotonically




# Dynamic Energy Dissipation

- Energy is the integral of power over the time taken to complete a task
  - Lowering the frequency alone reduces power, but the total energy stays the same
- Lowering the supply voltage together with the frequency reduces the energy dissipation as well




# DVFS Block Diagram

- Supply Voltage
  - The CPU subsystem is powered by a programmable power supply.
  - The rest of the chip is powered by fixed power supply
- System Clock
  - The SysClock Generator dynamically controls the PLL to generate the required clock signals at different frequencies



Ref: [2]

To support DVFS, the circuit must include a block that can generate clocks at different frequencies.

# Minimum Operating Point

- Decide the minimum CPU clock speed that meets the workload requirements
- Decide the lowest supply voltage that will support that clock speed



Ref:[2]


# DVFS Operation ( $F_{NEW} > F_{OLD}$ )

- The target clock is higher than the current clock

- Supply voltage increases first, until it can support the higher frequency
- Clock frequency then increases




# DVFS Operation ( $F_{NEW} < F_{OLD}$ )

- The target clock is lower than the current clock

- Clock frequency decreases first; lowering the voltage first would run the high frequency at a low voltage, and the design could produce wrong values
- Supply voltage then decreases
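The ordering rules for the two DVFS transitions can be sketched as a small sequencing function. The operating-point table, the step names, and the helper `is_safe` are hypothetical, introduced only to illustrate that the voltage must support the frequency at every intermediate state.

```python
def dvfs_transition(f_old, f_new, voltage_for):
    """Return the ordered steps for moving from f_old to f_new.

    voltage_for maps each frequency to the minimum voltage that supports it.
    """
    v_new = voltage_for[f_new]
    if f_new > f_old:
        # Speeding up: raise the voltage first, then the frequency.
        return [("set_voltage", v_new), ("set_frequency", f_new)]
    if f_new < f_old:
        # Slowing down: lower the frequency first, then the voltage.
        return [("set_frequency", f_new), ("set_voltage", v_new)]
    return []


def is_safe(f_old, steps, voltage_for):
    """Check that the voltage supports the frequency after every step."""
    freq, volt = f_old, voltage_for[f_old]
    for action, value in steps:
        if action == "set_voltage":
            volt = value
        else:
            freq = value
        if volt < voltage_for[freq]:
            return False
    return True


# Hypothetical operating-point table: frequency (MHz) -> voltage (V).
TABLE = {200: 0.9, 400: 1.0, 800: 1.2}

up = dvfs_transition(200, 800, TABLE)
down = dvfs_transition(800, 400, TABLE)
print(up, is_safe(200, up, TABLE))
print(down, is_safe(800, down, TABLE))
```

Reversing either sequence makes `is_safe` fail, because the design would briefly run a high frequency at a voltage that cannot support it.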




# DVFS Implementation Issues

- Determining which voltages and clock values to support
  - Too few operating points: the benefit is limited
    - A significant amount of time is spent ramping between two levels
    - The energy-saving efficiency during the ramping times is lower than at the steady-state values
  - Too many operating points: much of the time goes into switching between states
    - The power supply will spend most of its time “hunting” between different target voltage levels




# Clock Speed/Supply Voltage Values

- Rules to determine the number of operating points
  - What are the appropriate clock frequencies for different workloads?
  - Which frequencies have clock periods that are multiples of the PLL period?
    - These need only a change of the clock divider, not of the PLL frequency
    - Otherwise, first check which frequencies the PLL can generate, then design the DVFS operating points around them
  - What voltage is required to support each target frequency?
- Refine the selection of operating points
  - An FPGA implementation
  - High-level simulation model
    - Candidate operating points are usually tested on an FPGA prototype or with a high-level model


# Other DVFS Issues

---

- Determining the minimum voltage to meet a particular (sub)system performance level
- Achieving timing closure over a range of voltages and clock speeds
- Control sequencing (power management)
- Generate required clock speed/supply voltage values
- Verification across different situations
  - Especially the transition cases: verifying DVFS state transitions is hard, which is why a computer sometimes hangs while it is sitting in standby


## Outline

---

- Introduction
- Multiple Power Domains
- Dynamic Voltage and Frequency Scaling
- **Clock Gating**
- Dual Threshold Voltages
- Power Gating
- Conclusion


# Clock Gating (1/2)

- Up to 50% or even more of the dynamic power in a chip is consumed in the clock distribution network
  - Clock buffers have the highest toggle rate in the system
  - Typically there are lots of clock buffers in a design
  - Clock buffers often have a high driving strength to minimize clock delay
- The most common technique to reduce this power is to turn clocks off when they are not required → Clock Gating
  - No need for muxes to recirculate the data for these flip-flops

The clock network is the largest power consumer because it has the highest toggle rate.


# Clock Gating (2/2)

- Much more effective if the same gating function is applied to a large set of registers




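# How Does Clock Gating Work?

- In the original RTL, the register is updated or not depending on a variable (EN)
- The same result can be achieved by gating the clock based on the same variable

Clock gating is not obtained just by writing the enable condition in the Verilog code: that synthesizes to a mux that recirculates the data, and the clock itself is not actually gated.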
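The equivalence between the enable-based register and the gated-clock register can be illustrated with a cycle-level behavioral model. This is a sketch in Python rather than RTL, and the function names and stimulus are made up for the illustration. It shows that both versions hold the same data while the gated version delivers far fewer clock edges to the flip-flop.

```python
def run_enable_register(stimulus, q0=0):
    """RTL-style register: clocked every cycle, a mux recirculates Q when EN=0."""
    q, trace, clock_edges = q0, [], 0
    for d, en in stimulus:
        clock_edges += 1            # the flip-flop sees every clock edge
        q = d if en else q          # mux: new data or the old value
        trace.append(q)
    return trace, clock_edges


def run_gated_register(stimulus, q0=0):
    """Gated-clock register: the clock edge is delivered only when EN=1."""
    q, trace, clock_edges = q0, [], 0
    for d, en in stimulus:
        if en:
            clock_edges += 1        # gated clock pulses only when enabled
            q = d
        trace.append(q)
    return trace, clock_edges


# One (D, EN) pair per clock cycle; the values are arbitrary test data.
stimulus = [(1, 1), (0, 0), (0, 0), (1, 0), (0, 1), (1, 0)]

q_enable, edges_enable = run_enable_register(stimulus)
q_gated, edges_gated = run_gated_register(stimulus)
print(q_enable, edges_enable)
print(q_gated, edges_gated)
```

Every suppressed edge is switching activity saved in the clock pin and the clock buffers driving it, which is where the power reduction comes from.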



Ref: [2]

## Issues of Clock Gating

- Clock gating also idles the succeeding functional units
- Need logic to generate the **disable** signal
  - Increases the complexity of the control logic (an extra control signal is needed to gate the clock)
  - Timing is critical to avoid glitches at the OR gate output: a badly timed gating signal easily produces a glitch
- Additional gate delay on the clock signal
  - The gating OR gate can replace a buffer in the clock distribution tree: it acts as a buffer normally and as the clock gate when needed
- May generate an extra clock edge when **disable** is deasserted
  - This falling edge is not synchronous with the clock: (clock=0, disable=1) → 1; (clock=0, disable=0) → 0
  - Be careful about the deassertion time

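Deassertion problem:  
Take the gating OR gate as an example. While disable=1, the gated clock stays high and toggling of clk does not affect the flip-flop.  
If disable is then driven to 0 while clk=0, the flip-flop sees a falling clock edge that the real clock did not produce, which may cause errors.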
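The deassertion problem can be reproduced with a small waveform model of the OR-type clock gate. The sketch assumes four samples per clock period and ideal, zero-delay gates; the waveforms and function names are invented for the illustration.

```python
def gated_clock(clk, disable):
    """OR-type clock gate: the gated clock is held at 1 while disable=1."""
    return [c | d for c, d in zip(clk, disable)]


def edges(wave):
    """Map each time index with a transition to 'rise' or 'fall'."""
    result = {}
    for t in range(1, len(wave)):
        if wave[t - 1] == 0 and wave[t] == 1:
            result[t] = "rise"
        elif wave[t - 1] == 1 and wave[t] == 0:
            result[t] = "fall"
    return result


def spurious_edges(clk, disable):
    """Gated-clock edges that do not coincide with an edge of the real clock."""
    clk_edges = edges(clk)
    gated_edges = edges(gated_clock(clk, disable))
    return {t: kind for t, kind in gated_edges.items() if clk_edges.get(t) != kind}


# Three clock periods, four samples each: low, low, high, high.
clk = [0, 0, 1, 1] * 3

# disable released at t=5, in the middle of a clock-low phase.
disable_bad = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# disable released at t=7, while the clock is high.
disable_good = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

print(spurious_edges(clk, disable_bad))
print(spurious_edges(clk, disable_good))
```

Releasing disable while the clock is high hides the transition behind the OR gate, so the first edge the flip-flop sees is a genuine clock edge.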


# Clock Gating Levels

- **Fine Grain** (fine control of the clock of individual flip-flops)
  - Portions of the pipeline registers are disabled depending on whether the information they hold is used in the next stage
- **Medium Grain**
  - Disable cache pre-charging during a cache miss
- **Coarse Grain** (coarse control)
  - Eliminate switching of the clock's main driver
- **The appropriate level depends on the target application**


# Gated Clock Distribution

- If the paths are perfectly balanced, clock skew is zero
  - Can insert clock gating at multiple levels in clock tree
  - Can shut off entire sub-tree if all gating conditions are satisfied




# Clock Gating in the Design Flow



Ref: [3]

# Clock Gating Example



Ref: [3]

# Other Clock Gating Issues

---

- **Where and when can we gate the clock?**
  - Find suitable gating functions for many latches
- **Careful checking is needed to make sure the function is not changed**
  - Formal equivalence checking may be a solution, and is usually run after clock gating is inserted
  - Scalability of sequential equivalence checking is a problem
- **Do we really reduce power?**
  - Need better power estimation capabilities to consider clock gating effects


## Outline

---

- Introduction
- Multiple Power Domains
- Dynamic Voltage and Frequency Scaling
- Clock Gating
- **Dual Threshold Voltages**
- Power Gating
- Conclusion


# Leakage Power Reduction

- Leakage power dissipation has become a major component of total power in advanced processes
- Raising  $V_t$  reduces the leakage current
  - But it also reduces performance



SiO<sub>2</sub> Lkg : Gate Oxide Tunneling Leakage  
SD Lkg : Sub-threshold Leakage

Ref: [4]


# Dual-VT Design for Leakage Control

- “Dual VT” is a technique to minimize the total number of fast, leaky low-VT cells by deploying them only where required to meet timing
- Low-VT cells increase timing slack on critical paths, while high-VT cells elsewhere reduce the leakage power




# Dual-Threshold Voltage (Dual-VT)

- Dual Threshold Voltage CMOS (DVTCMOS)
  - Use lower threshold for devices within the critical paths
  - Use higher threshold for devices outside the critical paths

→ Decrease leakage power without performance penalty



Ref: [4]


## Mixed-Vt Gates

- Preserve the delay while decreasing the leakage
- In a NAND gate the NMOS pull-down path is longer (series transistors), so the NMOS devices use low Vt and the PMOS devices use high Vt

- $0 \rightarrow 1$ delay: output rises from GND to VDD
- $1 \rightarrow 0$ delay: output falls from VDD to GND

$$\text{delay}_{0 \rightarrow 1} < \text{delay}_{1 \rightarrow 0}$$

- Typical timing analysis considers only the maximum delay
- So the rising case can be slowed down until  $\text{delay}_{0 \rightarrow 1} = \text{delay}_{1 \rightarrow 0}$

Ref: [4]


# Example : Mixed-Vt Gates (NAND2)



Ref: [4]


## Summary of Dual-Vt CMOS

- **High-Vt cell vs. low-Vt cell**
  - A high-Vt cell has **lower leakage current** and **lower performance**
  - A low-Vt cell has **higher performance** and **higher leakage current**
- **Usually, there is a minimum performance that must be met before optimizing power**
  - First, synthesize for the highest performance, using the low-Vt library
  - Then, swap the cells on non-critical paths to low-leakage (high-Vt) cells, because their performance degradation will not affect the overall speed
- **Library support for different Vt is an extra cost**
  - Similar to preparing two different library sets
- **Mixed-Vt gates combine the advantages of DVTCMOS at the transistor level**
  - Cell libraries with Mixed-Vt gates are not popular yet
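The two-step flow in the summary (synthesize with low-Vt cells, then swap non-critical cells to high-Vt) can be sketched as a greedy loop. This is a toy model and not the algorithm of any particular synthesis tool: the cell names, delays (ps), leakage values (nW), paths, and clock period are all invented, and timing is checked by simply summing cell delays along each path.

```python
def assign_vt(cells, paths, clock_period):
    """Greedy dual-Vt assignment.

    cells: {name: {"lvt": (delay, leakage), "hvt": (delay, leakage)}}
    paths: list of timing paths, each a list of cell names.
    """
    choice = {name: "lvt" for name in cells}   # step 1: fastest library

    def meets_timing():
        return all(
            sum(cells[c][choice[c]][0] for c in path) <= clock_period
            for path in paths
        )

    if not meets_timing():
        raise ValueError("timing is not met even with all low-Vt cells")

    # Step 2: try the cells with the largest leakage saving first.
    by_saving = sorted(
        cells,
        key=lambda n: cells[n]["lvt"][1] - cells[n]["hvt"][1],
        reverse=True,
    )
    for name in by_saving:
        choice[name] = "hvt"
        if not meets_timing():
            choice[name] = "lvt"   # the swap broke timing, so undo it
    return choice


def total_leakage(cells, choice):
    return sum(cells[name][choice[name]][1] for name in cells)


# Hypothetical library data: low-Vt is 100 ps / 10 nW, high-Vt is 140 ps / 1 nW.
cells = {name: {"lvt": (100, 10), "hvt": (140, 1)} for name in "abcd"}
paths = [["a", "b", "c"], ["a", "d"]]   # the first path is the critical one

choice = assign_vt(cells, paths, clock_period=320)
print(choice, total_leakage(cells, choice))
```

In this example only cell `d`, which sits off the critical path, is swapped. The cells on the critical path keep their low-Vt versions, so the clock period is unchanged while leakage drops.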


# Outline

---

- Introduction
- Multiple Power Domains
- Dynamic Voltage and Frequency Scaling
- Clock Gating
- Dual Threshold Voltages
- **Power Gating**
- Conclusion


## Power Profiles (No Power Gating)

---

- SLEEP events initiate entry to the low power mode
- WAKE events initiate entry to active mode



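An ordinary IDLE mode lets the design stand by when there is nothing to do, but it must wake up immediately when work arrives.  
This reduces power effectively, but still not by enough.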


# Power Profiles (Power Gating)

- Power gating further reduces the standby power
- Requires a longer wake-up time on a WAKE event




# Block Diagram of Power Gating Design

- The basic components are
  - Power switching fabric
  - Isolation cell
  - Power gating controller



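The MOS device used as the power switch should be high-Vt. It carries the current of an entire block, so its resistance must be low (to avoid IR drop), which requires a large device. Leakage current is proportional to device size, so a high-Vt device is the better choice.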

## Multi-Threshold-Voltage CMOS (MTCMOS)

- High-V<sub>th</sub> power switches are controlled by the SLEEP signal
  - Low-V<sub>th</sub> logic gates are used to achieve high performance
  - Leakage power is reduced dramatically by the series-connected high-V<sub>th</sub> power switch
- Typically only a **header** or a **footer** sleep transistor is used; one of the two is sufficient
- A single sleep transistor may be shared among several logic gates




## Sleeper Transistor (MTCMOS)

### • Header (switch on the VDD side)

The internal nodes and outputs of a power-gated block collapse down to the ground rail when the switch is turned off

### • Footer (switch on the GND side)

The internal nodes and outputs all charge up to the supply rail when the switch is turned off




# Tradeoff on Sleep Transistor Sizing

- Maximize the saving in power consumption
- Minimize the impact on performance



A 2-stage pipelined 40-bit ALU (IBM)

# Granularity of Power Switches

- **Fine Grain** (every cell has its own power switch)
  - The switch is placed inside each cell
  - Better control, but larger area overhead
- **Coarse Grain** (a whole block shares its power switches)
  - The switches connect the block to the permanent power supply
  - Easier to implement, but has a performance penalty during power-up




# Arrangement of Power Switches

- **Ring Style**

- A ring of switches connects VDD to a switched or virtual VDD power mesh that covers the power-gated block; the switches simply encircle the design (presumably as a layer inside the power ring)
- Simpler power plan, but larger area overhead

- **Grid Style**

- The sleep transistors are distributed throughout the power-gated region
- Less area and performance overhead, but increased routing complexity




## Signal Isolation

- **Without a power supply, the output signals of the gated block become floating**
  - Isolation (clamp) cells are required to stop those floating signals from propagating
- **For timing consideration, the isolation transistor can be used to clamp the output signal**

By adjusting the sizing of an ordinary AND gate, it can still output 0 when one input is 0 and the other is floating.

The same applies to an OR gate, which clamps the output to 1.
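The clamping behavior of AND-type and OR-type isolation cells can be illustrated with three-valued logic, where `X` stands for a floating or unknown value. This is a purely logical sketch with made-up function names; it does not model the transistor sizing that makes a real cell tolerate a floating input.

```python
X = "X"  # floating / unknown value from the powered-down block


def and3(a, b):
    """AND over {0, 1, X}: a controlling 0 forces the output to 0."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def or3(a, b):
    """OR over {0, 1, X}: a controlling 1 forces the output to 1."""
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X


def isolate_low(signal, iso_n):
    """AND-type clamp: output held at 0 while iso_n = 0."""
    return and3(signal, iso_n)


def isolate_high(signal, iso):
    """OR-type clamp: output held at 1 while iso = 1."""
    return or3(signal, iso)


print(isolate_low(X, 0), isolate_high(X, 1))   # isolation active
print(isolate_low(X, 1), isolate_high(X, 0))   # isolation inactive
print(isolate_low(1, 1), isolate_high(0, 0))   # block powered, normal data
```

With isolation active the floating value never reaches the always-on logic. With isolation inactive the `X` passes through, which is why isolation must be asserted before the block is powered down.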




# State Retention and Restoration

---

- Before shutting down the power supply, the register states must be saved for future restoration, because register contents are lost once power is cut
- To resume the operation of the gated blocks
  - The retained state must be restored when the block is powered up
- Three state retention and restoration strategies
  - A software approach that reads and writes registers
  - A scan-based approach that uses scan chains to store states off chip
  - A register-based approach that uses built-in retention registers


## Software Approach

---

- Software approach flow
  - During the power shutdown sequence, a processor runs software that reads the registers of the power-gated blocks
  - The states are stored in the processor's memory
  - During the power-up sequence, the processor reads its memory and writes the states back into the power-gated blocks
- Drawbacks
  - The bus traffic slows the power-down and power-up sequences
  - Bus conflicts can make the store/restore times non-deterministic
  - Software must be written and integrated into the system's software for handling power down and up


# Scan-Based Approach

- Reusing the manufacturing-test scan chains for state retention has no area overhead
- Scan-based approach flow
  - During power down, the register contents are shifted out as in scan testing mode and stored into the memory
  - During power up, the saved states are shifted back in through the scan chains
- Shifting the register states out and back costs extra energy, and shifting bit by bit also takes a fairly long time (though it is still faster than the software approach)




# Register-Based Approach

- Build in a “shadow” register to preserve the register state during power down and restore it at power up
  - The shadow register is always powered on for state retention
- Easier to implement, but has a larger area overhead




# Power Control Sequence (1/2)

- **To shut down power** (in this order)

- Flush through any bus or external operations in progress
- Stop the clocks in the appropriate phase to minimize leakage into the power-gated region
- Assert the isolation control signal to park all outputs in a safe condition, so that no data propagates out of the block
- Assert the state retention save condition to back up the register data
- Assert reset to the block, so that it powers up in the reset condition
- Assert the power gating control signal to power down the block




# Power Control Sequence (2/2)

- **To restore power** (in this order)

- De-assert the power gating control signal to restore power
- De-assert reset to ensure clean initialization following the gated power-up
- Assert the state retention restore condition to load the saved data back into the registers
- De-assert the isolation control signal to restore all outputs
- Restart the clocks
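The shutdown and restore sequences can be written down as ordered step lists and checked against the invariant they are meant to protect: whenever the block is unpowered, its outputs are isolated and its clocks are stopped. The step names and the checker below are hypothetical, chosen only to mirror the bullets of the two sequences.

```python
POWER_DOWN = [
    "flush_bus",
    "stop_clocks",
    "assert_isolation",
    "assert_save",
    "assert_reset",
    "assert_power_gate",
]

POWER_UP = [
    "deassert_power_gate",
    "deassert_reset",
    "assert_restore",
    "deassert_isolation",
    "restart_clocks",
]


def sequence_is_safe(steps):
    """True if outputs are isolated and clocks stopped whenever power is off."""
    powered, isolated, clocked = True, False, True
    for step in steps:
        if step == "stop_clocks":
            clocked = False
        elif step == "restart_clocks":
            clocked = True
        elif step == "assert_isolation":
            isolated = True
        elif step == "deassert_isolation":
            isolated = False
        elif step == "assert_power_gate":
            powered = False
        elif step == "deassert_power_gate":
            powered = True
        if not powered and (not isolated or clocked):
            return False
    return True


print(sequence_is_safe(POWER_DOWN + POWER_UP))
# Cutting power before isolating the outputs violates the invariant.
print(sequence_is_safe(["stop_clocks", "assert_power_gate", "assert_isolation"]))
```

A power controller is usually a small state machine that steps through these lists, and a checker of this kind is one way to catch a misordered sequence in simulation.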




# Power-On Rush Current

- There is a large current rush in the sleep-to-active transition, which can cause EM and IR-drop issues
  - Also called “in-rush current”
- Using parallel (mother-daughter) sleep transistors can reduce the rush current by spreading the charging current over more than one path
  - Optimize the ratio of the daughter and mother transistor widths
  - Schedule the turn-on times of the two switches so as to minimize the wake-up delay
- Power-up scheduling is another useful technique: the power gates of different blocks are turned on in stages
  - Schedule the turn-on times of different blocks to avoid a current rush and reduce the peak current
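The effect of power-up scheduling on peak current can be estimated with a simple model. The sketch assumes each block draws a rectangular in-rush pulse of fixed height and duration, which is a strong simplification of a real charging waveform; the block currents (mA) and durations (time steps) are invented.

```python
def peak_current(blocks, start_times, horizon):
    """Peak of the summed in-rush current.

    blocks: list of (current_mA, duration_steps) rectangular pulses.
    start_times: turn-on time step of each block.
    """
    total = [0] * horizon
    for (current, duration), start in zip(blocks, start_times):
        for t in range(start, min(start + duration, horizon)):
            total[t] += current
    return max(total)


def staggered_schedule(blocks):
    """Start each block only after the previous block's pulse has ended."""
    starts, t = [], 0
    for _, duration in blocks:
        starts.append(t)
        t += duration
    return starts


# Hypothetical blocks: (in-rush current in mA, pulse length in time steps).
blocks = [(300, 2), (200, 3), (250, 2)]

simultaneous = [0, 0, 0]
staggered = staggered_schedule(blocks)

print(peak_current(blocks, simultaneous, horizon=10))
print(staggered, peak_current(blocks, staggered, horizon=10))
```

Staggering lowers the peak from the sum of all pulses to the largest single pulse, at the cost of a longer total wake-up time (7 steps instead of 3 in this example).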




# Summary of Power Gating Approach

- Power gating can reduce the leakage power when the block is powered off (dynamic power drops as well)
  - High-VT transistors are used to limit the leakage current
- A power gating design is composed of
  - Power switch fabric
  - Isolation and retention cells
  - Power controller
- Performance degradation should be prevented, since power gating adds extra cells that affect performance
  - Performance becomes worse because of IR-drop effects in normal operation mode
  - In-rush current during power-up can cause extra supply noise issues


# Conclusion

---

- **Low-power design is a major trend in today's applications**
  - Portable devices, medical devices, bio-electronics, ...
  - Low-power techniques at higher design levels can reduce more power
- **Low-power techniques covered in these slides**
  - Reduce supply voltage ( $V_{dd}\downarrow$ )
    - Multiple power domains
  - Reduce signal transitions ( $f\downarrow$ )
    - Dynamic voltage and frequency scaling (DVFS)
    - Clock gating
  - Use high-Vt cells for lower leakage power ( $\text{leakage}\downarrow$ )
    - Dual threshold voltages, Mixed-Vt gates
  - Shut down power supply when not used ( $V_{dd}\downarrow$ ,  $\text{leakage}\downarrow$ )
    - Power gating (MTCMOS)
- **Power management is important to control those mechanisms**


---

# References

---

- [1] Anand Raghunathan, Niraj K. Jha, Sujit Dey, “High-Level Power Analysis and Optimization”, Boston: Kluwer Academic, 1998.
- [2] Michael Keating, David Flynn, Rob Aitken, Alan Gibbons, Kaijian Shi, “Low Power Methodology Manual: For System-on-Chip Design”, Berlin, Germany: Springer-Verlag, 2007.
- [3] Pokhrel, K., “Physical and Silicon Measures of Low Power Clock Gating Success: An Apple to Apple Case Study”, SNUG, 2007.
- [4] Frank Sill, Claas Cornelius, Stephan Kubisch, Dirk Timmermann, “Mixed Gates: Leakage Reduction Techniques Applied to Switches for Networks-on-Chip”, ReCoSoC, 2006.
- [5] Karen Yorav, “The Challenges of Low Power Design”, Tutorial, Haifa Verification Conference, 2008.
