

# Power-Aware Design for Next-Generation Many-Core Computing Platforms

Dr. Roberto Zafalon<sup>1</sup>

**Abstract** - We describe a comprehensive set of design techniques, applicable at different levels of abstraction, that have proven to offer great potential for power optimization in industrial embedded multi-processing platforms.

**Index Terms** - Ultra-low-power computing platform, power optimization, low-power design, leakage power, system-level energy optimization, Network-on-Chip, multi-processing architecture.

## I. Introduction

The rapid advance of nomadic computing applications, pushed by fast-evolving wireless connectivity technologies (802.11 Wireless LAN/Wi-Fi, 802.16/WiMAX metropolitan area networks, mobile DVB, etc.) and by the demand for personalized, context-aware multimedia services, is turning mobile terminals into complex multimedia platforms whose ultimate integration constraint is the maximum power budget. The applications' growing computational requirements must be met by increasingly powerful many-core SoC architectures, which must at the same time remain energy-efficient and flexible across a wide range of application contexts. Today's highly complex parallel architectures, together with very high clock frequencies even in the embedded world, have contributed to a "power crisis": historically, every 1% improvement in processor performance has come with a 3% increase in power consumption, so single-thread processors simply consume too much power (with the resulting cooling costs and reliability concerns) to remain affordable.
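The 1%-to-3% rule of thumb above compounds quickly. The following is a minimal back-of-the-envelope sketch, assuming the rule applies multiplicatively (so power grows as performance raised to k = ln 1.03 / ln 1.01, about 2.97) and comparing it with an idealized many-core alternative in which identical cores are replicated at unchanged clock and voltage on perfectly parallel work. Both assumptions, and all function names, are illustrative rather than taken from the paper.

```python
import math


def power_exponent(perf_step: float = 0.01, power_step: float = 0.03) -> float:
    """Exponent k in P ~ perf**k implied by 'each +1% performance costs +3% power',
    assuming the rule is applied multiplicatively (compounded)."""
    return math.log1p(power_step) / math.log1p(perf_step)


def single_core_power_ratio(speedup: float) -> float:
    """Power increase needed to reach `speedup` on a single core under the compounded rule."""
    return speedup ** power_exponent()


def many_core_power_ratio(speedup: float) -> float:
    """Idealized many-core case (assumption): `speedup` identical cores at the same
    clock and voltage on perfectly parallel work, so power scales linearly."""
    return speedup


if __name__ == "__main__":
    for s in (1.5, 2.0, 4.0):
        print(f"{s:.1f}x speedup: single core ~{single_core_power_ratio(s):.1f}x power, "
              f"ideal many-core ~{many_core_power_ratio(s):.1f}x power")
```

Under these assumptions, doubling single-thread performance costs roughly 7.8x the power, against 2x for the ideal many-core case; real workloads need somewhat more than the ideal because of serial code fractions and communication overhead.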

**As of today, the CMOS roadmap faces three main technology show-stoppers:**

1. Subthreshold leakage current ($I_{off}$)
2. Large process-variation spread
3. Interconnect performance and signal integrity (SI)

<sup>1</sup>European Projects Director, STMicroelectronics,  
European R&D and Public Affairs.  
Via Olivetti 2, 20041 Agrate Brianza, Italy

## II. Needs for Ultra Low Leakage Design Techniques

In the last decade, a huge effort has been invested in a wide range of design solutions that help solve the dynamic power consumption problem for many types of electronic devices, components and systems. From the 90 nm node downward, however, the traditionally dominant component of leakage power, the subthreshold current ($I_{sub}$), which grows by 5X per technology node, has begun to be challenged by the tunneling current through the gate oxide ($I_{gate}$), which, although still smaller in absolute terms, grows much faster: 500X per technology node. Many approaches that looked promising for attacking leakage on paper showed significant drawbacks and side effects at 90, 65 and 45 nm. In this session we review some of the most appealing platform design techniques and architectural strategies for addressing leakage power in real-life nanoscale Systems on Chip.

The number of Digital Processing Elements (DPEs) in embedded systems has grown rapidly over the last three years, and the required processing performance is expected to grow more than 200-fold over the next 15 years, driven by 3DTV, 3D gaming, ubiquitous navigation and autonomous driving. In this work we introduce ST Platform 2012 (P2012), a flexible yet high-performance many-core computing architecture. P2012 is a high-performance programmable accelerator whose architecture meets the requirements of next-generation SoC products at 32 nm and beyond.
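Two quantitative claims in the preceding paragraphs can be made concrete with simple arithmetic. If $I_{sub}$ grows 5X and $I_{gate}$ grows 500X per node, their ratio $I_{gate}/I_{sub}$ grows 100X per node. The sketch below assumes those rates hold unchanged from node to node and starts from a hypothetical initial ratio (the paper gives no absolute value) to count the nodes until gate tunneling overtakes subthreshold leakage; it also converts the 200-fold performance requirement over 15 years into an implied compound annual growth rate. Function names and the starting ratio are illustrative assumptions.

```python
def nodes_until_gate_dominates(initial_ratio: float,
                               sub_growth: float = 5.0,
                               gate_growth: float = 500.0) -> int:
    """Technology nodes until I_gate >= I_sub.

    initial_ratio: hypothetical I_gate / I_sub at the starting node.
    Each node multiplies the ratio by gate_growth / sub_growth (500 / 5 = 100).
    """
    if initial_ratio <= 0:
        raise ValueError("initial_ratio must be positive")
    if gate_growth <= sub_growth and initial_ratio < 1.0:
        raise ValueError("gate leakage never catches up at these growth rates")
    ratio_growth = gate_growth / sub_growth
    nodes, ratio = 0, initial_ratio
    while ratio < 1.0:
        ratio *= ratio_growth
        nodes += 1
    return nodes


def implied_annual_growth(total_factor: float = 200.0, years: float = 15.0) -> float:
    """Compound annual growth rate implied by a total_factor increase over `years`."""
    return total_factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    print("nodes until I_gate >= I_sub:", nodes_until_gate_dominates(1e-3))
    print(f"implied yearly performance growth: {implied_annual_growth():.1%}")
```

With a hypothetical starting ratio of 1/1000, gate leakage overtakes subthreshold leakage after two nodes, and a 200-fold increase over 15 years corresponds to roughly 42% more performance every year.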

The goal of P2012 is twofold:

- to provide flexibility through massive, programmable and scalable computing capability;
- to provide robust ways of dealing with increasingly stringent ultra-low-power constraints and manufacturability issues.

To achieve these two goals, we rely on the following key enablers:

- 3D stacking techniques, which maximize the efficiency of the memory hierarchy organization (i.e. L1 + L2 cache);
- application-driven specialization;
- ultra-low-power design techniques.

We believe that combining these design dimensions into a single architecture proposal offers a viable alternative to the inefficient mix of heterogeneous HW/SW components the industry uses today. P2012 aims at designing and prototyping a regular computing fabric that improves manufacturing yield while achieving high throughput and substantial energy savings, in terms of both dynamic and static power.

Organized around an efficient Network-on-Chip (NoC) communication infrastructure, P2012 can connect a large number of decoupled STxP70 symmetric multiprocessor (SMP) clusters, offering excellent flexibility, scalability and computational density. P2012 provides a software environment that lets firmware developers master the inherent complexity of parallel programming, and it is designed to give product divisions an easy migration path from both a SW and a HW viewpoint. The following figure shows the hardware template around which the platform is built.

Key solutions for achieving these goals include a range of design technologies that maximize device performance while minimizing the power budget. High-level synthesis methodologies are of course important, and careful design technology, particularly at the logic, circuit and physical design stages, is highly desirable. At the same time, rapidly growing power consumption will have a critical impact on chip packaging and cooling, not to mention battery life in mobile applications. Leakage power in particular will grow faster than typical trends predict, because of process variability and temperature effects.
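The temperature sensitivity of leakage follows from the exponential form of subthreshold conduction, $I_{sub} \propto \exp\left(-V_{th} / (n\,kT/q)\right)$: a higher temperature both raises the thermal voltage $kT/q$ and lowers $V_{th}$. The same expression shows why the dual-$V_{th}$ cells mentioned in the next paragraph are effective. The sketch below uses this simplified textbook model with placeholder parameters ($V_{th} = 0.35$ V at 300 K, a $V_{th}$ drop of 1 mV/K, slope factor $n = 1.5$); these values and the function names are assumptions for illustration, not data from any ST process.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant divided by electron charge, in V/K


def subthreshold_leakage(temp_k: float, vth_ref: float = 0.35, ref_temp_k: float = 300.0,
                         vth_tc: float = 1e-3, n: float = 1.5) -> float:
    """Relative subthreshold leakage (arbitrary units), simplified model:
    I_sub ~ exp(-Vth(T) / (n * kT/q)), with Vth(T) = vth_ref - vth_tc * (T - ref_temp_k).

    All defaults are hypothetical placeholders. The temperature dependence of the
    prefactor (mobility, (kT/q)^2) and DIBL are deliberately ignored.
    """
    thermal_voltage = K_OVER_Q * temp_k
    vth = vth_ref - vth_tc * (temp_k - ref_temp_k)
    return math.exp(-vth / (n * thermal_voltage))


def hot_to_cold_ratio(hot_k: float, cold_k: float = 300.0) -> float:
    """How many times more a device leaks at hot_k than at cold_k under the model."""
    return subthreshold_leakage(hot_k) / subthreshold_leakage(cold_k)


def high_vth_reduction(delta_vth: float, temp_k: float = 300.0, n: float = 1.5) -> float:
    """Leakage reduction factor obtained by raising Vth by delta_vth volts
    (e.g. swapping a low-Vth cell for a high-Vth one), at fixed temperature."""
    return math.exp(delta_vth / (n * K_OVER_Q * temp_k))


if __name__ == "__main__":
    print(f"360 K vs 300 K: {hot_to_cold_ratio(360.0):.1f}x more leakage")
    print(f"+100 mV Vth at 300 K: {high_vth_reduction(0.100):.1f}x less leakage")
```

With these placeholder values, a 60 K temperature rise multiplies leakage by roughly 16x, while raising the threshold by 100 mV divides it by roughly 13x; real designs need calibrated device models, but the exponential sensitivity is what makes both effects so large.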

We describe a comprehensive set of design techniques, applicable at different levels of abstraction, that have proven to offer great potential for leakage optimization in industrial design environments. They range from the gate/circuit level (e.g. dual V<sub>th</sub>, MTCMOS, sleep transistor insertion, hierarchical clock gating) to memory blocks (e.g. array partitioning, sub-banking, bit-line splitting, cache decay, drowsy-state memory, locality exploitation) and architectural styles (e.g. region-based adaptive V<sub>dd</sub> and body biasing, V<sub>th</sub> hopping, power gating). We also report a selection of significant industrial solutions obtained by applying low-power techniques to proprietary designs in different application domains, including high-performance microprocessors, memory/cache structures and hardware platforms for embedded multimedia processing.
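Of the memory techniques listed above, cache decay is the most directly algorithmic: many cache lines go "dead" long before they are evicted, so a line that has not been accessed for a chosen decay interval can be switched off (for example by gating its supply), trading a few extra misses for lower leakage. The simulation below is a minimal sketch under stated assumptions: a direct-mapped cache, per-line idle counters advanced by an explicit `tick()` (hardware proposals typically use small per-line counters driven by a coarse global tick), and a decayed line that loses its contents. The class and method names are hypothetical.

```python
class DecayCache:
    """Direct-mapped cache with per-line decay counters (cache-decay sketch)."""

    def __init__(self, num_lines: int, decay_interval: int) -> None:
        self.num_lines = num_lines
        self.decay_interval = decay_interval       # idle ticks before a line is turned off
        self.tags = [None] * num_lines             # tag held by each powered line
        self.powered = [False] * num_lines         # True = line powered (leaking)
        self.idle = [0] * num_lines                # ticks since last access
        self.decayed_tags = [None] * num_lines     # tag a line held when it was turned off
        self.hits = self.misses = self.decay_misses = 0

    def access(self, address: int) -> bool:
        """Look up a line address; returns True on a hit."""
        index, tag = address % self.num_lines, address // self.num_lines
        self.idle[index] = 0                       # any access resets the decay counter
        if self.powered[index] and self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        if not self.powered[index] and self.decayed_tags[index] == tag:
            self.decay_misses += 1                 # would have hit without decay
        self.tags[index], self.powered[index] = tag, True
        self.decayed_tags[index] = None
        return False

    def tick(self) -> None:
        """Advance idle counters; switch off lines idle for decay_interval ticks."""
        for i in range(self.num_lines):
            if self.powered[i]:
                self.idle[i] += 1
                if self.idle[i] >= self.decay_interval:
                    self.powered[i] = False
                    self.decayed_tags[i], self.tags[i] = self.tags[i], None

    def powered_fraction(self) -> float:
        """Fraction of lines still powered, a rough proxy for relative leakage."""
        return sum(self.powered) / self.num_lines


if __name__ == "__main__":
    cache = DecayCache(num_lines=4, decay_interval=2)
    cache.access(0); cache.access(0)               # miss, then hit
    cache.tick(); cache.tick()                     # line 0 decays
    cache.access(0)                                # decay-induced miss
    print(cache.hits, cache.misses, cache.decay_misses, cache.powered_fraction())
```

The decay interval is the design knob: a short interval keeps more lines off (less leakage) but converts more would-be hits into decay misses, whose extra dynamic energy and latency must be weighed against the static energy saved.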
