

# DfX for Nanoelectronic Embedded Systems

Saraju Mohanty

NanoSystem Design Laboratory (NSDL)

Dept. of Computer Science and Engineering

University of North Texas, Denton, TX 76203, USA.

Email: [saraju.mohanty@unt.edu](mailto:saraju.mohanty@unt.edu)

A green light to greatness.

12/18/2013



UNT



# Computing Evolution

# Ancient Computing Machines -- Mechanical



2400 BC

- The abacus
- The first known calculator
- Invented in Babylonia



1832 AD

- Babbage's Difference Engine
- Tabulated polynomial functions
- Invented in Britain


# The First Electronic Computer



1946

- **ENIAC** -- The first electronic general-purpose computer.
- Turing-complete, digital, and programmable.
- Invented in the USA.


# Current Computing Systems



Desktop PC



Laptop or  
Notebook PC



Tablet



Smart  
Phone

Slate PC


# Smallest Single-Board Computers



Raspberry Pi



BeagleBone






# The Workhorses

# Variety of Integrated Circuits or Chips?



Low-Cost ASIC



Communication Chip



Secure Media Processor



Intel Core i7 LGA1366 processor has 1366 pins.



ADC Chip


# Intel Haswell Chip -- 2013

## 4th Generation Intel® Core™ Processor Die Map 22nm Tri-Gate 3-D Transistors



Quad core die shown above

Transistor count: 1.4 billion

Die size: 177 mm²

\*\* Cache is shared across all 4 cores and processor graphics


# GPU with Highest Transistor Count



The Nvidia GK110 has 7.1 billion transistors in a 28nm technology.

Source: <http://www.tomshardware.com/news/nvidia-tesla-k20-gk110-gpu,15683.html>


# Processors for Mobile Systems: Essentially AMS-SoCs



NVIDIA's Tegra 2 die

Source: <http://www.anandtech.com>



Snapdragon S4 Block Diagram

Source: <http://www.cnx-software.com>




# The Drivers

- Technology Miniaturization (aka Technology Scaling): Nano
- New Technology (Alternative Devices)






# How Small Is Nano?



- "nano" means one-billionth, or  $10^{-9}$
- A sheet of paper is about 100,000 nanometers thick
- A human hair is approx. 100,000 nanometers wide

Source: <http://www.nano.gov/nanotech-101/what/nano-size>


# A Typical Nanoelectronic System



Heterogeneous components with  
millions of nanoscale devices.



High-K  
nano-CMOS



Triple Gate



Graphene  
Nanoribbon




Good and Bad,  
and DfX

# Scaling Reduces Power Dissipation



1 Virtex-7 2000T (19 Watts) = 4 largest monolithic FPGAs (112 Watts)

Source: <http://low-powerdesign.com/sleibson>


# Scaling Reduces Cost of Electronics

In 1986, a Kodak camera with a 1.3-megapixel CCD sensor cost \$13,000. Today, you can buy one for a few dollars.



Nikon D7000  
DSLR camera.

16 MP → \$700

Source: <http://www.lensrentals.com/blog/2012/04/d7000-dissection>


# Nanoelectronics: Challenges




# DfX -- Design for X (aka Design for Excellence)

X = set of IC design challenges

- Manufacturability
- Power
- Variability
- Cost
- Yield
- Reliability
- Test
- Debug



Source: ISVLSI 2012 Andrew Kahng Keynote




# Design for Power (DfP)

# Consumer Electronics Demand More and More Energy

Energy consumption in homes by end use (quadrillion Btu and percent). Categories: space heating; air conditioning; water heating; appliances, electronics, and lighting.

Quadrillion BTU (or quad): 1 quad = $10^{15}$ BTU ≈ 1.055 exajoules (EJ).

Source: U.S. Energy Information Administration.
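As a quick check of the unit conversion above, the sketch below converts quads to exajoules. It assumes the International Table BTU (1 BTU = 1055.05585 J); other BTU definitions differ slightly, so treat the last digits as approximate.

```python
# Assumption: International Table BTU; other BTU definitions differ slightly.
BTU_IN_JOULES = 1055.05585
BTU_PER_QUAD = 1e15
JOULES_PER_EXAJOULE = 1e18


def quads_to_exajoules(quads: float) -> float:
    """Convert quadrillion BTU (quads) to exajoules (EJ)."""
    return quads * BTU_PER_QUAD * BTU_IN_JOULES / JOULES_PER_EXAJOULE


print(f"1 quad = {quads_to_exajoules(1.0):.3f} EJ")  # prints: 1 quad = 1.055 EJ
```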


# Different Electronic Systems: Common Story



- Smarter ... Faster ... High Throughput ...
- Power Hungry !! Battery Hungry !!

# Battery Dependency: Not Overstated



One 787 Battery:  
12 Cells / 32 V DC



- Boeing 787s across the globe were grounded in early 2013 after battery failures.

Source: <http://www.newairplane.com>


# Battery Dependency: Not Overstated



- Great idea: a smartwatch that functions like a smartphone.
- Big problem: battery life on a single charge is only about 1 day.

Source: <http://www.businessinsider.com>


# A Typical Electronic System: Where Is Energy Consumed?



Power of a Mobile System



Power dissipation breakdown in idle  
mode of a connected mobile device

Source: Pering MobiSys 2006

# DfP: Possible Solution Fronts




# DfP: Design of a Universal Level Converter for Dynamic Power Management


# One Example Electronic System: Secure Digital Camera




# Universal Voltage-Level Converter: One Topology



- 20-transistor, area-efficient design.
- Energy-hungry transistors are circled.
- Energy-hungry transistors have thicker oxide.
- 90nm CMOS dual-oxide physical design of the ULC.




# Universal Voltage-Level Converter: Operations

## Operations of the ULC:

- Level-up conversion
- Level-down conversion
- Blocking of input signal

| Select Signal | Type of Operation |
|---------------|-------------------|
| 0             | Block Signal      |
| 0             | Up Conversion     |
| 1             | Down Conversion   |




# Universal Voltage-Level Converter: Has Minimal Overhead

| Designs          | Technology | Power          | Delay    | Conversion                         | Design Approach                       |
|------------------|------------|----------------|----------|------------------------------------|---------------------------------------|
| Ishihara<br>2004 | 130nm      | ---            | 127 ps   | Level-up and level-down            | Level-converting flip-flops           |
| Yu<br>2001       | 350nm      | 220.57 $\mu$W  | ---      | Level-up                           | SDCVS                                 |
| Sadeghi<br>2006  | 100nm      | 10 $\mu$W      | 1 ns     | Level-up                           | Pass transistor and keeper transistor |
| ULC              | 90nm       | 12.26 $\mu$W   | 113.8 ps | Level-up, level-down, and blocking | All conversion types; programmable    |




# Design for Variability (DfV)

# Nanoelectronic Variability?

- Discrepancy between chip parameters at design time and the actual parameters after fabrication.



Same Design Fabricated



Each Chip has  
Different Performance



Each Transistor  
is Different



Source: <http://apcmag.com/picture-gallery-how-a-chip-is-made.htm>


# Process Variation: Parameters



Source–drain resistance differs across chips on the same wafer.

Color scale, fast to slow: red, blue, green, yellow, purple.

Gate-to-source and gate-to-drain overlap capacitance differs across chips on the same wafer.

Source: Bernstein et al., IBM J. Res. & Dev., July/Sep 2006.


# Process Variation: The Impact

- Yield Loss
- Reliability Issue
- Higher Cost


# Process Variation: Sources



**Sophisticated  
Lithography**


# Process Variations: Solution




# Process-Variation-Aware Optimization: Key Idea

Histograms




# DfV: Statistical Nano-CMOS RTL Optimization for Power


# Nano-CMOS RTL Statistical Optimization




# Statistical RTL Optimization: Formulation

Minimize:  $FoM_{Total}^{DFG}(\mu_I^{DFG}, \sigma_I^{DFG})$

Subject to (resource/time constraints):

$\text{Allocated}(FU_{k,i}) \leq \text{Available}(FU_{k,i}), \quad \forall \text{ cycle } c$

$D_{CP}^{DFG}(\mu_D^{DFG}, \sigma_D^{DFG}) \leq D_{Con}(\mu_D^{Con}, \sigma_D^{Con})$
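The slides do not state how the two delay distributions in the timing constraint are compared. The sketch below is one possible feasibility check for a scheduled data-flow graph, assuming independent Gaussian functional-unit (FU) delays along the critical path and requiring $D_{Con} - D_{CP}$ to be non-negative with a z-sigma margin (z = 3 is a hypothetical choice). The function names, the schedule format, and the FU type strings are illustrative, not taken from the original work.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Dist = Tuple[float, float]  # (mean, standard deviation)


def path_delay(fu_delays: Sequence[Dist]) -> Dist:
    """Critical-path delay of FUs in series, assuming independent Gaussian delays."""
    mu = sum(m for m, _ in fu_delays)
    sigma = math.sqrt(sum(s * s for _, s in fu_delays))
    return mu, sigma


def resources_ok(schedule: Dict[int, List[str]], available: Dict[str, int]) -> bool:
    """Allocated(FU type) <= Available(FU type) in every control step (cycle c)."""
    for fus in schedule.values():
        for fu_type, allocated in Counter(fus).items():
            if allocated > available.get(fu_type, 0):
                return False
    return True


def timing_ok(d_cp: Dist, d_con: Dist, z: float = 3.0) -> bool:
    """D_CP <= D_Con with a z-sigma margin (hypothetical interpretation)."""
    (mu_d, s_d), (mu_c, s_c) = d_cp, d_con
    spread = math.hypot(s_d, s_c)  # std. dev. of D_Con - D_CP if independent
    if spread == 0.0:
        return mu_d <= mu_c
    return (mu_c - mu_d) / spread >= z
```

A complete optimizer would search over schedules and FU bindings that pass both checks while minimizing $FoM_{Total}^{DFG}$; that figure of merit is not modeled here.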

# Statistical RTL Optimization: Results on DSP Benchmarks



(For ARF Benchmark)



(For BPF Benchmark)




# Design for Cost (\$) (DfC)

# Chip Cost



Source: <http://www.ami.ac.uk/courses/ami4202_mdesign/u02/>

# One of the Key Issues: Time/Effort

- Simulating phase-locked loop (PLL) lock on a full-blown (RCLK) parasitic netlist takes on the **order of many days!** → High non-recurring engineering (NRE) cost.



PLL



- Issues for AMS-SoC components:
  - How fast can design space exploration be performed?
  - How fast can layout generation and optimization be performed?

# Standard Design Flow – Very Slow



- The standard design flow requires multiple manual iterations on the back-end layout to achieve parasitic closure between the front-end circuit and the back-end layout.
- Longer design cycle time.
- Error-prone design.
- Higher non-recurring engineering (NRE) cost.
- Difficulty handling nanoscale challenges.

# Automatic Optimization on Netlist (Faster than manual flow; still slow)



- Automatic iteration over the netlist improves design optimization.
- Still needs multiple simulations using an analog simulator (SPICE).
- SPICE is slow.

# Two-Tier Speedup Through Metamodels




# Proposed Flow: Key Perspective

- A novel design and optimization methodology that produces robust AMS-SoC components using **ultra-fast automatic iterations over metamodels** (instead of the netlist) and two manual layout steps.
- The methodology easily accommodates multidimensional challenges, reduces design cycle time, improves circuit yield, and reduces chip cost.


# Metamodel-Based Design Flow




# Metamodels: Selected Types




# Metamodels: Polynomial Example



Actual circuit (SPICE netlist) of AMS-SoC components → Statistical sampling → Polynomial function fitting

$$f(W_n, W_p) = 7.94 \times 10^9 + 1.1 \times 10^{16}W_n + 1.28 \times 10^{15}W_p.$$

# Sampling Techniques: 45nm Ring Oscillator Circuit (5000 points)

Sample distributions for four sampling techniques: Monte Carlo, MLHS, LHS, and DOE.
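Of the four techniques compared, Latin hypercube sampling (LHS) is simple to sketch: each variable's range is split into `n` equal strata, one point is drawn inside each stratum, and the strata are paired randomly across variables. The sketch below is plain LHS only; the MLHS and DOE variants are not reproduced, and the function name is illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple


def latin_hypercube(
    n: int, bounds: Sequence[Tuple[float, float]], seed: Optional[int] = None
) -> List[Tuple[float, ...]]:
    """Draw n points so each variable has exactly one sample per stratum."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n
        # One point in each of the n equal-width strata, jittered within it
        column = [lo + (k + rng.random()) * width for k in range(n)]
        # Shuffle so strata are paired randomly across variables
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n)]
```

For example, `latin_hypercube(5000, [(100e-9, 400e-9), (100e-9, 400e-9)])` would draw 5000 ($W_n$, $W_p$) pairs over hypothetical width ranges.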




# Polynomial Metamodels

- The generated sample data can be fitted in many ways to generate a metamodel.
- The choice of fitting algorithm can affect the accuracy of the metamodel.
- A simple metamodel has the following form:

$$y = \sum_{i,j=0}^k (\alpha_{ij} \times x_1^i \times x_2^j)$$

- $y$ is the response being modeled (e.g., frequency), $x = [x_1, x_2] = [W_n, W_p]$ is the vector of variables, and $\alpha_{ij}$ are the coefficients.
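The sketch below fits the double-sum form above by ordinary least squares, solved through the normal equations. Least squares is one common fitting choice, not necessarily the one used in the original work. It builds the $(k+1)^2$ basis terms $x_1^i x_2^j$; in practice, widths should be scaled (for example, to micrometers) so the normal equations stay well conditioned, and a QR-based solver such as `numpy.linalg.lstsq` is more robust at higher orders.

```python
from itertools import product
from typing import List, Sequence, Tuple


def basis(x1: float, x2: float, k: int) -> List[float]:
    """Terms x1**i * x2**j for i, j = 0..k, i.e. (k + 1)**2 terms."""
    return [x1 ** i * x2 ** j for i, j in product(range(k + 1), repeat=2)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_metamodel(
    samples: Sequence[Tuple[float, float]], y: Sequence[float], k: int
) -> List[float]:
    """Least-squares alpha_ij via the normal equations (A^T A) alpha = A^T y."""
    rows = [basis(x1, x2, k) for x1, x2 in samples]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)


def predict(alpha: Sequence[float], x1: float, x2: float, k: int) -> float:
    """Evaluate the fitted metamodel y = sum_ij alpha_ij * x1**i * x2**j."""
    return sum(a * t for a, t in zip(alpha, basis(x1, x2, k)))
```

The coefficients come back in the order $(i, j) = (0,0), (0,1), \ldots, (k,k)$.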

# Metamodel: Polynomial Comparison

| Case Study Circuits                                                         | Polynomial Order | $\mu$ error (in MHz) | $\sigma$ error (in MHz) |
|-----------------------------------------------------------------------------|------------------|----------------------|-------------------------|
| Ring Oscillator<br>45nm CMOS<br>Target $f$: 10GHz | 1                | 571.0                | 286.7                   |
|                                                                             | 2                | 195.4                | 78.1                    |
|                                                                             | 3                | 37.2                 | 18.0                    |
|                                                                             | 4                | 20.0                 | 10.7                    |
|                                                                             | 5                | 17.1                 | 9.6                     |
| LC-VCO<br>180nm CMOS<br>Target $f$: 2.7GHz | 1                | 42.3                 | 40.1                    |
|                                                                             | 2                | 39.4                 | 37.8                    |
|                                                                             | 3                | 35.4                 | 33.9                    |
|                                                                             | 4                | 30.5                 | 29.3                    |
|                                                                             | 5                | 26.5                 | 25.2                    |

Ring oscillator – Order 1

$$\begin{aligned}f(W_n, W_p) = & 7.94 \times 10^9 + 1.1 \times 10^{16} W_n \\& + 1.28 \times 10^{15} W_p.\end{aligned}$$

LC-VCO – Order 1

$$\begin{aligned}f(W_n, W_p) = & 2.38 \times 10^9 - 3.49 \times 10^{12} W_n \\& - 6.66 \times 10^{12} W_p.\end{aligned}$$
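To make the order-1 ring-oscillator metamodel concrete, the snippet below evaluates it at hypothetical widths $W_n = 100$ nm and $W_p = 200$ nm. It assumes widths in meters and frequency in hertz, as the coefficient magnitudes imply.

```python
def ro_frequency_hz(wn_m: float, wp_m: float) -> float:
    """Order-1 ring-oscillator metamodel (45nm CMOS); widths in meters."""
    return 7.94e9 + 1.1e16 * wn_m + 1.28e15 * wp_m


f = ro_frequency_hz(100e-9, 200e-9)  # hypothetical widths
print(f"{f / 1e9:.3f} GHz")  # prints 9.296 GHz
```

The linear coefficients also read as sensitivities: each additional nanometer of $W_n$ adds about 11 MHz, and each additional nanometer of $W_p$ about 1.28 MHz.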


# Artificial Neural Network (ANN) Metamodeling

- Feed-forward dual-layer (FFDL) ANNs are considered.
- An FFDL ANN is created for each FoM:
  - A nonlinear hidden-layer activation function is used, with the number of hidden neurons varied from 1 to 20:

$$b_j(v_j) = \tanh(\lambda v_j)$$
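The slide gives only the hidden-layer activation. A minimal forward pass of an FFDL network might look like the sketch below, assuming one tanh hidden layer followed by a linear output neuron. The weights are hypothetical placeholders that training (not shown) would set.

```python
import math
from typing import Sequence


def ffdl_forward(
    x: Sequence[float],
    w1: Sequence[Sequence[float]],  # hidden weights, one row per hidden neuron
    b1: Sequence[float],            # hidden biases
    w2: Sequence[float],            # output weights
    b2: float,                      # output bias
    lam: float = 1.0,               # slope lambda of the tanh activation
) -> float:
    """Forward pass: tanh hidden layer, then a linear output neuron (assumed)."""
    hidden = [
        # v_j = sum_i w1[j][i] * x[i] + b1[j];  b_j(v_j) = tanh(lam * v_j)
        math.tanh(lam * (sum(w * xi for w, xi in zip(row, x)) + bj))
        for row, bj in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

Varying the number of hidden neurons from 1 to 20, as described above, amounts to changing the number of rows in `w1` (and entries in `b1` and `w2`).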




# Metamodel Comparison: Polynomial Vs Nonpolynomial

- Nonpolynomial (artificial neural network) metamodels are more suitable for large circuits.

180nm CMOS PLL with target specs: $f = 2.7\text{GHz}$, $P = 3.9\text{mW}$, locking time $= 8.5\mu\text{s}$.

| Figures-of-Merits (FoM) | Polynomial<br># of Coefficients | Nonpolynomial<br>(Neural Network) |
|-------------------------|---------------------------------|-----------------------------------|
| Frequency               | 48                              | 77.96 MHz                         |
| Power                   | 50                              | 2.6mW                             |
| Locking Time            | 56                              | 1.9 $\mu\text{s}$                 |

- 56% increase in accuracy over polynomial metamodels.
- On average, 3.2% error relative to the golden design surface.

# Selected Algorithms for Optimization over Metamodels



# Exhaustive Search: 45nm RO



- Searches over a two-parameter space.
- Parameters are incremented in specified steps.
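A minimal grid search over the order-1 ring-oscillator metamodel might look as follows. The width ranges (100 nm to 400 nm) and the 100-step grid are hypothetical; the 10 GHz target comes from the polynomial-comparison table. Each grid point costs one metamodel evaluation instead of one SPICE run.

```python
from typing import Callable, Tuple


def ro_frequency_hz(wn: float, wp: float) -> float:
    """Order-1 ring-oscillator metamodel from the slides (widths in meters)."""
    return 7.94e9 + 1.1e16 * wn + 1.28e15 * wp


def exhaustive_search(
    objective: Callable[[float, float], float],
    wn_range: Tuple[float, float],
    wp_range: Tuple[float, float],
    steps: int,
) -> Tuple[float, float, float]:
    """Evaluate every (wn, wp) grid point; return (best cost, wn, wp)."""
    best = (float("inf"), wn_range[0], wp_range[0])
    for a in range(steps + 1):
        wn = wn_range[0] + a * (wn_range[1] - wn_range[0]) / steps
        for b in range(steps + 1):
            wp = wp_range[0] + b * (wp_range[1] - wp_range[0]) / steps
            cost = objective(wn, wp)
            if cost < best[0]:
                best = (cost, wn, wp)
    return best


TARGET_HZ = 10e9  # target frequency from the comparison table
cost, wn, wp = exhaustive_search(
    lambda n, p: abs(ro_frequency_hz(n, p) - TARGET_HZ),
    (100e-9, 400e-9), (100e-9, 400e-9), steps=100,  # hypothetical ranges
)
```

With 101 settings per width this is 10,201 evaluations; the count grows exponentially with the number of parameters, which is why heuristics are needed for larger circuits.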

# DOE-Assisted Tabu Search: 45nm RO



- The search space is recursively divided into rectangles, and at each step the rectangle with the better result is selected.
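The rectangle-subdivision step can be sketched as follows: the current rectangle is split into four quadrants, the objective is evaluated at each quadrant's center, and the best quadrant is kept. This shows only the DOE-style subdivision; a full tabu search would also maintain a tabu list of recently visited regions, which is omitted here. The function name and the center-sampling rule are assumptions.

```python
from typing import Callable, Tuple


def rectangle_search(
    objective: Callable[[float, float], float],
    lo: Tuple[float, float],
    hi: Tuple[float, float],
    depth: int = 20,
) -> Tuple[Tuple[float, float], float]:
    """Recursively quarter the search rectangle, keeping the best quadrant."""
    (x0, y0), (x1, y1) = lo, hi
    for _ in range(depth):
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        quadrants = [
            (x0, y0, xm, ym), (xm, y0, x1, ym),
            (x0, ym, xm, y1), (xm, ym, x1, y1),
        ]
        # Score each quadrant by the objective at its center; keep the lowest
        x0, y0, x1, y1 = min(
            quadrants, key=lambda q: objective((q[0] + q[2]) / 2, (q[1] + q[3]) / 2)
        )
    center = ((x0 + x1) / 2, (y0 + y1) / 2)
    return center, objective(*center)
```

Because the choice at each level is greedy, this converges reliably on smooth, unimodal objectives but can lock onto the wrong region of a multimodal one; the tabu list in the full method is meant to counter that.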

# Comparison of the Running Time of Heuristic Algorithms: 45nm RO



- **Optimization without metamodels:** tabu search is $\sim 1000\times$ faster than exhaustive search and $\sim 4\times$ faster than simulated annealing.
- **Optimization with metamodels:** simulated annealing is $\sim 1000\times$ faster than exhaustive search and $\sim 6\times$ faster than tabu search.

# Case Study Circuit: 180nm PLL



Block diagram of a PLL.

- The PLL circuit is characterized for frequency, power, vertical and horizontal jitter (as a simple phase-noise measure), and locking time.
- Metamodels are created for each FoM from the same sample set.



PLL for 180nm.

# PLL: Polynomial Metamodels ...

- The figure shows the number of coefficients for each order of the generated settling-time metamodel.
- Beyond order 4 the model is over-fitted; therefore, a polynomial of order 4 is used for the settling-time metamodel.




# Artificial Bee-Colony : Overview

1. Initial food sources are produced for all worker bees.
2. Repeat:
   1. Each worker bee goes to a food source and evaluates its nectar amount.
   2. Each onlooker bee watches the dances of the worker bees, chooses one of their sources depending on the dances, and evaluates its nectar amount.
   3. Determine abandoned food sources and replace them with new food sources discovered by scout bees.
   4. Record the best food source found so far.
3. Until (requirements are met).

A food source → a solution; the position of a food source → a design-variable set; nectar amount → quality of a solution; number of worker bees → number of candidate solutions.
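The loop above can be sketched in Python as follows. It follows the commonly used ABC formulation (neighbor move $v_{ij} = x_{ij} + \phi_{ij}(x_{ij} - x_{kj})$ with $\phi_{ij} \in [-1, 1]$, fitness-proportional onlooker selection, and an abandonment `limit`). It is not the authors' implementation, and the parameter defaults are hypothetical. For the PLL, `cost` would wrap a metamodel-based error over the 21 widths in the PLL parameter table, with their min/max values as `bounds`.

```python
import random
from typing import Callable, List, Sequence, Tuple


def abc_minimize(
    cost: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_sources: int = 20,   # food sources = worker bees (must be >= 2)
    limit: int = 20,       # failed trials before a source is abandoned
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Step 1: initial food sources for all worker bees
    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources
    i0 = min(range(n_sources), key=lambda i: costs[i])
    best, best_cost = list(sources[i0]), costs[i0]

    def try_improve(i: int) -> None:
        nonlocal best, best_cost
        # Move one coordinate relative to a random partner source k != i
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)
        c = cost(cand)
        if c < costs[i]:  # greedy selection between old and new source
            sources[i], costs[i], trials[i] = cand, c, 0
            if c < best_cost:  # record the best food source so far
                best, best_cost = list(cand), c
        else:
            trials[i] += 1

    for _ in range(max_iter):
        # Worker bees exploit their own food sources
        for i in range(n_sources):
            try_improve(i)
        # Onlookers choose sources with probability proportional to fitness
        fits = [1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c) for c in costs]
        for _ in range(n_sources):
            try_improve(rng.choices(range(n_sources), weights=fits)[0])
        # Scouts replace sources that failed to improve more than `limit` times
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = cost(sources[i])
                trials[i] = 0
                if costs[i] < best_cost:
                    best, best_cost = list(sources[i]), costs[i]
    return best, best_cost
```

The best solution is tracked whenever a source improves, so abandoning a source in the scout phase never loses the best design found so far.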

# PLL: ABC over Poly. Metamodels

## PLL parameters with constraints and optimized values.

| Circuit        | Parameter   | Min (m) | Max (m) | Optimal Value (m) |
|----------------|-------------|---------|---------|-------------------|
| Phase Detector | $W_{ppd1}$  | 400n    | $2\mu$  | $1.66\mu$         |
|                | $W_{npd1}$  | 400n    | $2\mu$  | $1.11\mu$         |
|                | $W_{ppd2}$  | 400n    | $2\mu$  | $784n$            |
|                | $W_{npd2}$  | 400n    | $2\mu$  | $689n$            |
|                | $W_{ppd3}$  | 400n    | $2\mu$  | $1.54\mu$         |
|                | $W_{npd3}$  | 400n    | $2\mu$  | $737n$            |
| Charge Pump    | $W_{nCP1}$  | 400n    | $2\mu$  | $1.24\mu$         |
|                | $W_{pCP1}$  | 400n    | $2\mu$  | $1.35\mu$         |
|                | $W_{nCP2}$  | $1\mu$  | $4\mu$  | $1.35\mu$         |
|                | $W_{pCP2}$  | $1\mu$  | $4\mu$  | $2.88\mu$         |
| LC-VCO         | $W_{nLC}$   | $3\mu$  | $20\mu$ | $18.62\mu$        |
|                | $W_{pLC}$   | $6\mu$  | $40\mu$ | $37.48\mu$        |
| Divider        | $W_{p1Div}$ | 400n    | $2\mu$  | $1.65\mu$         |
|                | $W_{p2Div}$ | 400n    | $2\mu$  | $1.54\mu$         |
|                | $W_{p3Div}$ | 400n    | $2\mu$  | $1.38\mu$         |
|                | $W_{p4Div}$ | 400n    | $2\mu$  | $1.96\mu$         |
|                | $W_{n1Div}$ | 400n    | $2\mu$  | $1.09\mu$         |
|                | $W_{n2Div}$ | 400n    | $2\mu$  | $1.17\mu$         |
|                | $W_{n3Div}$ | 400n    | $2\mu$  | $1.29\mu$         |
|                | $W_{n4Div}$ | 400n    | $2\mu$  | $1.95\mu$         |
|                | $W_{n5Div}$ | 400n    | $2\mu$  | $536n$            |

- An exhaustive search of the design space of 21 parameters with 10 intervals per parameter requires  $10^{21}$  simulations.
- At about 10 min per run, $10^{21}$ SPICE simulations are far too slow.
- $10^{21}$ simulations using polynomial metamodels are fast.
- Time savings:  $\approx 10^{20} \times$  SPICE simulation time.
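The infeasibility of the exhaustive SPICE search follows from simple arithmetic, using the slide's figures of $10^{21}$ runs at about 10 minutes each; the 365.25-day year is the only added assumption.

```python
N_PARAMS = 21            # PLL design parameters
VALUES_PER_PARAM = 10    # grid settings per parameter, as on the slide
MINUTES_PER_SPICE_RUN = 10

n_sims = VALUES_PER_PARAM ** N_PARAMS            # 10**21 simulations
total_minutes = n_sims * MINUTES_PER_SPICE_RUN   # 10**22 minutes
years = total_minutes / (60 * 24 * 365.25)       # 525,960 minutes per year

print(f"{n_sims:.0e} runs -> {years:.1e} years")  # prints: 1e+21 runs -> 1.9e+16 years
```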


# PLL: ABC Optimization: Poly Vs ANN

## Optimization Results

| FoM           | Poly. Metamodel | ANN Metamodel |
|---------------|-----------------|---------------|
| Average Power | 3.9 mW          | 3.9 mW        |
| Frequency     | 2.6909 GHz      | 2.7026 GHz    |

## Optimization Time Comparison

| Algorithm            | Circuit Netlist                                                          | Poly. Metamodel                   | ANN Metamodel                                        |
|----------------------|--------------------------------------------------------------------------|-----------------------------------|------------------------------------------------------|
| ABC (100 iterations) | 20 bees × 5 min × 100 iterations = 10,000 minutes ≈ 7 days (worst case) | 5 min                             | 0.12 min                                             |
| Metamodel generation | 0                                                                        | 11 hours for LHS + 1 min creation | 11 hours for LHS + 10 min training and verification |


# Conclusions

- Nanoelectronic circuits and systems face multifold design challenges.
- DfX is design for X, where X = Power, Variability, Cost, ...
- DfP:
  - About 35% of home energy use in the USA goes to appliances, electronics, and lighting.
  - The battery is a critical constraint for portable systems.
  - Effective solutions need energy-efficient hardware and software, together with better battery design.
- DfV: Reduce variability in chips and enhance yield.
- DfC: Reduce NRE cost and time to market, and improve yield.
- Much more research is needed on the combined consideration of issues, e.g., X ← Variability and Cost.

# References

- **S. P. Mohanty** and E. Kougianos, "Incorporating Manufacturing Process Variation Awareness in Fast Design Optimization of Nanoscale CMOS VCOs", *IEEE Transactions on Semiconductor Manufacturing*, accepted 12 Nov 2013, DOI: <https://doi.org/10.1109/TSM.2013.2291112>.
- **S. P. Mohanty**, M. Gomathisankaran, and E. Kougianos, "Variability-Aware Architecture Level Optimization Techniques for Robust Nanoscale Chip Design", *Elsevier International Journal on Computers and Electrical Engineering (IJCEE)*, 2014, DOI: <https://doi.org/10.1016/j.compeleceng.2013.11.026>.
- O. Okobiah, **S. P. Mohanty**, and E. Kougianos, "Geostatistical-Inspired Fast Layout Optimization of a Nano-CMOS Thermal Sensor", *IET Circuits, Devices & Systems (CDS)*, Vol. 7, No. 5, Sep. 2013, pp. 253–262.
- O. Garitselov, **S. P. Mohanty**, and E. Kougianos, "A Comparative Study of Metamodels for Fast and Accurate Simulation of Nano-CMOS Circuits", *IEEE Transactions on Semiconductor Manufacturing*, Vol. 25, No. 1, Feb. 2012, pp. 26–36.
- **S. P. Mohanty**, E. Kougianos, and O. Okobiah, "Optimal Design of a Dual-Oxide Nano-CMOS Universal Level Converter for Multi-Vdd SoCs", *Springer Analog Integrated Circuits & Signal Processing J.*, Vol. 72, No. 2, 2012, pp. 451–467.
- O. Garitselov, **S. P. Mohanty**, and E. Kougianos, "Accurate Polynomial Metamodeling-Based Ultra-Fast Bee Colony Optimization of a Nano-CMOS PLL", *Journal of Low Power Electronics*, Vol. 8, No. 3, June 2012, pp. 317–328.


# Thank You!!!

Slides Available at:

<http://www.cse.unt.edu/~smohanty>