



# *Digital Integrated Circuits*

## *A Design Perspective*

Jan M. Rabaey  
Anantha Chandrakasan  
Borivoje Nikolic

# Design Methodologies

*December 10, 2002*

# The Design Productivity Challenge



Source: sematech97

# *A Simple Processor*



# A System-on-a-Chip: Example



*Courtesy: Philips*

# *Impact of Implementation Choices*



# Design Methodology



- Design process traverses iteratively between three abstractions: behavior, structure, and geometry
- More and more automation for each of these steps

# *Implementation Choices*



# *The Custom Approach*



*Intel 4004*

# *Transition to Automation and Regular Structures*



**Intel 4004 ('71)**



**Intel 8080**



**Intel 8085**



**Intel 80286**



**Intel 80486**

# Cell-based Design (or standard cells)



Routing channel requirements are reduced by the presence of more interconnect layers

# *Standard Cell — Example*



[Brodersen92]

# **Standard Cell – The New Generation**



*Cell-structure  
hidden under  
interconnect layers*

# Standard Cell - Example



| Path            | 1.2 V, 125 °C            | 1.6 V, 40 °C             |
|-----------------|--------------------------|--------------------------|
| In1, $t_{pLH}$  | $0.073 + 7.98C + 0.317T$ | $0.020 + 2.73C + 0.253T$ |
| In1, $t_{pHL}$  | $0.069 + 8.43C + 0.364T$ | $0.018 + 2.14C + 0.292T$ |
| In2, $t_{pLH}$  | $0.101 + 7.97C + 0.318T$ | $0.026 + 2.38C + 0.255T$ |
| In2, $t_{pHL}$  | $0.097 + 8.42C + 0.325T$ | $0.023 + 2.14C + 0.269T$ |
| In3, $t_{pLH}$  | $0.120 + 8.00C + 0.318T$ | $0.031 + 2.37C + 0.258T$ |
| In3, $t_{pHL}$  | $0.110 + 8.41C + 0.280T$ | $0.027 + 2.15C + 0.223T$ |

3-input NAND cell  
(from ST Microelectronics):  
$C$ = load capacitance  
$T$ = input rise/fall time
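
Each table entry is a linear delay model in the load capacitance $C$ and the input transition time $T$. A minimal sketch of evaluating such a model follows. The coefficients are copied from the In1 rows; the function name, the dictionary layout, and the sample operating point are illustrative assumptions. The slide gives no units, so the numbers are treated as unitless here.

```python
# Linear cell-delay model: delay = t0 + k_c * C + k_t * T.
# Coefficients are copied from the In1 rows of the table.
# The slide does not state units, so none are assumed here.

IN1_COEFFS = {
    ("tpLH", "1.2V/125C"): (0.073, 7.98, 0.317),
    ("tpHL", "1.2V/125C"): (0.069, 8.43, 0.364),
    ("tpLH", "1.6V/40C"): (0.020, 2.73, 0.253),
    ("tpHL", "1.6V/40C"): (0.018, 2.14, 0.292),
}


def cell_delay(coeffs, load_cap, input_slope):
    """Evaluate t0 + k_c * load_cap + k_t * input_slope."""
    t0, k_c, k_t = coeffs
    return t0 + k_c * load_cap + k_t * input_slope


if __name__ == "__main__":
    # Hypothetical operating point, chosen only for illustration.
    load_cap, input_slope = 0.01, 0.1
    for (edge, corner), coeffs in IN1_COEFFS.items():
        print(edge, corner, cell_delay(coeffs, load_cap, input_slope))
```

The constant term is the intrinsic delay of the cell; the other two terms grow with the load and with slower input edges.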

# Automatic Cell Generation



Initial transistor geometries



Placed transistors



Routed cell



Compacted cell



Finished cell

# A Historical Perspective: the PLA



# Two-Level Logic

$$f_0 = x_0x_1 + \bar{x}_2$$

$$f_1 = x_0x_1x_2 + \bar{x}_2 + \bar{x}_0\bar{x}_1$$

Every logic function can be expressed in sum-of-products format (AND-OR)

*minterm*



Inverting format (NOR-NOR) more effective

$$\bar{f}_0 = \overline{\overline{\bar{x}_0 + \bar{x}_1} + \bar{x}_2}$$

$$\bar{f}_1 = \overline{\overline{\bar{x}_0 + \bar{x}_1 + \bar{x}_2} + \bar{x}_2 + \overline{x_0 + x_1}}$$
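
By De Morgan, a NOR gate fed with complemented literals computes a product term, and a second NOR over those terms yields the complemented output. The sketch below checks by exhaustive enumeration that this NOR-NOR structure matches the AND-OR definitions of $f_0$ and $f_1$. All function names are illustrative assumptions.

```python
from itertools import product


def nor(*terms):
    """NOR of any number of 0/1 inputs."""
    return 0 if any(terms) else 1


def inv(x):
    """Logical complement of a 0/1 value."""
    return 1 - x


def f_and_or(x0, x1, x2):
    """Sum-of-products form of f0 and f1."""
    f0 = (x0 & x1) | inv(x2)
    f1 = (x0 & x1 & x2) | inv(x2) | (inv(x0) & inv(x1))
    return f0, f1


def f_nor_nor(x0, x1, x2):
    """Same functions built from two NOR planes."""
    # First plane: one NOR per product term.
    p0 = nor(inv(x0), inv(x1))           # x0 x1
    p1 = nor(x2)                         # not x2
    p2 = nor(inv(x0), inv(x1), inv(x2))  # x0 x1 x2
    p3 = nor(x0, x1)                     # (not x0)(not x1)
    # Second plane: NOR of the product terms gives the complemented outputs.
    f0_bar = nor(p0, p1)
    f1_bar = nor(p2, p1, p3)
    return inv(f0_bar), inv(f1_bar)


def planes_agree():
    """True if both forms match on all eight input combinations."""
    return all(f_and_or(*x) == f_nor_nor(*x) for x in product((0, 1), repeat=3))
```

The term `p1` is shared by both outputs, just as one product line in a PLA can feed several output columns.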

# *PLA Layout – Exploiting Regularity*



# Breathing Some New Life in PLAs

## River PLAs

- A cascade of multiple-output PLAs.
- Adjacent PLAs are connected via river routing.



- No placement and routing needed.
- Output buffers and the input buffers of the next stage are shared.

# Experimental Results

## Area

| Implementation   | Relative area |
|------------------|---------------|
| RPLAs (2 layers) | 1.23          |
| SCs (3 layers)   | 1.00          |
| NPLAs (4 layers) | 1.31          |

## Delay

| Implementation | Relative delay |
|----------------|----------------|
| RPLAs          | 1.04           |
| SCs            | 1.00           |
| NPLAs          | 1.09           |

**Synthesis time:** for RPLAs, synthesis time equals design time;  
SCs and NPLAs still need placement and routing (P&R).

**Also: RPLAs are regular and predictable**



## Layout of C2670



Standard cell,  
2 layers channel routing



Standard cell,  
3 layers OTC



Network of PLAs,  
4 layers OTC



River PLA,  
2 layers no additional routing

# *MacroModules*



256×32 (or 8192 bit) SRAM  
Generated by hard-macro module generator

# “Soft” MacroModules



```
string mat = "booth";
directive (multtype = mat);
output signed [16] Z = A * B;
```
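
The directive above asks the module generator for a Booth-type multiplier. As a rough illustration of what that choice means arithmetically (not of the generator's actual output), the sketch below applies radix-4 Booth recoding to a signed multiplier and sums the resulting partial products. The function names and the 16-bit default operand width are assumptions.

```python
def booth_radix4_digits(b, n_bits):
    """Recode a signed two's-complement multiplier into radix-4 digits in -2..2."""
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    if not -(1 << (n_bits - 1)) <= b < (1 << (n_bits - 1)):
        raise ValueError("multiplier does not fit in n_bits")

    def bit(j):
        # Python's >> sign-extends negative ints, so this reads two's-complement bits.
        return (b >> j) & 1 if j >= 0 else 0

    # Digit i looks at the overlapping bit triple (2i+1, 2i, 2i-1).
    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
            for i in range(n_bits // 2)]


def booth_multiply(a, b, n_bits=16):
    """Multiply a by b using one partial product per radix-4 Booth digit."""
    digits = booth_radix4_digits(b, n_bits)
    return sum(d * a * 4 ** i for i, d in enumerate(digits))
```

Radix-4 recoding halves the number of partial products compared with a bit-by-bit array multiplier, which is the usual reason to select this multiplier type.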



# *“Intellectual Property”*



*A Protocol Processor for Wireless*

# Semicustom Design Flow



# The “Design Closure” Problem



*Iterative Removal of Timing Violations (white lines)*

# *Integrating Synthesis with Physical Design*



# *Late-Binding Implementation*



# Gate Array — Sea-of-gates



# *Sea-of-gate Primitive Cells*



Using oxide-isolation



Using gate-isolation

# Example: Base Cell of Gate-Isolated GA



# *Example: Flip-Flop in Gate-Isolated GA*



# *Sea-of-gates*



Random Logic

Memory  
Subsystem

LSI Logic LEA300K  
(0.6  $\mu$ m CMOS)

# *The return of gate arrays?*

Via-programmable gate array  
(VPGA)



*Exploits regularity of interconnect*

# Prewired Arrays

Classification of prewired arrays (or field-programmable devices):

- Based on Programming Technique
  - Fuse-based (program-once)
  - Non-volatile EPROM based
  - RAM based
- Programmable Logic Style
  - Array-Based
  - Look-up Table
- Programmable Interconnect Style
  - Channel-routing
  - Mesh networks

# Fuse-Based FPGA



*Open by default, closed by applying current pulse*

# Array-Based Programmable Logic



PLA



PROM



PAL

- ⊕ Indicates programmable connection
- ♦ Indicates fixed connection

# Programming a PROM



$$f_0 = x_0x_1 + \bar{x}_2$$

$$f_1 = x_0x_1x_2 + \bar{x}_2 + \bar{x}_0\bar{x}_1$$
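
A PROM has a fixed AND plane that decodes every input combination, so programming it amounts to storing the full truth table: one word per address, one bit per output. The sketch below generates those words for $f_0$ and $f_1$. It assumes the last term of $f_1$ is $\bar{x}_0\bar{x}_1$, as on the Two-Level Logic slide, and that $x_0$ is the least significant address bit; all names are illustrative.

```python
def prom_contents(n_inputs, functions):
    """Return one word per address; each word is a tuple of output bits."""
    words = []
    for address in range(1 << n_inputs):
        # x[0] is taken as the least significant address bit (an assumption).
        x = [(address >> i) & 1 for i in range(n_inputs)]
        words.append(tuple(f(x) for f in functions))
    return words


def f0(x):
    return (x[0] & x[1]) | (1 - x[2])


def f1(x):
    # Last term assumed to be (not x0)(not x1).
    return (x[0] & x[1] & x[2]) | (1 - x[2]) | ((1 - x[0]) & (1 - x[1]))


PROM = prom_contents(3, [f0, f1])

if __name__ == "__main__":
    for address, word in enumerate(PROM):
        print(f"{address:03b} -> {word}")
```

The table always has $2^n$ words regardless of how simple the functions are, which is why PROMs scale poorly with input count compared with PLAs and PALs.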

# More Complex PAL



*i* inputs, *j* minterms/macrocell, *k* macrocells
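
One way to read these parameters: each macrocell ORs together at most *j* programmable product terms over the *i* inputs. The sketch below models a single macrocell under that reading. The class name, the term encoding, and the error handling are illustrative assumptions, and registers and feedback paths are left out.

```python
class Macrocell:
    """OR of up to max_terms programmable product terms (simplified model)."""

    def __init__(self, n_inputs, max_terms):
        self.n_inputs = n_inputs
        self.max_terms = max_terms
        self.terms = []

    def program(self, terms):
        """Each term maps an input index to the value it must have (0 or 1)."""
        if len(terms) > self.max_terms:
            raise ValueError("function needs more product terms than available")
        for term in terms:
            if any(not 0 <= i < self.n_inputs for i in term):
                raise ValueError("term refers to a non-existent input")
        self.terms = [dict(term) for term in terms]

    def evaluate(self, inputs):
        """Return 1 if any programmed product term is satisfied."""
        return int(any(all(inputs[i] == v for i, v in term.items())
                       for term in self.terms))
```

For example, $f_0 = x_0x_1 + \bar{x}_2$ fits in a macrocell with *j* = 2 as `[{0: 1, 1: 1}, {2: 0}]`, while a three-term function would be rejected.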

# *2-input mux as programmable logic block*



Configuration of inputs A, B, S and the resulting function F:

| A | B | S | F               |
|---|---|---|-----------------|
| 0 | 0 | 0 | 0               |
| 0 | X | 1 | X               |
| 0 | Y | 1 | Y               |
| 0 | Y | X | $XY$            |
| X | 0 | Y | $X\overline{Y}$ |
| Y | 0 | X | $\overline{X}Y$ |
| Y | 1 | X | $X + Y$         |
| 1 | 0 | X | $\overline{X}$  |
| 1 | 0 | Y | $\overline{Y}$  |
| 1 | 1 | 1 | 1               |
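
The table follows from the mux equation $F = A\overline{S} + BS$, that is, the mux passes A when S = 0 and B when S = 1. This is an assumption, but it is consistent with rows such as A = 0, B = X, S = 1 giving F = X. The sketch below ties each mux pin to a constant or to X or Y and returns the resulting truth table. Function names are illustrative.

```python
from itertools import product


def mux2(a, b, s):
    """2-input multiplexer: output a when s is 0, b when s is 1."""
    return b if s else a


def configured_function(a_src, b_src, s_src):
    """Tie each mux pin to 0, 1, 'X' or 'Y'; return F keyed by (X, Y)."""
    table = {}
    for x, y in product((0, 1), repeat=2):
        env = {0: 0, 1: 1, "X": x, "Y": y}
        table[(x, y)] = mux2(env[a_src], env[b_src], env[s_src])
    return table


if __name__ == "__main__":
    # A = Y, B = 1, S = X realizes the OR function.
    print(configured_function("Y", 1, "X"))
```

AND, OR, and inversion are all reachable, so a network of such muxes can implement any logic function.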

# *Logic Cell of Actel Fuse-Based FPGA*



# *Look-up Table Based Logic Cell*
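
A look-up table realizes a logic function by storing its truth table in a small memory and using the inputs as the read address. A minimal sketch, with the class and helper names as illustrative assumptions and the first input taken as the least significant address bit:

```python
class LookupTable:
    """k-input LUT: 2**k configuration bits addressed by the inputs."""

    def __init__(self, n_inputs, config_bits):
        if len(config_bits) != 1 << n_inputs:
            raise ValueError("need exactly 2**n_inputs configuration bits")
        self.n_inputs = n_inputs
        self.config_bits = list(config_bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        # The first input is the least significant address bit (an assumption).
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.config_bits[address]


def lut_from_function(n_inputs, func):
    """Fill the configuration memory by enumerating func over all inputs."""
    bits = []
    for address in range(1 << n_inputs):
        args = [(address >> i) & 1 for i in range(n_inputs)]
        bits.append(func(*args))
    return LookupTable(n_inputs, bits)
```

Any function of k inputs costs the same $2^k$ bits and the same delay, which makes LUT timing easy to predict but limits practical cells to a handful of inputs.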



# LUT-Based Logic Cell




Xilinx 4000 Series

# *Array-Based Programmable Wiring*



# Mesh-based Interconnect Network



# *Transistor Implementation of Mesh*



# Hierarchical Mesh Network



**Use overlaid mesh  
to support longer connections**

**Reduced fanout and reduced  
resistance**

# EPLD Block Diagram

Primary inputs

Macrocell



# Altera MAX



# Altera MAX Interconnect Architecture



**Array-based  
(MAX 3000-7000)**



**Mesh-based  
(MAX 9000)**

# Field-Programmable Gate Arrays

## Fuse-based



# Xilinx 4000 Interconnect Architecture



# *RAM-based FPGA*



Xilinx XC4000ex

# A Low-Energy FPGA (UC Berkeley)



- Array Size: 8x8 (2 x 4 LUT)
- Power Supply: 1.5V & 0.8V
- Configuration: Mapped as RAM
- Toggle Frequency: 125MHz
- Area: 3mm x 3mm

# Larger Granularity FPGAs

## PADDI-2 (UC Berkeley)



- 1-µm 2-metal CMOS technology
- $1.2 \times 1.2 \text{ cm}^2$
- 600k transistors
- 208-pin PGA
- $f_{\text{clock}} = 50 \text{ MHz}$
- $P_{\text{av}} = 3.6 \text{ W} @ 5V$
- Basic Module: Datapath

# *Design at a crossroad*

## System-on-a-Chip



- Embedded applications where cost, performance, and energy are the real issues!
- DSP and control intensive
- Mixed-mode
- Combines programmable and application-specific modules
- Software plays crucial role

# *Addressing the Design Complexity Issue*

## *Architecture Reuse*

Reuse comes in generations

| *Generation* | *Reuse element*    | *Status*             |
|--------------|--------------------|----------------------|
| **1st**      | **Standard cells** | **Well established** |
| **2nd**      | **IP blocks**      | **Being introduced** |
| **3rd**      | **Architecture**   | **Emerging**         |
| **4th**      | **IC**             | **Early research**   |

Source: Theo Claasen (Philips) – DAC 00

# Architecture Reuse

- Silicon System Platform
  - Flexible architecture for hardware and software
  - Specific (programmable) components
  - Network architecture
  - Software modules
  - Rules and guidelines for design of HW and SW
- Has been successful in PCs
  - Dominance of a few players who specify and control architecture
- Application-domain specific (difference in constraints)
  - Speed (compute power)
  - Dissipation
  - Costs
  - Real / non-real time data

# *Platform-Based Design*

**“Only the consumer gets freedom of choice;  
designers need freedom *from* choice”**  
**(Orfali et al., 1996, p. 522)**

- A platform is a **restriction on the space of possible implementation choices**, providing a well-defined abstraction of the underlying technology for the application developer
- New platforms will be defined at the **architecture-micro-architecture boundary**
- They will be **component-based**, and will provide a range of choices from structured-custom to fully programmable implementations
- Key to such approaches is the **representation of communication** in the platform model

# Berkeley Pleiades Processor



- 0.25 µm 6-level metal CMOS
- 5.2mm x 6.7mm
- 1.2 Million transistors
- 40 MHz at 1V
- 2 extra supplies: 0.4V, 1.5V
- 1.5~2 mW power dissipation
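
Dividing average power by clock frequency gives a rough energy-per-cycle figure for the chip. The sketch below does that arithmetic, assuming the quoted 1.5 to 2 mW was measured at the 40 MHz clock; the slide does not say so explicitly.

```python
def energy_per_cycle_joules(power_watts, clock_hz):
    """Average energy spent per clock cycle: E = P / f."""
    return power_watts / clock_hz


# Assumption: the 1.5-2 mW range applies at the 40 MHz, 1 V operating point.
CLOCK_HZ = 40e6
low = energy_per_cycle_joules(1.5e-3, CLOCK_HZ)
high = energy_per_cycle_joules(2.0e-3, CLOCK_HZ)

if __name__ == "__main__":
    print(f"{low * 1e12:.1f} pJ to {high * 1e12:.1f} pJ per cycle")
```

Under that assumption the chip spends a few tens of picojoules per clock cycle.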

# Heterogeneous Programmable Platforms



# *Summary*

- Digital CMOS Design is kicking and healthy
- Some major challenges down the road caused by Deep Sub-micron
  - Super GHz design
  - Power consumption!!!!
  - Reliability – making it work

- Some new circuit solutions are bound to emerge
- Who can afford design in the years to come? Some major design methodology change is in the making!