



## Engineering Innovations





<http://www.apple.com/watch/apple-watch-software/>




<http://q.intel.com/introducing-mica-my-intelligent-communication-accessory/>




## CPU/SOC design



## Moore's Law

- Each new process generation doubles the number of transistors available to architects and designers
- Some of this increase is consumed by larger structures (caches, TLB, etc.)
- The rest goes to increased complexity:
  - Out-of-order, speculative execution machines
  - Deeper pipelines
  - New technologies (threading, virtualization, security, graphics, ...)
  - Multi-core and SOC designs
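The doubling rule above can be turned into a back-of-the-envelope transistor budget. This is a minimal sketch, assuming an idealized shrink of about 0.7× in each linear dimension per generation (so the area per transistor halves); `density_gain` and `transistor_budget` are hypothetical helper names, and real process nodes deviate from this ideal.

```python
import math

# Assumed idealized per-generation linear shrink (~0.707x), so feature area halves.
LINEAR_SHRINK = 1 / math.sqrt(2)


def density_gain(generations: int, linear_shrink: float = LINEAR_SHRINK) -> float:
    """Factor by which transistor density grows after `generations` shrinks.

    Density scales with 1 / (linear feature size)^2, so each generation
    multiplies it by 1 / linear_shrink**2 (2x for the default shrink).
    """
    return (1.0 / linear_shrink**2) ** generations


def transistor_budget(initial_transistors: int, generations: int) -> int:
    """Transistors available on the same die area after `generations` shrinks."""
    return round(initial_transistors * density_gain(generations))


if __name__ == "__main__":
    for gen in range(6):
        print(f"generation {gen}: {transistor_budget(1_000_000, gen):,} transistors")
```

The architect's question is then how to split each new doubling between larger structures (caches, TLBs) and added complexity.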




## Semiconductor Manufacturing Process

Each process generation shrinks the size of features (transistors and wires)

- smaller sizes increase the number of chips per wafer and, because a smaller die is less likely to contain a defect, also the yield

| Unit       | Size            |
|------------|-----------------|
| millimeter | $10^{-3}$ m     |
| micrometer | $10^{-6}$ m     |
| nanometer  | $10^{-9}$ m     |
| picometer  | $10^{-12}$ m    |



source: <http://en.wikipedia.org>
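The chips-per-wafer and yield effect can be estimated with two common first-order models. This is a sketch under stated assumptions: dies per wafer uses the usual "wafer area over die area, minus an edge-loss term" approximation, yield uses a simple Poisson defect model, and the 300 mm wafer, die areas, and 0.001 defects/mm² density are illustrative numbers, not data from this deck. All function names are hypothetical.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate whole dies per round wafer.

    First term: wafer area / die area. Second term: partial dies lost
    around the circular edge (a common first-order approximation).
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius**2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects, assuming randomly scattered defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def good_dies(wafer_diameter_mm: float, die_area_mm2: float, defects_per_mm2: float) -> int:
    """Expected defect-free dies per wafer."""
    total = dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    return int(total * poisson_yield(die_area_mm2, defects_per_mm2))


if __name__ == "__main__":
    # Same design before and after a shrink that halves its area (assumed numbers).
    for area in (100.0, 50.0):
        print(
            f"{area:.0f} mm^2: {dies_per_wafer(300, area)} dies, "
            f"yield {poisson_yield(area, 0.001):.3f}, "
            f"{good_dies(300, area, 0.001)} good dies"
        )
```

Halving the die area more than doubles the good dies per wafer: the edge loss shrinks relative to the die count, and each die is less likely to catch a defect.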


## Canonical “Moore’s Law” Chip Designer’s Algorithm

Given constraints

{**Si area, schedule, team, tools, power**}

How can I best design a chip to

- sell huge unit volumes
- at high yields
- and high profits?

People need a reason to part with their money.  
It's the chip designer's job to give them that reason.

From Robert Colwell's HotChips 2013 presentation.  
Former Chief Architect at Intel and Director, Microsystems Technology Office, DARPA


## Moore’s Law

### Exponential improvement trend from 1955 - 2010

- Separation of Concerns
  - Instruction Set Architecture (ISA), Microarchitecture, Functional blocks, Circuits, Placement, Routing, Layout, Silicon Manufacturing, Packaging
  - All could be done by specialists
- Tools
  - Checked & enabled this separation of concerns
- CMOS was just plain beautiful technology
  - Simple, small, reliable, fast, low power, high yield... very hard to beat!

From Robert Colwell's HotChips 2013 presentation


## Pre-Silicon Design Flow




## Future Designers Must Master Chips AND “Neighboring Technologies”

- No more helpful separation of concerns
  - Resilience, thermals, battery life, complexity, validation, performance, schedule, risk...
  - Have to juggle them all at once, *pre-silicon* (but remember i432!)
  - Meanwhile RTL correctness is just as hard as ever
- Must absorb more of overall value proposition
  - Lesson of “Intel Inside”: get buyer to relate perceived value to *your* part of final product
  - Lessons from Apple: iTunes, white earbuds
  - This requires expertise beyond chip/CPU design

From Robert Colwell's HotChips 2013 presentation


## Publicized Failures

### Therac-25 radiation therapy machine

- At least six accidents: massive overdoses of radiation.
- Previous models had hardware interlocks, but the Therac-25 depended on software interlocks for safety.

### The Mars Climate Orbiter doesn't orbit (1999)

\$327.6 million project burned up in minutes.

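The orbiter approached Mars on the wrong trajectory, far too close to the planet, because two engineering teams used different units of measurement: one team's software reported thruster impulse in pound-force seconds, while the navigation software expected metric newton-seconds.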
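One defensive pattern against this class of bug is to never pass bare numbers across a team boundary, only values whose unit is fixed at construction. The sketch below is illustrative only (the `Impulse` class is hypothetical, not flight software): it stores everything in newton-seconds internally and refuses to add a unitless float.

```python
from dataclasses import dataclass

# 1 lbf = 0.45359237 kg * 9.80665 m/s^2 = 4.4482216152605 N (exact by definition)
NEWTONS_PER_POUND_FORCE = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in SI units (newton-seconds)."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert at the boundary so every downstream value is SI.
        return cls(value * NEWTONS_PER_POUND_FORCE)

    def __add__(self, other: "Impulse") -> "Impulse":
        # Refuse to add bare numbers: the caller must state the unit explicitly.
        if not isinstance(other, Impulse):
            return NotImplemented
        return Impulse(self.newton_seconds + other.newton_seconds)


if __name__ == "__main__":
    total = Impulse.from_pound_force_seconds(10.0) + Impulse.from_newton_seconds(5.0)
    print(f"{total.newton_seconds:.3f} N*s")
```

Reading a pound-force-seconds number as newton-seconds understates it by a factor of about 4.45; forcing the unit into the constructor name makes that choice visible at every call site.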

### AMD Further Delays Launches of New APU Products

Kaveri Launch Slips to March, 2014;  
Carrizo Delayed Even More



Your PC ran into a problem that it couldn't handle, and now it needs to restart.

You can search for the error online: HAL_INITIALIZATION_FAILED



### US Safety Board Determines DC Metro Crash Was Failure of Both Track Circuits and Safety Culture

In June 2009, a Washington Metropolitan Area Transit Authority Red Line subway train traveling at a high rate of speed rear-ended a stationary train.... Nine people, including the train operator, were killed.



### Glaring FDIV Bug in Pentium (1994/5)

Most famous of the Intel microprocessor bugs.

Caused by an error in a lookup table used by Intel's implementation of the SRT division algorithm, which was intended to be 3-5X faster and more accurate.



### French red faces over trains that are 'too wide'

It was discovered that 2,000 new trains, ordered at a cost of 15bn euros (\$20.5bn; £12.1bn), are too wide for regional platforms.

20 May 2014




## CPU/SOC design




## Complexity

$$A \parallel B \parallel C$$

abc acb bac bca cab cba




## Complexity

$$A^2 \parallel B^2 \parallel C^2$$

aabbcc aabcbc aabccb aacbbc aacbcb aaccbb ababcc abacbc abaccb  
abbacc abbcac abbcca abcabc abcacb abcbac abcbca abccab abccba  
acabbc acabcb acacbb acbabc acbacb acbbac acbbca acbcab acbcba  
accabb accbab accbba baabcc baacbc baaccb babacc babcac babcca  
bacabc bacacb bacbac bacbca baccab baccba bbaacc bbacac bbacca  
bbcaac bbcaca bbccaa bcaabc bcaacb bcabac bcabca bcacab bcacba  
bcbaac bcbaca bcbcaa bccaab bccaba bccbaa caabbc caabcb caacbb  
cababc cabacb cabbac cabbca cabcab cabcba cacabb cacbab cacbba  
cbaabc cbaacb cbabac cbabca cbacab cbacba cbbaac cbbaca cbbcaa  
cbcaab cbcaba cbcbaa ccaabb ccabab ccabba ccbaab ccbaba ccbbaa


## Complexity

$$A^3 \parallel B^3 \parallel C^3 \quad ?$$
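The orderings on these slides can be generated mechanically. A minimal sketch (function names are illustrative): `interleavings` recursively picks which thread takes the next step, yielding orderings in lexicographic order, and `count_interleavings` uses the multinomial coefficient $(3n)!/(n!)^3$ for three threads of $n$ steps each.

```python
from collections.abc import Iterator
from math import factorial


def interleavings(counts: tuple[int, ...], labels: str = "abc") -> Iterator[str]:
    """Yield every interleaving of threads where thread i runs counts[i] steps.

    Each step of thread i is written as labels[i]; trying threads in label
    order makes the output lexicographically sorted.
    """
    if sum(counts) == 0:
        yield ""
        return
    for i, remaining in enumerate(counts):
        if remaining:
            rest = counts[:i] + (remaining - 1,) + counts[i + 1:]
            for tail in interleavings(rest, labels):
                yield labels[i] + tail


def count_interleavings(n: int, threads: int = 3) -> int:
    """Number of interleavings of `threads` threads with n steps each."""
    return factorial(n * threads) // factorial(n) ** threads


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"n={n}: {count_interleavings(n):,} interleavings")
```

With one, two, and three steps per thread this gives 6, 90, and 1,680 orderings, and the count keeps growing combinatorially, which is why interleaving coverage cannot be exhaustive for real designs.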



## The Pentium FDIV bug (1994/5)

- Less than full precision result for some combinations of divisor and dividend when performing FDIV
- 3D graph to right shows a rounding error in the 5<sup>th</sup> significant digit of the function  $x/y$  in the region of  $4195835/3145727$

$$4195833.0 \leq x \leq 4195836.4$$

$$3145725.7 \leq y \leq 3145728.4$$
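A widely circulated check for the bug used the operands named above: compute $x - (x/y)\,y$, which should be zero for a correct divider. The sketch below compares a correct IEEE double quotient with the flawed quotient commonly reported for affected Pentium chips (the `FLAWED_QUOTIENT` constant is that reported value, not something this code measures).

```python
X = 4195835.0
Y = 3145727.0

# Quotient widely reported as returned by flawed Pentium FPUs for X / Y.
FLAWED_QUOTIENT = 1.333739068902037589


def fdiv_residual(x: float, y: float, quotient: float) -> float:
    """x - quotient * y: zero (up to rounding) when the quotient is correct."""
    return x - quotient * y


if __name__ == "__main__":
    correct = X / Y
    rel_err = abs(correct - FLAWED_QUOTIENT) / correct
    print(f"correct quotient  : {correct:.15f}")
    print(f"flawed quotient   : {FLAWED_QUOTIENT:.15f}")
    print(f"relative error    : {rel_err:.2e}")
    print(f"residual (correct): {fdiv_residual(X, Y, correct)}")
    print(f"residual (flawed) : {fdiv_residual(X, Y, FLAWED_QUOTIENT):.1f}")
```

The two quotients agree to four significant digits and differ in the fifth, matching the error region in the graph, yet the residual exposes the flaw as a large, easily spotted value.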



Larry Hoyle, U. Kansas <http://www.ipsr.ku.edu/staff/lhoyle/pentium_fdiv/>

## Fast-forward to Intel i7

- Complete Datapath Verification for all FPU and data processing units.
- Complete Control-path verification of control paths in the EXE (execution) cluster
- Verification of all Assumptions in the Datapath through control invariants

CAV 2009, Kaivola et al.,  
*Replacing Testing with Formal Verification in the Intel® Core™ i7 Processor Execution Engine Validation*

## Design Verification and Validation

- Exhaustive Testing Not Possible
  - Example: multiplying two 32-bit values has $2^{64}$ possible input pairs
  - Even if a simulator completes an incredible $2^{32}$ checks/sec, checking them all takes $2^{32}$ seconds: over 100 years (about 136 years)

| Seconds (approx.) | Period           |
|------------------:|------------------|
| 3,600             | hour             |
| 86,400            | day              |
| 604,800           | week             |
| 2,628,000         | month            |
| 31,536,000        | year             |
| 315,360,000       | decade           |
| 630,720,000       | score (20 years) |
| 3,153,600,000     | century          |


Aside bonus question: Why approx. seconds?
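The arithmetic above can be checked directly, and the same sketch shows why exhaustive testing works only for tiny widths. Assumptions: `shift_add_multiply` is a hypothetical design under test (an unsigned shift-and-add multiplier), Python's `*` serves as the reference model, and the year length is the 365-day value from the table.

```python
SECONDS_PER_YEAR = 31_536_000  # 365-day year, matching the table above


def shift_add_multiply(a: int, b: int, width: int) -> int:
    """Hypothetical design under test: unsigned shift-and-add multiplier."""
    product = 0
    for bit in range(width):
        if (b >> bit) & 1:
            product += a << bit
    return product


def exhaustive_check(width: int) -> int:
    """Compare the design against Python's * for every input pair.

    Returns the number of cases checked; raises AssertionError on a mismatch.
    """
    cases = 0
    for a in range(1 << width):
        for b in range(1 << width):
            if shift_add_multiply(a, b, width) != a * b:
                raise AssertionError(f"mismatch for {a} * {b}")
            cases += 1
    return cases


def years_to_exhaust(width: int, checks_per_second: float = 2**32) -> float:
    """Years needed to test all 2**(2*width) input pairs of a width-bit multiplier."""
    return 2 ** (2 * width) / checks_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"8-bit: {exhaustive_check(8):,} cases checked")
    for width in (8, 16, 32):
        print(f"{width}-bit: {years_to_exhaust(width):.3g} years at 2^32 checks/sec")
```

Each added input bit doubles the work, so an 8-bit multiplier is checked in moments while the 32-bit case needs about 136 years; this gap is what motivates the shift toward formal verification.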




## CPU/SOC design



## Functional Validation: Divide and Conquer



- Pre-silicon
  - Architecture
  - Microarchitecture
  - Microcode
  - Formal Verification
  - Emulation
- Post-silicon
  - Functional
  - Compatibility
  - Electrical
- Low-level Software
  - Drivers
  - Operating environment
  - Software stack
  - Applications


## Pre-silicon Functional Verification




## Historic Pre-silicon Functional Validation Mix



[Using Formal Verification to Replace Mainstream Simulation (Smith/Seligman DAC'13)]


## Evolving Pre-Silicon Functional Verification



[Using Formal Verification to Replace Mainstream Simulation (Smith/Seligman DAC'13)]


# Product Schedule

