

# Open Source HW in 2030

## Why Architects Need It and It Needs Them

Michael Bedford Taylor

UC San Diego

# The Fate of Computing Today

Is determined by a small number of companies...

We have great ideas for how we should compute!



16 nm

I'll get back to you..



# What prevents adoption of our ideas?

1. Our methodology is pragmatic but broken  
*performance:*

```
repeat(modify_c_simulator())
until(perf>=10%
      || sim_bug_in_my_favor
      || overtrained_on_my_10_benchmarks
      )
assert(it_would_really_work_in_hw)
```

*power:*

```
assert(we_used_McPat && no_space_to_describe)
```

# What prevents adoption of our ideas?

2. Our stuff works great ... for *our* CPU/GPU microarch  
... but theirs is different

# What prevents adoption of our ideas?

3. We didn't solve all of the important problems

*“Context switching happens .. out of band ..”*

# What prevents adoption of our ideas?

4. “We regret to inform you that your idea was too revolutionary for us to consider as the successor to Core2Duo in our roadmap”

*“This paper entirely rethinks ...*

*Our cycle-accurate trace simulator...”*

# What prevents adoption of our ideas?

5. Chicken-and-Egg

*16 nm SoCs need huge volume to amortize costs;*

*your emerging app that needs your accelerator  
is not already in use by many users; too risky  
to dedicate that much die area on iPhone 7*

→ No tech transfer

# What prevents adoption of our ideas?

6. The Last Mile

*Your idea is great, but probably only you have the will and patience to adapt it to their system ... and you don't work there.*

# What prevents adoption of our ideas?

7. Your awesome needle in the ISCA/MICRO/... haystack

*Everybody shows good results, but unbeknownst  
to all, yours is actually worth doing!*

# What prevents adoption of our ideas?

8. A shrinking number of commercial architects has less and less time to find a home for our ideas

# Current Tech Transfer Pipeline

```
repeat (modify_c_sim())
    until (perf>=10%
        || sim_bug
        || overtrained
    )
```



*[Figure: first page of the paper "ASIC Clouds: Specializing the Datacenter" (Magaki et al., UC San Diego)]*

C sim (50K LOC) → ISCA (12 pages) → Intel (7 nm, 5 million units)

*Maybe we need a few more intermediate points?*

# Proposed Tech Transfer Pipeline



*Switching gears to a different facet of open source...*

# What will the Hardware workforce look like in 15 years?

- Good news: enrollment in undergrad Computer Architecture: 30→400
- Bad news: “professor, which chapter of Patterson & Hennessy covers **Apps** ?”
- *Students don't want to design hardware at a stodgy old HW company, they want to start the next Instagram!*
- *Attracting the best talent is a serious problem for the vibrancy of our HW industry*

*[Figure: cover of "Computer Organization and Design: The Hardware/Software Interface" by David A. Patterson and John L. Hennessy (Morgan Kaufmann)]*



# HW diversity of computing devices is dwindling...



*How can open source revitalize  
the HW field in general?*

Source: Gartner Group, T. Austin

*Can we make hardware design exponentially leaner so we can have more startups exploring more ideas?*

Can we get to a “Minimum Viable Product\*” with a few person-years of effort?

Is it possible?

\* The most basic version of your product that customers actually pay for or use; in research terms, one that shows a "real" design.



*Under the Pillow of  
Our CS Undergrads...*

# Costs of Latest Nodes Are Skyrocketing



Source: International Business Strategies, T. Austin

# Software Innovation Today



Instagram

Proprietary Code  
500K-->13 people & \$1B

Open Source

Python  
Django  
Memcached  
Postgres/SQL  
Redis  
Apache  
Linux  
GNU \*  
GCC

# Hardware: Where is the Open Source?



Instagram



- Your Secret Sauce
- Closed Source (\$\$)
  - ARM A57, A7, M4, M0...
  - ARM Interconnect
  - IO Pads
  - Standard Cells
  - DDR Phy
  - VCS
  - Design Compiler
  - IC Compiler
  - Spice
  - Formality
  - Calibre DRC/LVS
- Open Source

# From \$120M to \$5M: Open Source Can Address Most of the Cost



Source: International Business Strategies, T. Austin

And going back a few nodes can get us from \$5M to \$500K for a 4X performance penalty (post-Dennard scaling)



Source: International Business Strategies, T. Austin
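
The amortization argument behind these cost figures can be made concrete with a little arithmetic. The sketch below is illustrative only: the three NRE figures (\$120M, \$5M, \$500K) are the ones quoted on the slides, while the scenario labels, the shipment volumes, and the \$10 per-unit budget are assumptions picked for the example, not numbers from the talk.

```python
import math

# NRE (non-recurring engineering) figures quoted on the slides, in USD.
# The scenario labels are shorthand invented for this sketch.
NRE_USD = {
    "latest node, closed-source flow": 120_000_000,
    "latest node, open-source flow": 5_000_000,
    "older node, open-source flow": 500_000,
}


def nre_per_unit(nre_usd: int, units: int) -> float:
    """Up-front cost spread evenly over every unit shipped."""
    if units <= 0:
        raise ValueError("units must be positive")
    return nre_usd / units


def break_even_units(nre_usd: int, budget_per_unit_usd: int) -> int:
    """Smallest volume at which amortized NRE fits a per-unit budget."""
    if budget_per_unit_usd <= 0:
        raise ValueError("budget_per_unit_usd must be positive")
    return math.ceil(nre_usd / budget_per_unit_usd)


# Hypothetical inputs: neither the volumes nor the $10 budget come from the talk.
VOLUMES = (10_000, 1_000_000)
BUDGET_PER_UNIT_USD = 10

for label, nre in NRE_USD.items():
    per_unit = ", ".join(
        f"${nre_per_unit(nre, v):,.2f} at {v:,} units" for v in VOLUMES
    )
    needed = break_even_units(nre, BUDGET_PER_UNIT_USD)
    print(f"{label}: {per_unit}; needs {needed:,} units for ${BUDGET_PER_UNIT_USD}/unit")
```

Under these assumed inputs, the \$120M design needs 12,000,000 units to get amortized NRE down to \$10 per unit, while the \$500K design needs only 50,000. The sketch ignores per-unit manufacturing cost and the 4X performance penalty of the older node; it only shows how strongly the up-front cost sets the minimum viable volume.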

# How can Hardware Design Be More Like Software?

- Open source infrastructure lets us create systems where we may only have to write 5% of the total code to create an entirely new product. → Leverage, not labor (and not IP \$\$\$)
- Open source languages and libraries, so we don't have to redesign everything every time (like the STL, or the Python or Java libraries)
- Reduce the overhead of creating and testing new designs
  - Open source CAD, open source packages, open source standard cells, open source test boards, no NDAs
- IaaS clouds allow us to **scale quickly from small companies to large ones**, from 1 customer to 1 billion customers → Scaling ideas from the small to the big

# The Open Source HW Vision

Think GNU/Linux, but for everything HW related:

*Open Source CAD Tools (Like GNU)*

VLSI HLS, RTL to GDS ...

PCB Design and Simulation Tools

*Open Source Chip Designs (Like Linux)*

Out-of-order

In-order

GPU

FPGA

*Open Source IP*

*PLLs, I/O, Standard Cells, DRAM Controllers...*

# Emerging open source projects

## Processors

|                         |                                  |
|-------------------------|----------------------------------|
| <b>ISA:</b>             | RISC-V                           |
| <b>In-order:</b>        | Rocket, Pulpino, Leon3, OpenRISC |
| <b>OOO Superscalar:</b> | Boom, Fabscalar                  |
| <b>GPU:</b>             | MIAOW, GPLGPU, Nyuzi             |
| <b>Manycore:</b>        | OpenPiton                        |
| <b>Microcontroller:</b> | OpenMSP430                       |

## CAD Tools (imagine if Linux did not have GCC)

|                               |                           |
|-------------------------------|---------------------------|
| <b>Verilog to GDS:</b>        | Qflow                     |
| <b>Verilog to Gate Level:</b> | Yosys                     |
| <b>Languages:</b>             | Chisel, PyMTL, myHDL, ... |
| <b>Standard Cells:</b>        | FreePDK15                 |

## Motherboards

|                     |                      |
|---------------------|----------------------|
| <b>Commercial:</b>  | Facebook OpenCompute |
| <b>Prototyping:</b> | UCSD Basejump        |

# But who will do this work?

We need people who:

- are idealistic
- have lots of free time
- will work for free

Who might that be?

# Students!

(Remember Linus Torvalds?)

# An Experiment: CSE 190

## CSE 190: The Open Source Hardware Movement with Prof. Michael Taylor:

The open source software movement has blossomed over the last 30 years, and is directly responsible for the current surge in the software industry, where developers can create large startups in which only 5% of the source base is their own code.

Recently, the open source hardware movement has been rapidly gaining ground. In this class, we will study the development of the movement, including progress in open-source processors (RISC-V), open-source GPUs (MIAOW), open-source FPGAs, and open-source libraries ([opencores.org](http://opencores.org)). In this class we will brainstorm about this movement, and students will engage in an open source hardware project of their choice to advance the state-of-the-art in open source hardware development. Prerequisites: A+ or A or A- in CSE 141L or ECE 11, or excellent knowledge of SystemVerilog, or Permission of Instructor.

# CSE 190

In the first month of class, students present on various open source projects and estimate their importance and trajectory.

Students then work in teams. To get an A, they need to have changes accepted to an open source hardware project. (“To GIT you must commit!”)

# Teaching

*[Screenshot: GitHub profile of s-okai; popular repositories: riscv-boom (Berkeley Out-of-Order Machine), riscv-boom-doc (documentation for the BOOM processor), rocket-chip (Rocket Chip Generator); 5 contributions in the last year]*

# Research

*Have your funded students use and commit to open source HW efforts during their research  
... instead of “rolling your own” or using your own proprietary stuff (e.g., Raw)*

*[Screenshot: GitHub profile of Anuj Rao (anujnr); popular repositories: rocc-template, trace-debug-submodule; 13 contributions in the last year, concentrated in March and April]*



# Basejump: A “Base Class” for Open Source HW

# Basejump Skeleton



# Basejump Motherboard



# Click!!



Xilinx Zedboard



# Basejump: Early Adopters



- DARPA CRAFT (16 nm)
- Princeton University
- University of Illinois at Urbana-Champaign
- Massachusetts Institute of Technology
- UCSD
- NSF SaTC Large (Crypto)

Thanks!

