

## Hardware for AI and ML: an Overview

Portland State University  
 Department of Electrical and Computer Engineering (ECE)  
[www.teuscher-lab.com](http://www.teuscher-lab.com)  
[teuscher@pdx.edu](mailto:teuscher@pdx.edu)



teuscher•Lab  
[teuscher-lab.com](http://teuscher-lab.com)

Portland State  
 UNIVERSITY

## What is a computer?


### What is computation? (1)




### What is computation? (2)




### What is computation? (3)




### What is computation? (4)







## Intrinsic vs designed computation (4)

S. K. Bose, C. P. Lawrence, Z. Liu, K. S. Makarewicz, R. M. J. van Damme, H. J. Broersma and W. G. van der Wiel. "Evolution of a designless nanoparticle network into reconfigurable Boolean logic." *Nature Nanotechnology*, Letters, published online 20 September 2015. DOI: 10.1038/NNANO.2015.207

(Bose et al., 2015, <https://doi.org/10.1038/nnano.2015.207>)




What kind of computing machinery would an extraterrestrial build?




## The goals and drivers have changed




## The goals and drivers have changed



<https://royalsocietypublishing.org/doi/10.1098/rsta.2019.0061>


## AI vs ML

- Machine Learning (ML) is one technique used to achieve Artificial Intelligence (AI).
- All ML is AI, but not all AI is ML.
- AI is the destination (e.g., intelligent machines), while ML is one path to get there (learning from data).
- Artificial Intelligence (AI): the broader concept, encompassing machines that can perform tasks that normally require human intelligence. Ultimate goal: Artificial General Intelligence (AGI).
- Machine Learning (ML): a subset of AI focused specifically on algorithms that improve through experience. Example: neural networks.


## AI vs ML



Source: Tignis




## A very brief history of AI



## Special types of neural networks



$$F(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \dots + a_n(x - x_0)^n + \dots,$$


Figure 1.17 shows how a Fourier series can be implemented as a neural network. If the function $F(x)$ is to be developed as a Fourier series, it has the form

$$F(x) = \sum_{i=0}^{\infty} (a_i \cos(ix) + b_i \sin(ix)). \quad (1.2)$$

## A very brief history of AI with deep learning



## An abstract neuron



## A neural network



## Types of neural networks





## Hardware for AI/ML



Fig. 19: A comparison of multiply-accumulate (MAC) operations performed on conventional and neural computing units.
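Since the figure itself is not available here, the sketch below only illustrates what a MAC operation is and how many of them a fully connected layer needs. It assumes a plain weighted-sum neuron without an activation function; the names `neuron_mac` and `dense_layer_macs` are hypothetical.

```python
def neuron_mac(inputs, weights, bias=0.0):
    """Compute the weighted sum of one neuron as explicit MAC steps.

    Returns the accumulated value and the number of MAC operations used.
    """
    acc = bias
    macs = 0
    for x, w in zip(inputs, weights):
        acc += x * w  # one multiply-accumulate: acc <- acc + x * w
        macs += 1
    return acc, macs


def dense_layer_macs(n_inputs, n_outputs):
    """MAC count of a fully connected layer: one MAC per weight."""
    return n_inputs * n_outputs


if __name__ == "__main__":
    value, macs = neuron_mac([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], bias=1.0)
    print(value, macs)
    print(dense_layer_macs(784, 128))
```

The loop issues its MACs one after another. Every iteration is independent of the others except for the final accumulation, which is what makes the operation a natural target for parallel hardware.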

## Hardware for AI/ML

| Processor | Power Consumption | Strengths | Limitations |
|---|---|---|---|
| CPU | High | Flexible; general-purpose processing; complex instructions and tasks; system management | Possible memory access bottlenecks; few cores (4-16) |
| GPU | High | Parallel cores (~1000s of cores); high-performance AI processing | Power consumption; large footprint |
| FPGA | Medium | Configurable logic gates; flexible; in-field re-programmability | Programming complexity |
| ASIC | Low | Custom logic designed with libraries; optimized for computing; small footprint | Fixed function; expensive custom design |
| Vision Processing Unit (VPU) | Ultra-low | Dedicated image and vision co-processor; small footprint | Limited dataset and batch size; limited network support |
| Tensor Processing Unit (TPU) | Low to medium | Specialized tool support; optimized for TensorFlow | Proprietary design; limited framework support |

## Hardware for AI/ML



## Co-design & LLM



## (HW/SW) Co-design

- **Classical approach:** in college, different technologies are taught in different classes.
- **Hardware-software co-design:**
  - Started in the 1990s.
  - Its core idea is the concurrent design of the hardware and software components of complex electronic systems.
- **More broadly:**
  - The concurrent design of hardware and software across all layers of the compute stack.
- **What are possible metrics and trade-offs?**

## Deep co-design across the entire stack





## Food for thought

- Why is executing a neural network on a general-purpose computer inefficient?
- What circuitry/processor would you design to find the shortest path in large graphs?
  - What if the graph is dense?
  - What if the graph is sparse?
- What circuitry/processor would you design to recognize QR codes for an embedded camera?
  - Recognition needs to be fast and low-power.
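As a starting point for the shortest-path question above, the following software sketch contrasts two standard formulations of Dijkstra's algorithm. It is not a hardware design and not an answer to the question. It assumes non-negative edge weights; the function names `dijkstra_dense` and `dijkstra_sparse` are hypothetical. The dense variant scans an adjacency matrix in regular, predictable patterns, while the sparse variant follows adjacency lists and a priority queue with irregular memory accesses, a difference that may matter when deciding what circuitry to build.

```python
import heapq
import math


def dijkstra_dense(matrix, source):
    """O(V^2) Dijkstra on an adjacency matrix (math.inf means no edge)."""
    n = len(matrix)
    dist = [math.inf] * n
    dist[source] = 0.0
    done = [False] * n
    for _ in range(n):
        # Linear scan for the closest vertex that is not finished yet.
        u = -1
        for v in range(n):
            if not done[v] and (u == -1 or dist[v] < dist[u]):
                u = v
        if dist[u] == math.inf:
            break  # remaining vertices are unreachable
        done[u] = True
        # Relax every entry of row u: regular, streaming memory access.
        for v in range(n):
            if dist[u] + matrix[u][v] < dist[v]:
                dist[v] = dist[u] + matrix[u][v]
    return dist


def dijkstra_sparse(adj, source):
    """O((V + E) log V) Dijkstra on adjacency lists {u: [(v, w), ...]}.

    Every vertex must appear as a key in adj, even if it has no edges.
    """
    dist = {u: math.inf for u in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        # Only existing edges are visited: irregular, pointer-like access.
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


if __name__ == "__main__":
    inf = math.inf
    matrix = [
        [inf, 1, 4, inf],
        [inf, inf, 2, inf],
        [inf, inf, inf, 1],
        [inf, inf, inf, inf],
    ]
    adj = {0: [(1, 1), (2, 4)], 1: [(2, 2)], 2: [(3, 1)], 3: []}
    print(dijkstra_dense(matrix, 0))
    print(dijkstra_sparse(adj, 0))
```

Both functions return the same distances on the same graph; they differ in how much work and memory traffic they spend when most possible edges are absent.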