



## PERSPECTIVE AND CASE STUDIES IN COMPUTING WITH PHYSICAL ANALOG SYSTEMS

JOHN PAUL STRACHAN

FORSCHUNGSZENTRUM JÜLICH (PGI-14)

RWTH AACHEN UNIVERSITY, GERMANY

# Outline

- Introduction and viewpoint for physical computing
- Mixed analog-digital, in-memory computing (with and without errors)
- Applications in A.I., Optimization
- Expanding our computing primitives – associative memories and complex neurons

# Range of physical computing approaches under exploration (Not complete list)

Brain-inspired/  
Neuromorphic



Frederic Vester 1978

Quantum



Forschungszentrum Jülich

Future Computing  
paradigm??



Qiang, X., et al. *Nature Photonics* (2018)

Optical/Photonic

... and more....

Analog



Photo: Steven Fine, Tide-predicting machine

Magnetic/Spintronic



M Sharad, et al. *IEEE Trans Nano* (2012)

# Range of physical computing approaches under exploration (Not complete list)

Novel  
Resources  
exploited

## Brain-inspired/ Neuromorphic



Frederic Vester 1978

## Quantum



Forschungszentrum Jülich

## Optical/Photonic

- Boson statistics – piling photons into single modes
- Weak interactions – minimal dissipation over long-distances
- Fast (speed of light)

Qiang, X., et al. *Nature Photonics* (2018)

... and more....

## Analog



Photo: Steven Fine, Tide-predicting machine

## Magnetic/Spintronic



M Sharad, et al. *IEEE Trans Nano* (2012)

Do I have to  
choose only one?

# Brains leverage variable physics, time-scales, energy-scales

## Example

Numbers from:  
*Principles of Neural Design*, Sterling and Laughlin (2015),  
Chapters 5,6



Passive propagation through neuron body as **electrical signal**  
**Speed:** 1 m/s  
**Energy:** ~23 femtoJoules

**Active** signal propagation across axons as **Ionic currents**  
**Speed:** 100 m/s  
**Energy:** ~6000 femtoJoules  
*Fast, reliable, higher energy*

Neurotransmitters move through **chemical diffusion** from vesicles to receptors  
**Speed:** 0.001 m/s  
**Energy:** ~ 2.3 femtoJoules  
*Slow, unreliable (noisy), very low energy*

# We have a variety of applications for which we use computers

Legend: Data Heavy (von Neumann bottleneck); Compute Heavy (No von Neumann bottleneck)

| Computing Application | Compute Heavy | Data Heavy | Operational intensity (FLOP/byte) | Communication (sequential vs parallel) | Parallelism | |
|---|---|---|---|---|---|---|
| Machine learning | high | low | medium | high | high | high |
| Graph problems | low | high | high | high | high | high |
| Bayesian inference | high | high | medium | medium | medium | medium |
| Markov chain | high | high | high | high | high | high |
| Data Bases (analytics) | low | medium | high | high | high | high |
| Data Bases (transactions) | medium | medium | medium | medium | medium | medium |
| Search (indexing problem) | high | high | high | high | high | high |
| Optimization problems (resource allocation) | high | high | medium | low (worst case) | medium | low (worst case) |
| Scientific Computing | high | low to high | low to high | medium | high | high |
| Finite Element Modelling | high | low | medium | medium | high | high |
| Email, chat, etc. | low | high | medium | low | high | low |
| Signal (image) processing | high | high | high | low | high | medium |

A version of the “No Free Lunch Theorem” applies here:

- It is unlikely a single type of computer is optimal for all applications



**Increasingly specialized and diverse computer hardware is the likely future.**

**The end of Moore's Law and Dennard scaling leaves no good alternative.**

**This is a good thing:**

- Diverse ecosystem
- Increased efficiency and performance
- Drive breakthroughs in many fields of research

Who will be the next Nvidia?!

# Key challenge to building more efficient hardware

Today: memory operations are far more expensive than computing operations



Over many decades:  
Memory performance grew at 7% / year  
CPU performance grew at 60% / year

Compute is free, memory is expensive!
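A quick back-of-the-envelope sketch of how these two growth rates compound. The 7% and 60% figures come from the slide; the time spans printed below are arbitrary illustrations, and both curves are assumed to start from the same normalized value.

```python
# Compound the annual growth rates quoted above to see how fast the gap widens.
MEMORY_GROWTH = 0.07   # memory performance growth per year (from the slide)
CPU_GROWTH = 0.60      # CPU performance growth per year (from the slide)


def performance_gap(years: int) -> float:
    """Ratio of CPU to memory performance after `years`, both starting at 1."""
    return ((1 + CPU_GROWTH) / (1 + MEMORY_GROWTH)) ** years


for years in (1, 5, 10, 20):  # illustrative spans, not from the source
    print(f"after {years:2d} years: CPU/memory gap ~ {performance_gap(years):,.1f}x")
```

Because the ratio of the two growth factors is itself compounded every year, even a modest number of years yields a gap of orders of magnitude.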



- Bring computation to the memory
- “In-memory computing”





Example: *Multiplication, Add, and Read* operations in crossbar memory circuits.  
Leverage programmable resistive devices: memristors, ReRAM, PCM, spintronic, ferroelectric, etc.



Vector-matrix product performed through Ohm's Law + Kirchhoff's Current Law  
In-memory analog computing: Multiply & Add operation (@8-bit) in a 60x60 array consumes <0.001 pJ per operation

\*Digital conversion not included here, but is in all future examples
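The way Ohm's law and Kirchhoff's current law combine into a vector-matrix product can be sketched in a few lines. This is an idealized model (no wire resistance, no noise, no converters); the array size matches the 60x60 example, while the voltage and conductance ranges are assumptions chosen only for illustration.

```python
import random


def crossbar_vmm(voltages, conductances):
    """Column currents of an ideal crossbar.

    Ohm's law gives the current through each device, I_ij = V_i * G_ij.
    Kirchhoff's current law sums the device currents on each column wire,
    I_j = sum_i V_i * G_ij, which is exactly a vector-matrix product.
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for v, row in zip(voltages, conductances):   # one row wire per input
        for j, g in enumerate(row):              # one device per column
            currents[j] += v * g                 # Ohm's law, summed by KCL
    return currents


rng = random.Random(0)
N = 60
G = [[rng.uniform(1e-6, 1e-4) for _ in range(N)] for _ in range(N)]  # siemens (assumed range)
V = [rng.uniform(0.0, 0.2) for _ in range(N)]                        # volts (assumed range)
I = crossbar_vmm(V, G)
print(f"{len(I)} column currents, first one = {I[0]:.3e} A")
```

All N x N multiply-accumulate operations happen in one read step, since every device conducts at the same time.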

# Engineering memristor crossbars for applications

## Integrating CMOS and Memristors



- In-memory analog multiply-accumulate (MAC) operations
- 25-50nm ReRAM/memristors, 180nm CMOS
- Down to 140ns Compute/Read operations, parallel across all arrays, results latched in Sample/Hold
- Multiple chips can be tiled and operate in parallel



Analog programmable,  $>10^5$  Conductance range  
Stable conductances over years

# Example: acceleration of deep neural networks



Arrays of memristors intertwined with computing to eliminate expensive data movement.

Speed-ups 10-100x over GPUs possible, with lower cost

Z. Wang *et al.* Nat. Mach. Intel. **1**, 434 (2019).

# Convolutional neural networks

Z. Wang et al. Nat. Mach. Intel. 1, 434 (2019).



Figure panels: weight maps for the Conv1 and Conv2 layers (batch 0) and the Dense layer (unit: S); training performance.



# Scalable and Flexible Architectures Developed for Machine Learning



**Node**

Each “node” consists of multiple tiles with an on-chip network

**Tile**

In-memory ReRAM crossbars (each at 128x128) performing matrix vector multiplications

Digital units for logic, scalar, and vector ops

3-stage pipeline, instruction decoder, and instruction memory

Estimate @22nm CMOS

~250 mm<sup>2</sup>, <50 Watts

46,000 crossbars

100 Trillion Ops/sec (8bit operations)

→ Outperform GPUs in Throughput/Watt and cost



**Core**



Within each “tile” are multiple “cores” with a shared memory for holding data to/from other tiles and to/from cores.

- A. Shafiee et al., ISCA (2016)
- A. Nag et al., IEEE Micro (2018)
- A. Ankit et al., ASPLOS (2019)
- A. Ankit et al., IEEE Trans. on Computers (2020)

# Challenge: Presence of “Noise” in analog computing



## Many sources of errors:

- Interconnecting wires  $\neq 0$  resistance
- Device resistances are finite
- Nonlinear devices  $G = G(V)$
- Johnson/Shot noise sources
- Temperature effects
- Device resistance fluctuations
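To get a feel for how device-level fluctuations show up at the column outputs, here is a minimal sketch that perturbs each conductance by a relative Gaussian error and compares the result against the ideal product. Only one of the listed error sources (device fluctuations) is modeled; the array size, value ranges, and noise levels are assumptions.

```python
import math
import random


def vmm(voltages, conductances):
    """Ideal column currents: I_j = sum_i V_i * G_ij."""
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(len(conductances[0]))]


def rms_output_error(voltages, conductances, unit_noise, sigma):
    """RMS column error when each G_ij becomes G_ij * (1 + sigma * z_ij)."""
    noisy = [[g * (1 + sigma * z) for g, z in zip(g_row, z_row)]
             for g_row, z_row in zip(conductances, unit_noise)]
    ideal = vmm(voltages, conductances)
    actual = vmm(voltages, noisy)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, ideal)) / len(ideal))


rng = random.Random(0)
N = 32  # illustrative array size
G = [[rng.uniform(1e-6, 1e-4) for _ in range(N)] for _ in range(N)]
V = [rng.uniform(0.0, 0.2) for _ in range(N)]
Z = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]  # fixed unit-variance draws
for sigma in (0.0, 0.01, 0.05):
    print(f"sigma={sigma:.2f}  rms error = {rms_output_error(V, G, Z, sigma):.3e} A")
```

Reusing the same unit-variance draws `Z` for every `sigma` makes the comparison across noise levels direct: the output error scales with the relative device error.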

## Experimental Data:



M. Hu, et al., Adv. Mater. 2018

F. Cai, et al., Nature Electr 2020



## Two Research Directions:

- Compute with noise
  - Find applications that can utilize lower precision and analog noise
- Compute despite noise
  - Find ways to reduce/correct these errors

But noise is real

$$\frac{\text{Number of operations}}{\text{Number of assumed operations}} = \frac{N^2}{N}$$

# Analog Error Correcting Codes (analog ECC)



Novel encoding schemes invented by Prof Ronny Roth, Technion  
Summer visitor

Roth, Ron M. *IEEE Trans. on Info. Theory* (2018)

Roth, Ron M. *IEEE Trans. on Info. Theory* (2020)



ECC for *computation*, works for all inputs

Two different ECC schemes

1) Integer precise computations (2018)

2) Tolerate small analog errors, detect/correct large deviations (2020)
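As a rough illustration of what it means to protect a *computation* rather than stored data, the sketch below appends one checksum column to a weight matrix, so that for any input the outputs must satisfy a known linear relation. This is a plain sum-check with assumed names and an assumed tolerance; it is not the coding scheme from Roth's papers, only a toy with the same flavor as scheme 2 (accept small deviations, flag large ones).

```python
def add_checksum_column(weights):
    """Append to each row the sum of that row (one extra crossbar column)."""
    return [row + [sum(row)] for row in weights]


def vmm(x, weights):
    """Vector-matrix product, one output per column."""
    return [sum(xi * row[j] for xi, row in zip(x, weights))
            for j in range(len(weights[0]))]


def check(outputs, tolerance):
    """True if the data outputs agree with the checksum output.

    By linearity, sum_j (x W)_j equals x . (row sums of W) for every x,
    so the relation holds for all inputs. Deviations below `tolerance`
    are accepted; a large error on any single column is flagged.
    """
    *data, checksum = outputs
    return abs(sum(data) - checksum) <= tolerance


W = [[1.0, 2.0, 0.5],
     [0.0, 1.5, 3.0]]
W_enc = add_checksum_column(W)
x = [0.2, 0.4]
y = vmm(x, W_enc)
print("clean output passes:", check(y, tolerance=1e-3))

y_faulty = list(y)
y_faulty[1] += 0.5   # emulate a large error on one column
print("faulty output passes:", check(y_faulty, tolerance=1e-3))
```

A single checksum can only detect an error; locating and correcting it requires additional redundant columns.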



# Demonstrations of Analog Error Correcting Codes (analog ECC)

Test analog-ECC on “Device Failures”



## Procedure:

- Turn random devices far ON (low resistance)
- Apply random inputs
- Measure (out – expected out) with and without analog-ECC

Test analog-ECC on “Noise Injection”



## Procedure:

- Use extra rows to inject noise into select columns
- Apply random inputs
- Measure (out – expected out) with and without analog-ECC



# Challenge: Presence of “Noise” in analog computing



## Many sources of errors:

- Interconnecting wires  $\neq 0$  resistance
- Device resistances are finite
- Nonlinear devices  $G = G(V)$
- Johnson/Shot noise sources
- Temperature effects
- Device resistance fluctuations

## Experimental Data:



Error (Column outputs – Expected outputs)  
(128 rows)

M. Hu, et al., Adv. Mater. 2018

F. Cai, et al., Nature Electr 2020



## Two Research Directions:

- Find applications that can utilize lower precision and analog noise
- Find ways to reduce/correct these errors

C. Li, R. Roth, C. Graves, X. Sheng, J.P. Strachan, IEDM 2020  
Roth, Ron M. IEEE Trans. on Info. Theory (2020)

$$\frac{\text{Number of operations}}{\text{Number of assumed operations}} = \frac{N^2}{N}$$

# Today's hardest and most interesting problems: Optimization

Learning in biology itself is an optimization process; as is the **Training of Artificial Neural Networks**  
**Practical industrial problems** in scheduling, routing, financial portfolios, system design, etc



**Many challenging “non-convex” problems:** k-SAT, Graph Coloring, Traveling Salesman, Mixed-Integer Linear Programming, Knapsack, Weighted MaxCut, ....  
Interesting ones are NP-complete or NP-hard

**Many heuristic “algorithms” and approaches:**

Local Search, Simulated Annealing, Evolutionary Algorithms, Boltzmann machines, Hopfield networks, etc.

**Many physical systems and device implementations:**



**Coupled Oscillators**  
Parihar, Abhinav, et al. *Scientific reports* 7.1 (2017)



**Analog in-memory computing  
And Many Others!**

**Goal: Provide some speed-up and energy reduction for these hard problems**

**P-bits, Magnetic Tunnel Junctions**  
W. A. Borders, A. Z. Pervaiz, S. Fukami, K. Y. Camsari, H. Ohno, and S. Datta, *Nature* (2019)

# Our HW approach: in-memory, mixed analog-digital circuits

- Massive connection weights need to be in some compact programmable memory
- Ideally, an analog (multi-bit) tunable Memristive crossbar implements the couplings. Flash, SRAM, and other memory tech are options.
- Additional CMOS circuitry for neurons/spins, updates, and control



Proposed QUBO/Ising/Hopfield solver

## What advantages?

- Memristive crossbars provide **fully-connected, multi-bit, programmable** connectivity
  - 3D integration potential
- Need **noise** for probabilistic sampling
- High **Parallelism**: Can update batches of neurons in parallel, many crossbars running in parallel
- Favorable **energy/latency** trade-off at lower precision
- **Flexibility** to support many heuristics
  - Local Search, WalkSAT, Stochastic and Simulated annealing, Parallel Tempering, etc.
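A software sketch of the kind of loop such a solver might run, here for Max-Cut. The coupling matrix plays the role of the crossbar (a matrix-vector product gives each neuron's local field), Gaussian noise added to the field provides the stochasticity, and the noise is ramped down as in simulated annealing. The noise schedule, the update order, and all names are assumptions for illustration, not a description of the actual circuit.

```python
import random


def cut_value(weights, spins):
    """Total weight of edges whose endpoints have opposite spins (+1/-1)."""
    n = len(spins)
    return sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
               if spins[i] != spins[j])


def noisy_hopfield_maxcut(weights, sweeps=200, noise0=2.0, seed=0):
    """Noisy Hopfield-style search; returns the best partition seen and its cut."""
    rng = random.Random(seed)
    n = len(weights)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    best, best_cut = list(spins), cut_value(weights, spins)
    for sweep in range(sweeps):
        noise = noise0 * (1 - sweep / sweeps)     # linear ramp-down (assumed)
        for i in rng.sample(range(n), n):         # random update order
            # Local field: one row of the crossbar's matrix-vector product.
            field = sum(weights[i][j] * spins[j] for j in range(n))
            # For Max-Cut a node prefers the side opposite to its neighbours.
            spins[i] = -1 if field + rng.gauss(0.0, noise) > 0 else 1
        current = cut_value(weights, spins)
        if current > best_cut:
            best, best_cut = list(spins), current
    return best, best_cut


# Small random unweighted graph (illustrative instance).
graph_rng = random.Random(1)
n = 12
W = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if graph_rng.random() < 0.5:
            W[i][j] = W[j][i] = 1
spins, cut = noisy_hopfield_maxcut(W)
print("cut found:", cut, "out of", sum(map(sum, W)) // 2, "edges")
```

Swapping the noise schedule or the update rule turns the same loop into other heuristics from the list above, which is the flexibility argument in software form.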

## What disadvantages?

- Memristive technologies under development
- Memristor precision and fluctuations could be problematic


# Experimental demonstrations and performance comparisons

Solving NP-hard Max-Cut problems – 60x60 problem



Experimental Memory Pattern

MaxCut



Figure from N. Mohseni, P.L. McMahon, T. Byrnes. *Nature Reviews Physics* (2022)

- [1] F. Cai, S. Kumar, T. Van Vaerenbergh, ... JP Strachan, *Nature Electronics* (2020)
- [2] C. Roques-Carmes, et al. *Nature Comm.* (2020)

# Harder problem: 3SAT

3SAT: find a Boolean assignment of variables $x_i$ to satisfy (with $\vee$ = OR, $\wedge$ = AND): $(x_1 \vee x_2 \vee x_3) \wedge (\overline{x_1} \vee x_4 \vee \overline{x_5}) \wedge \dots$

“Hard” problems have many more clauses than variables (ratio $\sim 4.25$) → many interdependencies



Fault testing a circuit (\*)



SAT formulation

\* from D. Knuth, TAOCP,  
Volume 4, Fascicle 6, 2015
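For concreteness, here is a minimal WalkSAT-style local search (one of the heuristics named in this talk) on a random 3-SAT instance at the clause-to-variable ratio quoted above. The instance is "planted": clauses are kept only if a hidden assignment satisfies them, so a solution is known to exist. That choice, the noise parameter `p`, the greedy rule, and the flip budget are assumptions for illustration.

```python
import random


def satisfied(clause, assignment):
    """A clause is a tuple of non-zero ints: +i means x_i, -i means NOT x_i."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def planted_3sat(n_vars, ratio=4.25, seed=0):
    """Random 3-SAT instance that a hidden assignment is guaranteed to satisfy."""
    rng = random.Random(seed)
    hidden = {i: rng.random() < 0.5 for i in range(1, n_vars + 1)}
    clauses = []
    while len(clauses) < int(ratio * n_vars):
        variables = rng.sample(range(1, n_vars + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        if satisfied(clause, hidden):        # keep the instance satisfiable
            clauses.append(clause)
    return clauses, hidden


def walksat(clauses, n_vars, p=0.5, max_flips=20_000, seed=1):
    """Return a satisfying assignment, or None if the flip budget runs out."""
    rng = random.Random(seed)
    assignment = {i: rng.random() < 0.5 for i in range(1, n_vars + 1)}

    def unsat_count_if_flipped(var):
        assignment[var] = not assignment[var]
        count = sum(not satisfied(c, assignment) for c in clauses)
        assignment[var] = not assignment[var]
        return count

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, assignment)]
        if not unsat:
            return assignment
        clause = rng.choice(unsat)           # focus on one violated clause
        if rng.random() < p:                 # random-walk move
            var = abs(rng.choice(clause))
        else:                                # greedy move within the clause
            var = min((abs(lit) for lit in clause), key=unsat_count_if_flipped)
        assignment[var] = not assignment[var]
    return None


clauses, hidden = planted_3sat(n_vars=20)
solution = walksat(clauses, n_vars=20)
print(len(clauses), "clauses; solved:", solution is not None)
```

Every flip is chosen from a currently violated clause, which is the defining feature of this family of heuristics; the greedy rule used here (fewest violated clauses after the flip) is a simplification of the break-count rule in the original algorithm.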

Convert into Energy/Loss minimization problem

$$\text{Minimize } E = (1 - x_1)(1 - x_2)(1 - x_3) + x_1 (1 - x_4)\, x_5 + \dots$$

Expanding the first term:

$$(1 - x_1)(1 - x_2)(1 - x_3) = 1 - x_1 - x_2 - x_3 + \underbrace{x_1 x_2 + x_1 x_3 + x_2 x_3}_{\text{pair-wise 2nd order coupling}} - \underbrace{x_1 x_2 x_3}_{\text{3rd order coupling}}$$

Ising / QUBO hardware cannot compute 3<sup>rd</sup> order products

Introduce auxiliary variable  $y = x_1 x_2$

$$\Rightarrow E_{QUBO} = (1 - x_1 - x_2 - x_3 + x_1 x_2 + x_1 x_3 + x_2 x_3 - y x_3) + \lambda (x_1 x_2 - 2x_1 y - 2x_2 y + 3y)$$

Need to add **penalty terms** to enforce  $y = x_1 x_2$

Or  $E_{PUBO} = (1 - x_1 - x_2 - x_3 + x_1 x_2 + x_1 x_3 + x_2 x_3 - x_1 x_2 x_3)$  keep 3<sup>rd</sup> order terms, but more complex operations needed!
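The construction above can be checked by brute force. The sketch below enumerates all binary assignments to confirm that (i) the clause energy is 1 exactly when the clause is violated, (ii) the penalty term vanishes exactly when $y = x_1 x_2$ and is positive otherwise, and (iii) minimizing $E_{QUBO}$ over $y$ recovers the cubic energy. The value $\lambda = 2$ is an assumption; the slide does not specify it.

```python
from itertools import product


def clause_energy(x1, x2, x3):
    """Energy of (x1 OR x2 OR x3): 1 if the clause is violated, else 0."""
    return (1 - x1) * (1 - x2) * (1 - x3)


def penalty(x1, x2, y):
    """Penalty term from the slide, meant to enforce y = x1 * x2."""
    return x1 * x2 - 2 * x1 * y - 2 * x2 * y + 3 * y


def e_pubo(x1, x2, x3):
    """Expanded cubic form, keeping the 3rd order term."""
    return 1 - x1 - x2 - x3 + x1 * x2 + x1 * x3 + x2 * x3 - x1 * x2 * x3


def e_qubo(x1, x2, x3, y, lam=2):
    """Quadratic form with auxiliary y; lam=2 is an assumed penalty weight."""
    quadratic = 1 - x1 - x2 - x3 + x1 * x2 + x1 * x3 + x2 * x3 - y * x3
    return quadratic + lam * penalty(x1, x2, y)


for x1, x2, x3 in product((0, 1), repeat=3):
    violated = not (x1 or x2 or x3)
    assert clause_energy(x1, x2, x3) == int(violated)
    assert e_pubo(x1, x2, x3) == clause_energy(x1, x2, x3)
    for y in (0, 1):
        assert (penalty(x1, x2, y) == 0) == (y == x1 * x2)
        assert penalty(x1, x2, y) >= 0
    assert min(e_qubo(x1, x2, x3, y) for y in (0, 1)) == e_pubo(x1, x2, x3)
print("all checks passed")
```

The cost of the QUBO route is visible here: every cubic term adds one auxiliary variable plus a penalty weight that must be chosen large enough, while the PUBO route keeps the problem size but needs hardware that can evaluate third-order products.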

# Mixed analog-digital design of PUBO and QUBO

PUBO solver



28nm CMOS floorplan

Note: PUBO and QUBO have very different optimal algorithms  
→ The needed circuits were modified for this

PUBO 1/3 area of QUBO

- **RRAM (1T1R) area:** Represented by a grey square.
- **Analog custom design, analog input & output:** Represented by an orange square.
- **Digital, std-cell design:** Represented by a blue square.
- **Analog custom design with digital interface:** Represented by an orange square with a blue border.

# Performance comparisons of optimization solvers

Previous QUBO memristor solver

Higher order solver

| | Coherent Ising Machine (CIM) <sup>7, 11</sup> | D-Wave 2000Q <sup>8, 10</sup> | sparse Ising Machine (sIM) <sup>9</sup>: FPGA sIM | sparse Ising Machine (sIM) <sup>9</sup>: Nanodevice sIM | mem-SO-HNN <sup>4, 14</sup> | Augmented Ising Machine (AIMs) <sup>10</sup> | This Work: WalkSAT/SKC | This Work: HO-HNN |
|---|---|---|---|---|---|---|---|---|
| Spin representation                      | Coherent light                                | Superconducting qubits        | Digital bits                             | CMOS-MTJ p-bit                         | Digital bits                | Analog charge based                          | Digital bits                                                                                |                                                     |
| Coupling representation                  | Coupling matrix in FPGA                       | Flux storage                  | Sparse coupling matrix in FPGA           |                                        | Memristor crossbar          | Custom CMOS coupling cell                    | Memristor crossbar                                                                          |                                                     |
| Connectivity                             | All-to-all                                    | Sparse                        | Sparse                                   | Sparse                                 | All-to-all                  | Sparse                                       | All-to-all                                                                                  |                                                     |
| Dynamics                                 | discrete-time                                 | continuous-time               | discrete-time                            | continuous-time                        | discrete-time               | continuous-time                              | discrete-time                                                                               |                                                     |
| Interaction                              | High order                                    | Second order                  | Second order                             | Second order                           | Second order                | Third order                                  | High order                                                                                  |                                                     |
| Frequency                                | 1GHz                                          | 30MHz                         |                                          | 1GHz                                   | 500 MHz                     | 500 MHz                                      |                                                                                             |                                                     |
| Time-to-Solution (TTS)                   | 388.9us (N=100)                               | 44ms (N=20)                   | * $\langle L \rangle = 2555.09s$ (N=100) | * $\langle L \rangle = 77.77s$ (N=100) | 12.12ms (N=100)             | 73.8us (N=100)                               | 1.8us (N=20)<br>51.8us (N=100)<br>* $\langle L \rangle = 22\mu s$ (N=100)                   | 0.77us (N=20)<br>113.4us (N=100)                    |
| Power                                    | 50W                                           | 25kW                          | 75W                                      | <b>38.7mW</b>                          | 317.2mW                     | 300mW (N=500)                                | <b>13.35mW</b> (N=20)<br>66.4mW (N=100)                                                     | 42.4mW (N=20)<br>209mW (N=100)                      |
| Energy-to-Solution (ETS)                 | 19.445mJ (N=100)                              | 1.1kJ                         | 191.632kJ (N=100)                        | 3J (N=100)                             | 3.84mJ (N=100)              |                                              | <b>24.03nJ</b> (N=20)<br><b>3.43uJ</b> (N=100)<br>* $\langle E \rangle = 1.46\mu J$ (N=100) | 32.6nJ (N=20)<br>23.7uJ (N=100)                     |
| Area                                     | 1 km fibre ring cavity                        | >10m <sup>2</sup> room        |                                          |                                        | 0.0224 mm <sup>2</sup>      | 6.76 mm <sup>2</sup>                         | 0.0214 mm <sup>2</sup> (N=100)                                                              | <b>0.0197 mm<sup>2</sup></b> (N=100)                |
| Solutions per second per watt            | 51.427                                        | $9 \times 10^{-4}$            | $5.21 \times 10^{-6}$                    | 0.33                                   | 260.114                     |                                              | $4.1 \times 10^7$ (N=20)<br>$3 \times 10^5$ (N=100)                                         | $3 \times 10^7$ (N=20)<br>$4.2 \times 10^4$ (N=100) |
| Solutions per second per mm <sup>2</sup> |                                               |                               |                                          |                                        | $3.7 \times 10^3$           | $2 \times 10^3$                              | $9 \times 10^5$                                                                             | $4.5 \times 10^5$                                   |

All solvers benchmarked on hard instances of 3-SAT problems

T. Bhattacharya et al., Nature Communications, 15, 8211 (2024)

Best Time-to-solution

Best Energy-to-solution

Best solutions per second per Watt / mm<sup>2</sup>
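The derived rows of the table appear to follow from the measured ones: energy-to-solution is time-to-solution times power, and solutions per second per watt is its reciprocal. A quick check using the "This Work, WalkSAT/SKC" entries; the values are copied from the table, while the relations themselves are inferred from the numbers rather than stated on the slide.

```python
def energy_to_solution(tts_seconds, power_watts):
    """ETS in joules = time-to-solution x power."""
    return tts_seconds * power_watts


def solutions_per_second_per_watt(ets_joules):
    """(1 / TTS) / P, which simplifies to 1 / ETS."""
    return 1.0 / ets_joules


# (TTS in s, power in W) for WalkSAT/SKC, taken from the table above.
entries = {"N=20": (1.8e-6, 13.35e-3), "N=100": (51.8e-6, 66.4e-3)}
for label, (tts, power) in entries.items():
    ets = energy_to_solution(tts, power)
    print(f"{label}: ETS = {ets:.3e} J, "
          f"{solutions_per_second_per_watt(ets):.2e} solutions/s/W")
```

The results land close to the tabulated 24.03 nJ and 3.43 uJ, and to the tabulated solutions-per-second-per-watt figures, up to rounding.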

# What other powerful computing primitives should we build?



Beyond memristor crossbar, a few computing primitives listed:



My proposal for valuable computing primitives:

- Complex dynamical neuron circuits (beyond integrate and fire neuron)
- Associative memories and Content-addressable memories

# Complex neuron models – trade-off with performance?



Yamazaki, K.; Vo-Ho, V.-K.; Bulsara, D.; Le, N. Spiking Neural Networks and Their Applications: A Review. *Brain Sci.* (2022).

# Easy in analog: Mott memristors in Hodgkin-Huxley circuit

Circuit with two  $\text{VO}_2$  memristors, two batteries, two capacitors, two resistors.



# Mott Memristor as part of Hodgkin-Huxley Neuron



Can store information in structural phase, ionic configuration, etc

Nanoscale in size, <10 nm

1000 times faster than a neuron

1% of the energy per spike

Dark field cross-sectional TEM image of Nb<sub>x</sub>O<sub>y</sub> memristor. The heated region is thermally connected to T<sub>amb</sub> through the effective thermal resistance, R<sub>th</sub>, and thermal capacitance, C<sub>th</sub>.

# Another compute primitive: associative memories

Modern computers use Addressed Memory (RAM): **Input address → Output data**

Memories in brains are very different

**Input data**



**Associative Memory**



**Output / Association**

Sweet, Crunchy,  
safe to eat



Sweet, Soft,  
safe to eat



Poisonous,  
do not eat!



# Building Addressed Memory versus Associative Memory

## Addressed Memory

### RAM (Random Access Memory)

- **Input:** Address, **Output:** Contents at this address

What is needed to build it:

1) Addressing circuitry (periphery)
2) Storage



## Associative Memory

### CAM (Content Addressable Memory) and Ternary CAM

- **Input:** Search word (content), **Output:** address of a match  
Ternary CAMs (**TCAM**) have a third state: 0, 1, and X = 'don't care'

What is needed to build it:

1) Storage
2) Comparison circuitry (at every word)
3) Association circuitry
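The difference between the two access patterns can be made concrete with a small behavioral model. The sketch below contrasts an addressed read with a ternary CAM search; the stored words are made up, and the sequential loop stands in for comparison circuitry that, in hardware, sits at every word and operates in parallel.

```python
def tcam_search(stored_words, search_word):
    """Return the addresses of all stored words that match `search_word`.

    Stored words are strings over '0', '1', 'X'; 'X' matches either bit.
    """
    matches = []
    for address, word in enumerate(stored_words):
        if len(word) == len(search_word) and all(
                s == "X" or s == q for s, q in zip(word, search_word)):
            matches.append(address)
    return matches


ram = ["1010", "0111", "0000"]    # addressed memory: address -> data
tcam = ["10X0", "0XXX", "1111"]   # associative memory: data -> address

print(ram[1])                       # RAM: address in, contents out
print(tcam_search(tcam, "1010"))    # CAM: content in, matching addresses out
print(tcam_search(tcam, "0101"))
```

The returned addresses are what the association circuitry would then use, for example to select an entry in a companion RAM.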



# Our Analog Content Addressable Memory (aCAM) proposal



# Analog Content Addressable Memory (aCAM)

Our implementation of analog CAM  
6-Transistor-2-Memristor circuit





1) Performs inequality test operations  
Is  $V_{low} < \text{Input} < V_{high}$  ?





2) Can operate as a multi-bit CAM



Showed up to 16 levels (4 bits)



aCAM cells laid out in an array.  
Allows high dimensional data to be stored as words of “intervals”
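Functionally, an aCAM array of this kind can be pictured as follows: each cell stores an interval, each row (word) is a tuple of intervals, and a row matches only if every input component falls inside the corresponding interval. The sketch below is a behavioral model under that reading; the numeric values and names are made up, and none of the 6-transistor-2-memristor circuit details are modeled.

```python
def cell_match(low, high, value):
    """One aCAM cell: is low < value < high?"""
    return low < value < high


def acam_search(rows, inputs):
    """Return indices of rows whose every interval contains its input.

    `rows` is a list of words; each word is a list of (low, high) pairs,
    one per input dimension. An unbounded interval acts as "don't care".
    """
    return [index for index, word in enumerate(rows)
            if all(cell_match(low, high, v)
                   for (low, high), v in zip(word, inputs))]


DONT_CARE = (float("-inf"), float("inf"))
rows = [
    [(0.0, 0.3), (0.5, 1.0)],   # word 0
    [(0.2, 0.6), DONT_CARE],    # word 1
    [(0.6, 1.0), (0.0, 0.5)],   # word 2
]
print(acam_search(rows, [0.25, 0.7]))   # falls inside words 0 and 1
print(acam_search(rows, [0.8, 0.2]))    # falls inside word 2 only
```

Each word carves out an axis-aligned box in the input space, which is what lets a single search step test many inequality conditions at once.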



# Connection to Hopfield Networks and (dense) Modern Hopfield Networks



- $\xi^1, \xi^2, \xi^3$  are the attractor states
- Regions bounded by the dashed lines are the basins of attraction
- Analog CAM intervals can match (high-dimensional) boundaries of the basins of attraction
- Activates a RAM with the stored (clean) attractor states
- Exponential capacity
- No dynamics needed: single step look-up, then memory activation

# (analog) Associative Memories enable many applications

We have built prototype Associative Memories (180nm CMOS), testing multiple applications



Associative Memories implement **Finite State Machines**

- FSM has applications in Genomics, Network Security, etc
- Put FSM state transitions in a rapid-lookup CAM
- Benchmarked versus state-of-the-art FPGA  
→ 25x improved Throughput/Watt, reduced chip area/cost

C. E. Graves, et al, **NANOARCH** (2018);  
C. E. Graves, et al, **ICRC** (2018);  
C. E. Graves, et al, **IEEE TNano**, (2019)  
C. E. Graves, et al., **Adv. Mater** (2020)



**Machine Learning applications:** Decision Trees, Random Forests

Tested on traffic-sign classification task

- **914x higher Throughput** (Decisions/second) over digital ASIC
- **15x lower Energy per Decision** over digital ASIC

G. Pedretti, C. Graves, S. Serebryakov, R. Mao, X. Sheng, M. Foltin, C. Li, J.P. Strachan, *Nature Communications* (2021)

**Few-shot Learning application:**

Mao, Ruibin, et al. "Experimentally realized memristive memory augmented neural network." *Nature Communications* (2022)

# Modified analog CAM for Transformers

## Combine

- Memristor crossbar arrays for Feed-Forward Network Layers (~75% computations)
- Modified Dynamic aCAM based Attention



Nathan Leroux



Paul Manea

## Feed-forward Linear Layers

Compute in Memristor device arrays  
Offers low-energy read



## Attention layers

Compute in Capacitive gain-cell arrays  
Lower energy & fast re-programming



Deployed on GPT2 sized LLMs (>1 billion parameters)

Retrained LLM with SPICE-based circuit characteristics

Full accuracy reached

Inference latency: 65 ns; energy: 6.1 nJ





# Thank you!

## Jülich / RWTH Aachen

Paul Manea

Nathan Leroux

Mohammad Hizzani

Arne Heittmann

Ming-Jay Yang

Emre Neftci

Dmitrii Dobrynin

Jan Finkbeiner

Sebastian Siegel

Chirag Sudarshan

## UCSB

Dima Strukov

George Hutchinson

Tinish Bhattacharya

## HP Labs

Giacomo Pedretti

Masoud Mohseni

Fabian Bohm

Thomas Van Vaerenbergh

Jim Ignowski

Ray Beausoleil

## Key Collaborators

Can Li (Univ HK)

Catherine Graves (Google DeepMind)

J. Joshua Yang (USC)

Qiangfei Xia (UMass Amherst)

Wei Lu (U Michigan)

Shimeng Yu (Georgia Tech)

