



## Rogues Gallery 2022 Updates



Presented by: Jeffrey Young  
CRNCH Rogues Gallery Director  
March 9th, 2022

# Rogues Gallery: A Community Research Infrastructure for Post-Moore Computing



The Rogues Gallery is now an NSF-funded post-Moore testbed for CISE researchers and the community

CNS-2016701, \$1.3M over 3 years

Supports deploying:

- Rack-scale Lucata Pathfinder 16 node system
- Neuromorphic accelerators
- Smart networking and 5G equipment
- Backend infrastructure

This grant focuses on **community engagement and post-Moore training**

# Rogues Gallery: Enabled Research Threads

## CISE Enabled Research Pillars

### Sparse/Irregular HPC

- Graph analytics
- Scientific Computing
- Database and Big Data Acceleration



### HW/SW Codesign

- Polyhedral compilation
- Design of libraries, runtimes, APIs for novel devices
- Benchmarking and characterization

### Machine Learning

- Low-power edge AI
- Autonomous vehicles
- SLAM for robotics
- Dynamic and life-long learning



### Next-generation Networking

- 5G software stacks
- Edge computing services
- In-situ and encrypted data analysis
- Data reduction and line-speed DSP

## Rogues Gallery Hardware and Software Support

- Emu Pathfinder
- FPGA+HBM
- CASPIAN
- Tensor, Streaming Graph APIs
- ARM A64FX

- Emu Pathfinder
- FPGAs + RISC-V
- Optane
- Kokkos, Habanero-C runtimes

- FPAA and CASPIAN
- Emu Pathfinder
- FPGAs
- RASP/TENNLab SW
- Emu Scikit-learn

- Ettus USRP-2947 and Ettus E-320
- Mellanox BlueField NICs
- FPGAs
- FPAA and CASPIAN

# RG CCRI PIs and Personnel

## Co-PIs



Tom Conte



Ada Gavrilovska



Jennifer Hasler



Rich Vuduc

## Senior Personnel



Alex Daglis



Semir Sarajlic

## Technical Support



Will Powell



Jeff Valdez



Aaron Jezghani (PACE)

# Rogues Gallery – Timeline



*Timeline figure spanning 2017–2021.*

# Rogues Gallery – Timeline (2)



# Current Rogues Gallery Stats



The Rogues Gallery is a ***disaggregated testbed.***

## Rogues Gallery Users

- ~15 VMs, ~30 servers, numerous boards
- 150-200 users overall; 30+ external users
- 120+ students supported via TechFee-associated work in 2020-2022

## Current NSF Deployment Status

- \$660K budgeted for hardware deployment
- ~\$530K spent
- All base HW currently deployed along with Pathfinder nodes
- Outstanding items: neuromorphic and networking hardware



# RG Achievements

- All proposed base hardware deployed
  - File and scratch servers, IB and 40/100 GE network switches
- Pathfinders, next-generation FPGAs, smartNICs deployed in 2021-2022
  - Lucata worked with GT to run Graph500 on 2 RG Pathfinders and 2 loaner chassis to achieve **#46 on the Green Graph500** at scale 30!
- Three funded TechFee proposals deployed Intel Ice Lake nodes, an FPGA cluster, and neuromorphic devices (2019-2022)
- RG workflows introduced
  - PACE interactions on Open OnDemand and Slurm deployment
- Most documentation migrated to public presence at <https://gt-crnch-rg.readthedocs.io/>
- Four RG-associated tutorials hosted in 2021 – PEARC, MICRO, Telluride, DAC
- Growing list of associated publications at <https://crnch-rg.cc.gatech.edu/crnch-rg-publications/>
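
For context on the "scale 30" figure in the Graph500 bullet above, here is a minimal sketch of the arithmetic behind a Graph500 problem size. It assumes the benchmark's standard conventions (2^scale vertices, a default edge factor of 16, and a Green Graph500 metric of MTEPS per watt). The function names are illustrative and do not come from any Lucata or RG tooling.

```python
EDGE_FACTOR = 16  # assumed Graph500 default: generated edges per vertex


def graph500_size(scale, edge_factor=EDGE_FACTOR):
    """Return (vertices, edges) for a Graph500 problem at the given scale."""
    vertices = 2 ** scale
    edges = edge_factor * vertices
    return vertices, edges


def mteps_per_watt(teps, watts):
    """Convert traversed edges per second and average power into MTEPS/W."""
    return teps / 1e6 / watts


vertices, edges = graph500_size(30)
print(f"scale 30: {vertices:,} vertices, {edges:,} edges")
# prints: scale 30: 1,073,741,824 vertices, 17,179,869,184 edges
```

The energy-efficiency ranking divides BFS throughput by measured power, so a small, low-power system can place well on the Green list even when its raw Graph500 rank is modest.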



Lucata Pathfinder Demonstrates Breakthrough Graph Analytics Processing Efficiency by Ranking #46 on Green Graph500 Benchmark

Ranks #211 on Graph 500 BFS Benchmark

November 18, 2021 11:09 ET | Source: [Lucata Corporation](#)

NEW YORK, Nov. 18, 2021 (GLOBE NEWSWIRE) -- [Lucata Corporation](#), provider of a next generation computing architecture based on Intel technology for high performance, massively scalable processing of graph analytics, today announced that Lucata Pathfinder ranked #46 on the [Green Graph500](#), which ranks

# Rogues Gallery Workflows



## Key ideas:

1. Novel architectures are “hardware-limited”.
2. Most hardware has a defined path from emulation of functionality to eventual hardware deployment.

We build around these limitations with the following:

- Schedule **everything** with Slurm. If needed, add a small front-end device that can be time-shared.
- Define workflows that utilize virtual machines for emulation and compilation, reserving limited HW for experiments
- Work with PACE and other HPC practitioners to incorporate best practices for running workflows
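
The two-stage pattern above (emulate and compile on a VM, then run on scarce hardware) can be sketched as a pair of dependent Slurm submissions. This is a minimal illustration, not the RG's actual configuration: the partition names `rg-dev-vm` and `rg-hw`, the script names, and the job ID `12345` are all hypothetical placeholders. Only standard `sbatch` options (`--parsable`, `--partition`, `--job-name`, `--dependency=afterok:<jobid>`) are used.

```python
import shlex


def sbatch_command(script, partition, job_name, depends_on=None):
    """Build the sbatch argument list for one workflow stage."""
    cmd = [
        "sbatch",
        "--parsable",  # print only the job ID so it can be chained
        f"--partition={partition}",
        f"--job-name={job_name}",
    ]
    if depends_on is not None:
        # afterok: start only if the earlier job completed successfully
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(script)
    return cmd


# Stage 1: compile and simulate on a development VM (hypothetical partition).
build = sbatch_command("build_and_simulate.sh", "rg-dev-vm", "emu-build")

# Stage 2: occupy the limited hardware only after stage 1 succeeds.
# "12345" stands in for the job ID that stage 1's submission would print.
run = sbatch_command("run_on_hw.sh", "rg-hw", "emu-run", depends_on="12345")

print(shlex.join(build))
print(shlex.join(run))
```

Chaining the stages through a dependency keeps the hardware partition idle until the emulation step has passed, which is the point of reserving limited devices for experiments.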



# Rogues Gallery Workflows (2)



OnDemand provides an integrated, single access point for all of your HPC resources.

## Message of the Day

This is the RG OnDemand Server. From this interface you can submit jobs to the following:

- rg-emu-dev: VM for compiling/simulating Emu Chick code
- karrawangi-login: login node for the Emu Chick
- rg-fpga-dev-<1-3>: VMs for FPGA compilation with Intel or Xilinx tools
- flubber<1-3>: Servers with FPGAs and small TPUs
- brainard: Desktop connected to Zynq and Pi devices
- rg-fpaa-host: A Raspberry Pi that is connected to our FPAA prototype.
- octavius-login: Login node for the 16-node Arm A64FX cluster

Most tools can be found under the /tools/ netshare folder. For more information on specific systems please see the wiki.





```
In [8]: PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

In [9]: dataloader = iter(testloader)
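# net, testloader, and classes are defined in earlier notebook cells
images, labels = next(dataloader)  # built-in next() works across PyTorch versions
print('GroundTruth: ', '  '.join('%s' % classes[labels[j]] for j in range(4)))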

Out[9]: 
GroundTruth:  cat  ship  ship  plane

In [10]: net = Net()
net.load_state_dict(torch.load(PATH))
Out[10]: <all keys matched successfully>

In [11]: outputs = net(images)

In [12]: _, predicted = torch.max(outputs, 1)
print('Predicted: ', '  '.join('%s' % classes[predicted[j]] for j in range(4)))
Predicted:  cat  ship  ship  plane

In [13]: correct = 0
total = 0
# no gradients are needed for evaluation
with torch.no_grad():
    # iterate over every batch once instead of re-reading the first batch
    for inputs, labels in testloader:
        # calculate outputs by running images through the network
        outputs = net(inputs)
        # the class with the highest score is chosen as the predicted class
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
```

- Open OnDemand support will allow GUI- and Jupyter-based jobs to be run from the browser
  - OOD and Slurm support prototyped with PACE RSes – Aaron Jezghani, Kevin Manalo
- We will continue to support SSH and VS Code interfaces



Learn more: <https://github.com/gt-crnch-rg/crnch-ood>

# Rogues Gallery Base Hardware

| Hardware               | Hardware Type    | Notes            |
|------------------------|------------------|------------------|
| AMD Milan Servers      | Server           | Flubber4-7       |
| Intel Ice Lake Servers | Server           | Frozone5-6 (TBD) |
| Penguin Icebreakers    | File server      | Files1/2         |
| Cisco 93108FX          | 1/10 GE Switch   |                  |
| Cisco 9336             | 40/100 GE Switch |                  |
| Penguin Altus XE1212   | Scratch Server   | Speedforce       |
| Intel NUCs             | Host/debug node  | Brainard         |

# Rogues Gallery Area-Specific Updates



# “More Moore”



***Post-Moore computing also includes evolutions to traditional architectures!***

CRNCH recently deployed one of the few open-access Arm A64FX systems as part of the NSF Hive program (OAC-1828187)

- 48-core Arm CPUs with 32 GB of HBM
- Focuses on streaming and low-locality workloads
- First commercial Arm processors with SVE support

FPGAs continue to be important for prototyping novel accelerators

- See the Vortex GPGPU as an example of a new RISC-V-based FPGA accelerator  
[vortex.cc.gatech.edu](http://vortex.cc.gatech.edu)

# “More Moore” – Deployed HW



| FPGA Hardware          | Hardware Type          | Notes        |
|------------------------|------------------------|--------------|
| Intel Arria/Stratix 10 | Intel FPGA             |              |
| Alveo U50/U280         | HBM-enabled Xilinx FPGA |              |
| Pynq Z-2/Z-1           | Xilinx FPGA            |              |
| VMK 190                | Xilinx Versal          |              |
| IA-840F                | Intel Agilex FPGA      | TBD          |
| SiFive Unmatched       | RISC-V Dev Board       | johnny-rv5-1 |
| ZCU 106                | Xilinx FPGA            |              |

Learn More:

<https://crnch-rg.cc.gatech.edu/reconfigurable-platforms/>

<https://crnch-rg.cc.gatech.edu/arm-hpc/>

# Near-memory Computing



## Related Work:

**Lucata:** Brian Page, Peter Kogge, "Deluge: Achieving Superior Efficiency, Throughput, and Scalability with Actor Based Streaming on Migrating Threads", HPEC 2021

Brian Page, Peter Kogge, "Scalability of Streaming on Migrating Threads", HPEC 2020

**Optane:** Tony Mason, Thaleia Dimitra Doudali, Margo Seltzer, Ada Gavrilovska, "Unexpected Performance of Intel® Optane™ DC Persistent Memory", CAL 2020

**Sparse data and data movement costs will continue to dominate application concerns for the near future, leading to opportunities for near-memory computing**

The Pathfinder-S system was deployed in July 2021

- ~8X the processing elements with faster clocks, improved networking, and software stack
- Bolstered by related NSF projects and efforts like a Kokkos backend for Emu Cilk and GraphBLAS support

Optane PMEM 200 NVDIMMs are currently available in Frozone Ice Lake systems

# Neuromorphic Computing



*Increased energy consumption by new AI algorithms necessitates a shift to more efficient neural-focused architectures*

Development is proceeding for a “large-scale” Field Programmable Analog Array (FPAA) based on designs by Dr. Hasler’s group

- The FPAA provides a mixed analog/digital design platform with open-source tooling
- A multi-chip deployment will enable larger SNNs based on hhNeuron-inspired blocks

We are looking to deploy a CASPIAN-like platform, but this has been delayed by patent issues.

# Novel Networking



E.g. AR/VR gaming



*Computation will increasingly move into the network as a means to further reduce data movement. Similarly, edge computing will shift processing of data to lower-power devices.*

SmartNICs include Innova-2 Flex, BlueField-2 DPU, and Alveo U50 cards

Students working with Drs. Gavrilovska and Bhardwaj are developing next-generation “mobile edge computing” (MEC) and 5G software stacks

Ke-Jou Hsu, James Choncholas, Ketan Bhardwaj, and Ada Gavrilovska, “DNS Does Not Suffice for MEC-CDN”, HotNets ’20

<https://crnch-rg.cc.gatech.edu/next-generation-networking/>

# Future Rogues (Quantum, Reversible)



***Quantum computing education is on a robust growth path while hardware remains expensive and limited. However, there are tremendous opportunities for tools, algorithms, and compilers.***

CRNCH does not officially support quantum hardware, but we do provide access to common toolkits like QISKit, QCOR, AIDE-QC, and JaqalPaq via VMs

- Jupyter notebooks can be accessed remotely via RG!

Future work is focused on software and scheduling development to support novel testbeds like GTRI's ion trap testbed (currently in use for DARPA ONISQ).

# Community Outreach

CRNCH RG is focused on supporting post-Moore computing at GT and in the community via:

Minisymposiums and Birds of a Feather sessions – RG has supported minisymposiums at SIAM PP 2020 and SIAM CSE 2019, and BoFs at ISC and SC each year

- Popular Arm BoFs at ISC and SC led to the creation of the Arm HPC User's Group (AHUG) with direct involvement from the RG director
- CRNCH has also supported the Advanced Architecture TestBed (AATB) BoF at SC each year since 2018



Tutorials – opportunities for post-Moore HW training



Class Support – Techfee and VIP efforts

# Tutorials

Tutorials Hosted by CRNCH RG Participants in 2021

- ~150 attendees via virtual and in-person sessions
- PEARC – Lucata Pathfinder
  - Hosted by Jeff Young, Jeff Valdez, Will Powell with Lucata
- Telluride – Analog Neuromorphic Tools and Techniques
  - Jennifer Hasler
- HotInterconnects – UCX Tutorial
  - Jeff Young with NVIDIA
- MICRO – Vortex Tutorial
  - Hyesoon Kim
- DAC – RISC-V Tutorial
  - Jeff Young with TCL and LBNL



<https://crnch-rg.cc.gatech.edu/tutorials-and-training/>  
See tutorial details at: <https://github.com/gt-crnch-rg/>

# Rogues Gallery TechFee Support



TechFee is an internal proposal process meant to support students for instructional purposes

- Focus on core classes with increasing focus on our “online” classes like OMSCS
- Provides unprecedented exposure to novel hardware. “Does your school have a DGX, Power9, and A64FX?!”

CRNCH is currently supporting a reconfigurable computing initiative (CS3220, 7290, 8803)

- 40 Pynq Z-2 boards with shared /nethome and Slurm-Jupyter notebook scheduling
- Students use ICE to run Vivado and can then test their designs on CRNCH Pynq boards using PYNQ overlays.

# Vertically Integrated Projects (VIP) Team

- Undergraduate research opportunity for credit; teams are self-directed with guidance from faculty.
- Current projects:
  - NeuroCar – implement sensing and control using SNNs with Nengo FPGA platform; replicate results of the autonomous GT Rally Car with lower power
  - GraphBLAS for the Pathfinder – Understand and benchmark algorithms with GraphBLAS API on the Pathfinder system
  - Qubit allocation optimization – evaluate techniques using IBM's Q experience and ORNL's XACC and attempt to build a linear systems algorithm approach to test possible solutions
  - No-history branch prediction – sort the register file on the fly to assist with branch prediction and limit security vulnerabilities



# Upcoming Challenges

## Purchasing Remains Challenging

- We are trying to put in orders ASAP for critical pieces

## Start-up Interactions

- New start-ups typically are hesitant to engage without a big capital investment
- We are working on a whitepaper to share with start-ups

## Personnel Limitations

- We need more RTs for CoC in general!
- The CRNCH RG VIP class also needs a secondary instructor
  - Please reach out to Jeff if you are interested or have a senior grad student that can mentor a project.



# How Can You Engage with Rogues Gallery?

## Use RG equipment for research and classes

- Contribute documentation to our growing wiki!
  - Workflows, best practices, etc.
- Suggest new equipment that we can purchase via NSF or techfee
- Share ideas for tutorials and benchmarking that CRNCH RG can help to support.

## Consider “hoteling” lab equipment with CRNCH

- This allows for priority usage by your students as well as access to netscratch, backed up nethome, and high-speed networking.

## Vertically Integrated Projects (VIP) team

- Present a guest lecture at VIP or help mentor a subteam
- Learn more at <https://github.com/gt-crnch-rg/fc-with-rg-vip>



### Rogues Gallery Hardware

The Rogues Gallery testbed is composed of both “virtual” resources and “physical” servers and test platforms. The figure below shows the Rogues Gallery resources at a high level with suggested paths from development VMs to the actual hardware. Discussion of the filesystems you have access to can be found [here](#).

For instance, if you’re able to test your compilation and do debugging on the rg-fpga-dev VMs, you can then use the same code on the physical “Flubber” boxes to test your code with a PCI Express-based FPGA. We currently have three pairs of development VMs that map to physical hardware, as described more in the tables below.



# What would you like to see next for RG?



# Acknowledgments

This work is supported by:

- Hardware donations by Micron, Xilinx, and Intel
- NSF Rogues Gallery (CNS-2016701), NSF “Hive” Project (OAC-1828187), the NSF SuperSTARLU project (OAC-1710371), and IARPA.
- Sandia National Laboratory supports the Rogues Gallery VIP undergraduate research class, and parts of this research have been funded through the Laboratory Directed Research and Development (LDRD) program at Sandia. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.