

# Experiences from 5 years of running a production Arm-based supercomputer

Prof. Simon McIntosh-Smith  
AHUG Managing Director  
University of Bristol, UK

@simonmcs



<http://uob-hpc.github.io>

# Some history...



- I first discussed HPC with Arm at SC 2009
  - Gave them a 10-point plan for how to target HPC in the future:
    - Add 64-bit addressing and floating point, ECC, support for Fortran, better compilers, math libraries, exhibit at SC and ISC etc...
- Joined the Mont-Blanc 2 FP7 EU project in 2013
  - Led by Barcelona Supercomputing Center
  - Demonstrated that Arm-based Samsung smart-phone processors (Exynos) could be made to run simple HPC programs
- Broadcom/Marvell's announcement of Vulcan/ThunderX2 in 2015 convinced me the time was right to try to build a real, **production** Arm-based supercomputer



# 'Isambard', a UK Tier-2 HPC service from GW4

## Named in honour of Isambard Kingdom Brunel



# Isambard 1 – 2017

- Isambard 1 was the 1st production Arm-based HPC service in the world
  - Prototype service started Oct 2017
  - Production began Spring 2018
- Funded by £3.0M from EPSRC
- 10,752 Armv8 cores
- 168 nodes x 2 sockets x 32 cores
- Marvell ThunderX2 32-core @ 2.5GHz
- Cray XC50 ‘Scout’ form factor
- High-speed Aries interconnect
- Cray HPC optimised software stack
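The headline core count can be cross-checked against the per-node configuration. The sketch below is only a sanity check of the figures quoted on this slide (10,752 cores, 2 sockets per node, 32 cores per socket); the variable names are illustrative.

```python
# Sanity check of the Isambard 1 figures quoted on the slide.
TOTAL_CORES = 10_752      # total Armv8 cores, from the slide
SOCKETS_PER_NODE = 2      # from the slide
CORES_PER_SOCKET = 32     # 32-core ThunderX2, from the slide

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET

# Number of dual-socket nodes implied by the total core count.
nodes, remainder = divmod(TOTAL_CORES, cores_per_node)

print(f"cores per node: {cores_per_node}")
print(f"nodes implied by {TOTAL_CORES:,} cores: {nodes} (remainder {remainder})")
```

A remainder of zero means the total is an exact multiple of the per-node core count.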



Isambard is hosted by the Met Office in Exeter, UK





**1st Isambard Hackathon**, Bristol, 2nd & 3rd November 2017

**2nd Isambard Hackathon: "Stoking the Fire"**, Bristol, 19th & 20th March 2018

Logos shown across the two hackathon posters: EPSRC, Arm, ETH Zürich, OpenCFD®, University of Southampton, University of Vienna



# Some of the codes ported using Isambard

- Focused on the most heavily used codes on the UK national HPC service, Archer:
  - **VASP**, **CASTEP**, **GROMACS**, **CP2K**, **UM**, HYDRA, **NAMD**, **Oasis**, **SBLI**, **NEMO**
  - Note: 8 of these 10 codes are written in Fortran
- Additional important codes ported in the early days:
  - **OpenFOAM**, **OpenIFS**, WRF, CASINO, LAMMPS, ...
- **RED** = codes at the first hackathon, **BLUE** = codes at the second hackathon

# Isambard 2 – 2020

- Isambard was highly successful, and won £4.6M in follow-on funding in 2020
- Doubled to 21,504 ThunderX2 cores
- Added a 3,456-core Fujitsu A64fx system (72 nodes)
- Best paper winner at CUG 2019
- Includes a “Multi Architecture Comparison System (MACS)”
  - Adds interesting CPUs and GPUs from all the main vendors
  - Enables rigorous comparisons



A performance analysis of the first generation of HPC-optimized Arm processors,  
S. McIntosh-Smith, J. Price, T. Deakin & A. Poenaru, CC:PE, Feb 2019.

**SPECIAL ISSUE PAPER**

Simon McIntosh-Smith | James Price | Tom Deakin | Andrei Poenaru

High Performance Computing Research Group, Department of Computer Science, University of Bristol, Bristol, UK

**Correspondence**

Simon McIntosh-Smith, High Performance Computing Research Group, Department of Computer Science, University of Bristol, Tyndall Avenue, Bristol BS8 1TH, UK.  
Email: S.McIntosh-Smith@bristol.ac.uk

**Funding information**

Office of Science of the U.S. Department of Energy, Grant/Award Number: DE-AC05-00OR22725; Engineering and Physical Sciences Research Council, Grant/Award Number: EP/P020224/1

## Summary

In this paper, we present performance results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimized specifically for HPC. Isambard is the first Cray XC50 “Scout” system, combining Cavium ThunderX2 Arm-based CPUs with Cray’s Aries interconnect. The full Isambard system will be delivered in the summer of 2018, when it will contain over 10 000 Arm cores. In this work, we present node-level performance results from eight early-access nodes that were upgraded to B0 beta silicon in March 2018. We present node-level benchmark results comparing ThunderX2 with mainstream CPUs, including Intel Skylake and Broadwell, as well as Xeon Phi. We focus on a range of applications and mini-apps important to the UK national HPC service, ARCHER, as well as to the Isambard project partners and the wider HPC community. We also compare performance across three major software toolchains available for Arm: Cray’s CCE, Arm’s version of Clang/Flang/LLVM, and GNU.

Logos shown: Isambard, GW4 (University of Bath, University of Bristol, Cardiff University / Prifysgol Caerdydd, University of Exeter), Engineering and Physical Sciences Research Council, Arm, Marvell, Cray







## Isambard 2's A64fx system

- HPE Apollo chassis
- 72 nodes, 3,456 cores
- Infiniband interconnect
- Fujitsu, Cray, Arm and GNU compilers
- First ever public SVE tutorial ran on this system at SC20



# Full steam ahead: 3rd Isambard Hackathon

- Held online, 23rd & 24th March 2021
- Logos shown: Arm, Cray (a Hewlett Packard Enterprise company), Fujitsu, EPSRC (Engineering and Physical Sciences Research Council)

# Press for Arm in HPC

The collage consists of four overlapping screenshots from different websites:

- **Top left:** A screenshot from NextPlatform.com. The main headline reads "ARM Benchmarks Show HPC Ripe for Processor Shakeup". Below it, another article headline says "Isambard 2 at UK Met Office to be largest Arm supercomputer in Europe".
- **Top right:** A screenshot from top500.org. The main headline is "ARM Benchmarks Show HPC Ripe for Processor Shakeup". Below it, another article headline says "Isambard 2 at UK Met Office to be largest Arm supercomputer in Europe".
- **Bottom left:** A screenshot from Scientific.com. The main headline is "ARM Benchmarks Show HPC Ripe for Processor Shakeup". Below it, another article headline says "Isambard 2 at UK Met Office to be largest Arm supercomputer in Europe".
- **Bottom right:** A screenshot from InsideHPC.com. The main headline is "BullSequana XH3000 Unprecedented global efficiency at scale for accelerated workload!". Below it, another article headline says "Isambard 2 at UK Met Office to be largest Arm supercomputer in Europe".

Each screenshot includes the website's header, navigation bar, and footer.

# Just some of Isambard's achievements

- **Nearly 800 users** and £7.7M of UKRI funding so far
- Delivered **around 800M Arm core hours to date**, 20M per month
- Hundreds of scientists and engineers **trained on Arm in HPC**
- Dozens of **hands-on tutorials and hackathons** (SC, ISC, AHUG...)
- **Dozens of HPC codes ported to Arm** for the first time on Isambard
- **Best paper award** at CUG 2019
- World's first hands-on Arm tutorial on a production system (SC18)
- **World's first open SVE tutorial on real hardware** (SC20)
- Made significant contributions to the quality and robustness of the main **Arm software toolchains**: LLVM, GNU, Cray, Fujitsu

# Lessons learned (so far)

- All our technology was new, so it was often **late**
- Running an Arm-based production system is much like running any other, especially if you partner with a system vendor with a **high-quality software stack** (e.g. Cray/HPE)
- The vast majority of codes just **recompile and run with no changes**
- Users fall into 2 categories:
  1. Those who want to try Isambard because it's Arm and different (portability, CI etc.)
  2. Those who just want the CPU cycles, and don't care that it's Arm
- The system has been incredibly **stable**, nearly 100% uptime since summer 2018
- Our small A64fx system hasn't been as popular as we'd hoped
- Being "**different**" does deter some potential users  
→ **need to do more advertising, promotion, education etc.**

# Areas we found needed more work

The **Python and R community** found it harder to use Isambard than our other users

- x86 vendors provide **optimized, pre-rolled Python and R binaries**
- Isambard users were having to **build from source**
- This is especially hard for R, which has tens of thousands of interdependent packages
- We've been running an Isambard project to address this
  - Working to upstream our modifications ASAP
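As a small illustration of where this shows up for users, a Python script or CI job can check whether it is running on a 64-bit Arm node before deciding whether prebuilt binaries are likely to exist or a source build is needed. This is a minimal sketch and not part of the Isambard project's tooling; the helper names `is_arm64` and `describe_host` are hypothetical. It assumes the usual machine strings, `aarch64` on Linux and `arm64` on macOS.

```python
import platform

# Machine strings that commonly indicate a 64-bit Arm system
# (assumption: "aarch64" on Linux, "arm64" on macOS).
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: str) -> bool:
    """Return True if the given machine string names a 64-bit Arm CPU."""
    return machine.lower() in ARM64_NAMES


def describe_host() -> str:
    """Summarise the interpreter and CPU architecture of the current host."""
    machine = platform.machine()
    kind = "64-bit Arm" if is_arm64(machine) else "not 64-bit Arm"
    return (
        f"{platform.python_implementation()} {platform.python_version()} "
        f"on {machine or 'unknown'} ({kind})"
    )


if __name__ == "__main__":
    print(describe_host())
```

A build script could branch on `is_arm64(platform.machine())` to pick a source build where no Arm binaries are available.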

# Isambard 3



- Isambard is considered highly successful by our funders, EPSRC – *novel, high impact, good user feedback* etc.
- Invited by UKRI/EPSRC to develop Isambard 3 in 2023
- £10M CAPEX funding, 4-year project
- With new partner NVIDIA, Isambard 3 will be one of the first supercomputers based on their new ‘Grace’ Arm CPUs
- 55,000+ cores, 2-3 PetaFLOP/s, one of the fastest in the UK
- Isambard 3 will have at least **5-6 times the performance** of the current Isambard 2 system, while being **6-7 times more energy efficient**

# Isambard 3 NVIDIA ‘Grace’ CPU superchip



*Competitive with best-in-class CPUs in 2023.*

This is the first time that Isambard’s Arm processors will come from a mainstream HPC chip vendor.

# Isambard 3 @ the National Composites Centre



**National Composites Centre, Bristol UK.**  
Significant room for expansion to Exascale.



All of Isambard 3 will fit in a single, energy efficient Modular Data Centre (MDC). Easy to scale up in an agile manner.

# Isambard case study: molecular simulations of factors behind Parkinson's and osteoporosis

- Dr Richard Sessions and Dr Debbie Shoemark at the University of Bristol have been running **molecular level simulations** on Isambard to understand the mechanisms behind **Parkinson's disease**, and to find ways to treat **osteoporosis**
- Their simulations on Isambard have shown how the alpha-synuclein protein can start to clump together in the human brain, a key feature of Parkinson's disease
- Other simulations have investigated a protein involved in bone homeostasis, which is the maintenance of bone density. This work is leading to potential drug therapies to treat osteoporosis, i.e. low bone density. It **required performing millions of “virtual” drug-docking operations at the molecular level**



Simulations showing how the alpha-synuclein protein can start to clump together in the human brain.

# GW4 Isambard summary

- The **GW4 Isambard service** has earned an international reputation for **excellence and innovation**
- Our funders, EPSRC/UKRI, are investing a significantly increased amount to **build on Arm expertise in the UK**
- The new service will be one of the most **energy efficient and low carbon in the world**, 5-6X better than Isambard 2
- Running an Arm-based HPC service was much more straightforward than we expected
- **Most of the remaining challenge is perception**

# Community building: Arm HPC User Group



- The Arm user community has grown significantly since Isambard started in 2017
- The Arm HPC User Group (AHUG) was founded in the last few years
- **Call to ACTION: become part of the community!!!**

<https://a-hug.org>