

# Exploring the ARM Coherent Mesh Network Topology

Philipp A. Friese, Martin Schulz

Technical University of Munich  
TUM School of Computation, Information and Technology  
Chair of Computer Architecture and Parallel Systems

ARCS Conference  
Potsdam, 16.05.24



# Motivation

**The Ampere Altra Max Review: Pushing it to 128 Cores per Socket**

by Andrei Frumusanu on October 7, 2021 8:00 AM EST

AnandTech

**AMPERE GETS OUT IN FRONT OF X86 WITH 192-CORE “SIRYN” AMPEREONE**

May 18, 2023 Timothy Prickett Morgan

nextplatform.com

**AmpereOne-3 CPU teased: 256 cores, TSMC 3nm process node, PCIe 6.0 support, 12-channel DDR5**

Anthony Garreffa, published Apr 29, 2024 6:48 PM CDT

tweaktown.com

**AMD's next-gen server chips appear with 192 Zen 5c cores**

By Matthew Connatser published December 17, 2023

tomshardware.com

**Intel Previews Sierra Forest with 288 E-Cores, Announces Granite Rapids-D for 2025 Launch at MWC 2024**

by Gavin Bonshor on February 26, 2024 8:25 AM EST

AnandTech

Knowledge of the on-chip network becomes important

# Ampere Altra Max

- 128 Armv8 Cores
- On-chip network: *Coherent Mesh Network* (CMN)



© 2024 Phoronix Media

# Coherent Mesh Network



# Deriving Topology Information

Three Steps:

1. Mesh Size
2. Static Components
3. Dynamic Components

# Deriving Topology Information

Three Steps:

1. **Mesh Size**
2. Static Components
3. Dynamic Components

Tool: the Linux `perf_event` subsystem

# Deriving Topology Information

Three Steps:

1. **Mesh Size**
2. Static Components
3. Dynamic Components

```
% perf stat -e arm_cmn_0/[..],nodeid=(0,1)/ sleep 0
5,432                arm_cmn_0/[..],nodeid=(0,1)/
```

```
% perf stat -e arm_cmn_0/[..],nodeid=(4,2)/ sleep 0
<not supported>      arm_cmn_0/[..],nodeid=(4,2)/
```

Result on the Ampere Altra Max: 8x8 mesh
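
The probing shown above can be sketched in Python. This is a minimal sketch under assumptions, not the authors' tooling: `probe` is a hypothetical callable that would run the (elided) `perf stat` command for one node coordinate and report whether a count came back, and `max_dim` is an assumed upper bound for the search. Only the `<not supported>` marker is taken from the perf output on the slide.

```python
from typing import Callable, Tuple


def is_supported(perf_output: str) -> bool:
    """Return True if perf stat reported a count rather than '<not supported>'."""
    return "<not supported>" not in perf_output


def mesh_size(probe: Callable[[int, int], bool], max_dim: int = 16) -> Tuple[int, int]:
    """Derive (width, height) from the largest coordinates that answer a probe.

    `probe(x, y)` is a hypothetical helper: it should run the perf command
    for the node at (x, y) and return is_supported(...) of its output.
    """
    width = height = 0
    for x in range(max_dim):
        for y in range(max_dim):
            if probe(x, y):
                # Track the bounding box of all coordinates that responded.
                width = max(width, x + 1)
                height = max(height, y + 1)
    return width, height


if __name__ == "__main__":
    # Stand-in for real hardware: pretend every node of an 8x8 mesh answers.
    fake_probe = lambda x, y: x < 8 and y < 8
    print(mesh_size(fake_probe))
```

Taking the bounding box of all responding coordinates, instead of stopping at the first unsupported one, keeps the sketch robust if individual positions inside the mesh do not support the probed event.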

# Deriving Topology Information

Three Steps:

1. Mesh Size
2. **Static Components**
3. Dynamic Components

- Memory controllers
- SLC controllers
- PCIe device controllers

```
% perf stat -e arm_cmn_0/[...],event=hnf_slc_eviction,nodeid=(1,0)/ sleep 0
1,234 arm_cmn_0/[...]
```

```
% perf stat -e arm_cmn_0/[...],event=hnf_slc_eviction,nodeid=(0,0)/ sleep 0
<not supported> arm_cmn_0/[...]
```
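
The same idea extends to static components: probe a device-specific event at each mesh position and record where it is supported. The sketch below assumes a hypothetical `probe_event` callable that wraps the elided `perf stat` invocation. Only `hnf_slc_eviction` appears in the slides, so it is the only event name used here; events for memory or PCIe controllers would have to be supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int]


def locate_components(
    probe_event: Callable[[str, int, int], bool],
    width: int,
    height: int,
    events: Dict[str, str],
) -> Dict[str, List[Coord]]:
    """Map each component kind to the coordinates where its event is supported.

    `events` maps a component kind (e.g. "slc") to a device-specific event
    name; `probe_event(event, x, y)` is a hypothetical helper that returns
    True when perf reports a count for that event at node (x, y).
    """
    found: Dict[str, List[Coord]] = {kind: [] for kind in events}
    for kind, event in events.items():
        for x in range(width):
            for y in range(height):
                if probe_event(event, x, y):
                    found[kind].append((x, y))
    return found


if __name__ == "__main__":
    # Fake probe mirroring the slide: the event counts at (1,0) but not at (0,0).
    fake = lambda event, x, y: event == "hnf_slc_eviction" and (x, y) == (1, 0)
    print(locate_components(fake, 2, 2, {"slc": "hnf_slc_eviction"}))
```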

# Deriving Topology Information

Three Steps:

1. Mesh Size
2. Static Components
3. **Dynamic Components**

# Deriving Topology Information

Benchmark: Cores 60, 42



# CMN Topology of Ampere Altra Max

Legend: crosspoint, cores, (M) memory, (C) cache



# Distance to SLC Controllers

Legend: SLC controller
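
One way to quantify such a distance is the hop count between mesh positions. The sketch below assumes the Manhattan distance between coordinates as the hop metric; that metric is an assumption made here and is not stated in the slides. The mesh size and SLC positions in the usage example are made up and do not reflect the Altra Max layout.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, int]


def hops(a: Coord, b: Coord) -> int:
    """Manhattan distance between two mesh coordinates (assumed hop metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mean_slc_distance(width: int, height: int, slcs: Iterable[Coord]) -> Dict[Coord, float]:
    """Average hop count from every mesh position to all SLC controllers."""
    slc_list = list(slcs)
    if not slc_list:
        raise ValueError("at least one SLC controller position is required")
    return {
        (x, y): sum(hops((x, y), s) for s in slc_list) / len(slc_list)
        for x in range(width)
        for y in range(height)
    }


if __name__ == "__main__":
    # Hypothetical 4x4 mesh with two SLC controllers.
    dist = mean_slc_distance(4, 4, [(1, 1), (2, 2)])
    closest = min(dist, key=dist.get)
    print(closest, dist[closest])
```

Under this assumption, positions with a lower mean distance would be expected to reach the SLC with fewer hops, which fits the conclusion that CPU cores differ in performance.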



# LULESH Benchmark



# Conclusion

- CMN topology information extracted
- CPU cores differ in performance
- Topology information exploited to gain performance

## Contact

- E-Mail: [philipp.friese@tum.de](mailto:philipp.friese@tum.de)
- Website: [ce.cit.tum.de/caps](http://ce.cit.tum.de/caps)

