

# Computer Architectures

Scientific Computing  
Jörn Behrens



# Introduction

## *Why deal with Computer Architectures?*

- Most programs achieve 10-20% of peak performance, why?
- How does a computer work, anyways?
- What makes the performance?



# *Why deal with Computer Architectures?*

- Most programs achieve 10-20% of peak performance, why?
- How does a computer work, anyways?
- What makes the performance?



*Moore's law*



Original:  
every 18 months:  
double components per integrated circuit

every 18 months:  
double computational performance!

# *Moore's law*

Original:

every 18 months:  
double components per integrated circuit



every 18 months:  
double computational performance!

# *Increase in Clockrate?*



# *Parallelism is the Key*



# Basics

## *Von Neumann Architecture*



John von Neumann  
\*1903 ..1957



# *Von Neumann Architecture*



John von Neumann  
\*1903 ..1957

# *Operation via Instructions*



# *How to accelerate?*



What are your ideas?



# Acceleration Features

## Parallelism on the processor level



# *Pipelining*



# RISC vs. CISC

reduce instruction complexity



# *Super-scalar processors*

- One floating point instruction
- One fixed point instruction (or memory reference)
- One condition code manipulation
- A branch instruction



# *Parallel Pipelines*



# Multithreading/Hyperthreading



# Multithreading/Hyperthreading



- a) task switching
- b) physical multi-threading (2 cores)
- c) coarse-grained multithreading
- d) fine-grained multithreading
- e) simultaneous multithreading (hyperthreading)

<http://www.slcentral.com>

# *Vector Units (SIMD)*



# *Parallel Units on Chip*

## Intel Core i7 Processor



# System Architectures

Replicating functional units

Workstation Architecture



Multiprocessor Server Architecture



Latencies and Performance Indicators

| Latency (ns) for various functional units |     |        |       |        |     |        |
|-------------------------------------------|-----|--------|-------|--------|-----|--------|
|                                           | ALU | Branch | Cache | Divide | MUL | Memory |
| Processor                                 | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 10  | 10     |
| MUL                                       | 100 | 100    | 100   | 100    | 100 | 100    |
| Divide                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| Branch                                    | 10  | 10     | 10    | 10     | 10  | 10     |
| Cache                                     | 10  | 10     | 10    | 10     | 10  | 10     |
| Memory                                    | 100 | 100    | 100   | 100    | 100 | 100    |
| ALU                                       | 10  | 10     | 10    | 10     | 1   |        |

# *Worksstation Architecture*



# *Multiprocessor Server Architecture*



# Intel Core Architecture



# *AMD Athlon/Opteron Architecture*



Bis zu sechs Mikrobefehle (ROPs) gleichzeitig kann der Athlon 64/Opteron aus den Decodern an die Scheduler schicken, bezüglich Nachschub ist das Design jedoch insgesamt nur dreifach skalar.

# Multi-processor Architectures

A classification according to Flynn (1972)



MIMD: Distributed Memory Multi-Computer



- Memory and periphery distributed
- No remote memory access
- Example: "Beowulf cluster"

MIMD: Virtual Shared Memory Multi-Processor



- Distributed memory accessible by all procs.
- Non-Uniform memory access (local access faster)
- CC-NUMA: Cache Coherent Non-Uniform Memory Access

MIMD: Shared Memory Multi-Processor



- Memory and periphery accessible by all procs.
- Uniform Memory Access (UMA)
- Network: Bus or Crossbar

# *A classification according to Flynn (1972)*

Very coarse, qualitative categorization of multiplicity of instruction and data streams



Examples:

SISD: Monoprocessors, von-Neumann Architecture

MISD: ? (multithreading)

SIMD: Vector processors

MIMD: multi-processors, clusters, etc.

# **MIMD: Distributed Memory Multi-Computer**



- Memory and periphery distributed
- No remote memory access
- Example: "Beowulf cluster"

# **MIMD: Distributed Memory Multi-Computer**

CP ... *Communication Processor*

NI ... *Network Interface*



- Higher Bandwidth
- Low latency to local memory
- Dedicated message passing
- Custom Design

# *MIMD: Shared Memory Multi-Processor*



- Memory and periphery accessible by all procs.
- Uniform Memory Access (UMA)
- Network: Bus or Crossbar

# **MIMD: NUMA Shared Memory Multi-Processor**



- Memory hierarchy (caches)
- Non-Uniform Memory Access (NUMA)
- Less Network traffic, faster (local) memory access
- Cache coherency required!

# *MIMD: Virtual Shared Memory Multi-Processor*



- Distributed memory accessible by all procs.
- Non-Uniform memory access (local access faster)
- CC-NUMA: Cache Coherent Non-Uniform Memory Access

# Network Architectures

*Topologies of static networks*



*Determining the performance of a network: bisection bandwidth*



# *Topologies of static networks*



3D Cube Connected Cycle (CCC)



2D Torus



Fat Tree



4D Hypercube



$k$ -ary  $n$ -cube ( $k=4$ ,  $n=3$ )

# *Determining the performance of a network: bisection bandwidth*



Ring:

$$B_{\text{ring}} = 2$$



2D Mesh of size ( $n \times m$ ):

$$B_{\text{2D mesh}} = \min(m, n)$$



2D Torus of size ( $n \times m$ ):

$$B_{\text{2D torus}} = 2 \cdot \min(m, n)$$



3D Torus of size ( $l \times n \times m$ ):

$$\begin{aligned} B_{\text{3D torus}} = & 2 \cdot [l + m + n - \max(l, m, n) \\ & - \min(l, m, n)] \cdot \min(l, m, n) \end{aligned}$$

# **Network Topology Characteristics**

| Topology           | # Nodes   | Degree (# ports) | min. path         | bisection width |
|--------------------|-----------|------------------|-------------------|-----------------|
| Ring               | $N$       | 2                | $\frac{N}{2}$     | 2               |
| 2D Torus           | $N = k^2$ | 4                | $2\frac{k}{2}$    | $2k$            |
| Hypercube (d-dim.) | $N = 2^d$ | $d$              | $d$               | $\frac{N}{2}$   |
| Tree               | $N$       | 3                | $2(\log_2 N - 1)$ | 1               |
| $k$ -ary $n$ -cube | $N = k^n$ | $2n$             | $n\frac{k}{2}$    | $2k^{n-1}$      |

## Basics

### Von Neumann Architecture



John von Neumann  
1903 - 1957

## Introduction

### Why deal with Computer Architectures?

- Most programs achieve 10-20% of peak performance, why?
- How does a computer work, anyways?
- What makes the performance?



## Acceleration Features

Parallelism on the processor level



## Computer Architectures

Scientific Computing  
Jörn Behrens



## Network Architectures

### Typeologies of static networks



### Determining the performance of a nation's backbone bandwidth



## Multi-processor Architectures

### A classification according to Open (2010)

Reproduced with permission of the copyright owner

Open, J., 2010. Multi-processor Architectures. In: J. Behrens, ed. Scientific Computing. Berlin, Heidelberg: Springer Berlin Heidelberg.



## System Architectures

Replicating functional units

