

# A Gentle Introduction to High Performance Computing (HPC)

Marwan Abdellah  
Blue Brain Project, EPFL

December 2012

# Agenda

HPC is About ...

Injecting Concepts  
Injecting Terminology

HPC Applications

Computer Architecture

Parallel Paradigms

Heterogeneous Computing



HPC MULTIDISCIPLINARITY

# HIGH PERFORMANCE COMPUTING INTERDISCIPLINARY TOPIC



NETWORKING

COMPUTER  
PROGRAMMING

DYNAMIC  
PROGRAMMING

OTHERS ...

COMPUTER  
ARCHITECTURE

OPERATING  
SYSTEMS

ALGORITHMS



# APPROACH

HPC IS A



A photograph of a man in a white shirt and tie giving a presentation to an audience. He is standing on the right side of the frame, facing left towards a large projection screen. The screen displays the text "Image Processing Task" in white. Five audience members are visible from behind, seated in rows, looking at the presentation.

# Image Processing Task

IMAGE PROCESSING

NOISY IMAGE



IMAGE DENOISING ALGORITHM

# CONVOLVE FILTER WITH IMAGE

IMAGE PROCESSING



IMAGE PROCESSING



RESULT

ONE MORE COLORING IMAGE

# IMAGE PROCESSING



# IMAGE PROCESSING



# IMAGE PROCESSING



ALGORITHM IMPLEMENTED  
SUCCESSFULLY

BUT

TIME NEEDE TO SOLVE  
PROBLEM

20 MINs



# IDEA





ALGORITHM IMPLEMENTED  
SUCCESSFULLY

AND  
TIME NEEDE TO SOLVE  
PROBLEM

~~25~~  
20  
4 MINs



# WRAP UP

PROBLEM IMAGE PROCESSING  
SUCCESSFUL SOLUTION

TIME CRITICAL FACTOR  
PERFORMANCE SO BAD

SOME GREAT IDEA + CONCEPTS  
RESOURCES AFFORDABLE

IDEALLY SPEED UP  $\approx 4X$  \*AMDAHL's

# HIGH PERFORMANCE COMPUTING



TO USE COMPUTER CLUSTERS  
OR **COLLECTION** OF RESOURCES  
TO SOLVE **ADVANCED** PROBLEM  
THAT MAY **TAKE TIME** ON SINGLE

# HPC IS ABOUT

RESOURCES



HPC

CONCEPTS

TECHNIQUES

THINKING ...

PEOPLE

TOOLS

# HPC APPLICATIONS

HPC APPLICATION VARIES IN DIFFERENT  
DOMAINS & ARENAS

SCIENTIFIC  
ACADEMIC  
BUSINNESS

# PHYSICS SIMULATION





Forecast chart (T+12)  
Valid 1200 UTC Mon 23 Mar 2009

# WEATHER PREDICTION



# MEDICAL IMAGING



TO UNDERSTAND HPC  
WE MUST FIND INTERFACE TO GET ITS

CONCEPTS  
TERMINOLOGY

IDEAS

RELEVANTs

COMPUTER DESIGN

# COMPUTER DESIGN

ALL COMPUTERS  
MORE OR LESS  
BASED ON  
SAME BASIC DESIGN

**VON NEUMANN  
ARCHITECTURE**

# MODEL FOR DESIGNING AND BUILDING COMPUTERS

## THREE CHARACTERISTICS

COMPUTER CONSISTS OF  
**FOUR MAIN**  
COMPONENTS

MEMORY  
ALU  
CONTROL UNIT  
INPUT / OUTPUT

PROGRAM IS STORED  
IN **MEMORY**  
DURING EXECUTION

VON NEUMANN ARCHITECTURE

PROGRAM INSTRUCTIONS  
EXECUTED  
**SEQUENTIALLY**

# VON NEUMANN ARCHITECTURE



THEN  
ANY PROBLEM FINALLY  
TRANSLATED INTO SOME  
**INSTRUCTIONS**

THE MORE YOU **EXECUTE** THEM  
THE MORE YOU **PERFORM**

GORDON  
MOORE

NUMBER OF  
**TRANSISTORS**  
ON A CHIP  
WILL **DOUBLE**  
ABOUT EVERY  
**TWO YEARS**

**MOORE'S LAW**

# MOORE'S LAW

COMPUTER SPEED  
ROUGHLY  
**PROPORTIONAL** TO  
NUMBER OF  
TRANSISTORS PER UNIT AREA

COMPUTER SPEED  
WILL DOUBLE  
ABOUT EVERY TWO YEARS

# MOORE'S LAW

Million Instructions Per Second

PROCESSING POWER (MIPS)  
INCREASED  
BECAUSE  
**TRANSISTOR  
DENSITY**  
INCREASED

# MOORE'S LAW



# LIMITATIONS

- \* PHYSICAL LIMITS
- \* POWER CONSUMPTION
- \* HEAT DISSIPATION
- \* Some Others

HARDWARE SPEED IS  
**LIMITED**

# LIMITATIONS

- \* PHYSICAL LIMITS
- \* POWER CONSUMPTION
- \* HEAT DISSIPATION
- \* Some Others

HARDWARE SPEED IS  
**LIMITED**

# IDEA

MOORE'S LAW

SINGLE PROCESSING UNIT !

CAN WE USE

MULTIPLE

PROCESSING UNITS

MULTIPLE

IS THE SPARK FOR HPC

# MULTIPLE PROCESSING UNITS



MULTIPLE PROCESSING UNITS



# MacBook MACHINE

Intel Core 2 Duo

**2 CORES**

4 Giga Byte Ram

Cost \$1000

CAN SOLVE SOME  
PROBLEM IN  
**8 HOURS**

SING UNITS



MUL

# CRAY C90 MACHINE

8 VECTOR PROCESSORS

32 GIGA BYTE RAM  
Cost \$26 MILLION

CAN SOLVE  
SAME PROBLEM IN  
0.002 SECNODS

A perspective view of a row of server racks in a data center. The racks are dark grey or black with silver horizontal accents at the top and bottom. The floor is a light-colored tiled surface. In the center of the image, the words "HAVE A LOOK AT" are stacked vertically above the word "TOP 500", all in large, bold, yellow capital letters.

HAVE A LOOK AT  
**TOP 500**

# PROBLEM DECOMPOSITION

# PROBLEM





**GRANULARITY**



# Problem



CU

$P_1$

$P_2$

$P_3$



# TERMINOLOGY

SO FAR

TASK

PARALLEL TASK

SERIAL EXECUTION

PARALLEL (CONCURRENT) EXECUTION

SINGLE VS. MULTICORE PROCESSORS

GRANULARITY

OBSERVED SPEEDUP

# PARADIGMS

**DATA PARALLEL**  
**DOMIAN DECOMPOSITION**

**TASK PARALLEL**  
**FUNCTIONAL DECOMPOSITION**

SIMD or SPMD

EXECUTING  
SAME  
INSTRUCTION  
OR PROGRAM (KERNEL)  
ON DIFFERENT DATA

# DATA PARALLEL PARADIGM

LOADING DATA

DATA  
PARTITIONING

SENDING DATA

KERNEL EXECUTION

GATHERING RESULTS

DATA PARALLEL PARADIGM

# SERIAL SOLUTION



**DATA PARALLEL PARADIGM**

# SERIAL SOLUTION



DATA PARALLEL PARADIGM

# PARALLEL SOLUTION



PARTITIONING ACCORDING TO  
AVAILABLE RESOURCES

# DATA PARALLEL PARADIGM



# DATA PARALLEL PARADIGM



FLINT'S TAXONOMY  
**MIMD**  
EXECUTING  
**DIFFERENT**  
**INSTRUCTIONS**  
**OR PROGRAMs (TASKs)**  
**ON DIFFERENT DATA**

# TASK PARALLEL PARADIGM

LOADING DATA

**TASKs**  
**SCHEDULING**

ASSIGNING **TASKs**

**TASKs EXECUTION**

**GATHERING RESULTS**

TASK PARALLEL PARADIGM

# SERIAL SOLUTION



TASK PARALLEL PARADIGM

# SERIAL SOLUTION



PowerBook G4

TASK PARALLEL PARADIGM

# PARALLEL SOLUTION



SCHEDULING ACCORDING TO  
AVAILABLE RESOURCES

# TASK PARALLEL PARADIGM



# DATA PARALLEL PARADIGM



# ARCHITECTURE

CLASSIFICATION COULD BE  
ACCORDING TO

## MEMORY MODEL

HOW CPU VIEWS AVAIALBE  
MEMORY ?

SHARED MEMORY

DISTRIBUTED MEMORY

## SHARED MEMORY MODEL

SINGLE  
ADDRESS SPACE

ALL PROCESSORS HAVE ACCESS  
TO POOL OF

SHARED MEMORY

ALSO CALLED

MULTI-PROCESSORS  
HPC SYSTEM (MPs)

## SHARED MEMORY MODEL

# SHARED

MEMORY MODEL

## MULTI PROCESSORS

GLOBAL MEMORY SPACE



MULTIPLE  
ADDRESS SPACES

EACH PROCESSORS HAVE ACCESS  
TO ITS OWN

LOCAL MEMORY

MUST DO MESSAGE PASSING TO  
EXCHANGE DATA BETWEEN  
PROCESSORS

# DISTRIBUTED

## MULTI COMPUTERS



# CLUSTERS

DISTRIBUTED COMPUTING

INDEPENDENT MACHINES

LOOSELY COUPLED BY A

**SCHEDULER**

TO SOLVE A

COMPLEX COMPUTATIONAL

PROBLEM



## Inter-Processor Connection Mechanism



MODERN CLUSTERS ARE EVEN  
**MORE COMPLEX** (HYPRID)

**GRID COMPUTING**  
**INDEPENDENT MACHINES**  
**LOOSELY COUPLED**  
**COMBINED INTO A**  
**UNIFIED SYSTEM BY**  
**SOFTWARE**  
**NETWORKING**

# TERMINOLOGY

SO FAR

SHARED MEMORY

DISTRIBUTED MEMORY

COMMUNICATIONS

SYNCHRONIZATION

PARALLEL OVERHEAD

SCALABILITY

CLUSTER COMPUTING

# HETEROGENEOUS COMPUTING

DEFINITION

SYSTEM  
COMPOSED OF DIFFERENT

**HETEROGENOUS**

COMPUTATIONAL UNITS

IN GENERAL

COMPUTATIONAL UNITS PROCESSORS

**DIFFERENT ISAs**

INSTRUCTION SET ARCHITECTRES

A HETEROGENEOUS SYSTEM COULD BE  
COMPOSED OF MIX OF CUs

1 GENERAL PURPOSE PROCESSOR (GPP)

2 SPECIAL PURPOSE PROCESSOR (SPP)

- \* DIGITAL SIGNAL PROCESSOR (DSP)
- \* GRAPHICS PROCESSING UNIT (GPU)

3 CO – ROCESSOR

4 CUSTOM ACCELERATION LOGIC

- \* ASICs
- \* FPGAs

# Wrap Up

Incoming presentations  
will be more detailed

Thanks For  
Paying Attention