

# Betatron tune measurement with the LHC damper using a GPU

Master thesis

In collaboration with CERN (BE/RF)  
and Hepia

Author : Frédéric Dubouchet

Professor : Paul Albuquerque

# Plan

- Introduction: accelerators & tune, GPU programming
- Specifications
- Implemented software
- Results
- Discussion
- Conclusion

# Introduction

## Accelerators & tune

- Accelerators
- LHC damper
- Betatron tune
- Tune measurement
- Relation with FFT

# Accelerators

- LHC
  - 2 rings
  - Proton collider
  - 27 km long
  - Superconducting cavities & magnets



# LHC damper

- Horizontal & Vertical
- Damp oscillations
- Feedback system
- BPM (Beam Position Monitor)



# Betatron tune (1/2)

- Transverse oscillation in vertical & horizontal planes
- Due to magnets
- $f_\beta = Q \cdot f_0$
  - $f_\beta$: betatron frequency
  - $Q$: tune
  - $f_0$: revolution frequency
- Only the fractional part of $Q$ is measured
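
A minimal sketch of why only the fractional part shows up in turn-by-turn data: a pickup samples the beam once per turn, so an oscillation at tune $Q$ gives the same samples as one at the fractional part of $Q$. The numeric values (`F_REV_HZ`, `Q_H`) are illustrative assumptions, not figures from this presentation.

```python
import math

F_REV_HZ = 11245.0  # assumed LHC revolution frequency f_0 (about 11.245 kHz)
Q_H = 64.28         # assumed illustrative horizontal tune


def betatron_frequency(q: float, f_rev: float) -> float:
    """Betatron frequency f_beta = Q * f_0."""
    return q * f_rev


def fractional_tune(q: float) -> float:
    """Fractional part of the tune, the only part visible when sampling once per turn."""
    return q - math.floor(q)


def turn_by_turn(q: float, n_turns: int) -> list:
    """Position seen by one pickup on successive turns for an ideal oscillation."""
    return [math.cos(2.0 * math.pi * q * n) for n in range(n_turns)]


q_frac = fractional_tune(Q_H)
f_beta = betatron_frequency(Q_H, F_REV_HZ)
f_observed = q_frac * F_REV_HZ  # frequency of the line seen in sampled data

# Sampling once per turn cannot distinguish Q from its fractional part.
full = turn_by_turn(Q_H, 100)
frac = turn_by_turn(q_frac, 100)
max_difference = max(abs(a - b) for a, b in zip(full, frac))
```

With these assumed values the sampled line sits at roughly 3.1 kHz, while `max_difference` stays at floating-point noise level.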



# Betatron tune (2/2)

- Some tune values have to be avoided to ensure stability
  - Integer values
  - Harmonics



# Tune measurement

- Tune measurement is critical for the upgrade
  - Need to have a bunch-by-bunch tune acquisition
  - Identify what causes instabilities in the machine
- BBQ (BI) -> Average
- LHC damper (RF) -> Bunch by bunch

# Relation with FFT



amplitude vs tune



amplitude vs tune vs time

- Transform position data to frequency
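- Acquisition frequency is the revolution frequency (one sample per bunch per turn)
- Frequency axis normalized to the revolution frequency, so it reads as fractional tune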
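
A minimal sketch of the idea above, assuming an ideal noise-free oscillation: with one sample per turn, the position of the spectral peak divided by the number of turns gives the fractional tune directly. A plain DFT is used here for clarity (the thesis uses an FFT); `Q_FRAC` and `N_TURNS` are illustrative values chosen so that the tune falls exactly on a frequency bin.

```python
import cmath
import math


def dft_amplitudes(samples):
    """Amplitude spectrum for bins 0 .. N/2 - 1 (plain O(N^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))) / n
        for k in range(n // 2)
    ]


def estimate_tune(samples):
    """Fractional tune = index of the strongest non-DC bin / number of turns."""
    mean = sum(samples) / len(samples)
    amps = dft_amplitudes([x - mean for x in samples])  # remove closed-orbit offset
    k_peak = max(range(1, len(amps)), key=amps.__getitem__)
    return k_peak / len(samples)


N_TURNS = 256
Q_FRAC = 0.3125  # illustrative tune, exactly bin 80 of 256

positions = [math.cos(2.0 * math.pi * Q_FRAC * n) for n in range(N_TURNS)]
tune = estimate_tune(positions)
```

The resolution of this estimate is $1/N$, so more turns per window give a finer tune value at the cost of a slower update rate.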

# Introduction

## GPU programming

- GPU computing
  - OpenCL
  - Other GPU technologies
- Hardware solutions
  - FPGA
  - DSP

# GPU computing

- GPUs can be around an order of magnitude faster than CPUs on suitable workloads
- Several of the world's top supercomputers are GPU-based
- The problem needs to be highly parallel
- GPUs are fairly cheap compared to CPUs or hardware solutions (FPGA/DSP)

# OpenCL

- Open standard (Khronos Group)
- Can be used on many platforms
  - CPU
  - FPGA
  - GPU
  - ...



# Other GPU technologies

- CUDA: Nvidia GPUs only
- DirectCompute: Microsoft only
- Shader languages: graphics-specific
  - GLSL
  - HLSL
  - Cg

Nvidia Tesla



# FPGA

## Field-programmable gate array

- Hardware has to be built
- Can be faster than a GPU in specific cases
- High development cost



# DSP

## Digital signal processor

- Hardware has to be built
- Specific programming language
- Very good at FFT
- Less flexible than GPU



# Specifications

- Time constraints
- Hardware constraints
- Data volume
- Storing

# Constraints (1/2)

- “Real-time” system should give a value at ~10 Hz (one every ~100 ms)
  - ~1024 (points) \* 2880 (bunches) in less than 100 ms!
- Hardware should not disturb normal operation!
  - Separate crate needed
  - ~100 MB per second per pickup
    - ~0.5 GB per second for the full machine
  - A full run is 12 hours: ~5 TB per plane!
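
The figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check under one assumption that is not on the slide: a revolution frequency of about 11.245 kHz (`F_REV_HZ`). The data rate is taken as stated, not derived.

```python
F_REV_HZ = 11245.0        # assumption: LHC revolution frequency, one sample per bunch per turn
POINTS_PER_WINDOW = 1024  # from the slide
BUNCHES = 2880            # from the slide
RATE_PER_PICKUP_BPS = 100e6  # from the slide: ~100 MB/s per pickup
RUN_HOURS = 12            # from the slide

# Time needed to collect one 1024-turn window: it sets the natural update period.
window_fill_time_s = POINTS_PER_WINDOW / F_REV_HZ

# Number of samples that must be transformed within one update period.
samples_per_window = POINTS_PER_WINDOW * BUNCHES

# Raw volume recorded from one pickup over a full run.
run_volume_bytes = RATE_PER_PICKUP_BPS * RUN_HOURS * 3600
```

Under this assumption a 1024-turn window fills in about 91 ms, which matches the ~10 Hz update rate, and 12 hours at 100 MB/s gives about 4.3 TB, in line with the quoted ~5 TB.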

# Constraints (2/2)

- After the review “Functional Requirements on LHC Transverse Instability Diagnostics after LS1”
  - Need for a modular approach to suit ABP needs
    - Possibility to build a specific crate for a specific task
  - OP asked to be able to “freeze” the data up to 3 seconds before and after an instability
    - ~3 GB per plane to be stored; can be buffered in RAM and then stored.
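
One way to picture the “freeze” requirement is a ring buffer that always holds the most recent turns, plus a fixed number of turns recorded after the trigger. The sketch below is hypothetical: the class `FreezeBuffer` and its methods are illustrative names, not part of the implemented software, and the buffer sizes are tiny so the behaviour is easy to follow.

```python
from collections import deque


class FreezeBuffer:
    """Keeps the last `pre_turns` turns in RAM and, once triggered,
    records `post_turns` more before handing back the whole record."""

    def __init__(self, pre_turns: int, post_turns: int):
        self._pre = deque(maxlen=pre_turns)  # ring buffer: oldest turns drop out
        self._post = []
        self._post_turns = post_turns
        self._triggered = False

    def trigger(self) -> None:
        """Called when an instability is detected."""
        self._triggered = True

    def push(self, turn_data):
        """Store one turn; return the frozen record once complete, else None."""
        if not self._triggered:
            self._pre.append(turn_data)
            return None
        self._post.append(turn_data)
        if len(self._post) < self._post_turns:
            return None
        record = list(self._pre) + self._post
        # Re-arm for the next event.
        self._pre.clear()
        self._post = []
        self._triggered = False
        return record


buffer = FreezeBuffer(pre_turns=3, post_turns=2)
frozen = None
for turn in range(10):
    record = buffer.push(turn)
    if record is not None:
        frozen = record  # in a real system this would be written to storage
    if turn == 5:
        buffer.trigger()  # pretend an instability is seen on turn 5
```

Here `frozen` ends up holding turns 3 to 7. For 3 seconds on each side, the two sizes would be about 3 s times the revolution frequency, in turns.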

# Implemented Software

- Software general view
  - ADTDSPU control
  - Acquisition
  - Data analysis software

# Software general view



# ADTDSPU control



# Acquisition



## Real-time process

### Server process

#### Control interface

- File type
- File name
- RMS threshold
- Tune window
- Device name

#### Data interface

- Device selection
  - Device 0
  - Device 1
- Buffer type
  - Time domain
  - Amplitude linear
  - Amplitude log
  - Avg amp linear
  - Avg amp log
  - Phase
- Frequency peak
- Time stamp
- Buffers

# Data analysis software



Timing is given for 3000 × 2048 points on a single Fermi card.

# Results

- FFT
- Amplitude & Accumulate
- SVD
- Performances
- Spectrogram

# FFT

- Using a reference implementation in OpenCL
  - Simple radix-2 kernel
  - Parallelized over the $N/2$ butterflies of each stage
  - Parallelized over bunches
- Maximum parallelism is $N/2 \times \text{bunches}$ work-items
- Loop over $\log_2(N)$ stages
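
The structure described above can be sketched on the CPU as follows. This is not the OpenCL kernel from the thesis, only a plain Python illustration of a radix-2 FFT under the usual assumption that $N$ is a power of two: an outer loop over $\log_2(N)$ stages, and inside each stage $N/2$ butterflies that are independent of each other (each one would map to a GPU work-item, repeated per bunch).

```python
import cmath
import math


def fft_radix2(samples):
    """Iterative radix-2 decimation-in-time FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    stages = n.bit_length() - 1

    # Bit-reversal permutation of the input.
    data = [0j] * n
    for i, x in enumerate(samples):
        j = int(format(i, "b").zfill(stages)[::-1], 2) if stages else 0
        data[j] = complex(x)

    for stage in range(stages):      # log2(N) sequential stages
        half = 1 << stage
        for b in range(n // 2):      # N/2 independent butterflies per stage
            group, pos = divmod(b, half)
            i = group * 2 * half + pos
            j = i + half
            twiddle = cmath.exp(-1j * math.pi * pos / half)
            t = twiddle * data[j]
            data[i], data[j] = data[i] + t, data[i] - t
    return data


# An impulse transforms to a flat spectrum.
spectrum = fft_radix2([1, 0, 0, 0, 0, 0, 0, 0])
```

Within one stage no butterfly reads a value that another butterfly of the same stage writes, which is what allows the inner loop to run in parallel.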



# Amplitude & accumulate

- Amplitude uses functions built into the GPU
  - `length`, `hypot`, ...
- Accumulate has to be done atomically
  - No atomic floating-point add is available on Fermi under OpenCL
    - emulated with an `atomic_cmpxchg` loop
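
The compare-and-exchange pattern mentioned above can be illustrated outside OpenCL. The sketch below is a sequential Python emulation, not the thesis kernel: `Word32` is a hypothetical stand-in for one 32-bit word of GPU memory, and its `cmpxchg` method only mimics the semantics of `atomic_cmpxchg` (store the new value if the word still holds the expected one, always return the old value). The float is handled through its bit pattern because the exchange works on integers.

```python
import math
import struct


def float_to_bits(value: float) -> int:
    """Bit pattern of a 32-bit float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]


class Word32:
    """Hypothetical stand-in for one 32-bit word of shared memory."""

    def __init__(self, bits: int = 0):
        self.bits = bits

    def cmpxchg(self, expected: int, desired: int) -> int:
        """Mimics atomic_cmpxchg: swap only if unchanged, return the old value."""
        old = self.bits
        if old == expected:
            self.bits = desired
        return old


def atomic_add_float(word: Word32, value: float) -> None:
    """Retry until no other writer changed the word between read and swap."""
    while True:
        old_bits = word.bits
        new_bits = float_to_bits(bits_to_float(old_bits) + value)
        if word.cmpxchg(old_bits, new_bits) == old_bits:
            return


def accumulate_amplitudes(spectra):
    """Sum the amplitude of each frequency bin over all bunches."""
    acc = [Word32(float_to_bits(0.0)) for _ in spectra[0]]
    for spectrum in spectra:  # one spectrum per bunch
        for k, (re, im) in enumerate(spectrum):
            atomic_add_float(acc[k], math.hypot(re, im))
    return [bits_to_float(word.bits) for word in acc]


# Two bunches, two bins each, given as (real, imaginary) pairs.
total = accumulate_amplitudes([[(3.0, 4.0), (0.0, 1.0)],
                               [(6.0, 8.0), (0.0, 2.0)]])
```

In this single-threaded emulation the loop always succeeds on the first pass; on a GPU it repeats whenever another work-item has updated the same bin in between.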

# SVD

- Implemented with GSL and C++

- GSL only supports double precision

- Highly dependent on the correlation between bunches

- The sample acquisition only has 6 correlated bunches

Speed with matrix $M$ (2048 × 100)

| Bunches | Acquisition | Time  |
|---------|-------------|-------|
| 5       | 20          | 0.15s |
| 4       | 25          | 0.30s |
| 2       | 50          | 2.04s |
| 1       | 100         | 16.9s |

# Performances

| Device       | Type   | Threads | Clock [GHz] | Pipeline | Time [ms] |
|--------------|--------|---------|-------------|----------|-----------|
| Xeon X5650   | FFTW   | 12      | 2.67        | N/A      | 291       |
| Xeon X5650   | OpenCL | 12      | 2.67        | enable   | 284       |
| Xeon X5650   | OpenCL | 12      | 2.67        | disable  | 288       |
| i7-3720QM    | FFTW   | 8       | 2.6         | N/A      | 310       |
| i7-3720QM    | OpenCL | 8       | 2.6         | enable   | 272       |
| i7-3720QM    | OpenCL | 8       | 2.6         | disable  | 273       |
| Tesla M2090  | OpenCL | 512     | 1.3         | enable   | 35        |
| Tesla M2090  | OpenCL | 512     | 1.3         | disable  | 37        |
| GeForce 650M | OpenCL | 384     | 0.9         | enable   | 355       |
| GeForce 650M | OpenCL | 384     | 0.9         | disable  | 365       |

3000 \* 2048 points FFT and amplitude on various hardware and settings
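
The table can be read against the ~100 ms budget from the specifications with a short calculation. The timings below are copied from the pipeline-enabled rows (and the FFTW rows); choosing the Xeon FFTW run as the baseline is an arbitrary assumption made for this comparison.

```python
# Time in ms for 3000 * 2048 points, taken from the table (pipeline enabled where applicable).
TIMES_MS = {
    ("Xeon X5650", "FFTW"): 291,
    ("Xeon X5650", "OpenCL"): 284,
    ("i7-3720QM", "FFTW"): 310,
    ("i7-3720QM", "OpenCL"): 272,
    ("Tesla M2090", "OpenCL"): 35,
    ("GeForce 650M", "OpenCL"): 355,
}
BUDGET_MS = 100  # update period required by the specifications

baseline_ms = TIMES_MS[("Xeon X5650", "FFTW")]  # assumed reference for the comparison

# Speed-up of every configuration relative to the baseline.
speedup = {key: baseline_ms / time_ms for key, time_ms in TIMES_MS.items()}

# Configurations that finish the whole batch inside the budget.
within_budget = [key for key, time_ms in TIMES_MS.items() if time_ms < BUDGET_MS]
```

Only the Tesla M2090 stays under 100 ms, at roughly 8 times the speed of the Xeon FFTW run, and it does so on a batch (3000 × 2048) that is larger than the ~1024 × 2880 points named in the specifications.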

# Spectrogram

- Allows observation of the tune over time
- Gives a general overview of what is happening in the machine
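
A spectrogram is built by cutting the turn-by-turn data into consecutive windows and computing one amplitude spectrum per window. The sketch below shows the principle on a synthetic signal whose tune jumps halfway through. It uses a plain DFT and non-overlapping windows for brevity, and all numeric values are illustrative assumptions.

```python
import cmath
import math


def amplitude_spectrum(window):
    """Amplitude of bins 0 .. N/2 - 1 for one window (plain DFT)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(window))) / n
        for k in range(n // 2)
    ]


def spectrogram(samples, window_size):
    """One spectrum per consecutive, non-overlapping window: rows are time, columns are tune."""
    starts = range(0, len(samples) - window_size + 1, window_size)
    return [amplitude_spectrum(samples[s:s + window_size]) for s in starts]


def peak_tunes(rows, window_size):
    """Tune of the strongest non-DC line in each time slice."""
    return [max(range(1, len(row)), key=row.__getitem__) / window_size
            for row in rows]


WINDOW = 64
# Synthetic signal: tune 0.25 for the first window, 0.3125 for the second.
signal = ([math.cos(2.0 * math.pi * 0.25 * n) for n in range(WINDOW)]
          + [math.cos(2.0 * math.pi * 0.3125 * n) for n in range(WINDOW)])

rows = spectrogram(signal, WINDOW)
track = peak_tunes(rows, WINDOW)
```

Plotting `rows` as an image (time on one axis, tune on the other, amplitude as colour) gives the spectrogram; `track` follows the tune from 0.25 to 0.3125.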



# Discussion

- Observations
- Data flow
- Hardware
- Software

# Observations

- The tune can be seen moving over time
- Depends on the damper settings
- More investigation is needed with the damper on

Tune moving with damper off



Tune? with damper on



# Data flow



# Hardware



- Update the hardware to be able to export the bunch-by-bunch acquisitions
- Create a prototype of the computing box
- Modify the firmware of the SPEC card

# Software

- Need to integrate the hardware modifications
- Need drivers for the Rx card in the acquisition box
- Need an integrated FESA class
  - Control the Rx card
  - Control computation on the GPU
  - Send results to Operations (OP)

# Conclusion

- The “real-time” bunch-by-bunch tune measurement is achievable
- The review “Functional Requirements on LHC Transverse Instability Diagnostics after LS1” showed the need for such a system
- First prototype is on the way
- We will start by using the SPEC card from BE/CO/HT
- Other projects are interested in the concept

# Questions?

- Special thanks to CERN and Hepia
  - Wolfgang Höfle
  - Paul Albuquerque
- See the thesis for all the references