

# Distributed FPGA for enhanced Image Processing

## Bildverarbeitung mit Vivado HLS

# Inhaltsverzeichnis

- Projekt
- Vivado HLS Überblick
- Datenverarbeitung mit Vivado HLS
- Fazit

# Ausgangslage

- 1500 MP Bilder
- Preprocessing
- Distributed FPGA Lösung



# Aufbau



# Vivado HLS: Übersicht

- Konzept: C/C++ als Sprache für Hardwarebeschreibung
- Design Flow
- Erweiterungen in C/C++
- GUI
- Pragmas

# Konzept



# Design Flow



**C-Simulation**

Schneller Bearbeiten/Komplizieren/Testen Kreislauf

**C-Synthese**

C Code wird in VHDL Code übersetzt

**VHDL/C Co-Simulation**

VHDL Code wird simuliert und die Resultate mit der C-Simulation verglichen

# Erweiterungen in C/C++

## Arbiträre Datentypen

`ap_uint<12>`

## C++ Klassen

`hls::stream`

## Directives (Pragmas)

- Definition der Schnittstellen
- Loop unrolling/ pipelining
- Array partitioning/reshaping
- Dataflow
- Resource control

→ Deren Verständnis ist der Schlüssel zum Erfolg mit Vivado HLS

# GUI: Synthese



# GUI: Debug



# GUI: Analyse



# GUI: Synthesis report



# GUI: Wave



# Pragmas

## Interface

- interface
- protocol
- stream

## Ressource

- resource
- array\_map
- array\_partition
- array\_reshape
- data\_pack
- dependence
- allocation

## Throughput/Area

- unroll
- pipeline
- inline
- dataflow
- loop\_flatten
- loop\_merge

## Verschiedene

- clock
- reset
- loop\_tripcount

# Pragma: INTERFACE

**Control**

- ap\_ctrl\_none
- ap\_ctrl\_hs

**AXI4**

- axis
- m\_axi
- s\_axilite

**Memory**

- ap\_fifo
- bram

```
#pragma HLS INTERFACE axis register forward port=outMean
```

# Pragma: RESSOURCE



```
#pragma HLS RESOURCE variable=pixel core=RAM_2P_BRAM
```

# Pragma: ARRAY\_PARTITION



```
#pragma HLS ARRAY_PARTITION variable=pixel dim=1 cyclic factor=21
```

# Pragma: PIPELINE / UNROLL



```
#pragma HLS PIPELINE  
#pragma HLS UNROLL factor=21
```

# Ausgangslage



$$\mu = \frac{\sum_{i=1}^N p_i}{N} \quad \sigma^2 = \frac{\sum_{i=1}^N (p_i - \mu)^2}{N - 1}$$



# Problem

- Jedes Pixel wird zweimal gebraucht
- Zuerst Mittelwert, dann Varianz berechnen



$$\mu = \frac{\sum_{i=1}^N p_i}{N} \quad \sigma^2 = \frac{\sum_{i=1}^N (p_i - \mu)^2}{N - 1}$$

# Solutions

## Interface

```
#pragma HLS INTERFACE axis register reverse port=inDATA
```

## Array Partition

```
#pragma HLS ARRAY_PARTITION variable=pixel cyclic factor=21 dim=1
```

## Pipeline Loops

```
#pragma HLS PIPELINE
```

## Unroll Loops

```
#pragma HLS UNROLL factor=21
```

# Vergleich



# Analyse

|     | Operation\Control Step  | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 |
|-----|-------------------------|----|----|----|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 88  | loop_strData            |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 89  | indvars_iv1(phi_mux)    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 90  | indvars_iv2(phi_mux)    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 91  | i2(phi_mux)             |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 92  | tmp_8(icmp)             |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 93  | indvars_iv_next(+)      |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 94  | tmp_10(+)               |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 95  | pixel_V_0_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 96  | pixel_V_1_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 97  | pixel_V_2_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 98  | pixel_V_3_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 99  | pixel_V_4_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 100 | pixel_V_5_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 101 | pixel_V_6_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 102 | pixel_V_7_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 103 | pixel_V_8_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 104 | pixel_V_9_load_1(read)  |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 105 | pixel_V_10_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 106 | pixel_V_11_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 107 | pixel_V_12_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 108 | pixel_V_13_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 109 | pixel_V_14_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 110 | pixel_V_15_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 111 | pixel_V_16_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 112 | pixel_V_17_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 113 | pixel_V_18_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 114 | pixel_V_19_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 115 | pixel_V_20_load_1(read) |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 116 | indvars_iv_next1(+)     |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 117 | node_289(write)         |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 118 | node_294(write)         |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 119 | node_298(write)         |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 120 | node_302(write)         |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 121 | node_306(write)         |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |
| 122 | node_310(write)         |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |

21 read cycles

21 write cycles

# Fazit

## Vorteile

- Schneller Design Flow
- Lösungen Vergleichen
- Portabel



## Nachteile

- Mächtiges Tool das es zu beherrschen gilt
- Nicht nur einfach C/C++ Code

# Fazit



## Stolpersteine

- Kleine Änderungen können zu grossen Unterschieden in Zeit und Ressourcennutzung führen
  - Nur ein Pragma ändern
- Hardwaresnahes Denken wird vernachlässigt
  - Flache Hierarchie im C/C++ Code

# Fazit

## Unsere Erfahrungen

- Pragmas müssen richtig eingesetzt werden
- Mit wenig VHDL Kenntnissen zum Erfolg
- AXI Interfaces schnell umgesetzt
- Auch FSM möglich

# Distributed FPGA for enhanced Image Processing

## Bildverarbeitung mit Vivado HLS

Noah Hütter  
Jan Stocker

[noahhuetter@gmail.com](mailto:noahhuetter@gmail.com)  
[janstocker@gmx.ch](mailto:janstocker@gmx.ch)



[bit.do/ECC18](http://bit.do/ECC18)

UG902 Vivado HLS User Guide  
UG871 Vivado HLS Tutorial  
HLS Pragmas  
Präsentation & Code

[PDF](#)

[PDF](#)

[Site](#)

[Site](#)

# Latency and Initiation Interval



# FSM mit Vivado HLS

```
void fsm() {  
  
    static enum dState {D_IDLE = 0, D_WRITE, D_READ} state;  
  
    switch(state) {  
        case D_IDLE:  
            if ( next_state_condition )  
                state = D_WRITE;  
            // Do stuff  
            break;  
        case D_WRITE:  
            if ( next_state_condition )  
                state = D_READ;  
            // Do stuff  
            break;  
        case D_READ:  
            if ( next_state_condition )  
                state = D_IDLE;  
            // Do stuff  
            break;  
    }  
}
```