



**NED University of Engineering & Technology**

Spring Semester 2020

CIS - TE

# IMAGE PROCESSING ON FPGA

VLSI DESIGN  
CS-319

Dr. Saad Qasim

Abdul Ahad  
CS0-17022

Syedah Nawal Munif  
CS-17024

Ahsan Abbas  
CS-15130

# Table of Contents

|                                                   |    |
|---------------------------------------------------|----|
| Abstract.....                                     | 2  |
| Imaging Basics .....                              | 3  |
| Neighborhood Image Processing .....               | 4  |
| 2D Convolution in Image Processing .....          | 5  |
| ZYNQ Board .....                                  | 7  |
| System Architecture / Design .....                | 7  |
| Line Buffers.....                                 | 8  |
| DDR .....                                         | 10 |
| DMA Controller and AXI Interface.....             | 10 |
| Interrupt based processing .....                  | 10 |
| Block representation of the System Logic.....     | 11 |
| Verilog Coding .....                              | 12 |
| 1. Designing Line Buffer.....                     | 12 |
| 2. Designing Multiply Accumulate Model (MAC)..... | 14 |
| a.Box Blur Filter .....                           | 14 |
| b. Edge Detection using Sobel Operator .....      | 16 |
| c. Unsharp masking.....                           | 18 |
| d. High Pass Filtering.....                       | 20 |
| e. Low Pass Filtering .....                       | 22 |
| 3. Designing Control Logic.....                   | 24 |
| 4. Image Processing IP Packaging .....            | 27 |
| a)RTL Schematic .....                             | 28 |
| b) Utilization Report .....                       | 29 |
| 6. Simulating the Design .....                    | 32 |
| a.Testbench.....                                  | 32 |
| b. Waveform .....                                 | 34 |
| 7. Output Images.....                             | 35 |
| 8. References .....                               | 36 |

# **Abstract**

In order to implement the image processing algorithms and to process the amount of data captured from sources such as medical instruments, intelligent high speed real-time systems, etc. has become imperative. In this project, an efficient FPGA-based design and implementation of image processing algorithm will be presented using verilog hardware description language. The FPGA provides the necessary hardware for image processing algorithms with flexibility to support image processing by using different algorithms. Verilog coding is done in Vivado 2014.2.

# Imaging Basics

Images are like matrix. If they are gray-scale then they are matrix of two-dimensional arrays but if they are colored then we say that they are three-dimensional. The following image is decomposed into pixels where each box represents a single pixel. 1 byte data represents the intensity of the pixel in case of a gray-scale image. If it is a colored image, each box will have three values representing the RGB values of that pixel.



=

|     |     |     |     |     |     |     |     |     |     |     |    |    |    |    |    |    |    |    |    |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----|----|----|----|----|----|----|----|----|
| 193 | 194 | 190 | 184 | 169 | 144 | 128 | 89  | 60  | 60  | 109 | 44 | 40 | 46 | 45 | 45 | 58 | 61 | 72 | 50 |
| 191 | 195 | 192 | 173 | 134 | 114 | 121 | 116 | 64  | 77  | 60  | 41 | 41 | 43 | 41 | 41 | 46 | 60 | 67 | 57 |
| 187 | 196 | 178 | 139 | 110 | 113 | 112 | 132 | 126 | 61  | 70  | 55 | 45 | 42 | 40 | 37 | 39 | 53 | 50 | 59 |
| 170 | 186 | 151 | 122 | 114 | 117 | 114 | 131 | 139 | 76  | 83  | 74 | 52 | 45 | 45 | 41 | 43 | 46 | 39 | 47 |
| 147 | 163 | 139 | 131 | 132 | 121 | 125 | 143 | 132 | 78  | 64  | 64 | 42 | 33 | 35 | 30 | 32 | 36 | 33 | 43 |
| 129 | 126 | 132 | 148 | 134 | 136 | 141 | 133 | 121 | 81  | 72  | 67 | 49 | 30 | 24 | 21 | 25 | 30 | 32 | 34 |
| 126 | 101 | 106 | 146 | 149 | 132 | 138 | 134 | 101 | 80  | 65  | 62 | 53 | 37 | 27 | 28 | 21 | 28 | 39 | 40 |
| 137 | 117 | 103 | 130 | 141 | 118 | 119 | 99  | 83  | 74  | 66  | 60 | 52 | 42 | 30 | 27 | 21 | 29 | 39 | 33 |
| 141 | 115 | 97  | 103 | 82  | 79  | 84  | 80  | 79  | 74  | 69  | 64 | 52 | 45 | 26 | 31 | 25 | 25 | 29 | 35 |
| 105 | 99  | 64  | 67  | 70  | 71  | 78  | 83  | 83  | 79  | 80  | 72 | 57 | 46 | 44 | 65 | 29 | 18 | 21 | 27 |
| 63  | 60  | 52  | 56  | 65  | 75  | 86  | 92  | 87  | 82  | 83  | 81 | 74 | 53 | 62 | 52 | 27 | 26 | 27 | 25 |
| 55  | 37  | 35  | 46  | 56  | 64  | 71  | 82  | 83  | 85  | 82  | 73 | 62 | 54 | 58 | 30 | 27 | 25 | 22 | 24 |
| 51  | 40  | 32  | 54  | 90  | 73  | 75  | 71  | 70  | 85  | 79  | 62 | 49 | 43 | 47 | 29 | 44 | 82 | 53 | 25 |
| 54  | 39  | 43  | 63  | 61  | 75  | 76  | 83  | 73  | 75  | 72  | 68 | 64 | 61 | 42 | 33 | 29 | 86 | 92 | 69 |
| 53  | 41  | 47  | 47  | 58  | 70  | 62  | 83  | 97  | 92  | 85  | 71 | 73 | 55 | 42 | 27 | 25 | 45 | 75 | 70 |
| 34  | 36  | 43  | 47  | 58  | 58  | 53  | 65  | 112 | 111 | 81  | 65 | 71 | 74 | 41 | 28 | 53 | 70 | 43 | 43 |
| 35  | 42  | 45  | 43  | 43  | 54  | 59  | 80  | 134 | 133 | 86  | 55 | 69 | 74 | 43 | 23 | 72 | 39 | 19 | 13 |
| 52  | 48  | 44  | 39  | 40  | 42  | 49  | 69  | 111 | 115 | 78  | 46 | 49 | 65 | 31 | 31 | 34 | 19 | 18 | 15 |
| 47  | 46  | 43  | 39  | 28  | 25  | 25  | 40  | 80  | 71  | 84  | 35 | 43 | 37 | 30 | 44 | 56 | 48 | 45 | 41 |
| 43  | 40  | 39  | 35  | 23  | 19  | 26  | 44  | 38  | 44  | 50  | 36 | 29 | 31 | 31 | 81 | 72 | 63 | 70 | 76 |

In this project, we have performed processing of gray-scale images which are two-dimensional array matrices. In fact a 'gray' color is one in which the red, green and blue components all have equal intensity in RGB space, and so it is only necessary to specify a single intensity value for each pixel, as opposed to the three intensities needed to specify each pixel in a full color image. Often, the grayscale pixel intensity is stored as an 8-bit integer (1 byte) giving 256 possible different shades of gray from black to white.

# Neighborhood Image Processing

A sliding neighborhood operation is an operation that is performed on a pixel at a time, with the value of any given pixel in the output image being determined by applying some algorithm to the values of the corresponding input pixel's *neighborhood*. A pixel's neighborhood is some set of pixels, defined by their locations relative to that pixel, which is called the *center pixel*. The neighborhood is a rectangular block, and as you move from one element to the next in an image matrix, the neighborhood block slides in the same direction.



In neighborhood processing, we calculate the new pixel of the output image from the pixel of the input image, which is not only based on its value in the input image, but also on its neighbors. There are eight immediate neighbors of each pixel in the image matrix.



# 2D Convolution in Image Processing

Convolution is a simple mathematical operation which is fundamental to many common image processing operators. Convolution provides a way of 'multiplying together' two arrays of numbers, generally of different sizes, but of the same dimensionality, to produce a third array of numbers of the same dimensionality.

In 2D convolution we start with a kernel, which is simply a small matrix of weights. This kernel "slides" over the 2D input data, performing an elementwise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel.

|                |                |                |   |   |
|----------------|----------------|----------------|---|---|
| 3              | 3              | 2              | 1 | 0 |
| 0 <sub>0</sub> | 0 <sub>1</sub> | 1 <sub>2</sub> | 3 | 1 |
| 3 <sub>2</sub> | 1 <sub>2</sub> | 2 <sub>0</sub> | 2 | 3 |
| 2 <sub>0</sub> | 0 <sub>1</sub> | 0 <sub>2</sub> | 2 | 2 |
| 2              | 0              | 0              | 0 | 1 |

  

|      |      |      |
|------|------|------|
| 12.0 | 12.0 | 17.0 |
| 10.0 | 17.0 | 19.0 |
| 9.0  | 6.0  | 14.0 |

Kernel is also called a mask and it can be of different size depending upon the operation.



We assume a dummy row on the edges of the image in order to process the edge pixels of the input image. The output picture may not have values corresponding to the pixels on the edges if we don't consider a dummy row on the edges. So the resolution of the output image becomes roughly unequal than the resolution of the input image.

In this project, we are taking a  $512 \times 512$  image as input then the output image would of  $514 \times 514$  due to the fact that we have not considered a dummy row on the edges.



# ZYNQ Board

The ZedBoard Znyq-7000 development kit produced by Xilinx Company was used in the study. As a result of grayscale conversion and convolution operations, such applications as edge detection, sharpening and blurring were realized on the test images. Several FPGAs were explored to choose the best development kit for high scale image processing. Although in this project implementation only a  $512 \times 512$  image is used but an image of greater resolution could be easily and rapidly processed through the Zynq Board. This is basically signified in the hardware implementation. Since we have developed the project only on software therefore its complete powerful processing capabilities could not be clearly highlighted.

## System Architecture / Design

We cannot use pure streaming design for neighborhood processing algorithm since pixels for processing are not consecutive. Hence, we would buffer the pixels inside the IP before processing them. It is not practical to buffer the entire image inside the IP, so we would be buffering just enough pixels for processing. For instance, a  $512 \times 512$  grayscale image requires 262144 Bytes and if we would use a  $3 \times 3$  kernel then we would just need three image pixel lines (rows) to start processing.



## Line Buffers

Line buffers are inside FPGA used for storing one line/row of input image. For a  $512 \times 512$  image, one line buffer would be of 512 Bytes in size and three line buffers would require 1536 Bytes. In actual, the size of the line buffer depends only on the width of the image. From DDR we will send three rows of pixels first to IP for processing, then IP would process it and will send back the processed pixels to DDR. Then we will send the data of next three line buffers. In convolution of neighborhood processing on image, we might have to reuse the same rows of image pixels multiple times, so we would use multiplexer to reuse the same line buffer for pixel processing. For better performance, we would not be sending same pixel rows again and again.



When we first perform convolution, the first three pixel lines would be used.

## Line buffers



Then the first line buffer would be replaced with the fourth pixel row and the multiplexers would be adjusted in such a way that now the second line buffer would be chosen as first pixel row, third line buffer as second pixel row and the first line buffer as the third pixel row. This would be done through select lines.



We can improve the system performance by adding a fourth line buffer. While the three line buffers are being processed, the next pixel row can be sent to the fourth buffer so that data transfer happens in parallel to data processing.

## DDR

It is basically our computer memory. The input image is stored in the computer and full path is provided in the code in order to process it.

## DMA Controller and AXI Interface

The Xilinx IP AXI Direct Memory Access (AXI DMA) core is a soft Xilinx IP core for use within the Xilinx Vivado Design Suite. The AXI DMA provides high-bandwidth direct memory access between memory (line buffers) and AXI4-Stream target peripherals. While packaging the image processing IP, the input output ports declared in the code are mapped with the AXI Interface. Primary high-speed DMA data movement between system memory (DDR) and stream target (Image Processing IP) is through the AXI4 Read Master to AXI4 memory-mapped to stream (MM2S) Master, and AXI stream to memory-mapped (S2MM) Slave to AXI4 Write Master. AXI DMA also enables up to 16 multiple channels of data movement on both MM2S and S2MM paths. This interface is basically used to stream pixel data to and from the Image Processing IP rapidly without any delay and stored in the system memory. This interface would be helpful when we implement the design logic on hardware.

## Interrupt based processing

We would have an interrupt signal coming from the image processing IP to the Zynq PS.

Initially the image is stored in DDR. Then we would configure the DMA Controller to stream the first four pixel rows to the image processing IP. Convolution is performed in the image processing IP and once it is finished then interrupt would be sent to the processor (Zynq PS). After the interrupt is received, then next line of pixel row would be sent to the IP. After convolution of one row of pixel data, the IP would be streaming the processed pixels to DDR through AXI4 interface of the DMA Controller.

## Block representation of the System Logic

# System Architecture



# Verilog Coding

Xilinx IP provides different kinds of RAMs but the kind of flexibility we need is not provided by Xilinx IP, so we have written our own code for line buffers.

## 1. Designing Line Buffer

We have made the following assumptions:

1. One pixel at a time is written to the line buffer.
2. Size of the image is 512 x 512 , so we have to store 512 pixels in the line buffers.
3. For a 3 x 3 kernel, convolution is performed on three pixels. So three pixels are read in one short which are 24 bits of data.

Based on these assumptions, the following Verilog code is written.



The screenshot shows the Vivado IDE interface with the Verilog source code for 'lineBuffer.v' open. The code defines a module 'lineBuffer' with various input and output ports. It includes logic for writing data to a memory buffer and reading data from it. The code is annotated with comments explaining its functionality.

```
lineBuffer.v
E:/ISE_147/vivado/software/Vivado/2014.2/examples/Vivado_Tutorial/Projects/image_processing_zynq/image_processing_zynq.srcs/sources_1/new/lineBuffer.v
21
22 //image is of 512x512 pixels
23
24 module lineBuffer(
25     input i_clk,
26     input i_RST, //active high reset
27     input [7:0] i_data, //write port (one byte pixel written to line buffer (composed of 8 bits))
28     input i_data_valid, //indicates whether the pixel data received is valid or not
29     output [23:0] o_data, //read port (three pixels are read from line buffer in one short for convolution (24 bits))
30     input i_rd_data //indicates that data should be read from the line buffer
31 );
32
33
34     reg [7:0] line [511:0]; //line buffer (memory) where the pixel data would be stored
35     reg [8:0] wrPntr; //write pointer(tells in which memory location of the line buffer data will be stored)
36     reg [8:0] rdPntr; //read pointer
37
38     //writing in memory part
39
40     always@(posedge i_clk)
41     begin
42         if(i_data_valid) //if there is a valid data then
43             line[wrPntr] <= i_data; //that data would be stored in a memory location pointed by the
44                                     //wrPntr
45     end
46
47
48     always@(posedge i_clk)
49     begin
50         if(i_RST)
51             wrPntr <= 'd0; //initially the wrPntr is pointing to the first memory location
52         else if(i_data_valid)
53             wrPntr <= wrPntr + 'd1; //then wrPntr will be incremented afterwards
54     end
55 
```

```
56 //reading from memory part
57
58
59 assign o_data = {line[rdPntr],line[rdPntr+1],line[rdPntr+2]}; //3 bytes read from memory (3 pixels = 24 bits)
60 //to read these three bytes we have not used sequential block because that would give us latency between reading the pixel value
61 //and incrementing the read pointer
62
63
64 always@(posedge i_clk)
65 begin
66     if(i_RST)
67         rdPntr <= 'd0; //initially the rdPntr will read from the first memory location
68     else if(i_rd_data)
69         rdPntr <= rdPntr + 'd1; //afterwards the read pointer would be incremented to read the next pixel
70 end
71
72
73
74 endmodule
```

## 2. Designing Multiply Accumulate Model (MAC)

MAC unit is a fundamental block in the computing devices, especially Digital Signal Processor. MAC unit performs multiplication and accumulation process. Basic MAC unit consists of multiplier, adder, and accumulator. Here the MAC unit would perform the convolution and multiplication with kernel of the pixel data read from the line buffers.

### a. Box Blur Filter

A box blur (also known as a box linear filter) is a spatial domain linear filter in which each pixel in the resulting image has a value equal to the average value of its neighboring pixels in the input image. It is a form of low-pass ("blurring") filter. The simplest low-pass filter just calculates the average of a pixel and all of its eight immediate neighbors. The result replaces the original value of the pixel. The process is repeated for every pixel in the image.

A 3 by 3 box blur kernel can be written as matrix.

$$\frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$

This kernel is multiplied with the pixel data of the input image and a blurred output image is obtained.

```

conv.v
E:/ISE_147/vivado/software/Vivado/2014.2/examples/Vivado_Tutorial/Projects/image_processing_zynq/image_processing_zynq.srcs/sources
20 //////////////////////////////////////////////////////////////////
21 //box blur operation
22
23 module conv(
24     input i_clk,
25     input [71:0] i_pixel_data, // we cannot declare as 2D hereve have flatten it
26                         // 24 x 3 = 72 bits (3 pixels)
27     input i_pixel_data_valid, //indicates the pixel data is valid
28     output reg [7:0] o_convolved_data, //one output pixel(8 bits)
29     output reg o_convolved_data_valid //indicates that the output data is valid
30 );
31
32
33
34     reg [7:0] kernel [8:0]; //box blur kernel, 2D
35     integer i;
36     reg [15:0] multData[8:0]; //to store mltiplication(with kernel) data
37     reg [15:0] sumDataInt; //16 bits
38     reg [15:0] sumData; //16 bits to store the pixel values after multiplicaton with kernel
39     reg multDataValid;
40     reg sumDataValid;
41     reg convolved_data_valid;
42
43
44
45     //initializing kernel
46
47     initial
48     begin
49         for(i = 0; i < 9; i = i+1)
50             begin
51                 kernel[i] = 1;
52             end
53     end
54
55     //multiplicationh kernel
56
57     always@(posedge i_clk)
58     begin
59         for(i = 0; i < 9; i = i+1)
60             begin
61                 multData[i] <= kernel[i]*i_pixel_data[i*8+:8];
62             end
63         multDataValid <= i_pixel_data_valid;
64     end
65
66
67     //addition to calculate output pixel
68
69     always@(*)
70     begin
71         sumDataInt = 0; //intermediate sum data
72         for(i = 0; i < 9; i = i+1)
73             begin
74                 sumDataInt = sumDataInt + multData[i];
75             end
76         end
77         //intermediate and final sum data are different because we cannot mix the blocking and non-blocking assignment
78         //also the sumData should be initially zero
79
80         always@(posedge i_clk)
81         begin
82             sumData <= sumDataInt; //final sum data
83             sumDataValid <= multDataValid;
84         end
85
86         //divison by 9
87
88         always@(posedge i_clk)
89         begin
90             o_convolved_data <= sumData/9; //only integer part will be assigned
91             o_convolved_data_valid <= sumDataValid;
92         end
93
94
95
96
97
98 endmodule

```

## b. Edge Detection using Sobel Operator

The Sobel operator is used in image processing and computer vision, particularly within edge detection algorithms where it creates an image emphasizing edges. The operator uses two  $3 \times 3$  kernels which are convolved with the original image to calculate the output pixels. The output image can be viewed as a bitmap image, with the areas of high gradient (the likely edges) visible as white lines. If we define A as the source image, and G<sub>x</sub> and G<sub>y</sub> are two images which at each point contain the vertical and horizontal derivative approximations respectively.

$$\mathbf{G}_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} * \mathbf{A} \quad \text{and} \quad \mathbf{G}_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * \mathbf{A}$$

The magnitude of the resultant pixel is calculated through the following:

$$\mathbf{G} = \sqrt{\mathbf{G}_x^2 + \mathbf{G}_y^2}$$

```
/*
22 //edge detection
23 module edge_detection(
24     input      i_clk,
25     input [71:0] i_pixel_data,
26     input      i_pixel_data_valid,
27     output reg [7:0] o_convolved_data, //threshold pixel value
28     output reg   o_convolved_data_valid
29 );
30
31 integer i;
32 reg [7:0] kernel1 [8:0]; //two 2D kernels
33 reg [7:0] kernel2 [8:0];
34 reg [10:0] multData1[8:0]; //to represent max. mult data value (1 bit is for sign)
35 reg [10:0] multData2[8:0]; //8 bits of pixel + max negative no. in kernel is -2 (represented in 3 bits)
36 reg [10:0] sumDataInt1; //so 8+3 = 11 bits are required
37 reg [10:0] sumDataInt2;
38 reg [10:0] sumData1;
39 reg [10:0] sumData2;
40 reg multDataValid;
41 reg sumDataValid;
42 reg convolved_data_valid;
43 reg [20:0] convolved_data_int1;
44 reg [20:0] convolved_data_int2;
45 wire [21:0] convolved_data_int;
46 reg convolved_data_int_valid;
47
```

```

48 initial
49 begin
50     kernell[0] = 1; //first kernel
51     kernell[1] = 0;
52     kernell[2] = -1;
53     kernell[3] = 2;
54     kernell[4] = 0;
55     kernell[5] = -2;
56     kernell[6] = 1;
57     kernell[7] = 0;
58     kernell[8] = -1;
59
60     kernel2[0] = 1; //second kernel
61     kernel2[1] = 2;
62     kernel2[2] = 1;
63     kernel2[3] = 0;
64     kernel2[4] = 0;
65     kernel2[5] = 0;
66     kernel2[6] = -1;
67     kernel2[7] = -2;
68     kernel2[8] = -1;
69 end
70
71 always @(posedge i_clk)
72 begin
73     for(i=0;i<9;i=i+1)
74         begin //multiplication with both the kernels is done in parallel (here signed
75             //multiplier is use because we have negative numbers in kernel
76             //if signed multiplier is not use then vivado will take the negative number in kernel
77             //as unsigned number
78             multData1[i] <= $signed(kernell[i])*$signed({1'b0,i_pixel_data[i*8+:8]});
79             multData2[i] <= $signed(kernel2[i])*$signed({1'b0,i_pixel_data[i*8+:8]});
80         end
81     multDataValid <= i_pixel_data_valid;
82 end
83
84 always @(*)
85 begin
86     sumDataInt1 = 0; //intermediate sum of the pixel values after multiplying by kernel
87     sumDataInt2 = 0;
88     for(i=0;i<9;i=i+1)
89         begin //sum is also done in parallel and is signed addition
90             sumDataInt1 = $signed(sumDataInt1) + $signed(multData1[i]);
91             sumDataInt2 = $signed(sumDataInt2) + $signed(multData2[i]);
92         end
93     end
94 end
95
96 always @(posedge i_clk)
97 begin
98     sumData1 <= sumDataInt1;
99     sumData2 <= sumDataInt2;
100    sumDataValid <= multDataValid;
101 end
102
103 always @(posedge i_clk)
104 begin //to compute the final pixel value after convolution (magnitude of pixel)
105     convolved_data_int1 <= $signed(sumData1)*$signed(sumData1);
106     convolved_data_int2 <= $signed(sumData2)*$signed(sumData2);
107     convolved_data_int_valid <= sumDataValid;
108 end
109
110 assign convolved_data_int = convolved_data_int1+convolved_data_int2; //the squares of Gx and Gy are added (threshold)
111
112 //we have defined in this block that if threshold is above 4000 than
113 //the pixel would be white otherwise black
114 always @(posedge i_clk)
115 begin
116     if(convolved_data_int > 4000)
117         o_convolved_data <= 8'hff; //output pixel is of 8 bits
118     else
119         o_convolved_data <= 8'h00;
120     o_convolved_data_valid <= convolved_data_int_valid;
121 end
122
123 endmodule
124

```

## c. Unsharp masking

Unsharp masking (USM) is an image sharpening technique. Sharpening an image increases the contrast between bright and dark regions to bring out features. The following array is a kernel for a common high pass filter used to sharpen an image:

**Sharpen filter** 
$$\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$

```
21 //sharp image operation
22
23 module sharpen(
24     input i_clk,
25     input [71:0] i_pixel_data, // we cannot declare as 2D hereve have flatten it
26                           // 24 x 3 = 72 bits (3 pixels)
27     input i_pixel_data_valid, //indicates the pixel data is valid
28     output reg [7:0] o_convolved_data, //one output pixel(8 bits)
29     output reg o_convolved_data_valid //indictates that the output data is valid
30 );
31
32
33
34     reg [7:0] kernel [8:0]; //box blur kernel, 2D
35     integer i;
36     reg [10:0] multData[8:0]; //to store mltiplication(with kernel) data
37     reg [10:0] sumDataInt; //11 bits (signed data)
38     reg [10:0] sumData; //16 bits to store the pixel values after multiplicaton with kernel
39     reg multDataValid;
40     reg sumDataValid;
41     reg convolved_data_valid;
42
```

```

44
45      //initializing kernel
46
47      initial
48          begin
49              kernel[0] =  0;
50              kernel[1] = -1;
51              kernel[2] = 0;
52              kernel[3] = -1;
53              kernel[4] =  5;
54              kernel[5] = -1;
55              kernel[6] =  0;
56              kernel[7] = -1;
57              kernel[8] =  0;
58
59
60          end
61
62      //multiplicationh kernel
63
64      always@(posedge i_clk)
65      begin
66          for(i = 0; i < 9; i = i+1)
67              begin
68                  multData[i] <= $signed(kernel[i])*$signed({1'b0,i_pixel_data[i*8+:8]});
69              end
70          multDataValid <= i_pixel_data_valid;
71      end
72
73      //addition to calculate output pixel
74
75      always@(*)
76      begin
77          sumDataInt = 0; //intermediate sum data
78          for(i = 0; i < 9; i = i+1)
79              begin
80                  sumDataInt = $signed(sumDataInt) + $signed(multData[i]);
81              end
82          end
83      end
84      //intermediate and final sum data are different because we cannot mix the blocking and non-blocking assignment
85      //also the sumData should be initially zero
86
87      always@(posedge i_clk)
88      begin
89          sumData <= sumDataInt; //final sum data
90          sumDataValid <= multDataValid;
91      end
92
93
94      //sending the output pixel data
95
96      always@(posedge i_clk)
97      begin
98          o_convolved_data <= sumData;
99          o_convolved_data_valid <= sumDataValid;
100     end
101
102
103 endmodule

```

## d. High Pass Filtering

A high pass filter tends to retain the high frequency information within an image while reducing the low frequency information. The kernel of the high pass filter is designed to increase the brightness of the center pixel relative to neighboring pixels. The kernel array usually contains a single positive value at its center, which is completely surrounded by negative values. Below is just an example of one possible kernel for a high pass filter. Other filters may include more weighting for the center point.

|   | $A_1$ | $A_2$ | $A_3$ |
|---|-------|-------|-------|
| 1 | -7/9  | -7/9  | 0     |
| 2 | 0     | 5     | 0     |
| 3 | 0     | -7/9  | -7/9  |

```
--  
23 module brightness(  
24     input i_clk,  
25     input [71:0] i_pixel_data, // we cannot declare as 2D here we have flatten it  
26             // 24 x 3 = 72 bits (3 pixels)  
27     input i_pixel_data_valid, // indicates the pixel data is valid  
28     output reg [7:0] o_convolved_data, // one output pixel(8 bits)  
29     output reg o_convolved_data_valid // indicates that the output data is valid  
30 );  
31  
32  
33  
34     reg [7:0] kernel [8:0]; // kernel, 2D  
35     integer i;  
36     reg [10:0] multData[8:0]; // to store multiplication (with kernel) / data  
37     reg [10:0] sumDataInt; // 11 bits (signed data)  
38     reg [10:0] sumData; // to store the pixel values after multiplication with kernel  
39     reg multDataValid;  
40     reg sumDataValid;  
41     reg convolved_data_valid;  
42  
43  
44  
45 // initializing kernel  
46  
47 initial  
48 begin  
49     kernel[0] = -7;  
50     kernel[1] = -7;  
51     kernel[2] = 0;  
52     kernel[3] = 0;  
53     kernel[4] = 5;  
54     kernel[5] = 0;  
55     kernel[6] = 0;  
56     kernel[7] = -7;  
57     kernel[8] = -7;
```

```

60      end
61
62 //multiplicationh kernel
63
64 always@(posedge i_clk)
65 begin
66   for(i = 0; i < 9; i = i+1)
67     begin
68       multData[i] <= $signed(kernel[i])*$signed({1'b0,i_pixel_data[i*8+:8]});
69     end
70   multDataValid <= i_pixel_data_valid;
71 end
72
73
74 //addition to calculate output pixel
75
76 always@(*)
77 begin
78   sumDataInt = 0; //intermediate sum data
79   for(i = 0; i < 9; i = i+1)
80     begin
81       sumDataInt = $signed(sumDataInt) + $signed(multData[i]);
82     end
83   end
84 //intermediate and final sum data are different because we cannot mix the blocking and non-blocking assignment
85 //also the sumData should be initially zero
86
87 always@(posedge i_clk)
88 begin
89   sumData <= sumDataInt; //final sum data
90   sumDataValid <= multDataValid;
91 end
92
93
94 //sending the output pixel data
95
96 always@(posedge i_clk)
97 begin
98   o_convolved_data <= sumData/9;
99   o_convolved_data_valid <= sumDataValid;
100 end
101
102 endmodule
103

```

## e. Low Pass Filtering

A low pass filter provides the basis for smoothing operations. If an image contains too many variations to be able to determine specific features, smoothing can decrease the contrast so that some areas (especially the background) will not distract from viewing other areas of the image. Through low pass filtering kernel we have basically decreased the brightness of output image.

|   | $A_1$ | $A_2$ | $A_3$ |
|---|-------|-------|-------|
| 1 | 0     | 0     | 0     |
| 2 | 0     | 5/9   | 0     |
| 3 | 0     | 0     | 0     |

```
23 module low_pass(
24     input i_clk,
25     input [71:0] i_pixel_data, // we cannot declare as 2D here we have flatten it
26                         // 24 x 3 = 72 bits (3 pixels)
27     input i_pixel_data_valid, // indicates the pixel data is valid
28     output reg [7:0] o_convolved_data, // one output pixel(8 bits)
29     output reg o_convolved_data_valid // indicates that the output data is valid
30 );
31
32
33
34     reg [7:0] kernel [8:0]; // kernel, 2D
35     integer i;
36     reg [10:0] multData[8:0]; // to store multiplication (with kernel) data
37     reg [10:0] sumDataInt; // 11 bits (signed data)
38     reg [10:0] sumData; // to store the pixel values after multiplication with kernel
39     reg multDataValid;
40     reg sumDataValid;
41     reg convolved_data_valid;
42
43
44
```

```

45 //initializing kernel
46
47 initial
48 begin
49     kernel[0] = 0;
50     kernel[1] = 0;
51     kernel[2] = 0;
52     kernel[3] = 0;
53     kernel[4] = 5;
54     kernel[5] = 0;
55     kernel[6] = 0;
56     kernel[7] = 0;
57     kernel[8] = 0;
58
59
60 end
61
62 //multiplicationh kernel
63
64 always@(posedge i_clk)
65 begin
66     for(i = 0; i < 9; i = i+1)
67     begin
68         multData[i] <= $signed(kernel[i])*$signed({1'b0,i_pixel_data[i*8+:8]});
69     end
70     multDataValid <= i_pixel_data_valid;
71 end
72
73 //intermediate and final sum data are different because we cannot mix the blocking and non-blocking assignment
74 //also the sumData should be initially zero
75
76
77 always@(posedge i_clk)
78 begin
79     sumData <= sumDataInt; //final sum data
80     sumDataValid <= multDataValid;
81 end
82
83
84 //sending the output pixel data
85
86 always@(posedge i_clk)
87 begin
88     o_convolved_data <= sumData/9;
89     o_convolved_data_valid <= sumDataValid;
90 end
91 endmodule
92
93

```

### 3. Designing Control Logic

This Control Unit designed instantiates all the line buffers and the multiplexers. It controls storing pixel data into line buffers as well as it controls data to be send to the MAC Module.

```
2
3 module imageControl(
4     input          i_clk,
5     input          i_rst,
6     input [7:0]    i_pixel_data, //input pixel stream to be stored in line buffers
7     input          i_pixel_data_valid, //indicating the validity of incoming pixels data
8     output reg [71:0] o_pixel_data, //72 pixels (24x3) pixels are output
9     output          o_pixel_data_valid, //indicates the validity of output pixel data
0     output reg      o_intr
1 );
2
3 reg [8:0] pixelCounter; //it counts from 0 to 511. one line buffer should have 512 pixels.(0-511 represented in 9 bits)
4           //when the counter reaches 511, pixels are stored in the next line buffer.
5
6 reg [1:0] currentWrLineBuffer; //to count the number of line buffer (0-3)
7 reg [3:0] lineBuffDataValid; //to determine valid data to be written in each line buffer (0-3)
8 reg [3:0] lineBuffRdData; //to determine valid data to be read from 3 line buffers in parallel
9 reg [1:0] currentRdLineBuffer; //to determine from which line buffer data should be read
0
1 wire [23:0] lb0data;
2 wire [23:0] lb1data;
3 wire [23:0] lb2data;
4 wire [23:0] lb3data;
5
6 reg [8:0] rdCounter; //counts from 0 upto 511. Determines the number of pixels read from the current line buffer
7 reg rd_line_buffer; //enable signal sent to three line buffers at a time from where data should be read
8 reg [11:0] totalPixelCounter; //determines the valid data in all line buffers (512x4) 12 bits are required
9 reg rdState;
0
1 localparam IDLE = 'b0,
2           RD_BUFFER = 'b1;
3
4 assign o_pixel_data_valid = rd_line_buffer; //whenever rd_line_buffer signal becomes high, we would have valid output
5           //which is being read from the line buffers
6
7 always @(posedge i_clk) //determines the total data in all the line buffers
8 begin
9     if(i_rst)
0         totalPixelCounter <= 0;
1     else
2     begin
3         if(i_pixel_data_valid & !rd_line_buffer) //we have data but not reading from the line buffers
4             totalPixelCounter <= totalPixelCounter + 1;
5         else if(!i_pixel_data_valid & rd_line_buffer) //we dont have data but we are reading from line buffers
6             totalPixelCounter <= totalPixelCounter - 1;
7     end
8 end
9
0 //state machine to control the enable signal of the multiplexer (currentRdLineBuffer)
1 always @(posedge i_clk)
2 begin
3     if(i_rst)
4     begin
5         rdState <= IDLE; //initially no enable signal is send which would indicate no data is read
6         rd_line_buffer <= 1'b0; //from any of the line buffers in IDLE state
7         o_intr <= 1'b0;
8     end
9     else
0     begin
1         case(rdState)
2             IDLE:begin
3                 o_intr <= 1'b0;
4                 if(totalPixelCounter >= 1536) //512x3 =1536
5                     begin
6                         rd_line_buffer <= 1'b1; //we should have data in three line buffers
7                         rdState <= RD_BUFFER;
8                     end
9             end
8         end
9     end
0
1
2
3
4
5
6
7
8
9
```

```

RD_BUFFER:begin
    if(rdCounter == 511) //checks that 512 pixels are read from one line buffer
    begin
        rdState <= IDLE;
        rd_line_buffer <= 1'b0; //signal is made low again because we finished reading
        o_intr <= 1'b1;
    end
    end
endcase
end
end

always @(posedge i_clk) //this block is used to count pixels (8 bits) one by one
begin
    if(i_rst)
        pixelCounter <= 0; //initially the pixel counter is set to zero
    else
    begin
        if(i_pixel_data_valid)
            pixelCounter <= pixelCounter + 1; //input pixel stream to be stored in line buffers are incremented
                                            //when the pixel_data is valid
    end
end

always @(posedge i_clk) //block which tells in which line buffer the pixels would be saved
begin
    if(i_rst)
        currentWrLineBuffer <= 0; //initially pixels are stored in line buffer 0
    else
    begin
        if(pixelCounter == 511 & i_pixel_data_valid)
            currentWrLineBuffer <= currentWrLineBuffer+1; //when the pixel counter reaches 511, pixels are saved in the next line buffer
    end
end

always @(*)
begin
    lineBuffDataValid = 4'h0; //all of the data valid signal of line buffer should be zero
    lineBuffDataValid[currentWrLineBuffer] = i_pixel_data_valid; //except for the current line buffer in which pixel data is written
end

always @(posedge i_clk) //this block determines how much data has been read from the current line buffer
begin
    if(i_rst)
        rdCounter <= 0;
    else
    begin
        if(rd_line_buffer)
            rdCounter <= rdCounter + 1; //increments by 1 after each pixel is read
    end
end

always @(posedge i_clk) //this logic controls the currentRdLineBuffer
begin
    if(i_rst)
    begin
        currentRdLineBuffer <= 0; //initially the data is read from line buffer 0
    end
    else
    begin
        if(rdCounter == 511 & rd_line_buffer) //after reading 0-511 pixels from the current line buffer
            currentRdLineBuffer <= currentRdLineBuffer + 1; //data is read from the next line buffer
    end
end

```

```

59 always @(*) //to determine from which line buffer data should be read
60 begin
61     case(currentRdLineBuffer)
62         0:begin
63             o_pixel_data = {lb2data,lb1data,lb0data}; //data is read from line buffers 0,1,2
64         end
65         1:begin
66             o_pixel_data = {lb3data,lb2data,lb1data}; //data is read from line buffers 1,2,3
67         end
68         2:begin
69             o_pixel_data = {lb0data,lb3data,lb2data}; //data is read from line buffers 2,3,0
70         end
71         3:begin
72             o_pixel_data = {lb1data,lb0data,lb3data}; ////data is read from line buffers 3,0,1
73         end
74     endcase
75 end
76
77 //implementing multiplexer logic
78 always @(*) //this block is used to fetch data for the fourth line buffer
79 begin
80     case(currentRdLineBuffer) //this is the control signal to the multiplexer
81         0:begin
82             lineBuffRdData[0] = rd_line_buffer; //rd_line_buffer is enable signal sent to indicate
83             lineBuffRdData[1] = rd_line_buffer; //that data should be read from these line buffers
84             lineBuffRdData[2] = rd_line_buffer;
85             lineBuffRdData[3] = 1'b0;
86         end
87         1:begin
88             lineBuffRdData[0] = 1'b0;                                17
89             lineBuffRdData[1] = rd_line_buffer;                      18 lineBuffer lB1(
90             lineBuffRdData[2] = rd_line_buffer;                      19     .i_clk(i_clk),
91             lineBuffRdData[3] = rd_line_buffer;                      20     .i_rst(i_rst),
92         end                                         21     .i_data(i_pixel_data),
93         2:begin
94             lineBuffRdData[0] = rd_line_buffer;                      22     .i_data_valid(lineBuffDataValid[1]),
95             lineBuffRdData[1] = 1'b0;                                23     .o_data(lb1data),
96             lineBuffRdData[2] = rd_line_buffer;                      24     .i_rd_data(lineBuffRdData[1])
97             lineBuffRdData[3] = rd_line_buffer;                      25 );
98         end
99         3:begin
100             lineBuffRdData[0] = rd_line_buffer;
101             lineBuffRdData[1] = rd_line_buffer;
102             lineBuffRdData[2] = 1'b0;
103             lineBuffRdData[3] = rd_line_buffer;
104         end
105     endcase
106 end
107
108 //instantiation of line buffers
109 lineBuffer lB0(
110     .i_clk(i_clk),
111     .i_rst(i_rst),
112     .i_data(i_pixel_data),
113     .i_data_valid(lineBuffDataValid[0]),
114     .o_data(lb0data),
115     .i_rd_data(lineBuffRdData[0])
116 );
117
118 lineBuffer lB1(
119     .i_clk(i_clk),
120     .i_rst(i_rst),
121     .i_data(i_pixel_data),
122     .i_data_valid(lineBuffDataValid[1]),
123     .o_data(lb1data),
124     .i_rd_data(lineBuffRdData[1])
125 );
126
127 lineBuffer lB2(
128     .i_clk(i_clk),
129     .i_rst(i_rst),
130     .i_data(i_pixel_data),
131     .i_data_valid(lineBuffDataValid[2]),
132     .o_data(lb2data),
133     .i_rd_data(lineBuffRdData[2])
134 );
135
136 lineBuffer lB3(
137     .i_clk(i_clk),
138     .i_rst(i_rst),
139     .i_data(i_pixel_data),
140     .i_data_valid(lineBuffDataValid[3]),
141     .o_data(lb3data),
142     .i_rd_data(lineBuffRdData[3])
143 );
144
145 endmodule
146

```

## 4. Image Processing IP Packaging

Image Processing IP is interfaced with DMA Controller which has an AXI Stream interface. Data comes in and is sent via this interface. Built-in fifo generator is used from the IP catalog as an output buffer, in order to minimize any mismatch which would occur if DMA Controller is not ready to accept output pixel data from the Image Processing IP. So the output data is not sent directly to the DMA Controller from the IP, it is stored in the output buffer and is sent from it. Once a line buffer has finished processing the pixels, interrupt is sent to the processor from that line buffer which indicates that this line buffer is free now and next pixel row should be sent in to this line buffer. When interrupt signal is high, it represents that a line buffer is free. Ports are mapped in IP Packaging.

```
22 module imageProcessTop(
23     input  axi_clk,
24     input  axi_reset_n,
25     //slave interface //interface from which data would be coming through DMA Controller to IP
26     input  i_data_valid,
27     input [7:0] i_data,
28     output o_data_ready, //this ready signal is given to DMA Controller when we are exiting data
29     //master interface //data is send back to DMA Controller from IP through this interface
30     output o_data_valid,
31     output [7:0] o_data,
32     input  i_data_ready, //this ready signal is coming from DMA Controller
33     //interrupt
34     output o_intr //to indicate that line buffer is free
35
36 );
37
38 wire [71:0] pixel_data;
39 wire pixel_data_valid;
40 wire axis_prog_full;
41 wire [7:0] convolved_data; //output after convolution
42 wire convolved_data_valid;
43
44 assign o_data_ready = !axis_prog_full; //output buffer accepts data until not half full
45                                     //axis_prog_full signal is asserted when DMA signal is not accepting data
46
47 //instantiation of modules
48 imageControl IC(
49     .i_clk(axi_clk),
50     .i_rst(axi_reset_n), //active low reset
51     .i_pixel_data(i_data),
52     .i_pixel_data_valid(i_data_valid),
53     .o_pixel_data(pixel_data),
54     .o_pixel_data_valid(pixel_data_valid),
55     .o_intr(o_intr)
56 );
57
58 convl convo(
59     .i_clk(axi_clk),
60     .i_pixel_data(pixel_data),
61     .i_pixel_data_valid(pixel_data_valid),
62     .o_convolved_data(convolved_data),
63     .o_convolved_data_valid(convolved_data_valid)
64 );
65
66 outputBuffer OB( //built-in fifo generator is used from IP Catalog
67     .s_aclk(axi_clk),           // input wire s_aclk
68     .s_arstn(axi_reset_n),      // input wire s_arstn
69     .s_axis_tvalid(convolved_data_valid), // input wire s_axis_tvalid
70     .s_axis_tready(),          // output wire s_axis_tready //there is no case when fifo wont be ready to accept data
71     .s_axis_tdata(convolved_data), // input wire [7 : 0] s_axis_tdata
72     .m_axis_tvalid(o_data_valid), // output wire m_axis_tvalid
73     .m_axis_tready(i_data_ready), // input wire m_axis_tready
74     .m_axis_tdata(o_data),      // output wire [7 : 0] m_axis_tdata
75     .axis_prog_full(axis_prog_full) // output wire axis_prog_full
76 );
77
78
79
80
81 endmodule
```

## 5. Synthesizing the Design

After coding the design files in verilog, the design was synthesized. The synthesis process checks code syntax, analyzes the hierarchy of the design, recognizes the functionality described, and ensures that the design is optimized for the device architecture selected.

### a) RTL Schematic

After the synthesis phase of the design, a schematic representation of the synthesized design is displayed. RTL View is a Register Transfer Level graphical representation of the synthesized design.



## b) Utilization Report

```
1 Copyright 1986-2014 Xilinx, Inc. All Rights Reserved.
2 -----
3 | Tool Version : Vivado v.2014.2 (win64) Build 932637 Wed Jun 11 13:33:10 MDT 2014
4 | Date        : Sun Sep 13 13:11:25 2020
5 | Host        : DESKTOP-FQ1K1PB running 64-bit major release (build 9200)
6 | Command     : report_utilization -file imageProcessTop_utilization_synth.rpt -pb imageProcessTop_utilization_synth.pb
7 | Design      : imageProcessTop
8 | Device      : xc7z020
9 | Design State: Synthesized
.0 -----
.1
.2 Utilization Design Information
.3
.4 Table of Contents
.5 -----
.6 1. Slice Logic
.7 2. Memory
.8 3. DSP
.9 4. IO and GT Specific
.0 5. Clocking
.1 6. Specific Feature
.2 7. Primitives
.3 8. Black Boxes
.4 9. Instantiated Netlists
.5 -----
.6 11. Slice Logic
.7 -----
.8
.9 -----
30 |      Site Type    | Used | Fixed | Available | Util% |
31 +-----+-----+-----+-----+
32 | Slice LUTs*     | 1943 | 0 | 53200 | 3.65 |
33 | LUT as Logic    | 790 | 0 | 53200 | 1.48 |
34 | LUT as Memory   | 1153 | 0 | 17400 | 6.62 |
35 | LUT as Distributed RAM | 1152 | 0 | | |
36 | LUT as Shift Register | 1 | 0 | | |
37 | Slice Registers | 230 | 0 | 106400 | 0.21 |
38 | Register as Flip Flop | 230 | 0 | 106400 | 0.21 |
39 | Register as Latch | 0 | 0 | 106400 | 0.00 |
40 | F7 Muxes        | 32 | 0 | 26600 | 0.12 |
41 | F8 Muxes        | 0 | 0 | 13300 | 0.00 |
42 +-----+-----+-----+-----+
43 * Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt_design after synthesis, if
44
45
46 2. Memory
47 -----
48
49 +-----+-----+-----+-----+
50 |      Site Type    | Used | Fixed | Available | Util% |
51 +-----+-----+-----+-----+
52 | Block RAM Tile | 0 | 0 | 140 | 0.00 |
53 | RAMB36/FIFO* | 0 | 0 | 140 | 0.00 |
54 | RAMBL18        | 0 | 0 | 280 | 0.00 |
55 +-----+-----+-----+-----+
56 * Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E1 or one FIFO18E1. However, if
57
```

```

50
59 3. DSP
60 -----
61
62 +-----+-----+-----+-----+
63 | Site Type | Used | Fixed | Available | Util% |
64 +-----+-----+-----+-----+
65 | DSPs | 2 | 0 | 220 | 0.90 |
66 | DSP48E1 only | 2 | | | |
67 +-----+-----+-----+-----+
68
69
70 4. IO and GT Specific
71 -----
72
73 +-----+-----+-----+-----+
74 | Site Type | Used | Fixed | Available | Util% |
75 +-----+-----+-----+-----+
76 | Bonded IOB | 11 | 0 | 200 | 5.50 |
77 | Bonded IPADs | 0 | 0 | 2 | 0.00 |
78 | Bonded IOPADs | 0 | 0 | 130 | 0.00 |
79 | PHY_CONTROL | 0 | 0 | 4 | 0.00 |
80 | PHASER_REF | 0 | 0 | 4 | 0.00 |
81 | OUT_FIFO | 0 | 0 | 16 | 0.00 |
82 | IN_FIFO | 0 | 0 | 16 | 0.00 |
83 | IDELAYCTRL | 0 | 0 | 4 | 0.00 |
84 | IBUFGDS | 0 | 0 | 192 | 0.00 |
85 | PHASER_OUT/PHASER_OUT_PHY | 0 | 0 | 16 | 0.00 |
86 | PHASER_IN/PHASER_IN_PHY | 0 | 0 | 16 | 0.00 |
87 | IDELAYE2/IDELAYE2_FINEDELAY | 0 | 0 | 200 | 0.00 |
88 | ILOGIC | 0 | 0 | 200 | 0.00 |
89 | OLOGIC | 0 | 0 | 200 | 0.00 |
90 +-----+-----+-----+-----+
91
92
93 5. Clocking
94 -----
95
96 +-----+-----+-----+-----+
97 | Site Type | Used | Fixed | Available | Util% |
98 +-----+-----+-----+-----+
99 | BUFGCTRL | 0 | 0 | 32 | 0.00 |
100 | BUFIO | 0 | 0 | 16 | 0.00 |
101 | MMCME2_ADV | 0 | 0 | 4 | 0.00 |
102 | PLL2_ADV | 0 | 0 | 4 | 0.00 |
103 | BUFRCE | 0 | 0 | 8 | 0.00 |
104 | BUFHCE | 0 | 0 | 72 | 0.00 |
105 | BUFR | 0 | 0 | 16 | 0.00 |
106 +-----+-----+-----+-----+
107
108
109 6. Specific Feature
110 -----
111
112 +-----+-----+-----+-----+
113 | Site Type | Used | Fixed | Available | Util% |
114 +-----+-----+-----+-----+
115 | BSCANE2 | 0 | 0 | 4 | 0.00 |
116 | CAPTUREE2 | 0 | 0 | 1 | 0.00 |
117 | DNA_PORT | 0 | 0 | 1 | 0.00 |
118 | EFUSE_USR | 0 | 0 | 1 | 0.00 |
119 | FRAME_ECC2 | 0 | 0 | 1 | 0.00 |
120 | ICAP2 | 0 | 0 | 2 | 0.00 |
121 | STARTUPE2 | 0 | 0 | 1 | 0.00 |
122 | XADC | 0 | 0 | 1 | 0.00 |
123 +-----+-----+-----+-----+

```

```

126 7. Primitives
127 -----
128
129 +-----+
130 | Ref Name | Used | Functional Category |
131 +-----+
132 | RAMD64E | 1152 | Distributed Memory |
133 | LUT6    | 489  |          LUT |
134 | FDRE   | 230  |      Flop & Latch |
135 | LUT2    | 144  |          LUT |
136 | LUT4    | 80   |          LUT |
137 | LUT3    | 50   |          LUT |
138 | CARRY4 | 42   |      CarryLogic |
139 | LUT5    | 38   |          LUT |
140 | LUT1    | 35   |          LUT |
141 | MUXF7  | 32   |      MuxFx |
142 | IBUF   | 9    |          IO |
143 | OBUF   | 2    |          IO |
144 | DSP48E1 | 2    | Block Arithmetic |
145 | SRL16E | 1    | Distributed Memory |
146 +-----+
147
148
149 8. Black Boxes
150 -----
151
152 +-----+
153 | Ref Name | Used |
154 +-----+
155 | outputBuffer | 1 |
156 +-----+
157
158
159 9. Instantiated Netlists
160 -----
161
162 +-----+
163 | Ref Name | Used |
164 +-----+
165
166

```

## 6. Simulating the Design

Behavioral simulation employs a high level of abstraction to model the design. The testbench reads the data from the bitmap image and stores the processed pixels back in a bitmap image.

### a. Testbench

```
/*
22
23 `define headerSize 1080//image header
24 `define imageSize 512*512
25
26 module tb(
27 );
28
29
30 reg clk;
31 reg reset;
32 reg [7:0] imgData;
33 integer file,file1,i,r;
34 reg imgDataValid;
35 integer sentSize;
36 wire intr;
37 wire [7:0] outData;
38 wire outDataValid;
39 integer receivedData=0;
40
41 initial
42 begin
43   clk = 1'b0;
44   forever
45   begin
46     #5 clk = ~clk; //after every 5ns
47   end
48 end
49
50 initial
51 begin
52   reset = 0; //reset is active low
53   sentSize = 0;
54   imgDataValid = 0;
55   #100;
56   reset = 1; //reset is disabled after 100ns
57   #100;
58   file = $fopen("C:/Users/hp/Desktop/clown.bmp","rb"); //input image is open
59   //file1 = $fopen("C:/Users/hp/Desktop/blurred_lena.bmp","rb");
60   //file1 = $fopen("C:/Users/hp/Desktop/brightness_clown.bmp","rb");
61   file1 = $fopen("C:/Users/hp/Desktop/inverted_clown.bmp","wb"); //output image is open
62
63   for(i=0;i<`headerSize;i=i+1) //the header part of image is not sent for processing
64   begin
65     r = $fscanf(file,"%c",imgData); //1 byte is read from the input image
66     //$display(file);
67     $fwrite(file1,"%c",imgData); //1 byte is written into output image
68     //$display(file);
69   end
70
71   for(i=0;i<4*512;i=i+1) //only the pixels of input image are sent for processing
72   begin
73     @(posedge clk);
74     r = $fscanf(file,"%c",imgData);
75     imgDataValid <= 1'b1;
76   end
77   sentSize = 4*512; //4 line buffers are used
78   @(posedge clk);
79   imgDataValid <= 1'b0;
80   while(sentSize < `imageSize)
```

```

81 begin
82     @(posedge intr); //after sending 4 pixel rows, we wait for interrupt signal to indicate any of the line buffer is free
83     for(i=0;i<512;i=i+1)
84     begin
85         @(posedge clk);
86         r = $fscanf(file,"%c",imgData);
87         imgDataValid <= 1'b1;
88     end
89     @(posedge clk);
90     imgDataValid <= 1'b0; //after sending 1 pixel row to 1 line buffer
91     sentSize = sentSize+512;
92 end
93 @(posedge clk); //two dummy lines are sent in the end for neighborhood processing
94 imgDataValid <= 1'b0;
95 @(posedge intr);
96 for(i=0;i<512;i=i+1)
97 begin
98     @(posedge clk);
99     imgData <= 0; //first dummy line data
100    imgDataValid <= 1'b1;
101 end
102 @(posedge clk);
103 imgDataValid <= 1'b0;
104 @(posedge intr);
105 for(i=0;i<512;i=i+1)
106 begin
107     @(posedge clk);
108     imgData <= 0; //second dummy line data
109    imgDataValid <= 1'b1;
110 end
111 @(posedge clk);
112 imgDataValid <= 1'b0;
113 $fclose(file); //input file closed
114 end
115
116 always @(posedge clk) //this block determines the pixels sent (equal to image size)
117 begin
118     if(outDataValid)
119     begin
120         $fwrite(file1,"%c",outData);
121         receivedData = receivedData+1;
122     end
123     if(receivedData == `imageSize) //data is finished processing and is saved in output image
124     begin
125         $fclose(file1); //output file closed
126         $stop;
127     end
128 end
129
130 imageProcessTop dut(
131     .axi_clk(clk),
132     .axi_reset_n(reset),
133     //slave interface
134     .i_data_valid(imgDataValid),
135     .i_data(imgData),
136     .o_data_ready(),
137     //master interface
138     .o_data_valid(outDataValid),
139     .o_data(outData),
140     .i_data_ready(1'b1), //IP is always ready to accept data
141     //interrupt
142     .o_intr(intr)
143 );
144
145 endmodule
...

```

The image header in bmp image contains general data regarding the image like information about the type, size, and layout of a device-independent bitmap file. It is not altered and is copied to the output image as it is.

## b. Waveform

After running the behavioral simulation, press F3 or Run All option which runs the simulation until there are no more \$stop or \$finish command in the code. The simulation waveform is too large to attach here. So it is shown at a particular instant only.



Active low reset is used. The files are not open for reading and writing the pixel data until the reset goes high and positive edge of the clock arrives. As the pixels are incremented (integer i) the imgDataValid signal goes high which indicates that valid data is being read from the input file. The image header is at is copied to the output file so the outData is 00 because this pixel data is not sent to image processing IP. After that the pixel data is processed. Pixels are sent from the input file into the line buffers. From there they are sent to the image processing IP and the processed pixel data is stored in the output buffer which minimizes any sort of delays in receiving the pixel data. The interrupt signal goes high when data from one line buffer is read from the image processing IP and is processed. The interrupt signal goes high which signals that there is no valid data to be read now and the line buffer is free. So the next pixel row should be streamed to it.

## 7. Output Images



a. Input Image (Clown.bmp)



b. Box Blur (blurred\_clown.bmp)



c. Edge Detection  
(edgeDetection\_clown.bmp)



d. Sharpening (sharpen\_clown.bmp)



e. High Pass Filter (brightness\_  
clown.bmp)



f. Low Pass Filtering (lowpass\_  
clown.bmp)

## 8. References

- [https://www.youtube.com/watch?v=TcjqZG2pbHw&feature=youtu.be&ab\\_channel=VipinKizheppatt](https://www.youtube.com/watch?v=TcjqZG2pbHw&feature=youtu.be&ab_channel=VipinKizheppatt)
- [https://en.wikipedia.org/wiki/Sobel\\_operator](https://en.wikipedia.org/wiki/Sobel_operator)
- [https://en.wikipedia.org/wiki/Kernel\\_\(image\\_processing\)](https://en.wikipedia.org/wiki/Kernel_(image_processing))
- [https://www.xilinx.com/support/documentation/ip\\_documentation/axi\\_dma/v7\\_1/pg021\\_axi\\_dma.pdf](https://www.xilinx.com/support/documentation/ip_documentation/axi_dma/v7_1/pg021_axi_dma.pdf)