

# **Description & Implementation of the ACMI Component**

A Hardware Accelerator for the Vision Transformer Algorithm using MinHash to Lineage  
assignment by ENICS Lab  
Final Project - Computer Engineering, BIU

NOAM DIAMANT & ITAY GOLDBERG  
Academic Supervisor : Prof. Leonid Yavits  
Instructor : Itay Merlin

May 22, 2025



# Contents

|          |                                              |           |
|----------|----------------------------------------------|-----------|
| <b>1</b> | <b>Theoretical Background</b>                | <b>4</b>  |
| 1.1      | Minhash . . . . .                            | 4         |
| 1.2      | DNA Sequencing and Genome Analysis . . . . . | 5         |
| 1.3      | ViRAL Algorithm . . . . .                    | 5         |
| <b>2</b> | <b>Introduction</b>                          | <b>7</b>  |
| 2.1      | The purpose of this project . . . . .        | 7         |
| 2.2      | MinHash Accelerators . . . . .               | 7         |
| 2.3      | System Architecture . . . . .                | 9         |
| 2.3.1    | Kmer Buffer and Index Counter . . . . .      | 11        |
| 2.3.2    | Fragment Memory . . . . .                    | 11        |
| 2.3.3    | Sorter . . . . .                             | 11        |
| 2.3.4    | Extender . . . . .                           | 12        |
| 2.4      | Alternatives . . . . .                       | 12        |
| 2.4.1    | UShER . . . . .                              | 13        |
| 2.4.2    | Kraken2 . . . . .                            | 13        |
| 2.4.3    | Key Differences . . . . .                    | 13        |
| <b>3</b> | <b>ACMI's Components</b>                     | <b>14</b> |
| 3.1      | Fragment Memory . . . . .                    | 14        |
| 3.1.1    | Description . . . . .                        | 14        |
| 3.1.2    | Parameters . . . . .                         | 14        |
| 3.1.3    | Interface . . . . .                          | 14        |
| 3.1.4    | Block Diagram . . . . .                      | 15        |
| 3.1.5    | Implementation . . . . .                     | 16        |
| 3.1.6    | Codes & Tests . . . . .                      | 17        |
| 3.2      | Hasher . . . . .                             | 21        |
| 3.2.1    | Description . . . . .                        | 21        |
| 3.2.2    | Parameters . . . . .                         | 21        |
| 3.2.3    | Interface . . . . .                          | 21        |
| 3.2.4    | Block Diagram . . . . .                      | 22        |
| 3.2.5    | Implementation . . . . .                     | 22        |
| 3.2.6    | Codes & Tests . . . . .                      | 23        |
| 3.3      | Sorter . . . . .                             | 25        |
| 3.3.1    | Description . . . . .                        | 25        |
| 3.3.2    | Parameters . . . . .                         | 25        |
| 3.3.3    | Interface . . . . .                          | 25        |
| 3.3.4    | Block Diagram . . . . .                      | 26        |
| 3.3.5    | Implementation . . . . .                     | 26        |

|          |                                      |           |
|----------|--------------------------------------|-----------|
| 3.3.6    | Codes & Tests . . . . .              | 27        |
| 3.4      | Extender . . . . .                   | 31        |
| 3.4.1    | Description . . . . .                | 31        |
| 3.4.2    | Parameters . . . . .                 | 31        |
| 3.4.3    | Interface . . . . .                  | 31        |
| 3.4.4    | Block Diagram . . . . .              | 32        |
| 3.4.5    | Implementation . . . . .             | 32        |
| 3.4.6    | Codes & Tests . . . . .              | 32        |
| 3.5      | Index Counter . . . . .              | 36        |
| 3.5.1    | Description . . . . .                | 36        |
| 3.5.2    | Parameters . . . . .                 | 36        |
| 3.5.3    | Interface . . . . .                  | 36        |
| 3.5.4    | Block Diagram . . . . .              | 37        |
| 3.5.5    | Implementation . . . . .             | 37        |
| 3.5.6    | Codes & Tests . . . . .              | 38        |
| 3.6      | Kmer Buffer . . . . .                | 40        |
| 3.6.1    | Description . . . . .                | 40        |
| 3.6.2    | Parameters . . . . .                 | 40        |
| 3.6.3    | Interface . . . . .                  | 40        |
| 3.6.4    | Block Diagram . . . . .              | 41        |
| 3.6.5    | Implementation . . . . .             | 41        |
| 3.6.6    | Codes & Tests . . . . .              | 42        |
| <b>4</b> | <b>Integration</b>                   | <b>45</b> |
| 4.1      | Overview of Integration . . . . .    | 45        |
| 4.2      | Integration Challenges . . . . .     | 45        |
| 4.3      | codes & test . . . . .               | 46        |
| <b>5</b> | <b>Results</b>                       | <b>49</b> |
| 5.1      | Synthesis Process Overview . . . . . | 49        |
| 5.2      | Tools and Environment . . . . .      | 49        |
| 5.3      | Synthesis Constraints . . . . .      | 49        |
| 5.4      | Area Analysis . . . . .              | 49        |
| 5.4.1    | Library Breakdown . . . . .          | 50        |
| 5.4.2    | Cell Type Breakdown . . . . .        | 50        |
| 5.5      | Timing Analysis . . . . .            | 50        |
| 5.5.1    | Critical Path Analysis . . . . .     | 51        |
| 5.5.2    | Cell Types and Delays . . . . .      | 52        |
| 5.5.3    | Fanout Analysis . . . . .            | 53        |
| 5.5.4    | Conclusion . . . . .                 | 53        |
| 5.6      | Power Estimation . . . . .           | 53        |
| 5.6.1    | Component Power Breakdown . . . . .  | 53        |
| 5.6.2    | Power Distribution . . . . .         | 54        |
| 5.7      | Optimization Results . . . . .       | 54        |
| <b>6</b> | <b>Summary &amp; Discussion</b>      | <b>55</b> |
| 6.1      | Discussion of Results . . . . .      | 55        |
| 6.2      | Conclusion . . . . .                 | 56        |
| <b>7</b> | <b>Appendices</b>                    | <b>57</b> |

# Chapter 1

## Theoretical Background

### 1.1 Minhash

MinHash is a variant of locality-sensitive hashing designed to approximate the similarity between two datasets. The fundamental idea behind MinHash is that the probability that two sets have the same minimum hash value is equivalent to their Jaccard similarity. Jaccard similarity measures the proportion of shared elements between two sets equal to:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \quad (1.1)$$

We use the Bottom-k Minhash whose overview is presented in Fig. 1.1 In the genomic context, the datasets comprise sequences of kmers. The signatures  $A$  and  $B$  are generated by a single hash function,  $h$ , which ranks the kmers by assigning a numerical value to each kmer. The  $k$  minimal ranking values comprise the signature subsets  $S(A)$  and  $S(B)$ . The equation in Fig. 1.1 approximates Jaccard similarity between the two original kmer sequences.



Figure 1.1: Minhash Bottom-k overview

We leverage Bottom-k to convert low-coverage genomes into sequences of fixed-size fragments. Genomic samples that come from similar genomes are likely to have overlapping hash values, so that Bottom-k is instrumental in capturing their similarity. Traditional use of minHash with bottom-k allows sampling of a small amount of information from the entire set in a way that will enable direct calculation of the Jaccard similarity. When we deal with biological sequences we are exposed to a particularly large amount of information, therefore a direct calculation of Jaccard similarity will be inefficient in terms of

memory and the duration of the operation, hence we will use this project in a more advanced solution using a neural network.

## 1.2 DNA Sequencing and Genome Analysis

DNA is composed of four nucleotides: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T), which are frequently referred to as DNA base pairs, bases, or bps. Accordingly, a DNA data element is a DNA base that can have one of four values (A, G, C, and T). DNA sequencing is the process of determining the bases of a DNA chain in a given biological sample. Contemporary high-throughput DNA sequencers can sequence multiple biological samples in parallel. A DNA sequencing process, along with genome analysis, is carried out in several steps: (1) sample preparation; (2) DNA sequencing that generates multiple DNA reads (i.e., sub-sequences of the DNA sample); and (3) DNA classification, DNA read alignment, genome assembly, variant analysis, etc. While variant refers to a single change in a genome (i.e., a single base change mutation), lineage is a collection of variants that help define a specific line of an organism.

## 1.3 ViRAL Algorithm

ViRAL is the platform for a quick and accurate identification, classification, and lineage assignment of SARS-CoV-2 variants. It accepts an assembled genome and outputs the ordered list of most probable lineages such genome belongs to. If the top probabilities output by ViRAL are similar, this might indicate that the new genome belongs to an unknown lineage, creating a new node on the phylogenetic tree.

The top-level overview of ViRAL algorithm is presented in Figure 1.2. The input to ViRAL is an assembled genome in FASTA file format. To convert the genome into a sequence of feature vectors, we build a pipeline of preprocessing steps, comprising the feature extractor and the embedding layer. The feature extractor chooses a set of fragments (i.e., genome subsequences of fixed length), represents each fragment as a two-dimensional matrix, and outputs those genome fragment matrices (GFMs). Each GFM is fed into an embedding layer that consists of one neuron. This layer converts the GFM into a genome fragment vector, designated the feature vector. The feature vectors are fed into the ViT, which outputs the most likely assignment candidates (by attaching probabilities to each lineage).



Figure 1.2: A general overview of the ViRAL algorithm.

The feature extractor and the embedding layer transform a genome into a sequence of representative feature vectors, which are the numerical representations of genome fragments. We implement the fea-

ture extractor using MinHash to preserve similarities in the genome. It allows the feature extractor to find features in the query genome that are shared by the known lineages. The embedding layer is used as a dimensionality reduction step. It allows for the reduction of ViRAL latency without affecting its assignment accuracy.

# **Chapter 2**

## **Introduction**

In this chapter, an extensive literature review was conducted in order to cover all possible subjects that affect the goals of this project. Understanding how the different components and architectures that are in use today and the modern solutions to the problem this work is trying to solve are important to better define the objectives of this research.

### **2.1 The purpose of this project**

The article introduces a neural network-based algorithm designed to enhance the accuracy of lineage assignment. It suggests that creating dedicated hardware can further accelerate the calculations. While software-only (CPU-based) and GPU-accelerated solutions perform relatively well, they still face limitations. The preprocessing stage is responsible for 95% of ViRAL's execution time, and it is a prime target for acceleration due to conventional computers' lack of optimization for Locality-Sensitive Hashing operations. Within MinHash, components like signature sorting exhibit latency that scales as  $n \log(n)$ , where  $n$  represents genome size, further compounding the issue as genomes grow larger. Consequently, to effectively scale and accelerate ViRAL, focusing on MinHash optimization is crucial. In this project, we implemented the MinHash accelerator component responsible for data preparation before activating the neural network using the SystemVerilog language. This preprocessing component ensures that the data is correctly formatted and structured for the neural network, ultimately contributing to the system's overall performance improvements.

In the following pages, we will present key parts of the article, the application of the various sub-components, and the results of the synthesis of the component as a whole. All of these are a confirmation of the functionality of the hardware accelerator proposed in the article.

### **2.2 MinHash Accelerators**

Several MinHash acceleration solutions have been proposed. MetaCache GPU uses MinHash for metagenomic classification and implements it on GPU. MSIM is a near-memory MinHash accelerator with a limited energy consumption reduction (26.4x vs. high-performance GPU), targeted for high-performance applications. In MinHash is used for kmer clustering. An FPGA-based solution is limited to parallel calculation of 15 hash functions (whereas ViRAL calculates in parallel at least 256 hash functions). JACK-FPGA uses a cloud FPGA to accelerate MinHash. Scotch is an FPGA-based locality-sensitive hashing accelerator with a limited energy consumption gain (5x over high-performance GPU).

The purpose of MinHash acceleration in ViRAL framework is to balance the MinHash and Vision Transformer calculations while providing the highest energy efficiency, preferably sufficient for portable applications.

## The MinHash Scheme

A modified MinHash is the second processing component of ViRAL, whose role is extracting the informative feature vectors from the genome. MinHash, or the min-wise independent permutations scheme is a technique for similarity estimation. MinHash is used in many tasks in computational biology, including genome assembly, metagenomic gene clustering, and genomic distance estimation. MinHash implements the following sketch function:

Given a set of characters  $A$ , a compression factor  $n$ , and a hash function  $h$ , the elements of the set  $A$  are hashed using function  $h$  to generate the set  $H(A)$ . Then the elements of  $H(A)$  are sorted and the inputs to the smallest  $n$  elements are returned as described in Algorithm 1.

---

**Algorithm 1:** The Sketch Algorithm  $\text{sketch}(A)$ 


---

**Require:** set  $A = \{a_1, \dots, a_{|A|}\}$ , compression parameter  $n$  and a hash function  $h$   
 $\text{sketch} \leftarrow \emptyset$   
 $H(A) \leftarrow (h(a_1), \dots, h(a_{|A|}))$   
Sort  $H(A)$  to get  $(h(a_{i1}), h(a_{i2}), \dots, h(a_{i|A|}))$   
**return**  $(a_{i1}, a_{i2}, \dots, a_{in})$

---

The subset obtained by applying the sketch algorithm provides a good estimate for the Jaccard index, defined as follows: Given two sets  $A, B$ , the Jaccard index is

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

Formally, given two sets,  $A$  and  $B$ ,

$$\frac{|\text{sketch}(A) \cap \text{sketch}(B) \cap \text{sketch}(A \cup B)|}{|\text{sketch}(A \cup B)|} \approx J(A, B)$$

We use the sketch algorithm to extract feature vectors from a genome, as presented further.

## Feature Extractor

The first part of the ViRAL pipeline receives the assembled genome and outputs matrices that represent genome fragments. Feature extractor employs MinHash to find similar fragments (i.e., features) in different genomes.

---

**Algorithm 2:** The Feature Extractor

---

**Require:** genome  $g$ , compression parameter  $n$ , fragment size  $f$ , kmer size  $k$ , hash function  $h$   
 $G \leftarrow \{gi \equiv g[i : i + k]\} \text{ s.t. } i \in [0, n - k + 1]$   
 $\text{kmers} \leftarrow \text{sketch}(G, n, h)$   
 $\text{left} \leftarrow \text{floor}\left(\frac{f-k}{2}\right)$   
 $\text{right} \leftarrow \text{ceil}\left(\frac{f-k}{2}\right)$   
 $\text{fragments} \leftarrow \{g[i - \text{left}, i + \text{right}] : gi \in \text{kmers}\}$   
**return** one-hot-encode ( $\text{fragments}$ )

---

One hot encoding of a fragment refers to the operation of transforming each base in the fragment as follows:  $A \rightarrow [1, 0, 0, 0]; C \rightarrow [0, 1, 0, 0]; G \rightarrow [0, 0, 1, 0]; T \rightarrow [0, 0, 0, 1]; N \rightarrow [0, 0, 0, 0]$ .

For the example please refer to Figure 2.1(b).

An example is illustrated in Figure 2.1(a). Input is a genome sequence  $g$  of length  $N = 19$ . We generate all possible kmers (genome sub-sequences of length  $k$ )  $G$  (where  $k = 4$ ), and extract ( $n = 3$ )

4mers from the genome using the MinHash scheme (i.e.,  $\text{sketch}(G, n = 3, h)$ ). The next step is to generate fragments of length  $f = 8$  (by extending each of three 4mers by  $\frac{f-k}{2} = \frac{8-4}{2} = 2$  bases in each direction). The last step of this workflow is to encode each basepair of the fragments using one-hot encoding.

For the genome sequence  $g$  of length  $N$ , the extractor first generates a set  $G$  of all possible kmers. It then sketches (i.e., applies MinHash sketch function to) the set of kmers,  $G$ , to extract  $n$  representative kmers of the genome (Algorithm 1). Those kmers are used as anchors to be expanded to generate fragments. The last part is transforming a fragment into a numerical matrix where the genome basepairs  $A, G, C$ , and  $T$  are encoded using one-hot encoding as described in Algorithm 2.

## Embedding Layer

The input of the embedding layer is a genome fragment matrix of dimensions  $f \times 4$  (i.e.,  $f$  represents the fragment length and 4 is the dimensionality of the basepair one-hot encoding). It outputs a feature vector of size  $f$ . The embedding layer consists of one neuron which performs a base-wise linear transformation. Specifically, the embedding layer defines 5 learnable parameters, 4 weights  $w_A, w_C, w_G, w_T$ , and one bias term  $b_N$ . Given a genome fragment matrix, we transform each base (represented in one-hot encoding) to a numerical token as presented in Figure 2.1(b), which also contains an example in which we embed a fragmented matrix of dimensions  $8 \times 4$ .



Figure 2.1: (a): Feature extractor workflow example. (b): Embedding a fragment matrix of dimensions  $8 \times 4$ .

## 2.3 System Architecture

Figure 2.2 illustrates the system view of the ViRAL platform. DNA samples are prepared and sequenced. The assembled genome material is fed into ViRAL, whose main components are MinHash and ViT accelerators, implemented by ACMI, a standalone ACcelerated MInHash ASIC, and Nvidia GPU, respectively.



Figure 2.2: System view of the accelerated ViRAL platform featuring MinHash accelerator ACMI, combined with NVidia GPU.

ACMI is a throughput-oriented streaming platform. ACMI (1) inputs a genome  $i + 1$ , (2) processes both the previously received genome  $i$  and the genome  $i + 1$ , and (3) outputs the genome  $i$  related results, in parallel. ACMI inputs the accession genome one DNA base per cycle (on average). In the case of SARS-CoV-2 virus whose size is approximately 30K DNA bases, it takes approximately 30K cycles to upload a single genome. The result of ACMI is a Genome Fragment Matrix (GFM). Both the compression factor and the fragment\_size are 256, hence the GFM is a  $256 \times 256$  matrix of 2-bits elements (each element is a base). To balance the pipeline, ACMI is designed to output 4bits/cycle, thus taking approximately  $256 \times 256 \times 2/4 = 32K$  cycles to output the GFMs.

Since ACMI operating at 113 MHz requires I/O throughput of less than 1.13 GB/s to achieve its optimal performance, data connectivity can be implemented by PCIe 3.0 or higher.

The ACMI architecture is presented in Figure 2.3. ACMI comprises the Fragment Memory (FM), the kmer buffer, the kmer index counter, Hasher, Sorter, and Extender units. Hasher performs hashing of the input kmer sets using MurmurHash3, which is a non-cryptographic hash function suitable for general hash-based lookup. Sorter sorts the hashed kmers (kmer signatures) and outputs 256 smallest signatures. Extender extends the signatures into larger DNA fragments and generates the GFMs.

While Hasher and Sorter process kmers in a pipelined fashion, "on the fly", the Extender requires access to the entire genome. Therefore, the Fragment Memory (FM) unit is organized as a double buffer, comprising buffers FM1 and FM2, operating in two phases: in phase 1, FM1 receives genome  $i$ , which is subsequently hashed and sorted. At the end of the phase, FM1 and FM2 are logically swapped. In phase 2, FM2 receives genome  $i + 1$ , which is being processed by the Hasher and Sorter. At the same time, FM1 serves the Extender, which generates the GFMs of the genome  $i$ . At the end of the phase, the buffers are swapped again, and so on. ACMI pipeline timing is shown in Figure 2.4.



Figure 2.3: The top-level architecture.



Figure 2.4: ACMI pipeline timing.

### 2.3.1 Kmer Buffer and Index Counter

The kmer buffer converts the DNA base stream received by ACMI into the stream of kmers (16mers). The index counter keeps track of the kmer index (the position of the kmer on the genome sequence).

### 2.3.2 Fragment Memory

The Fragment Memory buffers enable parallel write and read. The write operation occurs at a single-base granularity (each base is 2 bits). The read occurs in chunks of 256 bases. As depicted in Figure 2.5, each FM buffer comprises eight  $128 \times 32$  bases SRAM modules and can store up to 8KB of genomic information. FM address is composed of the row pointer (7 MS-bits), the SRAM module pointer (the following 3 bits), and the byte offset (the byte address within the row, 5 LS-bits). Since the read data granularity is 256 bases, only the 7 MS-bits of the address are used in read access.



Figure 2.5: A fragment memory buffer comprising 8  $128 \times 32$  bases SRAM modules.

### 2.3.3 Sorter

The Sorter maintains a list of compression factor = 256 lexicographically smallest kmer signatures. It receives the signatures from the Hasher, along with their indices. After all kmers of the genome are hashed and sorted, Sorter transfers the entire list of 256 indices of the smallest signatures to the Extender (in parallel). At this point, the FM buffers are swapped, and the Extender begins processing the index

data of the recently hashed and sorted genome, while a new genome is being written to the other FM buffer, and hashed and sorted by ACMI in a pipeline fashion.

The hashed kmers (signatures) are transferred sequentially from the Hasher to the Sorter. In each cycle, each comparator in the chain receives from its left neighbor a tuple comprising a kmer signature and its index, and compares such signature with the minimal signature value the comparator stores. If the new signature is smaller (lexicographically) than the stored one, the comparator saves the new tuple and outputs the old tuple, otherwise, it outputs the new tuple. After all, signatures are processed, the comparator chain retains the 256 smallest signatures and their indices (which are afterwards transferred to the Extender).

### 2.3.4 Extender

Extender converts each of the 256 kmers with the smallest signatures into a 256-base wide DNA fragment, by extending such kmer left and right. To accomplish that, Extender reads the DNA fragments directly from one of the FM buffers, using the index it receives from the Sorter to calculate the FM row pointer.

1. Calculate the fragment position on the genome sequence  $\text{frag\_idx} = \left\lfloor \frac{(kmer\_idx - \lfloor \frac{\text{frag\_len} - \text{kmer\_len}}{2} \rfloor)}{\text{frag\_len}} \right\rfloor$ .
2. Read the DNA fragment pointed to by  $\text{frag\_idx}$ .
3. Read the DNA fragment from the next FM buffer row, addressed by  $\text{frag\_idx} + 1$ .
4. Render the extended DNA fragment by concatenating the relevant parts of those two consecutive DNA fragments.

The 256 extended DNA fragments comprise the GFMs that become the output of ACMI.

An example of extension is shown in Figure 2.6. Suppose that the kmer length is 4 (a 4-mer), the fragment length is 8, and the size of the genome is 32. The Extender has to extend the 4-mer with index 9 (i.e., the 4-mer TAAG marked in red in Figure 2.6). In this case,

$$\text{frag\_idx} = \left\lfloor \frac{(kmer\_idx - \lfloor \frac{\text{frag\_len} - \text{kmer\_len}}{2} \rfloor)}{\text{frag\_len}} \right\rfloor = \left\lfloor \frac{(9 - \frac{8-4}{2})}{8} \right\rfloor = \left\lfloor \frac{7}{8} \right\rfloor = 0.$$

Hence the Extender will read from FM buffer fragments 0 and 1. It will then concatenate the relevant bases using shift operations, render a single 8-base wide DNA fragment, encode the bases using one-hot encoding, and append the encoded extended DNA fragment to the GFMs.



Figure 2.6: Example of extension. The fragment length is 8, the kmer size is 4. Each line represents an aligned fragment of the genome. The bases in pink are the bases of the 4mer at position 9. The extended fragment includes two bases (marked in green) to the left and to the right of the original 4mer.

## 2.4 Alternatives

The article presents two alternatives that deal with the same problem. We will conduct a brief review and discuss the differences.

### **2.4.1 UShER**

UShER (Ultrafast Sample placement on Existing tRees) is a state-of-the-art tool for SARS-CoV-2 lineage assignment. UShER uses a phylogenetic tree-based approach:

1. It maintains a phylogenetic tree of SARS-CoV-2 genomes.
2. When classifying a new genome, it identifies the optimal placement of the sample on the existing tree.
3. It uses an efficient data structure and algorithm to quickly traverse the tree and find the best placement.

According to the article, UShER performs well at higher coverage levels, outperforming Kraken2 for coverage levels of 4 $\times$  and above. However, its performance degrades significantly at lower coverage levels. For example, when applied to genomes assembled using Illumina HiSeq 2500 single-end reads, UShER achieves 87.6% accuracy at 8 $\times$  coverage, but its accuracy drops dramatically at lower coverages. The study uses UShER as a "golden reference" for evaluating ViTAL's performance on novel lineages, indicating its reliability for phylogenetic placement of SARS-CoV-2 genomes.

### **2.4.2 Kraken2**

Kraken2 is another tool used for genome classification and lineage assignment. The article shows that Kraken2 performs better than UShER at very low coverage levels (below 4 $\times$ ), but still falls short of ViTAL's performance. For instance, when classifying genomes assembled from Illumina HiSeq 2500 single-end reads at 1 $\times$  coverage, Kraken2 achieves an accuracy of 27.4%, which is significantly higher than UShER's 5.4% but much lower than ViTAL's 87.7%. The study also mentions that Kraken2, like other k-mer matching tools, faces limitations in classifying erroneous DNA reads and is not very efficient in SARS-CoV-2 lineage placement due to the limited differences between genomes of separate lineages.

Kraken2 uses a k-mer matching approach:

1. It builds a database of k-mers (short DNA sequences) associated with different taxonomic groups.
2. When classifying a genome, it breaks it into k-mers and looks up each k-mer in the database.
3. It uses a lowest common ancestor (LCA) algorithm to assign taxonomic labels based on the k-mer matches.

This method is fast and efficient for general classification tasks but can struggle with closely related genomes or erroneous reads.

### **2.4.3 Key Differences**

The key difference is that ViTAL's deep learning approach allows it to learn complex patterns from the data, making it more robust to low coverage and sequencing errors. UShER's tree-based method is well-suited for detailed phylogenetic analysis but may require higher quality data. Kraken2's k-mer matching is fast but less flexible in handling errors or closely related genomes. These differences explain why ViTAL outperforms the others, especially in low-coverage scenarios.

# Chapter 3

## ACMI's Components

### 3.1 Fragment Memory

#### 3.1.1 Description

The FM module is structured as a dual buffer system with multiple RAM blocks. Its structure is defined by key parameters from `proj_pkg`. The module is clocked, handling write and read operations for storing and retrieving data fragments, specifically for handling genetic data with a base length of 2 bits, representing the four DNA nucleotides (A, T, G, C).

#### 3.1.2 Parameters

| Inner Parameter   | Package Parameter         | Value | Description                    |
|-------------------|---------------------------|-------|--------------------------------|
| BUFFER_COUNT      | FM_BUFFER_COUNT           | 2     | Number of buffers              |
| RAMS              | FM_RAMs_COUNT             | 8     | Number of RAMs per buffer      |
| ENTRIES           | FM_ENTRIES_COUNT          | 128   | Entries per RAM                |
| OFFSET            | FM_OFFSET_COUNT           | 32    | Size of offset in each entry   |
| DATA_BITS         | FM_DATA_BITS              | 2     | Width of each memory cell      |
| BUFFER_SIZE       | FM_BUFFER_SIZE            | 32768 | Total buffer size              |
| INDICE_LEN        | INDICE_LEN                | 15    | Length of an index             |
| SIGNED_INDICE_LEN | SIGNED_INDICE_LEN         | 16    | Length of a signed index       |
| FRAG_LEN          | FM_EXTENDER_FRAG_LEN_BITS | 512   | Length of the fragment in bits |

#### 3.1.3 Interface

| Signal    | Direction | Width             |
|-----------|-----------|-------------------|
| clk       | Input     | 1                 |
| rst_n     | Input     | 1                 |
| in_wdata  | Input     | DATA_BITS         |
| chg_idx   | Input     | 1                 |
| frag_idx  | Input     | SIGNED_INDICE_LEN |
| out_rdata | Output    | FRAG_LEN          |
| out_wait  | Output    | 1                 |

#### Ports Description

##### Input Ports:

- `clk`: Clock signal

- `rst_n`: Active-low reset signal
- `in_wdata`: Input data to be written
- `chg_idx`: Signal to change active buffer index
- `frag_idx`: Fragment index for reading

#### Output Ports:

- `out_rdata`: Output data read from buffer
- `out_wait`: Wait signal for the module

#### 3.1.4 Block Diagram



### **3.1.5 Implementation**

The `proj_fm` module is designed to manage a dual-buffer memory system with each buffer consisting of multiple RAMs. The implementation follows a clocked, sequential logic that handles read and write operations on fragments of data. The primary goal is to enable efficient storage and retrieval, particularly for scenarios requiring fast memory switching and processing of indexed data, such as genetic or digital signal processing tasks.

#### **1. Buffer Management:**

The module supports two buffers (FM0 and FM1), where data can be written to one buffer while the other buffer is being read. The buffer index can be swapped using the `chg_idx` signal, which switches between read and write operations across the buffers. This enables non-blocking read and write access to the data, essential in high-speed memory applications.

#### **2. Address Control Logic:**

The address control logic computes the write address (`waddr`) and read address (`raddr`) dynamically. When the `chg_idx` signal is triggered, the buffer index changes, and the module uses a combination of parameters such as RAMS, ENTRIES, and OFFSET to calculate the memory location where data will be written or read. The calculation ensures that the write and read operations do not overlap unless intended, providing robustness in memory management.

The address logic also handles the buffer size, incrementing or resetting the address counters based on the current buffer position and buffer limits. This is critical in maintaining data integrity and ensuring that data is correctly mapped to its intended location within the buffer.

#### **3. Fragment Preparation Logic:**

The fragment preparation logic is responsible for constructing the output data fragment by reading from the appropriate buffer, which can consist of up to 8KB of data at a time. The module uses a combination of the index (`frag_idx`) and zero-padding to manage memory fragments efficiently. The logic handles both signed and unsigned indices, ensuring that negative index values are correctly processed via two's complement conversion (`flip_frag_idx`).

#### **4. Write and Read Operations:**

Write operations are governed by the `we` (write enable) signal, which is only asserted when the `hold_cond` signal is not active. The `hold_cond` ensures that writing does not occur when the buffer is in a locked or wait state (`out_wait`). This mechanism prevents data overwriting or corruption when switching between the two buffers.

Similarly, read operations occur concurrently with write operations, with data being read from the fragment buffer that is not currently being written to. The output data (`out_rdata`) is prepared based on the fragment and is padded with zeros if necessary to maintain consistency across varying fragment lengths.

In this implementation, the efficient use of dual buffers allows simultaneous read and write operations, reducing memory access latency and improving throughput in high-speed systems. The fragment preparation logic ensures that the system can handle varying fragment sizes and indexed data efficiently.

### 3.1.6 Codes & Tests

The example shown below is an example with reduced parameters (not the actual ACMI parameters), for demonstration purposes. At the time point described in the wave diagram, the buffer is already full, and we read from register #1. We begin reading with index 00, which yields the value D555. For index 10h, the value retrieved is FE8A. The next index obtained is 3Eh, this is a negative index and this value needs to be padded with two zeros. This means that instead of the value:

D555 = 1101 0101 0101 0101

we will get:

`5550 = 0101 0101 0101 0000`

This padding is necessary to align the data correctly and ensure proper fragment extraction.



Figure 3.1: fragment memory waveform for test bench file

The following is the code we wrote that describes the hardware component.

```
proj_fm.sv

1 `timescale 1ns / 1ps
2
3 module proj_fm #(
4     // Module parameters
5     parameter BUFFER_COUNT = proj_pkg::FM_BUFFER_COUNT,
6         ↪ // Number of buffers in the FM
7     parameter RAMS = proj_pkg::FM_RAMs_COUNT,
8         ↪ // Number of RAMs in each buffer
9     parameter ENTRIES = proj_pkg::FM_ENTRIES_COUNT,
10        ↪ // Number of entries in each RAM
11     parameter OFFSET = proj_pkg::FM_OFFSET_COUNT,
12         ↪ // Size of the offset in each entry
13     parameter DATA_BITS = proj_pkg::FM_DATA_BITS,           //
14         ↪ Width of each memory cell
15     parameter INDICE_LEN = proj_pkg::INDICE_LEN,           //
16         ↪ Length of the index
17     parameter SIGNED_INDICE_LEN = proj_pkg::SIGNED_INDICE_LEN,
18         ↪ // Length of signed index
19     parameter FRAG_LEN = proj_pkg::FM_EXTENDER_FRAG_LEN_BITS
20         ↪ // Length of the fragment - in bits!
21 ) (
22
23 )
```

```

14 // Module ports
15 input wire clk, // Clock signal
16 input wire rst_n, // Reset signal
17     ↪ (active low)
18 input wire [DATA_BITS-1:0] in_wdata, // Input data
19 input wire chg_idx, // Change
20     ↪ index signal
21 input logic [SIGNED_INDICE_LEN-1:0] frag_idx, // Fragment
22     ↪ index
23 output wire [FRAG_LEN-1:0] out_rdata, // Output
24     ↪ data
25 output wire out_wait // wait signal
26     ↪ for the module before the ACMI
27 );
28
29 // Local parameters
30 localparam FM_BUFFER_SIZE = 32; // RAMS * ENTRIES *
31     ↪ OFFSET = 2 * 8 * 2
32 localparam RAM_ADDR_BITS = 1; // $clog2(RAMS) =
33     ↪ $clog2(2)
34 localparam ENTRIES_ADDR_BITS = 3; // $clog2(ENTRIES) =
35     ↪ $clog2(8)
36 localparam OFFSET_ADDR_BITS = 1; // $clog2(OFFSET) =
37     ↪ $clog2(2)
38 localparam ADDR_BITS = 5; // RAM_ADDR_BITS +
39     ↪ ENTRIES_ADDR_BITS + OFFSET_ADDR_BITS
40
41 // Internal signals
42 logic clk, rst_n;
43 logic [ADDR_BITS-1:0] waddr, waddr_next;
44 logic [DATA_BITS-1:0] wdata;
45 logic
46     ↪ [BUFFER_COUNT-1:0] [FM_BUFFER_SIZE-1:0] [DATA_BITS-1:0]
47     ↪ FMbuffers;
48 logic end_addr, hold_cond, rst_addr;
49 logic wr_idx, rd_idx;
50 logic [FRAG_LEN-1:0] padded_fragment;
51 logic [INDICE_LEN-1:0] raddr, zeros_count;
52 logic [SIGNED_INDICE_LEN-1:0] flip_frag_idx;
53 logic we;
54
55 // Input assignments
56 assign wdata = hold_cond ? FMbuffers[wr_idx][waddr] :
57     ↪ in_wdata;
58
59 // Output assignment
60 assign out_rdata = padded_fragment;
61
62 assign out_wait = hold_cond;
63
64 // Address and control logic
65 assign waddr_next = end_addr ? 1'b0 : waddr + 1'b1;
66 assign end_addr = (waddr == (FM_BUFFER_SIZE-1)) ? 1'b1 :
67     ↪ 1'b0;

```

```

54     assign hold_cond = end_addr & ~chg_idx;
55     assign rst_addr = end_addr & chg_idx;
56     assign rd_idx = ~wr_idx;
57     assign we = ~hold_cond || end_addr;
58
59 // Fragment preparation logic
60 always_comb begin
61     padded_fragment = '0;
62     zeros_count = '0;
63     flip_frag_idx = '0;
64
65     if (frag_idx[SIGNED_INDICE_LEN-1] == 1'b1) begin
66         flip_frag_idx = (~frag_idx + 1'b1);
67         zeros_count = flip_frag_idx[INDICE_LEN-1:0];
68         raddr = '0;
69     end else begin
70         raddr = frag_idx[INDICE_LEN-1:0];
71     end
72 end
73
74 genvar i;
75 generate
76     for (i = 0; i < 16; i++) begin : gen_padded_fragment
77         // FRAG_LEN is 16
78         always_comb begin
79             if (i < (16 - zeros_count) && (raddr + i >= 0)
80                 && (raddr + i < 32)) begin // 
81                 // FM_BUFFER_SIZE is 32
82                 padded_fragment[i + (zeros_count << 1)] =
83                     FMbuffers[rd_idx][raddr + (i >>
84                         1)][i[0]];
85             end else begin
86                 padded_fragment[i + (zeros_count << 1)] =
87                     1'b0;
88             end
89         end
90     end
91 endgenerate
92
93 // Sequential logic
94 always_ff @(posedge clk or negedge rst_n) begin
95     if (~rst_n) begin
96         wr_idx <= '0;
97         waddr <= '0;
98     end else begin
99         // Change index logic
100        if (chg_idx) begin
101            wr_idx <= rd_idx;
102        end
103
104        // Write address logic
105        if (hold_cond) begin
106            waddr <= waddr;
107        end else if (rst_addr) begin

```

```
102         waddr <= '0;  
103     end else begin  
104         waddr <= waddr_next;  
105     end  
106  
107     // Write data to buffers  
108     if (we) begin  
109         FMbuffers[wr_idx][waddr] <= wdata;  
110     end  
111 end  
112  
113  
114 endmodule
```

## 3.2 Hasher

### 3.2.1 Description

Hasher is a hardware component that implements a single algorithm. According to the definitions of the article, the implementation hash function is Murmur3.

MurmurHash, a non-cryptographic hash function, is distinguished by its expeditious and efficacious computation of hash values, which are instrumental for general lookup operations. This algorithm is characterized by its operational simplicity and rapid data processing capabilities, rendering it particularly suitable for hash-based structures such as hash tables, caches, and checksums. The nomenclature 'MurmurHash' is derived from the algorithm's fundamental operations—'multiply' and 'rotate'—which are pivotal to its functionality. The most recent iteration, MurmurHash3, introduces enhancements that include the provision of both 32-bit and 128-bit hash values, along with optimizations for various computing platforms, thereby affirming its adaptability and extensive applicability in various domains.

### 3.2.2 Parameters

| Inner Parameter  | Package Parameter       | Value        | Description               |
|------------------|-------------------------|--------------|---------------------------|
| KMER_LEN         | KMER_LEN                | 16           | Length of k-mer in bases  |
| DATA_BITS        | BASE_LEN                | 2            | Bits to represent a base  |
| HASHER_DATA_BITS | HASHER_SORTER_SIGNATURE | 32           | Width of hasher signature |
| c1               | -                       | 32'hcc9e2d51 | Constant                  |
| c2               | -                       | 32'h1b873593 | Constant                  |
| m                | -                       | 32'd5        | Constant                  |
| n                | -                       | 32'he6546b64 | Constant                  |

### 3.2.3 Interface

| Interface   | Signal    | Direction | Width            |
|-------------|-----------|-----------|------------------|
| System      | clk       | Input     | 1                |
| System      | rst_n     | Input     | 1                |
| System      | seed      | Input     | HASHER_DATA_BITS |
| Kmer Buffer | kmer      | Input     | HASHER_DATA_BITS |
| Sorter      | signature | Output    | HASHER_DATA_BITS |

### Ports Description

#### Input Ports:

- clk: Clock signal (unused in this implementation)
- rst\_n: Active-low reset signal (unused in this implementation)
- seed: Initial hash value or key (32 bits)
- kmer: Input data to be hashed (32 bits)

#### Output Port:

- signature: Computed hash value (32 bits)

### 3.2.4 Block Diagram



### 3.2.5 Implementation

In each clock cycle, the hasher receives a single k-mer value represented by 4 bytes. Additionally, it maintains a fixed SEED value, also encoded as 4 bytes. Guided by the fixed constants, the hasher performs multiplications and rotations on the input k-mer. These operations transform the k-mer into a hash value. Finally, an approver issues a single signature based on the resulting hash value.

#### The structure of the component

The Hasher module is structured as a combinational logic block that performs the core operations of the MurmurHash3 algorithm: multiplication and rotation. It uses parameters from the `proj_pkg` package to define its input and output widths, making it flexible and configurable within the larger system.

#### Subcomponents

The main subcomponents are:

- Multiplication logic
- Rotation logic
- XOR gate
- Addition logic

#### Computational Logic

The Hasher module implements the MurmurHash3 algorithm through the following steps:

1. Kmer Processing:
  - Multiplies the for the input kmer with c1
  - Rotates the the former result by 15 bits
  - Multiplies the for the input kmer with c1
2. Seed Processing:
  - XORs the former result with the seed
3. Hash Computation:
  - Rotates the the former result by 15 bits
  - Multiplies the for the input kmer with m

- Adds n to the former result

The module uses predefined constants ( $c_1, c_2, m, n$ ) that are integral to the MurmurHash3 algorithm, ensuring the generation of high-quality hash values.

### 3.2.6 Codes & Tests

We will demonstrate the functionality of the component by inserting a signature value. 0xAB1020C5 will be the value of the tested signature. The mathematical operation to be performed appears in the following formulas:

$$\begin{aligned}
 k &= c_2 \cdot \text{Roll}_{15}(c_1 \cdot 0xAB1020C5) \\
 &= c_2 \cdot \text{Roll}_{15}(0x2B72FF55) \\
 &= c_2 \cdot 0x7FAA95B9 \\
 &= 0x1782463B
 \end{aligned} \tag{3.1}$$

$$\begin{aligned}
 \text{signature} &= m \cdot \text{Roll}_{13}(0xAC718ADD \oplus k) + n \\
 &= m \cdot \text{Roll}_{13}(0xBBF3CCE6) + n \\
 &= m \cdot 0x799CD77E + n \\
 &= 0x4664A0DA
 \end{aligned} \tag{3.2}$$

Let's look at the wave diagram that describes the operation of the component for preset values. Note that we get the desired value for the signature we tested.



Figure 3.2: Hasher waveform for test bench file

The following is the code we wrote that describes the hardware component.

```

proj_hasher.sv
1 `timescale 1ns / 1ps
2
3 module proj_hasher
4 #
5   parameter KMER_LEN = proj_pkg::KMER_LEN,
6   parameter DATA_BITS = proj_pkg::BASE_LEN,
7   parameter HASHER_DATA_BITS =
8     ↪ proj_pkg::HASHER_SORTER_SIGNATURE
9   )
10  input wire                                     clk,      // Clock signal
11  input wire                                     rst_n,    // Reset signal
12  input wire [HASHER_DATA_BITS-1:0] seed,

```

```

12     input wire [HASHER_DATA_BITS-1:0] kmer,
13     output logic [HASHER_DATA_BITS-1:0] signature
14 );
15
16 // Intermediate signals
17 logic [HASHER_DATA_BITS:0]           k, key;
18
19 localparam [HASHER_DATA_BITS-1:0]    c1 = 32'hcc9e2d51;
20 localparam [HASHER_DATA_BITS-1:0]    c2 = 32'h1b873593;
21 localparam [HASHER_DATA_BITS-1:0]    m = 32'd5;
22 localparam [HASHER_DATA_BITS-1:0]    n = 32'he6546b64;
23
24 // Hash computation
25 always_comb begin
26     k = kmer;
27     k = k * c1;
28     k = {k[HASHER_DATA_BITS-16:0],
29           ↳ k[HASHER_DATA_BITS-1:HASHER_DATA_BITS-15]}; // 
30           ↳ ROL15
31     k = k * c2;
32     key = seed;
33     key = key ^ k;
34     key = {key[HASHER_DATA_BITS-14:0],
35           ↳ key[HASHER_DATA_BITS-1:HASHER_DATA_BITS-13]}; // 
36           ↳ ROL13
37     signature = key * m + n;
38 end
39
40
41 endmodule

```

### 3.3 Sorter

#### 3.3.1 Description

Sorter is a hardware component that implements a sorting mechanism for signature-index pairs. It is designed to maintain a sorted list of the smallest indices based on their corresponding signatures.

The Sorter module is characterized by its ability to continuously update and maintain a sorted list of signature-index pairs, making it particularly suitable for applications requiring real-time sorting of data streams. This component is essential in systems that need to identify and track the smallest elements in a continuous data flow.

The term 'Sorter' is derived from its primary function—to sort and maintain an ordered list of elements. This module introduces features that include configurable list size, signature and index widths, along with control inputs for resetting and ending the sorting process, thereby affirming its adaptability and applicability in various data processing scenarios.

#### 3.3.2 Parameters

| Inner Parameter | Package Parameter             | Value | Description                   |
|-----------------|-------------------------------|-------|-------------------------------|
| INDICES_COUNT   | SORTER_EXTENDER_INDICES_COUNT | 256   | Number of indices to maintain |
| INDICE_LEN      | INDICE_LEN                    | 15    | Width of each index           |
| SIGNATURE_LEN   | HASHER_SORTER_SIGNATURE       | 32    | Width of each signature       |
| POSITION_LEN    | SORTER_POSITION_LEN           | 8     | Width of position value       |

#### 3.3.3 Interface

| Interface | Signal           | Direction | Width                      |
|-----------|------------------|-----------|----------------------------|
| System    | clk              | Input     | 1                          |
| System    | rst_n            | Input     | 1                          |
| Input     | in_signature     | Input     | SIGNATURE_LEN              |
| Input     | in_index         | Input     | INDICE_LEN                 |
| Control   | end_sorting      | Input     | 1                          |
| Output    | out_smallest_idx | Output    | INDICES_COUNT * INDICE_LEN |
| Output    | sort_valid       | Output    | 1                          |

#### Ports Description

##### Input Ports:

- clk: Clock signal
- rst\_n: Active-low reset signal
- in\_signature: Input signature
- in\_index: Input index
- end\_sorting: Signal to end sorting process

##### Output Ports:

- out\_smallest\_idx: Sorted smallest indices
- sort\_valid: Signal indicating valid sorted output of the smallest indices

### 3.3.4 Block Diagram



### 3.3.5 Implementation

In each clock cycle, the Sorter compares the incoming signature-index pair with the currently stored pairs. It determines the correct position for the new pair in the sorted list and updates the list accordingly. The module maintains a fixed-size list of the smallest signature-index pairs encountered.

#### The structure of the component

The Sorter module is structured as a sequential logic block that performs comparison and sorting operations. It uses parameters from the `proj_pkg` package to define its input and output widths, making it adaptable to different signature and index sizes within the larger system.

#### Subcomponents

The main subcomponents are:

- Comparison logic for determining the position of new elements
- Shift register logic for maintaining the sorted list
- Control logic for reset and end-of-sorting operations

#### Computational Logic

The `proj_sorter` module is responsible for implementing a real-time sorting mechanism for signature-index pairs. The main functionality is to maintain a sorted list of indices based on their associated signatures and continuously update this list as new data arrives. This section details the key elements of the implementation, including buffer management, sorting logic, and sequential updates.

##### 1. Buffer Management:

The `proj_sorter` module keeps track of the smallest indices in two buffers: `smallest_idx_curr` and `smallest_idx_next`. These buffers store the current sorted list of signature-index pairs and the next sorted values during the update phase.

The module processes new input in the form of signature-index pairs, provided by the `in_signature` and `in_index` signals. The new pair is compared against the current sorted list, and its position in the sorted order is calculated based on the signatures of the existing entries.

## 2. Sorting Logic:

The sorting logic is responsible for comparing the new signature with the existing list of sorted indices. It computes the number of signatures smaller than the new input and determines the position where the new input should be inserted. The signal `count_signatures_smaller_than` indicates how many current entries have a smaller signature, and `position_smaller_than` calculates the precise insertion point.

The sorting logic iterates through the list, updating `smallest_idx_next` based on the comparison results:

## 3. Insertion Mechanism:

Once the insertion point is computed, the new signature and index pair is inserted into the sorted list. The logic in `smallest_idx_next` updates the sorted array, ensuring that all other elements retain their order. This process allows the module to dynamically handle incoming data without disturbing the sorted order of the existing entries.

If the calculated position for the new entry is valid (i.e., `position_smaller_than` is non-zero), the new entry is inserted into the appropriate position, and the remaining elements are shifted accordingly:

## 4. Sequential Logic:

The sorting and buffer management are synchronized with the clock signal. The sequential logic ensures that the sorted list is updated every clock cycle, and the input values are registered in `new_pack_r` for processing in the next cycle.

The module responds to both the clock (`clk`) and reset (`rst_n`) signals. Upon reset, the sorted list is initialized to default values (e.g., signatures set to the maximum possible value and indices to zero):

The module uses combinational logic to determine the new position and prepare the next state of the sorted list. Sequential logic is used to update the current state of the list on each clock cycle. The design allows for continuous sorting of incoming data while maintaining a fixed-size list of the smallest elements.

### 3.3.6 Codes & Tests

The example shown below is an example with reduced parameters (not the actual ACMI parameters), for demonstration purposes. We will demonstrate the component's functionality by a case study of a limited component with only 4 stored values at any given time.

In the figure 3.3, it can be seen that there is a loading of new values into the table of saved values. It should be noted that when a new value arrives that is smaller than the values found in the table, the saved values will change their position according to the new value that arrived. In the following example, we refer to the situation given in the figure 3.3 at the moment of receiving the signature along with the index 4 (surrounded by a light blue square). At the moment before receiving the new package we have four values that fill the table. The signature with the lowest value, index 5, is in the lowest position while the signature with the highest value, index 8, is in the highest position. All the signatures are compared with the new signature and since it is smaller than all of them, the signatures of indexes 5, 6, 7 move to a higher cell respectively, the signature with the index 8 is thrown from the table and the new signature is

written in the lowest place since it is smaller than all the existing signatures. It should be noted that the index list is sent only when the signal (`end_sorting`) that all signatures have arrived is received.



Figure 3.3: Smallest-4 signatures Sorter waveform for test bench file

The following is the code we wrote that describes the hardware component.

```
proj_sorter.sv

1 import proj_pkg::*; // Include the project package
2
3 module proj_sorter #(
4     // Module parameters, using values from the project package
5     parameter INDICES_COUNT =
6         ↪ proj_pkg::SORTER_EXTENDER_INDICES_COUNT,
7     parameter INDICE_LEN = proj_pkg::INDICE_LEN,
8     parameter SIGNATURE_LEN =
9         ↪ proj_pkg::HASHER_SORTER_SIGNATURE,
10    parameter POSITION_LEN = proj_pkg::SORTER_POSITION_LEN
11 ) (
12     // Module inputs and outputs
13     input wire [SIGNATURE_LEN-1:0] in_signature,
14     input wire [INDICE_LEN-1:0] in_index,
15     output wire [INDICES_COUNT-1:0][INDICE_LEN-1:0]
16         ↪ out_smallest_idx,
17     input wire rst_n,
18     input wire end_sorting,
19     output wire sort_valid,
20     input wire clk
21 );
22     // Internal signals
23     signature_index_pack [INDICES_COUNT-1:0] smallest_idx_next;
24     signature_index_pack [INDICES_COUNT-1:0] smallest_idx_curr;
25     signature_index_pack new_pack;
26     logic [INDICES_COUNT-1:0] count_signatures_smaller_than;
27     logic [POSITION_LEN:0] position_smaller_than;
28     logic [POSITION_LEN:0] new_position_long;
29     logic [POSITION_LEN-1:0] new_position;
30
31     // Assign input values to internal signals
32     assign new_pack.signature = in_signature;
```

```

30 assign new_pack.index = in_index;
31
32
33 always_comb begin
34     count_signatures_smaller_than = '0;
35     position_smaller_than = '0;
36     new_position = '0;
37     new_position_long = '0;
38     for (int i = 0; i < INDICES_COUNT; i = i+1) begin
39         count_signatures_smaller_than[i] =
40             ↪ (new_pack.signature <
41             ↪ smallest_idx_curr[i].signature) ? 1'b1 :
42             ↪ 1'b0;
43         position_smaller_than +=
44             ↪ count_signatures_smaller_than[i];
45         if (position_smaller_than == '0) begin
46             smallest_idx_next[i].signature =
47                 ↪ smallest_idx_curr[i].signature;
48             smallest_idx_next[i].index =
49                 ↪ smallest_idx_curr[i].index;
50         end else begin
51             if (i < new_position) begin
52                 smallest_idx_next[i].signature =
53                     ↪ smallest_idx_curr[i].signature;
54                 smallest_idx_next[i].index =
55                     ↪ smallest_idx_curr[i].index;
56             end else if (i > new_position) begin
57                 smallest_idx_next[i].signature =
58                     ↪ smallest_idx_curr[i-1].signature;
59                 smallest_idx_next[i].index =
60                     ↪ smallest_idx_curr[i-1].index;
61             end
62         end
63     end
64     new_position_long = INDICES_COUNT -
65         ↪ position_smaller_than;
66     new_position = new_position_long[POSITION_LEN-1:0];
67 end
68
69 // Assign sorted indices to output
70 generate
71     for (genvar j = 0; j < INDICES_COUNT; j++) begin
72         assign out_smallest_idx[j] = end_sorting ?
73             ↪ smallest_idx_curr[j].index : 'x;
74     end
75 endgenerate
76
77 assign sort_valid = end_sorting ? 1'b1 : 1'b0;
78
79 // Sequential logic for updating current smallest indices
80 always_ff @(posedge clk) begin
81     for (int i = 0; i < INDICES_COUNT; i++) begin
82         if (~rst_n) begin

```

```
72 // Reset values
73 smallest_idx_curr[i].signature <= '1;
74 smallest_idx_curr[i].index <= '0;
75 end else begin
76 // Update with new values
77 if ((i == new_position) &
78     ↪ (position_smaller_than != '0)) begin
79     smallest_idx_curr[i].signature <=
80         ↪ new_pack.signature;
81     smallest_idx_curr[i].index <=
82         ↪ new_pack.index;
83 end else begin
84     smallest_idx_curr[i].signature <=
85         ↪ smallest_idx_next[i].signature;
86     smallest_idx_curr[i].index <=
87         ↪ smallest_idx_next[i].index;
88 end
89 end
90 endmodule
```

## 3.4 Extender

### 3.4.1 Description

The Extender is the final component of the ACMI (Accelerated Computational Microbiome Identification). It uses information from the Fragment Memory (FM) and performs expansion operations on the values that the Sorter found to be the smallest. The Extender extracts an 8-base wide DNA fragment for the relevant bases.

### 3.4.2 Parameters

| Inner Parameter   | Package Parameter             | Value | Description                             |
|-------------------|-------------------------------|-------|-----------------------------------------|
| FRAG_LEN_BITS     | FM_EXTENDER_FRAG_LEN_BITS     | 512   | Length of fragment in bits              |
| FRAG_SIZE         | FRAG_LEN                      | 256   | Length of fragment in bases             |
| KMER_SIZE         | KMER_LEN                      | 16    | Length of k-mer in bases                |
| INDICES_COUNT     | SORTER_EXTENDER_INDICES_COUNT | 256   | Number of indices to process            |
| INDICE_LEN        | INDICE_LEN                    | 15    | Width of each index                     |
| SIGNED_INDICE_LEN | SIGNED_INDICE_LEN             | 16    | Width of signed index                   |
| FRAG_PART_ONE_HOT | EXTENDER_OUT_PART_LEN_ONE_HOT | 8     | Length of one-hot encoded fragment part |
| BASE_LEN          | BASE_LEN                      | 2     | Number of bits per base                 |
| ONE_HOT_LEN       | ONE_HOT_LEN                   | 4     | Length of one-hot encoding per base     |
| FRAG_PART         | EXTENDER_OUT_PART_LEN         | 4     | Length of fragment part                 |

### 3.4.3 Interface

| Interface | Signal          | Direction | Width                      |
|-----------|-----------------|-----------|----------------------------|
| System    | clk             | Input     | 1                          |
| System    | rst_n           | Input     | 1                          |
| Input     | in_fragment     | Input     | FRAG_LEN_BITS              |
| Input     | in_kmer_indices | Input     | INDICES_COUNT * INDICE_LEN |
| Control   | valid_indices   | Input     | 1                          |
| Output    | out_index       | Output    | SIGNED_INDICE_LEN          |
| Output    | out_gfm         | Output    | FRAG_PART_ONE_HOT          |

### Ports Description

#### Input Ports:

- clk: Clock signal
- rst\_n: Reset signal (active low)
- in\_fragment: Input fragment
- in\_kmer\_indices: Input k-mer indices
- valid\_indices: Signal indicating valid input indices

#### Output Ports:

- out\_index: Output index
- out\_gfm: Output one-hot encoded fragment

### 3.4.4 Block Diagram



### 3.4.5 Implementation

The ‘`proj_extender`’ module utilizing parameters from the project package. The key implementation details include:

- **Fragment Extraction:** The module extracts the relevant fragment part based on the fragment index. The extraction logic uses the ‘`frag_parts_idx`’ signal to identify the current part of the fragment to be processed.
- **One-Hot Encoding:** The extracted fragment part is encoded into a one-hot format using a generate block, where each base of the DNA fragment (2 bits) is converted into a one-hot code (4 bits). The one-hot encoding allows efficient representation of the DNA bases for further processing.
- **Index Calculation:** The current index is selected from the k-mer indices array (‘`in_kmer_indices`’) based on the current index position (‘`indices_idx`’). The output index is calculated by adjusting the current index to account for the difference between the fragment size and k-mer size.
- **Sequential Logic:** The module uses sequential logic to update the fragment parts index and indices index on each clock cycle. It also stores the input k-mer indices into a register (‘`in_kmer_indices_r`’) when valid indices are provided.

The overall architecture is designed to handle the extraction and encoding of DNA fragments efficiently, allowing for parallel processing of multiple indices and fragment parts.

### 3.4.6 Codes & Tests

The example shown below is an example with reduced parameters (not the actual ACMI parameters), for demonstration purposes. In the wave diagram, you can see that information encoded over 16 bits is received from the fragment memory. The information obtained is:

$$0FFD = 000011111111101$$

It can be observed that the output is received as requested:

$$D = 11,01 \rightarrow \text{gfm}(D) = 1000,0010 = 82$$

$$F = 11,11 \rightarrow \text{gfm}(F) = 1000,1000 = 88$$

$$0 = 00,00 \rightarrow \text{gfm}(0) = 0001,0001 = 11$$



Figure 3.4: 16 bit fragment - Extender waveform for test bench file

The following is the code we wrote that describes the hardware component.

### proj\_extender.sv

```

1 // Timescale directive for simulation
2 `timescale 1ns / 1ps
3 // Import the project package
4 import proj_pkg::*;
5 // Module declaration with parameters
6 module proj_extender #(
7     parameter FRAG_LEN_BITS =
8         ↪ proj_pkg::FM_EXTENDER_FRAG_LEN_BITS,
9     parameter FRAG_SIZE = proj_pkg::FRAG_LEN,
10    parameter KMER_SIZE = proj_pkg::KMER_LEN,
11    parameter INDICES_COUNT =
12        ↪ proj_pkg::SORTER_EXTENDER_INDICES_COUNT,
13    parameter INDICE_LEN = proj_pkg::INDICE_LEN,
14    parameter SIGNED_INDICE_LEN = proj_pkg::SIGNED_INDICE_LEN,
15    parameter FRAG_PART_ONE_HOT =
16        ↪ proj_pkg::EXTENDER_OUT_PART_ONE_HOT,
17    parameter BASE_LEN = proj_pkg::BASE_LEN,
18    parameter ONE_HOT_LEN = proj_pkg::ONE_HOT_LEN,
19    parameter FRAG_PART = proj_pkg::EXTENDER_OUT_PART_LEN
20 ) (
21     // Input ports
22     input logic [FRAG_LEN_BITS-1:0] in_fragment,
23     input logic [INDICES_COUNT-1:0][INDICE_LEN-1:0]
24         ↪ in_kmer_indices,
25     input wire valid_indices,
26     input wire rst_n,
27     input wire clk,
28     // Output ports
29     output logic [SIGNED_INDICE_LEN-1:0] out_index,
30     output logic [FRAG_PART_ONE_HOT-1:0] out_gfm
31 );
32     // Local parameters
33     localparam FRAG_PARTS_COUNT = (FRAG_LEN_BITS >>

```

```

    ↪ $clog2(FRAG_PART));
30 localparam FRAG_PARTS_COUNT_BITS =
    ↪ $clog2(FRAG_PARTS_COUNT);
31 localparam INDICES_COUNT_BITS = $clog2(INDICES_COUNT);
32 // Internal signals
33 logic [FRAG_PARTS_COUNT_BITS-1:0] frag_parts_idx;
34 logic [FRAG_PARTS_COUNT_BITS-1:0] frag_parts_idx_next;
35 logic rst_frag_parts_idx;
36 logic [INDICE_LEN-1:0] curr_index;
37 logic [INDICES_COUNT_BITS-1:0] indices_idx;
38 logic [INDICES_COUNT_BITS-1:0] indices_idx_next;
39 logic [FRAG_PART-1:0] frag_part;
40 logic [INDICES_COUNT-1:0][INDICE_LEN-1:0]
    ↪ in_kmer_indices_r;
41
42 // Combinational logic
43 // Reset fragment parts index when it reaches the maximum
44 assign rst_frag_parts_idx = (frag_parts_idx ==
    ↪ (FRAG_PARTS_COUNT - 1)) ? 1'b1 : 1'b0;
45 // Calculate next fragment parts index
46 assign frag_parts_idx_next = rst_frag_parts_idx ? 1'b0 :
    ↪ frag_parts_idx + 1'b1;
47 // Select current index from input indices
48 assign curr_index = in_kmer_indices_r[indices_idx];
49 // Extract fragment part for output
50 assign frag_part = in_fragment[FRAG_PART*frag_parts_idx +:
    ↪ FRAG_PART];
51
52 generate
53     for (genvar i = 0; i < FRAG_PART >> $clog2(BASE_LEN);
    ↪ i++) begin : gen_gfm
54         always_comb begin
55             case (frag_part[i*BASE_LEN +: BASE_LEN])
56                 2'b00: out_gfm[i*ONE_HOT_LEN +:
    ↪ ONE_HOT_LEN] = 4'b0001;
57                 2'b01: out_gfm[i*ONE_HOT_LEN +:
    ↪ ONE_HOT_LEN] = 4'b0010;
58                 2'b10: out_gfm[i*ONE_HOT_LEN +:
    ↪ ONE_HOT_LEN] = 4'b0100;
59                 2'b11: out_gfm[i*ONE_HOT_LEN +:
    ↪ ONE_HOT_LEN] = 4'b1000;
60                 default: out_gfm[i*ONE_HOT_LEN +:
    ↪ ONE_HOT_LEN] = 4'b0000;
61             endcase
62         end
63     end
64 endgenerate
65 // Calculate next indices index
66 assign indices_idx_next = rst_frag_parts_idx ? indices_idx
    ↪ + 1'b1 : indices_idx;
67 // Calculate output index
68 assign out_index = {1'b0, curr_index} -
    ↪ SIGNED_INDICE_LEN'(((FRAG_SIZE - KMER_SIZE) >> 1));
69 // Sequential logic for fragment parts index

```

```

70    always_ff @(posedge clk or negedge rst_n) begin :
71        ↪ parts_index
72        if (~rst_n) begin
73            frag_parts_idx <= '0;
74        end else if (valid_indices) begin
75            frag_parts_idx <= '0;
76        end else begin
77            frag_parts_idx <= frag_parts_idx_next;
78        end
79    end
80
81    // Sequential logic for indices sample
82    always_ff @(posedge clk or negedge rst_n) begin :
83        ↪ index_sample
84        if (valid_indices) begin
85            in_kmer_indices_r <= in_kmer_indices;
86        end
87    end
88
89    // Sequential logic for indices index
90    always_ff @(posedge clk or negedge rst_n) begin :
91        ↪ indices_index
92        if (~rst_n) begin
93            indices_idx <= '0;
94        end else if (valid_indices) begin
95            indices_idx <= '0;
96        end else begin
97            indices_idx <= indices_idx_next;
98        end
99    end
100 endmodule

```

## 3.5 Index Counter

### 3.5.1 Description

Index Counter is a hardware component that implements a simple counting mechanism. It is designed to generate sequential indices and signal when a specific count has been reached.

The Counter module is characterized by its ability to generate sequential indices up to a predefined maximum value, making it particularly suitable for addressing or sequencing operations in digital systems. This component is essential in applications requiring ordered progression through a series of states or memory locations.

The nomenclature 'Counter' is derived from its primary function—to count or increment a value sequentially. This module introduces features that include the provision of configurable bit-width indices and a finished count signal, along with control inputs for starting and resetting the count, thereby affirming its adaptability and applicability in various digital design scenarios.

### 3.5.2 Parameters

| Inner Parameter | Package Parameter | Value | Description               |
|-----------------|-------------------|-------|---------------------------|
| INDICE_LEN      | INDICE_LEN        | 15    | Width of the index output |

### 3.5.3 Interface

| Interface | Signal         | Direction | Width      |
|-----------|----------------|-----------|------------|
| System    | clk            | Input     | 1          |
| System    | rst_n          | Input     | 1          |
| Control   | start          | Input     | 1          |
| Output    | index          | Output    | INDICE_LEN |
| Output    | finished_count | Output    | 1          |

#### Ports Description

##### Input Ports:

- clk: Clock signal
- rst\_n: Reset signal (active low)
- start: Signal to start counting

##### Output Ports:

- index: Current count value
- finished\_count: Signal indicating count completion

### 3.5.4 Block Diagram



### 3.5.5 Implementation

The ‘proj\_index\_counter’ module is implemented as a sequential logic block that performs counting operations. It uses the project package (‘proj\_pkg’) to configure the index width, enabling it to count up to a value defined by ‘FM\_BUFFER\_SIZE’. The module is responsible for generating sequential indices and signaling when the count is complete.

#### Operation

The counter begins counting when the ‘start’ signal is asserted. It increments the internal index value on each clock cycle, controlled by the ‘count\_enabled’ flag. The module stops counting when the index reaches the maximum value (‘FM\_BUFFER\_SIZE - 1’) and asserts the ‘finished\_count’ signal. If the counter is reset via the ‘rst\_n’ signal or when the end of the count is reached, the internal index is cleared, and counting can restart.

#### Control Logic

- **Start Detection:** The counter detects the rising edge of the ‘start’ signal to begin counting. It uses a previous state register (‘start\_prev’) to ensure counting starts only on a transition from low to high.
- **Count Enable:** The ‘count\_enabled’ flag controls whether the index increments. Counting continues until the end of the count is detected, at which point the counter stops.
- **Reset Behavior:** The index is reset either when the end of the count is reached or when the system receives a reset signal (‘rst\_n’).

#### Sequential Logic

The core of the module is a sequential logic block that operates on the rising edge of the clock (‘clk’). The block increments the index when counting is enabled and resets the index to zero on reset or when the maximum count is reached. It also manages the ‘start\_prev’ signal to detect rising edges of the ‘start’ signal.

## End-of-Count Detection

The module includes a comparator to check if the index has reached its maximum value ('FM\_BUFFER\_SIZE - 1'). When this condition is met, the 'finished\_count' signal is asserted, indicating that the counting operation is complete.

## Output Generation

The module provides the current index value on the 'index' output port, and the 'finished\_count' signal is used to indicate when the counting process is done. The index can be used as an address or state counter in various digital systems, while the 'finished\_count' flag can trigger subsequent operations in the system.

### 3.5.6 Codes & Tests



Figure 3.5: A counter of 16 indexes waveform for test bench file

The example shown below is an example with reduced parameters (not the actual ACMI parameters), for demonstration purposes. In the example, a counter of 16 index units is shown. In the wave diagram, we will notice that when we receive a rising start signal for a single clock cycle or when there is a reset signal, the count starts again. In addition, when these condition are not met and the count finished, the counter stopped counting as expected.

#### proj\_index\_counter.sv

```
1 import proj_pkg::*;

2 module proj_counter
3 #(
4   parameter INDICE_LEN = proj_pkg::INDICE_LEN
5 )
6 (
7   output wire [INDICE_LEN-1:0] index,
8   output logic finished_count,
9   input wire clk,
10  input wire rst_n,
11  input wire start
12 );
13 // Internal signals
14 logic [INDICE_LEN-1:0] out_index;
15 logic end_of_count;
16 logic [INDICE_LEN-1:0] idx_next;
```

```

18    logic rst_index;
19    logic count_enabled;
20    logic start_prev;
21
22    // Check if write address reached the end of the buffer
23    assign end_of_count = (out_index == (FM_BUFFER_SIZE-1)) ?
24        ↪ 1'b1 : 1'b0;
25    assign finished_count = end_of_count;
26
27    // Increment write address or keep current value
28    assign idx_next = (count_enabled & ~end_of_count) ?
29        ↪ out_index + 1'b1 : out_index;
30
31
32    // Assign output index
33    assign index = out_index;
34
35    // Sequential logic for updating the index and detecting
36    ↪ start signal
37    always_ff @(posedge clk or negedge rst_n) begin
38        if (~rst_n) begin
39            out_index <= '0;
40            count_enabled <= 1'b0;
41            start_prev <= 1'b0;
42        end else begin
43            start_prev <= start;
44            if (rst_index) begin
45                out_index <= '0;
46                count_enabled <= 1'b0;
47            end else if (start & ~start_prev) begin
48                // Rising edge of start signal
49                count_enabled <= 1'b1;
50                out_index <= '0;
51            end else if (count_enabled) begin
52                out_index <= idx_next;
53            end
54        end
55    endmodule

```

## 3.6 Kmer Buffer

### 3.6.1 Description

Kmer Buffer is a hardware component that implements a shifting buffer mechanism for storing and managing k-mers. It is designed to sequentially store incoming nucleotide data and output a complete k-mer when the buffer is full.

The Kmer Buffer module is characterized by its ability to store and shift nucleotide data, making it particularly suitable for applications in DNA sequence analysis and bioinformatics hardware accelerators. This component is essential in systems that need to process DNA sequences in k-mer units.

The term 'Kmer Buffer' is derived from its primary function—to buffer and prepare k-mers. This module introduces features that include configurable k-mer length and nucleotide data width, along with control inputs for resetting the buffer and a full signal output, thereby affirming its adaptability and applicability in various genomic data processing scenarios.

### 3.6.2 Parameters

| Inner Parameter | Package Parameter | Value | Description                        |
|-----------------|-------------------|-------|------------------------------------|
| DATA_BITS       | KMER_BUFFER_BITS  | 2     | Number of bits for each nucleotide |
| KMER_LEN        | KMER_BUFFER_LEN   | 16    | Length of the k-mer                |
| OUT_KMER        | -                 | 32    | Total bits in the output k-mer     |

### 3.6.3 Interface

| Interface | Signal     | Direction | Width                |
|-----------|------------|-----------|----------------------|
| System    | clk        | Input     | 1                    |
| System    | rst_n      | Input     | 1                    |
| Input     | in_data    | Input     | DATA_BITS            |
| Control   | start_over | Input     | 1                    |
| Output    | out_kmer   | Output    | KMER_LEN * DATA_BITS |
| Output    | full       | Output    | 1                    |

#### Ports Description

##### Input Ports:

- clk: Clock signal
- rst\_n: Reset signal (active low)
- in\_data: Input nucleotide data
- start\_over: Signal to reset the buffer

##### Output Ports:

- out\_kmer: Output k-mer
- full: Indicates when the buffer is full

### 3.6.4 Block Diagram



### 3.6.5 Implementation

In each clock cycle, the Kmer Buffer shifts in new nucleotide data and updates its internal buffer. It maintains a count of stored nucleotides and signals when the buffer is full (i.e., a complete k-mer is ready). The buffer can be reset either by the active-low reset signal or the start\_over input.

#### The structure of the component

The Kmer Buffer module is structured as a sequential logic block that performs shifting and storing operations. It uses parameters from the `proj_pkg` package to define its input and output widths, making it adaptable to different k-mer lengths and nucleotide representations within the larger system.

#### Subcomponents

The main subcomponents are:

- Shift register logic for the k-mer buffer
- Counter logic for tracking the number of stored nucleotides
- Control logic for buffer reset and full detection

#### Computational Logic

The Kmer Buffer module implements its functionality through the following steps:

1. Input Processing:
  - Shifts in new nucleotide data at each clock cycle

- Updates the internal k-mer buffer with the new data
2. Buffer Management:
- Maintains a count of stored nucleotides
  - Signals when the buffer is full (i.e., a complete k-mer is ready)
3. Output Generation:
- Continuously outputs the current state of the k-mer buffer
  - Asserts the full signal when the buffer contains a complete k-mer
4. Reset Handling:
- Resets the buffer and count on either system reset or start\_over signal

The module uses generate blocks to create the shifting logic, allowing for easy adaptation to different k-mer lengths. The buffer\_full signal is used to prevent overwriting of a complete k-mer until it is read out or intentionally reset.

### 3.6.6 Codes & Tests



Figure 3.6: 4-mer buffer waveform for test bench file

The example shown below is an example with reduced parameters (not the actual ACMI parameters), for demonstration purposes. The attached example shows how the information is saved in the buffer. The example shows a buffer with a depth of 4 that receives values and saves them as requested. The component functions like a queue, The component functions like a queue, it stores the values 1,1,3,1,1,1,2 in the order of their arrival. Note that when a start\_over signal is received, all the stored values are reset and the component will return to receive the values 2,1,2,3 with the next clock rise.

The following is the code we wrote that describes the hardware component.

```
1 `timescale 1ns / 1ps
2 import proj_pkg::*; // Include the project package
3
4 module proj_kmer_buffer #(
5     parameter DATA_BITS = proj_pkg::KMER_BUFFER_BITS,      //
6         ↪ Number of bits for each nucleotide
7     parameter KMER_LEN = proj_pkg::KMER_BUFFER_LEN,        //
9         ↪ Length of the k-mer
```

```

7      parameter OUT_KMER = KMER_LEN * DATA_BITS // Total bits
8      // in the output k-mer
9  ) (
10     input wire clk,           // Clock input
11     input wire rst_n,        // Active-low reset
12     input wire [DATA_BITS-1:0] in_data, // Input nucleotide
13     // data
14     output wire [KMER_LEN-1:0][DATA_BITS-1:0] out_kmer, // Output k-mer
15     input wire start_over,   // Signal to reset the buffer
16     output logic full       // Indicates when the buffer
17     // is full
18 );
19 // Internal signals
20 logic [KMER_LEN*DATA_BITS-1:0] kmer_buffer; // Current
21 // k-mer buffer
22 logic [KMER_LEN*DATA_BITS-1:0] kmer_buffer_nxt; // Next
23 // state of k-mer buffer
24 logic [$clog2(KMER_LEN)-1:0] buffer_count; // Current
25 // count of nucleotides in buffer
26 logic [$clog2(KMER_LEN)-1:0] buffer_count_nxt; // Next
27 // state of buffer count
28
29 // Increment buffer count
30 assign buffer_count_nxt = buffer_count + 1'b1;
31
32 // Determine if buffer is full
33 assign buffer_full = (buffer_count == KMER_LEN - 1) ? 1'b1
34     // : 1'b0;
35 assign full = buffer_full;
36
37 // Generate next state of k-mer buffer
38 generate
39   for (genvar i = 2; i < KMER_LEN; i++) begin
40     assign kmer_buffer_nxt[0] = in_data[0];
41     assign kmer_buffer_nxt[1] = in_data[1];
42     assign kmer_buffer_nxt[i] = kmer_buffer[i-1]; // Shift existing data
43   end
44 endgenerate
45
46 // Connect internal buffer to output
47 assign out_kmer = kmer_buffer;
48
49 // Update k-mer buffer on clock edge
50 always_ff @(posedge clk or negedge rst_n) begin
51   if (~rst_n) begin
52     kmer_buffer <= '0; // Reset buffer on active-low
53     // reset
54   end else if (start_over) begin
55     kmer_buffer <= '0; // Reset buffer on start_over
56     // signal
57   end else begin
58     kmer_buffer <= kmer_buffer_nxt; // Update buffer

```

```

49           ↪ with next state
50     end
51
52 // Update buffer count on clock edge
53 always_ff @(posedge clk or negedge rst_n) begin
54   if (~rst_n) begin
55     buffer_count <= '0; // Reset count on active-low
56     ↪ reset
57   end else if (start_over) begin
58     buffer_count <= '0; // Reset count on start_over
59     ↪ signal
60   end else if (buffer_full) begin
61     buffer_count <= buffer_count; // Maintain count
62     ↪ when buffer is full
63   end else begin
64     buffer_count <= buffer_count_nxt; // Increment count

```

# Chapter 4

## Integration

This chapter focuses on the integration of the various components of the ACMI hardware accelerator for the Vision Transformer algorithm, with specific emphasis on the implementation of MinHash for genome analysis. The successful integration of these components ensures the overall system performs the required computations efficiently and accurately.



### 4.1 Overview of Integration

The integration process involves connecting multiple modules, each of which plays a key role in the overall functionality of ACMI. These components, including the Fragment Memory, Kmer Buffer, Index counter, Hasher, Sorter, and Extender, must work together seamlessly to achieve high-performance genome fragment processing.

### 4.2 Integration Challenges

During the integration process, several challenges were encountered, including:

- Synchronization Issues: Ensuring that all components operate in sync was critical, particularly with the clock cycle timing between the Fragment Memory and the Hasher.
- Memory Access Conflicts: Managing access to shared memory resources without causing delays or data corruption.

### 4.3 codes & test



Figure 4.1: button-4 signatures ACMI test banch

The example shown below is an example with reduced parameters (not the actual ACMI parameters), for demonstration purposes. It can be seen that in every clock cycle the counter\_index increases by 1, the kmer is manipulated and a signature is obtained that the relationship between it and the kmer is not visible to the eye. When sort\_valid is received we see what the four minimum signatures are and they are passed to the extender. In the next clock cycle, we see the output 24 which corresponds to the gfm of 6 obtained from the fragment memory. In the next clock cycle we get 11 as expected for the value 0. In the next clock cycle we get 24 again because the value is 6 and finally we get 42 according to the value 9.

In total, we get the functionality expected from the component. It is important to note that for this simulation we used different sizes from the component sizes to allow easy observation of the results. The following is the code we wrote that describes the hardware component.

```
top.sv
1 `timescale 1ns / 1ps
2 import proj_pkg::*;
3 module proj_top (
4     input wire clk,
5     input wire rst_n,
6     input wire [proj_pkg::BASE_LEN-1:0] in_data,
7     output wire [proj_pkg::FRAG_LEN-1:0] out_fragment,
8     output wire out_wait
9 );
10    // Internal signals
11    wire [proj_pkg::INDICE_LEN-1:0] counter_index;
12    wire counter_finished;
13    wire [proj_pkg::KMER_LEN*proj_pkg::BASE_LEN-1:0] kmer;
```

```

14     wire kmer_buffer_full;
15     wire [proj_pkg::HASHER_SORTER_SIGNATURE-1:0] signature;
16     wire [proj_pkg::SORTER_EXTENDER_INDICES_COUNT-1:0]
17 [proj_pkg::INDICE_LEN-1:0] sorted_indices;
18     wire sort_valid;
19     wire [proj_pkg::SIGNED_INDICE_LEN-1:0] extender_index;
20     wire [proj_pkg::FM_EXTENDER_FRAG_LEN_BITS-1:0] fm_fragment;
21 // FM module
22 proj_fm #(
23     .BUFFER_COUNT(proj_pkg::FM_BUFFER_COUNT),
24     .RAMS(proj_pkg::FM_RAMs_COUNT),
25     .ENTRIES(proj_pkg::FM_ENTRIES_COUNT),
26     .OFFSET(proj_pkg::FM_OFFSET_COUNT),
27     .DATA_BITS(proj_pkg::FM_DATA_BITS),
28     .INDICE_LEN(proj_pkg::INDICE_LEN),
29     .SIGNED_INDICE_LEN(proj_pkg::SIGNED_INDICE_LEN),
30     .FRAG_LEN(proj_pkg::FM_EXTENDER_FRAG_LEN_BITS)
31 ) u_fm (
32     .clk(clk),
33     .rst_n(rst_n),
34     .in_wdata(in_data),
35     .chg_idx(counter_finished),
36     .frag_idx(extender_index),
37     .out_rdata(fm_fragment),
38     .out_wait(out_wait)
39 );
// Counter module
proj_counter#(
40     .INDICE_LEN(proj_pkg::INDICE_LEN)
41 ) u_counter
(
42     .clk(clk),
43     .rst_n(rst_n),
44     .start(kmer_buffer_full),
45     .index(counter_index),
46     .finished_count(counter_finished)
47 );
// Kmer Buffer module
proj_kmer_buffer #(
48     .DATA_BITS(proj_pkg::KMER_BUFFER_BITS),
49     .KMER_LEN(proj_pkg::KMER_BUFFER_LEN),
50     .OUT_KMER(proj_pkg::KMER_BUFFER_LEN *
51             ↗ proj_pkg::KMER_BUFFER_BITS)
52 ) u_kmer_buffer (
53     .clk(clk),
54     .rst_n(rst_n),
55     .in_data(in_data),
56     .out_kmer(kmer),
57     .start_over(counter_finished),
58     .full(kmer_buffer_full)
59 );
// Hasher module
proj_hasher #(
60     .KMER_LEN(proj_pkg::KMER_LEN),
61

```

```

67      .DATA_BITS(proj_pkg::BASE_LEN),
68      .HASHER_DATA_BITS(proj_pkg::HASHER_SORTER_SIGNATURE)
69  ) u_hasher (
70      .clk(clk),
71      .rst_n(rst_n),
72      .seed(32'hac718add),
73      .kmer(kmer),
74      .signature(signature)
75  );
76  // Sorter module
77  proj_sorter #(
78      .INDICES_COUNT(proj_pkg::SORTER_EXTENDER_INDICES_COUNT),
79      .INDICE_LEN(proj_pkg::INDICE_LEN),
80      .SIGNATURE_LEN(proj_pkg::HASHER_SORTER_SIGNATURE),
81      .POSITION_LEN(proj_pkg::SORTER_POSITION_LEN)
82  ) u_sorter (
83      .clk(clk),
84      .rst_n(rst_n),
85      .in_signature(signature),
86      .in_index(counter_index),
87      .out_smallest_idx(sorted_indices),
88      .end_sorting(counter_finished),
89      .sort_valid(sort_valid)
90  );
91  // Extender module
92  proj_extender #(
93      .FRAG_LEN_BITS(proj_pkg::FM_EXTENDER_FRAG_LEN_BITS),
94      .FRAG_SIZE(proj_pkg::FRAG_LEN),
95      .KMER_SIZE(proj_pkg::KMER_LEN),
96      .INDICES_COUNT(proj_pkg::SORTER_EXTENDER_INDICES_COUNT),
97      .INDICE_LEN(proj_pkg::INDICE_LEN),
98      .SIGNED_INDICE_LEN(proj_pkg::SIGNED_INDICE_LEN),
99      .FRAG_PART_ONE_HOT(proj_pkg::EXTENDER_OUT_PART_LEN_ONE_HOT),
100     .BASE_LEN(proj_pkg::BASE_LEN),
101     .ONE_HOT_LEN(proj_pkg::ONE_HOT_LEN),
102     .FRAG_PART(proj_pkg::EXTENDER_OUT_PART_LEN)
103  ) u_extender (
104      .clk(clk),
105      .rst_n(rst_n),
106      .in_fragment(fm_fragment),
107      .in_kmer_indices(sorted_indices),
108      .valid_indices(sort_valid),
109      .out_index(extender_index),
110      .out_gfm(out_fragment)
111  );
112 endmodule

```

# **Chapter 5**

## **Results**

### **5.1 Synthesis Process Overview**

The synthesis flow in this project uses Genus, a tool from Cadence, for logic synthesis. The process involves several stages including reading libraries, reading RTL, elaboration, constraint application, and actual synthesis.

### **5.2 Tools and Environment**

Cadence Genus serves as the primary synthesis tool. The environment setup involves sourcing various configuration files:

- Project-specific definitions from `../proj.defines`
- Technology-specific library paths and definitions from files such as `../libraries.$TECHNOLOGY.tcl`
- Standard cell, SRAM, and potentially I/O technology files

### **5.3 Synthesis Constraints**

Constraints are applied through SDC (Synopsys Design Constraints) files:

- The main constraints file is read using `read_sdc $design(functional_sdc)`
- Clock gating constraints are set:
- Minimum flops for clock gating: 8
- Clock gating style: latch

### **5.4 Area Analysis**

The area analysis provides insight into the physical footprint of the synthesized design. The total area of the component is  $176,241.14 \mu\text{m}^2$ , as calculated by the synthesis tool in the area analysis report. This total is composed of contributions from various cell types, including sequential cells, inverters, buffers, and logic cells. The area reports highlight the contributions of different standard cell types and libraries used in the design.

| Instance      | Module                                              | Cell Count | Cell Area  | Net Area  | Total Area |
|---------------|-----------------------------------------------------|------------|------------|-----------|------------|
| proj_top      |                                                     | 39711      | 111609.080 | 64632.062 | 176241.142 |
| u_fm          | proj_fm_BUFFER_COUNT2_RAMS8_ENTRIES128_OFFSET32_DA  | 24000      | 71428.320  | 44425.886 | 115854.206 |
| u_extender    | proj_extender_FRAG_LEN_BITS512_FRAG_SIZE256_KMER_S  | 9073       | 27833.184  | 13237.722 | 41070.906  |
| u_hasher      | proj_hasher_KMER_LEN16_DATA_BITS2_HASHER_DATA_BITS  | 1261       | 4487.760   | 1996.886  | 6484.646   |
| u_sorter      | proj_sorter_INDICES_COUNT256_INDICE_LEN15_SIGNATURE | 485        | 2338.272   | 881.362   | 3219.634   |
| u_kmer_buffer | proj_kmer_buffer_DATA_BITS2_KMER_LEN16_OUT_KMER32   | 51         | 237.960    | 46.271    | 284.231    |
| u_counter     | proj_counter_INDICE_LEN15                           | 36         | 152.640    | 56.982    | 209.622    |

Figure 5.1: Area Analysis

#### 5.4.1 Library Breakdown

The design utilizes several sub-modules, each contributing to the total area. The breakdown is as follows:

- u\_fm: 24,000 instances, contributing  $115,854.21 \mu\text{m}^2$  of area.
- u\_extender: 9,073 instances, contributing  $41,070.91 \mu\text{m}^2$  of area.
- u\_hasher: 1,261 instances, contributing  $6,484.65 \mu\text{m}^2$  of area.
- u\_sorter: 485 instances, contributing  $3,219.63 \mu\text{m}^2$  of area.
- u\_kmer\_buffer: 51 instances, contributing  $284.23 \mu\text{m}^2$  of area.
- u\_counter: 36 instances, contributing  $209.62 \mu\text{m}^2$  of area.

#### 5.4.2 Cell Type Breakdown

Different types of cells contribute variably to the total area:

- **Sequential cells** (such as flip-flops and latches) account for 3,779 instances, taking up  $25,640.36 \mu\text{m}^2$  of area, or 28.6% of the total.
- **Inverters** contribute significantly, with 4,435 instances occupying  $6,857.28 \mu\text{m}^2$  of area, 5.5% of the total area.
- **Logic cells** dominate the area utilization with 31,280 instances, contributing to  $81,587.52 \mu\text{m}^2$  of area, which is 65.5% of the total area.
- **Buffers** and **clock-gating cells** have minimal impact on the overall area, with buffers contributing  $483.48 \mu\text{m}^2$  (0.4%) and clock-gating cells only  $19.44 \mu\text{m}^2$ .

### 5.5 Timing Analysis

The timing analysis reports provide detailed information about the critical paths in the design. These reports show the propagation of signals through various cells, including arrival times, transition delays, and cell types.

### 5.5.1 Critical Path Analysis

From the provided timing reports, we can observe several critical paths:

- The path starting from `in_data[1]` has a total arrival time of 2786 ps, with the signal propagating through multiple cells including `A021_X3M_A9TL`, `INV_X3B_A9TL`, and `INV_X2M_A9TL`, as it can be seen in figure 5.2.
- Another critical path begins at `u_fm_waddr_reg[6]/CK` and ends at `out_wait`, with a total arrival time of 1432 ps. This path includes cells like `DFFRPQN_X2M_A9TL`, `INV_X6M_A9TL`, and `AND4_X0P5M_A9TL`, as it can be seen in figure 5.3.
- A more complex path is shown in the third image, starting from `u_kmer_buffer/kmer_buffer_reg[14]` and propagating through multiple `u_hasher` components. This path has a significantly longer arrival time of 8780 ps, as it can be seen in figure 5.4.

| # | Timing Point                                | Flags | Arc   | Edge | Cell                        | Fanout | Trans | Delay | Arrival |
|---|---------------------------------------------|-------|-------|------|-----------------------------|--------|-------|-------|---------|
| # |                                             |       |       |      |                             | (ps)   | (ps)  | (ps)  |         |
| # | <code>in_data[1]</code>                     | -     | -     | R    | (arrival)                   | 2      | 54    | 0     | 2016    |
|   | <code>g187504/Y</code>                      | -     | A0->Y | R    | <code>A021_X3M_A9TL</code>  | 15     | 219   | 208   | 2225    |
|   | <code>drc_bufs193659/Y</code>               | -     | A->Y  | F    | <code>INV_X3B_A9TL</code>   | 16     | 320   | 269   | 2494    |
|   | <code>drc_bufs193557/Y</code>               | -     | A->Y  | R    | <code>INV_X2M_A9TL</code>   | 16     | 295   | 292   | 2786    |
|   | <code>u_fm_FMbufers_reg[1][504][1]/D</code> | -     | -     | R    | <code>EDFFQ_X1M_A9TL</code> | 16     | -     | 0     | 2786    |

Figure 5.2: Timing Analysis: in2reg

| # | Timing Point                      | Flags | Arc    | Edge | Cell                          | Fanout | Trans | Delay | Arrival |
|---|-----------------------------------|-------|--------|------|-------------------------------|--------|-------|-------|---------|
| # |                                   |       |        |      |                               | (ps)   | (ps)  | (ps)  |         |
| # | <code>u_fm_waddr_reg[6]/CK</code> | -     | -      | R    | (arrival)                     | 9      | 0     | 0     | 0       |
|   | <code>u_fm_waddr_reg[6]/QN</code> | -     | CK->QN | F    | <code>DFFRPQN_X2M_A9TL</code> | 4      | 160   | 318   | 318     |
|   | <code>g192255/Y</code>            | -     | A->Y   | R    | <code>INV_X6M_A9TL</code>     | 14     | 123   | 122   | 440     |
|   | <code>g317400/Y</code>            | -     | C->Y   | R    | <code>AND4_X0P5M_A9TL</code>  | 1      | 123   | 201   | 641     |
|   | <code>g317384/Y</code>            | -     | D->Y   | R    | <code>AND4_X0P5M_A9TL</code>  | 1      | 140   | 218   | 859     |
|   | <code>g317374/Y</code>            | -     | A->Y   | F    | <code>NAND3_X2M_A9TL</code>   | 9      | 258   | 193   | 1052    |
|   | <code>g317368/Y</code>            | -     | AN->Y  | F    | <code>NAND2B_X2M_A9TL</code>  | 6      | 179   | 262   | 1315    |
|   | <code>g125451/Y</code>            | -     | A->Y   | R    | <code>INV_X1M_A9TL</code>     | 1      | 109   | 118   | 1432    |
|   | <code>out_wait</code>             | -     | -      | R    | (port)                        | -      | -     | 0     | 1432    |

Figure 5.3: Timing Analysis: reg2out

| #                                                 | Timing Point | Flags  | Arc | Edge               | Cell | Fanout | Trans | Delay | Arrival |
|---------------------------------------------------|--------------|--------|-----|--------------------|------|--------|-------|-------|---------|
| #                                                 |              |        |     |                    |      | (ps)   | (ps)  | (ps)  |         |
| #-----                                            |              |        |     |                    |      |        |       |       |         |
| u_kmer_buffer/kmer_buffer_reg[14]/CK              | -            | -      | R   | (arrival)          | 3434 | 0      | 0     | 0     |         |
| u_kmer_buffer/kmer_buffer_reg[14]/0               | -            | CK->Q  | R   | DFFRPQN_X2M_A9TL   | 8    | 208    | 373   | 373   |         |
| u_hasher/const mul_27_13_g7523/Y                  | -            | A->Y   | F   | INV_X4M_A9TL       | 5    | 84     | 90    | 463   |         |
| u_hasher/const mul_27_13_g7499_7482/Y             | -            | A->Y   | R   | NOR2_X2A_A9TL      | 2    | 145    | 115   | 578   |         |
| u_hasher/const mul_27_13_g7566_4733/Y             | -            | B0->Y  | R   | AO21_X1M_A9TL      | 2    | 137    | 180   | 758   |         |
| u_hasher/const mul_27_13_g7361_7482/Y             | -            | A->Y   | R   | XOR2_X1M_A9TL      | 1    | 242    | 144   | 902   |         |
| u_hasher/const mul_27_13_g7261_5122/S             | -            | C1->S  | F   | ADDX1P4M_A9TL      | 2    | 132    | 368   | 1270  |         |
| u_hasher/const mul_27_13_g7258/Y                  | -            | A->Y   | R   | INV_X3M_A9TL       | 1    | 68     | 74    | 1344  |         |
| u_hasher/const mul_27_13_g7222_8428/Y             | -            | A->Y   | R   | XOR2_X3M_A9TL      | 3    | 201    | 96    | 1440  |         |
| u_hasher/const mul_27_13_g7183_1666/Y             | -            | AN->Y  | R   | NOR2B_X4M_A9TL     | 2    | 93     | 165   | 1684  |         |
| u_hasher/const mul_27_13_g7167_8246/Y             | -            | A->Y   | F   | NOR2_X3M_A9TL      | 2    | 73     | 74    | 1679  |         |
| u_hasher/const mul_27_13_g7133_6131/Y             | -            | B->Y   | R   | NAND2_X3M_A9TL     | 2    | 94     | 88    | 1767  |         |
| u_hasher/const mul_27_13_g7099_6161/Y             | -            | B->Y   | F   | NOR2_X3B_A9TL      | 1    | 89     | 92    | 1859  |         |
| u_hasher/const mul_27_13_g7080_8428/Y             | -            | A->Y   | F   | OR4_X6M_A9TL       | 4    | 71     | 136   | 1995  |         |
| u_hasher/const mul_27_13_g2_2802/Y                | -            | A1->Y  | F   | AO21_X2M_A9TL      | 2    | 72     | 169   | 2164  |         |
| u_hasher/const mul_27_13_g7053_3680/Y             | -            | B1->Y  | F   | AO1B2_X2M_A9TL     | 1    | 93     | 100   | 2384  |         |
| u_hasher/const mul_27_13_g7044_6417/Y             | -            | S0->Y  | R   | MXT2_X3M_A9TL      | 11   | 216    | 244   | 2547  |         |
| u_hasher/const mul_29_12_g9423/Y                  | -            | A->Y   | F   | INV_X2M_A9TL       | 9    | 151    | 153   | 2700  |         |
| u_hasher/const mul_29_12_g9263/Y                  | -            | A1->Y  | R   | AO1211_X1M_A9TL    | 1    | 287    | 243   | 2943  |         |
| u_hasher/const mul_29_12_g9185_5526/S             | -            | A->S   | F   | ADDX1M_A9TL        | 1    | 108    | 368   | 3311  |         |
| u_hasher/const mul_29_12_g9134_7098/S             | -            | C1->S  | R   | ADDX1P4M_A9TR      | 2    | 129    | 249   | 3761  |         |
| u_hasher/const mul_29_12_g9125_8428/Y             | -            | B->Y   | R   | OR2_X0PSM_A9TL     | 3    | 240    | 233   | 3994  |         |
| u_hasher/const mul_29_12_g9106_1881/Y             | -            | B->Y   | F   | NAND2_X1A_A9TL     | 2    | 145    | 154   | 4148  |         |
| u_hasher/const mul_29_12_g9082_9315/Y             | -            | A1->Y  | R   | AO1B2_X0P7M_A9TL   | 3    | 146    | 271   | 4418  |         |
| u_hasher/const mul_29_12_g9019_6161/Y             | -            | A0->Y  | F   | AO211_X0P7M_A9TL   | 1    | 85     | 226   | 4644  |         |
| u_hasher/const mul_29_12_g9010_1705/Y             | -            | B0->Y  | F   | AO1211_X1M_A9TL    | 5    | 152    | 195   | 4840  |         |
| u_hasher/const mul_29_12_g8976_3680/Y             | -            | B->Y   | R   | NOR2_X1P4A_A9TL    | 2    | 166    | 152   | 4991  |         |
| u_hasher/const mul_29_12_g8971/Y                  | -            | A->Y   | F   | INV_X1M_A9TL       | 1    | 73     | 81    | 5072  |         |
| u_hasher/const mul_29_12_g8968_2398/Y             | -            | A->Y   | R   | NAND2_X1B_A9TL     | 3    | 171    | 113   | 5185  |         |
| u_hasher/const mul_29_12_g8957_7482/Y             | -            | A1N->Y | R   | AO12XB1_X0P5M_A9TL | 1    | 265    | 167   | 5452  |         |
| u_hasher/const mul_29_12_g8944_5526/Y             | -            | B->Y   | R   | XOR2_X0P7M_A9TL    | 1    | 249    | 216   | 5669  |         |
| u_hasher/g83/Y                                    | -            | A->Y   | F   | INV_X1M_A9TL       | 3    | 135    | 146   | 5815  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1309_2802/Y | -            | B->Y   | R   | NOR2_X1B_A9TL      | 3    | 274    | 214   | 6029  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1287_1881/Y | -            | B0->Y  | R   | AO21_X1M_A9TL      | 2    | 109    | 207   | 6236  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1251_2802/Y | -            | B->Y   | R   | OR2_X0PSM_A9TL     | 3    | 221    | 216   | 6452  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1189_6783/Y | -            | A1N->Y | R   | AO12XB1_X1M_A9TL   | 2    | 233    | 198   | 6722  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1172_4733/Y | -            | A1->Y  | F   | AO122_X1M_A9TL     | 1    | 148    | 150   | 6872  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1148_2346/Y | -            | B0->Y  | F   | AO211_X0P7M_A9TL   | 4    | 182    | 254   | 7126  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1144_6161/Y | -            | A->Y   | R   | NOR3_X1M_A9TL      | 2    | 313    | 246   | 7372  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1138_7098/Y | -            | A1->Y  | R   | AO21B_X1M_A9TL     | 5    | 228    | 215   | 7667  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1129_8428/Y | -            | A1->Y  | R   | AO21_X1M_A9TL      | 4    | 184    | 148   | 7915  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1122_7410/Y | -            | B->Y   | F   | NAND2_X1B_A9TL     | 1    | 109    | 105   | 8020  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1104_3680/Y | -            | A1->Y  | F   | AO211_X0P7M_A9TL   | 2    | 127    | 246   | 8166  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1313_8246/Y | -            | A1->Y  | R   | AO121_X1M_A9TL     | 2    | 248    | 206   | 8304  |         |
| u_hasher/WALLACE_CSA_DUMMY_OP_group1_g1097_6260/Y | -            | A1N->Y | R   | AO12XB1_X1M_A9TL   | 1    | 256    | 294   | 8598  |         |
| g191607/Y                                         | -            | B->Y   | F   | AO21_X0P5M_A9TL    | 1    | 90     | 182   | 8780  |         |
| u_sorter_new_pack_r_reg[signature][31]/B          | -            | -      | F   | A2DFQ_X3M_A9TL     | 1    | -      | 0     | 8780  |         |

Figure 5.4: Timing Analysis: reg2reg

The frequency for a critical path can be calculated using the formula:

$$\text{Frequency} = \frac{1}{\text{Critical Path Delay}}$$

where the delay is converted from picoseconds (ps) to seconds (s).

**Path from u\_kmer\_buffer/kmer\_buffer\_reg[14]/CK:**

$$\text{Critical Path Delay} = 8780 \text{ ps} = 8780 \times 10^{-12} \text{ s}$$

$$\text{Frequency} = \frac{1}{8780 \times 10^{-12}} \approx 113.9 \text{ MHz}$$

## 5.5.2 Cell Types and Delays

The timing reports reveal the use of various standard cell types:

- Flip-flops: DFFRPQN\_X2M\_A9TL
- Logic gates: INV (inverters), AND4, NAND2B, NOR2, XOR2
- Complex gates: OAII1, AOI21, MXT2

Transition delays vary significantly between cells, ranging from 68 ps to 320 ps, depending on the cell type and fanout.

Clock-to-Q delays are visible in the reports, particularly for flip-flops:

- The DFFRPQN\_X2M\_A9TL cell shows a clock-to-Q delay of 318 ps.
- For the u\_kmer\_buffer component, there's a notable clock-to-Q delay of 373 ps.

### 5.5.3 Fanout Analysis

Fanout values are reported for each cell in the critical paths:

- Most cells have relatively low fanout (1-4), which is good for maintaining signal integrity.
- Some cells, like MXT2\_X3M\_A9TL, have higher fanout (11), which may contribute to increased delays.

### 5.5.4 Conclusion

The timing analysis reveals complex path structures with varying delays across different components of the design. The longest path observed has an arrival time of 8780 ps, which could be a potential bottleneck for overall system performance. Further optimization may be necessary, focusing on cells with high fanout or long propagation delays to improve the overall timing of the design.

## 5.6 Power Estimation

The power estimation report generated by the Joules engine provides detailed information on the leakage, internal, and switching power of different components within the design. The total power consumption is broken down by different categories, as shown in the table below.

### 5.6.1 Component Power Breakdown

The total power consumption of the design is approximately 8.99 mW. The breakdown of power consumption by category is as follows:

- **Registers:** Registers account for 34.60% of the total power, consuming 3.11 mW. This includes 20.18 nW in leakage power, 2.51 mW in internal power, and 0.58 mW due to switching.
- **Logic:** Logic gates dominate the power consumption, contributing 65.32% of the total power at 5.87 mW. This includes 54.47 nW in leakage power, 2.00 mW in internal power, and 3.81 mW due to switching.
- **Latches:** Latches have a minimal impact on the overall power consumption, contributing only 0.06% or 5.06  $\mu$ W.
- **Clock:** The clock network consumes only 2.07  $\mu$ W, which accounts for 0.02% of the total power. This includes minimal leakage and internal power.

From the investigation of the report it appears that the power for the memory does not exist, but this result was obtained because the memory was defined in the FM component as a register. In order to get a stand on the power distribution, we will extract from the power report of the FM component its total power and estimate that it is the total power for memory and the remainder is the total power for the registers. This assessment is based on the fact that the absolute majority of the memory in the overall system is in the FM component and also that the logic in it can be balanced in its power consumption for other memories in the system. The FM component requires a total power of approximately 3.21 mW, which we will assume represents the total power consumption of the memory in the system.

With the assumption that the FM component power represents the memory's power consumption, we can now distribute the remaining power ( $8.99 \text{ mW} - 3.21 \text{ mW} = 5.77 \text{ mW}$ ) among the registers and logic. The revised power breakdown is as follows:

- **Memory (in FM component):** Consumes 3.21 mW, or 35.76% of the total system power.
- **Registers (outside FM component):** The remaining registers now consume 1.81 mW, accounting for 20.20% of the total system power.
- **Logic:** Logic gates outside the FM component consume 3.96 mW, which accounts for 44.04% of the total system power.

### 5.6.2 Power Distribution

The power report shows that dynamic power (comprising internal and switching power) accounts for the majority of the total power consumption:

- **Internal power** represents 50.25% of the total power, mainly driven by register and logic activities.
- **Switching power** represents 48.92%, indicating that a significant amount of power is consumed by signal transitions in the logic and registers.
- **Leakage power** accounts for only 0.83% of the total power, which is typical for designs using modern low-power technologies.

The table provides a comprehensive overview of the power distribution, with the majority of power being consumed by the logic cells due to high switching activity.

## 5.7 Optimization Results

The synthesis process includes several optimization stages:

1. Generic synthesis (syn\_generic)
2. Technology mapping (syn\_map)
3. Post-synthesis optimization (syn\_opt)

We did both physical-aware and non-physical-aware synthesis, controlled by the phys\_synth\_type variable. Physical-aware synthesis can potentially improve the results by considering the placement information during optimization. We set optimization efforts to "low" for all stages:

```
set_db syn_generic_effort low
set_db syn_map_effort low
set_db syn_opt_effort low
```

# Chapter 6

## Summary & Discussion

In this chapter, we present a comprehensive summary of the key findings from our project, followed by an in-depth discussion of their implications within the broader context of existing literature and future research directions.

Our study has yielded several significant outcomes, chief among them being the successful implementation of the ACMI component. This novel hardware solution has demonstrated a marked improvement in efficiency when integrated within the VIRAL algorithm, a cutting-edge approach in machine learning. The results of our extensive testing and analysis indicate that ACMI provides both high accuracy and impressive speed in genome sequencing tasks, particularly when compared to current state-of-the-art methods such as UShER and Kraken2.

We have meticulously documented the functionality of each sub-component within ACMI, as well as the performance of the integrated system as a whole. The design was proposed and implemented using the SystemVerilog hardware description language, a choice that allowed for efficient hardware modeling and verification. We have included the associated code and test files in our documentation, providing a thorough discussion of the results obtained from each stage of the development process.

### 6.1 Discussion of Results

Our findings strongly suggest that the ACMI component effectively addresses the significant challenges posed by large-scale genome data processing, a critical issue in modern genomics research. The integration of ACMI with the Vision Transformer algorithm has yielded substantial enhancements in both the speed and accuracy of lineage assignment. This improvement is particularly evident in the detailed power and timing analyses we conducted, which revealed impressive performance metrics.

When compared to existing tools like UShER and Kraken2, the ACMI component demonstrated superior performance, especially at lower genome coverage levels. This characteristic makes ACMI a particularly promising tool for future genomics research, as it could potentially enable more efficient analysis of partial or low-quality genomic data, a common challenge in real-world applications.

To further validate our design, we performed a comprehensive synthesis of the ACMI component. This process allowed us to extract crucial information about its size, determine its potential operating frequency, and estimate its power consumption under various conditions. The synthesis results aligned remarkably well with our initial expectations, confirming the effectiveness of the various optimizations we applied during the design and implementation phases.

Our analysis revealed that the ACMI component is not only resource-efficient but also capable of operating at high frequencies. This combination of efficiency and performance is crucial for real-world applications in genomics, where large datasets need to be processed quickly and accurately. The viability of our hardware design was thus confirmed, paving the way for potential future implementations in specialized genomic analysis hardware.

## 6.2 Conclusion

In conclusion, the successful implementation of the ACMI component represents a significant leap forward in the field of computational genomics. The results of our comprehensive study clearly demonstrate its potential for substantially improving both the efficiency and accuracy of genome sequencing processes. As genomic data continues to grow in volume and complexity, tools like ACMI will become increasingly vital in enabling researchers to extract meaningful insights from this wealth of information.

By providing a detailed showcase of the functionality of each sub-component and the ACMI system as a whole, we have offered a thorough understanding of the intricacies involved in this hardware implementation. The synthesis results, which closely matched our expectations in terms of size, operating frequency, and power consumption, further underscore the success of this project and validate our approach to hardware-accelerated genomics.

This work lays a solid foundation for further exploration into the realm of hardware-accelerated genomics. It opens up new avenues for high-performance computing in biology, potentially revolutionizing how we approach complex genomic analyses. Future research could focus on further optimizing the ACMI design, exploring its application in other areas of computational biology, or investigating its potential integration with other cutting-edge algorithms in machine learning and artificial intelligence.

# Chapter 7

## Appendices

### Top

top.sv

```
1 `timescale 1ns / 1ps
2 import proj_pkg::*;
3 module proj_top (
4     input wire clk,
5     input wire rst_n,
6     input wire [proj_pkg::BASE_LEN-1:0] in_data,
7     output wire [proj_pkg::FRAG_LEN-1:0] out_fragment,
8     output wire out_wait
9 );
10    // Internal signals
11    wire [proj_pkg::INDICE_LEN-1:0] counter_index;
12    wire counter_finished;
13    wire [proj_pkg::KMER_LEN*proj_pkg::BASE_LEN-1:0] kmer;
14    wire kmer_buffer_full;
15    wire [proj_pkg::HASHER_SORTER_SIGNATURE-1:0] signature;
16    wire [proj_pkg::SORTER_EXTENDER_INDICES_COUNT-1:0]
17    [proj_pkg::INDICE_LEN-1:0] sorted_indices;
18    wire sort_valid;
19    wire [proj_pkg::SIGNED_INDICE_LEN-1:0] extender_index;
20    wire [proj_pkg::FM_EXTENDER_FRAG_LEN_BITS-1:0] fm_fragment;
21    // FM module
22    proj_fm #(
23        .BUFFER_COUNT(proj_pkg::FM_BUFFER_COUNT),
24        .RAMS(proj_pkg::FM_RAMs_COUNT),
25        .ENTRIES(proj_pkg::FM_ENTRIES_COUNT),
26        .OFFSET(proj_pkg::FM_OFFSET_COUNT),
27        .DATA_BITS(proj_pkg::FM_DATA_BITS),
28        .INDICE_LEN(proj_pkg::INDICE_LEN),
29        .SIGNED_INDICE_LEN(proj_pkg::SIGNED_INDICE_LEN),
30        .FRAG_LEN(proj_pkg::FM_EXTENDER_FRAG_LEN_BITS)
31    ) u_fm (
32        .clk(clk),
33        .rst_n(rst_n),
34        .in_wdata(in_data),
35        .chg_idx(counter_finished),
36        .frag_idx(extender_index),
```

```

37     .out_rdata(fm_fragment),
38     .out_wait(out_wait)
39   );
40   // Counter module
41   proj_counter#(
42     .INDICE_LEN(proj_pkg::INDICE_LEN)
43   ) u_counter
44   (
45     .clk(clk),
46     .rst_n(rst_n),
47     .start(kmer_buffer_full),
48     .index(counter_index),
49     .finished_count(counter_finished)
50   );
51   // Kmer Buffer module
52   proj_kmer_buffer #(
53     .DATA_BITS(proj_pkg::KMER_BUFFER_BITS),
54     .KMER_LEN(proj_pkg::KMER_BUFFER_LEN),
55     .OUT_KMER(proj_pkg::KMER_BUFFER_LEN *
56       ↪ proj_pkg::KMER_BUFFER_BITS)
57   ) u_kmer_buffer (
58     .clk(clk),
59     .rst_n(rst_n),
60     .in_data(in_data),
61     .out_kmer(kmer),
62     .start_over(counter_finished),
63     .full(kmer_buffer_full)
64   );
65   // Hasher module
66   proj_hasher #(
67     .KMER_LEN(proj_pkg::KMER_LEN),
68     .DATA_BITS(proj_pkg::BASE_LEN),
69     .HASHER_DATA_BITS(proj_pkg::HASHER_SORTER_SIGNATURE)
70   ) u_hasher (
71     .clk(clk),
72     .rst_n(rst_n),
73     .seed(32'hac718add),
74     .kmer(kmer),
75     .signature(signature)
76   );
77   // Sorter module
78   proj_sorter #(
79     .INDICES_COUNT(proj_pkg::SORTER_EXTENDER_INDICES_COUNT),
80     .INDICE_LEN(proj_pkg::INDICE_LEN),
81     .SIGNATURE_LEN(proj_pkg::HASHER_SORTER_SIGNATURE),
82     .POSITION_LEN(proj_pkg::SORTER_POSITION_LEN)
83   ) u_sorter (
84     .clk(clk),
85     .rst_n(rst_n),
86     .in_signature(signature),
87     .in_index(counter_index),
88     .out_smallest_idx(sorted_indices),
89     .end_sorting(counter_finished),
     .sort_valid(sort_valid)

```

```

90    );
91    // Extender module
92    proj_extender #(
93        .FRAG_LEN_BITS(proj_pkg::FM_EXTENDER_FRAG_LEN_BITS),
94        .FRAG_SIZE(proj_pkg::FRAG_LEN),
95        .KMER_SIZE(proj_pkg::KMER_LEN),
96        .INDICES_COUNT(proj_pkg::SORTER_EXTENDER_INDICES_COUNT),
97        .INDICE_LEN(proj_pkg::INDICE_LEN),
98        .SIGNED_INDICE_LEN(proj_pkg::SIGNED_INDICE_LEN),
99        .FRAG_PART_ONE_HOT(proj_pkg::EXTENDER_OUT_PART_LEN_ONE_HOT),
100       .BASE_LEN(proj_pkg::BASE_LEN),
101       .ONE_HOT_LEN(proj_pkg::ONE_HOT_LEN),
102       .FRAG_PART(proj_pkg::EXTENDER_OUT_PART_LEN)
103   ) u_extender(
104       .clk(clk),
105       .rst_n(rst_n),
106       .in_fragment(fm_fragment),
107       .in_kmer_indices(sorted_indices),
108       .valid_indices(sort_valid),
109       .out_index(extender_index),
110       .out_gfm(out_fragment)
111   );
112 endmodule

```

### proj\_pkg.sv

```

1 // Filename: proj_pkg.svh
2
3 // Define the package
4 package proj_pkg;
5
6     // Generic parameters
7     // Base length
8     parameter BASE_LEN = 2;
9     // Kmer length in Bases
10    parameter KMER_LEN = 16;
11    // Fragment length in bases
12    parameter FRAG_LEN = 256;
13    // single one hot encoding base len
14    parameter ONE_HOT_LEN = 4;
15
16    // FM parameters
17    parameter FM_DATA_BITS = BASE_LEN;
18    // Number of buffers in the FM
19    parameter FM_BUFFER_COUNT = 2;
20    // Number of RAMs in each buffer
21    parameter FM_RAMs_COUNT = 8;
22    // Number of entries in each RAM
23    parameter FM_ENTRIES_COUNT = 128;
24    // Size of the offset in each entry
25    parameter FM_OFFSET_COUNT = 32;
26    // Size of each buffer
27    parameter FM_BUFFER_SIZE = FM_RAMs_COUNT *

```

```

28      ↪ FM_ENTRIES_COUNT * FM_OFFSET_COUNT;
29 // Size of the FM - Extender fragment, in bits
30 parameter FM_EXTENDER_FRAG_LEN_BITS = BASE_LEN * FRAG_LEN;
31 // Length of a certain index
32 parameter INDICE_LEN = $clog2(FM_BUFFER_SIZE);
33 // Length of a certain signed index
34 parameter SIGNED_INDICE_LEN = INDICE_LEN+1;
35
36 // Kmer Buffer parametres
37 parameter KMER_BUFFER_BITS = BASE_LEN;
38 parameter KMER_BUFFER_LEN = KMER_LEN;
39
40 // Sorter and Extender parameters
41 parameter SORTER_EXTENDER_INDICES_COUNT = 256;
42
43 // Sorter and Hasher parameters
44 parameter HASHER_SORTER_SIGNATURE = 32;
45
46 // Sorter parameters and structures
47 parameter SORTER_POSITION_LEN =
48     ↪ $clog2(SORTER_EXTENDER_INDICES_COUNT);
49 parameter POSITION_LEN = SORTER_POSITION_LEN;
50 typedef struct packed {
51     logic [HASHER_SORTER_SIGNATURE-1:0] signature;
52     logic [INDICE_LEN-1:0] index;
53 } signature_index_pack;
54
55 // Extender parameters
56 parameter EXTENDER_OUT_PART_COUNT = 2;
57 parameter EXTENDER_OUT_PART_LEN = EXTENDER_OUT_PART_COUNT
58     ↪ * BASE_LEN;
59 parameter EXTENDER_OUT_PART_LEN_ONE_HOT =
60     ↪ EXTENDER_OUT_PART_COUNT * ONE_HOT_LEN;
61
62 endpackage

```

### proj\_top\_tb.sv

```

1 `timescale 1ns / 1ps
2 import proj_pkg::*;
3
4 module proj_top_tb;
5
6     // Testbench signals
7     reg clk;
8     reg rst_n;
9     reg [proj_pkg::BASE_LEN-1:0] in_data;
10    wire [proj_pkg::EXTENDER_OUT_PART_LEN_ONE_HOT-1:0]
11        ↪ out_fragment;
12    wire out_wait;
13
14    // Instantiate the DUT (Device Under Test)
15    proj_top dut (
16        .clk(clk),

```

```

16      .rst_n(rst_n),
17      .in_data(in_data),
18      .out_fragment(out_fragment),
19      .out_wait(out_wait)
20  );
21
22  // Clock generation
23  always #5 clk = ~clk;
24
25  // Testbench stimulus
26  initial begin
27      // Initialize signals
28      clk = 0;
29      rst_n = 0;
30      in_data = 0;
31
32      // Reset
33      #20 rst_n = 1;
34
35      // Test with random data
36      repeat (1000) begin
37          @ (posedge clk);
38          if (!out_wait) begin
39              in_data = $random;
40          end
41      end
42
43      // End simulation
44      #100 $finish;
45  end
46
47  // Monitor outputs
48  always @ (posedge clk) begin
49      if (!out_wait) begin
50          $display("Time %t: in_data = %h, out_fragment = %h", $time, in_data, out_fragment);
51      end
52  end
53
54 endmodule

```

## Fragment Memory

|                                                                                                                                                                                                                                         |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| proj_fm.sv                                                                                                                                                                                                                              |
| <pre> 1 `timescale 1ns / 1ps 2 3 module proj_fm #( 4     // Module parameters 5     parameter BUFFER_COUNT = proj_pkg::FM_BUFFER_COUNT, 6         // Number of buffers in the FM 7     parameter RAMS = proj_pkg::FM_RAMs_COUNT, </pre> |

```

    ↵ // Number of RAMs in each buffer
7   parameter ENTRIES = proj_pkg::FM_ENTRIES_COUNT,
    ↵           // Number of entries in each RAM
8   parameter OFFSET = proj_pkg::FM_OFFSET_COUNT,
    ↵ // Size of the offset in each entry
9   parameter DATA_BITS = proj_pkg::FM_DATA_BITS,           //
    ↵ Width of each memory cell
10  parameter INDICE_LEN = proj_pkg::INDICE_LEN,             //
    ↵ Length of the index
11  parameter SIGNED_INDICE_LEN = proj_pkg::SIGNED_INDICE_LEN,
    ↵ // Length of signed index
12  parameter FRAG_LEN = proj_pkg::FM_EXTENDER_FRAG_LEN_BITS
    ↵ // Length of the fragment - in bits!
13 ) (
14   // Module ports
15   input wire                               clk,      // Clock signal
16   input wire                               rst_n,    // Reset signal
17   ↵ (active low)
18   input wire [DATA_BITS-1:0]               in_wdata, // Input data
19   input wire                               chg_idx,  // Change
    ↵ index signal
20   input logic [SIGNED_INDICE_LEN-1:0] frag_idx, // Fragment
    ↵ index
21   output wire [FRAG_LEN-1:0]              out_rdata, // Output
    ↵ data
22   output wire                           out_wait // wait signal
    ↵ for the module before the ACMI
23 );
24
25 // Local parameters
26 localparam FM_BUFFER_SIZE = 32; // RAMS * ENTRIES *
    ↵ OFFSET = 2 * 8 * 2
27 localparam RAM_ADDR_BITS = 1;    // $clog2(RAMS) =
    ↵ $clog2(2)
28 localparam ENTRIES_ADDR_BITS = 3; // $clog2(ENTRIES) =
    ↵ $clog2(8)
29 localparam OFFSET_ADDR_BITS = 1; // $clog2(OFFSET) =
    ↵ $clog2(2)
30 localparam ADDR_BITS = 5;        // RAM_ADDR_BITS +
    ↵ ENTRIES_ADDR_BITS + OFFSET_ADDR_BITS
31
32 // Internal signals
33 logic clk, rst_n;
34 logic [ADDR_BITS-1:0] waddr, waddr_next;
35 logic [DATA_BITS-1:0] wdata;
36 logic
    ↵ [BUFFER_COUNT-1:0] [FM_BUFFER_SIZE-1:0] [DATA_BITS-1:0]
    ↵ FMbuffers;
37 logic end_addr, hold_cond, rst_addr;
38 logic wr_idx, rd_idx;
39 logic [FRAG_LEN-1:0] padded_fragment;
40 logic [INDICE_LEN-1:0] raddr, zeros_count;
41 logic [SIGNED_INDICE_LEN-1:0] flip_frag_idx;
42 logic we;

```

```

42
43 // Input assignments
44 assign wdata = hold_cond ? FMbuffers[wr_idx][waddr] :
45   ↪ in_wdata;
46
47 // Output assignment
48 assign out_rdata = padded_fragment;
49
50 assign out_wait = hold_cond;
51
52 // Address and control logic
53 assign waddr_next = end_addr ? 1'b0 : waddr + 1'b1;
54 assign end_addr = (waddr == (FM_BUFFER_SIZE-1)) ? 1'b1 :
55   ↪ 1'b0;
56 assign hold_cond = end_addr & ~chg_idx;
57 assign rst_addr = end_addr & chg_idx;
58 assign rd_idx = ~wr_idx;
59 assign we = ~hold_cond || end_addr;
60
61 // Fragment preparation logic
62 always_comb begin
63   padded_fragment = '0;
64   zeros_count = '0;
65   flip_frag_idx = '0;
66
67   if (frag_idx[SIGNED_INDICE_LEN-1] == 1'b1) begin
68     flip_frag_idx = (~frag_idx + 1'b1);
69     zeros_count = flip_frag_idx[INDICE_LEN-1:0];
70     raddr = '0;
71   end else begin
72     raddr = frag_idx[INDICE_LEN-1:0];
73   end
74 end
75
76 genvar i;
77 generate
78   for (i = 0; i < 16; i++) begin : gen_padded_fragment
79     ↪ // FRAG_LEN is 16
80     always_comb begin
81       if (i < (16 - zeros_count) && (raddr + i >= 0)
82         ↪ && (raddr + i < 32)) begin // 
83         ↪ FM_BUFFER_SIZE is 32
84         padded_fragment[i + (zeros_count << 1)] =
85           ↪ FMbuffers[rd_idx][raddr + (i >
86             ↪ 1)][i[0]];
87       end else begin
88         padded_fragment[i + (zeros_count << 1)] =
89           ↪ 1'b0;
90       end
91     end
92   end
93 endgenerate
94
95 // Sequential logic

```

```

88    always_ff @(posedge clk or negedge rst_n) begin
89        if (~rst_n) begin
90            wr_idx <= '0;
91            waddr <= '0;
92        end else begin
93            // Change index logic
94            if (chg_idx) begin
95                wr_idx <= rd_idx;
96            end
97
98            // Write address logic
99            if (hold_cond) begin
100                waddr <= waddr;
101            end else if (rst_addr) begin
102                waddr <= '0;
103            end else begin
104                waddr <= waddr_next;
105            end
106
107            // Write data to buffers
108            if (we) begin
109                FMbuffers[wr_idx][waddr] <= wdata;
110            end
111        end
112    end
113
114 endmodule

```

### proj\_fm\_tb.sv

```

1 `timescale 1ns / 1ps
2 import proj_pkg::*;
3
4 module proj_fm_tb();
5
6     // Parameters
7     localparam BUFFER_COUNT = proj_pkg::FM_BUFFER_COUNT;
8         // Number of buffers in the FM
9     localparam RAMS = proj_pkg::FM_RAMs_COUNT;
10        // Number of RAMs in each buffer
11     localparam ENTRIES = proj_pkg::FM_ENTRIES_COUNT;
12         // Number of entries in each RAM
13     localparam OFFSET = proj_pkg::FM_OFFSET_COUNT;
14         // Size of the offset in each entry
15     localparam DATA_BITS = proj_pkg::FM_DATA_BITS;
16         // Width of each memory cell
17     localparam INDICE_LEN = proj_pkg::INDICE_LEN;           //
18         // Length of the index
19     localparam SIGNED_INDICE_LEN =
20         proj_pkg::SIGNED_INDICE_LEN; // Length of signed
21         index
22     localparam FRAG_LEN = proj_pkg::FM_EXTENDER_FRAG_LEN_BITS;
23         // Length of the fragment - in bits!

```

```

15     localparam FM_BUFFER_SIZE = proj_pkg::FM_BUFFER_SIZE;
16     localparam KMER_BUFFER_SIZE = proj_pkg::KMER_LEN;
17     localparam TESTS_NUM = 10;
18
19     // Signals
20     logic [DATA_BITS-1:0] in_wdata;
21     logic [DATA_BITS-1:0] data;
22     logic [FRAG_LEN-1:0] out_rdata;
23     logic clk;
24     logic rst_n;
25     logic chg_idx;
26     logic [SIGNED_INDICE_LEN-1:0] frag_idx;
27
28     // Instantiate the DUT
29     proj_fm #(
30         .BUFFER_COUNT(BUFFER_COUNT),
31         .RAMS(RAMS),
32         .ENTRIES(ENTRIES),
33         .OFFSET(OFFSET),
34         .DATA_BITS(DATA_BITS),
35         .INDICE_LEN(INDICE_LEN),
36         .SIGNED_INDICE_LEN(SIGNED_INDICE_LEN),
37         .FRAG_LEN(FRAG_LEN)
38     ) dut (
39         .in_wdata(in_wdata),
40         .out_rdata(out_rdata),
41         .clk(clk),
42         .rst_n(rst_n),
43         .chg_idx(chg_idx),
44         .frag_idx(frag_idx)
45     );
46
47     // Clock generation
48     always #5 clk = ~clk;
49
50     // Testbench logic
51     initial begin
52         // Initialize signals
53         clk = 0;
54         rst_n = 0;
55         in_wdata = 0;
56         chg_idx = 0;
57         frag_idx = 0;
58
59         // Reset
60         #10 rst_n = 1;
61
62
63     for (int test = 0; test < TESTS_NUM; test++) begin
64         $display("Starting test %0d", test);
65
66         // Write to the first buffer
67         for (int i = 0; i < FM_BUFFER_SIZE; i++) begin
68             @ (negedge clk);

```

```

69      in_wdata = $random & ((1 << DATA_BITS) - 1);
70  end
71 // Wait KMER_BUFFER_SIZE cycles after writing to the
72 //   ↪ first buffer
73 repeat(KMER_BUFFER_SIZE) @ (posedge clk);
74 // Assert chg_idx on the negedge of the clock
75 @ (negedge clk);
76 chg_idx = 1;
77 // Deassert chg_idx on the next negedge
78 @ (negedge clk);
79 chg_idx = 0;

80 // Test reading from both buffers
81 for (int i = 0; i < BUFFER_COUNT; i++) begin
82     frag_idx = 0;
83     @ (negedge clk);
84     $display("Test_%0d,_Buffer_%0d,_frag_idx=%0d,_
85         ↪ out_rdata=%b", test, i, frag_idx,
86         ↪ out_rdata);
87     frag_idx = FM_BUFFER_SIZE - FRAG_LEN;
88     @ (negedge clk);
89     $display("Test_%0d,_Buffer_%0d,_frag_idx=%0d,_
90         ↪ out_rdata=%b", test, i, frag_idx,
91         ↪ out_rdata);
92     // Test negative index
93     frag_idx = -2;
94     @ (negedge clk);
95     $display("Test_%0d,_Buffer_%0d,_frag_idx=%0d,_
96         ↪ out_rdata=%b", test, i, $signed(frag_idx),
97         ↪ out_rdata);
98 end

99 // Write to the second buffer
100 for (int i = 0; i < FM_BUFFER_SIZE; i++) begin
101     @ (negedge clk);
102     in_wdata = $random & ((1 << DATA_BITS) - 1);
103 end
104 // Wait KMER_BUFFER_SIZE cycles after writing to the
105 //   ↪ first buffer
106 repeat(KMER_BUFFER_SIZE) @ (posedge clk);
107 // Assert chg_idx on the negedge of the clock
108 @ (negedge clk);
109 chg_idx = 1;
110 // Deassert chg_idx on the next negedge
111 @ (negedge clk);
112 chg_idx = 0;
113 // Test reading from both buffers
114 for (int i = 0; i < BUFFER_COUNT; i++) begin
115     frag_idx = 0;
116     @ (negedge clk);
117     $display("Test_%0d,_Buffer_%0d,_frag_idx=%0d,_
118         ↪ out_rdata=%b", test, i, frag_idx,
119         ↪ out_rdata);

```

```

113     frag_idx = FM_BUFFER_SIZE - FRAG_LEN;
114     @ (negedge clk);
115     $display("Test_%0d,_Buffer_%0d,_frag_idx=%0d,_
116             ↪ out_rdata=%b", test, i, frag_idx,
117             ↪ out_rdata);
118     // Test negative index
119     frag_idx = -2;
120     @ (negedge clk);
121     $display("Test_%0d,_Buffer_%0d,_frag_idx=%0d,_
122             ↪ out_rdata=%b", test, i, $signed(frag_idx),
123             ↪ out_rdata);
124     end
125
126     $display("Finished_test_%0d", test);
127 end
128
129 endmodule

```

## Hasher

**proj\_hasher.sv**

```

1 `timescale 1ns / 1ps
2
3 module proj_hasher
4 #(
5   parameter KMER_LEN = proj_pkg::KMER_LEN,
6   parameter DATA_BITS = proj_pkg::BASE_LEN,
7   parameter HASHER_DATA_BITS =
8     ↪ proj_pkg::HASHER_SORTER_SIGNATURE
9 ) (
10   input wire                               clk,      // Clock signal
11   input wire                               rst_n,    // Reset signal
12   ↪ (active low)
13   input wire [HASHER_DATA_BITS-1:0] seed,
14   input wire [HASHER_DATA_BITS-1:0] kmer,
15   output logic [HASHER_DATA_BITS-1:0] signature
16 );
17
18 // Intermediate signals
19 logic [HASHER_DATA_BITS:0] k, key;
20
21 localparam [HASHER_DATA_BITS-1:0] c1 = 32'hcc9e2d51;
22 localparam [HASHER_DATA_BITS-1:0] c2 = 32'h1b873593;
23 localparam [HASHER_DATA_BITS-1:0] m = 32'd5;
24 localparam [HASHER_DATA_BITS-1:0] n = 32'he6546b64;
25
26 // Hash computation
27 always_comb begin

```

```

26     k = kmer;
27     k = k * c1;
28     k = {k[HASHER_DATA_BITS-16:0],
29           ↪ k[HASHER_DATA_BITS-1:HASHER_DATA_BITS-15]}; // 
30           ↪ ROL15
31     k = k * c2;
32     key = seed;
33     key = key ^ k;
34     key = {key[HASHER_DATA_BITS-14:0],
35           ↪ key[HASHER_DATA_BITS-1:HASHER_DATA_BITS-13]}; // 
36           ↪ ROL13
37     signature = key * m + n;
38   end
39
40 endmodule

```

### proj\_hasher\_tb.sv

```

1 `timescale 1ns / 1ps
2 import proj_pkg::*; // Include the package
3 module proj_hasher_tb;
4
5   parameter KMER_LEN = proj_pkg::KMER_LEN;
6   parameter BASE_BITS = proj_pkg::BASE_LEN;
7   parameter HASHER_DATA_BITS =
8     ↪ proj_pkg::HASHER_SORTER_SIGNATURE;
9
10  reg [HASHER_DATA_BITS-1:0] seed_tb;
11  reg [HASHER_DATA_BITS-1:0] kmer_tb;
12  wire [HASHER_DATA_BITS-1:0] signature_tb;
13
14  proj_hasher #( .HASHER_DATA_BITS(HASHER_DATA_BITS) )
15    dut (
16      .seed(seed_tb),
17      .kmer(kmer_tb),
18      .signature(signature_tb)
19    );
20
21  initial begin
22    // Initialize Inputs
23    seed_tb = 0;
24    kmer_tb = 0;
25    #10;
26
27    // Apply Test Vectors
28    seed_tb = 32'hac718add;
29    kmer_tb = 32'hab1020c5;
30
31    #10;
32
33    // Display the result
34    $display("Seed: %h, Kmer: %h, Signature: %h", seed_tb,
35           ↪ kmer_tb, signature_tb);

```

```

34
35
36 // Test reading from both buffers
37 for (int i = 0; i < HASHER_DATA_BITS; i++) begin
38   kmer_tb = $random ;
39
40   #10;
41
42 // Display the result
43 $display("Seed:_%h,_Kmer:_%h,_Signature:_%h", seed_tb,
44           ↪ kmer_tb, signature_tb);
45 end
46 #10;
47 $finish;
48 end
49 endmodule

```

## Sorter

### proj\_sorter.sv

```

1 import proj_pkg::*;

2 module proj_sorter #(
3   // Module parameters, using values from the project package
4   parameter INDICES_COUNT =
5     ↪ proj_pkg::SORTER_EXTENDER_INDICES_COUNT,
6   parameter INDICE_LEN = proj_pkg::INDICE_LEN,
7   parameter SIGNATURE_LEN =
8     ↪ proj_pkg::HASHER_SORTER_SIGNATURE,
9   parameter POSITION_LEN = proj_pkg::SORTER_POSITION_LEN
10 ) (
11   // Module inputs and outputs
12   input wire [SIGNATURE_LEN-1:0] in_signature,
13   input wire [INDICE_LEN-1:0] in_index,
14   output wire [INDICES_COUNT-1:0][INDICE_LEN-1:0]
15     ↪ out_smallest_idx,
16   input wire rst_n,
17   input wire end_sorting,
18   output wire sort_valid,
19   input wire clk
20 );
21   // Internal signals
22   signature_index_pack [INDICES_COUNT-1:0] smallest_idx_next;
23   signature_index_pack [INDICES_COUNT-1:0] smallest_idx_curr;
24   signature_index_pack new_pack;
25   logic [INDICES_COUNT-1:0] count_signatures_smaller_than;
26   logic [POSITION_LEN:0] position_smaller_than;
27   logic [POSITION_LEN:0] new_position_long;
28   logic [POSITION_LEN-1:0] new_position;
29
30   // Assign input values to internal signals

```

```

29     assign new_pack.signature = in_signature;
30     assign new_pack.index = in_index;
31
32
33     always_comb begin
34         count_signatures_smaller_than = '0;
35         position_smaller_than = '0;
36         new_position = '0;
37         new_position_long = '0;
38         for (int i = 0; i < INDICES_COUNT; i= i+1) begin
39             count_signatures_smaller_than[i] =
39                 ↪ (new_pack.signature <
39                  ↪ smallest_idx_curr[i].signature) ? 1'b1 :
39                  ↪ 1'b0;
40             position_smaller_than +=
40                 ↪ count_signatures_smaller_than[i];
41             if (position_smaller_than == '0) begin
42                 smallest_idx_next[i].signature =
42                     ↪ smallest_idx_curr[i].signature;
43                 smallest_idx_next[i].index =
43                     ↪ smallest_idx_curr[i].index;
44             end else begin
45                 if (i < new_position) begin
46                     smallest_idx_next[i].signature =
46                         ↪ smallest_idx_curr[i].signature;
47                     smallest_idx_next[i].index =
47                         ↪ smallest_idx_curr[i].index;
48                 end else if (i > new_position) begin
49                     smallest_idx_next[i].signature =
49                         ↪ smallest_idx_curr[i-1].signature;
50                     smallest_idx_next[i].index =
50                         ↪ smallest_idx_curr[i-1].index;
51                 end
52             end
53         end
54         new_position_long = INDICES_COUNT -
54             ↪ position_smaller_than;
55         new_position = new_position_long[POSITION_LEN-1:0];
56     end
57
58
59 // Assign sorted indices to output
60 generate
61     for (genvar j = 0; j < INDICES_COUNT; j++) begin
62         assign out_smallest_idx[j] = end_sorting ?
62             ↪ smallest_idx_curr[j].index : 'x;
63     end
64 endgenerate
65
66 assign sort_valid = end_sorting ? 1'b1 : 1'b0;
67
68 // Sequential logic for updating current smallest indices
69 always_ff @(posedge clk) begin
70     for (int i = 0; i < INDICES_COUNT; i++) begin

```

```

71      if (~rst_n) begin
72          // Reset values
73          smallest_idx_curr[i].signature <= '1;
74          smallest_idx_curr[i].index <= '0;
75      end else begin
76          // Update with new values
77          if ((i == new_position) &
78              ↪ (position_smaller_than != '0)) begin
79              smallest_idx_curr[i].signature <=
80                  ↪ new_pack.signature;
81              smallest_idx_curr[i].index <=
82                  ↪ new_pack.index;
83          end else begin
84              smallest_idx_curr[i].signature <=
85                  ↪ smallest_idx_next[i].signature;
86              smallest_idx_curr[i].index <=
87                  ↪ smallest_idx_next[i].index;
88          end
89      end
90  end
91 endmodule

```

### proj\_sorter\_tb.sv

```

1 `timescale 1ns / 1ps
2 import proj_pkg::*;
3
4 module proj_sorter_tb;
5     // Parameters
6     localparam INDICES_COUNT =
7         ↪ proj_pkg::SORTER_EXTENDER_INDICES_COUNT;
8     localparam INDICE_LEN = proj_pkg::INDICE_LEN;
9     localparam SIGNATURE_LEN =
10        ↪ proj_pkg::HASHER_SORTER_SIGNATURE;
11
12     // Inputs
13     logic [SIGNATURE_LEN-1:0] in_signature;
14     logic [INDICE_LEN-1:0] in_index;
15     logic in_rst_n;
16     logic in_clk;
17     logic end_sorting;
18
19     // Outputs
20     logic [INDICES_COUNT-1:0][INDICE_LEN-1:0] out_smallest_idx;
21     logic sort_valid;
22
23     // Instantiate the Unit Under Test (UUT)
24     proj_sorter #(
25         .INDICES_COUNT(INDICES_COUNT),
26         .INDICE_LEN(INDICE_LEN),
27         .SIGNATURE_LEN(SIGNATURE_LEN)
28     ) dut (

```

```

27     .in_signature(in_signature),
28     .in_index(in_index),
29     .out_smallest_idx(out_smallest_idx),
30     .rst_n(in_rst_n),
31     .clk(in_clk),
32     .end_sorting(end_sorting),
33     .sort_valid(sort_valid)
34   );
35
36 // Clock generation
37 always #5 in_clk = ~in_clk;
38
39 // Test procedure
40 initial begin
41   // Initialize inputs
42   in_signature = 0;
43   in_index = 0;
44   in_rst_n = 0;
45   in_clk = 0;
46   end_sorting = 0;
47
48   // Reset
49   #10 in_rst_n = 1;
50
51   // Test case 1: Insert values in descending order
52   for (int i = 10; i > 0; i--) begin
53     in_signature = i * 32'h10101010;
54     in_index = i;
55     #10;
56   end
57   end_sorting = 1;
58   in_signature = 32'hA0A0A0A0;
59   in_index = 10;
60   #10;
61   end_sorting = 0;
62   #10;
63
64   // Test case 2: Insert values in ascending order
65   for (int i = 1; i <= 10; i++) begin
66     in_signature = i * 32'h01010101;
67     in_index = i;
68     #10;
69   end
70   end_sorting = 1;
71   in_signature = 32'hA0A0A0A0;
72   in_index = 10;
73   #10;
74   end_sorting = 0;
75   #10;
76
77   // Test case 3: Insert random values
78   for (int i = 0; i < 20; i++) begin
79     in_signature = $urandom();
80     in_index = $urandom_range(0, 255);

```

```

81          #10;
82      end
83      end_sorting = 1;
84      #10;
85      end_sorting = 0;
86      #10;
87
88      // Test case 4: Reset and insert new values
89      in_RST_N = 0;
90      #10;
91      in_RST_N = 1;
92      for (int i = 0; i < INDICES_COUNT * 2; i++) begin
93          in_signature = $urandom();
94          in_index = $urandom_range(0, 255);
95          #10;
96      end
97      end_sorting = 1;
98      #10;
99      end_sorting = 0;
100     #10;
101
102     // End simulation
103     #100;
104     $finish;
105 end
106
107 // Monitor
108 always @ (posedge in_clk) begin
109     $display("Time=%0t, in_signature=%h, in_index=%h, "
110             "end_sorting=%b, sort_valid=%b",
111             $time, in_signature, in_index, end_sorting,
112             sort_valid);
113     if (sort_valid) begin
114         for (int i = 0; i < INDICES_COUNT; i++) begin
115             $display("out_smallest_idx[%0d]=%h", i,
116                     out_smallest_idx[i]);
117         end
118         $display("-----");
119     end
120 end
121
122 // Checker
123 always @ (posedge in_clk) begin
124     if (in_RST_N) begin
125         for (int i = 1; i < INDICES_COUNT; i++) begin
126             assert (dut.smallest_idx_curr[i].signature >=
127                     dut.smallest_idx_curr[i-1].signature)
128             else $error("Sorting_error: "
129                         "smallest_idx_curr[%0d].signature=%h is "
130                         "smaller than "
131                         "smallest_idx_curr[%0d].signature=%h",
132                         i,
133                         dut.smallest_idx_curr[i].signature,
134                         i-1,

```

```

126           ↪ dut.smallest_idx_curr[i-1].signature);
127       end
128   end
129
130 // Check sort_valid
131 always @(posedge in_clk) begin
132     assert(sort_valid == end_sorting)
133     else $error("sort_valid_mismatch:_sort_valid=%b,_
134             ↪ end_sorting=%b", sort_valid, end_sorting);
135 end
136 endmodule

```

## Extender

### proj\_extender.sv

```

1 // Timescale directive for simulation
2 `timescale 1ns / 1ps
3 // Import the project package
4 import proj_pkg::*;
5 // Module declaration with parameters
6 module proj_extender #(
7     parameter FRAG_LEN_BITS =
8         ↪ proj_pkg::FM_EXTENDER_FRAG_LEN_BITS,
9     parameter FRAG_SIZE = proj_pkg::FRAG_LEN,
10    parameter KMER_SIZE = proj_pkg::KMER_LEN,
11    parameter INDICES_COUNT =
12        ↪ proj_pkg::SORTER_EXTENDER_INDICES_COUNT,
13    parameter INDICE_LEN = proj_pkg::INDICE_LEN,
14    parameter SIGNED_INDICE_LEN = proj_pkg::SIGNED_INDICE_LEN,
15    parameter FRAG_PART_ONE_HOT =
16        ↪ proj_pkg::EXTENDER_OUT_PART_LEN_ONE_HOT,
17    parameter BASE_LEN = proj_pkg::BASE_LEN,
18    parameter ONE_HOT_LEN = proj_pkg::ONE_HOT_LEN,
19    parameter FRAG_PART = proj_pkg::EXTENDER_OUT_PART_LEN
20 ) (
21     // Input ports
22     input logic [FRAG_LEN_BITS-1:0] in_fragment,
23     input logic [INDICES_COUNT-1:0][INDICE_LEN-1:0]
24         ↪ in_kmer_indices,
25     input wire valid_indices,
26     input wire rst_n,
27     input wire clk,
28     // Output ports
29     output logic [SIGNED_INDICE_LEN-1:0] out_index,
30     output logic [FRAG_PART_ONE_HOT-1:0] out_gfm
31 );
32     // Local parameters
33 localparam FRAG_PARTS_COUNT = (FRAG_LEN_BITS >>
34     ↪ $clog2(FRAG_PART));
35 localparam FRAG_PARTS_COUNT_BITS =
36     ↪ $clog2(FRAG_PARTS_COUNT);

```

```

31 localparam INDICES_COUNT_BITS = $clog2(INDICES_COUNT);
32 // Internal signals
33 logic [FRAG_PARTS_COUNT_BITS-1:0] frag_parts_idx;
34 logic [FRAG_PARTS_COUNT_BITS-1:0] frag_parts_idx_next;
35 logic rst_frag_parts_idx;
36 logic [INDICE_LEN-1:0] curr_index;
37 logic [INDICES_COUNT_BITS-1:0] indices_idx;
38 logic [INDICES_COUNT_BITS-1:0] indices_idx_next;
39 logic [FRAG_PART-1:0] frag_part;
40 logic [INDICES_COUNT-1:0][INDICE_LEN-1:0]
    ↪ in_kmer_indices_r;
41
42 // Combinational logic
43 // Reset fragment parts index when it reaches the maximum
44 assign rst_frag_parts_idx = (frag_parts_idx ==
    ↪ (FRAG_PARTS_COUNT - 1)) ? 1'b1 : 1'b0;
45 // Calculate next fragment parts index
46 assign frag_parts_idx_next = rst_frag_parts_idx ? 1'b0 :
    ↪ frag_parts_idx + 1'b1;
47 // Select current index from input indices
48 assign curr_index = in_kmer_indices_r[indices_idx];
49 // Extract fragment part for output
50 assign frag_part = in_fragment[FRAG_PART*frag_parts_idx +:
    ↪ FRAG_PART];
51
52 generate
53     for (genvar i = 0; i < FRAG_PART >> $clog2(BASE_LEN);
54         ↪ i++) begin : gen_gfm
55         always_comb begin
56             case (frag_part[i*BASE_LEN +: BASE_LEN])
57                 2'b00: out_gfm[i*ONE_HOT_LEN +:
58                     ↪ ONE_HOT_LEN] = 4'b0001;
59                 2'b01: out_gfm[i*ONE_HOT_LEN +:
60                     ↪ ONE_HOT_LEN] = 4'b0010;
61                 2'b10: out_gfm[i*ONE_HOT_LEN +:
62                     ↪ ONE_HOT_LEN] = 4'b0100;
63                 2'b11: out_gfm[i*ONE_HOT_LEN +:
64                     ↪ ONE_HOT_LEN] = 4'b1000;
65                 default: out_gfm[i*ONE_HOT_LEN +:
66                     ↪ ONE_HOT_LEN] = 4'b0000;
67             endcase
68         end
69     end
70 endgenerate
71 // Calculate next indices index
72 assign indices_idx_next = rst_frag_parts_idx ? indices_idx
    ↪ + 1'b1 : indices_idx;
73 // Calculate output index
74 assign out_index = {1'b0, curr_index} -
    ↪ SIGNED_INDICE_LEN'(((FRAG_SIZE - KMER_SIZE) >> 1));
75 // Sequential logic for fragment parts index
76 always_ff @(posedge clk or negedge rst_n) begin :
77     ↪ parts_index
78     if (~rst_n) begin

```

```

72         frag_parts_idx <= '0;
73     end else if (valid_indices) begin
74         frag_parts_idx <= '0;
75     end else begin
76         frag_parts_idx <= frag_parts_idx_next;
77     end
78 end

79
80 // Sequential logic for indices sample
81 always_ff @(posedge clk or negedge rst_n) begin :
82     ↪ index_sample
83     if (valid_indices) begin
84         in_kmer_indices_r <= in_kmer_indices;
85     end
86 end

87 // Sequential logic for indices index
88 always_ff @(posedge clk or negedge rst_n) begin :
89     ↪ indices_index
90     if (~rst_n) begin
91         indices_idx <= '0;
92     end else if (valid_indices) begin
93         indices_idx <= '0;
94     end else begin
95         indices_idx <= indices_idx_next;
96     end
97 end
98 endmodule

```

## proj\_extender\_tb.sv

```

1 `timescale 1ns / 1ps
2 import proj_pkg::*;
3
4 module proj_extender_tb;
5     // Define local parameters
6     localparam FRAG_LEN_BITS =
7         ↪ proj_pkg::FM_EXTENDER_FRAG_LEN_BITS;
8     localparam FRAG_SIZE = proj_pkg::FRAG_LEN;
9     localparam KMER_SIZE = proj_pkg::KMER_LEN;
10    localparam INDICES_COUNT =
11        ↪ proj_pkg::SORTER_EXTENDER_INDICES_COUNT;
12    localparam INDICE_LEN = proj_pkg::INDICE_LEN;
13    localparam SIGNED_INDICE_LEN = proj_pkg::SIGNED_INDICE_LEN;
14    localparam FRAG_PART = proj_pkg::EXTENDER_OUT_PART_LEN;
15    localparam FRAG_PART_ONE_HOT =
16        ↪ proj_pkg::EXTENDER_OUT_PART_LEN_ONE_HOT;
17    localparam BASE_LEN = proj_pkg::BASE_LEN;
18    localparam ONE_HOT_LEN = proj_pkg::ONE_HOT_LEN;
19
20
21 // TB parameters
22 localparam MEM_WIDTH = 32;
23 localparam MEM_DEPTH = 32;

```

```

20     localparam FRAG_PARTS_COUNT = (FRAG_LEN_BITS >>
21         $clog2(FRAG_PART));
22
23     // Declare input and output signals
24     logic [FRAG_LEN_BITS-1:0] in_fragment;
25     logic [INDICES_COUNT-1:0][INDICE_LEN-1:0] in_kmer_indices;
26     logic valid_indices;
27     logic rst_n;
28     logic clk;
29     logic [SIGNED_INDICE_LEN-1:0] out_index;
30     logic [FRAG_PART_ONE_HOT-1:0] out_gfm;
31
32     // Declare external memory and padded fragment
33     logic [MEM_WIDTH-1:0] ext_mem;
34     logic [FRAG_LEN_BITS-1:0] padded_fragment;
35
36     // Implement padding logic
37     always_comb begin
38         padded_fragment = '0; // Initialize with zeros
39         for (int i = 0; i < FRAG_LEN_BITS; i++) begin
40             // Check if index is within valid range
41             if ((out_index + i) >= 0) && (out_index + i <
42                 MEM_WIDTH)) begin
43                 padded_fragment[i] = ext_mem[out_index + i];
44             end
45             // Else, keep as 0 (padded)
46         end
47     end
48
49     int step_count = 0;
50     int current_kmer_index = 0;
51
52     // Assign padded fragment to input fragment
53     assign in_fragment = padded_fragment;
54
55     // Instantiate the Unit Under Test (UUT)
56     proj_extender #(
57         .FRAG_LEN_BITS(FRAG_LEN_BITS),
58         .FRAG_SIZE(FRAG_SIZE),
59         .KMER_SIZE(KMER_SIZE),
60         .INDICES_COUNT(INDICES_COUNT),
61         .INDICE_LEN(INDICE_LEN),
62         .SIGNED_INDICE_LEN(SIGNED_INDICE_LEN),
63         .FRAG_PART(FRAG_PART),
64         .FRAG_PART_ONE_HOT(FRAG_PART_ONE_HOT),
65         .BASE_LEN(BASE_LEN),
66         .ONE_HOT_LEN(ONE_HOT_LEN)
67     ) dut (
68         .in_fragment(in_fragment),
69         .in_kmer_indices(in_kmer_indices),
70         .valid_indices(valid_indices),
71         .rst_n(rst_n),
72         .clk(clk),
73         .out_index(out_index),

```

```

72         .out_gfm(out_gfm)
73     );
74
75     // Clock generation
76     always begin
77         #5 clk = ~clk;
78     end
79
80     // Test procedure
81     initial begin
82         clk = 0;
83         rst_n = 0;
84         valid_indices = 0;
85         #10 rst_n = 1;
86
87         // Initialize external memory with random data
88         for (int i = 0; i < MEM_DEPTH; i++) begin
89             ext_mem = $random;
90         end
91
92         // Generate random indices
93         for (int i = 0; i < INDICES_COUNT; i++) begin
94             in_kmer_indices[i] = $urandom_range(0, 31);
95         end
96
97         // Run 10 test cases
98         for (int test = 0; test < 10; test++) begin
99             $display("Test_case_%0d:", test);
100
101            // Randomize external memory for each test
102            for (int i = 0; i < MEM_DEPTH; i++) begin
103                ext_mem = $random;
104            end
105
106            // Generate new random indices for each test
107            for (int i = 0; i < INDICES_COUNT; i++) begin
108                @(negedge clk);
109                in_kmer_indices[i] = $urandom_range(0, 31);
110            end
111
112            // Assert valid_indices for one clock cycle
113            @(negedge clk);
114            valid_indices = 1;
115            @(negedge clk);
116            valid_indices = 0;
117
118            // Wait for the module to process all fragment
119            // parts and indices
120            for (int i = 0; i < INDICES_COUNT; i++) begin
121                for (int j = 0; j < FRAG_PARTS_COUNT; j++)
122                    begin
123                        @(posedge clk);
124                        print_step_info(i);
125                    end

```

```

124         end
125
126         $display("\n"); // Add a blank line between test
127             ↪ cases
127     end
128
129     // End simulation
130     $finish;
131 end
132
133 // Function to print step information
134 function void print_step_info(int kmer_index);
135     step_count++;
136     $display("Step_%0d:", step_count);
137     $display("Current_kmer_indices[%0d]:_0x%h",
138             ↪ kmer_index, in_kmer_indices[kmer_index]);
139     $display("out_index:_0x%h", out_index);
140     $display("in_fragment:_0x%h", in_fragment);
140     $display("out_gfm_(one-hot)_part_%0d_of_the_"
141             ↪ in_fragment:_0b%b_(binary), _0x%h_
141             ↪ (hexadecimal)", dut.frag_parts_idx + 1, out_gfm,
141             ↪ out_gfm);
142     $display(""); // Add a blank line for readability
143 endfunction
144
144 // Add this to stop the simulation after a certain number
144     ↪ of steps or time
145 initial begin
146     #10000; // Stop after 10,000 time units, adjust as
146         ↪ needed
147     $display("Simulation_stopped_due_to_timeout");
148     $finish;
149 end
150
151 endmodule

```

## Index Counter

### proj\_index\_counter.sv

```

1 import proj_pkg::*; // Include the package
2
3 module proj_counter
4 #(
5     parameter INDICE_LEN = proj_pkg::INDICE_LEN
6 )
7 (
8     output wire [INDICE_LEN-1:0] index,
9     output logic finished_count,
10    input wire clk,
11    input wire rst_n,
12    input wire start
13 );

```

```

14 // Internal signals
15 logic [INDICE_LEN-1:0] out_index;
16 logic end_of_count;
17 logic [INDICE_LEN-1:0] idx_next;
18 logic rst_index;
19 logic count_enabled;
20 logic start_prev;
21
22 // Check if write address reached the end of the buffer
23 assign end_of_count = (out_index == (FM_BUFFER_SIZE-1)) ?
24     ↪ 1'b1 : 1'b0;
25 assign finished_count = end_of_count;
26
27 // Increment write address or keep current value
28 assign idx_next = (count_enabled & ~end_of_count) ?
29     ↪ out_index + 1'b1 : out_index;
30
31 // Reset index when end of count is reached or on reset
32 assign rst_index = end_of_count | (~rst_n);
33
34 // Assign output index
35 assign index = out_index;
36
37 // Sequential logic for updating the index and detecting
38 // start signal
39 always_ff @(posedge clk or negedge rst_n) begin
40     if (~rst_n) begin
41         out_index <= '0;
42         count_enabled <= 1'b0;
43         start_prev <= 1'b0;
44     end else begin
45         start_prev <= start;
46         if (rst_index) begin
47             out_index <= '0;
48             count_enabled <= 1'b0;
49         end else if (start & ~start_prev) begin
50             // Rising edge of start signal
51             count_enabled <= 1'b1;
52             out_index <= '0;
53         end else if (count_enabled) begin
54             out_index <= idx_next;
55         end
56     end
57 end
58
59 endmodule

```

### proj\_indexcounter\_tb.sv

```

1 `timescale 1ns / 1ps
2 import proj_pkg::*;
3
4 module proj_counter_tb();
5     // Parameters

```

```

6      localparam CLK_PERIOD = 10; // 10 ns clock period
7      localparam FM_BUFFER_SIZE = proj_pkg::FM_BUFFER_SIZE;
8
9      // Signals
10     logic clk;
11     logic rst_n;
12     logic [proj_pkg::FM_BUFFER_SIZE-1:0] index;
13     logic finished_count;
14     logic start;
15
16     // Instantiate the Unit Under Test (UUT)
17     proj_counter #(
18         .FM_BUFFER_SIZE(FM_BUFFER_SIZE)
19     )
20     dut (
21         .index(index),
22         .clk(clk),
23         .rst_n(rst_n),
24         .finished_count(finished_count),
25         .start(start)
26     );
27
28     // Clock generation
29     always begin
30         clk = 1'b0;
31         #(CLK_PERIOD/2);
32         clk = 1'b1;
33         #(CLK_PERIOD/2);
34     end
35
36     // Test variables
37     int error_count = 0;
38     logic [proj_pkg::FM_BUFFER_SIZE-1:0] expected_index;
39
40     // Test procedure
41     initial begin
42         $display("Starting proj_counter testbench");
43
44         // Initialize signals
45         rst_n = 1'b0;
46         start = 1'b0;
47         expected_index = '0;
48
49         // Reset the module
50         #(CLK_PERIOD);
51         rst_n = 1'b1;
52
53
54         // Check initial value
55         if (index !== expected_index) begin
56             $display("Error: Initial index mismatch. Expected %0d, got %0d", expected_index, index);
57             error_count++;
58         end

```

```

59     else begin
60         $display("Passed:_Initial_index_match._Expected_"
61             ↪ %0d,_got_%0d", expected_index, index);
62     end
63
64     // Test normal operation
65     start = 1'b1;
66     #(CLK_PERIOD);
67     start = 1'b0;
68
69     // Check that index is 0 immediately after starting
70     // the count
71     if (index !== '0) begin
72         $display("Error:_Index_not_0_after_starting_count._"
73             ↪ Expected_0,_got_%0d_at_time_%0t", index,
74             ↪ $time);
75         error_count++;
76     end else begin
77         $display("Passed:_Index_is_0_after_starting_count_"
78             ↪ at_time_%0t", $time);
79     end
80
81     for (int i = 1; i < proj_pkg::FM_BUFFER_SIZE; i++)
82         begin
83             @ (posedge clk);
84             expected_index = i;
85             #1;
86             if (index !== expected_index) begin
87                 $display("Error:_Index_mismatch_at_cycle_%0d._"
88                     ↪ Expected_%0d,_got_%0d_at_time_%0t", i,
89                     ↪ expected_index, index, $time);
90                 error_count++;
91             end else begin
92                 $display("Passed:_Index_match_at_cycle_%0d._"
93                     ↪ Expected_%0d,_got_%0d_at_time_%0t", i,
94                     ↪ expected_index, index, $time);
95             end
96         end
97
98         // Check finished_count
99         #1;
100        if (finished_count !== 1'b1) begin
101            $display("Error:_finished_count_not_asserted_at_"
102                ↪ the_end_of_count");
103            error_count++;
104        end else begin
105            $display("Passed:_finished_count_asserted_at_the_"
106                ↪ end_of_count");
107        end
108
109        repeat(10) begin
110            @ (posedge clk);
111        end

```

```

101    // Test normal operation again
102    @(negedge clk);
103    start = 1'b1;
104    @(negedge clk);
105    start = 1'b0;
106
107    // Check that index is 0 immediately after starting
108    // the count
109    if (index !== '0) begin
110        $display("Error:_Index_not_0_after_starting_count._"
111            "Expected_0,_got_%0d_at_time_%0t", index,
112            $time);
113        error_count++;
114    end else begin
115        $display("Passed:_Index_is_0_after_starting_count_"
116            "at_time_%0t", $time);
117    end
118
119    for (int i = 1; i < proj_pkg::FM_BUFFER_SIZE; i++)
120        begin
121            @(posedge clk);
122            expected_index = i;
123            #1;
124            if (index !== expected_index) begin
125                $display("Error:_Index_mismatch_at_cycle_%0d._"
126                    "Expected_%0d,_got_%0d_at_time_%0t", i,
127                    expected_index, index, $time);
128                error_count++;
129            end else begin
130                $display("Passed:_Index_match_at_cycle_%0d._"
131                    "Expected_%0d,_got_%0d_at_time_%0t", i,
132                    expected_index, index, $time);
133            end
134        end
135
136    // Check finished_count again
137    #1;
138    if (finished_count !== 1'b1) begin
139        $display("Error:_finished_count_not_asserted_at_"
140            "the_end_of_count");
141        error_count++;
142    end else begin
143        $display("Passed:_finished_count_asserted_at_the_"
144            "end_of_count");
145    end
146
147    // Final result
148    if (error_count == 0) begin
149        $display("\033[32m*****PROJ_COUNTER_TEST_"
150            "PASSED*****\033[0m");
151    end
152    else begin
153        $display("Test_FAILED:_%0d_errors_detected",
154            error_count);

```

```

142     $display("Failure_reasons:");
143     $display("1._Initial_index_after_reset_might_be_");
144         ↪ incorrect");
145     $display("2._Index_might_not_increment_correctly");
146     $display("3._Index_might_not_wrap_around_at_");
147         ↪ FM_BUFFER_SIZE");
148     $display("4._Reset_during_operation_might_not_work_");
149         ↪ as_expected");
150     $display("5._finished_count_signal_might_not_");
151         ↪ assert_correctly");
152     $display("6._Counter_might_not_respond_correctly_");
153         ↪ to_the_start_signal");

154 end
155 $finish;
156 end
157
158 endmodule

```

## Kmer Buffer

### proj\_kmer\_buffer.sv

```

1 `timescale 1ns / 1ps
2 import proj_pkg::*; // Include the project package
3
4 module proj_kmer_buffer #(
5     parameter DATA_BITS = proj_pkg::KMER_BUFFER_BITS,      //
6         ↪ Number of bits for each nucleotide
7     parameter KMER_LEN = proj_pkg::KMER_BUFFER_LEN,        //
8         ↪ Length of the k-mer
9     parameter OUT_KMER = KMER_LEN * DATA_BITS // Total bits
10         ↪ in the output k-mer
11 ) (
12     input wire clk,           // Clock input
13     input wire rst_n,         // Active-low reset
14     input wire [DATA_BITS-1:0] in_data, // Input nucleotide
15         ↪ data
16     output wire [KMER_LEN-1:0][DATA_BITS-1:0] out_kmer, // //
17         ↪ Output k-mer
18     input wire start_over,    // Signal to reset the buffer
19     output logic full        // Indicates when the buffer
20         ↪ is full
21 );
22
23     // Internal signals
24     logic [KMER_LEN*DATA_BITS-1:0] kmer_buffer;      // Current
25         ↪ k-mer buffer
26     logic [KMER_LEN*DATA_BITS-1:0] kmer_buffer_nxt; // Next
27         ↪ state of k-mer buffer
28     logic [$clog2(KMER_LEN)-1:0] buffer_count;       //
29         ↪ Current count of nucleotides in buffer
30     logic [$clog2(KMER_LEN)-1:0] buffer_count_nxt;    //
31         ↪ Next state of buffer count

```

```

22 // Increment buffer count
23 assign buffer_count_nxt = buffer_count + 1'b1;
24
25 // Determine if buffer is full
26 assign buffer_full = (buffer_count == KMER_LEN - 1) ? 1'b1
27   ↪ : 1'b0;
28 assign full = buffer_full;
29
30 // Generate next state of k-mer buffer
31 generate
32   for (genvar i = 2; i < KMER_LEN; i++) begin
33     assign kmer_buffer_nxt[0] = in_data[0];
34     assign kmer_buffer_nxt[1] = in_data[1];
35     assign kmer_buffer_nxt[i] = kmer_buffer[i-1]; // ↪ Shift existing data
36   end
37 endgenerate
38
39 // Connect internal buffer to output
40 assign out_kmer = kmer_buffer;
41
42 // Update k-mer buffer on clock edge
43 always_ff @(posedge clk or negedge rst_n) begin
44   if (~rst_n) begin
45     kmer_buffer <= '0; // Reset buffer on active-low
46       ↪ reset
47   end else if (start_over) begin
48     kmer_buffer <= '0; // Reset buffer on start_over
49       ↪ signal
50   end else begin
51     kmer_buffer <= kmer_buffer_nxt; // Update buffer
52       ↪ with next state
53   end
54 end
55
56 // Update buffer count on clock edge
57 always_ff @(posedge clk or negedge rst_n) begin
58   if (~rst_n) begin
59     buffer_count <= '0; // Reset count on active-low
60       ↪ reset
61   end else if (start_over) begin
62     buffer_count <= '0; // Reset count on start_over
63       ↪ signal
64   end else if (buffer_full) begin
65     buffer_count <= buffer_count; // Maintain count
66       ↪ when buffer is full
67   end else begin
68     buffer_count <= buffer_count_nxt; // Increment count
69   end
70 end
71
72 endmodule

```

## proj\_kmer\_buffer\_tb.sv

```
1 `timescale 1ns / 1ps
2 import proj_pkg::*;
3 module proj_kmer_buffer_tb();
4     // Parameters
5     localparam DATA_BITS = proj_pkg::KMER_BUFFER_BITS;
6     localparam KMER_LEN = proj_pkg::KMER_BUFFER_LEN;
7     localparam OUT_KMER = KMER_LEN * DATA_BITS;
8     localparam CLK_PERIOD = 10; // 10ns clock period
9     // Signals
10    logic clk;
11    logic rst_n;
12    logic [DATA_BITS-1:0] in_data;
13    logic [KMER_LEN-1:0][DATA_BITS-1:0] out_kmer;
14    logic start_over;
15    logic full;
16    // Instantiate the DUT
17    proj_kmer_buffer #(
18        .DATA_BITS(DATA_BITS),
19        .KMER_LEN(KMER_LEN),
20        .OUT_KMER(OUT_KMER)
21    ) dut (
22        .clk(clk),
23        .rst_n(rst_n),
24        .in_data(in_data),
25        .out_kmer(out_kmer),
26        .start_over(start_over),
27        .full(full)
28    );
29    int j;
30    // Clock generation
31    always #(CLK_PERIOD/2) clk = ~clk;
32    // Test stimulus
33    initial begin
34        $display("Starting_proj_kmer_buffer_testbench");
35        // Initialize signals
36        clk = 0;
37        rst_n = 0;
38        in_data = 0;
39        start_over = 0;
40        $display("Initializing_signals:_clk=%b,_rst_n=%b,_"
41             "in_data=%b,_start_over=%b", clk, rst_n, in_data,
42             start_over);
43        // Reset
44        #(CLK_PERIOD*2) rst_n = 1;
45        $display("Reset_complete:_rst_n=%b", rst_n);
46        // Test case 1: Fill the buffer
47        $display("Starting_Test_Case_1:_Filling_the_buffer");
48        for (int i = 0; i < KMER_LEN; i++) begin
49            in_data = $random;
50            $display("Cycle_%0d:_in_data=%b", i, in_data);
51            #CLK_PERIOD;
52        end
53        $display("Buffer_filled._Waiting_for_%0d_cycles",
```

```

      ↪ KMER_LEN);
52   for (int i = 0; i < KMER_LEN; i++) begin
53     in_data = $random;
54     $display("Cycle_%0d:_out_kmer=%b", i+KMER_LEN,
55           ↪ out_kmer);
56     #CLK_PERIOD;
57   end
58   // Test case 2: Start over
59   $display("Starting_Test_Case_2:_Start_over");
60   @ (negedge clk)
61   start_over = 1;
62   $display("start_over_asserted:_start_over=%b",
63           ↪ start_over);
64   @ (negedge clk)
65   start_over = 0;
66   $display("start_over_de-asserted:_start_over=%b",
67           ↪ start_over);
68   // Fill buffer again
69   $display("Filling_buffer_again");
70   for (int i = 0; i < KMER_LEN; i++) begin
71     @ (negedge clk)
72     j = i+1;
73     in_data = j[DATA_BITS-1:0];
74     $display("Cycle_%0d:_in_data=%b", i, in_data);
75   end
76   // Test case 3: Continuous input after full
77   $display("Starting_Test_Case_3:_Continuous_input_after_
78           ↪ full");
79   for (int i = 0; i < 5; i++) begin
80     @ (negedge clk)
81     in_data = $random;
82     $display("Additional_input_%0d:_in_data=%b", i,
83           ↪ in_data);
84     $display("out_kmer=%b", out_kmer);
85   end
86   $display("Testbench_completed");
87   $finish;
88 end
89 endmodule

```