

Design and Evaluation of  
**POST-QUANTUM  
CRYPTOGRAPHY**  
Accelerator on FPGA

PROGRESS PRESENTATION

**Members**

Pakin Panawattanakul 6580043  
Nitchayanan Thamkunanon 6580081  
Panupong Sangaphunchai 6580587

# Table of Contents

**Current Progress**

**Designing Philosophy**

**NTT Circuit Design**

**Future works**

# Current Progress



**Fig. 3** Overall Kyber encapsulation and decapsulation architecture.

## Modules sections

- 1) Control ■
  - 2) Encapsulation ■
  - 3) Decapsulation ■
  - 4) Main computation ■

Note Article does not include detailed implementation of each module

# Current Progress

| <b>Encapsulation</b>       | <b>Decapsulation</b>       | <b>Main Computation</b> |
|----------------------------|----------------------------|-------------------------|
| Decode PK                  | Pre/Post Indcpa Decryption | Addition                |
| Decode msg                 | Decode Decompress SK       | PACC                    |
| Hash (SHAKE+CBD+Rejection) | Decode Decompress Ct       | INTT                    |
| Compress Encode            | Compress Encode            | NTT                     |
| Pre/Post Indcpa Encryption |                            | Reduce                  |
|                            |                            | Subtraction             |

**Green** = Done

**Blue** = In Progress

**Red** = Will start soon

# Designing Philosophy

- Understanding official reference code or Documentations
- Implement in system verilog
- Verify the correctness using simulation compare to reference code

## Reference codes/Documents

- Kyber github
- FIPS202 (Hash)
- FIPS203 (Key Encapsulation Mechanism)
- Kyber Specification Round 3 (Core Kyber design)

## FIPS 203

Federal Information Processing Standards Publication

**Module-Lattice-Based  
Key-Encapsulation Mechanism Standard**



- Translating C code -> Verilog
- Similar loop structure to generate wire
- Add to MUX to reuse the same combinational circuit

```

reg [15:0] buf_in[0:255];
wire [15:0] a[0:6][0:127];
wire [15:0] b[0:6][0:127];
// generate a,b for muxes
genvar s, start, j;
generate
  for (s = 0; s < 7; s = s + 1) begin : g_stride
    for (start = 0; start < 256; start = start + 2 * (128 >> s)) begin : outer
      for (j = start; j < start + (128 >> s); j = j + 1) begin : inner
        localparam int idx = (start / (2 * (128 >> s))) * (128 >> s) + (j - start);
        assign a[s][idx] = buf_in[j];
        assign b[s][idx] = buf_in[j+(128>>s)];
      end
    end
  end
endgenerate
  
```

```

/*
* Name:          ntt
*
* Description: Inplace number-theoretic transform (NTT) in Rq.
*               input is in standard order, output is in bitreversed order
*
* Arguments:    - int16_t r[256]: pointer to input/output vector of elements of Zq
*/
void ntt(int16_t r[256]) {
  unsigned int len, start, j, k;
  int16_t t, zeta;

  k = 1;
  for(len = 128; len >= 2; len >>= 1) {
    for(start = 0; start < 256; start = j + len) {
      zeta = zetas[k++];
      for(j = start; j < start + len; j++) {
        t = fqmul(zeta, r[j + len]);
        r[j + len] = r[j] - t;
        r[j] = r[j] + t;
      }
    }
  }
}
  
```

```

2 // add them to muxes
3 genvar i;
4 generate
5   for (i = 0; i < 128; i = i + 1) begin : All generate block statement
6     ntt_mux7 mux_a (
7       .in0(a[0][i]),
8       .in1(a[1][i]),
9       .in2(a[2][i]),
10      .in3(a[3][i]),
11      .in4(a[4][i]),
12      .in5(a[5][i]),
13      .in6(a[6][i]),
14      .out(mux_out_a[i]),
15      .sel(stage)
16    );
17    ntt_mux7 muxb (
18      .in0(b[0][i]),
19      .in1(b[1][i]),
20      .in2(b[2][i]),
21      .in3(b[3][i]),
22      .in4(b[4][i]),
23      .in5(b[5][i]),
24      .in6(b[6][i]),
25      .out(mux_out_b[i]),
26      .sel(stage)
27    );
28  end
29 endgenerate
  
```

# NTT Circuit Design



- This kind of building block or pattern is called **Cooley–Tukey butterfly**



(a) Cooley-Tukey butterfly

- Recursion is iterative grouping of elements in particular order
- The grouping is called Cooley–Tukey butterfly

## NTT

- NTT is a particular flavor of Discrete Fourier Transform (DFT) over a finite field.**
- used for element-wise multiplication.  $O(n \log n)$
- $(P \cdot Q)(x_i) = P(x)_i * Q(x)_i$ .

```
void ntt(int16_t r[256]) {
    unsigned int len, start, j, k;
    int16_t t, zeta;

    k = 1;
    for(len = 128; len >= 2; len >>= 1) {
        for(start = 0; start < 256; start = j + len) {
            zeta = zetas[k++];
            for(j = start; j < start + len; j++) {
                t = fmul(zeta, r[j + len]);
                r[j + len] = r[j] - t;
                r[j] = r[j] + t;
            }
        }
    }
}
```

# NTT Circuit Design



- 2 Mux per Cooley Tukey
- 1 Mux per Buffer
- 128 Cooley Tukey = 248 Mux
- 256 Buffer = 256 Mux
- 504 Mux
- 128 Cooley Tukey Modules

# Future Works

Continue designing each module separately

Combine each module into a working system

Revised Model (with prof. Tanaka)

Implementing with a real FPGA hardware (IF POSSIBLE)

# Thank you