



# FHE for hardware, hardware for FHE and beyond!

Anisha Mukherjee

IAIK, Graz University of Technology, Austria

September 4, 2024

## Our Team

Team Sinha Roy



- \* HW and design for homomorphic encryption
- \* Post-quantum cryptographic schemes
- \* Zero-knowledge proofs

## Outline

- **Background and Motivation**
  - Homomorphic Encryption (HE)
  - Ring-LWE based HE and challenges
- **FHE-HW: Hardware acceleration for HE**
  - FNTT: Fermat's Number Technique for NTT
  - REED: Chiplet-based hardware accelerator
- **HW-FHE: HE for hardware reusability**
  - ModHE: Module-LWE based HE scheme
- **Beyond HE**
  - Hybrid Homomorphic Encryption (HHE)
  - SASTA: Fault attack on HHE

## Outline

- **Background and Motivation**
  - Homomorphic Encryption (HE)
  - Ring-LWE based HE and challenges
- **FHE-HW: Hardware acceleration for HE**
  - FNTT: Fermat's Number Technique for NTT
  - REED: Chiplet-based hardware accelerator
- **HW-FHE: HE for hardware reusability**
  - ModHE: Module-LWE based HE scheme
- **Beyond HE**
  - Hybrid Homomorphic Encryption (HHE)
  - SASTA: Fault attack on HHE

# Homomorphic Encryption (HE): Brief introduction

- Allows functional evaluation on encrypted data
- Preserves privacy of data owners

# Homomorphic Encryption (HE): Brief introduction



# Generations of lattice-based HE



## Outline

- **Background and Motivation**
  - Homomorphic Encryption (HE)
  - Ring-LWE based HE and challenges
- **FHE-HW: Hardware acceleration for HE**
  - FNTT: Fermat's Number Technique for NTT
  - REED: Chiplet-based hardware accelerator
- **HW-FHE: HE for hardware reusability**
  - ModHE: Module-LWE based HE scheme
- **Beyond HE**
  - Hybrid Homomorphic Encryption (HHE)
  - SASTA: Fault attack on HHE

# Ring-Learning with Errors (RLWE)

$$a(x) \cdot s(x) + e(x) = b(x) \pmod{q} \pmod{f(x)}$$

$$a(x) = (a_1x^0 + \dots + a_Nx^{N-1}) \in \mathcal{R}_q$$

$$s(x) \in \mathcal{R}_q$$

$$e_i \in \{0, 1, -1\}^N$$

$$b(x) \in \mathcal{R}_q$$

$$\begin{aligned} \mathcal{R}_q &= \mathbb{Z}_q[X] / \langle f(x) \rangle \\ f(x) &= x^N + 1 \\ \text{Eg: } (N &= 2^{14}, \log q = 411) \end{aligned}$$

RLWE meets HE 

# RLWE meets HE: Security and parameter selection



# RLWE meets HE: Security and parameter selection



# What's the catch?

- HE schemes are computationally intensive
- Usually incur an overhead of  $10^4 - 10^5 \times$  compared to plaintext comp.

## Hardware acceleration to close the gap



## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic operations
  - Large degree polynomial arithmetic
  - Long integer arithmetic
- Memory management
  - Large ciphertext and key sizes
  - Limited on-chip memory

## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic operations
  - Large degree polynomial arithmetic
  - Long integer arithmetic
- Memory management
  - Large ciphertext and key sizes
  - Limited on-chip memory

- 
- Poly mult is expensive
  - Naive method has  $\mathcal{O}(N^2)$  complexity
  - Use NTT/INTT with  $\mathcal{O}(N \log N)$  complexity

NTT : Number Theoretic Transform

## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic
  - Large degree polynomial
  - Long integer arithmetic
- Memory management
  - Large ciphertext and key sizes
  - Limited on-chip memory

RNS decomposition for efficiency:

$$q = \prod_i q_i$$

$$\log q_i < q \quad \forall i$$

## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic operations
  - Large degree polynomial arithmetic
  - Long integer arithmetic
- Memory management
  - Large ciphertext and key sizes
  - Limited on-chip memory



## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic operations
  - NTT/INTT transformation involves modular add, subtract, mult
  - NTT/INTT transformation needs to support multiple RNS moduli
  - FHE requires many such NTT/INTT transformations

## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic operations
    - NTT/INTT transformation involves modular multiplication
    - NTT/INTT transformation needs to support multiple RNS moduli
    - FHE requires many such NTT/INTT transformations
- 
- Fast and resource-efficient modular mult  
⇒ high-performance FHE accelerator

Can we make modular multiplications in NTT/INTT units extremely cheap ?

## Outline

- **Background and Motivation**
  - Homomorphic Encryption (HE)
  - Ring-LWE based HE and challenges
- **FHE-HW: Hardware acceleration for HE**
  - FNTT: Fermat's Number Technique for NTT
  - REED: Chiplet-based hardware accelerator
- **HW-FHE: HE for hardware reusability**
  - ModHE: Module-LWE based HE scheme
- **Beyond HE**
  - Hybrid Homomorphic Encryption (HHE)
  - SASTA: Fault attack on HHE

## Approach: The Fermat Number Technique

- Fermat number,  $P = 2^K + 1$  as auxiliary modulus

# Advantages of the Fermat Number Technique

- Fermat number,  $P = 2^K + 1$  as auxiliary modulus



NTT: Modular multiplications



FNTT: Simple shift operations

## Advantages and challenges

- ✓ Multiplier-less NTT using Fermat number
- ✓ Roots of unity are powers of two  $\Rightarrow$  no storage required
- ✓ We\* achieve  $1,200 \times$  speed-up compared to software implementations
  - Requires more number of computations

\*Andrey Kim, Ahmet Can Mert, Anisha Mukherjee, Aikata Aikata, Maxim Deryabin, Sunmin Kwon, HyungChul Kang, and Sujoy Sinha Roy. **Exploring the advantages and challenges of Fermat NTT in FHE acceleration.** CRYPTO, 2024.

## Outline

- **Background and Motivation**
  - Homomorphic Encryption (HE)
  - Ring-LWE based HE and challenges
- **FHE-HW: Hardware acceleration for HE**
  - FNTT: Fermat's Number Technique for NTT
  - REED: Chiplet-based hardware accelerator
- **HW-FHE: HE for hardware reusability**
  - ModHE: Module-LWE based HE scheme
- **Beyond HE**
  - Hybrid Homomorphic Encryption (HHE)
  - SASTA: Fault attack on HHE

## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic operations
  - Large degree polynomial arithmetic
  - Long integer arithmetic
- Memory management
  - Large ciphertext and key sizes
  - Limited on-chip memory

# Challenges with FHE-HW acceleration

- Many (large) polynomials
  - Large degree
  - Long integers
- Memory requirements
  - Large ciphertexts
- Limited on-chip memory



System-level view

## Challenges with FHE-HW acceleration

Problem: FHE is slow

Solution: Design very large parallel accelerator

New problems:

- Large area decreases yield
- High manufacturing costs (16M US\$ or more)
- Difficult pre-silicon testing and verification



## Challenges with FHE-HW acceleration

Problem: FHE is slow

Solution: Design very large parallel accelerator

New problems:

- Large area decreases yield
- High manufacturing costs
- Difficult pre-silicon testing and verification



\*\*Picture credits: Tenor

## Approach: Chiplet integration

- Split a big design into multiple dies called ‘chiplets’
  - Dies are 2.5D or 3D ‘packaged’

## Chiplet-based FHE processor: REED\*



Side and top view–2.5D REED

\*Aikata Aikata, Ahmet Can Mert, Sunmin Kwon, Maxim Deryabin, and Sujoy Sinha Roy. **REED: Chiplet-based accelerator for fully homomorphic encryption.** <https://eprint.iacr.org/2023/1190>.

## Advantages and challenges of a chiplet-based design

- ☑ Higher yield
- ☑ Smaller and simpler chiplets
- ☑ Manufacturing feasibility
  - Slow chiplet-to-chiplet (C2C) communication
  - Optimal balance between area and number of chiplets is crucial

## Overcoming the challenges

- Algorithmic tweaks to develop a ring-based FHE C2C protocol
- No performance penalty
- Linear interconnection complexity

ASIC designs such as REED's chiplet system could bring FHE calculations within **10x** latency compared to plaintext calculations.

## Outline

- **Background and Motivation**
  - Homomorphic Encryption (HE)
  - Ring-LWE based HE and challenges
- **FHE-HW: Hardware acceleration for HE**
  - FNTT: Fermat's Number Technique for NTT
  - REED: Chiplet-based hardware accelerator
- **HW-FHE: HE for hardware reusability**
  - ModHE: Module-LWE based HE scheme
- **Beyond HE**
  - Hybrid Homomorphic Encryption (HHE)
  - SASTA: Fault attack on HHE

## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic operations
  - Large degree polynomial arithmetic
  - Long integer arithmetic



More complex application → Larger ( $N, q$ )

## Challenges with FHE-HW acceleration

- Many (large) polynomial arithmetic operations
  - Large degree polynomial arithmetic
  - Long integer arithmetic

- 
- Software FHE is flexible but very slow
  - Hardware is **fast but inflexible**

Can we design an HE scheme that allows the same hardware to support multiple  $(N, q)$ ?

# Approach: Module-LWE (MLWE) meets HE



# Module-LWE (MLWE) meets HE

**Module**

$$\begin{bmatrix} a_{00}(x) & a_{01}(x) \\ a_{10}(x) & a_{11}(x) \end{bmatrix} \cdot \begin{bmatrix} s_0(x) \\ s_1(x) \end{bmatrix} + \begin{bmatrix} e_0(x) \\ e_1(x) \end{bmatrix} = \begin{bmatrix} b_0(x) \\ b_1(x) \end{bmatrix} \pmod{q} \pmod{f(x)}$$

$a_{ij}(x) \in \mathcal{R}_q(2^{13})$

$s_i(x) \in \mathcal{R}_q(2^{13})$

$b_i(x) \in \mathcal{R}_q(2^{13})$

$(N = 2^{14} = 2 \cdot 2^{13} = n \cdot r)$

```

graph TD
    A["Module"] --> B["[a_ij(x)]"]
    A --> C["[s_i(x)]"]
    A --> D["[b_i(x)]"]
    B --> E["a_ij(x) ∈ ℒ_q(2^13)"]
    C --> F["s_i(x) ∈ ℒ_q(2^13)"]
    D --> G["b_i(x) ∈ ℒ_q(2^13)"]
    E --> H["(N = 2^14 = 2 · 2^13 = n · r)"]
    F --> H
    G --> H
  
```

# Module-LWE (MLWE) meets HE

- Flexible parameters: Fix a ring degree ( $n$ ) and vary the rank ( $r$ )



## Module-LWE (MLWE) meets HE: ModHE\*

- Flexible parameters: Fix a ring degree ( $n$ ) and vary the rank ( $r$ )



Arithmetic on smaller and fixed degree ( $n$ ) polynomials)

\*Anisha Mukherjee, Aikata, Ahmet Can Mert, Yongwoo Lee, Sunmin Kwon, Maxim Deryabin, and Sujoy Sinha Roy. **Modhe: Modular homomorphic encryption using module lattices potentials and limitations.** TCCHES, 2024(1)

## ModHE: Potentials and limitations

- ☑ Better security assumptions
  - ☑ Hardware reusability and more scope for optimization
  - ☑ Increased scope of parallel computations
  - ☑ Ciphertext compression due to rank reduction
- Limitations: Increased key sizes, more precision loss

## Outline

- **Background and Motivation**
  - Homomorphic Encryption (HE)
  - Ring-LWE based HE and challenges
- **FHE-HW: Hardware acceleration for HE**
  - FNTT: Fermat's Number Technique for NTT
  - REED: Chiplet-based hardware accelerator
- **HW-FHE: HE for hardware reusability**
  - ModHE: Module-LWE based HE scheme
- **Beyond HE**
  - Hybrid Homomorphic Encryption (HHE)
  - SASTA: Fault attack on HHE

## Communication overhead in HE



## Communication overhead in HE



## Approach: Encryption using symmetric ciphers



## Approach: Hybrid Homomorphic Encryption (HHE)

- Clients encrypts data symmetrically
  - No ciphertext expansion
- Server does more computations
  - Extra homomorphic decryption of symmetric circuit before eval

## Approach: Hybrid Homomorphic Encryption (HHE)

- Clients encrypts data symmetrically
  - No ciphertext expansion
- Server does more computations
  - Extra homomorphic decryption of symmetric circuit before eval

## Outline

- **Background and Motivation**
  - Homomorphic Encryption (HE)
  - Ring-LWE based HE and challenges
- **FHE-HW: Hardware acceleration for HE**
  - FNTT: Fermat's Number Technique for NTT
  - REED: Chiplet-based hardware accelerator
- **HW-FHE: HE for hardware reusability**
  - ModHE: Module-LWE based HE scheme
- **Beyond HE**
  - Hybrid Homomorphic Encryption (HHE)
  - SASTA: Fault attack on HHE

## HHE: Beyond theoretical security

- SASTA\* introduces a novel fault attack on the **SE.Enc** step

\*Aikata Aikata, Ahaan Dabholkar, Dhiman Saha, and Sujoy Sinha Roy. **SASTA: Ambushing hybrid homomorphic encryption schemes with a single fault.** <https://eprint.iacr.org/2024/041>.

# Beyond HE

IAIK  
51



\*\*Picture credits: ClipArts

## ASTA: Differential Fault Analysis

Differential = Faulty ciphertext – Faultfree ciphertext

$$= ct' - ct$$

$$= (m + SE.\text{Enc}(K_{SE}, n)) - (m + SE'.\text{Enc}(K_{SE}, n))$$

$$\Delta E = SE.\text{Enc} - SE'.\text{Enc}$$

## ASTA: Features and limitations

- ☑ Single fault at identified Fault Injection Points (FIPs)
- ☑ Single pair of faulty and fault-free  $ct$  required for key-recovery
- ☑ Demonstrated success for many HHE ciphers
  - Attack success dependent on complexity of evaluation function

## Conclusion: Key-takeaways

- ▶ Homomorphic Encryption provides data privacy in untrusted environments
- ▶ Suffers from large computational overhead
- ▶ Interesting scopes in new hardware/computation paradigms & scheme design
- ▶ Interesting scopes for in-depth cryptanalysis



# FHE for hardware, hardware for FHE and beyond!

Anisha Mukherjee

IAIK, Graz University of Technology, Austria

September 4, 2024