



High-Throughput does not Compromise Energy Efficiency:

New Algorithms and Implementations for 5G Polar Codes

Furkan Ercan, Ph.D.  
5G PHY Developer - Octasic Inc.  
Vice Chair - IEEE Montréal Section  
Montréal, QC, Canada

May 6, 2021

# Communications & Channel Capacity



# Communications & Channel Capacity



# Communications & Channel Capacity



# Communications & Channel Capacity



# Communications & Channel Capacity



# Communications & Channel Capacity



# Communications & Channel Capacity

Transmitter



Receiver



Given , maximize  $\frac{\sum (\text{📦})}{\sum (\text{📦} + \text{🛡})}$

# Communications & Channel Capacity

Transmitter



Receiver



Given , maximize  $\frac{\sum (\text{orange cube})}{\sum (\text{orange cube} + \text{red shield})}$



**Channel Capacity:**

**Given channel condition, the maximum rate of information**

# Evolution of Digital Communications



- ▶ How to achieve the channel capacity?
- ▶ Various channel coding algorithms emerged (e.g. LDPC, Turbo).
- ▶ Polar codes provably achieve the channel capacity.
- ▶ They are involved as a coding scheme in 5G standard.

# Evolution of Digital Communications



- ▶ How to achieve the channel capacity?
- ▶ Various channel coding algorithms emerged (e.g. LDPC, Turbo).
- ▶ Polar codes provably achieve the channel capacity.
- ▶ They are involved as a coding scheme in 5G standard.

# Evolution of Digital Communications



- ▶ How to achieve the channel capacity?
- ▶ Various channel coding algorithms emerged (e.g. LDPC, Turbo).
- ▶ Polar codes provably achieve the channel capacity.
- ▶ They are involved as a coding scheme in 5G standard.

# Evolution of Digital Communications



- ▶ How to achieve the channel capacity?
- ▶ Various channel coding algorithms emerged (e.g. LDPC, Turbo).
- ▶ Polar codes provably achieve the channel capacity.
- ▶ They are involved as a coding scheme in 5G standard.

# 5G Use Cases

Enhanced Mobile Broadband (eMBB)



- High throughput

Ultra-Reliable Low-Latency Communications (URLLC)



- Low latency
- High reliability

Massive Machine-Type Communications (mMTC)



- Massive connectivity
- Energy efficiency

- ▶ 5G prioritizes various targets based on the use case.
- ▶ Polar codes are involved in 5G eMBB control channel.
- ▶ Currently, polar codes are considerable candidates for other use cases.
- ▶ Fast, practical, energy-efficient polar decoders are essential.

# 5G Use Cases

Enhanced Mobile Broadband (eMBB)



- High throughput

Ultra-Reliable Low-Latency Communications (URLLC)



- Low latency
- High reliability

Massive Machine-Type Communications (mMTC)



- Massive connectivity
- Energy efficiency

- ▶ 5G prioritizes various targets based on the use case.
- ▶ Polar codes are involved in 5G eMBB control channel.
- ▶ Currently, polar codes are considerable candidates for other use cases.
- ▶ Fast, practical, energy-efficient polar decoders are essential.

# 5G Use Cases

Enhanced Mobile Broadband (eMBB)



- High throughput

Ultra-Reliable Low-Latency Communications (URLLC)



- Low latency
- High reliability

Massive Machine-Type Communications (mMTC)



- Massive connectivity
- Energy efficiency

- ▶ 5G prioritizes various targets based on the use case.
- ▶ Polar codes are involved in 5G eMBB control channel.
- ▶ Currently, polar codes are considerable candidates for other use cases.
- ▶ Fast, practical, energy-efficient polar decoders are essential.

# An Overview of Polar Decoders



# An Overview of Polar Decoders



## Successive Cancellation (SC) Decoding \*

- ✓ Simple decoding
- ✗ Mediocre performance at practical lengths
- ✗ Sequential, long latency

\* E. Arikan, "Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels," in IEEE Tran. Inf. Theory, vol. 55, no. 7, pp. 3051-3073, July 2009.

# An Overview of Polar Decoders



## SC-List (SCL) Decoding \*

- ✓ Improved error-correction performance
- ✗ Increased complexity

\* I. Tal and A. Vardy, "List Decoding of Polar Codes," in IEEE TIT, May 2015.

# An Overview of Polar Decoders



## Fast-SSC Decoding \*

✓  $\approx 10 \times$  less latency

- ▶ No error correction performance degradation

\*

G. Sarkis et al., "Fast Polar Decoders: Algorithm and Implementation," in IEEE JSAC, May 2014.

# An Overview of Polar Decoders



## Fast-SC-List Decoding \*

- ✓ Latency reduced further
- ✗ Very high complexity

\*

S. A. Hashemi et al. "Fast and Flexible Successive-Cancellation List Decoders for Polar Codes," in IEEE TSP, Nov. 2017.

# An Overview of Polar Decoders



## SC-Flip (SCF) Decoding \*

- ✓ Improved error-correction performance
- ✓ Low complexity
- ✗ Variable latency

\*

O. Afisiadis et al. "A low-complexity improved successive cancellation decoder for polar codes," 2014 48th Asilomar Conference, 2014.

# An Overview of Polar Decoders



## Dynamic SC-Flip (DSCF) Decoding \*

- ✓ Improved error-correction performance
- ✗ Not practical due to complex computations

\*

L. Chandresris et al. "Dynamic-SCFlip Decoding of Polar Codes," in IEEE TCOM, June 2018.

# An Overview of Polar Decoders



## Fast SC-Flip Decoding \*

- ✓ Introduced fast computations
- ✓ Energy-efficient implementation

\*

F. Ercan, et al. "Energy-Efficient Hardware Architectures for Fast Polar Decoders," in IEEE TCAS-I, Jan. 2020.

# An Overview of Polar Decoders



## Practical Dynamic SC-Flip Decoding \*

- ✓ Replaced complex computations with simple approximations
- ✓ Introduced fast computations
- ✓ First hardware implementation

\* F. Ercan et al. "Practical Dynamic SC-Flip Polar Decoders: Algorithm and Implementation," in IEEE TSP, 2020.

# An Overview of Polar Decoders



## Overview of This Talk

- Part I: Background on SC Decoding



# An Overview of Polar Decoders



## Overview of This Talk

- ▶ Part I: Background on SC Decoding
- ▶ Part II: Realizing the SC-Flip Algorithm in Hardware
- ▶

# An Overview of Polar Decoders



## Overview of This Talk

- Part I: Background on SC Decoding
- Part II: Realizing the SC-Flip Algorithm in Hardware
- Part III: Making Dynamic SC-Flip Decoding Practical

## Part I: Preliminaries

# SC Tree and Special Nodes

Example: PC(16,8)



Legend:



# SC Tree and Special Nodes

Example: PC(16,8)



# SC Tree and Special Nodes

Example: PC(16,8)



# SC Tree and Special Nodes

Example: PC(16,8)



Legend:



# SC Tree and Special Nodes

Example: PC(16,8)



Legend:

- Frozen Bit (Rate-0 Node)
- Information Bit (Rate-1 Node)

# SC Tree and Special Nodes

Example: PC(16,8)



Legend:

- Frozen Bit (Rate-0 Node)
- Information Bit (Rate-1 Node)

# SC Tree and Special Nodes

Example: PC(16,8)



Legend:

|                                                                                             |                               |                                                                                   |                          |
|---------------------------------------------------------------------------------------------|-------------------------------|-----------------------------------------------------------------------------------|--------------------------|
| <span style="border: 1px solid black; border-radius: 50%; width: 1em; height: 1em;"></span> | Frozen Bit (Rate-0 Node)      | <span style="color: green; border-radius: 50%; width: 1em; height: 1em;"></span>  | Repetition Node          |
| <span style="color: black; border-radius: 50%; width: 1em; height: 1em;"></span>            | Information Bit (Rate-1 Node) | <span style="color: orange; border-radius: 50%; width: 1em; height: 1em;"></span> | Single Parity Check Node |

# Fast Decoding v Energy Efficiency

- ▶ Energy = Power × Delay

# Fast Decoding v Energy Efficiency

- ▶ Energy = Power × Delay

Table: SC Decoding, 65nm CMOS, PC(1024,512)

|                    | SC <sup>[1]</sup> | Fast-SSC <sup>[2]</sup> |
|--------------------|-------------------|-------------------------|
| Power (mW)         | 51                | 160                     |
| Latency ( $\mu$ s) | 12.7              | 0.6                     |
| Throughput (Mbps)  | 81                | 1719                    |
| Energy (pJ/bit)    | 1262              | 188                     |

\* Normalized for 65 nm CMOS.

---

<sup>[1]</sup> Giard et al., "PolarBear: A 28-nm FD-SOI ASIC for Decoding of Polar Codes," in IEEE JETCAS, 2017.

<sup>[2]</sup> Sarkis et al., "Fast Polar Decoders: Algorithm and Implementation," in IEEE JSAC, 2014.

# Fast Decoding v Energy Efficiency

- Energy = Power × Delay

Table: SC Decoding, 65nm CMOS, PC(1024,512)

|                   | SC <sup>[1]</sup> | Fast-SSC <sup>[2]</sup> |         |
|-------------------|-------------------|-------------------------|---------|
| Power (mW)        | 51                | 160                     | ← 3.1 × |
| Latency (μs)      | 12.7              | 0.6                     |         |
| Throughput (Mbps) | 81                | 1719                    |         |
| Energy (pJ/bit)   | 1262              | 188                     |         |

\* Normalized for 65 nm CMOS.

<sup>[1]</sup> Giard et al., "PolarBear: A 28-nm FD-SOI ASIC for Decoding of Polar Codes," in IEEE JETCAS, 2017.

<sup>[2]</sup> Sarkis et al., "Fast Polar Decoders: Algorithm and Implementation," in IEEE JSAC, 2014.

# Fast Decoding v Energy Efficiency

- Energy = Power × Delay

Table: SC Decoding, 65nm CMOS, PC(1024,512)

|                   | SC <sup>[1]</sup> | Fast-SSC <sup>[2]</sup> |         |
|-------------------|-------------------|-------------------------|---------|
| Power (mW)        | 51                | 160                     | ← 3.1 × |
| Latency (μs)      | 12.7              | 0.6                     | ← 21 ×  |
| Throughput (Mbps) | 81                | 1719                    |         |
| Energy (pJ/bit)   | 1262              | 188                     |         |

\* Normalized for 65 nm CMOS.

<sup>[1]</sup> Giard et al., "PolarBear: A 28-nm FD-SOI ASIC for Decoding of Polar Codes," in IEEE JETCAS, 2017.

<sup>[2]</sup> Sarkis et al., "Fast Polar Decoders: Algorithm and Implementation," in IEEE JSAC, 2014.

# Fast Decoding v Energy Efficiency

- Energy = Power × Delay

Table: SC Decoding, 65nm CMOS, PC(1024,512)

|                   | SC <sup>[1]</sup> | Fast-SSC <sup>[2]</sup> |         |
|-------------------|-------------------|-------------------------|---------|
| Power (mW)        | 51                | 160                     | ← 3.1 × |
| Latency (μs)      | 12.7              | 0.6                     | ← 21 ×  |
| Throughput (Mbps) | 81                | 1719                    |         |
| Energy (pJ/bit)   | 1262              | 188                     | ← 6.7 × |

Normalized for 65 nm CMOS.

<sup>[1]</sup> Giard et al., "PolarBear: A 28-nm FD-SOI ASIC for Decoding of Polar Codes," in IEEE JETCAS, 2017.

<sup>[2]</sup> Sarkis et al., "Fast Polar Decoders: Algorithm and Implementation," in IEEE JSAC, 2014.

# Fast Decoding v Energy Efficiency

- ▶ Energy = Power × Delay

Table: CA-SCL Decoding, 65nm CMOS, PC(512,256),  $L = 2$

|                    | SCL <sup>[3]</sup> | Fast-SSCL <sup>[4]</sup> |
|--------------------|--------------------|--------------------------|
| Power (mW)         | 75.3               | 119.7                    |
| Latency ( $\mu$ s) | 1.71               | 0.43                     |
| Throughput (Mbps)  | 300                | 1201                     |
| Energy (pJ/bit)    | 502                | 199                      |

---

<sup>[3]</sup> Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

<sup>[4]</sup> Hashemi et al., "Fast and Flexible Successive-Cancellation List Decoders for Polar Codes," in IEEE TSP, 2017.

# Fast Decoding v Energy Efficiency

- ▶ Energy = Power × Delay

Table: CA-SCL Decoding, 65nm CMOS, PC(512,256),  $L = 2$

|                    | SCL <sup>[3]</sup> | Fast-SSCL <sup>[4]</sup> |         |
|--------------------|--------------------|--------------------------|---------|
| Power (mW)         | 75.3               | 119.7                    | ← 59%   |
| Latency ( $\mu$ s) | 1.71               | 0.43                     | ← 4 ×   |
| Throughput (Mbps)  | 300                | 1201                     |         |
| Energy (pJ/bit)    | 502                | 199                      | ← 2.5 × |

- ▶ Fast decoding: Huge gains in latency > Penalty in power

---

<sup>[3]</sup> Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

<sup>[4]</sup> Hashemi et al., "Fast and Flexible Successive-Cancellation List Decoders for Polar Codes," in IEEE TSP, 2017.

# Fast Decoding v Energy Efficiency

- ▶ Energy = Power × Delay

Table: CA-SCL Decoding, 65nm CMOS, PC(512,256),  $L = 2$

|                    | SCL <sup>[3]</sup> | Fast-SSCL <sup>[4]</sup> |         |
|--------------------|--------------------|--------------------------|---------|
| Power (mW)         | 75.3               | 119.7                    | ← 59%   |
| Latency ( $\mu$ s) | 1.71               | 0.43                     | ← 4 ×   |
| Throughput (Mbps)  | 300                | 1201                     |         |
| Energy (pJ/bit)    | 502                | 199                      | ← 2.5 × |

- ▶ Fast decoding: Huge gains in latency > Penalty in power
- ▶ Motivation: Even more energy efficient with SCF
  - ▶ All while being competitive in error correction performance & T/P!

---

<sup>[3]</sup> Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

<sup>[4]</sup> Hashemi et al., "Fast and Flexible Successive-Cancellation List Decoders for Polar Codes," in IEEE TSP, 2017.

Part II:  
Realizing the SC-Flip Algorithm

# SC-Flip (SCF) Decoding



---

Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

# SC-Flip (SCF) Decoding



## Legend

- Frozen bit
- Information bit
- Incorrect Estimation

---

Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

# SC-Flip (SCF) Decoding



## Legend

- |   |                      |   |                  |
|---|----------------------|---|------------------|
| ○ | Frozen bit           | ● | Channel Error    |
| ● | Information bit      | ○ | Propagated Error |
| ● | Incorrect Estimation |   |                  |

---

Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

# SC-Flip (SCF) Decoding



Legend

|   |                      |   |                  |
|---|----------------------|---|------------------|
| ○ | Frozen bit           | ● | Channel Error    |
| ● | Information bit      | ○ | Propagated Error |
| ● | Incorrect Estimation | ● | Flipped Bit      |

---

Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

# Components of SC-Flip Algorithm



- ▶ SC decoder
- ▶ An outer Cyclic Redundancy Check (CRC) decoder
  - ▶ To tell if a decoding attempt fails
- ▶ A sorter for bit-flipping indices
  - ▶ To sort them w.r.t. their reliability metric
  - ▶ Reliability metric: Log-likelihood ratio (LLR) value at each index

# Components of SC-Flip Algorithm



- ▶ SC decoder
- ▶ An outer Cyclic Redundancy Check (CRC) decoder
  - ▶ To tell if a decoding attempt fails
- ▶ A sorter for bit-flipping indices
  - ▶ To sort them w.r.t. their reliability metric
  - ▶ Reliability metric: Log-likelihood ratio (LLR) value at each index

# Components of SC-Flip Algorithm



- ▶ SC decoder
- ▶ An outer Cyclic Redundancy Check (CRC) decoder
  - ▶ To tell if a decoding attempt fails
- ▶ A sorter for bit-flipping indices
  - ▶ To sort them w.r.t. their reliability metric
  - ▶ Reliability metric: Log-likelihood ratio (LLR) value at each index

# Previous Works on Fast SCF



- ▶ Implementing fast nodes for SCF was proposed before [1],[2]
  - ▶ High sorting complexity
    - ▶ *i.e.* up to  $n!$  bit-flipping candidates are evaluated in a node of size  $n$
  - ▶ No hardware architecture

---

[1] Giard et al., "Fast-SSC-flip decoding of polar codes," IEEE WCNCW 2018.

[2] Ardakani et al., "Fast Successive-Cancellation Based Decoders of Polar Codes," in IEEE TCOM 2019.

# Previous Works on Fast SCF



- ▶ Implementing fast nodes for SCF was proposed before [1],[2]
  - ▶ High sorting complexity
    - ▶ *i.e.* up to  $n!$  bit-flipping candidates are evaluated in a node of size  $n$
  - ▶ No hardware architecture

---

[1] Giard et al., "Fast-SSC-flip decoding of polar codes," IEEE WCNCW 2018.

[2] Ardakani et al., "Fast Successive-Cancellation Based Decoders of Polar Codes," in IEEE TCOM 2019.

# Previous Works on Fast SCF



- ▶ Implementing fast nodes for SCF was proposed before [1],[2]
  - ▶ High sorting complexity
    - ▶ *i.e.* up to  $n!$  bit-flipping candidates are evaluated in a node of size  $n$
  - ▶ No hardware architecture

---

[1] Giard et al., "Fast-SSC-flip decoding of polar codes," IEEE WCNCW 2018.

[2] Ardakani et al., "Fast Successive-Cancellation Based Decoders of Polar Codes," in IEEE TCOM 2019.

# Our Work on Fast-SCF Decoding



- ▶ An energy-efficient **Fast-SSC architecture**, to be used by Fast-SCF
  - ▶ Introduced resource sharing for merged operations
  - ▶ Reduced the overall memory requirement for soft and hard decision memories
    - ▶ Example: LLR memory

# Our Work on Fast-SCF Decoding



- ▶ An energy-efficient **Fast-SSC architecture**, to be used by Fast-SCF
  - ▶ Introduced resource sharing for merged operations
  - ▶ Reduced the overall memory requirement for soft and hard decision memories
    - ▶ Example: LLR memory



# Our Work on Fast-SCF Decoding



- ▶ An energy-efficient **Fast-SSC architecture**, to be used by Fast-SCF
  - ▶ Introduced resource sharing for merged operations
  - ▶ Reduced the overall memory requirement for soft and hard decision memories
    - ▶ Example: LLR memory



# Our Work on Fast-SCF Decoding



- ▶ An energy-efficient **Fast-SSC architecture**, to be used by Fast-SCF
  - ▶ Introduced resource sharing for merged operations
  - ▶ Reduced the overall memory requirement for soft and hard decision memories
    - ▶ Example: LLR memory



# Our Work on Fast-SCF Decoding



- ▶ An energy-efficient **Fast-SSC architecture**, to be used by Fast-SCF
  - ▶ Introduced resource sharing for merged operations
  - ▶ Reduced the overall memory requirement for soft and hard decision memories
    - ▶ Example: LLR memory



# Our Work on Fast-SCF Decoding



- ▶ An energy-efficient **Fast-SSC architecture**, to be used by Fast-SCF
  - ▶ Introduced resource sharing for merged operations
  - ▶ Reduced the overall memory requirement for soft and hard decision memories
    - ▶ Example: LLR memory



Old LLR memory architecture

# Our Work on Fast-SCF Decoding



- ▶ An energy-efficient **Fast-SSC architecture**, to be used by Fast-SCF
  - ▶ Introduced resource sharing for merged operations
  - ▶ Reduced the overall memory requirement for soft and hard decision memories
    - ▶ Example: LLR memory



# Our Work on Fast-SCF Decoding



- ▶ How to select and **sort** bit-flipping indices for fast nodes?

- ▶ Example: Performance v iterations, PC(1024,512)

- ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



# Our Work on Fast-SCF Decoding



- ▶ How to select and **sort** bit-flipping indices for fast nodes?
  - ▶ Example: Performance v iterations, PC(1024,512)
  - ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



# Our Work on Fast-SCF Decoding



- ▶ How to select and **sort** bit-flipping indices for fast nodes?

- ▶ Example: Performance v iterations, PC(1024,512)
  - ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



- ▶ One flipping index for Rate-1 nodes
- ▶ Up to two flipping indices for SPC nodes

# Our Work on Fast-SCF Decoding



- ▶ How to select and sort bit-flipping indices for fast nodes?

- ▶ Example: Performance v iterations, PC(1024,512)
- ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



- ▶ One flipping index for Rate-1 nodes
- ▶ Up to two flipping indices for SPC nodes



# Our Work on Fast-SCF Decoding



- ▶ How to select and sort bit-flipping indices for fast nodes?

- ▶ Example: Performance v iterations, PC(1024,512)
- ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



- ▶ One flipping index for Rate-1 nodes
- ▶ Up to two flipping indices for SPC nodes



# Our Work on Fast-SCF Decoding



- ▶ How to select and sort bit-flipping indices for fast nodes?

- ▶ Example: Performance v iterations, PC(1024,512)
- ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



- ▶ One flipping index for Rate-1 nodes
- ▶ Up to two flipping indices for SPC nodes



# Our Work on Fast-SCF Decoding



- ▶ How to select and sort bit-flipping indices for fast nodes?

- ▶ Example: Performance v iterations, PC(1024,512)
- ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



- ▶ One flipping index for Rate-1 nodes
- ▶ Up to two flipping indices for SPC nodes



# Our Work on Fast-SCF Decoding



- ▶ How to select and sort bit-flipping indices for fast nodes?

- ▶ Example: Performance v iterations, PC(1024,512)
- ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



- ▶ One flipping index for Rate-1 nodes

- ▶ Up to two flipping indices for SPC nodes



- ▶ Implemented a highly parallelized CRC datapath to support fast decoding

# Our Work on Fast-SCF Decoding



- ▶ How to select and sort bit-flipping indices for fast nodes?

- ▶ Example: Performance v iterations, PC(1024,512)
  - ▶ Fast-SCF: Only index with minimum LLR is flipped in Rate-1 nodes



- ▶ One flipping index for Rate-1 nodes

- ▶ Up to two flipping indices for SPC nodes



- ▶ Implemented a highly parallelized CRC datapath to support fast decoding
- ▶ Putting it altogether: An energy-efficient Fast-SCF polar decoder architecture

# Simulation Results: Fast-SCF Decoding

5G PC(1024, 512), 11-bit 5G CRC,  $T_{max} = 20$



---

[1] Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

[2] Giard et al., "Fast-SSC-flip decoding of polar codes," IEEE WCNCW 2018.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[4] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

# Simulation Results: Fast-SCF Decoding

5G PC(1024, 512), 11-bit 5G CRC,  $T_{max} = 20$



[1] Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

[2] Giard et al., "Fast-SSC-flip decoding of polar codes," IEEE WCNCW 2018.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[4] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

# Simulation Results: Fast-SCF Decoding

5G PC(1024, 512), 11-bit 5G CRC,  $T_{max} = 20$



[1] Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

[2] Giard et al., "Fast-SSC-flip decoding of polar codes," IEEE WCNCW 2018.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[4] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

# Simulation Results: Fast-SCF Decoding

5G PC(1024, 512), 11-bit 5G CRC,  $T_{max} = 20$



[1] Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

[2] Giard et al., "Fast-SSC-flip decoding of polar codes," IEEE WCNCW 2018.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[4] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

# Simulation Results: Fast-SCF Decoding

5G PC(1024, 512), 11-bit 5G CRC,  $T_{max} = 20$



[1] Afisiadis, et al., "A low-complexity improved successive cancellation decoder for polar codes," 48th Asilomar Conference, 2014.

[2] Giard et al., "Fast-SSC-flip decoding of polar codes," IEEE WCNCW 2018.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[4] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

# Implementation Results: Fast-SCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(512, 256)$

|                                         | SCL [1] | Fast-SSCL [2] | This Work [3] |
|-----------------------------------------|---------|---------------|---------------|
| Power (mW)                              | 75.2    | 119.7         | <b>57.4</b>   |
| Latency ( $\mu$ s)                      | 1.71    | 0.43          | <b>0.33</b>   |
| Worst Case Latency ( $\mu$ s)           | 1.71    | <b>0.43</b>   | 6.93          |
| Throughput (Mbps)                       | 300     | 1201          | <b>1552</b>   |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 1.36    | 2.85          | <b>4.31</b>   |
| Energy (pJ/info. bit)                   | 502     | 199           | <b>74</b>     |

[1] Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

[2] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

# Implementation Results: Fast-SCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(512, 256)$

|                                         | SCL [1] | Fast-SSCL [2] | This Work <sup>[3]</sup> | ← 2.1 × |
|-----------------------------------------|---------|---------------|--------------------------|---------|
| Power (mW)                              | 75.2    | 119.7         | <b>57.4</b>              |         |
| Latency ( $\mu$ s)                      | 1.71    | 0.43          | <b>0.33</b>              |         |
| Worst Case Latency ( $\mu$ s)           | 1.71    | <b>0.43</b>   | 6.93                     |         |
| Throughput (Mbps)                       | 300     | 1201          | <b>1552</b>              |         |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 1.36    | 2.85          | <b>4.31</b>              |         |
| Energy (pJ/info. bit)                   | 502     | 199           | <b>74</b>                |         |

[1] Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

[2] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

# Implementation Results: Fast-SCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(512, 256)$

|                                         | SCL [1] | Fast-SSCL [2] | This Work [3] |         |
|-----------------------------------------|---------|---------------|---------------|---------|
| Power (mW)                              | 75.2    | 119.7         | <b>57.4</b>   | ← 2.1 × |
| Latency ( $\mu$ s)                      | 1.71    | 0.43          | <b>0.33</b>   | ← 23%   |
| Worst Case Latency ( $\mu$ s)           | 1.71    | <b>0.43</b>   | 6.93          |         |
| Throughput (Mbps)                       | 300     | 1201          | <b>1552</b>   |         |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 1.36    | 2.85          | <b>4.31</b>   |         |
| Energy (pJ/info. bit)                   | 502     | 199           | <b>74</b>     |         |

[1] Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

[2] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

# Implementation Results: Fast-SCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(512, 256)$

|                                         | SCL [1] | Fast-SSCL [2] | This Work [3] |         |
|-----------------------------------------|---------|---------------|---------------|---------|
| Power (mW)                              | 75.2    | 119.7         | <b>57.4</b>   | ← 2.1 × |
| Latency ( $\mu$ s)                      | 1.71    | 0.43          | <b>0.33</b>   | ← 23%   |
| Worst Case Latency ( $\mu$ s)           | 1.71    | <b>0.43</b>   | 6.93          |         |
| Throughput (Mbps)                       | 300     | 1201          | <b>1552</b>   | ← 29%   |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 1.36    | 2.85          | <b>4.31</b>   |         |
| Energy (pJ/info. bit)                   | 502     | 199           | <b>74</b>     |         |

[1] Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

[2] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

# Implementation Results: Fast-SCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(512, 256)$

|                                         | SCL [1] | Fast-SSCL [2] | This Work [3] |         |
|-----------------------------------------|---------|---------------|---------------|---------|
| Power (mW)                              | 75.2    | 119.7         | <b>57.4</b>   | ← 2.1 × |
| Latency ( $\mu$ s)                      | 1.71    | 0.43          | <b>0.33</b>   | ← 23%   |
| Worst Case Latency ( $\mu$ s)           | 1.71    | <b>0.43</b>   | 6.93          |         |
| Throughput (Mbps)                       | 300     | 1201          | <b>1552</b>   | ← 29%   |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 1.36    | 2.85          | <b>4.31</b>   | ← 51%   |
| Energy (pJ/info. bit)                   | 502     | 199           | <b>74</b>     |         |

[1] Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

[2] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

# Implementation Results: Fast-SCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(512, 256)$

|                                         | SCL [1] | Fast-SSCL [2] | This Work [3] |         |
|-----------------------------------------|---------|---------------|---------------|---------|
| Power (mW)                              | 75.2    | 119.7         | <b>57.4</b>   | ← 2.1 × |
| Latency ( $\mu$ s)                      | 1.71    | 0.43          | <b>0.33</b>   | ← 23%   |
| Worst Case Latency ( $\mu$ s)           | 1.71    | <b>0.43</b>   | 6.93          |         |
| Throughput (Mbps)                       | 300     | 1201          | <b>1552</b>   | ← 29%   |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 1.36    | 2.85          | <b>4.31</b>   | ← 51%   |
| Energy (pJ/info. bit)                   | 502     | 199           | <b>74</b>     | ← 2.7 × |

[1] Stimming et al., "LLR-Based Successive Cancellation List Decoding of Polar Codes," in IEEE TSP, 2015.

[2] Hashemi et al., "A fast polar code list decoder architecture based on sphere decoding," IEEE TCAS-I 2016.

[3] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

## Part III: Making Dynamic SC-Flip Decoding Practical

# Dynamic SC-Flip (DSCF) Decoding

- ▶ A better metric that can distinguish channel errors from propagated errors.
- ▶ This gives an opportunity to tackle more than one channel error.
- ▶ Error correction performance is greatly improved.
- ▶ Metric computation is **impractical**:

$$\triangleright M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

# Dynamic SC-Flip (DSCF) Decoding

- ▶ A better metric that can distinguish channel errors from propagated errors.
- ▶ This gives an opportunity to tackle more than one channel error.
- ▶ Error correction performance is greatly improved.
- ▶ Metric computation is **impractical**:

$$\triangleright M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

# Dynamic SC-Flip (DSCF) Decoding

- ▶ A better metric that can distinguish channel errors from propagated errors.
- ▶ This gives an opportunity to tackle more than one channel error.
- ▶ Error correction performance is greatly improved.
- ▶ Metric computation is **impractical**:

$$\triangleright M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

# Dynamic SC-Flip (DSCF) Decoding

- ▶ A better metric that can distinguish channel errors from propagated errors.
- ▶ This gives an opportunity to tackle more than one channel error.
- ▶ Error correction performance is greatly improved.
- ▶ Metric computation is **impractical**:

$$\triangleright M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

# Dynamic SC-Flip (DSCF) Decoding

$$M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

---

W. Gross and P. Gulak. "Simplified MAP algorithm suitable for implementation of turbo decoders," Electronics Letters, vol. 34, no. 16, pp. 1577–1578, Aug 1998.

# Dynamic SC-Flip (DSCF) Decoding

$$M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

LLRs of flipping indices  
(Same as SCF)

---

W. Gross and P. Gulak. "Simplified MAP algorithm suitable for implementation of turbo decoders," Electronics Letters, vol. 34, no. 16, pp. 1577–1578, Aug 1998.

# Dynamic SC-Flip (DSCF) Decoding

$$M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

LLRs of flipping indices  
(Same as SCF)

Difficult part  
 $f_\alpha(|L|)$

---

W. Gross and P. Gulak. "Simplified MAP algorithm suitable for implementation of turbo decoders," Electronics Letters, vol. 34, no. 16, pp. 1577–1578, Aug 1998.

# Dynamic SC-Flip (DSCF) Decoding

$$M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

LLRs of flipping indices  
(Same as SCF)

Difficult part  
 $f_\alpha(|L|)$



---

W. Gross and P. Gulak. "Simplified MAP algorithm suitable for implementation of turbo decoders," Electronics Letters, vol. 34, no. 16, pp. 1577–1578, Aug 1998.

# Dynamic SC-Flip (DSCF) Decoding

$$M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} \frac{1}{\alpha} \log \left( 1 + \exp(-\alpha |L^0[\mathcal{E}_{\omega-1}]_j|) \right)$$

LLRs of flipping indices  
(Same as SCF)

Difficult part  
 $f_\alpha(|L|)$



Replace  $f_\alpha(x)$  with  $f_\alpha^*(x) = \begin{cases} \frac{3}{2}, & \text{if } |x| \leq 5 \\ 0, & \text{otherwise.} \end{cases}$

W. Gross and P. Gulak. "Simplified MAP algorithm suitable for implementation of turbo decoders," Electronics Letters, vol. 34, no. 16, pp. 1577–1578, Aug 1998.

# Towards Practical DSCF Decoding

Idea: Reformulate the simplified metric to be used in special nodes.  
Recall the original DSCF metric:

$$M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} f_\alpha(|L^0[\mathcal{E}_{\omega-1}]_j|)$$

# Towards Practical DSCF Decoding

Idea: Reformulate the simplified metric to be used in special nodes.  
Recall the original DSCF metric:

$$M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} f_\alpha(|L^0[\mathcal{E}_{\omega-1}]_j|)$$

Let us split it as follows:

$$M_\alpha(\mathcal{E}_\omega) = |L^0[\mathcal{E}_{\omega-1}]_{i_\omega}| + \sum_{j \in \mathcal{E}_{\omega-1}} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} f_\alpha(|L^0[\mathcal{E}_{\omega-1}]_j|).$$

# Towards Practical DSCF Decoding

Idea: Reformulate the simplified metric to be used in special nodes.  
Recall the original DSCF metric:

$$M(\mathcal{E}_\omega) = \sum_{j \in \mathcal{E}_\omega} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} f_\alpha(|L^0[\mathcal{E}_{\omega-1}]_j|)$$

Let us split it as follows:

$$M_\alpha(\mathcal{E}_\omega) = \underbrace{|L^0[\mathcal{E}_{\omega-1}]_{i_\omega}|}_{M_1(L)} + \underbrace{\sum_{j \in \mathcal{E}_{\omega-1}} |L^0[\mathcal{E}_{\omega-1}]_j| + \sum_{\substack{j \leq i_\omega \\ j \in \mathcal{A}}} f_\alpha(|L^0[\mathcal{E}_{\omega-1}]_j|)}_{M_2(L)}.$$

- ▶  $M_1(L)$  is the *instantaneous component* obtained from the node, on the spot.
- ▶  $M_2(L)$  is the *accumulative component*, formed over the course of the decoding.

# Incorporation of Special Nodes into DSCF

$$M(L) = \textcolor{blue}{M}_1(L) + \textcolor{red}{M}_2(L).$$

- ▶  $\textcolor{blue}{M}_1(L)$  is directly obtained from the LLR magnitude of the index.
- ▶  $\textcolor{red}{M}_2(L)$  is updated by the LLR magnitude of the index.
- ▶

# Incorporation of Special Nodes into DSCF

$$M(L) = \textcolor{blue}{M}_1(L) + \textcolor{red}{M}_2(L).$$

- ▶  $\textcolor{blue}{M}_1(L)$  is directly obtained from the LLR magnitude of the index.
- ▶  $\textcolor{red}{M}_2(L)$  is updated by the LLR magnitude of the index.



▶

# Incorporation of Special Nodes into DSCF

$$M(L) = \textcolor{blue}{M}_1(L) + \textcolor{red}{M}_2(L).$$

- ▶  $\textcolor{blue}{M}_1(L)$  is directly obtained from the LLR magnitude of the index.
- ▶  $\textcolor{red}{M}_2(L)$  is updated by the LLR magnitude of the index.



- ▶ **Main idea:** Obtain  $L$  from the special nodes.

# More on Practical Fast-DSCF Decoding

- ▶ Algorithm-level improvements:
  - ▶ Performed a mathematical study to minimize the sorting complexity
- ▶ Hardware-level simplifications:
  - ▶ Normalized metric computation to avoid saturation
  - ▶ Custom sorter architecture to minimize delay
  - ▶ Reduced sorter length by 50% to reduce complexity

# More on Practical Fast-DSCF Decoding

- ▶ Algorithm-level improvements:
  - ▶ Performed a mathematical study to minimize the sorting complexity
- ▶ Hardware-level simplifications:
  - ▶ Normalized metric computation to avoid saturation
  - ▶ Custom sorter architecture to minimize delay
  - ▶ Reduced sorter length by 50% to reduce complexity

# More on Practical Fast-DSCF Decoding

- ▶ Algorithm-level improvements:
  - ▶ Performed a mathematical study to minimize the sorting complexity
- ▶ Hardware-level simplifications:
  - ▶ Normalized metric computation to avoid saturation
  - ▶ Custom sorter architecture to minimize delay
  - ▶ Reduced sorter length by 50% to reduce complexity

# Simulation Results - Fast-DSCF Decoding

- 5G PC(1024, 512), 16-bit 5G CRC,  $T_{\max} = 400$ .



[1] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[2] Chandresris et al., "Dynamic-SCFlip decoding of polar codes," IEEE TCOM 2018.

[3] Ercan et al., "Practical dynamic SC-flip polar decoders: algorithm and implementation," IEEE TSP 2020.

# Simulation Results - Fast-DSCF Decoding

- 5G PC(1024, 512), 16-bit 5G CRC,  $T_{\max} = 400$ .



[1] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[2] Chandresris et al., "Dynamic-SCFlip decoding of polar codes," IEEE TCOM 2018.

[3] Ercan et al., "Practical dynamic SC-flip polar decoders: algorithm and implementation," IEEE TSP 2020.

# Simulation Results - Fast-DSCF Decoding

- 5G PC(1024, 512), 16-bit 5G CRC,  $T_{\max} = 400$ .



[1] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[2] Chandresris et al., "Dynamic-SCFlip decoding of polar codes," IEEE TCOM 2018.

[3] Ercan et al., "Practical dynamic SC-flip polar decoders: algorithm and implementation," IEEE TSP 2020.

# Simulation Results - Fast-DSCF Decoding

- 5G PC(1024, 512), 16-bit 5G CRC,  $T_{\max} = 400$ .



[1] Ercan et al., "Energy-efficient hardware architectures for fast polar decoders," IEEE TCAS-I 2019.

[2] Chandresris et al., "Dynamic-SCFlip decoding of polar codes," IEEE TCOM 2018.

[3] Ercan et al., "Practical dynamic SC-flip polar decoders: algorithm and implementation," IEEE TSP 2020.

# Implementation Results: Fast-DSCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(1024, 512)$

|                                         | SC-List [1] | SC-List [2] | This Work <sup>[3]</sup> |
|-----------------------------------------|-------------|-------------|--------------------------|
| Frequency (MHz)                         | 676         | 911         | 410                      |
| Area (mm <sup>2</sup> )                 | 4.21        | 3.90        | <b>0.55</b>              |
| Latency ( $\mu$ s)                      | <b>0.76</b> | 1.60        | 1.11                     |
| Worst Case Latency ( $\mu$ s)           | <b>0.76</b> | 1.60        | 447                      |
| Average Throughput (Mbps)               | <b>1340</b> | 637         | 935                      |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 0.32        | 0.16        | <b>0.94</b>              |
| Power (mW)                              | —           | —           | 201                      |
| Energy (pJ/info. bit)                   | —           | —           | 439                      |

<sup>[1]</sup> Xia et al., "A high-throughput architecture of list successive cancellation polar codes decoder with large list size," in IEEE TSP, 2018.

<sup>[2]</sup> Fan et al., "A low latency list successive-cancellation decoding implementation for polar codes," IEEE JSAC 2016.

<sup>[3]</sup> Ercan et al., "Practical dynamic SC-flip polar decoders: algorithm and implementation," IEEE TSP 2020.

# Implementation Results: Fast-DSCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(1024, 512)$

|                                         | SC-List [1] | SC-List [2] | This Work <sup>[3]</sup> |
|-----------------------------------------|-------------|-------------|--------------------------|
| Frequency (MHz)                         | 676         | 911         | 410                      |
| Area (mm <sup>2</sup> )                 | 4.21        | 3.90        | <b>0.55</b> ← 7.1 ×      |
| Latency (μs)                            | <b>0.76</b> | 1.60        | 1.11                     |
| Worst Case Latency (μs)                 | <b>0.76</b> | 1.60        | 447                      |
| Average Throughput (Mbps)               | <b>1340</b> | 637         | 935                      |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 0.32        | 0.16        | <b>0.94</b>              |
| Power (mW)                              | —           | —           | 201                      |
| Energy (pJ/info. bit)                   | —           | —           | 439                      |

<sup>[1]</sup> Xia et al., "A high-throughput architecture of list successive cancellation polar codes decoder with large list size," in IEEE TSP, 2018.

<sup>[2]</sup> Fan et al., "A low latency list successive-cancellation decoding implementation for polar codes," IEEE JSAC 2016.

<sup>[3]</sup> Ercan et al., "Practical dynamic SC-flip polar decoders: algorithm and implementation," IEEE TSP 2020.

# Implementation Results: Fast-DSCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(1024, 512)$

|                                         | SC-List [1] | SC-List [2] | This Work <sup>[3]</sup>                        |
|-----------------------------------------|-------------|-------------|-------------------------------------------------|
| Frequency (MHz)                         | 676         | 911         | 410 <span style="color: green;">← 50%</span>    |
| Area (mm <sup>2</sup> )                 | 4.21        | 3.90        | 0.55 <span style="color: green;">← 7.1 ×</span> |
| Latency ( $\mu$ s)                      | 0.76        | 1.60        | 1.11 <span style="color: red;">← 29%</span>     |
| Worst Case Latency ( $\mu$ s)           | 0.76        | 1.60        | 447                                             |
| Average Throughput (Mbps)               | 1340        | 637         | 935 <span style="color: red;">← 30%</span>      |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 0.32        | 0.16        | 0.94                                            |
| Power (mW)                              | —           | —           | 201                                             |
| Energy (pJ/info. bit)                   | —           | —           | 439                                             |

<sup>[1]</sup> Xia et al., "A high-throughput architecture of list successive cancellation polar codes decoder with large list size," in IEEE TSP, 2018.

<sup>[2]</sup> Fan et al., "A low latency list successive-cancellation decoding implementation for polar codes," IEEE JSAC 2016.

<sup>[3]</sup> Ercan et al., "Practical dynamic SC-flip polar decoders: algorithm and implementation," IEEE TSP 2020.

# Implementation Results: Fast-DSCF Decoder

- ▶ TSMC 65nm CMOS technology
- ▶  $PC(1024, 512)$

|                                         | SC-List [1] | SC-List [2] | This Work [3] |         |
|-----------------------------------------|-------------|-------------|---------------|---------|
| Frequency (MHz)                         | 676         | 911         | 410           | ← 50%   |
| Area (mm <sup>2</sup> )                 | 4.21        | 3.90        | 0.55          | ← 7.1 × |
| Latency (μs)                            | 0.76        | 1.60        | 1.11          | ← 29%   |
| Worst Case Latency (μs)                 | 0.76        | 1.60        | 447           |         |
| Average Throughput (Mbps)               | 1340        | 637         | 935           | ← 30%   |
| Area Efficiency (Gbps/mm <sup>2</sup> ) | 0.32        | 0.16        | 0.94          | ← 3 ×   |
| Power (mW)                              | —           | —           | 201           |         |
| Energy (pJ/info. bit)                   | —           | —           | 439           |         |

[1] Xia et al., "A high-throughput architecture of list successive cancellation polar codes decoder with large list size," in IEEE TSP, 2018.

[2] Fan et al., "A low latency list successive-cancellation decoding implementation for polar codes," IEEE JSAC 2016.

[3] Ercan et al., "Practical dynamic SC-flip polar decoders: algorithm and implementation," IEEE TSP 2020.

# Conclusions

- ▶ Algorithms and implementations for practical and energy-efficient polar decoders
- ▶ Energy-efficient Fast-SCF decoding:
  - ▶ Reduced memory, high throughput
  - ▶ Energy-efficient alternative
- ▶ Practical Fast-DSCF decoding:
  - ▶ Tackled barriers towards hardware implementation
  - ▶ Area- and energy-efficient alternative

# Conclusions

- ▶ Algorithms and implementations for practical and energy-efficient polar decoders
- ▶ Energy-efficient Fast-SCF decoding:
  - ▶ Reduced memory, high throughput
  - ▶ Energy-efficient alternative
- ▶ Practical Fast-DSCF decoding:
  - ▶ Tackled barriers towards hardware implementation
  - ▶ Area- and energy-efficient alternative

# Conclusions

- ▶ Algorithms and implementations for practical and energy-efficient polar decoders
- ▶ Energy-efficient Fast-SCF decoding:
  - ▶ Reduced memory, high throughput
  - ▶ Energy-efficient alternative
- ▶ Practical Fast-DSCF decoding:
  - ▶ Tackled barriers towards hardware implementation
  - ▶ Area- and energy-efficient alternative

Thank you!