

# Altera CPLDs and Small FPGAs



University of Colorado **Boulder**

Copyright © 2017 University of Colorado

# Altera Large FPGAs

- Arria V
- Stratix V
- Arria 10
- Stratix 10




In this video we will continue our survey of modern programmable logic devices, with Large FPGAs from Altera, including the Arria V, the Stratix V, the Arria 10, and the Stratix 10.

## Programmable Logic Device Selection Criteria

1. Reprogrammability (Configuration Memory Type)
2. Size or Logic Density (amount of logic in system gates, LEs, Slices, ALMs, etc.)
3. Cost per logic gate
4. Speed (Maximum clock frequency)
5. Power Consumption (static and dynamic)
6. Cost per I/O (I/O Density) and extent of supported I/O standards
7. Hard IP available on chip (Memory, DSP Blocks, Transceivers, etc.)
8. Deterministic timing (timing is consistent in every implementation)
9. Reliability (FIT rate)
10. Endurance (number of programming cycles and years of retention)
11. Design and Data Security




The goal in programmable logic device selection is to pick the best fit part for our requirements. We will use the programmable logic device selection criteria we established earlier to evaluate these devices as listed here.
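As a minimal sketch of the "best fit" idea, the Python below screens parts against three of the criteria above (logic density, I/O count, and transceivers from the hard-IP criterion) and returns the smallest part that meets all of them. It assumes logic-element count is a reasonable proxy for cost, which is a simplification. The part data is copied from the Stratix 10 GX product table later in this video, and the requirement numbers in the usage line are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Part:
    name: str
    logic_elements: int
    user_io: int
    transceivers: int


# Subset of the Stratix 10 GX product table: LEs, max user I/O, total transceivers.
STRATIX10_GX = [
    Part("GX 500", 484_000, 488, 24),
    Part("GX 650", 646_000, 488, 24),
    Part("GX 850", 841_000, 736, 48),
    Part("GX 1100", 1_092_000, 736, 48),
    Part("GX 1650", 1_624_000, 704, 96),
    Part("GX 2100", 2_005_000, 704, 96),
    Part("GX 2500", 2_427_000, 1_160, 144),
    Part("GX 2800", 2_753_000, 1_160, 144),
    Part("GX 4500", 4_463_000, 1_640, 72),
    Part("GX 5500", 5_510_000, 1_640, 72),
]


def smallest_fit(parts: List[Part], min_les: int, min_io: int,
                 min_xcvrs: int) -> Optional[Part]:
    """Return the smallest part (by LE count, used as a cost proxy) meeting every minimum."""
    candidates = [
        p for p in parts
        if p.logic_elements >= min_les
        and p.user_io >= min_io
        and p.transceivers >= min_xcvrs
    ]
    return min(candidates, key=lambda p: p.logic_elements, default=None)


# Hypothetical requirements: 1M LEs, 700 I/O pins, 60 transceivers.
print(smallest_fit(STRATIX10_GX, 1_000_000, 700, 60))
```

The GX 1100 has enough logic and I/O here but only 48 transceivers, so the screen moves up to the next part. A real selection would also weigh speed, power, security and the other criteria, which this sketch ignores.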

# Altera

Table 4: Maximum Resource Counts for Arria V GX Devices

| Resource                     | Member Code |         |         |         |         |         |         |         |
|------------------------------|-------------|---------|---------|---------|---------|---------|---------|---------|
|                              | A1          | A3      | A5      | A7      | B1      | B3      | B5      | B7      |
| Logic Elements (LE) (K)      | 75          | 156     | 190     | 242     | 300     | 362     | 420     | 504     |
| ALM                          | 28,302      | 58,900  | 71,698  | 91,680  | 113,208 | 136,880 | 158,491 | 190,240 |
| Register                     | 113,208     | 235,600 | 286,792 | 366,720 | 452,832 | 547,520 | 633,964 | 760,960 |
| Memory (Kb)                  | M10K        | 8,000   | 10,510  | 11,800  | 13,660  | 15,100  | 17,260  | 20,540  |
|                              | MLAB        | 463     | 961     | 1,173   | 1,448   | 1,852   | 2,098   | 2,532   |
| Variable-precision DSP Block |             | 240     | 396     | 600     | 800     | 920     | 1,045   | 1,156   |
| 6 Gbps Transceiver           |             | 9       | 9       | 24      | 24      | 24      | 36      | 36      |
| GPIO <sup>(3)</sup>          | 416         | 416     | 544     | 544     | 704     | 704     | 704     | 704     |
| LVDS                         | Transmitter | 67      | 67      | 120     | 120     | 160     | 160     | 160     |
|                              | Receiver    | 80      | 80      | 136     | 136     | 176     | 176     | 176     |
| PCIe Hard IP Block           |             | 1       | 1       | 2       | 2       | 2       | 2       | 2       |
| Hard Memory Controller       |             | 2       | 2       | 4       | 4       | 4       | 4       | 4       |




To start, let's look at the product table for the Arria V mid-range FPGA. The Altera Arria V has reprogrammable SRAM configuration and routing, so it needs an external nonvolatile configuration memory. On power up, the device transfers the configuration information to the internal SRAM.

Notice there are up to about 500,000 logic elements (504K in the largest member), nearly double that of a Cyclone V.

Speed is limited to 625 MHz on global clock buffers.

Power is not specified; use the PowerPlay Power Analyzer tool to estimate it.

This device has up to 704 I/O pins, about a 715:1 ratio of logic cells to I/O.

A number of hard IP blocks have been added, including block memory, DSP blocks, high-speed transceivers at 6.6 Gbps as standard (with the faster transceiver variants reaching 12.5 Gbps), and external memory interfaces up to DDR3 at 1600 Mbps.

# Altera




Here is a picture of the Arria V layout: an array of logic blocks interspersed with memory and DSP blocks, with the transceivers and hard DDR memory interfaces around the outside.

# Altera

Figure 1-7: ALM High-Level Block Diagram for Arria V GX, GT, SX, and ST Devices




This is a diagram of the Arria V ALM, which looks just like the Cyclone V ALM: two 6-input LUTs driving four flip-flop registers. Once FPGA architects develop a logic cell architecture they like, they tend to reuse it.

# Altera

Table 3: Stratix V GX Device Features

| Features                               | 5SGXA3            | 5SGXA4      | 5SGXA5           | 5SGXA7           | 5SGXA9        | 5SGXAB        | 5SGXB5     | 5SGXB6     | 5SGXB9     | 5SGXBB     |
|----------------------------------------|-------------------|-------------|------------------|------------------|---------------|---------------|------------|------------|------------|------------|
| Logic Elements (K)                     | 340               | 420         | 490              | 622              | 840           | 952           | 490        | 597        | 840        | 952        |
| ALMs                                   | 128,300           | 158,500     | 185,000          | 234,720          | 317,000       | 359,200       | 185,000    | 225,400    | 317,000    | 359,200    |
| Registers (K)                          | 513               | 634         | 740              | 939              | 1,268         | 1,437         | 740        | 902        | 1,268      | 1,437      |
| 14.1-Gbps Transceivers                 | 12, 24,<br>or 36  | 24 or<br>36 | 24, 36,<br>or 48 | 24, 36,<br>or 48 | 36 or<br>48   | 36 or<br>48   | 66         | 66         | 66         | 66         |
| PCIe hard IP Blocks                    | 1 or 2            | 1 or 2      | 1, 2, or<br>4    | 1, 2, or<br>4    | 1, 2, or<br>4 | 1, 2, or<br>4 | 1 or 4     | 1 or 4     | 1 or 4     | 1 or 4     |
| Fractional PLLs                        | 20 <sup>(5)</sup> | 24          | 28               | 28               | 28            | 28            | 24         | 24         | 32         | 32         |
| M20K Memory Blocks                     | 957               | 1,900       | 2,304            | 2,560            | 2,640         | 2,640         | 2,100      | 2,660      | 2,640      | 2,640      |
| M20K Memory (MBits)                    | 19                | 37          | 45               | 50               | 52            | 52            | 41         | 52         | 52         | 52         |
| Variable Precision Multipliers (18x18) | 512               | 512         | 512              | 512              | 704           | 704           | 798        | 798        | 704        | 704        |
| Variable Precision Multipliers (27x27) | 256               | 256         | 256              | 256              | 352           | 352           | 399        | 399        | 352        | 352        |


Now consider the Stratix V large FPGA.

The Altera Stratix V has reprogrammable SRAM configuration and routing, so it needs an external nonvolatile configuration memory. On power up, the device transfers the configuration information to the internal SRAM.

Notice there are up to 952,000 logic elements, about double that of an Arria V.

Speed is limited to 717 MHz on global clock buffers.

Power is not specified; use the PowerPlay Power Analyzer tool to estimate it.

This device has up to 840 I/O pins, roughly a 1,130:1 ratio of logic cells to I/O.

A number of hard IP blocks have been added, including high-speed transceivers up to 28.05 Gbps (in the GT part), block memory, multipliers with up to 27 x 27 precision, DSP blocks, and external memory interfaces up to DDR3 at 1600 Mbps.

# Altera




Here is a picture of the Stratix V layout: an array of logic blocks interspersed with memory and DSP blocks, with the transceivers on the left and right sides and the hard DDR memory interfaces on the top and bottom of the device.

# Altera

Figure 1-6: ALM High-Level Block Diagram for Stratix V Devices




This is a diagram of the Stratix V ALM, which looks just like the Cyclone V or Arria V ALM: two 6-input LUTs driving four flip-flop registers. You know how this one works already, thanks to Altera's lack of imagination. Actually, we know their imagination is very good, but they know a good design when they see one.

# Altera

Table 6: Maximum Resource Counts for Arria 10 GX Devices (GX 570, GX 660, GX 900, and GX 1150)—Preliminary

| Resource                     | GX 570  | GX 660    | GX 900    | GX 1150   |
|------------------------------|---------|-----------|-----------|-----------|
| Logic Elements (LE) (K)      | 570     | 660       | 900       | 1,150     |
| ALM                          | 217,080 | 250,540   | 339,620   | 427,200   |
| Register                     | 868,320 | 1,002,160 | 1,358,480 | 1,708,800 |
| Memory (Kb): M20K            | 36,000  | 42,620    | 48,460    |           |
| Memory (Kb): MLAB            | 5,096   | 5,788     | 9,386     |           |
| Variable-precision DSP Block | 1,523   | 1,678     | 1,518     |           |
| 18 x 19 Multiplier           | 3,016   | 3,356     | 3,036     |           |
| PLL: Fractional Synthesis    | 16      | 16        | 32        |           |
| PLL: I/O                     | 16      | 16        | 16        |           |
| 17.4 Gbps Transceiver        | 48      | 48        | 96        |           |
| GPIO <sup>(2)</sup>          | 696     | 696       | 768       |           |




Now consider the Arria 10 large FPGA.

The Altera Arria 10 has reprogrammable SRAM configuration and routing, so it needs an external nonvolatile configuration memory. On power up, the device transfers the configuration information to the internal SRAM.

Notice there are up to 1.15 million logic elements, more than double that of an Arria V.

Speed is limited to 644 MHz on global clock buffers, not much faster than Arria V.

Power is not specified; use the PowerPlay Power Analyzer tool to estimate it.

This device has up to 768 I/O pins, roughly a 1,500:1 ratio of logic cells to I/O.

A number of hard IP blocks have been added, including high-speed transceivers up to 28.3 Gbps, block memory, multipliers, DSP blocks, 10G Ethernet interfaces, and external memory interfaces with DDR4 up to 2666 Mbps.

# Altera

Figure 5: ALM for Arria 10 Devices



This is a diagram of the Arria 10 ALM, which is the same as the Cyclone V ALM: an 8-input adaptive LUT (usually configured as two 6-input LUTs) driving four flip-flop registers.

The table below compares the Altera Stratix 10 GX and SX devices, from the GX/SX 500 through the GX/SX 5500. It lists logic elements, ALMs and registers, embedded memory, DSP blocks and multipliers, I/O and transceiver counts, and PCIe hard IP blocks, along with family-wide features such as the secure device manager (AES-256 bitstream encryption and authentication, ECDSA boot code authentication), the hard processor system, and the supported memory devices.

| Product Line                                                           | GX 500<br>SX 500                                                                                                                                                                                                                                                                                                | GX 650<br>SX 650 | GX 850<br>SX 850 | GX 1100<br>SX 1100 | GX 1650<br>SX 1650 | GX 2100<br>SX 2100 | GX 2500<br>SX 2500 | GX 2800<br>SX 2800 | GX 4500<br>SX 4500 | GX 5500<br>SX 5500 |
|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| Logic elements (LEs)                                                   | 484,000                                                                                                                                                                                                                                                                                                         | 646,000          | 841,000          | 1,092,000          | 1,624,000          | 2,005,000          | 2,427,000          | 2,753,000          | 4,463,000          | 5,510,000          |
| Adaptive logic modules (ALMs)                                          | 164,160                                                                                                                                                                                                                                                                                                         | 218,880          | 284,960          | 370,080            | 550,540            | 676,680            | 821,150            | 933,170            | 1,512,820          | 1,867,680          |
| ALM registers                                                          | 656,640                                                                                                                                                                                                                                                                                                         | 875,520          | 1,139,840        | 1,480,320          | 2,202,160          | 2,718,720          | 3,284,600          | 3,732,480          | 6,051,280          | 7,470,720          |
| Hyper-Registers from HyperFlex™ architecture | Hyper-Registers distributed throughout the monolithic FPGA fabric |  |  |  |  |  |  |  |  |  |
| Programmable clock trees synthesizable | Hundreds of synthesizable clock trees |  |  |  |  |  |  |  |  |  |
| M20K memory blocks | 2,196 | 2,583 | 3,477 | 4,401 | 5,851 | 6,501 | 9,963 | 11,721 | 7,033 | 7,033 |
| M20K memory size (Mb) | 43 | 50 | 68 | 86 | 114 | 127 | 195 | 229 | 137 | 137 |
| MLAB memory size (Mb) | 3 | 3 | 4 | 6 | 8 | 11 | 13 | 15 | 23 | 29 |
| Variable-precision digital signal processing (DSP) blocks | 1,152 | 1,440 | 2,016 | 2,520 | 3,145 | 3,744 | 5,011 | 5,760 | 1,980 | 1,980 |
| 18 x 19 multipliers | 2,304 | 2,880 | 4,032 | 5,040 | 6,290 | 7,488 | 10,022 | 11,520 | 3,960 | 3,960 |
| Peak fixed-point performance (TMACs) <sup>v</sup> | 4.6 | 5.8 | 8.1 | 10.1 | 12.6 | 15.0 | 20.0 | 23.0 | 7.9 | 7.9 |
| Peak floating-point performance (TFLOPS) <sup>f</sup> | 1.8 | 2.3 | 3.2 | 4.0 | 5.0 | 6.0 | 8.0 | 9.2 | 3.2 | 3.2 |
| Secure device manager                                                  | AES-256/SHA-256 bitstream encryption/authentication, physically unclonable function (PUF), ECDSA 256/384 boot code authentication, side channel attack protection                                                                                                                                               |                  |                  |                    |                    |                    |                    |                    |                    |                    |
| Hard processor system <sup>g</sup>                                     | Quad-core 64 bit ARM® Cortex®-A53 up to 1.5 GHz with 32 KB I/D cache, NEON™ coprocessor, 1 MB L2 cache, direct memory access (DMA), system memory management unit, cache coherency unit, hard memory controllers, USB 2.0 x2, 1G EMAC x3, UART x2, SPI x4, I2C x5, general-purpose timers x7, watchdog timer x4 |                  |                  |                    |                    |                    |                    |                    |                    |                    |
| Maximum user I/O pins                                                  | 488                                                                                                                                                                                                                                                                                                             | 488              | 736              | 736                | 704                | 704                | 1160               | 1160               | 1640               | 1640               |
| Maximum LVDS pairs 1.6 Gbps (RX or TX)                                 | 240                                                                                                                                                                                                                                                                                                             | 240              | 360              | 360                | 336                | 336                | 576                | 576                | 816                | 816                |
| Total full duplex transceiver count                                    | 24                                                                                                                                                                                                                                                                                                              | 24               | 48               | 48                 | 96                 | 96                 | 144                | 144                | 72                 | 72                 |
| GXT full duplex transceiver count (up to 30 Gbps) <sup>d</sup> | 16 | 16 | 32 | 32 | 64 | 64 | 96 | 96 | 48 | 48 |
| GX full duplex transceiver count (up to 17.4 Gbps)                     | 8                                                                                                                                                                                                                                                                                                               | 8                | 16               | 16                 | 32                 | 32                 | 48                 | 48                 | 24                 | 24                 |
| PCI Express® (PCIe®) hard intellectual property (IP) blocks (Gen3 x16) | 1                                                                                                                                                                                                                                                                                                               | 1                | 2                | 2                  | 4                  | 4                  | 6                  | 6                  | 3                  | 3                  |
| Memory devices supported | DDR4, DDR3, DDR2, DDR, QDR II, QDR II+, RLDRAM II, RLDRAM 3, HMC, MoSys |  |  |  |  |  |  |  |  |  |

Lastly, let's look at the Stratix 10 large FPGA.

The Altera Stratix 10 has reprogrammable SRAM configuration and routing, so it needs an external nonvolatile configuration memory. On power up, the device transfers the configuration information to the internal SRAM.

Notice there are up to 5.5 million logic elements, nearly 5 times that of an Arria 10.

Speed is limited to 1100 MHz on global clock buffers, making it the fastest FPGA in existence at the time of this recording.

Power is not specified; use the PowerPlay Power Analyzer tool to estimate it.

This device has up to 1640 I/O pins, roughly a 3,360:1 ratio of logic cells to I/O.

A number of hard IP blocks have been added, including high-speed transceivers up to 30 Gbps, block memory, multipliers, DSP blocks, 10G Ethernet interfaces, and external memory interfaces with DDR4 up to 2666 Mbps.

This part is a beast. Imagine what you can do with something this powerful.
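The logic-to-I/O ratios and clock figures quoted for each family are simple arithmetic, sketched below in Python. It pairs the largest logic-element count with the largest I/O count quoted for each family (an assumption where those maxima come from different members): 704 I/O for Arria V, 840 for Stratix V, 768 for Arria 10 (the largest GPIO count shown in its table) and 1640 for Stratix 10, plus the global clock limits from the narration. The clock period in nanoseconds is 1000 divided by the frequency in MHz.

```python
# Largest quoted figures per family: (logic elements, max user I/O, global clock MHz)
FAMILIES = {
    "Arria V": (504_000, 704, 625),
    "Stratix V": (952_000, 840, 717),
    "Arria 10": (1_150_000, 768, 644),
    "Stratix 10": (5_510_000, 1_640, 1_100),
}


def les_per_io(les: int, io_pins: int) -> float:
    """Logic elements available per I/O pin."""
    return les / io_pins


def period_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1_000.0 / f_mhz


for name, (les, io_pins, f_mhz) in FAMILIES.items():
    print(f"{name:<10}  {les_per_io(les, io_pins):6.0f}:1  {period_ns(f_mhz):.3f} ns")
```

This works out to about 716, 1,133, 1,497 and 3,360 logic elements per I/O pin, and clock periods of 1.600, 1.395, 1.553 and 0.909 ns. The steadily rising ratio shows logic growing much faster than pin count from one family to the next.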

# Altera HyperFlex Architecture

## The HyperFlex Advantage

The key innovations that contribute to the HyperFlex advantage are:

### *Registers Everywhere*

Registers in the interconnect routing, called Hyper-Registers.

### *Enhanced Core Clocking*

Localized clock trees reduce skew and timing uncertainty.

### *Hyper-Aware Design Flow*

The Hyper-Aware design flow includes three new improvements: a Fast Forward Compile tool, a Hyper-Retimer step, and enhanced synthesis and place-and-route algorithms that use the Hyper-Registers.




## Altera's HyperFlex Architecture


### *Registers Everywhere*

The “registers everywhere” in the interconnect routing, called Hyper-Registers, are distinct from the conventional registers contained within the adaptive logic modules (ALMs). A Hyper-Register is associated with each individual routing segment in the device.

### *Enhanced Core Clocking*

The programmable clock tree synthesis allows system designers to create localized clock trees, reducing skew and timing uncertainty to obtain maximum core clocking performance. This capability is a key feature that allows the HyperFlex architecture to reach 2X performance.

### *Hyper-Aware Design Flow*

The Hyper-Aware design flow includes three new improvements: a Fast Forward Compile tool, a Hyper-Retimer step, and enhanced synthesis and place-and-route algorithms that use the Hyper-Registers.
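Why "registers everywhere" helps retiming: a retimer can only move a register to a place where a register physically exists. The Python below is a toy model, not the Hyper-Retimer's actual algorithm. It assumes one register-to-register path split into segments with made-up delays (in picoseconds), ignores clock-to-out, setup and routing changes, and brute-forces the best placement of pipeline registers given a set of legal register sites. The "ALM registers only" site list is hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Optional


def best_period_ps(delays_ps: List[int], n_regs: int,
                   allowed_cuts: Iterable[int]) -> Optional[int]:
    """Smallest clock period reachable by placing n_regs pipeline registers.

    delays_ps: combinational delay of each segment along one path.
    allowed_cuts: indices i where a register may sit after segment i.
    Returns None if there are not enough legal register sites.
    """
    best = None
    for cuts in combinations(sorted(allowed_cuts), n_regs):
        # Stage boundaries: path start, one boundary per register, path end.
        bounds = [0] + [c + 1 for c in cuts] + [len(delays_ps)]
        stage = max(sum(delays_ps[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))
        if best is None or stage < best:
            best = stage
    return best


# Hypothetical path: six segments, delays in picoseconds.
path = [400, 300, 500, 200, 600, 400]
anywhere = range(len(path) - 1)  # a register available on every segment boundary
alm_only = [0, 4]                # hypothetical: registers only at two ALM outputs

for label, sites in (("ALM registers only", alm_only), ("Hyper-Registers", anywhere)):
    period = best_period_ps(path, 1, sites)
    print(f"{label}: {period} ps -> {1e6 / period:.0f} MHz")
```

With these made-up delays the ALM-only placement is stuck at a 2,000 ps stage (500 MHz), while a register on any segment allows a balanced 1,200 ps split (about 833 MHz).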

# Altera



Figure 2. "Registers Everywhere" HyperFlex Architecture




Here's a picture of the HyperFlex architecture, with registers (in orange) at every routing node.

# Altera

For more about the HyperFlex architecture, see this HyperFlex video:

<https://www.altera.com/support/training/videos/hyperflex-architecture-overview-video.tablet.html>




## How large a comparator can be made using a Stratix 10 logic cell?




Recall the 4-bit comparator.

How many comparator bits can be implemented in a LUT? Each comparator bit needs one input from each operand, so in the Stratix 10 case a 6-input LUT with independent inputs can handle 3 bits of a comparator. Wider comparators can therefore be built with fewer LUT levels and less delay.
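To make the LUT arithmetic concrete, here is a small Python sketch. It assumes an equality comparator (a magnitude comparator needs extra logic per slice) and a simple two-stage structure: first-level LUTs each compare `lut_inputs // 2` bit pairs, then further LUT levels AND the partial results together. `eq3_lut_mask` builds the 64-entry truth table a single 6-input LUT would hold for a 3-bit slice, using a hypothetical pin mapping; `comparator_cost` counts LUTs and levels. Synthesis tools may pack the logic more tightly, so treat the counts as a rough estimate.

```python
from typing import Tuple


def eq3_lut_mask() -> int:
    """64-bit truth table (LUT mask) for a 3-bit equality slice.

    Hypothetical pin mapping: address bits 0-2 carry A[2:0] and
    address bits 3-5 carry B[2:0]. Bit k of the mask is the output
    for LUT address k.
    """
    mask = 0
    for addr in range(64):
        a, b = addr & 0b111, (addr >> 3) & 0b111
        if a == b:
            mask |= 1 << addr
    return mask


def comparator_cost(n_bits: int, lut_inputs: int) -> Tuple[int, int]:
    """(LUT count, LUT levels) for an n-bit equality comparator.

    Level 1: each LUT compares lut_inputs // 2 bit pairs.
    Later levels: each LUT ANDs up to lut_inputs partial results.
    """
    signals = -(-n_bits // (lut_inputs // 2))  # ceiling division
    luts, levels = signals, 1
    while signals > 1:
        signals = -(-signals // lut_inputs)
        luts += signals
        levels += 1
    return luts, levels


for k in (4, 6):
    print(f"{k}-input LUTs, 16-bit compare:", comparator_cost(16, k))
```

For a 16-bit compare this model gives 7 LUTs in 2 levels with 6-input LUTs, versus 11 LUTs in 3 levels with 4-input LUTs, which is the "less delay" point made above.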

How many full adders can be created in a Stratix 10?



1-bit full adder



4-bit adder in 3 ALMs

How many full adders can be made in a logic cell? For the Stratix 10, the ALM is very similar to the Cyclone V ALM. In a previous video we used 3 Cyclone V ALMs to implement a 4-bit adder, so each ALM implements about 4/3 ≈ 1.33 full-adder bits.
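The 1.33 figure is just the adder width divided by the ALM count. The Python below is a bit-level model of the 1-bit full adder and the 4-bit ripple-carry adder from the figures; it checks the logic only and does not model how Quartus packs the adder into ALMs.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add4(a: int, b: int, cin: int = 0) -> Tuple[int, int]:
    """4-bit ripple-carry adder built from four full adders, LSB first."""
    total, carry = 0, cin
    for i in range(4):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# Adder bits per ALM for the 3-ALM mapping described above.
ADDER_BITS, ALMS_USED = 4, 3
print(f"{ADDER_BITS / ALMS_USED:.2f} full-adder bits per ALM")
print(ripple_add4(0b1011, 0b0110))  # 11 + 6 = 17: sum bits 0b0001, carry out 1
```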

## Altera Large FPGAs Summary

- Altera offers the Arria V, Stratix V, Arria 10 and Stratix 10 FPGAs for large designs.
- Arria V and Stratix V are large devices with up to about a million logic elements, integrating many hard IP blocks to create very powerful parts with great processing power.
- The Arria 10 increases the processing power further, with 28 Gbps serial transceivers and a 2666 Mbps DDR4 DRAM interface.
- The Stratix 10 is the largest and fastest FPGA currently in production, with over 5.5 million logic elements and a core clock speed of 1100 MHz.
- Altera's HyperFlex architecture is a revolutionary change that will increase FPGA performance further, doubling performance at equivalent power levels.


