

# Lincoln AI Computing Survey (LAICS) and Trends

Albert Reuther, Peter Michaleas, Michael Jones,  
Vijay Gadepally, and Jeremy Kepner

*MIT Lincoln Laboratory Supercomputing Center  
Lexington, MA, USA  
{reuther,pmichaleas,michael.jones,vijayg,kepner}@ll.mit.edu*

**Abstract**—In the past year, generative AI (GenAI) models have received a tremendous amount of attention, which in turn has increased attention to computing systems for GenAI training and inference. Hence, an update to this survey is due. This paper is an update of the survey of AI accelerators and processors from the past seven years, which is called the Lincoln AI Computing Survey, or LAICS (pronounced “lace”). This multi-year survey collects and summarizes the current commercial accelerators that have been publicly announced with peak performance and peak power consumption numbers. In the tradition of past papers of this survey, the performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discussed and analyzed. Market segments are highlighted on the scatter plot, and zoomed plots of each segment are also included. A brief description of each of the new accelerators that have been added to the survey this year is included, and this update features a new categorization of the computing architectures that implement each of the accelerators.

**Index Terms**—Machine learning, GPU, TPU, tensor, dataflow, CGRA, accelerator, embedded inference, computational performance

## I. INTRODUCTION

In the past 10 years, artificial intelligence and machine learning (AI/ML) have garnered much attention, both in the technical press and in the general media. Starting with deep neural networks (DNNs), then convolutional neural networks (CNNs), and recently generative AI (GenAI), the advances and ensuing exuberance have been energized, in part, by ever-increasing parallel computing capabilities. Much of this capability so far has been delivered to the data center marketplace by Nvidia GPUs, but the sheer size of the AI accelerator market, both in data centers and in various embedded “edge” applications, has attracted vast amounts of venture funding for startups and internal development funding in established companies. Further, it has brought to light a wave of innovation in computational architectures, processor-memory interactions, and numerical methods which have been, in many cases, decades in the making.

Distribution Statement A. Approved for public release. Distribution is unlimited. This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001 or FA8702-25-D-B002. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering.

Perhaps most notable is the emergence of very large GenAI foundation models, which has driven the recent computational demand for both training and inference. Before these GenAI models, AI accelerators focused on matrix-matrix fused multiply-add operations, while GenAI models emphasize matrix-vector fused multiply-add operations and high memory bandwidth to compensate for their somewhat lower arithmetic intensity [1]. For training these very large models, many more accelerators are used simultaneously in a synchronous parallel manner, interconnected with very high bandwidth networks: InfiniBand, NVLink, and Converged/Ultra Ethernet. However, this survey continues to focus on the accelerators themselves across a wide range of deployed applications (from sub-watt to multi-kilowatt) and on single instances of deployment rather than multiple networked instances. Many new AI accelerators have come to market since the last published iteration of this survey in September 2023, and more insight into their programmability and functionality has been published, so another edition of this survey is due.
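The arithmetic-intensity contrast between matrix-matrix and matrix-vector operations mentioned above can be illustrated with a rough sketch. It is not taken from the survey: it assumes square `n`-by-`n` operands, counts each fused multiply-add as two operations, assumes every operand is moved to or from memory exactly once (ideal caching), and assumes 2-byte elements such as fp16/bf16. The function names are hypothetical.

```python
def gemm_intensity(n: int, bytes_per_element: int = 2) -> float:
    """Operations per byte for an n-by-n times n-by-n matrix product.

    Assumption: A and B are each read once and C is written once.
    """
    ops = 2 * n**3  # n^3 fused multiply-adds, two operations each
    bytes_moved = 3 * n * n * bytes_per_element
    return ops / bytes_moved


def gemv_intensity(n: int, bytes_per_element: int = 2) -> float:
    """Operations per byte for an n-by-n matrix times an n-vector.

    Assumption: the matrix and input vector are read once and the
    output vector is written once.
    """
    ops = 2 * n * n  # n^2 fused multiply-adds, two operations each
    bytes_moved = (n * n + 2 * n) * bytes_per_element
    return ops / bytes_moved


if __name__ == "__main__":
    n = 4096  # illustrative size only
    print(f"matrix-matrix: {gemm_intensity(n):.1f} ops/byte")
    print(f"matrix-vector: {gemv_intensity(n):.3f} ops/byte")
```

Under these assumptions the matrix-matrix intensity grows linearly with `n` (it equals `n/3` ops per byte for 2-byte elements), while the matrix-vector intensity stays below one op per byte regardless of size, which is why memory bandwidth becomes the limiting resource.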

As in past years, this paper continues to focus on accelerators and processors that are geared toward deep neural networks (DNNs), convolutional neural networks (CNNs), and generative AI, as they are quite computationally intense. This survey focuses on accelerators and processors for inference for a variety of reasons, including that defense and national security AI/ML edge applications rely heavily on inference, though plenty of accelerators in the survey are capable of both inference and training in terms of both computational capability and supported numerical precisions. We consider all of the numerical precision types that an accelerator supports, but for most accelerators, the best inference performance (both computationally and in accuracy) is in int8 and/or fp16/bf16 (IEEE 16-bit floating point or Google’s 16-bit brain float) precisions.

For much of the background of this study, please refer to the previous IEEE-HPEC papers that our team has published [2]–[6]. The background material in these papers includes explanations of the AI ecosystem architecture, the history of the emergence of AI accelerators and accelerators in general, a more detailed explanation of the survey scatter plots, and discussions of broader observations and trends during each of those years.

## II. RELATED WORK

With the number of research articles and the technical press attention that AI accelerators have received recently, it should not be surprising that quite a number of surveys have also been published. There are many surveys [7]–[18] and other papers that cover various aspects of AI accelerators. For instance, the first paper in this multi-year survey included the peak performance of FPGAs for certain AI models; however, several of the aforementioned surveys cover FPGAs in depth, so they are no longer included in this survey. Similarly, early surveys often covered research accelerators that were designed, and sometimes produced and tested, to research various features, design choices, and technologies. However, as more commercial accelerators came to market, these research accelerators became less relevant in discussions about what accelerator to use for a project or to deploy in a data center. Hence, research accelerators are no longer included in this survey.

This multi-year survey effort and this paper continue to focus on gathering a comprehensive list of AI accelerators with their computational capability, power efficiency, and ultimately the computational effectiveness of utilizing accelerators in embedded and data center applications. Along with this focus, this paper mainly compares neural network accelerators that are useful for government and industrial sensor and data processing applications. A few accelerators and processors that were included in previous years’ papers have been left out of this year’s survey. They have been dropped because they have been surpassed by new accelerators from the same company, they are no longer offered, or they are no longer relevant to the topic.

## III. SURVEY OF PROCESSORS

This paper is an update to IEEE-HPEC papers from the past seven years [2]–[6]. This survey continues to cast a wide net to include accelerators and processors for a variety of applications, including defense and national security AI/ML edge applications. The survey collects information on all of the numerical precision types that an accelerator supports, but for most of them, the best inference performance is in int8 or fp16/bf16, so that is what is usually plotted. This survey gathers peak performance and power information from publicly available materials including research papers, technical trade press, and company benchmarks. We gather peak performance and peak power because these two figures are the most effective for grouping accelerators into application/deployment categories.

All of the AI accelerators for which a marker is plotted are listed in Table I, which summarizes some of the important metadata of the accelerators, cards, and systems, including the labels used in Figure 1. The key metrics of this public data are plotted in Figure 1, which graphs recent processor capabilities (as of Summer 2025), mapping peak performance vs. peak power consumption. As in past years, the x-axis indicates peak power, and the y-axis indicates peak giga-operations per second (GOps/s), both on a logarithmic scale. The computational precision of the processing capability is depicted by the marker used. The form factor for which peak power is reported is depicted by color: blue corresponds to a single chip; orange corresponds to a card; and green corresponds to entire systems (embedded systems, single node desktop and server systems). Finally, the hollow markers are peak performance for inference-only accelerators, while the solid markers are performance for accelerators that are designed to perform both training and inference. The categorization of accelerators again follows their intended application type. The five categories are shown as ellipses on the graph, which roughly correspond to performance and power consumption: Very Low Power for wake word detection, speech processing, very small sensors, etc.; Embedded for cameras, small UAVs, robots, etc.; Autonomous for driver assist services, autonomous driving, and autonomous robots; Data Center Chips and Cards; and Data Center Systems. A zoomed-in scatter plot for each of these categories is shown in the subfigures of Figure 2.
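As a minimal sketch of how a single entry maps onto the log-log scatter plot described above, the following Python snippet computes the plot coordinates and the resulting efficiency in tera-operations per second per watt. The `Accelerator` class, its field names, and the example numbers are hypothetical placeholders for illustration; they are not data from the survey.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Accelerator:
    """One plotted point (hypothetical record layout, not the survey's)."""
    label: str
    peak_gops: float   # peak giga-operations per second (y-axis)
    peak_watts: float  # peak power in watts (x-axis)

    def tops_per_watt(self) -> float:
        """Peak efficiency in tera-operations per second per watt."""
        return self.peak_gops / 1000.0 / self.peak_watts

    def log_coords(self) -> Tuple[float, float]:
        """(x, y) position on the plot with both axes in log10."""
        return math.log10(self.peak_watts), math.log10(self.peak_gops)


if __name__ == "__main__":
    # Placeholder values chosen only to make the arithmetic easy to follow.
    acc = Accelerator("hypothetical-card", peak_gops=100_000.0, peak_watts=100.0)
    x, y = acc.log_coords()
    print(acc.label, f"x={x:.1f}", f"y={y:.1f}", f"{acc.tops_per_watt():.1f} TOps/W")
```

Because `y - x` equals `log10(peak_gops / peak_watts)`, all points with the same operations-per-watt efficiency fall on a straight line of slope one in this coordinate system.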

TABLE I: List of accelerator metadata, accelerator category, and labels for plots.

| Company          | Product                | Label           | Country     | Technology | Form Factor | References |
|------------------|------------------------|-----------------|-------------|------------|-------------|------------|
| AIStorm          | AIStorm                | AIStorm         | USA         | analog     | Chip        | [19]       |
| Alibaba          | HanGuang 800           | Alibaba         | China       | tensor     | Card        | [20]       |
| Amazon           | Inferentia             | AWSI1           | USA         | tensor     | Card        | [21]       |
| Amazon           | Inferentia2            | AWSI2           | USA         | tensor     | Card        | [21]       |
| Amazon           | Trainium               | AWSt1           | USA         | tensor     | Card        | [21]       |
| AMD              | MI210                  | AMD-MI210       | USA         | GPU        | Card        | [22]       |
| AMD              | MI250                  | AMD-MI250       | USA         | GPU        | Card        | [22]       |
| AMD              | MI300X                 | AMD-MI300X      | USA         | GPU        | Card        | [22]       |
| AMD              | MI325X                 | AMD-MI325X      | USA         | GPU        | Card        | [23]       |
| AMD              | MI350X                 | AMD-MI350X      | USA         | GPU        | Card        | [24]       |
| AMD              | MI355X                 | AMD-MI355X      | USA         | GPU        | Card        | [24]       |
| ARM              | Ethos N77              | Ethos           | UK          | tensor     | Chip        | [25]       |
| Aspinity         | AML100                 | AML100          | USA         | analog     | Chip        | [26], [27] |
| Aspinity         | AML200                 | AML200          | USA         | analog     | Chip        | [27], [28] |
| Axelera          | Axelera Test Core      | Axlera          | Netherlands | manycore   | Chip        | [29]       |
| Baidu            | Baidu Kunlun 200       | Baidu-K1        | China       | manycore   | Chip        | [30]–[32]  |
| Baidu            | Baidu Kunlun II        | Baidu-K2        | China       | manycore   | Chip        | [33]       |
| Biren Technology | br100                  | br100           | China       | GPU        | Card        | [34]–[36]  |
| Biren Technology | br104                  | br104           | China       | GPU        | Card        | [34]–[36]  |
| Blaize           | El Cano                | Blaze           | USA         | manycore   | Card        | [37]       |
| Cambricon        | MLU290-M5              | Cambricon-M5    | China       | GPU        | Card        | [38], [39] |
| Cambricon        | MLU370-X8              | Cambricon-X8    | China       | GPU        | Card        | [38], [40] |
| Canaan           | Kendryte K210          | Kendryte        | Singapore   | CPU        | Chip        | [41]       |
| Cerebras         | CS-1                   | CS-1            | USA         | manycore   | System      | [42]       |
| Cerebras         | CS-2                   | CS-2            | USA         | manycore   | System      | [43]       |
| Cerebras         | CS-3                   | CS-3            | USA         | manycore   | System      | [44]       |
| HyperX Logic     | HyperX                 | HyperX          | USA         | manycore   | Chip        | [45]       |
| d-Matrix         | Corsair                | d-Matrix        | USA         | manycore   | Card        | [46]       |
| Enflame          | Cloudblazer T10        | Enflame         | China       | CPU        | Card        | [47]       |
| FuriosaAI        | RNGD                   | FuriosaRNGD     | S. Korea    | tensor     | Card        | [48], [49] |
| Google           | TPU Edge               | TPUedge         | USA         | tensor     | System      | [50]       |
| Google           | TPU1                   | TPU1            | USA         | tensor     | Chip        | [51], [52] |
| Google           | TPU2                   | TPU2            | USA         | tensor     | Chip        | [51], [52] |
| Google           | TPU3                   | TPU3            | USA         | tensor     | Chip        | [51]–[53]  |
| Google           | TPU4i                  | TPU4i           | USA         | tensor     | Chip        | [53]       |
| Google           | TPU4                   | TPU4            | USA         | tensor     | Chip        | [54]       |
| Google           | TPU5e                  | TPU5e           | USA         | tensor     | Chip        | [55]       |
| Google           | TPU5p                  | TPU5p           | USA         | tensor     | Chip        | [55]       |
| Google           | TPU6e                  | TPU6e           | USA         | tensor     | Chip        | [55]       |
| Google           | TPU7                   | TPU7            | USA         | tensor     | Chip        | [55]       |
| GraphCore        | C2                     | GraphCore       | UK          | manycore   | Card        | [56], [57] |
| GraphCore        | C2                     | GraphCoreNode   | UK          | manycore   | System      | [58]       |
| GraphCore        | Colossus Mk2           | GraphCore2      | UK          | manycore   | Card        | [59]       |
| GraphCore        | Bow-2000               | GraphCoreBow    | UK          | manycore   | Card        | [60]       |
| GreenWaves       | GAP8                   | GAP8            | France      | manycore   | Chip        | [61], [62] |
| GreenWaves       | GAP9                   | GAP9            | France      | manycore   | Chip        | [61], [62] |
| Groq             | Groq Node              | GroqNode        | USA         | tensor     | System      | [63]       |
| Groq             | Tensor Streaming Proc. | Groq            | USA         | tensor     | Card        | [56], [64] |
| Gyrfalcon        | Gyrfalcon              | Gyrfalcon       | USA         | manycore   | Chip        | [65]       |
| Gyrfalcon        | Gyrfalcon              | GyrfalconServer | USA         | manycore   | System      | [66]       |
| Hailo            | Hailo-8                | Hailo-8         | Israel      | manycore   | Chip        | [67]       |
| Hailo            | Hailo-15H              | Hailo-15        | Israel      | manycore   | Chip        | [68]       |
| Horizon Robotics | Journey2               | Journey2        | China       | tensor     | Chip        | [69]       |
| Huawei HiSilicon | Ascend 310             | Ascend-310      | China       | manycore   | Card        | [70]       |
| Huawei HiSilicon | Ascend 910A            | Ascend-910A     | China       | manycore   | Card        | [71], [72] |
| Huawei HiSilicon | Ascend 910B            | Ascend-910B     | China       | manycore   | Card        | [71], [72] |
| Huawei HiSilicon | Ascend 910C            | Ascend-910C     | China       | manycore   | Card        | [71], [72] |
| IBM              | NorthPole              | NorthPole       | USA         | manycore   | Chip        | [73]–[75]  |
| IBM              | Spyre AIU              | Spyre           | USA         | manycore   | Card        | [76], [77] |
| Intel            | Arria 10 1150          | Arria           | USA         | FPGA       | Chip        | [78], [79] |
| Intel            | Mobileye EyeQ5         | EyeQ5           | Israel      | manycore   | Chip        | [37]       |
| Intel            | Flex140                | Flex140         | USA         | GPU        | Card        | [80]       |
| Intel            | Flex170                | Flex170         | USA         | GPU        | Card        | [80]       |
| Intel Habana     | Gaudi                  | Gaudi           | Israel      | tensor     | Card        | [81], [82] |
| Intel Habana       | Goya HL-1000            | Goya           | Israel   | tensor     | Card        | [81]–[83]    |
| Intel Habana       | Gaudi2                  | Gaudi2         | Israel   | tensor     | Card        | [84], [85]   |
| Intel Habana       | Gaudi3                  | Gaudi3         | Israel   | tensor     | Card        | [84]         |
| Kalray             | Coolidge                | Kalray         | France   | manycore   | Chip        | [86], [87]   |
| Kneron             | KL720                   | KL720          | USA      | tensor     | Chip        | [88]         |
| Maxim              | Max 78000               | Maxim          | USA      | tensor     | Chip        | [89]–[91]    |
| MemryX             | MX3                     | MX3            | USA      | manycore   | Chip        | [92]         |
| Meta/Facebook      | MTIA                    | MTIA           | USA      | manycore   | Card        | [93], [94]   |
| Meta/Facebook      | MTIA2i                  | MTIA2i         | USA      | manycore   | Card        | [95], [96]   |
| Moore Threads      | MTT S50                 | MTT-S50        | China    | GPU        | Chip        | [97]         |
| Moore Threads      | MTT S2000               | MTT-S2000      | China    | GPU        | Chip        | [97]         |
| Mythic             | M1076                   | Mythic76       | USA      | analog     | Chip        | [98]–[100]   |
| Mythic             | M1108                   | Mythic108      | USA      | analog     | Chip        | [98]–[100]   |
| Neuchips           | Raptor                  | NeuChipsRaptor | Taiwan   | tensor     | Card        | [101]        |
| NovaMind           | NovaTensor              | NovaMind       | USA      | tensor     | Chip        | [102], [103] |
| NVIDIA             | Ampere A10              | A10            | USA      | GPU        | Card        | [104]        |
| NVIDIA             | Ampere A100             | A100           | USA      | GPU        | Card        | [105]        |
| NVIDIA             | Ampere A800             | A800           | USA      | GPU        | Card        | [106]        |
| NVIDIA             | Ampere A30              | A30            | USA      | GPU        | Card        | [104]        |
| NVIDIA             | Ampere A40              | A40            | USA      | GPU        | Card        | [104]        |
| NVIDIA             | Blackwell               | B100           | USA      | GPU        | Card        | [107]        |
| NVIDIA             | Blackwell               | B200           | USA      | GPU        | Card        | [107]        |
| NVIDIA             | DGX-A100                | DGX-A100       | USA      | GPU        | System      | [108]        |
| NVIDIA             | DGX-H100                | DGX-H100       | USA      | GPU        | System      | [109]        |
| NVIDIA             | HGX-B200                | HGX-B200       | USA      | GPU        | System      | [110]        |
| NVIDIA             | Hopper H100 PCIe        | H100           | USA      | GPU        | Card        | [111]        |
| NVIDIA             | Hopper H100 SXM         | H100SXM        | USA      | GPU        | Card        | [112]        |
| NVIDIA             | Hopper H100 NVL         | H100NVL        | USA      | GPU        | Card        | [111]        |
| NVIDIA             | H800 SXM                | H800SXM        | USA      | GPU        | Card        | [113]        |
| NVIDIA             | H800 PCIe               | H800           | USA      | GPU        | Card        | [113]        |
| NVIDIA             | Hopper H200 SXM         | H200SXM        | USA      | GPU        | Card        | [114]        |
| NVIDIA             | Hopper H200 NVL         | H200NVL        | USA      | GPU        | Card        | [114]        |
| NVIDIA             | H20                     | H20            | USA      | GPU        | Card        | [113]        |
| NVIDIA             | Jetson AGX Xavier       | XavierAGX      | USA      | GPU        | System      | [115]        |
| NVIDIA             | Jetson NX Orin          | OrinNX         | USA      | GPU        | System      | [116], [117] |
| NVIDIA             | Jetson AGX Orin         | OrinAGX        | USA      | GPU        | System      | [116], [117] |
| NVIDIA             | Jetson Xavier NX        | XavierNX       | USA      | GPU        | System      | [115]        |
| NVIDIA             | DRIVE AGX L2            | AGX-L2         | USA      | GPU        | System      | [118]        |
| NVIDIA             | DRIVE AGX L5            | AGX-L5         | USA      | GPU        | System      | [118]        |
| NVIDIA             | L4                      | L4             | USA      | GPU        | Card        | [111]        |
| NVIDIA             | L40                     | L40            | USA      | GPU        | Card        | [119]        |
| NVIDIA             | L40S                    | L40S           | USA      | GPU        | Card        | [120]        |
| NVIDIA             | T4                      | T4             | USA      | GPU        | Card        | [121]        |
| NVIDIA             | Volta V100              | V100           | USA      | GPU        | Card        | [122], [123] |
| Perceive           | Ergo                    | Perceive       | USA      | tensor     | Chip        | [124]        |
| Preferred Networks | MN-Core1                | MN-C1          | Japan    | manycore   | Card        | [125]–[127]  |
| Preferred Networks | MN-Core2                | MN-C2          | Japan    | manycore   | Card        | [125], [128] |
| Quadratic          | q1-64                   | Quadratic      | USA      | manycore   | Chip        | [129]        |
| Qualcomm           | Cloud AI 100            | Qcomm          | USA      | GPU        | Card        | [130], [131] |
| Qualcomm           | QRB5165                 | RB5            | USA      | GPU        | System      | [132]        |
| Qualcomm           | QRB5165N                | RB6            | USA      | GPU        | System      | [133]        |
| Rebellions         | ATOM Max                | ATOM-Max       | S. Korea | tensor     | Card        | [134]–[136]  |
| SiMa.ai            | SiMa.ai                 | SiMa.ai        | USA      | tensor     | Chip        | [137]        |
| Syntiant           | NDP101                  | Syntiant1      | USA      | manycore   | Chip        | [138], [139] |
| Syntiant           | NDP250                  | Syntiant3      | USA      | manycore   | Chip        | [140]        |
| Tachyum            | Prodigy                 | Tachyum        | USA      | manycore   | Chip        | [141]        |
| Tenstorrent        | Greyskull               | Greyskull      | USA      | manycore   | Card        | [142]        |
| Tenstorrent        | Wormhole n300           | Wormhole       | USA      | manycore   | Card        | [143], [144] |
| Tenstorrent        | Blackhole               | Blackhole      | USA      | manycore   | Card        | [145], [146] |
| Tesla              | Full Self-Driving Comp. | TeslaFSD       | USA      | tensor     | System      | [147], [148] |
| Tesla              | Dojo D1                 | DojoD1         | USA      | manycore   | Chip        | [149], [150] |
| Texas Instruments  | TDA4VM                  | TexInSt        | USA      | manycore   | Chip        | [151]–[153]  |
| Toshiba            | 2015                    | Toshiba        | Japan    | manycore   | System      | [154]        |

For most of the accelerators, the descriptions and commentaries have not changed since last year, so please refer to the previous papers of this survey project for them. Several new releases and a few departures are included in this update, and they are chronicled next.

- Amazon AWS has published much more information about their Inferentia and Trainium chips, which have been designed by their in-house Annapurna Labs division. These accelerators are multi-core tensor accelerators [22], and we can expect newer generations of these chips in coming years.
- During the past two years, AMD released the MI300 series of GPUs (the MI300X, MI325X, MI350X, and MI355X) to compete head-to-head with Nvidia's data center GPUs. Each provided ample competition to its Nvidia counterpart [24], with FP6 and FP4 support and enhanced matrix engines.
- Nvidia continues to release a variety of data center GPUs in order to keep their lead in supplying GPUs for both training and inference. In the past two years, Nvidia released the L4 and L40S for cloud graphics and low-power inference, several variants of the Hopper GPUs (H100 NVL, H200 SXM, and H200 NVL), and two variants of the Blackwell GPU (B100 and B200). The Hopper and Blackwell GPUs are true transformer/generative AI accelerators with larger matrix engines (TensorCores) along with FP6 and FP4 support. To address export restrictions, Nvidia also released detuned versions of their A100, H100, and H200 GPUs as the A800, H800, and H20 GPUs, respectively.

- Intel has been consolidating their AI accelerator efforts. Intel's Habana subsidiary released their Gaudi2 and Gaudi3 tensor accelerators, which have gained some good traction. Given that traction, Intel has cancelled their GPU training offerings (Xe-HPC, codenamed Ponte Vecchio, and the future Rialto Bridge GPU), while still offering the Flex line of data center inference GPUs. On the software side, Intel has integrated CPU, GPU, and FPGA programming into their oneAPI software stack.
- Google (Alphabet) has continued to refine and improve the performance of their cloud data center TPUs with the releases of their TPU5e, TPU5p, TPU6e, and TPU7 [55]. Each improves performance over the previous generation for their own and their clients' workloads.
- Cerebras has released their third generation Wafer Scale Engine (WSE) accelerator, CS-3, with impressive performance both as a single node and networked multi-node training [44].
- Tesla Motors released details about their Dojo D1 data center training chip and system, disclosing an impressive design for very large scale training [149], [150].
- Meta/Facebook has also released many more details about the first and second generation Meta Training and Inference Accelerator (MTIA and MTIA2i), which are both focused on inference. Both are composed of an 8-by-8 mesh-networked grid of processing elements, and each processing element includes two RISC-V cores for computation. One of the two RISC-V cores includes a 64-element RISC-V vector extension [94], [95].
- Speaking of RISC-V cores, Tenstorrent has released the second and third iterations of their RISC-V based accelerators, the Wormhole and Blackhole. The Wormhole accelerator is composed of 80 Tensix processors, each of which is composed of five RISC-V cores, for a total of 400 RISC-V cores. The Blackhole processor expands this to 140 Tensix processors, each also with five RISC-V cores, for a total of 700 RISC-V cores. Blackhole also includes 16 higher performance RISC-V cores for on-device hosting and running Linux and another 52 smaller RISC-V cores for memory management, communications, and system management [145].
- Based on takeaways from the DARPA SyNAPSE neuromorphic computing project from a decade ago, which included the development of the IBM TrueNorth accelerator [155]–[157], IBM has developed and released the NorthPole inference accelerator [73]–[75]. It features 256 cores that can execute 8-bit, 4-bit, and 2-bit digital arithmetic. It also has four on-chip networks to minimize communication hotspots. IBM has also announced that they will be releasing their AI Acceleration Unit (AIU) in 2026 as a PCIe card called Spyre [76], [77].

Fig. 1: Peak performance vs. power scatter plot of publicly announced AI accelerators and processors.

- Japanese company Preferred Networks has released its second generation MN-Core2 chip, of which eight are integrated on each PCIe card [125], [128]. The accelerator is aimed at training in that it supports FP16, FP32, and FP64. Future generations will be offered as an inference-focused version and a training-focused version.
- China's Huawei released two new versions of the Ascend 910: the 910B and 910C [71], [72]. The original 910 has been labeled the 910A. These new Ascend accelerators have the same core design as the 910A, but they are now fabricated domestically by SMIC rather than by TSMC in Taiwan, due to American export restrictions.
- China's Cambricon, known mainly for its Kirin smartphone GPUs, has also recently released data center GPUs including the MLU290-M5 and MLU370-X8 [38].
- Moore Threads, another Chinese GPU company, has emerged with a series of GPUs that can be used for business computers, gaming, and AI inference [97].
- The previous-generation HX40416 accelerator from HyperX Logic (formerly Coherent Logix) features 416 processing elements in a mesh topology, and each PE can execute 4 multiply-accumulate operations per clock cycle [45]. HyperX Logic has been focused on space applications along with audio and video production with its programmable dataflow architecture, and has added AI inference to its application portfolio in recent years.

- The d-Matrix Corsair accelerator features arithmetic in SRAM memory cores and RISC-V control CPUs [46]. Each Corsair chiplet has 256 64-by-64 SRAM-arithmetic cell array cores, and there are four chiplets per Corsair package. A Corsair PCIe card has two Corsair packages, which totals 2048 cell array cores. This accelerator is aimed at small-batch, low-latency data center inference.
  - The FuriosaAI RNGD implements tensor contractions as a computational primitive for AI inference [49]. Each Tensor Unit has 64 slices, and each slice has a tensor engine, vector engine and transpose engine. The accelerator is coupled to HBM memory for high bandwidth memory access for executing inference on very large models.
  - Taiwan startup Neuchips introduced their Raptor N3000 AI accelerator chip, which is featured in their first product, the Evo accelerator PCIe card. The Raptor chip includes 10 matrix engines, two vector engines, and an embedding engine [158].
  - South Korean startup Rebellions released their ATOM Max accelerator for data center inference [135], [136].
  - Finally, Syntiant has expanded its very-low-power analog accelerator offerings with its NDP250 chip, which is aimed at audio processing and wake word detection [140].
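
The unit counts quoted in the list above follow from simple multiplication, which the following minimal Python sketch makes explicit. The constant and function names are illustrative, and the `clock_hz` argument of `peak_macs_per_second` is a hypothetical input, since no clock rate is given here.

```python
# Counts taken from the descriptions above; all names are illustrative.
CORSAIR_CORES_PER_CHIPLET = 256   # 64-by-64 SRAM-arithmetic cell array cores
CORSAIR_CHIPLETS_PER_PACKAGE = 4
CORSAIR_PACKAGES_PER_CARD = 2

HX40416_PROCESSING_ELEMENTS = 416
HX40416_MACS_PER_PE_PER_CYCLE = 4


def corsair_cores_per_card() -> int:
    """Cell array cores on one Corsair PCIe card."""
    return (CORSAIR_CORES_PER_CHIPLET
            * CORSAIR_CHIPLETS_PER_PACKAGE
            * CORSAIR_PACKAGES_PER_CARD)


def hx40416_macs_per_cycle() -> int:
    """Multiply-accumulate operations issued per clock across all PEs."""
    return HX40416_PROCESSING_ELEMENTS * HX40416_MACS_PER_PE_PER_CYCLE


def peak_macs_per_second(macs_per_cycle: int, clock_hz: float) -> float:
    """Peak MAC rate for a given clock; clock_hz is an assumed input."""
    return macs_per_cycle * clock_hz


if __name__ == "__main__":
    print(corsair_cores_per_card())   # 2048, matching the total in the text
    print(hx40416_macs_per_cycle())   # 1664
```

A peak rate only follows once a clock frequency is known, which is why it is left as a parameter rather than filled in.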

Attrition among AI startups and corporate efforts continues to be part of normal business patterns. The following entries have been removed from the survey because the accelerators are no longer commercially available. Cornami has been removed because the company has moved its focus to embedded computing solutions for homomorphic encryption. AImotive has been removed because its accelerator product line is an RTL specification rather than a chip product. Three companies have been acquired or are closing down. The staff of Untether AI has been acquired by AMD, and the company is in the process of winding down [159], while Esperanto is also winding down, with almost all of its employees having found other opportunities [160]. Further, AlphaICs does not appear to be in business anymore, so its entry has been removed. Finally, after delivering the impressive Aurora supercomputer at Argonne National Laboratory, Intel cancelled its Xe-HPC (codenamed Ponte Vecchio) and its plans for other high-end computational GPUs.

Fig. 2: Zoomed regions of peak performance vs. peak power scatter plot: **(a)** very low power, **(b)** embedded, **(c)** autonomous, **(d)** data center chips and cards, **(e)** data center systems.

We are also eagerly awaiting more details, including peak performance and peak power numbers, about several accelerators that have been announced. Among the major American GPU vendors, Nvidia has released the names of its next two generations of GPUs, Rubin and Feynman, while AMD has announced the expected release of its MI430X and MI450X in the first half of 2026. Similarly, Baidu has announced its third-generation P800 Kunlun accelerator [161], while there is discussion that Huawei will be releasing Ascend 920 and 920C accelerators soon [113]. HyperX Logic has released a new space-focused accelerator called Midnight, for which we are hoping to see performance and power numbers. Horizon Robotics, which specializes in automotive inference accelerators, has released its Journey 5 and Journey 6 accelerators, but peak performance and power numbers are not yet available. Tesla has announced its Full Self-Driving Computer V2 (Gen5), but no details have been published yet. Finally, Q.ANT has released an optical accelerator, which is able to compute entire inference chains in the optical domain. This first version demonstrates the capability and opportunity of computing in the optical domain; the company's roadmap expects to meet and exceed the computational performance and energy efficiency of CMOS-based accelerators within the next few years.

#### IV. OBSERVATIONS AND TRENDS

In the two years since the last iteration of this survey, more details have been published in conference papers, journal papers, and technical press articles, which has made it possible to generalize more accurately the main categories of architectures that are being used for AI processors and accelerators. These are summarized in Figure 3, and the categories span from highly flexible CPUs on the left to statically deployed FPGAs and ASICs on the right.

While CPUs are very flexible and can execute all applications, they are not optimized to execute the highly parallel computations of inference and training. CPU vector engines are better suited for inference and training than scalar CPU cores, but they generally do not have memory coalescing features to pack strided memory accesses into dense vector accesses, which reduces their parallel computational efficiency.
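
To illustrate why strided accesses are costly when they are not packed into dense ones, the following minimal sketch counts how many fixed-size memory lines a vector load touches for a given element stride. The 64-byte line size, 4-byte element size, and function names are assumptions chosen for illustration, not properties of any processor in this survey.

```python
def lines_touched(num_elements: int, stride: int,
                  elem_bytes: int = 4, line_bytes: int = 64) -> int:
    """Count the distinct memory lines read when loading num_elements
    values that start at address 0 and are `stride` elements apart."""
    lines = {(i * stride * elem_bytes) // line_bytes
             for i in range(num_elements)}
    return len(lines)


def useful_fraction(num_elements: int, stride: int,
                    elem_bytes: int = 4, line_bytes: int = 64) -> float:
    """Fraction of the fetched bytes that the load actually uses."""
    fetched = lines_touched(num_elements, stride, elem_bytes, line_bytes) * line_bytes
    return num_elements * elem_bytes / fetched


if __name__ == "__main__":
    # Dense access: 16 four-byte values fit in a single 64-byte line.
    print(lines_touched(16, stride=1), useful_fraction(16, stride=1))
    # Stride of 16 elements: every value lands on its own line.
    print(lines_touched(16, stride=16), useful_fraction(16, stride=16))
```

Under these assumptions the dense load uses every byte it fetches, while the strided load fetches sixteen lines and uses one sixteenth of the bytes.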

At the other end of the categories are FPGAs and ASICs. We do not see ASICs much because of the requirement for most AI applications to support multiple different AI models with the same compute hardware. Similarly, FPGAs are used in embedded applications when AI models have been chosen for deployment, and some FPGAs even have tensor accelerator blocks included in the available computational blocks.

The bulk of high-performance and efficient accelerators are parallel thread accelerators (GPUs), tensor array accelerators, and microcore mesh accelerators because they are designed specifically for the high main memory bandwidth, highly parallel computation, and highly parallel and complex data movement that are required for efficient, high-performance inference and training. Among these, parallel thread accelerators are the most flexible. They have memory coalescing capability along with parallel compute cores (Streaming Multiprocessors, or SMs, in Nvidia parlance) that are dynamically scheduled with compute kernels. Code redesign and recompilation of kernels are fairly quick. Tensor array accelerators and microcore mesh accelerators are more statically scheduled in that one preloads the model parameters and code into the accelerator before executing inference or training on them. To change the model parameters or code, the new ones need to be loaded into the accelerator anew, which involves some latency. Kernel code redesign and recompilation involve both compilation and mapping of the model code and parameters onto the compute elements; this mapping is an optimization of code and resources and therefore takes more time than just compiling the numerical kernel code. However, it often results in more efficient execution and higher performance than parallel thread accelerators.

Several more observations can be drawn from Figure 1.

- Int8 continues to be the default numerical precision for embedded, autonomous, and data center inference applications, and fp16/bf16 has become the default numerical precision for training. However, training in fp8 and even fp4 has produced favorable results, particularly for generative AI models, and these lower precisions save both computational and data movement energy.
- In our re-evaluation and re-categorization of each of the accelerators in this survey, we were pleasantly surprised by the variety of architectural choices that vendors are making to experiment with, find, and exploit competitive advantages for their accelerators in certain applications and for certain models. This was predicted in Nowatzki, et al. [162], and it is encouraging to see it play out in commercial competition.
- In the data center domain, Nvidia continues to dominate the media coverage and sales for AI acceleration. However, AMD, Groq, and Cerebras have become significant competitors to Nvidia, while many other commercial offerings are also gaining footholds. It should be noted that the Groq accelerator is currently manufactured using a 12-nm process and often outperforms accelerators built at smaller circuit feature sizes, which supports the findings in [163]. (Groq has announced a second-generation accelerator that will be fabbed at a smaller circuit feature size.) Notably, neither the Groq nor the Cerebras accelerators are GPU architectures.
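
The data movement side of the precision observation above can be made concrete by computing the bytes needed to hold a set of values at each format's bit width. The sketch below is illustrative only: the function names are hypothetical, the one-billion-parameter example is an arbitrary round number, and the calculation ignores any scaling-factor or metadata overhead that low-precision formats may carry. It does not model computational energy.

```python
# Bit widths of the numerical formats named in the observation above.
BITS_PER_VALUE = {"fp16": 16, "bf16": 16, "int8": 8, "fp8": 8, "fp4": 4}


def parameter_bytes(num_params: int, fmt: str) -> int:
    """Bytes needed to store num_params values in the given format,
    ignoring scaling factors and other metadata."""
    return num_params * BITS_PER_VALUE[fmt] // 8


def relative_traffic(fmt: str, baseline: str = "fp16") -> float:
    """Bytes moved in `fmt` relative to the same values in `baseline`."""
    return BITS_PER_VALUE[fmt] / BITS_PER_VALUE[baseline]


if __name__ == "__main__":
    n = 1_000_000_000  # arbitrary example size, not a surveyed model
    for fmt in ("fp16", "fp8", "fp4"):
        print(fmt, parameter_bytes(n, fmt), relative_traffic(fmt))
```

Each halving of the bit width halves the bytes that must cross the memory interface for the same number of values, which is the source of the data movement savings.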



|                          | CPU Core               | CPU Core Vector Engine | Parallel Thread Accelerator (GPUs) | Tensor Array Accelerator (TPU, Groq, TensorCore) | Microcore Mesh Accelerator (Cerebras, TensorCore) | Computational Block Accelerator (FPGA) | Custom Dataflow Accelerator (ASIC) |
|--------------------------|------------------------|------------------------|------------------------------------|--------------------------------------------------|---------------------------------------------------|----------------------------------------|------------------------------------|
| Technology label         | CPU                    | Vector                 | GPU                                | Tensor                                           | Manycore                                          | FPGA                                   | ASIC                               |
| ALUs per core            | 1-4                    | 8-32                   | 32-64                              | 8x8 to 256x256                                   | 1 to 4                                            | Various                                | App. specific                      |
| Cores per processor      | 4-64                   | 4-64                   | 8-128                              | 1 to 4                                           | 100s to 1M                                        | Various                                | App. specific                      |
| Parallel performance     | Low                    | Medium Low             | Medium High                        | High                                             | High                                              | High                                   | Very High                          |
| Comp. efficiency (Ops/W) | Low                    | Medium Low             | Medium                             | High                                             | High                                              | High                                   | Very High                          |
| Comp. flexibility        | Very High              | Medium Low             | Medium                             | Medium Low                                       | Medium                                            | Low                                    | Very Low                           |
| Computation scheduling   | Dynamic by instruction | Dynamic by instruction | Dynamic by kernel                  | Static by kernel                                 | Static by kernel                                  | Fixed                                  | Fixed                              |
| Code redesign            | Seconds                | Seconds                | Seconds/Minutes                    | Minutes                                          | Minutes/Hours                                     | Hours                                  | Months                             |

Increasing application specificity -> greater parallel performance -> narrower application/computational kernel enablement

Fig. 3: Range of AI accelerator computer architecture categories. Going from left to right, greater optimization of data movement between computations means that data travels shorter distances between computations and more computations are executed in parallel. However, it also means less flexibility in operation types and programmability. CPU = Central Processing Unit; AVX = Advanced Vector eXtensions; SVE = Scalable Vector Extensions; GPU = Graphics Processing Unit; TPU = Tensor Processing Unit; CGRA = Coarse-Grained Reconfigurable Architecture; FPGA = Field Programmable Gate Array; ASIC = Application Specific Integrated Circuit.

#### A. Non-CMOS Technologies

As part of this survey, we continue to track other technologies that could be used to implement AI accelerators. Among them are memristors, neuromorphic architectures, cryogenic computing, and optical computing. In all of these domains, research and development continues to show opportunity and the hope of becoming competitive with current commercial offerings. The one new development that is worth noting is in the optical computing area. While we still wait for the release of a commercial accelerator from LightMatter, Lightelligence, and LightOn (we are assured that they are coming!), Q.ANT has released a commercial optical accelerator, as we mentioned in Section III. This is an exciting development, and we will be watching how Q.ANT and other optical computing vendors compete among computational accelerators.

#### V. SUMMARY

This paper updates the Lincoln AI Computing Survey (LAICS) of AI accelerators, which span from extremely low power through embedded and autonomous applications to data center class accelerators for inference and training. We presented the new full scatter plot along with zoomed-in scatter plots for each of the major deployment/market segments, and we discussed the new additions for the year. We also included a categorization of AI computing hardware based on the considerable new material that has been published for these AI accelerators.

#### VI. DATA AVAILABILITY

The data spreadsheets and references that have been collected for this study and its papers are posted at <https://github.com/areuther/ai-accelerators> after they have cleared the release review process.

#### ACKNOWLEDGEMENT

We express our gratitude to LaToya Anderson, Masahiro Arakawa, Bill Arcand, Bill Bergeron, David Bestor, Bob Bond, Alex Bonn, Chansup Byun, Vitaliy Gleyzer, Jeff Gottschalk, Michael Houle, Matthew Hubbell, Hayden Jananthan, David Martinez, Lauren Milechin, Sanjeev Mohindra, Paul Monticciolo, Julie Mullen, Andrew Prout, Stephan Rejto, Antonio Rosa, Charles Yee, and Marc Zissman for their support of this work. We are also grateful to Mark Gouker, Bob Atkins, and Livia Racz for the discussions that eventually were captured in the accelerator categorizations.

#### REFERENCES

- [1] S. Williams, A. Waterman, and D. Patterson, "Roofline: An insightful visual performance model for multicore architectures," *Commun. ACM*, vol. 52, pp. 65–76, 4 2009. [Online]. Available: <http://doi.acm.org/10.1145/1498765.1498785>
- [2] A. Reuther, P. Michaleas, M. Jones, V. Gadepally, S. Samsi, and J. Kepner, "Lincoln ai computing survey (laics) update," *2023 IEEE High Performance Extreme Computing Conference, HPEC 2023*, 2023.
- [3] ——, "Ai and ml accelerator survey and trends," in *2022 IEEE High Performance Extreme Computing Conference (HPEC)*. IEEE, 9 2022, pp. 1–10.
- [4] ——, "Ai accelerator survey and trends," in *2021 IEEE High Performance Extreme Computing Conference (HPEC)*, 9 2021, pp. 1–9.
- [5] ——, "Survey of machine learning accelerators," in *2020 IEEE High Performance Extreme Computing Conference (HPEC)*, 2020, pp. 1–12.
- [6] ——, "Survey and benchmarking of machine learning accelerators," in *2019 IEEE High Performance Extreme Computing Conference, HPEC 2019*. Institute of Electrical and Electronics Engineers Inc., 9 2019. [Online]. Available: <https://doi.org/10.1109/HPEC.2019.8916327>
- [7] C. S. Lindsey and T. Lindblad, "Survey of neural network hardware," in *SPIE 2492, Applications and Science of Artificial Neural Networks*, S. K. Rogers and D. W. Ruck, Eds., vol. 2492. International Society for Optics and Photonics, 4 1995, pp. 1194–1205. [Online]. Available: <http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1001095>
- [8] Y. Liao, "Neural networks in hardware: A survey," Department of Computer Science, University of California, Tech. Rep., 2001. [Online]. Available: <http://citeserx.ist.psu.edu/viewdoc/summary?doi=10.1.1.460.3235>
- [9] J. Misra and I. Saha, "Artificial neural networks in hardware: A survey of two decades of progress," *Neurocomputing*, vol. 74, pp. 239–255, 12 2010. [Online]. Available: <https://www.sciencedirect.com/science/article/pii/S092523121000216X?casa_token=3W4P_OnheQ4AAAAA:De5DC960HrSpgh-XJJ0oeiqKyqa0ctWdh9wPv3mOtrEDX1yw-hWEiXQkY1vd97SEuUZ3WOYQ5g>
- [10] V. Sze, Y. Chen, T. Yang, and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey," *Proceedings of the IEEE*, vol. 105, pp. 2295–2329, 12 2017. [Online]. Available: <https://doi.org/10.1109/JPROC.2017.2761740>
- [11] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, *Efficient Processing of Deep Neural Networks*. Morgan and Claypool Publishers, 2020. [Online]. Available: <https://doi.org/10.2200/S01004ED1V01Y202004CAC050>
- [12] H. F. Langroudi, T. Pandit, M. Indovina, and D. Kudithipudi, "Digital neuromorphic chips for deep learning inference: A comprehensive study," in *Applications of Machine Learning*, M. E. Zelinski, T. M. Taha, J. Howe, A. A. Awwal, and K. M. Iftekharuddin, Eds. SPIE, 9 2019, p. 9. [Online]. Available: <https://doi.org/10.1117/12.2529407>
- [13] Y. Chen, Y. Xie, L. Song, F. Chen, and T. Tang, "A survey of accelerator architectures for deep neural networks," *Engineering*, vol. 6, pp. 264–274, 3 2020. [Online]. Available: <https://doi.org/10.1016/j.eng.2020.01.007>
- [14] E. Wang, J. J. Davis, R. Zhao, H.-C. C. Ng, X. Niu, W. Luk, P. Y. K. Cheung, and G. A. Constantinides, "Deep neural network approximation for custom hardware," *ACM Computing Surveys*, vol. 52, pp. 1–39, 5 2019. [Online]. Available: <https://dl.acm.org/doi/10.1145/3309551>
- [15] S. Khan and A. Mann, "Ai chips: What they are and why they matter," Georgetown Center for Security and Emerging Technology, Tech. Rep., 4 2020. [Online]. Available: <https://cset.georgetown.edu/research/ai-chips-what-they-are-and-why-they-matter/>
- [16] U. Rueckert, *Digital Neural Network Accelerators*. Springer, Cham, 2020, pp. 181–202. [Online]. Available: <https://link.springer.com/chapter/10.1007%2F978-3-030-18338-7_12>
- [17] T. Rogers and M. Khairy, "An academic's attempt to clear the fog of the machine learning accelerator war — sigarch," 8 2021. [Online]. Available: <https://www.sigarch.org/an-academics-attempt-to-clear-the-fog-of-the-machine-learning-accelerator-war/>
- [18] F. P. Sunny, E. Taheri, M. Nikdast, and S. Pasricha, "A survey on silicon photonics for deep learning," *ACM Journal on Emerging Technologies in Computing Systems*, vol. 17, 10 2021. [Online]. Available: <https://dl.acm.org/doi/10.1145/3459009>
- [19] R. Merritt, "Startup accelerates ai at the sensor," 2 2019. [Online]. Available: <https://www.eetimes.com/startup-accelerates-ai-at-the-sensor/>
- [20] T. Peng, "Alibaba's new ai chip can process nearly 80k images per second," 2019. [Online]. Available: <https://medium.com/syncedreview/alibabas-new-ai-chip-can-process-nearly-80k-images-per-second-63412dec22a3>
- [21] T. P. Morgan, "How aws can undercut nvidia with homegrown ai compute engines," 12 2023. [Online]. Available: <https://www.nextplatform.com/2023/12/04/how-aws-can-undercut-nvidia-with-homegrown-ai-compute-engines/>
- [22] ——, "The third time charm of amd's instinct gpu," 6 2023. [Online]. Available: <https://www.nextplatform.com/2023/06/14/the-third-time-charm-of-amds-instinct-gpu/>
- [23] ——, "Amd gives nvidia some serious heat in gpu compute," 10 2024. [Online]. Available: <https://www.nextplatform.com/2024/10/10/amd-gives-nvidia-some-serious-heat-in-gpu-compute/>
- [24] P. Alcorn, "Amd announces mi350x and mi355x ai gpus, claims up to 4x generational performance gain, 35x faster inference," 6 2025. [Online]. Available: <https://www.tomshardware.com/pc-components/gpus/amd-announces-mi350x-and-mi355x-ai-gpus-claims-up-to-4x-generational-gain-up-to-35x-faster-inference-performance>
- [25] D. Schor, "Arm ethos is for ubiquitous ai at the edge," 2 2020. [Online]. Available: <https://fuse.wikichip.org/news/3282/arm-ethos-is-for-ubiquitous-ai-at-the-edge/>
- [26] "Aspinity aml100." [Online]. Available: <https://www.aspinity.com/aml100/>
- [27] J. Hertz, "Aspinity takes on tinyml, claiming the industry's first fully analog ml chip," 2 2022. [Online]. Available: <https://www.allaboutcircuits.com/news/aspinity-takes-on-tinyml-claiming-industrys-first-fully-analog-ml-chip/>
- [28] "Aspinity aml200." [Online]. Available: <https://www.aspinity.com/aml200/>
- [29] S. Ward-Foxton, "Axelera demos ai test chip after taping out in four months," 5 2022. [Online]. Available: <https://www.eetimes.com/axelera-demos-ai-test-chip-after-taping-out-in-four-months/>
- [30] J. Ouyang, X. Du, Y. Ma, and J. Liu, "Kunlun: A 14nm high-performance ai processor for diversified workloads," in *2021 IEEE International Solid-State Circuits Conference (ISSCC)*, vol. 64, 2 2021, pp. 50–51.
- [31] R. Merritt, "Baidu accelerator rises in ai," 7 2018. [Online]. Available: <https://www.eetimes.com/baidu-accelerator-rises-in-ai/>
- [32] C. Duckett, "Baidu creates kunlun silicon for ai," 7 2018. [Online]. Available: <https://www.zdnet.com/article/baidu-creates-kunlun-silicon-for-ai/>
- [33] A. Shilov, "Baidu unveils kunlun ii ai chip: Rival for nvidia a100," 8 2021. [Online]. Available: <https://www.tomshardware.com/news/baidu-unveils-kunlun-ii-processor-for-ai>
- [34] O. Peckham, "Chinese startup biren details br100 gpu," 8 2022. [Online]. Available: <https://www.hpcwire.com/2022/08/22/chinese-startup-biren-details-br100-gpu/>
- [35] A. Shilov, "Chinese gpu firm biren plans ipo to better compete against nvidia," 7 2023. [Online]. Available: <https://www.tomshardware.com/news/biren-mulls-ipo>
- [36] ——, "Chinese biren's new gpus have 77 billion transistors, 2 pflops of ai performance," 8 2022. [Online]. Available: <https://www.tomshardware.com/news/chinese-biren-rolls-out-new-gpus-with-77-billion-transistors-2-pflops-of-ai-performance>
- [37] M. Demler, "Blaize ignites edge-ai performance," The Linley Group, Tech. Rep., 9 2020. [Online]. Available: <https://www.blaize.com/wp-content/uploads/2020/09/Blaize-Ignites-Edge-AI-Performance.pdf>
- [38] A. Shilov, "China's cambricon posts first profit as demand for this nvidia rival's ai processors explodes," 1 2025. [Online]. Available: <https://www.tomshardware.com/tech-industry/artificial-intelligence/chinas-cambricon-posts-first-profit-as-demand-for-this-nvidia-rivals-ai-processors-explos>
- [39] "Cambricon mlu290-m5," 7 2025. [Online]. Available: <https://www.cambricon.com/index.php?m=content&c=index&a=lists&catid=340>
- [40] "Cambricon mlu370-x8," 7 2025. [Online]. Available: <https://www.cambricon.com/index.php?m=content&c=index&a=lists&catid=406>
- [41] L. Gwennap, "Kendryte embeds ai for surveillance," 3 2019. [Online]. Available: <https://www.linleygroup.com/newsletters/newsletter_detail.php?num=5992>
- [42] A. Hock, "Introducing the cerebras cs-1, the industry's fastest artificial intelligence computer," 11 2019. [Online]. Available: <https://www.cerebras.net/introducing-the-cerebras-cs-1-the-industrys-fastest-artificial-intelligence-computer/>
- [43] T. Trader, "Cerebras doubles ai performance with second-gen 7nm wafer scale engine," 4 2021. [Online]. Available: <https://www.hpcwire.com/2021/04/20/cerebras-doubles-ai-performance-with-second-gen-7nm-wafer-scale-engine/>
- [44] T. P. Morgan, "Cerebras goes hyperscale with third gen waferscale supercomputers," 3 2024. [Online]. Available: <https://www.nextplatform.com/2024/03/14/cerebras-goes-hyperscale-with-third-gen-waferscale-supercomputers/>
- [45] M. Demler, "Coherent logix configures edge ai," 12 2020. [Online]. Available: <https://www.edge-ai-vision.com/2020/12/coherent-logix-configures-edge-ai/>
- [46] S. Ward-Foxton, "D-matrix targets fast llm inference for 'real world scenarios' - ee times," 1 2025. [Online]. Available: <https://www.eetimes.com/d-matrix-targets-fast-llm-inference-for-real-world-scenarios/>
- [47] P. Clarke, "Globalfoundries aids launch of chinese ai startup," 12 2019. [Online]. Available: <https://www.eenewsanalog.com/news/globalfoundries-aids-launch-chinese-ai-startup>
- [48] C. Lam, “Furiosai’s rngd at hot chips 2024: Accelerating ai with a more flexible primitive,” 9 2024. [Online]. Available: <https://chipsandcheese.com/p/furiosais-rngd-at-hot-chips-2024-accelerating-ai-with-a-more-flexible-primitive?utm_source=publication-search>
- [49] H. Kim, Y. Choi, J. Park, B. Bae, H. Jeong, S. M. Lee, J. Yeon, M. Kim, C. Park, B. Gu, C. Lee, J. Bae, S. G. Bae, Y. Cha, W. Choe, J. Choi, J. Ha, H. Han, N. Hwang, S. Hwang, K. Jang, H. Je, H. Jeon, J. Jeon, H. Jeong, Y. Jung, D. Kang, H. Kim, M. Kim, M. Kim, S. Kim, S. Kim, W. Kim, Y. Kim, Y. Kim, Y. Ku, J. K. Lee, J. Lee, K. Lee, S. Lee, M. Noh, H. Oh, G. Park, S. Park, J. Seo, J. Seong, J. Paik, N. P. Lopes, and S. Yoo, “Tcp: A tensor contraction processor for ai workloads industrial product,” *Proceedings - International Symposium on Computer Architecture*, pp. 890–902, 2024.
- [50] “Edge tpu,” 2019. [Online]. Available: <https://cloud.google.com/edge-tpu/>
- [51] N. P. Jouppi, D. H. Yoon, G. Kurian, S. Li, N. Patil, J. Laudon, C. Young, and D. Patterson, “A domain-specific supercomputer for training deep neural networks,” *Commun. ACM*, vol. 63, p. 67–78, 6 2020. [Online]. Available: <https://doi.org/10.1145/3360307>
- [52] P. Teich, “Tearing apart google’s tpu 3.0 ai coprocessor,” 5 2018. [Online]. Available: <https://www.nextplatform.com/2018/05/10/tearing-apart-googles-tpu-3-0-ai-coprocessor/>
- [53] N. P. Jouppi, D. H. Yoon, M. Ashcraft, M. Gottscho, T. B. Jablin, G. Kurian, J. Laudon, S. Li, P. Ma, X. Ma, T. Norrie, N. Patil, S. Prasad, C. Young, Z. Zhou, D. Patterson, and G. Llc, “Ten lessons from three generations shaped google’s tpuv4i,” in *Proc. of 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)*. IEEE Computer Society, 6 2021, pp. 1–14.
- [54] O. Peckham, “Google cloud’s new tpu v4 ml hub packs 9 exaflops of ai,” 5 2022. [Online]. Available: <https://www.hpcwire.com/2022/05/16/google-clouds-new-tpu-v4-ml-hub-packs-9-exaflops-of-ai/>
- [55] T. P. Morgan, “With ‘ironwood’ tpu, google pushes the ai accelerator to the floor,” 4 2025. [Online]. Available: <https://www.nextplatform.com/2025/04/09/with-ironwood-tpu-google-pushes-the-ai-accelerator-to-the-floor/>
- [56] L. Gwennap, “Groq rocks neural networks,” *Microprocessor Report*, Tech. Rep., 1 2020. [Online]. Available: <http://groq.com/wp-content/uploads/2020/04/Groq-Rocks-NNs-Linley-Group-MPR-2020Jan06.pdf>
- [57] D. Lacey, “Preliminary ipu benchmarks,” 10 2017. [Online]. Available: <https://www.graphcore.ai/posts/preliminary-ipu-benchmarks-providing-previously-unseen-performance-for-a-range-of-machine-learning-applications>
- [58] “Dell dss8440 graphcore ipu server,” Graphcore, Tech. Rep., 2 2020. [Online]. Available: <https://www.graphcore.ai/hubfs/Leadgenassets/DSS8440IPUServerWhitePaper_2020.pdf>
- [59] S. Ward-Foxton, “Graphcore takes on nvidia with second-gen ai accelerator,” 7 2020. [Online]. Available: <https://www.eetimes.com/graphcore-takes-on-nvidia-with-second-gen-ai-accelerator/>
- [60] M. Tyson, “Graphcore bow ipu introduces tsmc 3d wafer-on-wafer processor,” 3 2022. [Online]. Available: <https://www.tomshardware.com/news/graphcore-tsmc-bow-ipu-3d-wafer-on-wafer-processor>
- [61] “Gap application processors,” 2020. [Online]. Available: <https://greenwaves-technologies.com/gap8_gap9/>
- [62] J. Turley, “Gap9 for ml at the edge,” 6 2020. [Online]. Available: <https://www.eejournal.com/article/gap9-for-ml-at-the-edge/>
- [63] N. Hemsoth, “Groq shares recipe for tsp nodes, systems,” 9 2020. [Online]. Available: <https://www.nextplatform.com/2020/09/29/groq-shares-recipe-for-tsp-nodes-systems/>
- [64] D. Abts, J. Ross, J. Sparling, M. Wong-VanHaren, M. Baker, T. Hawkins, A. Bell, J. Thompson, T. Khsai, G. Kimmell, J. Hwang, R. Leslie-Hurd, M. Bye, E. R. Creswick, M. Boyd, M. Venigalla, E. Laforge, J. Purdy, P. Kamath, D. Maheshwari, M. Beidler, G. Rosseel, O. Ahmad, G. Gagarin, R. Czekalski, A. Rane, S. Parmar, J. Werner, J. Sproch, A. Macias, and B. Kurtz, “Think fast: A tensor streaming processor (tsp) for accelerating deep learning workloads,” in *2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)*, 5 2020, pp. 145–158. [Online]. Available: <https://doi.org/10.1109/ISCA45697.2020.00023>
- [65] S. Ward-Foxton, “Gyrfalcon unveils fourth ai accelerator chip — ee times,” 11 2019. [Online]. Available: <https://www.eetimes.com/gyrfalcon-unveils-fourth-ai-accelerator-chip/>
- [66] “Solidrun, gyrfalcon develop arm-based edge optimized ai inference server,” 2 2020. [Online]. Available: <https://www.hpcwire.com/off-the-wire/solidrun-gyrfalcon-develop-edge-optimized-ai-inference-server/>
- [67] S. Ward-Foxton, “Details of halo ai edge accelerator emerge,” 8 2019. [Online]. Available: <https://www.eetimes.com/details-of-haloai-edge-accelerator-emerge/>
- [68] ——, “Hailo adds vision processor socs for smart cameras,” 3 2023. [Online]. Available: <https://www.eetimes.com/hailo-adds-vision-processor-socs-for-smart-cameras/>
- [69] “Horizon robotics journey2 automotive ai processor series,” 2020. [Online]. Available: <https://en.horizon.ai/product/journey>
- [70] Huawei, “Ascend 310 ai processor,” 2020. [Online]. Available: <https://e.huawei.com/us/products/cloud-computing-dc/atlas/ascend-310>
- [71] T. P. Morgan, “Huawei’s hisilicon can compete with nvidia gpus in china,” 8 2024. [Online]. Available: <https://www.nextplatform.com/2024/08/13/huaweis-hisilicon-can-compete-with-nvidia-gpus-in-china/>
- [72] A. Shilov, “Huawei reportedly acquired two million ascend 910 ai chips from tsmc last year through shell companies,” 3 2025. [Online]. Available: <https://www.tomshardware.com/tech-industry/artificial-intelligence/huawei-reportedly-acquired-two-million-ascend-910-ai-chips-from-tsmc-last-year-through-shell-companies>
- [73] A. S. Cassidy, J. V. Arthur, F. Akopyan, A. Andreopoulos, R. Appuswamy, P. Datta, M. V. Debole, S. K. Esser, C. O. Otero, J. Sawada, B. Taba, A. Amir, D. Bablani, P. J. Carlson, M. D. Flickner, R. Gandhasri, G. J. Garreau, M. Ito, J. L. Klamo, J. A. Kusnitz, N. J. McClatchey, J. L. McKinstry, Y. Nakamura, T. K. Nayak, W. P. Risk, K. Schleupen, B. Shaw, J. Sivagnanam, D. F. Smith, I. Terrizzano, T. Ueda, and D. Modha, “Ibm northpole: An architecture for neural network inference with a 12nm chip,” in *2024 IEEE International Solid-State Circuits Conference (ISSCC)*, vol. 67, 2 2024, pp. 214–215, read Apr 23, 2025.
- [74] F. Akopyan, W. P. Risk, J. V. Arthur, A. S. Cassidy, M. V. Debole, C. O. Otero, J. Sawada, E. Colgan, M. E. Criscolo, P. V. Mann, H. Baier, K. Schleupen, A. Amir, A. Andreopoulos, R. Appuswamy, D. Bablani, P. J. Carlson, P. Datta, S. K. Esser, M. D. Flickner, R. Gandhasri, G. J. Garreau, M. Ito, J. L. Klamo, J. A. Kusnitz, N. J. McClatchey, N. McGlohon, J. L. McKinstry, Y. Nakamura, T. K. Nayak, J. Sivagnanam, D. F. Smith, R. Sousa, B. Taba, I. Terrizzano, T. Ueda, and D. S. Modha, “Breakthrough edge ai inference performance using northpole in 3u vpx form factor,” in *2024 IEEE High Performance Extreme Computing Conference (HPEC)*, 9 2024, pp. 1–5.
- [75] R. Appuswamy, M. V. Debole, B. Taba, S. K. Esser, A. S. Cassidy, A. Amir, A. Andreopoulos, D. Bablani, P. Datta, J. A. Kusnitz, N. J. McClatchey, N. McGlohon, J. L. McKinstry, T. K. Nayak, D. F. Smith, R. Sousa, I. Terrizzano, F. Akopyan, P. J. Carlson, R. Gandhasri, G. J. Garreau, N. M. Gonzalez, M. Ito, J. L. Klamo, Y. Nakamura, C. O. Otero, W. P. Risk, J. Sawada, K. Schleupen, J. Sivagnanam, M. Stallone, T. Ueda, M. D. Flickner, J. V. Arthur, R. Panda, D. D. Cox, and D. S. Modha, “Breakthrough low-latency, high-energy-efficiency llm inference performance using northpole,” in *2024 IEEE High Performance Extreme Computing Conference (HPEC)*, 9 2024, pp. 1–8.
- [76] T. P. Morgan, “Ibm shows off next-gen ai acceleration, on chip dpu for big iron,” 8 2024. [Online]. Available: <https://www.nextplatform.com/2024/08/27/ibm-shows-off-next-gen-ai-acceleration-on-chip-dpu-for-big-iron/>
- [77] C. Berry, “Ibm telum® ii processor and ibm spyre™ accelerator chip for ai,” in *2024 IEEE Hot Chips 36 Symposium (HCS)*, 2024, pp. 1–29.
- [78] M. S. Abdelfattah, D. Han, A. Bitar, R. DiCecco, S. O’Connell, N. Shanker, J. Chu, I. Prins, J. Fender, A. C. Ling, and G. R. Chiu, “Dla: Compiler and fpga overlay for neural network inference acceleration,” in *2018 28th International Conference on Field Programmable Logic and Applications (FPL)*, 8 2018, pp. 411–4117. [Online]. Available: <https://doi.org/10.1109/FPL.2018.00077>
- [79] N. Hemsoth, “Intel fpga architecture focuses on deep learning inference,” 7 2018. [Online]. Available: <https://www.nextplatform.com/2018/07/31/intel-fpga-architecture-focuses-on-deep-learning-inference/>
- [80] T. P. Morgan, “Different gpu horses for different datacenter courses,” 10 2022. [Online]. Available: <https://www.nextplatform.com/2022/10/04/different-gpu-horses-for-different-datacenter-courses/>
- [81] L. Gwennap, “Habana offers gaudi for ai training,” *Microprocessor Report*, Tech. Rep., 6 2019. [Online]. Available: <http://habana.ai/wp-content/uploads/2019/06/Habana-Offers-Gaudi-for-AI-Training.pdf>

- [82] E. Medina and E. Dagan, "Habana labs purpose-built ai inference and training processor architectures: Scaling ai training systems using standard ethernet with gaudi processor," *IEEE Micro*, vol. 40, pp. 17–24, 3 2020. [Online]. Available: <https://doi.org/10.1109/MM.2020.2975185>
- [83] L. Gwennap, "Habana wins cigar for ai inference," 2 2019. [Online]. Available: <https://www.linleygroup.com/mpr/article.php?id=12103>
- [84] R. Smith, "Intel introduces gaudi 3 ai accelerator: Going bigger and aiming higher in ai market," 4 2024. [Online]. Available: <https://www.anandtech.com/show/21342/intel-introduces-gaudi-3-accelerator-going-bigger-and-aiming-higher>
- [85] T. P. Morgan, "Intel pits new gaudi2 ai training engine against nvidia gpus," 5 2022. [Online]. Available: <https://www.nextplatform.com/2022/05/10/intel-pits-new-gaudi2-ai-training-engine-against-nvidia-gpus/>
- [86] B. D. de Dinechin, "Kalray's mppa® manycore processor: At the heart of intelligent systems," in *17th IEEE International New Circuits and Systems Conference (NEWCAS)*. IEEE, 6 2019. [Online]. Available: <https://www.european-processor-initiative.eu/dissemination-material/1259/>
- [87] P. Clarke, "Nxp, kalray demo coolidge parallel processor in 'bluebox,'" 1 2020. [Online]. Available: <https://www.eenewsanalog.com/news/nxp-kalray-demo-coolidge-parallel-processor-bluebox>
- [88] S. Ward-Foxton, "Kneron attracts strategic investors," 1 2021. [Online]. Available: <https://www.eetimes.com/kneron-attracts-strategic-investors/>
- [89] S. Ward-Foxton, "Maxim debuts homegrown ai accelerator in latest ulp soc," 11 2020. [Online]. Available: <https://www.eetimes.com/maxim-debuts-homegrown-ai-accelerator-in-latest-ulp-soc/>
- [90] A. Jani, "Maxim showcases efficient custom ai," 2 2021. [Online]. Available: <https://www.linleygroup.com/newsletters/newsletter_detail.php?num=6274&year=2021&tag=3>
- [91] M. Clay, C. Greco, M. Shirvaikar, and B. Richey, "Benchmarking the max78000 artificial intelligence microcontroller for deep learning applications," in *Real-Time Image Processing and Deep Learning 2022*, N. Kehtarnavaz and M. F. Carlsohn, Eds., vol. 12102. SPIE, 2022, pp. 47–52. [Online]. Available: <https://doi.org/10.1117/12.2622390>
- [92] S. Leibson, "Adding low-power ai/ml inference to edge devices," 4 2023, overviews the MemryX MX3 dataflow chip. The MX3 is intended to be daisy-chained to add performance to an embedded system. [Online]. Available: <https://www.eetimes.com/adding-low-power-ai-ml-interference-to-edge-devices/>
- [93] T. P. Morgan, "Meta platforms crafts homegrown ai inference chip, ai training next," 5 2023. [Online]. Available: <https://www.nextplatform.com/2023/05/18/meta-platforms-crafts-homegrown-ai-inference-chip-ai-training-next/>
- [94] A. Firoozshahian, J. Shajrawi, J. Fix, J. Coburn, K. Quinn, H. Yu, R. Levenstein, N. Sreedhara, R. Li, R. Nattoji, P. Kansal, K. Gondkar, A. Kamath, W. Wei, J. Montgomery, O. Wu, D. Jayaraman, M. Tsai, G. Grewal, L. Cheng, S. Dwarakapuram, H. Aepala, P. Chopda, S. Desai, B. Jakka, E. Wang, N. Avidan, B. Dreyer, A. Bikumandla, P. Ramani, A. Hutchin, A. K. Sengottuvvel, K. Narayanan, U. Diril, K. Thottempudi, A. Mathews, K. Nair, A. Narasimha, S. Gopal, E. K. Ardestani, B. Dodds, M. Naumov, M. Schatz, C. Gao, V. Rao, Y. Hao, J. Zhang, K. Noru, R. Komuravelli, M. Al-Sanabani, H. Reddy, K. Ho, A. Zehtabioskui, P. Venkatapuram, S. A. Asal, and A. Bjorlin, "Mtia: First generation silicon targeting meta's recommendation systems," *Proceedings - International Symposium on Computer Architecture*, pp. 1120–1132, 6 2023. [Online]. Available: <https://dl.acm.org/doi/pdf/10.1145/3579371.3589348>
- [95] J. Coburn, C. Tang, S. A. Asal, N. Agrawal, R. Chinta, H. Dixit, B. Dodds, S. Dwarakapuram, A. Firoozshahian, C. Gao, K. Gondkar, T. Graf, J. Hu, J. Huang, S. Hughes, A. Hutchin, B. Jakka, G. J. Chen, I. Kalyanaraman, A. Kamath, P. Kansal, E. Kazi, R. Levenstein, M. Maddury, A. Mastro, S. Medaiyese, P. Modi, J. Montgomery, S. Nadathur, A. Nagpal, A. Narasimha, M. Naumov, E. Ozer, J. Park, P. Ramani, H. Reddy, D. Reiss, D. Roy, S. Sekar, A. Sharma, P. Shetty, A. Sukumaran-Rajam, E. Tal, M. Tsai, S. Varshini, R. Wareing, O. Wu, X. Xie, J. Yang, H. Yu, T. Zargar, Z. Zeng, F. Zhang, A. Matthews, X. Jiao, J. Zhang, E. Menage, T. E. Stokke, and M. Sourouri, "Meta's second generation ai chip: Model-chip co-design and productionization experiences," in *Proceedings of the 52nd Annual International Symposium on Computer Architecture*. Association for Computing Machinery (ACM), 6 2025, pp. 1689–1702. [Online]. Available: <https://dl.acm.org/doi/pdf/10.1145/3695053.3731409>
- [96] M. Maddury, P. Kansal, and O. Wu, "Next gen mtia - recommendation inference accelerator," *2024 IEEE Hot Chips 36 Symposium, HCS 2024*, 2024.
- [97] A. Klotz, "New chinese office gpu can double as a budget 1080p gaming gpu — mtt s50 wields 2,048 musa cores, 8gb vram, 85w tgp," 8 2024. [Online]. Available: <https://www.tomshardware.com/pc-components/gpus/new-chinese-office-gpu-can-double-as-a-budget-1080p-gaming-gpu>
- [98] S. Ward-Foxton, "Mythic resizes its ai chip," 6 2021. [Online]. Available: <https://www.eetimes.com/mythic-resizes-its-analog-ai-chip/>
- [99] N. Hemsoth, "A mythic approach to deep learning inference," 8 2018. [Online]. Available: <https://www.nextplatform.com/2018/08/23/a-mythic-approach-to-deep-learning-inference/>
- [100] D. Fick, "Mythic @ hot chips 2018," 8 2018. [Online]. Available: <https://medium.com/mythic-ai/mythic-hot-chips-2018-637dfb9e38b7>
- [101] M. S. Smith, "Ces 2024: Neuchips demos low-power ai upgrade for pcs," 1 2024. [Online]. Available: <https://spectrum.ieee.org/neuchips-low-power-ai>
- [102] K. Freund, "Novumind: An early entrant in ai silicon," *Moor Insights and Strategy*, Tech. Rep., 5 2019. [Online]. Available: <https://moorinsightsstrategy.com/wp-content/uploads/2019/05/NovuMind-An-Early-Entrant-in-AI-Silicon-By-Moor-Insights-And-Strategy.pdf>
- [103] J. Yoshida, "Novumind's ai chip sparks controversy," 10 2018. [Online]. Available: <https://www.eetimes.com/novuminds-ai-chip-sparks-controversy/>
- [104] T. P. Morgan, "Nvidia rounds out "ampere" lineup with two new accelerators," 4 2021. [Online]. Available: <https://www.nextplatform.com/2021/04/15/nvidia-rounds-out-ampere-lineup-with-two-new-accelerators/>
- [105] R. Krashinsky, O. Giroux, S. Jones, N. Stam, and S. Ramaswamy, "Nvidia ampere architecture in-depth," 5 2020. [Online]. Available: <https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/>
- [106] A. Shilov, "Nvidia's chinese a800 gpu's performance revealed," 5 2023. [Online]. Available: <https://www.tomshardware.com/news/nvidia-a800-performance-revealed>
- [107] T. P. Morgan, "How nvidia blackwell systems attack 1 trillion parameter ai models," 3 2024. [Online]. Available: <https://www.nextplatform.com/2024/03/19/how-nvidia-blackwell-systems-attack-1-trillion-parameter-ai-models/>
- [108] C. Campa, C. Kawalek, H. Vo, and J. Bessoudo, "Defining ai innovation with nvidia dgx a100," 5 2020. [Online]. Available: <https://devblogs.nvidia.com/defining-ai-innovation-with-dgx-a100/>
- [109] H. Mujtaba, "Nvidia unveils hopper gh100 powered dgx h100, dgx pod h100, h100 pcie accelerators," 3 2022. [Online]. Available: <https://wccftech.com/nvidia-unveils-hopper-gh100-powered-dgx-h100-dgx-pod-h100-h100-pcie-accelerators/>
- [110] "Nvidia hgx platform," 2025. [Online]. Available: <https://www.nvidia.com/en-us/data-center/hgx/>
- [111] T. P. Morgan, "Nvidia's four workhorses of the ai inference revolution," 3 2023. [Online]. Available: <https://www.nextplatform.com/2023/03/21/nvidias-four-workhorses-of-the-ai-inference-revolution/>
- [112] R. Smith, "Nvidia hopper gpu architecture and h100 accelerator announced: Working smarter and harder," 3 2022. [Online]. Available: <https://www.anandtech.com/show/17327/nvidia-hopper-gpu-architecture-and-h100-accelerator-announced>
- [113] T. P. Morgan, "The separate but equal ai realms of china and the us," 4 2025. [Online]. Available: <https://www.nextplatform.com/2025/04/23/the-separate-but-equal-ai-realms-of-china-and-the-us/>
- [114] "H200 tensor core gpu — nvidia," 2025. [Online]. Available: <https://www.nvidia.com/en-us/data-center/h200/>
- [115] R. Smith, "Nvidia gives jetson agx xavier a trim, announces nano-sized jetson xavier nx," 11 2019. [Online]. Available: <https://www.anandtech.com/show/15070/nvidia-gives-jetson-xavier-a-trim-announces-nanosized-jetson-xavier-nx>
- [116] B. Funk, "Nvidia jetson agx orin: The next-gen platform that will power our ai robot overlords unveiled," 3 2022. [Online]. Available: <https://hothardware.com/news/nvidia-jetson-agx-orin>
- [117] "Jetson agx orin for next-gen robotics," 2022. [Online]. Available: <https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/>
- [118] B. Hill, "Nvidia unveils ampere-infused drive agx for autonomous cars, isaac robotics platform with bmw partnership," 5 2022. [Online]. Available: <https://hothardware.com/news/nvidia-drive-agx-pegasus-orin-amperex-next-gen-autonomous-cars>
- [119] “Nvidia l40,” 2023. [Online]. Available: <https://www.techpowerup.com/gpu-specs/l40.c3959>
- [120] C. Robinson, “Nvidia l40s gpu for data center visualization launched,” 8 2023. [Online]. Available: <https://www.servethehome.com/nvidia-l40s-gpu-for-data-center-visualization-launched/>
- [121] E. Kilgariff, H. Moreton, N. Stam, and B. Bell, “Nvidia turing architecture in-depth,” 9 2018. [Online]. Available: <https://developer.nvidia.com/blog/nvidia-turing-architecture-in-depth/>
- [122] “Nvidia tesla v100 tensor core gpu,” 2019. [Online]. Available: <https://www.nvidia.com/en-us/data-center/tesla-v100/>
- [123] R. Smith, “16gb nvidia tesla v100 gets reprieve; remains in production,” 5 2018. [Online]. Available: <https://www.anandtech.com/show/12809/16gb-nvidia-tesla-v100-gets-reprieve-remains-in-production>
- [124] J. McGregor, “Perceive exits stealth with super efficient machine learning chip for smarter devices,” 4 2020. [Online]. Available: <https://www.forbes.com/sites/triasresearch/2020/04/06/perceive-exits-stealth-with-super-efficient-machine-learning-chip-for-smarter-devices/>
- [125] “Mn-core,” 2020. [Online]. Available: <https://projects.preferred.jp/mn-core/en/>
- [126] I. Cutress, “Preferred networks: A 500 w custom pcie card using 3000 mm² silicon,” 12 2019. [Online]. Available: <https://www.anandtech.com/show/15177/preferred-networks-a-500-w-custom-pcie-card-using-3000-mm2-silicon>
- [127] K. Namura, J. M. Kuhn, T. Adachi, H. Imachi, H. Kaneko, T. Kato, G. Watanabe, N. Tanaka, S. Kashihara, H. Miyashita, Y. Tomonaga, R. Okuta, T. Akiba, B. Vogel, S. Kitajo, F. Osawa, K. Takahashi, Y. Takatsukasa, K. Mizumaru, T. Yamauchi, J. Ono, A. Takahashi, T. Ahmed, Y. Doi, K. Hiraki, and J. Makino, “Mn-core - a highly efficient and scalable approach to deep learning,” *IEEE Symposium on VLSI Circuits, Digest of Technical Papers*, vol. 2021-June, 6 2021.
- [128] J. Makino, “Mn-core 2: Second-generation processor of mn-core architecture for ai and general-purpose hpc application,” *2024 IEEE Hot Chips 36 Symposium, HCS 2024*, 8 2024.
- [129] D. Firu, “Quadric edge supercomputer,” Quadric, Tech. Rep., 4 2019. [Online]. Available: <https://quadric.io/supercomputing.pdf>
- [130] S. Ward-Foxton, “Qualcomm cloud ai 100 promises impressive performance per watt for near-edge ai,” 9 2020. [Online]. Available: <https://www.eetimes.com/qualcomm-cloud-ai-100-promises-impressive-performance-per-watt-for-near-edge-ai/>
- [131] D. McGrath, “Qualcomm targets ai inferencing in the cloud,” 4 2019. [Online]. Available: <https://www.eetimes.com/qualcomm-targets-ai-inferencing-in-the-cloud/#>
- [132] S. Crowe, “Qualcomm robotics rb5 platform puts 5g, ai in developers’ hands,” 6 2020. [Online]. Available: <https://www.therobotreport.com/qualcomm-robotics-rb5-platform-puts-5g-ai-in-developers-hands/>
- [133] “Qualcomm robotics rb6 platform,” 2023. [Online]. Available: <https://www.qualcomm.com/products/internet-of-things/industrial/industrial-automation/robotics-rb6-platform>
- [134] G. Cozma, “Rebellions: From high frequency trading to ai acceleration,” 12 2024. [Online]. Available: <https://chipsandcheese.com/p/rebellions-from-high-frequency-trading>
- [135] S. Ward-Foxton, “Rebellions builds chiplet roadmap, merges with sapeon,” 10 2024. [Online]. Available: <https://www.eetimes.com/rebellions-builds-chiplet-roadmap-merges-with-sapeon/>
- [136] Y. Hong and D. Kim, “Performance and efficiency gains of npu-based servers over gpus for ai model inference,” *Systems*, vol. 13, 2025. [Online]. Available: <https://www.mdpi.com/2079-8954/13/9/797>
- [137] L. Gwennap, “Machine learning moves to the edge,” Microprocessor Report, Tech. Rep., 4 2020. [Online]. Available: <https://www.linleygroup.com/uploads/sima-machine-learning-moves-to-the-edge-wp.pdf>
- [138] D. McGrath, “Tech heavyweights back ai chip startup,” 10 2018. [Online]. Available: <https://www.eetimes.com/tech-heavyweights-back-ai-chip-startup/>
- [139] R. Merritt, “Startup rolls ai chips for audio,” 2 2018. [Online]. Available: <https://www.eetimes.com/startup-rolls-ai-chips-for-audio/>
- [140] S. Ward-Foxton, “Syntiant pitches latest low-power ai chip as llm companion,” 4 2024. [Online]. Available: <https://www.eetimes.com/syntiant-pitches-latest-low-power-ai-chip-as-llm-companion/>
- [141] A. Shilov, “Tachyum teases 128-core cpu: 5.7 ghz, 950w, 16 ddr5 channels,” 6 2022. [Online]. Available: <https://www.tomshardware.com/news/tachyum-teases-128-core-cpu-57-ghz-950w-16-ddr5-channels>
- [142] L. Gwennap, “Tenstorrent scales ai performance: Architecture leads in data-center power efficiency,” Microprocessor Report, Tech. Rep., 4 2020. [Online]. Available: <https://www.tenstorrent.com/wp-content/uploads/2020/04/Tenstorrent-Scales-AI-Performance.pdf>
- [143] D. Ignjatovic, D. W. Bailey, and L. Bajic, “The wormhole ai training processor,” *Digest of Technical Papers - IEEE International Solid-State Circuits Conference*, vol. 2022-February, pp. 356–358, 2 2022.
- [144] A. Shilov, “Tenstorrent launches wormhole ai processors: 466 fp8 tflops at 300w,” 7 2024. [Online]. Available: <https://www.anandtech.com/show/21482/tenstorrent-launches-wormhole-ai-processors-466-fp8-tflops-at-300w>
- [145] T. Mann, “Tenstorrent details its risc-v packed blackhole chips,” 8 2024. [Online]. Available: <https://www.theregister.com/2024/08/27/tenstorrent_ai_blackhole/>
- [146] “Blackhole,” 7 2025. [Online]. Available: <https://tenstorrent.com/en/hardware/blackhole>
- [147] E. Talpes, D. D. Sarma, G. Venkataramanan, P. Bannon, B. McGee, B. Floering, A. Jalote, C. Hsiong, S. Arora, A. Gorti, and G. S. Sachdev, “Compute solution for tesla’s full self-driving computer,” *IEEE Micro*, vol. 40, pp. 25–35, 3 2020. [Online]. Available: <https://doi.org/10.1109/MM.2020.2975764>
- [148] “Fsd chip - tesla,” 2020. [Online]. Available: <https://en.wikichip.org/wiki/tesla_(car_company)/fsd_chip>
- [149] E. Talpes, D. D. Sarma, D. Williams, S. Arora, T. Kunjan, B. Floering, A. Jalote, C. Hsiong, C. Poorna, V. Samant, J. Sicilia, A. K. Nivarti, R. Ramachandran, T. Fischer, B. Herzberg, B. McGee, G. Venkataramanan, and P. Bannon, “The microarchitecture of dojo, tesla’s exa-scale computer,” *IEEE Micro*, vol. 43, pp. 31–39, 5 2023.
- [150] T. P. Morgan, “Inside tesla’s innovative and homegrown “dojo” ai supercomputer,” 8 2022. [Online]. Available: <https://www.nextplatform.com/2022/08/23/inside-teslas-innovative-and-homegrown-dojo-ai-supercomputer/>
- [151] S. Ward-Foxton, “Ti’s first automotive soc with an ai accelerator launches,” 2 2021. [Online]. Available: <https://www.eetimes.com/tis-first-automotive-soc-with-an-ai-accelerator-launches/>
- [152] “Tda4vm jacinto processors for adas and autonomous vehicles,” Texas Instruments, Tech. Rep., 3 2021. [Online]. Available: <https://www.ti.com/lit/gpn/tda4vm>
- [153] M. Demler, “Ti jacinto accelerates level 3 adas,” 3 2020. [Online]. Available: <https://www.linleygroup.com/newsletters/newsletter_detail.php?num=6130&year=2020&tag=3>
- [154] R. Merritt, “Samsung, toshiba detail ai chips,” 2 2019. [Online]. Available: <https://www.eetimes.com/samsung-toshiba-detail-ai-chips/>
- [155] M. Feldman, “Ibm finds killer app for truenorth neuromorphic chip,” 9 2016. [Online]. Available: <https://www.top500.org/news/ibm-finds-killer-app-for-truenorth-neuromorphic-chip/>
- [156] S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, C. di Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner, and D. S. Modha, “Convolutional networks for fast, energy-efficient neuromorphic computing,” *Proceedings of the National Academy of Sciences of the United States of America*, vol. 113, pp. 11 441–11 446, 10 2016. [Online]. Available: <https://doi.org/10.1073/pnas.1604850113>
- [157] F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G. Nam, B. Taba, M. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jackson, and D. S. Modha, “Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 34, pp. 1537–1557, 10 2015. [Online]. Available: <https://doi.org/10.1109/TCAD.2015.2474396>
- [158] “Raptor-n3000,” 2025. [Online]. Available: <https://www.neuchips.ai/raptor-n3000>
- [159] S. Ward-Foxton, “Untether ai shuts down, engineering team joins amd,” 6 2025. [Online]. Available: <https://www.eetimes.com/untether-ai-shuts-down-engineering-team-joins-amd/>
- [160] S. Ward-Foxton, “Ai startup esperanto winds down silicon business,” 7 2025. [Online]. Available: <https://www.eetimes.com/ai-startup-esperanto-winds-down-silicon-business/>
- [161] C. Pan and B. Goh, “China’s baidu says its kunlun chip cluster can train deepseek-like models — reuters,” 4 2025. [Online]. Available: <https://www.reuters.com/world/china/chinas-baidu-says-its-kunlun-chip-cluster-can-train-deepseek-like-models-2025-04-25/>

- [162] T. Nowatzki, V. Gangadhar, K. Sankaralingam, and G. Wright, "Domain specialization is generally unnecessary for accelerators," *IEEE Micro*, vol. 37, pp. 40–50, 6 2017.
- [163] M. Davies and K. Sankaralingam, "Defying moore: Envisioning the economics of a semiconductor revolution through 12nm specialization," *Communications of the ACM*, vol. 68, pp. 108–119, 7 2025. [Online]. Available: <https://dl.acm.org/doi/pdf/10.1145/3711920>