

JLR

# THE CHIPLET CHALLENGE

INTER-IIT TECH  
MEET 12.0 PS  
TEAM 15



Parts 1.1, 1.2, 2.1, 2.2.b, 2.3 of the PS posed  
major questions to be answered

With a background of over 15 years  
experience & Know How, Sendinblue team  
strives for excellence while ensuring long  
term partnerships to achieve success

01

**Why Chiplets? And Where?**

02

**Microarchitectures**

03

**Interconnects**

04

**Safety & Security**

05

**Thermal Management**


# 01 Why Chiplets? And Where?

- 1.1 Adv. of Chiplets over SoCs
- 1.2 Considered Domains
- 1.3 Narrowed Scope

# 01 Why Chiplets? And Where?

## 1.1 Adv. of Chiplets over SoCs

- Process Node Size Optimization
- Modularity & Upgradability
- Overcoming the Reticle Limit

**Cost Efficiency** 

**Processing Power** 

**Manufacturing Flexibility** 




# 01 Why Chiplets? And Where?

## 1.2 Considered Domains

- BMS (Battery Management System)
- Motor Controller
- ADAS (Advanced Driver Assistance System)
- Infotainment System

Domains where the advantages of chiplets *might* add the most value

# 01 Why Chiplets? And Where?

## 1.3 Narrowed Scope

- ADAS (Advanced Driver Assistance System)
- Infotainment System

- **Increasing Feature Adoption Rates:** Upgradability & Modularity
- **Diverse Functional Blocks:** Process Node Size Optimization


## 02 Microarchitectures

### 2.1 Identifying Functional Blocks

- CPU, GPU, NPU, Accelerators
- I/O Blocks

### 2.2 ...

# 02 Microarchitectures

## 2.1 Identifying Functional Blocks: I/O Blocks

- **Type of Sensors / Sensor ECUs (Communication Protocols)**
  - CAN (Controller Area Network)
  - LIN (Local Interconnect Network)
  - Ethernet
- **Type Of In-Vehicle Architecture**
  - Domain Architectures
  - Zonal Architectures




### Transition from Domain to Zonal

- Reduces Harnessing Complexity
- Vehicle Cost & Weight
- Software Defined Vehicles
- Central Processing



PHYSICAL RESTRUCTURE | ZONES


### Domain vs. Zonal

- Latency, Gateways & Switches
- Backbone Communication
- Automotive Ethernet



Domain



Zonal

#### COMMUNICATION IN ZONAL NETWORK

##### Data Flows

- ① Edge-to-Edge
- ② Edge-to-Centre
- ③ Within-Zone



##### Communication interfaces in-zone

- CAN (FD), typ <8
- Ethernet
- (optional) FlexRay, CAN XL
- LIN considered an application interface (i.e. contained in zone)


### Challenges of implementing Zonal Architectures

- Safety / Security concerns  
(Latency, Bandwidth, etc.)
- All-Ethernet is a big step and is costlier


### Proposed In-Vehicle Architectures

- **ADAS (Domain)**
  - Safety Criticality
  - High-bandwidth & Low-latency
  - Requirement of raw sensor data
- **Infotainment (Zonal)**  
(Head Unit + Heads-Up Display + Instrument Cluster)
  - Large no. of sensors and ECU modules
  - Need for streamlined communication
  - Edge-processing capabilities


# I/O Chiplets

## ADAS

Cameras - CSI2  
Lidar - Ethernet

## Infotainment

Cameras - CSI2  
Audio - Ethernet  
GPS - RS232  
USB and General Purpose I/O



# General Compute Tile

- The operating system runs on the General Compute Tile and handles task scheduling.
- It assigns workloads to the domain-specific accelerators (DSAs).
- It allocates memory for the computation and launches the transfer of input data into the DSA caches.
- It then launches the computation kernel.
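The dispatch sequence above can be sketched in a few lines. This is only an illustrative model, not the team's implementation: the `Accelerator` and `GeneralComputeTile` classes, the `dispatch` method and the tile names `"gpu"` and `"npu"` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Accelerator:
    """Hypothetical stand-in for a domain-specific accelerator (DSA) tile."""
    name: str
    cache: list = field(default_factory=list)

    def run(self, kernel):
        # Execute the kernel on whatever data currently sits in the local cache.
        return kernel(self.cache)


class GeneralComputeTile:
    """Hypothetical model of the scheduling role of the General Compute Tile."""

    def __init__(self, accelerators):
        self.accelerators = {acc.name: acc for acc in accelerators}

    def dispatch(self, target, data, kernel):
        acc = self.accelerators[target]  # 1. assign the workload to a DSA
        acc.cache = list(data)           # 2. allocate and copy inputs into its cache
        return acc.run(kernel)           # 3. launch the computation kernel


gct = GeneralComputeTile([Accelerator("gpu"), Accelerator("npu")])
result = gct.dispatch("npu", [1, 2, 3], lambda xs: sum(x * x for x in xs))
print(result)
```

In a real package the copy in step 2 would be a DMA transfer over the die-to-die link rather than a Python list copy.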



# Drawbacks of using General Compute Tiles in domain specific applications

- Not ideal for parallel processing.
- Degraded speed.

# Graphics Processing Tile

- Consists of thousands of cores running multiple threads to parallelise data processing.
- Ideal for image and video processing, where it gives higher throughput.



# Neural Acceleration Tile

- Purpose-built to run neural network algorithms.
- Operates on synaptic weights (the strength of the connection between two neurons).
- Models neurons at the circuit level and integrates storage with computation through the synaptic weights.



# HBM (High Bandwidth Memory)

- Shared memory that stores the operating system and the programs for the algorithms.
- High bandwidth and low power consumption.
- Vertically stacked memory chiplets give shorter data paths.



# Hardware Accelerators

## ISP (Image Signal Processing)

- Specialised hardware for demosaicing, noise filters and other image filters.

## DSP (Digital Signal Processing)

- Specialised hardware to perform operations like filtering, Fourier transform, codec encoding and other DSP algorithms.

## Video Encoder Decoder

- Specialised hardware for encoding and decoding video.

## Vision Accelerator

- Specialised hardware for computer vision algorithms.

# ADAS chiplet based microarchitecture

## Cameras

- Data from the cameras reaches the CSI2 I/O.
- The data is sent to the ISP for preprocessing.
- It then goes to the General Compute Tile, which directs the other compute tiles for further processing.
- The General Compute Tile directs the Graphics Processing Tile and the Neural Acceleration Tile to run the YOLO algorithm.




- The neural network weights and biases stored in HBM are sent to the caches of the Neural Acceleration Tile for the YOLO algorithm.

- The output data is sent for sensor fusion.



# ADAS chiplet based microarchitecture

## Lidar

- Data from the Lidar sensors reaches the Ethernet I/O.
- Lidar preprocessing is done in software to make it robust to environmental conditions.
- The General Compute Tile sends the data to the Graphics Processing Tile for preprocessing.
- The data is then sent to the Graphics Processing Tiles and Neural Acceleration Tiles to run the algorithms.



# Infotainment chiplet based microarchitecture

## Audio

- The audio signal reaches the Ethernet I/O via Ethernet AVB.
- It is passed to the General Compute Tile.
- From there it is sent to the DSP, Graphics Processing Tile and Neural Acceleration Tile for preprocessing and NLP.



# Infotainment chiplet based microarchitecture

## GPS

- Position data from the GPS receiver reaches the RS-232 I/O (a wired serial link; only the satellite signal itself is received wirelessly).
- The data is sent to the General Compute Tile, where the necessary decisions are made.



# Throughput

## Node size

- Moving to a smaller process node can reduce the critical path delay of the circuit.
- Choosing the node size per chiplet improves throughput without compromising total cost.
- A RISC module was synthesised with Synopsys Design Compiler at the 90 nm and 65 nm nodes.
- The critical path delay fell from 43.46 ns to 9.16 ns at a clock period of 62.5 ns.

| Point                                                   | Incr (ns) | Path (ns) |
|---------------------------------------------------------|-----------|-----------|
| data arrival time                                       |           | 43.46     |
| clock CLOCK (rise edge)                                 | 62.50     | 62.50     |
| clock network delay (ideal)                             | 0.00      | 62.50     |
| clock uncertainty                                       | 0.00      | 62.50     |
| msrv32_csr_file_0/MC/minstret_out_reg[63]/CK (QDFHFLX1) | 0.00      | 62.50 r   |
| library setup time                                      | -0.63     | 61.87     |
| data required time                                      |           | 61.87     |
| data required time                                      |           | 61.87     |
| data arrival time                                       |           | -43.46    |
| slack (MET)                                             |           | 18.40     |

| Point                                                  | Incr (ns) | Path (ns) |
|--------------------------------------------------------|-----------|-----------|
| data arrival time                                      |           | 9.16      |
| clock CLOCK (rise edge)                                | 62.50     | 62.50     |
| clock network delay (ideal)                            | 0.00      | 62.50     |
| clock uncertainty                                      | 0.00      | 62.50     |
| msrv32_csr_file_0/MC/minstret_out_reg[63]/CK (DFQM2RA) | 0.00      | 62.50 r   |
| library setup time                                     | -0.01     | 62.49     |
| data required time                                     |           | 62.49     |
| data required time                                     |           | 62.49     |
| data arrival time                                      |           | -9.16     |
| slack (MET)                                            |           | 53.33     |
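The slack figures in the two timing reports follow from a simple setup-time check, sketched below. The function names are illustrative, and the maximum-frequency estimate is a rough assumption that ignores clock skew and uncertainty; it is not a number reported by the tool.

```python
def slack_ns(clock_period, setup_time, arrival_time, uncertainty=0.0):
    """Setup slack: data required time minus data arrival time (all in ns)."""
    required = clock_period - uncertainty - setup_time
    return required - arrival_time


def max_frequency_mhz(setup_time, arrival_time):
    """Rough upper bound on clock frequency: shortest period with zero slack."""
    return 1000.0 / (arrival_time + setup_time)


# Values copied from the two reports above.
slow = slack_ns(clock_period=62.5, setup_time=0.63, arrival_time=43.46)
fast = slack_ns(clock_period=62.5, setup_time=0.01, arrival_time=9.16)
print(f"slack: {slow:.2f} ns vs {fast:.2f} ns")
print(f"fmax estimate: {max_frequency_mhz(0.63, 43.46):.1f} MHz "
      f"vs {max_frequency_mhz(0.01, 9.16):.1f} MHz")
```

The first slack comes out as 18.41 ns rather than the reported 18.40 ns because the report rounds each intermediate value to two decimals.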

# Throughput

## Topology

- To optimize throughput, the D2D links must operate at the highest possible frequency.
- Interconnect length must be limited to achieve acceptable error rates during high-frequency transmission.
- This can be done by connecting only adjacent chiplets.
- Under this restriction, the shape and arrangement of the chiplets have a significant impact on the performance of the Inter-Chiplet Interconnect Network.


- In a regular mesh arrangement, each non-border chiplet is connected to 4 other chiplets.
- In HexaMesh, the arrangement is optimised by arranging chiplets in a circle around the central chiplet in a honeycomb pattern where each non-border chiplet is connected to 6 other chiplets.
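To make the effect of the extra neighbours concrete, the sketch below builds both arrangements as graphs and measures their diameter (the largest hop count between any two chiplets) with a breadth-first search. The chiplet counts (36 for the mesh, 37 for the honeycomb) and the axial-coordinate model of the honeycomb are illustrative assumptions, not figures taken from the RapidChiplet runs.

```python
from collections import deque

MESH_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
# Axial hex coordinates (q, r): six neighbours per cell.
HEX_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def mesh_nodes(side):
    """Chiplets on a side x side square grid."""
    return {(x, y) for x in range(side) for y in range(side)}


def hex_nodes(radius):
    """Chiplets in concentric honeycomb rings around a central chiplet."""
    return {(q, r)
            for q in range(-radius, radius + 1)
            for r in range(-radius, radius + 1)
            if abs(q + r) <= radius}


def diameter(nodes, steps):
    """Largest shortest-path hop count between any two chiplets."""
    def eccentricity(start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for dx, dy in steps:
                nxt = (cur[0] + dx, cur[1] + dy)
                if nxt in nodes and nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        return max(dist.values())
    return max(eccentricity(n) for n in nodes)


mesh = mesh_nodes(6)   # 36 chiplets
hexa = hex_nodes(3)    # 37 chiplets
print(len(mesh), diameter(mesh, MESH_STEPS))
print(len(hexa), diameter(hexa, HEX_STEPS))
```

With a similar chiplet count, the honeycomb arrangement needs fewer worst-case hops, which is the property the HexaMesh arrangement exploits.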



HexaMesh Architecture

Mesh Architecture

# HexaMesh vs. Mesh

## RapidChiplet Simulation Results



# Obstacle Detection

## Why obstacle detection?

- Fundamental to **safety** in automotive systems
- Forms the backbone of **Collision avoidance** and **Adaptive cruise control** systems
- At the Heart of Autonomous Vehicles



Algorithm: YOLO

# Functional Flow Block Diagram



A RISC-V-based hardware accelerator for running the YOLO algorithm

# Camera Sensor

- Consists of CMOS image sensors arranged in a grid.
- The colour filters over the sensor grid are arranged in a **Bayer** pattern, so each sensor records only one colour.
- A **mosaiced** image is produced for further processing.



## ISP

- **Demosaicing** using Bi-linear interpolation
- Denoising Filters
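A minimal software sketch of bilinear demosaicing is given below to show what the ISP block computes. It assumes an RGGB Bayer layout and an image of at least 2 x 2 pixels; the function names are illustrative and the team's actual ISP design may differ.

```python
def bayer_colour(row, col):
    """Colour sampled at (row, col), assuming an RGGB Bayer layout."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic_bilinear(raw):
    """Turn a 2-D list of raw sensor values into a 2-D list of (R, G, B) tuples.

    A missing channel is the average of the same-colour samples in the
    3x3 neighbourhood, which for interior pixels is bilinear interpolation.
    """
    height, width = len(raw), len(raw[0])
    out = []
    for r in range(height):
        line = []
        for c in range(width):
            pixel = []
            for channel in "RGB":
                if bayer_colour(r, c) == channel:
                    pixel.append(float(raw[r][c]))  # measured directly
                    continue
                samples = [raw[rr][cc]
                           for rr in range(max(r - 1, 0), min(r + 2, height))
                           for cc in range(max(c - 1, 0), min(c + 2, width))
                           if bayer_colour(rr, cc) == channel]
                pixel.append(sum(samples) / len(samples))
            line.append(tuple(pixel))
        out.append(line)
    return out


# A flat scene: every red site reads 100, green 50, blue 20.
levels = {"R": 100, "G": 50, "B": 20}
raw = [[levels[bayer_colour(r, c)] for c in range(4)] for r in range(4)]
rgb = demosaic_bilinear(raw)
print(rgb[1][1])
```

On a flat scene every reconstructed pixel should carry the same three channel values, which makes a convenient sanity check.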



# Compute Unit and Neural accelerator

- A RISC-V-based processor serves as the main compute unit.
- It manages the data flow between the neural accelerators and memory.
- It triggers the YOLO execution.
- The dedicated hardware units are optimized for convolution and pooling.
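For reference, the two operations the dedicated units accelerate can be written as the plain-Python sketch below. It assumes stride-1 convolution without padding and non-overlapping max pooling; the function names are illustrative and say nothing about how the hardware is organised.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding 2-D convolution as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
features = conv2d(image, [[1, 0], [0, 1]])
pooled = max_pool2d(image)
print(features)
print(pooled)
```

Both loops are independent multiply-accumulate or compare operations over a sliding window, which is why they map well onto parallel hardware.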

## Simulink Results



# Scope and Benefits of using Chiplets

- The YOLO network is too extensive to be implemented on embedded CPUs. Dedicated **neural accelerators** significantly reduce **latency**.
- Chiplets offer **modularity** and facilitate **scalability**
  - The accuracy of the algorithm can be tuned just by **changing** the **weights** and **biases** instead of replacing the entire chip.
  - Changes in the framework on which YOLO is trained can be handled by the processor, e.g. Darknet-19 -> Darknet-53.
  - If a future algorithm requires more computation or a different neural accelerator, only the specific chiplet needs to change rather than the entire chip.
- Additionally, India's own Shakti microprocessor can be used, as it is aimed at HPC and parallel processing.


# Interconnects

- In chiplet-based designs, inter-chiplet communication uses a Die-to-Die (D2D) interface.
- D2D interfaces generally consist of the **physical layer (PHY)** and the **controller**.
- PHY handles the electrical signaling while the controller handles the protocol and the communication between the subsystems of a die.
- The correct choice of interconnect is crucial to ensure efficient data transfer, reliability and interoperability.



# Factors to be Considered

- **Performance:** *Throughput, latency* and *power efficiency* must be optimized while keeping the trade-offs (*higher cost, shorter reach*) in mind.
- **Protocol:** The choice of protocol (among *PCIe, CXL, CCIX, AMBA CHI*) is critical to ensure seamless communication among the chiplets. Standard protocols are preferable for heterogeneous integration, and the chosen interconnect must be able to carry the selected protocol.
- **Packaging:** The packaging technology in use limits the choice of interconnects.

# Standard vs Advanced Packaging

- Advanced packaging generally offers higher bandwidth and power efficiency, but at a higher manufacturing cost and with a shorter reach.



Standard (2D) Packaging



Advanced (2.5D) Packaging

# Factors to be Considered

- **Market Viability:** The chosen interconnect should be future-proof and, when *interoperability* is vital, it must also be a widely accepted standard in the market.
  - Proprietary interconnects like AMD's Infinity Fabric, NVIDIA's NVLink C2C and TSMC's LIPINCON cannot be used, as they are designed for homogeneous integration, while **open standards** are necessary for interoperability.
  - Custom proprietary interconnects may still be designed for specific in-house applications where interoperability is not a necessity, such as *splitting* a large, hard-to-fabricate die into multiple smaller homogeneous chiplets, or *scaling*.

# Comparison of Open Standard PHYs

- High Bandwidth Memory (HBM) only supports 2.5D Packaging and XSR only supports 2D Packaging.

| PHY                    | HBM3 | OpenHBI | BoW<br>(Advanced) | UCIe<br>(Advanced) | XSR<br>(Serial) | BoW<br>(Laminate) | UCIe<br>(Standard) |
|------------------------|------|---------|-------------------|--------------------|-----------------|-------------------|--------------------|
| Packaging              | 2.5D | 2.5D    | 2.5D              | 2.5D               | 2D              | 2D                | 2D                 |
| Bandwidth<br>(Tbps/mm) | 1.50 | 2.30    | 5.12              | 5.00               | 1.75            | 1.03              | 1.8                |
| Latency<br>(ns)        | 4    | 4       | 2                 | 2                  | 15              | 2                 | 2                  |
| D2D Reach<br>(mm)      | 2    | 2       | 5                 | 2                  | 50              | 25                | 25                 |
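One way to use the comparison is to filter the candidates by packaging and required reach, then rank the survivors by bandwidth. The sketch below does that with the values copied from the table; the selection rule and the `best_phy` helper are illustrative assumptions, not the team's selection procedure.

```python
# (name, packaging, bandwidth in Tbps/mm, latency in ns, reach in mm), from the table.
PHYS = [
    ("HBM3", "2.5D", 1.50, 4, 2),
    ("OpenHBI", "2.5D", 2.30, 4, 2),
    ("BoW (Advanced)", "2.5D", 5.12, 2, 5),
    ("UCIe (Advanced)", "2.5D", 5.00, 2, 2),
    ("XSR (Serial)", "2D", 1.75, 15, 50),
    ("BoW (Laminate)", "2D", 1.03, 2, 25),
    ("UCIe (Standard)", "2D", 1.80, 2, 25),
]


def best_phy(packaging, min_reach_mm):
    """Highest-bandwidth PHY that fits the packaging and covers the reach."""
    candidates = [p for p in PHYS
                  if p[1] == packaging and p[4] >= min_reach_mm]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p[2])[0]


print(best_phy("2D", 25))
print(best_phy("2D", 40))
print(best_phy("2.5D", 2))
```

A real selection would also weigh latency, protocol support and ecosystem maturity, which this ranking ignores.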

# UCIe 1.1

- Backed by industry-leading companies
- Outstanding metrics
- Evolution of Intel's AIB standard
- Automotive enhancements like *runtime health monitoring & field repairability*
- Plans to support 3D packaging in the future
- Aims for a plug-and-play level of interoperability.
- PCIe and CXL protocols are natively mapped; it can also carry any other protocol and *multiplex* multiple protocols in the same package.
- Suitable as a general interconnect for compute, accelerator and I/O chiplets.

|            | UCIe<br>(Standard) | UCIe<br>(Advanced) |
|------------|--------------------|--------------------|
| Packaging  | 2D                 | 2.5D               |
| Latency    | 2 ns               | 2 ns               |
| Throughput | 1.8 Tbps/mm        | 5 Tbps/mm          |
| Power      | 0.5 pJ/bit         | 0.25 pJ/bit        |
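The throughput and energy-per-bit rows combine into a link power per millimetre of die edge, since Tbps multiplied by pJ/bit gives watts. The short calculation below is a sketch assuming the link runs at full throughput; the function name is illustrative.

```python
def link_power_w_per_mm(throughput_tbps_per_mm, energy_pj_per_bit):
    """Power per mm of die edge at full throughput (1e12 bit/s * 1e-12 J/bit = W)."""
    return throughput_tbps_per_mm * energy_pj_per_bit


standard = link_power_w_per_mm(1.8, 0.5)    # UCIe (Standard), 2D
advanced = link_power_w_per_mm(5.0, 0.25)   # UCIe (Advanced), 2.5D
print(f"standard: {standard:.2f} W/mm, advanced: {advanced:.2f} W/mm")
```

The advanced package moves roughly 2.8 times the data per millimetre for about 1.4 times the power.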



# Bunch of Wires (BoW)

- Contributions from companies like *Keysight*, *IBM* and *Blue Cheetah*
- Defines only the physical layer. Higher level protocols are not defined
- Suitable for applications needing lower level control (e.g. analog to digital)

# High Bandwidth Memory (HBM)

- Already in use in high-end GPUs and AI accelerators from *Intel* and *AMD*
- A very high performance interconnect; the latest version, **HBM3E**, promises a bandwidth of over 1.2 TB/s per stack
- Suitable for connecting digital chiplets to memory chiplets

# eXtra Short Reach (XSR) SerDes

- One of the few open standards for serial chiplet interconnects.
- Provides up to **112G** data rate per lane with a reach of **50 mm** on laminate-based (2D) packaging
- Suitable for networking and optical applications

# Interconnects In Our Architectures

- **IC1 - PCIe over UCIe:** PCIe is the most widely supported standard for peripherals, and *interoperability* is crucial for sourcing I/O chiplets off the shelf.
- **IC2 - CCIX over UCIe:** CCIX allows for a *symmetric & coherent* interface which scales well.
- **IC3 - JEDEC's HBM3E:** High-bandwidth interface for in-package memory, with pre-established IPs for easy development.
- **IC4 - BoW or UCIe:** Chiplet-specific protocols are used to ensure optimal operation on a per-chiplet basis.



# Photonic Interconnects

## Memory to Processing Unit Communication

- Works using wavelength division multiplexing along with an optical tunable splitter
- Requires :
  - 2.5D Packaging
  - Silicon photonic devices (modulator, filter and photodetector)
  - Waveguide
- Operation under vibrational conditions needs to be taken care of for automotive applications.



(a) off-resonance state



(b) on-resonance state

# Photonic Interconnects

**Offers high-bandwidth and high-reach 2.5D interconnection**

- Lower propagation delay (11.4 ps/mm) than electrical connections (131 ps/mm).
- Data can be sent to more compute tiles simultaneously.
- Two benefits:
  - A shorter bus (waveguide) that carries more data.
  - More cores reached in one hop, which reduces the number of clock cycles needed for cache transfers.
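The per-millimetre delays quoted above translate directly into link latency. The sketch below applies them to a 10 mm link; that length is an assumed example, not a dimension from the proposed package.

```python
PHOTONIC_PS_PER_MM = 11.4     # from the slide
ELECTRICAL_PS_PER_MM = 131.0  # from the slide


def propagation_delay_ps(length_mm, ps_per_mm):
    """Propagation delay of a link of the given length."""
    return length_mm * ps_per_mm


length_mm = 10.0  # assumed link length
photonic = propagation_delay_ps(length_mm, PHOTONIC_PS_PER_MM)
electrical = propagation_delay_ps(length_mm, ELECTRICAL_PS_PER_MM)
print(f"photonic: {photonic:.0f} ps, electrical: {electrical:.0f} ps, "
      f"ratio: {electrical / photonic:.1f}x")
```

This covers propagation only; modulator, detector and serialisation latencies would add to the photonic figure in a full link budget.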

# 04 Safety & Security

# Functional Safety

## ISO 26262 and ASILs

ISO 26262 is the international standard for the functional safety of electrical and electronic systems in road vehicles.

## Functional Safety Islands

The growing electronic content of automobiles requires dedicated on-board safety islands.



# Off-the-shelf IPs



Synopsys ARC EM22FS Safety Processor



Cast Inc. EMSA5-FS Functional Safety Embedded RISC-V Processor

# Hardware Security

## Lightweight Encryption - ASCON

- AEAD: AES-GCM
- Hash: SHA-256

## Off-the-shelf IPs

- Rambus ASCON-IP-41



ASCON-IP-41 Block Diagram

# Ethernet Security

## Why Ethernet Security?

- Man-in-the-middle Attacks
- Sensor data integrity
- Side-channel Attacks
- DoS Attacks



Synopsys MACsec Security Modules for Ethernet

# Die-to-die Security: PCIe Encryption

## Why PCIe Security?

- Side-channel Attacks
- Physical Attacks
- Channel Isolation
- Authenticity and Integrity



Synopsys DesignWare IP for PCI Express

# 05 Thermal Management

# The Transition

Problems encountered while transitioning from 2D to 2.5D and 3D:

- Heat dissipation
- Electromigration
- Stress
- Thermal expansion



# Traditional Method

Chiplets sit on the interposer, connected via microbumps. A lid encloses them and is thermally coupled through thermal interface materials (TIMs).

A heat sink sits on top, with the whole assembly mounted on a substrate and then on a PCB.



# Thermal Modelling Theory: Navier-Stokes Equations

Mass Conservation (1)

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho u) = 0$$

The Energy Equation for a Liquid (2)

$$\frac{\partial(\rho h)}{\partial t} + \nabla \cdot (\rho h u) = \nabla \cdot [(k + k_t) \nabla T] + S_h$$

$h$  Sensible enthalpy

$k_t$  Conductivity due to turbulent transport

$S_h$  Volumetric heat source

The Momentum Equation (3)

$$\frac{\partial(\rho u)}{\partial t} + \nabla \cdot (\rho u u) = -\nabla p + \nabla \cdot \tau + \rho g$$

The Energy Equation for a Solid (4)

$$\frac{\partial(\rho h)}{\partial t} = \nabla \cdot (k \nabla T) + S_h$$

# THERMAL MODELLING

Assumptions :

- The control volume is filled with static air.
- Gravity acts downward.
- The TIM layers and the microbumps are approximated to cuboidal packages.
- Radiation is taken into account.
- The cooling fan is an ADDA 8381HB AT1 with a rated power of 2.88 W.



| Object     | Material         | Thermal conductivity (W/(m·K)) |
|------------|------------------|--------------------------------|
| Chiplet    | GaAs             | 46                             |
| Interposer | Silicon          | 148                            |
| TIM        | SiO<sub>2</sub>  | 1.5                            |
| Lid        | Copper           | 400                            |
| Substrate  | Epoxy            | 0.3                            |

Thermal Conductivities of the various materials used
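A first-order feel for these conductivities comes from a one-dimensional series resistance model, where each layer contributes R = t / (k A). The sketch below uses the conductivities from the table, but the layer thicknesses and the 10 mm x 10 mm footprint are assumed values chosen only for illustration; they are not the dimensions used in the simulation.

```python
def conduction_resistance(thickness_m, conductivity_w_mk, area_m2):
    """1-D conduction resistance of a slab in K/W: R = t / (k * A)."""
    return thickness_m / (conductivity_w_mk * area_m2)


AREA_M2 = 10e-3 * 10e-3  # assumed 10 mm x 10 mm heat-flow cross-section

# (layer, assumed thickness in m, conductivity in W/(m*K) from the table)
LAYERS = [
    ("Chiplet (GaAs)", 0.5e-3, 46.0),
    ("TIM (SiO2)", 0.1e-3, 1.5),
    ("Lid (Copper)", 2.0e-3, 400.0),
]

resistances = {name: conduction_resistance(t, k, AREA_M2)
               for name, t, k in LAYERS}
total_resistance = sum(resistances.values())

for name, r in resistances.items():
    print(f"{name}: {r:.3f} K/W")
print(f"total: {total_resistance:.3f} K/W")
```

Even as the thinnest layer in this assumed stack, the TIM dominates the total resistance because of its low conductivity.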

# OBSERVATION AND RESULT

Can cool up to 80 W of heat at a peak temperature of 120 °C.

Peak temperature with the initial power outputs (0.33, 1.5, 7.5, 12 W): 33.689 °C.



# OBSERVATION AND RESULT

Fin shapes considered - plate, cylindrical, columnar, square and staggered(pin) fins.

Heat dissipation is proportional to surface area, so staggered fins perform best.

Taller fins dissipate heat better.

| Fin shape   | Peak temperature (°C) |
|-------------|-----------------------|
| plate       | 33.698   |
| square      | 29.89    |
| columnar    | 29.67    |
| cylindrical | 31.556   |
| staggered   | 27.2     |

Comparison of Different Fins



| Height (mm) | Temperature (°C) |
|-------------|------------------|
| 28          | 33.698           |
| 26          | 34.849           |
| 24          | 36.761           |
| 22          | 37.522           |
| 20          | 38.31            |
| 18          | 42               |
| 16          | 47               |



- Appropriate for powers of 50-60W
- Economically viable

# FIRST MODIFICATION

- Goal: minimize the thermal resistance caused by surface roughness and waviness, which leave voids and holes at the interface.
- Ceramics as TIM: isotropic, high thermal conductivity ceramic insulators.
- AlN (Aluminum Nitride)
  - Intermediate layers of the 3D stack
  - Used where mechanical strength is crucial; lower thermal conductivity
- hBN (Hexagonal Boron Nitride)
  - Reducing peak temperatures at hot spots
  - Higher thermal conductivity, softer and machinable
- Reduces hotspot temperatures by about 20 percent.
- HOPC/Np Cu  
(High-performance organic polymer composite)
- Carbon nanotubes



# SECOND MODIFICATION

- Phase-change material (PCM) instead of TIM
- Compared to a traditional package sample using TIM, this modification reduces the thermal resistance from 0.46 to 0.1 °C/W under a 3 W power input, for a thickness of 1.6 mm (about 0.06 in), with appreciable room for further improvement.
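The quoted resistances translate into a temperature rise across the interface through ΔT = R · P. The sketch below applies this to the 3 W case from the slide; the function name is illustrative.

```python
def temperature_rise_c(resistance_c_per_w, power_w):
    """Temperature rise across a thermal resistance: dT = R * P."""
    return resistance_c_per_w * power_w


POWER_W = 3.0
rise_tim = temperature_rise_c(0.46, POWER_W)   # traditional TIM
rise_pcm = temperature_rise_c(0.10, POWER_W)   # PCM
reduction = (0.46 - 0.10) / 0.46

print(f"TIM: {rise_tim:.2f} C, PCM: {rise_pcm:.2f} C, "
      f"resistance cut by {reduction:.0%}")
```

At 3 W the absolute difference is only about one degree, but the same resistances applied to a higher-power chiplet would scale the gap proportionally.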



# MICROFLUIDIC COOLING

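An analytical model based on an RC circuit for microchannels.

$$R_{th\_uch} = \frac{1}{\dot{m} \cdot c_p \cdot \left(1 - e^{-\frac{hA}{\dot{m} \cdot c_p}}\right)}$$

$$D_H = \frac{2 \cdot a \cdot b}{(a + b)}$$

$$h = \frac{k_f \cdot Nu}{D_H}$$

$$Nu = \frac{0.065 \cdot (D_H/L) \cdot Re \cdot Pr}{1 + 0.04 \cdot [(D_H/L) \cdot Re \cdot Pr]^{\frac{2}{3}}}$$

Here $\dot{m}$ is the coolant mass flow rate, $c_p$ its specific heat, $A$ the heat transfer area, $a$ and $b$ the sides of the channel cross-section, $L$ the channel length and $k_f$ the thermal conductivity of the coolant.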
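The relations above can be chained into a small calculator. This sketch reads the resistance formula as 1 / (m_dot · c_p · (1 − exp(−hA / (m_dot · c_p)))), with m_dot the coolant mass flow rate, and reads the Nusselt correlation with a single 0.04 term in the denominator; both readings are assumptions about the garbled originals. All numeric inputs in the example are hypothetical.

```python
import math


def hydraulic_diameter(a, b):
    """D_H = 2ab / (a + b) for a rectangular channel with sides a and b (m)."""
    return 2.0 * a * b / (a + b)


def nusselt(d_h, length, reynolds, prandtl):
    """Nu = 0.065 x / (1 + 0.04 x^(2/3)) with x = (D_H / L) Re Pr."""
    x = (d_h / length) * reynolds * prandtl
    return 0.065 * x / (1.0 + 0.04 * x ** (2.0 / 3.0))


def heat_transfer_coefficient(k_f, nu, d_h):
    """h = k_f Nu / D_H in W/(m^2 K)."""
    return k_f * nu / d_h


def microchannel_resistance(h, area, m_dot, c_p):
    """R_th = 1 / (m_dot c_p (1 - exp(-h A / (m_dot c_p)))) in K/W."""
    capacity_rate = m_dot * c_p
    return 1.0 / (capacity_rate * (1.0 - math.exp(-h * area / capacity_rate)))


# Hypothetical example: 100 um x 100 um channel, 10 mm long, water-like coolant.
d_h = hydraulic_diameter(100e-6, 100e-6)
nu = nusselt(d_h, length=10e-3, reynolds=400.0, prandtl=7.0)
h = heat_transfer_coefficient(k_f=0.6, nu=nu, d_h=d_h)
r_th = microchannel_resistance(h, area=4e-6, m_dot=1e-4, c_p=4180.0)
print(f"D_H = {d_h * 1e6:.0f} um, Nu = {nu:.2f}, h = {h:.0f} W/m2K, "
      f"R_th = {r_th:.2f} K/W")
```

As the flow rate drops, the exponential term saturates and the resistance approaches 1 / (m_dot · c_p), so it grows sharply at low flow.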



The coolant used was deionized water and the fan used was a Sanyo Denki 40 × 40 × 15 mm nonlinear axial fan.

The relationship between flow rate and temperature is non-linear; a drop below a critical flow rate results in a disproportionate increase in temperature.

- **Complexity:** Microfabrication techniques like photolithography and etching.
- **Leakage and clogging:** Microparticles or chemical reactions within the coolant.
- **Flow control:** Achieving a uniform flow
- **Material compatibility:** The coolant and microchannel materials must be compatible to avoid chemical reactions, corrosion, or degradation.
- **Pressure drop:** High flow rates in microchannels can lead to significant pressure drops, requiring powerful pumps and potentially affecting system efficiency.

JUNCTION TEMPERATURES OVER THE CHIPLETS AT THE GIVEN FLOW RATE FOR THE FIRST SCENARIO (OTL—OVER-THE-LIMIT: 300 °C+)

| Volumetric flow rate [CCM] | CPU junction temperature min – max [°C] | GPU junction temperature min – max [°C] | Memory junction temperature min – max [°C] |
|----------------------------|-----------------------------------------|-----------------------------------------|--------------------------------------------|
| 0                          | OTL                                     | OTL                                     | OTL                                        |
| 10                         | 107 – 150                               | 56.6 – 123                              | 42.5 – 64.2                                |
| 100                        | 49.2 – 60.1                             | 33.5 – 46.2                             | 24.6 – 30.6                                |



Fig. 5. Temperature distribution along the dies in the first scenario. (Red arrow indicates the direction of the fluid flow.)

# JET IMPINGEMENT COOLING

Microfluidic cooling (above \$100 per chip)

Jet impingement cooling (around \$50 per chip)

This technique directs high-velocity jets of liquid or gas straight at concentrated heat zones.

## Advantages

- Minimal space
- Precise
- Cost-effectiveness

## Disadvantages

- Noise
- Surface Erosion
- It may not be sufficient for applications with high overall heat generation.



# SPRAY COOLING

## Advantages

- Uniform and Gentle Heat Transfer
- Scalability and Adaptability
- Low-Noise Operation
- Cost-Effectiveness (30\$ per chip)

## Disadvantages

- Lower Heat Transfer Capacity
- Water Consumption
- Condensation Risk - Condensation on the chiplet surface can cause corrosion or electrical short circuits, so careful design and material selection are required.



(a) Spray chamber



(b) heating body



(c) test section