

# Efficient Verification of Multi-Die Systems using Multi-Die Co-Simulation Framework

Manvendra Singh, Lawish Kumar Deshmukh, Gaurav Jain

Renesas Electronics India Pvt. Ltd.

Noida, Uttar Pradesh

{manvendra.singh.yk|lawish.deshmukh.sx|Gaurav.jain.ub}@renesas.com

**Abstract-** The increasing adoption of multi-die designs in High-Performance Computing (HPC) poses significant verification challenges. These systems combine multiple dies within a single package to function as a unified System-on-Chip (SoC), offering scalability, high bandwidth, and energy efficiency. Verifying their correctness and functionality, however, is a daunting task. This paper presents an approach based on a Multi-Die Co-Simulation Framework, a parallel computing technique in which each die is simulated by its own executable and the simulations are connected to verify the complete system. Our work shows that this technique reduces simulation time, improves verification efficiency, and scales to large designs, making it well suited to next-generation HPC systems.

## INTRODUCTION

The evolution of high-performance computing has ushered in an era where multi-die, chiplet-based designs are supplanting traditional monolithic dies. Monolithic dies are limited in scalability and in how quickly they can adopt new process technologies. Chiplet technology, in contrast, allows engineers to integrate distinct silicon blocks (dies) in one package, thereby optimizing cost, performance, and manufacturing yield. This approach sustains the progression associated with Moore's Law primarily through architectural innovation rather than through increased transistor density alone.

Although chiplet-based architectures offer scalability, efficient bandwidth utilization, and lower power consumption, they also pose considerable challenges for system design verification. This paper presents a parallel computing strategy that uses distributed simulation technology to tackle the complexity of multi-die verification. We discuss the methodology, the challenges encountered, and the measurable outcomes of this approach, with emphasis on its effect on verification efficiency and coverage.



## PROBLEM STATEMENT

The proliferation of multi-die packages in HPC brings formidable verification challenges. A typical multi-die SoC comprises several dies (e.g., DIE1, DIE2, DIE3) interconnected through standard interfaces such as PCIe (TX/RX). Ensuring functional correctness in such systems requires robust verification strategies that can efficiently simulate complex inter-die signaling, synchronization, and system-level use cases. In practice, the verification flow must:

- Sign off individual dies thoroughly before integration in the package
- Focus on inter-die signal integration and communication
- Address complex functionalities spanning multiple dies
- Simulate interactions between dies in the system environment
- Ensure system-level use cases are effectively verified



## MULTI-DIE DESIGN VERIFICATION CHALLENGES

The verification of multi-die designs is hindered by several challenges, including:

- Complex testbench development: Creating a combined testbench for multiple dies is time-consuming and adds significant overhead to the verification cycle.
- High tool memory requirements: As the size of the design increases, the memory requirements for verification tools become prohibitively high.
- Simulation performance: Simulating multiple dies together results in a multifold increase in simulation time, making it difficult to verify the entire system.
- Lack of reusability and flexibility: The verification process is not easily reusable or flexible across different SoCs, leading to inefficiencies and increased development time.
- Communication and synchronization between chiplets: High-speed inter-die communication (PCIe) and the synchronization of data transfers between chiplets are difficult to model and verify.

## MONOLITHIC SoC VS. MULTI-DIE SIMULATION

Monolithic SoC simulation typically involves a single executable and a unified simulation environment. In contrast, a multi-die SoC simulation requires multiple concurrent executables, each managing a separate die and associated testbench, coordinated through inter-die connection configurations.
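As a rough illustration of what an inter-die connection configuration might contain, the sketch below models dies and links as plain Python data and checks that every link endpoint refers to a declared die. All names here (`Die`, `Link`, `validate`, the executable names, and the port labels) are hypothetical and do not reflect any simulator's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    name: str          # e.g. "DIE1"
    executable: str    # hypothetical per-die simulation binary


@dataclass(frozen=True)
class Link:
    src: str           # "die.port" that drives the connection
    dst: str           # "die.port" that receives it


def validate(dies, links):
    """Return a list of error strings; an empty list means the config is consistent."""
    known = {d.name for d in dies}
    errors = []
    if len(known) != len(dies):
        errors.append("duplicate die name")
    for link in links:
        for end in (link.src, link.dst):
            die_name = end.split(".", 1)[0]
            if die_name not in known:
                errors.append(f"unknown die in endpoint {end!r}")
    return errors


# Hypothetical two-die system with one PCIe connection in each direction.
dies = [Die("DIE1", "die1_sim"), Die("DIE2", "die2_sim")]
links = [
    Link("DIE1.pcie_tx", "DIE2.pcie_rx"),
    Link("DIE2.pcie_tx", "DIE1.pcie_rx"),
]
config_errors = validate(dies, links)
```

Checking the connection description before launching the executables is a design choice of this sketch: a misnamed endpoint is cheaper to catch up front than after several simulations have started.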

Figure: Monolithic SoC vs. multi-die simulation



We explored two types of die-to-die simulation environments: homogeneous and heterogeneous. A homogeneous distributed simulation environment involves dies that share the same design database, while a heterogeneous distributed simulation environment involves dies with distinct design databases.
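The distinction can be expressed as a simple rule. The sketch below is illustrative only; the function name and the database labels (`soc_a`, `soc_b`) are assumptions introduced for the example.

```python
def classify_environment(die_to_database):
    """Classify a die-to-die setup by how many distinct design databases it uses."""
    if not die_to_database:
        raise ValueError("at least one die is required")
    databases = set(die_to_database.values())
    return "homogeneous" if len(databases) == 1 else "heterogeneous"


# Hypothetical setups: two instances of one design vs. two different designs.
same_db = classify_environment({"DIE1": "soc_a", "DIE2": "soc_a"})
mixed_db = classify_environment({"DIE1": "soc_a", "DIE2": "soc_b"})
```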

Figure: Types of die-to-die simulation



## SOLUTION

The Multi-Die Co-Simulation Framework addresses the challenges that conventional methods face. Our approach uses a native framework that simulates each die in parallel, with multiple participating executables running simultaneously for both homogeneous and heterogeneous dies. The framework connects the individual die simulations so that multi-die system use cases can be verified effectively. By adopting this approach, we can:

- Reduce simulation time: By simulating each die in parallel, we can significantly reduce the overall simulation time.
- Improve verification efficiency: Our approach enables the reuse of verification IP and testbenches across different SoCs, reducing development time and increasing verification efficiency.
- Enhance scalability: The Multi-Die Co-Simulation Framework can handle large, complex designs, making it an ideal solution for next-generation HPC systems.
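The simulation-time argument can be made concrete with a back-of-the-envelope model. The sketch below assumes that a combined simulation costs roughly the sum of the per-die run times, and that a parallel run is bounded by the slowest die plus a fixed overhead per synchronization point. Both are simplifying assumptions, and every number is hypothetical rather than a measurement from this work.

```python
def monolithic_time(die_times):
    """Single executable: assume the cost is roughly the sum of the per-die costs."""
    return sum(die_times)


def cosim_time(die_times, sync_points, cost_per_sync):
    """Parallel executables: the slowest die dominates, plus synchronization overhead."""
    return max(die_times) + sync_points * cost_per_sync


# Hypothetical per-die run times in minutes.
die_times = [60.0, 45.0, 30.0]
serial = monolithic_time(die_times)
parallel = cosim_time(die_times, sync_points=1000, cost_per_sync=0.01)
speedup = serial / parallel
```

Under these assumptions, the benefit shrinks as the synchronization overhead grows relative to the slowest die's run time.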



We have applied the Multi-Die Co-Simulation Framework to verify critical system scenarios in HPC systems, including PCIe communication, inter-processor communication across dies, Functional Safety (FuSa) error handling, and debug of connected chiplets. The industrial relevance of this technology is significant: it enables efficient verification of next-generation HPC systems, reducing development time and increasing overall system reliability.

## RESULTS

With the presented co-simulation framework, we have verified critical multi-die system scenarios. A few examples:

- PCIe communication: Chiplets are connected through PCIe, and these connections are established in the co-simulation framework. Data transfer over the PCIe link from one chiplet to another is verified.
- Inter-processor communication: A core in one chiplet exchanges messages with a core in another chiplet.
- FuSa error handling: In a multi-die system, top-level safety can be managed by one chiplet, which must then handle fatal errors reported by the remaining chiplets. This handling has been tested.
- Debug: Chiplets in multi-die systems are connected through a debugger, which must be able to access the debug components correctly.

These scenarios are representative applications of co-simulation; its scope is not limited to them.
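As an illustration of the kind of check involved in the PCIe data-transfer scenario, the sketch below shows a minimal in-order scoreboard that compares what one die sent with what the other received. The class and method names are hypothetical, and a real checker would work on protocol-level transactions rather than raw byte strings.

```python
from collections import deque


class LinkScoreboard:
    """In-order checker: each payload sent by one die must arrive unchanged at the other."""

    def __init__(self):
        self.expected = deque()
        self.mismatches = []

    def sent(self, payload):
        # Called from the transmitting die's monitor.
        self.expected.append(payload)

    def received(self, payload):
        # Called from the receiving die's monitor.
        if not self.expected:
            self.mismatches.append(("unexpected", payload))
            return
        want = self.expected.popleft()
        if want != payload:
            self.mismatches.append((want, payload))

    def passed(self):
        # Pass only if nothing mismatched and nothing is still in flight.
        return not self.mismatches and not self.expected


# Clean transfer: every payload arrives unchanged and in order.
sb = LinkScoreboard()
for word in (b"\x01\x02", b"\x03\x04"):
    sb.sent(word)
    sb.received(word)
clean_run = sb.passed()

# Corrupted transfer: the received payload differs from the one sent.
bad = LinkScoreboard()
bad.sent(b"\xaa")
bad.received(b"\xab")
corrupted_run = bad.passed()
```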

Using the presented co-simulation framework, we could:

- shift left the bring-up of multi-die system verification by 4 to 6 weeks
- significantly reduce simulation run time, resulting in faster turnaround
- reuse the existing chiplet testbenches
- save a large amount of disk space that conventional methods would otherwise require
- establish preliminary performance metrics for die-to-die communication



## LEARNINGS AND ROAD AHEAD

- All participating executables must have access to the same shared filesystem
- An optimized network infrastructure helps improve multi-die simulation performance
- The sync-interval mechanism must be robust to achieve good performance
- UCIe PHY synchronization needs attention when the end-to-end design is used
- A local sync mechanism was required to ensure truly parallel execution
- Regression enablement: optimal use of LSF resources
- Scoreboards and checkers spanning multi-die simulations
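The role of the sync interval can be illustrated with a toy lockstep model. This is a minimal sketch under stated assumptions, not the framework's actual mechanism: each die is represented by a Python thread, simulation time is an integer, and `run_lockstep` and its parameters are hypothetical names. A `threading.Barrier` stands in for the synchronization point at which no die may run ahead of the others.

```python
import threading


def run_lockstep(die_names, end_time, sync_interval):
    """Advance all dies in parallel, meeting at a barrier every sync_interval time units."""
    if sync_interval <= 0:
        raise ValueError("sync_interval must be positive")
    barrier = threading.Barrier(len(die_names))
    sync_log = {name: [] for name in die_names}

    def die_process(name):
        local_time = 0
        while local_time < end_time:
            # Each die simulates independently up to the next sync point.
            local_time = min(local_time + sync_interval, end_time)
            # Wait here until every die has reached the same sync point.
            barrier.wait()
            sync_log[name].append(local_time)

    threads = [threading.Thread(target=die_process, args=(n,)) for n in die_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sync_log


log = run_lockstep(["DIE1", "DIE2"], end_time=100, sync_interval=40)
```

In this model a smaller interval means more barrier crossings for the same simulated time, while a larger interval means fewer crossings but coarser points at which the dies can exchange data.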

## CONCLUSION

In conclusion, this paper shows that the Multi-Die Co-Simulation Framework is effective for verifying complex multi-die systems. Our results show reduced simulation time, improved verification efficiency, and better scalability. The industrial relevance of this technology is significant, and we believe it has the potential to change how complex multi-die systems are verified in HPC and other fields.

This work demonstrates multi-die SoC verification by repurposing existing infrastructure, enabling comprehensive system verification while preserving testbench reusability.
