



睿思芯科

**synopsys®**  
Silicon to Software™

# Full Stack Software Hardware Co-design

## Pygmy RISC-V AI SoC on ZeBu

Cissy Yuan (RIVAI), Xing Wang (SYNOPSYS)

# AIoT – Fast Growing, Fragmented Market



Customized architectures achieve much better compute efficiency

# Fast Evolution of AI Calls for SW HW Co-design



Computing Centric TPU  
Customized ASIC for TensorFlow  
15-30x faster than GPGPU or general-purpose CPU

Memory Centric  
Customized ASIC  
Computing close to memory  
3-10x faster

Save wasted computing/memory  
SW HW Co-design  
3-7x energy efficiency

SW HW co-design can bring the best cost-performance

[Cissy.Yuan@RIVAI.AI](mailto:Cissy.Yuan@RIVAI.AI)

# SW HW Co-design on Pygmy, Heterogeneous Multicore AI SoC



Vector AI engines have a customized RISC-V ISA  
64-bit multicores boot full Linux with SMP



Compile application to customized ISA.  
AI applications directly program vector engines.

**RISC-V extended ISA enabled full stack SW HW co-design**
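
As a rough illustration of what a customized RISC-V ISA can look like at the encoding level, the sketch below packs an R-type instruction into the `custom-0` major opcode (`0001011`), which the RISC-V base specification reserves for non-standard extensions. The function name, the `funct7`/`funct3` values, and the register choices are hypothetical; the slides do not give Pygmy's actual vector instruction encodings.

```python
# Sketch only: Pygmy's real instruction encodings are not given in the slides.
CUSTOM_0_OPCODE = 0b0001011  # RISC-V "custom-0" major opcode for non-standard extensions


def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int,
                  opcode: int = CUSTOM_0_OPCODE) -> int:
    """Pack the standard RISC-V R-type fields into one 32-bit instruction word."""
    fields = {
        "funct7": (funct7, 7), "rs2": (rs2, 5), "rs1": (rs1, 5),
        "funct3": (funct3, 3), "rd": (rd, 5), "opcode": (opcode, 7),
    }
    # Reject values that would overflow their bit field.
    for name, (value, width) in fields.items():
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# Hypothetical extension instruction: funct7=1, funct3=0, rd=x10, rs1=x11, rs2=x12.
word = encode_r_type(funct7=1, rs2=12, rs1=11, funct3=0, rd=10)
print(f"0x{word:08x}")
```

A compiler backend targeting such an extension would emit words of this shape for the vector engines, while the standard opcodes keep working on the Linux-capable cores.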


# Co-design Requires Challenging Full Stack Simulation & Debug

Enormous flow effort for functional, performance, and power simulation & debug

## ❖ Long Runtime

Booting the Linux kernel in block-level simulation takes **24 hours**, and is **20 times slower** at chip level.

**Co-simulation with a precise CPU behavioral model is 1.5 times slower.**

**Hours to days** to manually port ASIC RTL to FPGA.

**2-6 hours** just to generate a bit file.
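
To put the boot-time figures in perspective, here is a back-of-envelope sketch. It assumes the "20 times slower" factor applies directly to the 24-hour block-level boot; the function names are illustrative only.

```python
# Back-of-envelope estimate from the figures on this slide.
BLOCK_LEVEL_BOOT_HOURS = 24   # Linux kernel boot at block level (from the slide)
CHIP_LEVEL_SLOWDOWN = 20      # chip level is 20x slower (from the slide)


def chip_level_boot_hours(block_hours: int = BLOCK_LEVEL_BOOT_HOURS,
                          slowdown: int = CHIP_LEVEL_SLOWDOWN) -> int:
    """Assumed model: chip-level boot time = block-level boot time x slowdown."""
    return block_hours * slowdown


def hours_to_days(hours: int) -> float:
    return hours / 24


chip_hours = chip_level_boot_hours()
print(f"chip-level boot: {chip_hours} hours (~{hours_to_days(chip_hours):.0f} days)")
```

Under that assumption a single chip-level Linux boot takes about 480 hours, or roughly 20 days, which is why a simulation-only flow does not allow practical iteration.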

## ❖ Hard to debug

Difficult to dump waveforms for the useful time window.

Hard to reproduce bugs.

Visibility is very low.

ZeBu is a powerful weapon to overcome these challenges




# ZeBu Supports All High Value Emulation Use Cases

Differentiated ZeBu Technologies drive emulation efficiency

## ZeBu Use Cases

ZeBu Server 4

- OS Boot and Driver/FW for IP and systems
- Full Chip RTL Verification
- Performance Validation
- Software-driven Power Analysis
- Power Management Validation (UPF)
- System Validation using ICE or Virtual Integration
- Simulation Acceleration
- Gate-Level Emulation

## ZeBu Technologies

- Performance
- Hybrid / Virtual Host
- Streaming
- Unified, Parallel and Incremental Compile
- Virtual Integrations
- Triggers
- Transactors and Models
- Speed Adaptors
- Replay

# ZeBu Pygmy Emulation Environment



# Pygmy RISC-V AI Chip on ZeBu Demonstration

# Pygmy Linux Boot Debug Case Study



# Pygmy Performance Debug Case Study

With ZEMI3, performance issues can be debugged more efficiently at a higher level



# Analyze Power using RTL: Hours instead of Weeks

*RTL average power for millions of cycles expands beyond legacy emulation flows for 10,000s of cycles*



- Dump waveform for all power-essential signals (registers): 20 kHz (18 min)\*
- Waveform reconstruction for all signals and SAIF generation: 1 hr x 150 grid jobs\*
- RTL power estimation for average power: average power for millions of cycles

\*1 Manhattan frame, 2B gates design on ZS4
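
The footnoted figures can be checked with simple arithmetic. The sketch below assumes the emulator runs continuously at 20 kHz for the full 18 minutes and that each of the 150 grid jobs takes the full hour; the function names are illustrative only.

```python
# Rough check of the footnoted figures (1 Manhattan frame, 2B-gate design on ZS4).
def emulated_cycles(clock_hz: int, minutes: int) -> int:
    """Cycles covered, assuming a constant emulation clock for the whole run."""
    return clock_hz * minutes * 60


def grid_job_hours(jobs: int, hours_per_job: int) -> int:
    """Total compute spent on waveform reconstruction across all grid jobs."""
    return jobs * hours_per_job


cycles = emulated_cycles(clock_hz=20_000, minutes=18)
compute = grid_job_hours(jobs=150, hours_per_job=1)
print(f"{cycles:,} emulated cycles, {compute} job-hours of reconstruction")
```

Under these assumptions the run covers about 21.6 million cycles, consistent with the "millions of cycles" claim, while the reconstruction step costs about 150 job-hours but only about one hour of wall-clock time when the jobs run in parallel.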

# Conclusions

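- SW HW co-design brings the best cost-performance in the fast-growing AIoT market.
- ZeBu delivered the highest efficiency for full-stack functional verification, performance debug, and power analysis.
- The RISC-V extended ISA enabled a full-stack SW HW co-design flow.
- SW HW co-development achieves the best cost-performance, ZeBu helps deliver the fastest full-stack iteration, and all of this is made possible by RISC-V.

Sharpening the axe does not delay the woodcutting.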



# Thank You

We are hiring ☺ [info@RIVAI.AI](mailto:info@RIVAI.AI)