

# CircuitNet: an open-source dataset for machine learning applications in electronic design automation (EDA)

Zhuomin CHAI<sup>1,2</sup>, Yuxiang ZHAO<sup>1</sup>, Yibo LIN<sup>1\*</sup>, Wei LIU<sup>2</sup>,  
Runsheng WANG<sup>1</sup> & Ru HUANG<sup>1</sup>

<sup>1</sup>School of Integrated Circuits, Peking University, Beijing 100871, China;

<sup>2</sup>School of Physics and Technology, Wuhan University, Wuhan 430072, China

Received 28 July 2022/Revised 20 August 2022/Accepted 30 August 2022/Published online 13 September 2022

**Citation** Chai Z M, Zhao Y X, Lin Y B, et al. CircuitNet: an open-source dataset for machine learning applications in electronic design automation (EDA). Sci China Inf Sci, 2022, 65(12): 227401, <https://doi.org/10.1007/s11432-022-3571-8>

The electronic design automation (EDA) community has been actively exploring machine learning (ML) for very large-scale integration (VLSI) computer-aided design (CAD). Many studies have explored learning-based techniques for cross-stage prediction in the design flow to achieve faster design convergence. Although building ML models usually requires a large amount of data, most studies can validate their methods only on small internal datasets because large public datasets are lacking. In this letter, we present **CircuitNet**, the first open-source dataset for ML tasks in VLSI CAD.

VLSI circuit design can be divided into front-end design and back-end design. The front-end design implements the functionality of the circuit, and the back-end design then transforms the circuit into manufacturable geometries, i.e., layouts. In advanced technology nodes, back-end design is time-consuming because optimization requires information to be fed forward and backward iteratively between design stages. To accelerate this process, cross-stage prediction was introduced to replace the long feedback loops between design stages with local loops within each stage. As a promising method for fast and accurate cross-stage prediction, ML has been explored for various early-stage prediction tasks in the design flow, including routability and IR drop prediction [1].

Despite active research on ML for CAD, challenges remain in this field. There is almost no public dataset dedicated to ML for CAD because of license restrictions and the domain-specific expertise required for data generation. Meanwhile, existing datasets obtained from CAD contests are often incomplete and not designed for ML applications [2]. The lack of public datasets makes it difficult to benchmark and reproduce previous work, narrows the research scope through limited data access, and sets a high bar for new researchers, all of which slows further advancement in this field. To this end, we present the first open-source dataset, **CircuitNet**, which provides holistic support for cross-stage prediction tasks in back-end design with diverse samples.

*Dataset overview.* The statistics of the dataset are summarized in Figure 1. We followed two steps to generate the dataset: data collection and feature extraction.

Data collection consisted of two stages: logic synthesis and physical design. In logic synthesis, the RISC-V designs were mapped from register transfer level (RTL) descriptions to gate-level netlists in a 28 nm technology node with Synopsys Design Compiler. Physical design then transformed the netlists into layouts with Cadence Innovus. We improved the diversity of the dataset by introducing different settings in logic synthesis and physical design, as shown in Figure 1(a). These settings produced variations in utilization, routing resources, macro locations, and other factors, reflecting diverse situations in the back-end design flow. Each of the six designs was run under 2160 settings, giving 12960 runs of the back-end design flow in total. After excluding the failed runs, we obtained 10242 layouts.
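The run counts above can be checked with a short Python sketch. It assumes, for illustration, that the settings in Figure 1(a) form a full cross product of the listed options (3 synthesis frequencies, 3 macro counts, 5 utilizations, 3 macro placements, 8 power mesh settings, and 2 filler-insertion points); this assumption reproduces the reported 2160 settings per design. All variable names are hypothetical and are not taken from the CircuitNet scripts.

```python
from itertools import product

# Variation axes read from Figure 1(a). Treating them as a full factorial
# grid is an assumption, but it reproduces the reported 2160 settings.
FREQUENCIES_MHZ = [50, 200, 500]          # synthesis frequency
MACRO_COUNT_OPTIONS = 3                   # 3/4/5 ("-a") or 13/14/15 ("-b") macros
UTILIZATIONS_PCT = [70, 75, 80, 85, 90]
MACRO_PLACEMENTS = 3
POWER_MESH_SETTINGS = 8
FILLER_INSERTION = ["after placement", "after routing"]
DESIGNS = ["RISCY-a", "RISCY-FPU-a", "zero-riscy-a",
           "RISCY-b", "RISCY-FPU-b", "zero-riscy-b"]


def enumerate_settings():
    """Yield one tuple per back-end configuration of a single design."""
    return product(
        FREQUENCIES_MHZ,
        range(MACRO_COUNT_OPTIONS),
        UTILIZATIONS_PCT,
        range(MACRO_PLACEMENTS),
        range(POWER_MESH_SETTINGS),
        FILLER_INSERTION,
    )


settings_per_design = sum(1 for _ in enumerate_settings())
total_runs = settings_per_design * len(DESIGNS)
successful_layouts = 10242                # reported count after excluding failed runs
success_rate = successful_layouts / total_runs

print(settings_per_design, total_runs, f"{success_rate:.1%}")  # 2160 12960 79.0%
```

Under this reading, roughly 79% of the back-end runs produced a usable layout.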

In feature extraction, features were extracted at various design stages to support different cross-stage prediction tasks, as shown in Figure 1(b). We included both graph-like features, i.e., gate-level netlists, and image-like features, i.e., two-dimensional feature maps extracted from the physical layouts. Design information can naturally be represented as image-like data by dividing a layout into tiles and treating each tile as a pixel. These features are widely adopted in state-of-the-art routability and IR drop prediction models [3–5].
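As an illustration of the tile-based representation, the following Python sketch rasterizes placed cell rectangles into a per-tile cell density map with NumPy. The function `cell_density_map`, the choice of cell density as the example feature, and the tile size are hypothetical; the actual features and their resolution are those described in Figure 1(b) and the CircuitNet user guide.

```python
import numpy as np


def cell_density_map(cells, die_width, die_height, tile_size):
    """Rasterize cell rectangles into a 2-D map of per-tile area coverage.

    cells: iterable of (x_lo, y_lo, x_hi, y_hi) in the same units as the die.
    Entry [r, c] is the fraction of tile (r, c) covered by cells; overlapping
    cells are summed, so values can exceed 1.
    """
    cols = int(np.ceil(die_width / tile_size))
    rows = int(np.ceil(die_height / tile_size))
    density = np.zeros((rows, cols))
    for x_lo, y_lo, x_hi, y_hi in cells:
        # Range of tile columns/rows that the rectangle can touch.
        c0, c1 = int(x_lo // tile_size), min(int(np.ceil(x_hi / tile_size)), cols)
        r0, r1 = int(y_lo // tile_size), min(int(np.ceil(y_hi / tile_size)), rows)
        for r in range(r0, r1):
            for c in range(c0, c1):
                # Overlap of the cell with tile (r, c) along x and y.
                ox = min(x_hi, (c + 1) * tile_size) - max(x_lo, c * tile_size)
                oy = min(y_hi, (r + 1) * tile_size) - max(y_lo, r * tile_size)
                if ox > 0 and oy > 0:
                    density[r, c] += ox * oy
    # Normalize covered area by tile area so each pixel is a fraction.
    return density / tile_size**2


demo = cell_density_map([(0, 0, 2, 2), (1, 1, 3, 3)],
                        die_width=4, die_height=4, tile_size=2)
print(demo)
```

In this toy example, tile (0, 0) reaches 1.25 because the two rectangles overlap there; a real flow would rasterize legalized, non-overlapping placements.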

*Dataset evaluation.* To evaluate the effectiveness of CircuitNet, we conducted experiments on three prediction tasks: congestion, design rule check (DRC) violations, and IR drop. Each experiment adopted a method from a recent study [3–5] and evaluated it on CircuitNet with the same metrics as the original study. These methods use image-like features to train generative models, such as fully convolutional networks (FCNs) and U-Net, by formulating each prediction task as an image-to-image translation task. A detailed manual on the setup of these experiments is available on our webpage. Here, we briefly summarize the results.

| Design       | Netlist: #Cells | Netlist: #Nets | Netlist: cell area ( $\mu\text{m}^2$ ) | Synthesis variation: #Macros | Synthesis variation: frequency (MHz) |
|--------------|-----------------|----------------|----------------------------------------|------------------------------|--------------------------------------|
| RISCY-a      | 44836           | 80287          | 65739                                  | 3/4/5                        | 50/200/500                           |
| RISCY-FPU-a  | 61677           | 106429         | 75985                                  | 3/4/5                        | 50/200/500                           |
| zero-riscy-a | 35017           | 67472          | 58631                                  | 3/4/5                        | 50/200/500                           |
| RISCY-b      | 30207           | 58452          | 69779                                  | 13/14/15                     | 50/200/500                           |
| RISCY-FPU-b  | 47130           | 84676          | 80030                                  | 13/14/15                     | 50/200/500                           |
| zero-riscy-b | 20350           | 45599          | 62648                                  | 13/14/15                     | 50/200/500                           |

| Physical design variation | Values                        |
|---------------------------|-------------------------------|
| Utilization (%)           | 70/75/80/85/90                |
| #Macro placements         | 3                             |
| #Power mesh settings      | 8                             |
| Filler insertion          | After placement/after routing |

(a)

(b)

**Figure 1** (Color online) (a) Statistics of designs and variations introduced during data collection; (b) available features, their extraction stages, and prediction tasks in the experiments.

\* Corresponding author (email: [yibolin@pku.edu.cn](mailto:yibolin@pku.edu.cn))

First, for congestion prediction, we used the normalized root-mean-square error (NRMSE) and the structural similarity index measure (SSIM) as metrics of pixel-level accuracy. An FCN-based method [3] achieved an NRMSE of 0.040 and an SSIM of 0.80. Second, for DRC violation prediction, we used the area under the receiver operating characteristic (ROC) curve and the area under the precision-recall (PR) curve as metrics suited to imbalanced learning; an FCN-based method [4] achieved 0.95 and 0.63, respectively. Finally, for IR drop prediction, we evaluated the same two metrics; a U-Net-based method [5] achieved 0.94 and 0.83, respectively. Overall, our results are broadly consistent with those reported in the original publications and demonstrate the effectiveness of CircuitNet.
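For readers unfamiliar with these metrics, the sketch below computes NRMSE, the ROC AUC (via the Mann-Whitney rank statistic), and average precision, a common step-wise estimate of the area under the PR curve, on flattened per-tile values. Normalizing the RMSE by the value range of the ground truth and using average precision for the PR-curve area are assumptions; the original studies may use other conventions. SSIM is omitted for brevity, and binary hotspot labels for DRC violations or IR drop are assumed to be given.

```python
import numpy as np


def nrmse(pred, target):
    """RMSE normalized by the ground truth's value range (assumed convention)."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    value_range = target.max() - target.min()
    return rmse / value_range if value_range > 0 else rmse


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic; tied scores share average ranks."""
    scores = np.asarray(scores, float).ravel()
    labels = np.asarray(labels, bool).ravel()
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank of a tie group
        i = j + 1
    n_pos, n_neg = labels.sum(), (~labels).sum()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def average_precision(scores, labels):
    """Step-wise PR-curve area; requires at least one positive, ignores ties."""
    scores = np.asarray(scores, float).ravel()
    labels = np.asarray(labels, bool).ravel()
    hits = labels[np.argsort(-scores, kind="mergesort")]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision observed at each true positive.
    return precision[hits].sum() / hits.sum()


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels), round(average_precision(scores, labels), 4))  # 0.75 0.8333
print(nrmse([[0, 1], [1, 1]], [[0, 1], [1, 0]]))  # 0.5
```

Because hotspot tiles are rare, the PR-curve area is usually much lower than the ROC AUC for the same predictor, which matches the gap between the two DRC numbers above.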

*Usage.* We separated the features shown in Figure 1(b) and stored them in different directories to enable custom applications. As references, we also provided the scripts used in the above experiments for preprocessing and combining different features into training and testing data.
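A minimal sketch of how separately stored features might be combined into an image-to-image training pair, assuming a hypothetical layout `root/<feature_name>/<sample_id>.npy` with one 2-D NumPy array per file. The function `load_sample`, the directory names, and the sample ID are placeholders rather than the actual CircuitNet layout; the provided scripts remain the reference.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_sample(root, sample_id, feature_names, label_name):
    """Stack per-feature maps into a (C, H, W) input and a (1, H, W) label.

    Assumes a hypothetical layout root/<feature_name>/<sample_id>.npy in which
    every file holds one 2-D map of the same shape.
    """
    root = Path(root)
    channels = []
    for name in feature_names:
        fmap = np.load(root / name / f"{sample_id}.npy").astype(np.float32)
        lo, hi = fmap.min(), fmap.max()
        # Min-max normalize each channel so features in different units are comparable.
        channels.append((fmap - lo) / (hi - lo) if hi > lo else np.zeros_like(fmap))
    x = np.stack(channels, axis=0)
    y = np.load(root / label_name / f"{sample_id}.npy").astype(np.float32)[None]
    return x, y


# Usage with synthetic data in a temporary directory (placeholder names).
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    for name in ["cell_density", "macro_region", "congestion"]:
        (Path(tmp) / name).mkdir()
        np.save(Path(tmp) / name / "sample_0.npy", rng.random((4, 4)))
    x, y = load_sample(tmp, "sample_0", ["cell_density", "macro_region"], "congestion")

print(x.shape, y.shape)  # (2, 4, 4) (1, 4, 4)
```

Per-channel min-max normalization is only one simple choice; the resulting (C, H, W) input and (1, H, W) label match the image-to-image formulation used by FCN- and U-Net-style models.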

*Access methods.* The user guide and the download link for CircuitNet can be accessed from <https://circuitnet.github.io>.

## References

1. Huang G, Hu J, He Y, et al. Machine learning for electronic design automation: a survey. *ACM Trans Des Autom Electron Syst*, 2021, 26: 1–46
2. Bustany I S, Chinnery D, Shinnerl J R, et al. ISPD 2015 benchmarks with fence regions and routing blockages for detailed-routing-driven placement. In: Proceedings of the 2015 Symposium on International Symposium on Physical Design, 2015. 157–164
3. Liu S T, Sun Q, Liao P Y, et al. Global placement with deep learning-enabled explicit routability optimization. In: Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021. 1821–1824
4. Xie Z Y, Huang Y-H, Fang G-Q, et al. RouteNet: routability prediction for mixed-size designs using convolutional neural network. In: Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2018. 1–8
5. Chhabria V A, Zhang Y Q, Ren H X, et al. MAVIREC: ML-aided vectored IR-drop estimation and classification. In: Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021. 1825–1828