

# HeaRT: A Hierarchical Circuit Reasoning Tree-Based Agentic Framework for AMS Design Optimization

Souradip Poddar<sup>1</sup>, Chia-Tung Ho<sup>2</sup>, Ziming Wei<sup>1</sup>, Weidong Cao<sup>3</sup>, Haoxing Ren<sup>2</sup>, David Z. Pan<sup>1</sup>

<sup>1</sup>ECE Department, The University of Texas at Austin    <sup>2</sup>NVIDIA Corporation    <sup>3</sup>The George Washington University  
{souradippddr1@, dpan@ece.}utexas.edu

## ABSTRACT

Conventional AI-driven AMS design automation algorithms remain constrained by their reliance on high-quality datasets to capture underlying circuit behavior, their poor transferability across architectures, and their lack of adaptive mechanisms. This work proposes HeaRT, a foundational reasoning engine for automation loops and a first step toward intelligent, adaptive, human-style design optimization. HeaRT consistently achieves reasoning accuracy >97% and Pass@1 >98% across our 40-circuit benchmark repository, even as circuit complexity increases, while operating at <0.5× the real-time token budget of SOTA baselines. Our experiments show that HeaRT yields $\gtrsim 3\times$ faster convergence in both sizing and topology design adaptation tasks across diverse optimization approaches, while preserving prior design intent.

## 1 INTRODUCTION

Analog and mixed-signal (AMS) circuit design remains challenging to automate due to its fully custom flows, intricate trade-offs in deep sub-micron technologies, and the high cost of re-optimization when specifications change. Conventional Bayesian Optimization (BO) methods [1–3] offer strong sample efficiency but fail to scale effectively in complex, high-dimensional design spaces. Recent learning-based approaches, particularly reinforcement learning [4–13], demonstrate improved scalability for larger circuits, yet suffer from poor sample efficiency and prohibitive simulation costs. Moreover, their reliance on manually encoded design knowledge for circuit partitioning limits autonomy and scalability. Being purely data-driven, these models exhibit an inherent *black-box* nature [14], hindering their ability to capture circuit intuition or physical causality. Consequently, they often fail to generalize across architectures or incremental design updates, while lacking explainability and eroding designer trust in result quality (Fig. 1(a)).

Most recently, Large Language Model (LLM)-based methods [14–24] have shown great potential for advancing AMS design automation. By leveraging their cognitive reasoning and agentic capabilities, these models can emulate key aspects of human design workflows, offering a promising pathway toward more intelligent and autonomous analog design systems. Yet vanilla LLM reasoning for AMS design remains largely opaque and often inconsistent, undermining both its credibility and its practical effectiveness (Fig. 1(b)). Moreover, existing LLM-aided approaches lack mechanisms to adaptively balance design reuse and redesign within topology-sizing co-optimization. Consequently, they frequently re-optimize entire circuits from scratch when specifications change, leading to *catastrophic forgetting* [25] of valuable prior design knowledge. In real AMS workflows, many subcircuits are already layout-planned, variation-optimized [26–30], or even silicon-proven, making such full re-optimization impractical. This lack of architectural and contextual awareness causes redundant computation, poor sample efficiency, and inconsistent reliability, ultimately hindering the deployment of LLMs in industrial design flows.



Figure 1: Comparison of a traditional black-box optimization algorithm, a vanilla LLM-based approach, and the proposed HeaRT framework for the AMS design task.

To fully harness the potential of LLMs enriched with human design knowledge for AMS automation, we propose HeaRT, an analytically guided, agentic foundation reasoning framework that enables reasoning-grounded downstream applications. HeaRT draws inspiration from the hierarchical abstraction principles underlying human circuit design, a perspective largely underexplored in current LLM-based approaches. By constructing a human-design-inspired Hierarchical Circuit Reasoning Tree, HeaRT enables efficient, real-time, and context-aware reasoning, producing query-conditioned reasoning traces that improve interpretability and debugging [31] (Fig. 1(c)). Our key contributions are summarized as follows:

- We develop HeaRT, a novel analytically-guided multi-level, agentic reasoning framework that performs top-down KCL- and graph-guided circuit decomposition with hierarchical organization, followed by bottom-up, context-aware knowledge consolidation into a persistent hierarchical knowledge graph that anchors subsequent LLM reasoning.
- We introduce a task-specific, rank-based retrieval mechanism that selects and plugs in suitable topologies for context-aware, performance-driven reconfiguration and sizing refinement, while maintaining electrical correctness.
- We rigorously evaluate HeaRT across a 40-circuit AMS benchmark spanning diverse circuit types and complexity tiers, where it consistently achieves >97% reasoning accuracy and >98% Pass@1, while operating at <0.5× the real-time token budget of existing baselines. HeaRT further delivers over 3× faster convergence on both sizing and topology-adaptation tasks across multiple optimization algorithms, preserving prior design intent under evolving specifications.
- This work represents a first successful step toward advancing AMS design automation into an intelligent, adaptive, and practical paradigm capable of context-aware decision-making across topology and sizing optimization tasks.



Figure 2: (a) Overall workflow of the HeaRT framework. (b) Illustrative 8-bit SAR ADC example. (c) Hierarchical circuit reasoning tree generated by HeaRT for the SAR ADC.

## 2 PRELIMINARIES

### 2.1 LLM-Aided AMS Design Automation

The emergence of reasoning-enabled LLMs endowed with multi-modal understanding and agentic decision-making capabilities [32–40] has redefined autonomous problem-solving across diverse domains, motivating their evaluation within EDA. However, LLMs exhibit inherent challenges such as hallucinations [41], instability, and inconsistency, compounded by their lack of traceable reasoning. These issues conflict with the precision, determinism, and verifiability required in circuit design, undermining their credibility for practical deployment [31]. Moreover, data scarcity in the AMS domain [31, 42], driven by industrial confidentiality, intellectual property restrictions, and a limited open-source culture, forces researchers to construct their own small, task-specific private datasets, hindering large-scale, shared development [43]. Consequently, recent efforts [14–24] remain task-constrained, data-hungry, or architecture-specific, while still struggling with interpretability and debuggability, restricting their broader applicability. While [44] is a step toward enhancing designer interpretability by converting SPICE netlists into schematic images, it still lacks autonomous circuit reasoning. These gaps underscore the need for robust, analytically grounded reasoning frameworks to enhance the trustworthiness and applicability of modern LLMs within EDA.

### 2.2 Knowledge Retention and Design Locality

To complement performance-oriented, Figure of Merit (FoM)-based evaluation, we introduce the *Prior Circuit Knowledge Retention Index* (PCKRI) to quantify how well an optimization preserves prior design intent and design locality in scenarios where incremental design adaptation is preferred over full re-optimization. We define PCKRI as the product of two retention components: *Topology Retention Score* (TRS) and *Design Variable Retention Score* (DVRS), given by:

$$\text{PCKRI} = \text{TRS} \times \text{DVRS}, \quad \text{where} \tag{1}$$

$$\text{TRS} = 1 - \frac{d_{\text{edit}}(G^{(0)}, G)}{|E^{(0)}|}, \tag{2}$$

$$\text{DVRS} = \frac{1}{M} \sum_{j=1}^{M} \exp\left[ -k \min\left(1, \left|\log_{10}\left(x_j / x_j^{(0)}\right)\right|\right) \right].$$

Here, TRS captures structural similarity, and thus the degree of topology retention, with $d_{\text{edit}}(G^{(0)}, G)$ denoting the minimum number of edge insertions/deletions required to transform the reference circuit graph $G^{(0)}$ into the optimized topology $G$, normalized by the reference circuit's edge count $|E^{(0)}|$. DVRS, in turn, quantifies parameter retention within a retained architectural section with $M$ design variables, where $x_j^{(0)}$ and $x_j$ denote the initial and modified values of the $j$-th design variable, respectively, and $k$ controls sensitivity to deviation. The $\min(1, \cdot)$ cap saturates each term at $e^{-k}$ once a variable deviates by a decade or more, which prevents single large deviations from disproportionately influencing the overall score. A PCKRI value of 1 denotes complete structural and parametric reuse (*i.e.*, perfect knowledge retention), whereas values approaching 0 indicate total loss of prior knowledge. PCKRI thus serves as a direct measure of design-effort reuse and also reflects reliability when silicon-proven or variation-tolerant prior designs exist.
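A minimal Python sketch of the PCKRI definition follows. It assumes that devices and nets keep their names between $G^{(0)}$ and $G$, so the minimum edge-edit count reduces to the size of the symmetric difference of the two edge sets (general graph edit distance without a known node correspondence is much harder). The `(device, terminal, net)` edge tuples and all function names are illustrative, and `k = 3` mirrors the setting in Section 4.1.

```python
import math
from typing import Sequence, Set, Tuple

Edge = Tuple[str, str, str]  # illustrative (device, terminal, net) triple


def trs(ref_edges: Set[Edge], new_edges: Set[Edge]) -> float:
    """Topology Retention Score, Eq. (2).

    Assumes node identities are preserved, so the minimum number of edge
    insertions/deletions equals the size of the symmetric difference.
    """
    d_edit = len(ref_edges ^ new_edges)
    return 1.0 - d_edit / len(ref_edges)


def dvrs(x_ref: Sequence[float], x_new: Sequence[float], k: float = 3.0) -> float:
    """Design Variable Retention Score averaged over M retained variables."""
    terms = [
        math.exp(-k * min(1.0, abs(math.log10(xj / xj0))))  # capped log-ratio
        for xj0, xj in zip(x_ref, x_new)
    ]
    return sum(terms) / len(terms)


def pckri(ref_edges: Set[Edge], new_edges: Set[Edge],
          x_ref: Sequence[float], x_new: Sequence[float], k: float = 3.0) -> float:
    """PCKRI = TRS x DVRS, Eq. (1)."""
    return trs(ref_edges, new_edges) * dvrs(x_ref, x_new, k)


if __name__ == "__main__":
    ref = {("M1", "D", "out"), ("M1", "G", "in"), ("M1", "S", "gnd"), ("M2", "D", "out")}
    new = (ref - {("M2", "D", "out")}) | {("M2", "D", "vdd")}  # one rewired terminal
    widths_ref = [1e-6, 2e-6]
    widths_new = [1e-6, 2e-5]  # second width changed by a full decade
    print(trs(ref, new), dvrs(widths_ref, widths_new), pckri(ref, new, widths_ref, widths_new))
```

On this four-edge toy graph, rewiring one drain terminal costs one deletion plus one insertion, so TRS falls to 0.5, while the decade-sized width change saturates its DVRS term at $e^{-3}$.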

## 3 THE HEART FRAMEWORK

Current LLM-based approaches for AMS design still fall short of the explainability and structured reasoning exhibited by human design flows, and their opaque, inconsistent reasoning limits practical deployment and trustworthiness. Inspired by the cognitive process through which human designers analyze, abstract, and interpret circuits, we propose HeaRT, a hierarchical reasoning framework that applies a human-inspired design philosophy to achieve transparent, context-consistent circuit understanding for downstream agentic tasks. As depicted in Fig. 2(a), HeaRT begins with graph-based current-path tracing and subcircuit extraction, followed by hierarchical organization, and culminates in bottom-up architectural interpretation with context-aware knowledge consolidation. By combining top-down circuit decomposition with bottom-up, semantically consistent knowledge consolidation, HeaRT grounds its reasoning in the structured abstraction workflow characteristic of human circuit designers. Because the full circuit netlist is supplied as global context at every stage of the bottom-up hierarchical reasoning, the framework fosters multi-perspective understanding [45–47]. HeaRT operates in two complementary phases: an offline, one-time knowledge-building stage that constructs the hierarchical circuit reasoning tree (Section 3.1), and an online, real-time agentic retrieval stage that leverages standard LLM reasoning capabilities while grounding responses in this analytical knowledge base, achieving grounded, consistent, and efficient reasoning beyond single-shot, full-circuit inference (Section 3.2). The following subsections describe each phase in detail. To enhance reproducibility and facilitate further research in related directions, we plan to release HeaRT as an open-source framework soon.

**Figure 3:** An illustrative example of HeaRT’s analytically guided circuit decomposition.

### 3.1 Analytical-Guided Hierarchical Circuit Reasoning Tree Construction

This phase constitutes a one-time offline process that constructs a DC current-compliant knowledge graph directly from raw SPICE netlists of a design, building a hierarchically structured circuit reasoning tree that serves as the analytical backbone for all subsequent agentic retrieval tasks.

**3.1.1 Graph-Guided Reasoning for Tree Construction.** First, we transform the SPICE-level circuit netlist into a bipartite graph representation $G = (V_D, V_N, E)$, where the device nodes $V_D$ represent circuit components (e.g., transistors, resistors, capacitors) and the net nodes $V_N$ correspond to unique net names. Each net node is labeled by its type as SUPPLY\_PORT, SIGNAL\_PORT, or INTERNAL\_NET, and each edge $e \in E$ encodes a unique device-terminal-net connection. For decomposition purposes, we simplify MOSFETs to three-terminal devices (D, G, S), omitting body connections. The DC current paths are then extracted from $G$ as constraints for LLM-assisted subcircuit extraction, together with the associated metadata, as described in Algorithm 1. Fig. 3 shows an example of decomposing a flattened netlist into subcircuits via DC current path guidance. The extracted subcircuits $S_{\text{decomp}}$, along with their adjacent nodes, are input to the LLM to construct meaningful subcircuit nodes that reflect circuit functionality. For each subcircuit, we use its metadata (unique subcircuit\_ID, role\_hint, architectural context, and port annotations) as local informative cues, complemented by the full circuit netlist for global context.
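The DC-path portion of Algorithm 1 (its two helper functions, without the LLM prompting step) can be sketched in plain Python as follows. This is an illustrative reading, not the authors’ code. The device tuples, the `DC_TERMINALS` table, stopping each rail’s BFS at the opposite rail, and excluding supply nets when grouping devices into components (otherwise every device would merge through VDD/GND) are all assumptions. Following the algorithm literally, a device is marked DC-alive whenever its two DC terminals are reachable from opposite rails, even if that reachability passes through the device itself; a stricter check would exclude the device under test.

```python
from collections import deque

# Hypothetical device model: which terminals conduct DC current.
# MOS gates and capacitor plates are treated as non-DC-conductive.
DC_TERMINALS = {"nmos": ("D", "S"), "pmos": ("D", "S"), "res": ("P", "N")}


def dc_device_edges(devices):
    """Map each DC-conductive device to its pair of DC terminal nets (G_DC)."""
    return {
        name: (pins[DC_TERMINALS[dtype][0]], pins[DC_TERMINALS[dtype][1]])
        for name, dtype, pins in devices
        if dtype in DC_TERMINALS
    }


def reachable(edges, source, other_rail):
    """BFS over nets from one rail; the opposite rail is reached but not expanded."""
    adj = {}
    for n1, n2 in edges.values():
        adj.setdefault(n1, set()).add(n2)
        adj.setdefault(n2, set()).add(n1)
    seen, queue = {source}, deque([source])
    while queue:
        net = queue.popleft()
        if net == other_rail:
            continue
        for nxt in adj.get(net, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def components(device_nets, supplies):
    """Group devices that share at least one non-supply net."""
    net_to_devs = {}
    for dev, nets in device_nets.items():
        for net in nets:
            if net not in supplies:
                net_to_devs.setdefault(net, set()).add(dev)
    groups, seen = [], set()
    for start in device_nets:
        if start in seen:
            continue
        seen.add(start)
        group, stack = set(), [start]
        while stack:
            dev = stack.pop()
            group.add(dev)
            for net in device_nets[dev]:
                for other in net_to_devs.get(net, ()):
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        groups.append(group)
    return groups


def extract_subcircuits(devices, vdd="vdd", gnd="gnd"):
    """Return (S_DC, S_AC) as lists of device-name sets."""
    edges = dc_device_edges(devices)
    r_vdd = reachable(edges, vdd, gnd)
    r_gnd = reachable(edges, gnd, vdd)
    alive = {}
    for dev, (n1, n2) in edges.items():
        if (n1 in r_vdd and n2 in r_gnd) or (n2 in r_vdd and n1 in r_gnd):
            alive[dev] = {n1, n2}  # DC-alive device, grouped by its DC nets only
    s_dc = components(alive, {vdd, gnd})
    # Residual graph: all remaining devices with all of their pins.
    residual = {name: set(pins.values()) for name, _, pins in devices if name not in alive}
    s_ac = components(residual, {vdd, gnd})
    return s_dc, s_ac
```

For a toy netlist with an NMOS/PMOS pair sharing `out`, a resistor/diode-connected bias branch sharing `a`, a load capacitor, and a transistor whose drain and source float, this returns two DC subcircuits ({M1, M2} and {R1, M3}) and one residual AC component that groups the capacitor with the floating transistor through their shared `out` net.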

---

### Algorithm 1: Reasoning-Grounded Circuit Decomposition via DC Current Path Guidance

---

```
Require: Raw SPICE netlist N
Ensure:  Subcircuits S_decomp enriched with contextual metadata

function EXTRACTDCSUBCIRCUITS(G)
    G_DC ← G with all non-DC-conductive terminals (MOS gates, capacitor nodes) removed; drop isolated nodes
    Initialize R_VDD ← ∅, R_GND ← ∅, D_DCalive ← ∅
    Perform multi-source BFS on G_DC from (VDD, GND)
    R_VDD, R_GND ← terminals reachable from VDD and GND, respectively
    D_DCalive ← { d = (n1, n2) ∈ V_D | (n1 ∈ R_VDD ∧ n2 ∈ R_GND) ∨ (n2 ∈ R_VDD ∧ n1 ∈ R_GND) }
    S_DC ← connected components formed by the devices in D_DCalive
    return S_DC
end function

function PROCESSRESIDUALANDACCOMPONENTS(G, S_DC)
    G_res ← G \ S_DC
    S_AC ← connected components of G_res
    return S_AC
end function

G(V_D, V_N, E) ← NETLISTTOBIPARTITE(N)                              ▷ device–net bipartite graph construction
Annotate each n ∈ V_N as SUPPLY_PORT, SIGNAL_PORT, or INTERNAL_NET   ▷ initial net-level metadata
S_DC ← EXTRACTDCSUBCIRCUITS(G)                                       ▷ DC connectivity constraints
S_AC ← PROCESSRESIDUALANDACCOMPONENTS(G, S_DC)
P ← BUILDPROMPT(N, S_DC, S_AC, CoT, in-context exemplars)
O_LLM ← LLM(P)                                                       ▷ LLM reasoning for circuit decomposition
S_decomp ← PARSE(O_LLM)
return S_decomp
```

---

**3.1.2 Bottom-Up Lightweight Analysis.** This bottom-up consolidation phase, progressing from leaf to root, leverages local and global metadata as informative cues to build a context-consistent understanding across the hierarchy. At each node, our agent analyzes the subcircuit’s function and its relation to its parent, describing its role within the broader circuit and identifying any local feedback loops among its immediate children, each labeled with a unique loop ID. Simultaneously, the node’s netlist and its signal and supply ports are synthesized to maintain coherent, context-aware representations across abstraction levels. Finally, all functional relations are organized into the hierarchical circuit reasoning tree for retrieval during inference.
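The leaf-to-root ordering of this consolidation pass can be sketched as a post-order traversal. In the sketch below, `summarize` and `find_loops` are hypothetical callbacks standing in for the LLM agent’s per-node analysis, the full netlist is passed to every call as global context, and loop IDs are drawn from a single tree-wide counter so that they stay unique; none of these names come from the paper.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Iterable, List


@dataclass
class TreeNode:
    name: str
    children: List["TreeNode"] = field(default_factory=list)
    summary: str = ""
    loop_ids: List[str] = field(default_factory=list)


def consolidate(
    root: TreeNode,
    global_netlist: str,
    summarize: Callable[[str, List[str], str], str],
    find_loops: Callable[[TreeNode], Iterable[object]],
) -> List[str]:
    """Leaf-to-root (post-order) pass; returns the order in which nodes were analyzed.

    `summarize` and `find_loops` are hypothetical stand-ins for LLM agent calls.
    """
    loop_counter = count(1)  # unique loop IDs across the whole tree
    order: List[str] = []

    def visit(node: TreeNode) -> None:
        for child in node.children:  # children first, so their summaries exist
            visit(child)
        child_summaries = [c.summary for c in node.children]
        node.summary = summarize(node.name, child_summaries, global_netlist)
        for _ in find_loops(node):  # local feedback loops among immediate children
            node.loop_ids.append(f"LOOP{next(loop_counter)}")
        order.append(node.name)

    visit(root)
    return order
```

Because children are visited before their parent, each parent’s analysis can condition on already-consolidated child summaries, which is the property the bottom-up phase relies on.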

### 3.2 Agentic Retrieval

This phase performs online, real-time agentic traversal and retrieval, grounding LLM reasoning in the analytically guided circuit reasoning tree built during the offline stage. Retrieval operations are bounded by the tree depth, enabling efficient and scalable exploration of the knowledge graph for consistent, context-grounded inference. To further reduce token expenditure while preserving high reasoning accuracy, we employ context-engineering strategies [40], including context compression, coordinated handling of local and global context, and selective reuse of relevant context.

**Table 1: Comparison of LLM-Based Methods for Analog and Mixed-Signal (AMS) Design.**

| Framework                 | Multiple Types <sup>1</sup> | Analytical-Guided Reasoning | Explainable Reasoning Traces | Performance-Driven Retrieval <sup>2</sup> | Targeted Opt. Tasks      | Commercial PDKs <sup>3</sup> |
|---------------------------|-----------------------------|-----------------------------|------------------------------|-------------------------------------------|--------------------------|------------------------------|
| Artisan [15]              | –                           | –                           | –                            | •                                         | TR & Sizing              | •                            |
| ADO-LLM [16]              | •                           | –                           | –                            | –                                         | Sizing Only              | •                            |
| AnalogCoder [17]          | •                           | –                           | –                            | –                                         | TG                       | –                            |
| Atelier [18]              | –                           | –                           | –                            | •                                         | TG <sup>†</sup> & Sizing | •                            |
| AnalogXPert [19]          | –                           | ○                           | –                            | –                                         | TG <sup>‡</sup>          | –                            |
| AnalogGenie/Lite [20, 21] | •                           | –                           | –                            | –                                         | TG & Sizing              | –                            |
| LEDRO [22]                | –                           | –                           | –                            | –                                         | Sizing Only              | •                            |
| AnaFlow [14]              | •                           | –                           | ○                            | –                                         | Sizing Only              | •                            |
| <b>HeaRT (This Work)</b>  | •                           | •                           | •                            | •                                         | TR & Sizing              | •                            |

<sup>1</sup> Indicates support for multiple AMS circuit families, <sup>2</sup> Indicates whether the method supports target performance-conditioned topology selection/retrieval, <sup>3</sup> Demonstrated evaluation on commercial PDKs, <sup>†</sup> Topology generation via retrieval followed by template-based mutation rather than synthesizing from scratch, <sup>‡</sup> Multiple retrieved topologies combined to generate larger composite architectures, TR: Topology Retrieval/Selection, TG: Topology Generation, • Supported, ○ Partial, – Not Supported.

**3.2.1 Topology Knowledge Database Management.** As illustrated in Fig. 2(a), we propose a topology-keyed knowledge database in which circuits are organized by functional category, with rows keyed by individual topologies and columns corresponding to their class-specific performance metrics (e.g., 3-dB bandwidth, DC gain, and phase margin for OPAMPs; temperature coefficient and startup margin for bandgap references). Each stored topology $\mathcal{T}$ is pre-ranked along every metric column, and the corresponding metric-specific rank $r_i(\mathcal{T})$ is stored in each table entry, providing a structured basis for fast look-up and task-specific, context-aware, rank-based retrieval during downstream applications. In our current implementation, these stored rankings are derived from human expert annotation, which provides strong alignment with designer intent and reliable semantic accuracy. The database structure itself remains flexible and can incorporate alternative scoring sources in future iterations.
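In the current implementation the stored ranks $r_i(\mathcal{T})$ come from expert annotation. As an illustration of an alternative scoring source, the sketch below derives per-metric ranks from raw metric values, with rank 1 as the best and ties sharing a rank (competition ranking). The metric names, directions, and numbers are made up for illustration and are not data from the paper.

```python
from typing import Dict

Metrics = Dict[str, float]


def rank_column(values: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Competition ranking for one metric column: rank 1 is best, ties share a rank."""
    sign = -1.0 if higher_is_better else 1.0
    return {
        t: 1 + sum(1 for u in values if sign * values[u] < sign * values[t])
        for t in values
    }


def build_rank_table(raw: Dict[str, Metrics],
                     higher_is_better: Dict[str, bool]) -> Dict[str, Dict[str, int]]:
    """Return {topology: {metric: rank}} from raw per-topology metric values."""
    table: Dict[str, Dict[str, int]] = {t: {} for t in raw}
    for metric, direction in higher_is_better.items():
        column = {t: raw[t][metric] for t in raw}
        for t, r in rank_column(column, direction).items():
            table[t][metric] = r
    return table


if __name__ == "__main__":
    opamps = {  # illustrative values only
        "two_stage": {"dc_gain_db": 80, "pm_deg": 60, "power_uw": 200},
        "folded_cascode": {"dc_gain_db": 70, "pm_deg": 75, "power_uw": 150},
        "telescopic": {"dc_gain_db": 70, "pm_deg": 80, "power_uw": 100},
    }
    directions = {"dc_gain_db": True, "pm_deg": True, "power_uw": False}
    print(build_rank_table(opamps, directions))
```

Each metric column is ranked independently, which is what allows Eq. (4) to combine ranks across metrics that have different units and scales.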

**3.2.2 Reasoning-Guided and Query-Conditioned Traversal.** In this stage, the agent performs a query-conditioned traversal of the hierarchical circuit reasoning tree. A single LLM pass first assigns query-conditioned relevance weights to every edge, using CoT and in-context reasoning, and attaches short natural-language rationales to guide subsequent exploration. Before traversal, each node  $v$  is evaluated using the following branch-admission criterion, which determines whether its children should be enqueued during BFS:

$$\text{Branch\_Cut}(v) = \Big[\max_{u \in \text{Children}(v)} w_{(v,u)} < \tau_{\text{stop}}\Big] \;\lor\; \Big[\max_{u \in \text{Children}(v)} w_{(v,u)} - \min_{u \in \text{Children}(v)} w_{(v,u)} < \epsilon\Big], \tag{3}$$

where $w_{(v,u)}$ is the query-conditioned relevance of child $u$, $\tau_{\text{stop}}$ halts expansion when all children are weak, and $\epsilon$ halts expansion when all children are similarly strong (i.e., there is no dominant direction). If condition (3) is satisfied at a node $v$, all of its children are marked non-admissible and are never enqueued. A BFS from the root therefore enqueues children only when the Branch\_Cut criterion is not met, so traversal terminates either at suppressed nodes or at true leaves. The resulting root-to-terminal paths form the reasoning traces, and each terminal node anchors a query-focused subtree that defines the scoped search region for downstream tasks such as localized sizing optimization or objective-driven retrieval.
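A minimal sketch of the branch-admission test in Eq. (3) and the BFS that uses it is shown below. The edge weights would come from the single LLM scoring pass; here they are supplied as a plain dictionary, and the default thresholds `tau_stop = 0.3` and `eps = 0.1` are arbitrary placeholders, not values reported in the paper.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def branch_cut(weights: Sequence[float], tau_stop: float, eps: float) -> bool:
    """Eq. (3): True means the children of v are NOT admitted to the BFS queue."""
    if not weights:
        return True  # a true leaf has nothing to expand
    w_max, w_min = max(weights), min(weights)
    # Literal reading of Eq. (3): a single child has spread 0, so it is cut when eps > 0.
    return w_max < tau_stop or (w_max - w_min) < eps


def traverse(
    children: Dict[str, List[str]],
    w: Dict[Tuple[str, str], float],
    root: str,
    tau_stop: float = 0.3,
    eps: float = 0.1,
) -> List[List[str]]:
    """Query-conditioned BFS; returns root-to-terminal reasoning traces."""
    traces: List[List[str]] = []
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        v = path[-1]
        kids = children.get(v, [])
        if branch_cut([w[(v, u)] for u in kids], tau_stop, eps):
            traces.append(path)  # terminal node: anchors a scoped subtree
            continue
        for u in kids:  # admission is decided per node, so all children are enqueued
            queue.append(path + [u])
    return traces
```

For instance, if a root’s children are weighted 0.2, 0.9, and 0.4, the root is expanded; a child whose own children all score below `tau_stop` becomes a terminal node, while a child with one clearly dominant grandchild is expanded further.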

**3.2.3 Objective-Driven Retrieval for Topology Optimization.** Given a topology optimization task, HeaRT performs query-conditioned BFS traversal with terminal-node scoping to identify candidate subcircuits. It then applies an objective-driven retrieval process to update these candidates according to the user’s design specifications. The set of performance objectives to be optimized,  $\mathcal{S}$ , is automatically identified through LLM-based reasoning over natural language queries, with all remaining metrics collected into  $\mathcal{U}$ .

Through an agentic function call, HeaRT computes an aggregate rank-based Figure of Merit ($\text{FoM}_{\text{rank}}$) and searches for the topology that minimizes it:

$$\min_{\mathcal{T} \in \phi(\mathcal{U})} \text{FoM}_{\text{rank}}(\mathcal{T}), \quad \text{with} \quad \text{FoM}_{\text{rank}}(\mathcal{T}) = \sum_{i \in \mathcal{S}} w_i \, r_i(\mathcal{T}), \tag{4}$$

where  $w_i$  denotes the metric weight and  $\phi(\mathcal{U})$  the feasible search region determined by requirement-tone analysis over  $\mathcal{U}$ :

$$\phi(\mathcal{U}) = \left\{ \mathcal{T} \;\middle|\; |r_j(\mathcal{T}) - \bar{r}_j| \leq 3, \;\; \forall j \in \mathcal{U} \text{ deemed strict} \right\}, \tag{5}$$

with $\bar{r}_j$ the prior rank on the $j$-th metric. If no reference exists or the requirement tone is relaxed, metric $j$ is treated as “don’t-care” and imposes no constraint; when every metric in $\mathcal{U}$ is don’t-care, the full search space is available. Topologies that minimize this score represent the best candidates for the given design goal (top-$k$ when multiple candidates achieve near-best scores), enabling fast, objective-driven retrieval for topology selection, as demonstrated in Fig. 2(a).
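The rank-based retrieval amounts to a filter (Eq. (5)) followed by a weighted sort (Eq. (4)), sketched below. Lower ranks are assumed to be better, consistent with minimizing $\text{FoM}_{\text{rank}}$; the tolerance of 3 follows Eq. (5); and the topology names, ranks, and weights in the example are hypothetical.

```python
from typing import Dict, Iterable, List

RankTable = Dict[str, Dict[str, int]]  # {topology: {metric: rank}}; rank 1 is best


def fom_rank(ranks: Dict[str, int], weights: Dict[str, float]) -> float:
    """Eq. (4): weighted sum of ranks over the targeted metrics in S."""
    return sum(w * ranks[m] for m, w in weights.items())


def feasible(db: RankTable, prior: Dict[str, int], strict: Iterable[str],
             tol: int = 3) -> RankTable:
    """Eq. (5): keep topologies within +/- tol ranks of the prior on each strict metric."""
    constrained = [j for j in strict if j in prior]  # no reference -> don't-care
    return {
        t: r for t, r in db.items()
        if all(abs(r[j] - prior[j]) <= tol for j in constrained)
    }


def retrieve(db: RankTable, weights: Dict[str, float], prior: Dict[str, int],
             strict: Iterable[str], top_k: int = 1) -> List[str]:
    """Return the top-k feasible topologies by ascending FoM_rank."""
    cands = feasible(db, prior, strict)
    return sorted(cands, key=lambda t: fom_rank(cands[t], weights))[:top_k]


if __name__ == "__main__":
    db = {  # hypothetical ranks for a front-end gain stage
        "resistive_fb": {"noise": 5, "area": 2, "offset": 4},
        "inverter_based": {"noise": 1, "area": 1, "offset": 5},
        "cascode_ota": {"noise": 3, "area": 3, "offset": 3},
        "chopper": {"noise": 2, "area": 5, "offset": 1},
        "folded_cascode": {"noise": 4, "area": 4, "offset": 2},
    }
    weights = {"noise": 0.5, "area": 0.5}  # S = {noise, area}; U = {offset}
    print(retrieve(db, weights, {"offset": 1}, strict=[]))          # offset relaxed
    print(retrieve(db, weights, {"offset": 1}, strict=["offset"]))  # offset strict
```

In this toy database, with the current stage at offset rank 1, relaxing offset leaves the inverter-based stage as the best candidate, whereas treating offset as strict excludes it (it sits 4 ranks away) and promotes the cascode OTA.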

## 4 EXPERIMENTAL RESULTS

### 4.1 Experimental Setup

We evaluate HeaRT’s reasoning accuracy on a benchmark repository of 40 AMS designs spanning diverse circuit types and complexities, as summarized in Fig. 4. For reasoning-accuracy benchmarking, we compare HeaRT against state-of-the-art LLM-based reasoning frameworks for AMS design, namely ADO-LLM [16], AnaFlow [14], and LEDRO [22] (ADO-LLM and AnaFlow were re-implemented following their papers because their code is not open-source), as well as the proprietary frontier models GPT-4o, Claude-4.5 Sonnet, Gemini-2.5 Pro, and GPT-5, using reasoning accuracy and real-time token efficiency as the primary evaluation metrics. To illustrate HeaRT’s downstream applicability, we select two of the largest system-level circuits in the repository, (i) a supply-insensitive relaxation oscillator and (ii) an analog front-end (AFE), each evaluated under its corresponding design-adaptation scenario (Fig. 6). For both cases, we measure design-quality improvements using the performance FoM and the PCKRI index (Eq. (1)), where higher values indicate better outcomes. Our experiments use a 200-simulation budget and set $k = 3$ in the PCKRI formulation. All evaluations employ GPT-5 for the Stage-1 knowledge-graph construction, while downstream interaction through the Interface Agent uses lighter LLM variants (e.g., GPT-4o). Schematic simulations use the TSMC 180nm process technology with SPICE in the Cadence Analog Design Environment (ADE) on a 4-core 3.3 GHz Linux CPU system.

Table 2: Aggregate reasoning accuracy and real-time token usage across circuit-complexity tiers in our benchmark repository.

| Framework                      | Simple: Min Acc. (%) | Simple: Max Acc. (%) | Simple: Pass@1 (%) | Medium: Min Acc. (%) | Medium: Max Acc. (%) | Medium: Pass@1 (%) | Hard: Min Acc. (%) | Hard: Max Acc. (%) | Hard: Pass@1 (%) | Avg. Tokens (Real-Time) |
|--------------------------------|----------------------|----------------------|--------------------|----------------------|----------------------|--------------------|--------------------|--------------------|------------------|-------------------------|
| ADO-LLM <sup>†</sup> [16]      | 72.45                | 88.19                | 43.50              | 49.24                | 66.71                | 29.21              | 9.26               | 26.39              | 6.26             | ≥3000                   |
| LEDRO <sup>†</sup> [22]        | 73.50                | 89.77                | 46.23              | 50.25                | 67.10                | 31.49              | 10.87              | 28.44              | 6.90             | ~3200                   |
| AnaFlow [14]                   | 77.31                | 91.58                | 57.91              | 66.88                | 88.69                | 52.93              | 40.99              | 62.85              | 30.63            | ≥3000×4 <sup>‡</sup>    |
| GPT-4o                         | 74.88                | 90.42                | 48.37              | 62.57                | 85.63                | 41.10              | 18.14              | 41.36              | 9.70             | 2620                    |
| Claude-4.5 Sonnet <sup>‡</sup> | 76.41                | 91.34                | 52.90              | 61.61                | 87.37                | 46.01              | 35.67              | 54.33              | 29.98            | 2690                    |
| Gemini-2.5 Pro <sup>‡</sup>    | 77.20                | 91.52                | 56.32              | 65.99                | 87.95                | 51.70              | 41.23              | 64.71              | 33.33            | 2760                    |
| GPT-5 <sup>‡</sup>             | 79.67                | 91.86                | 71.04              | 73.14                | 90.75                | 64.67              | 68.55              | 87.52              | 41.29            | 2480                    |
| <b>HeaRT</b>                   | <b>99.95</b>         | <b>100</b>           | <b>99.95</b>       | <b>99.67</b>         | <b>100</b>           | <b>99.95</b>       | <b>97.24</b>       | <b>100</b>         | <b>98.85</b>     | <b>1176</b>             |

<sup>†</sup> These frameworks were optimized for earlier-generation models such as GPT-3.5 Turbo and LLaMA-3 70B, and thus underperform the latest SOTA proprietary models such as Claude-4.5 Sonnet, Gemini-2.5 Pro, and GPT-5. <sup>‡</sup> The multi-agent Phase 1 (the core reasoning stage) requires at least 4 sequential LLM calls, inflating real-time token usage.

### 4.2 HeaRT’s Circuit Reasoning Performance

Table 1 provides a comparative overview of existing LLM-based methodologies for AMS design, summarizing their respective capabilities and limitations. To quantify reasoning accuracy, we define four evaluation metrics: (A1) class identification, (A2) subcircuit identification and KCL-compliant splitting, (A3) feedback-loop and purpose recognition, and (A4) coverage of critical domain-specific terminology and design descriptors. A1 is scored as binary correctness, whereas A2–A4 are coverage ratios reflecting the proportion of correctly satisfied elements; the per-query reasoning accuracy is the average of A1–A4. Each query is evaluated over 10 independent trials, and for each circuit we record the reasoning accuracy and Pass@1 score. Circuit complexity tiers are defined by device count: Simple (<20 devices), Medium (20–40 devices), and Hard (>40 devices). Table 2 reports the aggregated results and the corresponding real-time average token usage across our benchmark repository. HeaRT’s consistently higher accuracy relative to all baselines stems from significantly stronger A2 and A3 scores, enabled by its analytically guided decomposition and hierarchical circuit reasoning tree, while its two-phase design further improves its real-time token efficiency.
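The per-query scoring rule and the complexity tiers can be written down directly. The sketch below assumes that A2–A4 are computed by exact set matching between predicted and expected elements, which is a simplification (the paper does not specify how elements are matched), and it leaves out Pass@1 because its pass criterion is not defined here; all names are illustrative.

```python
from statistics import mean
from typing import Set


def coverage(predicted: Set[str], expected: Set[str]) -> float:
    """Coverage ratio for A2-A4: fraction of expected elements correctly produced."""
    return len(predicted & expected) / len(expected)


def query_accuracy(a1_correct: bool, a2: float, a3: float, a4: float) -> float:
    """Per-query reasoning accuracy: mean of A1 (binary) and A2-A4 (ratios in [0, 1])."""
    return mean([1.0 if a1_correct else 0.0, a2, a3, a4])


def complexity_tier(num_devices: int) -> str:
    """Simple (<20 devices), Medium (20-40 devices), Hard (>40 devices)."""
    if num_devices < 20:
        return "Simple"
    return "Medium" if num_devices <= 40 else "Hard"
```

For example, a query that classifies the circuit correctly, recovers three of four expected subcircuits, finds every feedback loop, and covers half of the key descriptors scores the mean of 1, 0.75, 1, and 0.5, i.e., 0.8125.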



Figure 4: Statistical summary of our curated dataset repository.

### 4.3 Integration within Optimization Loops

We integrate HeaRT into different classic optimization algorithms to evaluate its context-aware decision-making ability, namely 1) UCB-based Bayesian Optimization (BO), 2) Differential Evolution (DE), 3) the RL-inspired sizing algorithm DNN-Opt [7], and 4) the multi-agent RL-based sizing algorithm (MA-RL) [10, 13]. These baselines were selected to represent different categories of optimization approaches. In the experiments that follow, HeaRT is invoked only once, at initialization. We illustrate the FoM and PCKRI performance profiles using UCB-based BO. For a fair convergence comparison under consistent initialization conditions, we manually configure the standalone optimizers with scenario-specific starting points. Since HeaRT can autonomously decide when to reuse or disregard prior designs, this setup serves as a conservative evaluation of its adaptive capability.

Figure 5: Scenario 1: FoM and PCKRI vs. simulation count.

**4.3.1 Scenario 1: Performance-Shift-Driven Scoped Sizing Optimization.** As shown in Fig. 6(a), the updated design objective requires a 10% reduction in frequency error for a supply-insensitive relaxation oscillator while maintaining the same oscillation frequency and relaxing the IQ trade-off, with all other key specifications remaining approximately unchanged. During inference-time query-conditioned traversal, HeaRT accurately identifies and scopes the comparator front-end for targeted refinement, as reflected in the reasoning trace in Fig. 5, and applies the sizing optimizer while keeping the remainder of the circuit fixed. This targeted sizing optimization accelerates convergence and achieves an FoM@200 of $\sim 1.93\times$ the baseline, while also delivering a $\sim 3.6\times$ PCKRI improvement over the standalone optimizer.
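Scoped sizing can be illustrated by freezing every design variable outside the HeaRT-selected scope and letting an optimizer move only the scoped ones. The sketch below uses plain random search as a stand-in for BO, DE, DNN-Opt, or MA-RL, with a generic `objective` callable in place of a SPICE-evaluated FoM; the function name and signature are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def scoped_random_search(
    objective: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    scope: Sequence[int],
    bounds: Dict[int, Tuple[float, float]],
    budget: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximize `objective` (e.g., a simulated FoM) by resampling only scoped variables.

    Variables outside `scope` keep their initial values from x0.
    Random search stands in for BO/DE/DNN-Opt/MA-RL; `budget` counts evaluations.
    """
    rng = random.Random(seed)
    best_x, best_f = list(x0), objective(x0)  # the initial design uses one evaluation
    for _ in range(budget - 1):
        cand = list(best_x)
        for i in scope:
            lo, hi = bounds[i]
            cand[i] = rng.uniform(lo, hi)
        f = objective(cand)
        if f > best_f:
            best_x, best_f = cand, f
    return best_x, best_f
```

Because frozen variables never change, their DVRS terms stay at exactly 1, and restricting the search to fewer dimensions is one intuitive reason a scoped optimizer can converge in fewer simulations.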



Figure 6: Schematics of (a) Relaxation oscillator and (b) Analog front-end circuit with their corresponding design scenarios.



Figure 7: Scenario 2 performance plots: (a) Impact of knowledge-guided scoping and adaptive topology reconfiguration on FoM convergence, (b) Knowledge retention (PCKRI) vs. number of simulations, and (c) Impact of HeaRT across various optimization algorithms.

Table 3: Impact of HeaRT across classic optimizer baselines.

| Optimizer      | Eval. Metric      | Baseline | HeaRT (Sizing) | HeaRT (Topo+Sizing) |
|----------------|-------------------|----------|----------------|---------------------|
| BO             | FoM               | 240.21   | 413.78         | **555**             |
|                | #Sims to Converge | >200     | **37**         | **43**              |
|                | PCKRI             | 0.24     | 0.65           | 0.60                |
| DE             | FoM               | 230.21   | 381.82         | **551**             |
|                | #Sims to Converge | >200     | **63**         | **68**              |
|                | PCKRI             | 0.11     | 0.59           | 0.60                |
| DNN-Opt [7]    | FoM               | 269.73   | 467.93         | **556.6**           |
|                | #Sims to Converge | >200     | **31**         | **38**              |
|                | PCKRI             | 0.33     | 0.71           | 0.60                |
| MA-RL [10, 13] | FoM               | 293.07   | 463.42         | **556.3**           |
|                | #Sims to Converge | >200     | **28**         | **39**              |
|                | PCKRI             | 0.21     | 0.62           | 0.60                |
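The relative gains in Table 3 can be recomputed directly from its entries. The short script below transcribes the table values. The baseline convergence count is reported only as ">200", so the script uses 200, and the convergence speedups it computes are therefore lower bounds. The names `TABLE3` and `ratios` are illustrative.

```python
# Values transcribed from Table 3: (baseline, HeaRT sizing, HeaRT topo+sizing).
# The baseline convergence count is reported only as ">200"; 200 is used,
# so the convergence speedups computed below are lower bounds.
TABLE3 = {
    "BO": {"fom": (240.21, 413.78, 555.0), "sims": (200, 37, 43)},
    "DE": {"fom": (230.21, 381.82, 551.0), "sims": (200, 63, 68)},
    "DNN-Opt": {"fom": (269.73, 467.93, 556.6), "sims": (200, 31, 38)},
    "MA-RL": {"fom": (293.07, 463.42, 556.3), "sims": (200, 28, 39)},
}


def ratios(row: dict) -> dict:
    """FoM gains over the baseline and lower-bound convergence speedups."""
    base, sizing, topo = row["fom"]
    n_base, n_sizing, n_topo = row["sims"]
    return {
        "fom_gain_sizing": sizing / base,
        "fom_gain_topo": topo / base,
        "speedup_sizing_lb": n_base / n_sizing,
        "speedup_topo_lb": n_base / n_topo,
    }


for name, row in TABLE3.items():
    r = ratios(row)
    print(f"{name:8s} " + "  ".join(f"{k}={v:.2f}" for k, v in r.items()))
```

Read each speedup value as "at least" that factor, since the true baseline simulation count exceeds 200.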

**4.3.2 Scenario 2: Knowledge-Based, Performance-Driven Retrieval-Guided Topology Reconfiguration and Sizing.** As illustrated in Fig. 6(b), this scenario targets lower input-referred noise (higher SNR) and reduced area for an analog front-end under relaxed offset constraints. As shown in the reasoning trace of Fig. 7, HeaRT correctly identifies the first gain stage as the dominant noise contributor, then scopes and prioritizes this section for modification. HeaRT's rank-based retrieval selects an inverter-based architecture as the highest-ranked alternative and generates sized netlists for both designs. This goes beyond conventional sizing-only workflows and breaks architectural bottlenecks. Figs. 7(a)–(b) show the resulting FoM and PCKRI performance profiles, while Fig. 7(c) and Table 3 summarize the impact across different baseline optimizers. HeaRT achieves an FoM improvement of $\geq 60\%$ through scoped sizing and a $\geq 1.9\times$ gain through retrieval-guided topology reconfiguration. It maintains PCKRI $\geq 0.59$ and converges within 28–68 simulations, versus more than 200 for the standalone baselines, highlighting its optimizer-agnostic capability.
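The sketch below makes the idea of constraint-aware, rank-based selection concrete by ranking candidate replacements for a scoped block. It is a toy model, not HeaRT's retrieval procedure, and it rests on three assumptions:

- Each candidate is summarized by hypothetical noise, area, and offset estimates relative to the current block.
- Candidates exceeding the relaxed offset limit are discarded.
- The remaining candidates are ordered by a weighted improvement score with assumed weights.

`Candidate`, `rank_candidates`, the weights, and all candidate values are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    # Hypothetical summary of a retrieved alternative; each metric is
    # relative to the current block (1.0 = unchanged, < 1.0 = smaller).
    name: str
    noise_rel: float
    area_rel: float
    offset_rel: float


def rank_candidates(
    cands: List[Candidate],
    max_offset_rel: float,
    w_noise: float = 0.7,  # assumed weight on noise reduction
    w_area: float = 0.3,  # assumed weight on area reduction
) -> List[Candidate]:
    """Drop candidates violating the (relaxed) offset limit, then sort the
    rest by weighted fractional improvement in noise and area, best first."""
    feasible = [c for c in cands if c.offset_rel <= max_offset_rel]

    def score(c: Candidate) -> float:
        return w_noise * (1.0 - c.noise_rel) + w_area * (1.0 - c.area_rel)

    return sorted(feasible, key=score, reverse=True)


# Hypothetical candidates for a scoped first gain stage.
cands = [
    Candidate("cand_A", noise_rel=0.6, area_rel=0.9, offset_rel=1.5),
    Candidate("cand_B", noise_rel=0.8, area_rel=0.7, offset_rel=1.1),
    Candidate("cand_C", noise_rel=0.5, area_rel=0.8, offset_rel=3.0),
]
ranked = rank_candidates(cands, max_offset_rel=2.0)
print([c.name for c in ranked])  # ['cand_A', 'cand_B']
```

Here `cand_C` has the lowest noise but is filtered out because its offset exceeds the relaxed limit. This mirrors how a relaxed constraint widens, but still bounds, the set of admissible topologies.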

## 5 CONCLUSION

In this paper, we propose HeaRT, a novel hierarchical circuit reasoning tree-based agentic framework for AMS design optimization. HeaRT combines a top-down circuit- and graph-guided decomposition with a bottom-up knowledge consolidation. It effectively handles practical design-adaptation scenarios involving both topology and sizing optimization, two critical steps in AMS design. It is also design-intent friendly, allowing custom natural-language instructions and constraints to guide its retrieval process. Our experimental results on a 40-circuit AMS benchmark repository are promising. Future directions include enriching the reasoning tree with simulation metadata and real-time analysis, and exploring new applications of automated circuit understanding, such as layout-constraint extraction. We believe HeaRT points to an important and practical direction for advancing AMS design automation through context-aware decision-making across diverse optimization tasks.

## REFERENCES

- [1] Wenlong Lyu et al. Batch Bayesian optimization via multi-objective acquisition ensemble for automated analog circuit design. In *Proc. ICML*, 2018.
- [2] Shuhan Zhang, Fan Yang, et al. An Efficient Batch-Constrained Bayesian Optimization Approach for Analog Circuit Synthesis via Multiobjective Acquisition Ensemble. *IEEE TCAD*, 2022.
- [3] Xilu Wang et al. Recent Advances in Bayesian Optimization. *ACM Computing Surveys*, 55(13s):1–36, 2023.
- [4] Keertana Settaluri et al. AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs. In *Proc. DATE*, pages 490–495, 2020.
- [5] Hanrui Wang et al. GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning. In *Proc. DAC*, 2020.
- [6] Yaguang Li, Yishuang Lin, Meghna Madhusudan, et al. A Circuit Attention Network-Based Actor-Critic Learning Approach to Robust Analog Transistor Sizing. In *Proc. MLCAD*, 2021.
- [7] Ahmet F. Budak et al. DNN-Opt: An RL inspired optimization for analog circuit sizing using deep neural networks. In *Proc. DAC*, 2021.
- [8] Ahmet F. Budak et al. APOSTLE: Asynchronously parallel optimization for sizing analog transistors using DNN learning. In *Proc. ASP-DAC*, pages 70–75, 2023.
- [9] Minjeong Choi et al. Reinforcement Learning-based Analog Circuit Optimizer using gm/ID for Sizing. In *Proc. DAC*, 2023.
- [10] Jinxin Zhang et al. Automated Design of Complex Analog Circuits with Multiagent based Reinforcement Learning. In *Proc. DAC*, 2023.
- [11] Youngmin Oh et al. CRONuS: Circuit Rapid Optimization with Neural Simulator. In *Proc. DATE*, pages 1–6, 2024.
- [12] Handa Sun et al. EVDMARL: Efficient Value Decomposition-based Multi-Agent Reinforcement Learning with Domain-Randomization for Complex Analog Circuit Design Migration. In *Proc. DAC*, 2024.
- [13] Jiarui Bao et al. Multiagent Based Reinforcement Learning (MA-RL): An Automated Designer for Complex Analog Circuits. *IEEE TCAD*, 2024.
- [14] Mohsen Ahmadzadeh, Kaichang Chen, et al. (Invited Paper) AnaFlow: LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing. In *Proc. ICCAD*, 2025.
- [15] Zihao Chen et al. Artisan: Automated Operational Amplifier Design via Domain-specific Large Language Model. In *Proc. DAC*, 2024.
- [16] Yuxuan Yin, Yu Wang, Boxun Xu, and Peng Li. ADO-LLM: Analog Design Bayesian Optimization with In-Context Learning of Large Language Models. In *Proc. ICCAD*, 2024.
- [17] Yao Lai et al. AnalogCoder: Analog Circuit Design via Training-Free Code Generation. In *Proc. AAAI*, 2025.
- [18] Jinyi Shen, Zihao Chen, Ji Zhuang, et al. Atelier: An Automated Analog Circuit Design Framework via Multiple Large Language Model-Based Agents. *IEEE TCAD*, 2025.
- [19] Haoyi Zhang et al. AnalogXpert: Automating Analog Topology Synthesis by Incorporating Circuit Design Expertise into Large Language Models. In *Proc. ISEDA*, 2025.
- [20] Jian Gao et al. AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies. In *Proc. ICLR*, 2025.
- [21] Jian Gao, Weidong Cao, and Xuan Zhang. AnalogGenie-Lite: Enhancing Scalability and Precision in Circuit Topology Discovery through Lightweight Graph Modeling. In *Proc. ICML*, 2025.
- [22] Dimple Vijay Kochar et al. LEDRO: LLM-Enhanced Design Space Reduction and Optimization for Analog Circuits. In *Proc. ICLAD*, 2025.
- [23] Haohang Xu, Chengjie Liu, Qihang Wang, et al. Image2Net: Datasets, Benchmark and Hybrid Framework to Convert Analog Circuit Diagrams into Netlists. In *Proc. ISEDA*, 2025.
- [24] Weiyu Chen et al. AnalogTester: A Large Language Model-Based Framework for Automatic Testbench Generation in Analog Circuit Design. In *Proc. ISEDA*, 2025.
- [25] James Kirkpatrick et al. Overcoming catastrophic forgetting in neural networks. *Proceedings of the National Academy of Sciences*, 2017.
- [26] Ahmet F. Budak et al. Joint Optimization of Sizing and Layout for AMS Designs: Challenges and Opportunities. In *Proc. ISPD*, 2023.
- [27] Ahmet F. Budak et al. Practical Layout-Aware Analog/Mixed-Signal Design Automation with Bayesian Neural Networks. In *Proc. ICCAD*, 2023.
- [28] Wei Shi et al. RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL. In *Proc. MLCAD*, 2022.
- [29] Jian Gao et al. RoSE: Robust Analog Circuit Parameter Optimization with Sampling-Efficient Reinforcement Learning. In *Proc. DAC*, 2023.
- [30] Weidong Cao et al. RoSE-Opt: Robust and Efficient Analog Circuit Parameter Optimization With Knowledge-Infused Reinforcement Learning. *IEEE TCAD*, 2024.
- [31] Chengjie Liu and Jun Yang. IC + AI: a no-human-in-loop design paradigm? *Science China Information Sciences*, 2025.
- [32] Jason Wei et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In *Proc. NeurIPS*, volume 35, pages 24824–24837, 2022.
- [33] Lei Wang et al. A survey on large language model based autonomous agents. *Frontiers of Computer Science*, 2024.
- [34] Caigao Jiang et al. LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch. *arXiv preprint arXiv:2410.13213*, 2024.
- [35] Zhiheng Xi et al. The rise and potential of large language model based agents: A survey. *Science China Information Sciences*, 2025.
- [36] Junde Wu, Jiayuan Zhu, Yuyuan Liu, et al. Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools. *arXiv preprint arXiv:2502.04644*, 2025.
- [37] Fengli Xu, Qianyue Hao, Zefang Zong, et al. Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models. *arXiv preprint arXiv:2501.09686*, 2025.
- [38] Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, et al. Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models. *arXiv preprint arXiv:2503.09567*, 2025.
- [39] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. *arXiv preprint arXiv:2501.12948*, 2025.
- [40] Philipp Schmid. The New Skill in AI is Not Prompting, It's Context Engineering, 2025.
- [41] Lei Huang et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. *ACM Transactions on Information Systems*, 2025.
- [42] Kaiyan Chang, Kun Wang, Nan Yang, et al. Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework. In *Proc. DAC*, pages 1–6, 2024.
- [43] Qiufeng Li, Shu Hong, Jian Gao, Xuan Zhang, et al. AnalogFed: Federated Discovery of Analog Circuit Topologies with Generative AI. *arXiv preprint arXiv:2507.15104*, 2025.
- [44] Ryoga Matsuo et al. Schematico – An LLM for Netlist-to-Schematic Conversion. In *Proc. MLCAD*, 2025.
- [45] Alex Wilf et al. Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities. *arXiv preprint arXiv:2311.10227*, 2023.
- [46] Sanghyun Park et al. Thinking with Many Minds: Using Large Language Models for Multi-Perspective Problem-Solving. *arXiv preprint arXiv:2501.02348*, 2025.
- [47] Yunyao Zhang et al. From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning. *arXiv preprint arXiv:2509.24765*, 2025.