

# Mixed-Signal Edge Accelerator for Real-Time mmWave Platform Vibration Compensation

NIKHIL POOLE

JUNE 18, 2021

# Agenda

- Chip Overview
  - Bird's Eye View
  - Introduction
  - Chip Architecture
- Analog Front-End: Designs and Layouts
- Digital Processing Engine: Designs and Layouts
- Complete Mixed-Signal Chip

# Agenda

- **Chip Overview**
  - Bird's Eye View
  - Introduction
  - Chip Architecture
- Analog Front-End: Designs and Layouts
- Digital Processing Engine: Designs and Layouts
- Complete Mixed-Signal Chip

# Chip Overview

## Bird's Eye View



# Active mmWave Edge Perception: Real-Time Motion Compensation

## ❖ Motivation

- Inherent energy-performance tradeoff has plagued traditional edge perception/post-processing pipelines.
- mmWave sensors are often non-stationary, impeding sensing resolution/capacity (e.g. wearable devices).
- Objective: custom mmWave IC for **real-time, high-resolution active sensing on the edge**, providing 100+ TOPs performance + sub-Watt power consumption.

## ❖ Approach

- On-chip inertial sensor fusion for non-blind vibratory motion correction/signal deconvolution.
- Operate at the radar frame level (50-100 ms time windows of data) for **real-time correction**.



## ❖ Proposed Processing Pipeline



## ❖ Early Correction Results



# Chip Overview

## Introduction



# Problem/Motivation

- Recent years have witnessed several simultaneous advances in the realm of sensing and perception.
  - Design of capable, accurate, and dynamically configurable CMOS sensors.
  - Optimization of powerful edge processing engines.
  - Development of increasingly generalizable machine learning models.
- Up to now, however, energy-performance tradeoff has plagued edge perception attempts.
  - Tensor core GPUs and modern autonomous vehicles require 100s of W of power for their 10-100 TOPs performance; this power consumption is not feasible for wearable, battery-powered devices.
  - Exacerbated for 3D sensors (radar, LIDAR, etc.).
  - Memory access cost: edge processing of large amounts of data requires extensive memory use, which places a strict lower bound on energy consumption.
- Cannot offload all data to the cloud or central processing core.
  - Data transfer bottleneck + latency.
  - Want real-time processing, response, and adaptation.
- **Goal: real-time edge sensing providing 100+ TOPs performance with sub-Watt power consumption.**



# Real-Time mmWave Platform Motion Filtering

- Sensors are often non-stationary, so the radar platform is subject to arbitrary or deterministic motion that degrades its detection/tracking performance.
  - E.g. head-mount systems or wearable devices in motion capture applications.
  - How do we account for the effect of such motion on the RX signal returns ***in real time*** to generate a clean, deconvolved image of the environment?
- Simple example: vibration on the radar platform, consisting of a superposition of fundamental tones.
- **Sensor fusion for non-blind vibration correction/signal deconvolution.**
  - Active filtering and low-frequency denoising using simultaneous data from a high-resolution IMU to improve the radar/sensor response.
- Use an off-the-shelf TI AWR1843 mmWave FMCW radar sensor IC, with a DCA1000EVM used to extract the raw ADC LVDS data on each RX channel.



# Real-Time Motion Correction Concept



# Effect of Platform Vibration on Radar Returns

- The frequency spectrum exhibits pairs of  $f_v$ -Hz separated spectral lines given by the Bessel function expansion of the downconverted IF signal.
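
As a hedged sketch of where these line pairs come from, assume a single vibration tone displaces the platform along the line of sight by $x(t) = A_v \sin(2\pi f_v t)$. The symbols $A_v$ (vibration amplitude), $\lambda$ (carrier wavelength), $\beta$ (modulation index), and $f_b$ (beat frequency of a static target) are introduced here for illustration and do not appear in the slides. The round-trip phase modulation on the beat tone then expands via the Jacobi-Anger identity:

$$
s_{IF}(t) \propto e^{j 2\pi f_b t}\, e^{j \beta \sin(2\pi f_v t)} = \sum_{n=-\infty}^{\infty} J_n(\beta)\, e^{j 2\pi (f_b + n f_v) t}, \qquad \beta = \frac{4\pi A_v}{\lambda}
$$

Since $J_{-n}(\beta) = (-1)^n J_n(\beta)$, each order $n \geq 1$ contributes a pair of equal-magnitude lines at $f_b \pm n f_v$, which is the $f_v$-Hz spacing noted above.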



# Proposed Real-Time Frame-Level Correction Algorithm



# Agenda

- **Chip Overview**
  - Bird's Eye View
  - Introduction
  - **Chip Architecture**
- Analog Front-End: Designs and Layouts
- Digital Processing Engine: Designs and Layouts
- Complete Mixed-Signal Chip

# Chip Overview

## Chip Architecture



# Eventual Complete Chip Architecture



# IMU Processing Pipeline (Analog Front-End)

- Active biquad low-pass filter
  - Remove DC component from measured accelerations + high-frequency noise (assumes correct sensor orientation but eliminates need for an on-chip AHRS filter).
  - Second-order cascaded  $G_m$ -C filter; upper corner frequency tuned via an off-chip analog bias input.
- Amplifier
  - Programmable gain (controlled by two digital inputs).
  - Single-ended to differential converter.
- Frequency detection PLL
  - Continuous-time comparator at input.
  - Lock on to the primary vibration tone and generate an LPF DC output voltage proportional to the detected frequency.
  - Does not require fast settling.
- Envelope detector
  - Record the acceleration amplitude for Q-factor estimation.
- Sample/hold + ADCs
  - Digitize detected frequency and amplitude for IIR deconvolution filter construction.
  - For a frame-level processing window of 100 ms, sampling rate only needs to be 10 Hz.
- Multi-notch IIR deconvolution filter
  - Notch locations and Q-factor determined via detected parameters.



# Radar Data Processing Pipeline (DSP Engine)

- Store a window of 5-8 frames' worth of data for a single range bin and a single RX channel.
  - 255 chirps/frame, 16-bit I/Q data → 8 KB SRAM needed.
  - For the ideal radar accelerator, we would operate on all 256-512 range bins in parallel (requires at least a 2 MB SRAM), which would be possible if integrated on the same chip as the RF front end.
    - However, we are limited by I/Os, and a serialized input would require at least a 13 GB/sec high-speed serial link.
  - For this simple demonstration, therefore, we can just do one range bin at a time.
- Extract phase and apply an FIR HPF to eliminate the DC component.
- Complex FFT along slow-time (Doppler) dimension.
- Point-wise multiplication with deconvolution kernel.
- Complex IFFT to retrieve time-domain phasors.
- Unwrapped phase subtraction to restore Doppler velocity component.
- Serialized output: for a single range bin, this yields ~600 Mbps output serial link (doable with Skywater I/O specifications).
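
The buffer sizes quoted above can be checked with a short calculation. This is a sketch that assumes the upper end of the 5-8 frame window and reads "16-bit I/Q" as 16 bits each for I and Q (4 bytes per sample); the function name is hypothetical.

```python
# Sizing sketch for the slow-time buffer (one RX channel).
CHIRPS_PER_FRAME = 255      # from the slide
BITS_PER_COMPONENT = 16     # assumed: 16-bit I and 16-bit Q
BYTES_PER_SAMPLE = 2 * BITS_PER_COMPONENT // 8   # I + Q = 4 bytes


def buffer_bytes(frames: int, range_bins: int = 1) -> int:
    """Bytes needed to hold `frames` frames for `range_bins` range bins."""
    return frames * CHIRPS_PER_FRAME * BYTES_PER_SAMPLE * range_bins


single_bin = buffer_bytes(frames=8)                      # about 8 KB
all_bins_low = buffer_bytes(frames=8, range_bins=256)    # about 2.1 MB
all_bins_high = buffer_bytes(frames=8, range_bins=512)   # about 4.2 MB

print(single_bin, all_bins_low, all_bins_high)
```

Under these assumptions a single range bin needs 8160 bytes, consistent with the 8 KB figure, and 256 bins need roughly 2 MB.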



# Power Regulation + Timing

- Clock generation circuitry
  - Low-frequency oscillator.
  - Divided down for IMU sampling.
  - Frequency multiplier PLL for radar data collection.
- Power regulation circuitry
  - 3.3 V LDO
    - Initial IMU filtering and normalization
    - Raw ADC data input from radar.
  - 1.8 V bandgap-based reference
    - ADC reference voltage



# Chip Architecture for Current Tapeout

- Generate only the deconvolution kernel frequency response vector to be applied to the radar data off-chip.
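
One way to picture such a frequency response vector is as a product of second-order IIR notches at the detected vibration frequency and its harmonics, sampled on the Doppler FFT grid. The sketch below is not the chip's implementation: the biquad form, the slow-time sample rate, the FFT length, the number of harmonics, and the function names are all assumptions chosen for illustration.

```python
import cmath
import math


def notch_response(f: float, f0: float, q: float, fs: float) -> complex:
    """Frequency response at f (Hz) of a second-order IIR notch centred on f0."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)   # z^-1 on the unit circle
    num = 1.0 - 2.0 * math.cos(w0) * z1 + z1 * z1
    den = (1.0 + alpha) - 2.0 * math.cos(w0) * z1 + (1.0 - alpha) * z1 * z1
    return num / den


def kernel_vector(f_v, q, fs, n_bins, n_harmonics):
    """Product of notches at f_v, 2*f_v, ... sampled on an n_bins-point FFT grid."""
    kernel = []
    for k in range(n_bins):
        f = k * fs / n_bins
        h = 1.0 + 0.0j
        for n in range(1, n_harmonics + 1):
            h *= notch_response(f, n * f_v, q, fs)
        kernel.append(h)
    return kernel


# Hypothetical numbers: 2560 Hz slow-time rate, 256-point FFT, 30 Hz tone, Q = 5.
kernel = kernel_vector(f_v=30.0, q=5.0, fs=2560.0, n_bins=256, n_harmonics=3)
print([round(abs(h), 3) for h in kernel[:12]])
```

With these numbers the bins at 30, 60, and 90 Hz (and their negative-frequency images) are nulled, while DC passes with unity gain. The detected frequency would set `f_v` and the detected amplitude would inform `q`.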



# Agenda

- Chip Overview
  - Bird's Eye View
  - Introduction
  - Chip Architecture
- **Analog Front-End: Designs and Layouts**
- Digital Processing Engine: Designs and Layouts
- Complete Mixed-Signal Chip

# Analog Front-End Designs and Layouts



# Chip Architecture

Requires a high-gain fully differential OTA and a low-gain transconductor.



# Core Analog Circuit

## Fully Differential OTA



# Fully Differential OTA: Design

- Design goals:
  - High output impedance (driving primarily fF-range capacitive loads).
  - Wide input common-mode range spanning the maximum expected signal amplitude (at least 0.7 V - 1.5 V).
  - Open-loop gain: ~60 dB to achieve close to ideal behavior.
    - Consistent gain across input signal range.
    - Static closed-loop gain error: 0.1%.
  - Bias current: 10  $\mu$ A with  $g_m/I_D \sim 10$  S/A.
  - Bandwidth: not critical → aim for at least 5 kHz.
  - Unity-gain feedback phase margin: > 70 degrees.
  - Noise: not critical → aim for less than 10 mV<sub>rms</sub> total integrated output noise in circuit bandwidth.
  - Speed: not critical, since working at low frequencies.
    - 0.1% settling time: < 1  $\mu$ s
  - Robust across process (SS, FF, TT), voltage ( $V_{DD} \pm 10\%$ ), and temperature (-40 °C to 125 °C).
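
The link between the 60 dB gain target and the 0.1% static error target can be sketched with the standard feedback relation, error = 1 / (1 + A·β). The feedback factor `beta` and the assumption that one device carries the full 10 µA are illustrative choices, not values from the slides.

```python
def static_gain_error(open_loop_gain_db: float, beta: float = 1.0) -> float:
    """Fractional closed-loop gain error 1 / (1 + A*beta) for DC open-loop gain A."""
    a = 10.0 ** (open_loop_gain_db / 20.0)
    return 1.0 / (1.0 + a * beta)


def transconductance(gm_over_id: float, drain_current: float) -> float:
    """Small-signal g_m in siemens from a g_m/I_D target (S/A) and a drain current (A)."""
    return gm_over_id * drain_current


# 60 dB open-loop gain in unity feedback (beta = 1 is an assumption).
print(f"{100 * static_gain_error(60.0):.3f} %")
# g_m if a single device carried the full 10 uA (each half of a pair carries less).
print(f"{transconductance(10.0, 10e-6) * 1e6:.0f} uS")
```

In unity feedback, 60 dB of gain gives an error of about 0.1%. A feedback factor below 1 reduces the loop gain and raises the error accordingly.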

# Fully Differential OTA: Design

## Transconductor Core



# Fully Differential OTA: Design

## Bias Circuitry



# Fully Differential OTA: Design

- Fully differential design with common-mode feedback.
  - Output common mode selectable as an input, intended to be used from 0.7 V to 1.3 V.
  - CMF provides half the tail bias current.
- Folded cascode design to maximize input common-mode range.
- Load compensation used to attain desired phase margin (2 pF load → negligible area).
- Bias network consists of a series of Sooch cascode current mirrors.
- Transistor type and unit width/length matching.
- Capacitive neutralization has little effect (input  $C_{gd}$  does not contribute a significant pole in this case) → not included in final design.
- Long-channel lengths ( $4.8 \mu\text{m}$ ) to maximize gain at expense of speed/bandwidth.
- Bias current reference ( $i_{biasn}$ ):  $10 \mu\text{A}$ .
- Low  $V_T$  transistors used on the input and CMF differential pairs to achieve the desired input common-mode range.
- Bias network area estimate:  $1300 \mu\text{m}^2$ .
- Transconductor core area estimate:  $2900 \mu\text{m}^2$ .



# Fully Differential OTA: AC Response



# Fully Differential OTA: Input Common-Mode Sweep



# Fully Differential OTA: Temperature Sweep



# Fully Differential OTA: $V_{DD}$ Sweep

- Open-loop gain reduction at lower supply voltages due to cascode design.



# Fully Differential OTA: Noise Analysis

- Frequency range of interest: 1 Hz - 10 MHz (bandwidth does not exceed this).
- Total integrated input-referred noise:  $147.1 \mu\text{V}_{\text{rms}}$ .
- Total integrated output noise:  $14.8 \text{ mV}_{\text{rms}}$ .
- Maximum peak-to-peak output signal of 1.2 V yields a peak SNR (dynamic range) of 35.2 dB.
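
The quoted SNR can be reproduced numerically. A caveat on convention: the figure matches when 1.2 V is treated as a sinusoid amplitude (rms = 1.2 V / √2); reading 1.2 V strictly as peak-to-peak would give a value about 6 dB lower. The function below is a sketch under that inferred convention.

```python
import math


def peak_snr_db(v_signal: float, v_noise_rms: float) -> float:
    """SNR in dB, treating v_signal as a sinusoid amplitude (rms = v_signal / sqrt(2))."""
    return 20.0 * math.log10((v_signal / math.sqrt(2.0)) / v_noise_rms)


# 1.2 V signal against 14.8 mV rms integrated output noise.
print(round(peak_snr_db(1.2, 14.8e-3), 1))
```

The same convention also reproduces the 37.1 dB and 34.7 dB figures quoted for the 11.9 mV and 15.6 mV noise results elsewhere in the deck.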



# Fully Differential OTA: Transient Response

- 0.1% settling time:  $0.573 \mu\text{s}$ .
  - For a small-signal step.
- Closed-loop gain static error: 0.25%.
- Testbench circuit shown below.



# Fully Differential OTA: Summary

- All specifications listed at an input/output common-mode voltage of 1.1 V.

| Specification                  | Design Target | Achieved      |
|--------------------------------|---------------|---------------|
| Open-loop gain                 | 60 dB         | ~61 dB        |
| Static closed-loop gain error  | 0.1%          | 0.25%         |
| 0.1% settling time             | < 1 $\mu$ s   | 0.573 $\mu$ s |
| Output current                 | 10 $\mu$ A    | 10 $\mu$ A    |
| Input common-mode range        | 0.7 V - 1.5 V | 0.6 V - 1.8 V |
| Bandwidth                      | 5 kHz         | ~8.0 kHz      |
| Total integrated output noise  | 10 mV         | 14.8 mV       |
| Phase margin                   | > 70 degrees  | ~74.5 degrees |
| PVT Robustness                 | ✓             | ✓             |

# Fully Differential OTA: Layout

- Common-centroid and interdigitation techniques used for all transistors and compensation capacitors with proper shielding through the use of dummy devices.
- RC parasitic extraction post-simulation results match pre-layout specifications almost exactly.

NMOS input differential pair:  
(unit transistor: 3/1.2)

-ABBAABAA-  
-BAABBAAB-  
-BAABBAAB-  
-ABBAABAA-

PMOS cascode with bias circuitry (top):  
(unit transistor: 3/4.8)

-ADDEEAA|BEEEDDB-  
-CEEDDC|CCDDEEC-  
-BDDEEBB|AAEEDDA-

PMOS cascode with bias circuitry (bottom):  
(unit transistor: 3/4.8)

-AAA-----BBB-  
-EEE---CCC---DDD-  
-DDD---CCC---EEE-  
-BBB-----AAA-

NMOS cascode with bias circuitry:  
(unit transistor: 3/4.8)

KKKKKKKKK | KKKKKKKKK  
KKL---JJHG | GGJJJ---KKK  
-HHGGHHJJ | JJHGGGHH-  
-DAAACFFFF | FFCBBBEEBM  
-FCCCBEEEE | EEBAAADDA-  
-CFFCCCAD | DDDDABBEE-  
-EBBBADD | DDACCCFFC-  
-ADAAAABEE | EEEEBCCCF-  
-BEEBBBCF | FFFFCAAAD-  
-HHGGHHJJ | JJHGGGHH-  
KKK---JJJGG | GHJJ---LKK  
KKKKKKKKK | KKKKKKKKK

Common-mode feedback NMOS differential pairs:  
(unit transistor: 1/0.3)

-ABBADCCD-  
-DCCDABBA-

Common-mode feedback PMOS active loads:  
(unit transistor: 2/0.8)

-CDDCCDDC-  
-ABBAABAA-  
-BAABBAAB-  
-DCCDDCDC-

NMOS current mirror biasing:  
(unit transistor: 3/4.8)

-CDDC-  
-ABBA-  
-BAAB-  
-DCCD-

NMOS bias diode-connected device:  
(unit transistor: 3/4.8)

-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-

Compensation capacitors:  
(unit capacitor: 10x10)

-----  
-ABAB-  
-BABA-  
-ABAB-  
-BABA-  
-----



# Fully Differential OTA: Post-RC-Extraction AC Response



# Fully Differential OTA: Post-RC-Extraction Input Common-Mode Sweep



# Core Analog Circuit

## $G_m$ -C Filter Transconductor Stage



# $G_m$ -C Filter Transconductor Stage: Design

- Design goals:
  - Wide input common-mode range spanning the maximum expected signal amplitude (at least 0.7 V - 1.5 V).
  - Does not need high open-loop gain since the transconductor will be used as an integrator stage in the  $G_m$ -C filter.
    - However, need consistent gain across input signal range.
  - Off-chip capacitors will be used, so do not need to operate at unreasonably low currents.
    - Nevertheless, want low  $G_m$  to achieve a low cutoff frequency for the filter.
  - Bandwidth: not critical → aim for at least 1 kHz unloaded bandwidth to allow for further tuning of lower corner in actual filter.
  - Noise: not critical → aim for less than 10 mV<sub>rms</sub> total integrated output noise in circuit bandwidth.
  - Speed: not critical, since working at low frequencies.
  - Robust across process (SS, FF, TT), voltage ( $V_{DD} \pm 10\%$ ), and temperature (-40 °C to 125 °C).

# $G_m$ -C Filter Transconductor Stage: Design



# $G_m$ -C Filter Transconductor Stage: Design

- Fully differential design with common-mode feedback.
  - Output common mode selectable as an input, intended to be used from 0.7 V to 1.3 V.
  - CMF provides half of the tail bias current.
- Long-channel ( $4 \mu\text{m}$ ) current mirror bias transistors for high output impedance.
- All channel lengths  $\geq 1 \mu\text{m}$  to avoid short channel effects (bandwidth/speed not critical).



- Bias current reference:  $10 \mu\text{A}$ .
- Normal  $V_T$  transistors used everywhere to achieve consistent performance across the desired input common-mode range.
- Bias current generated from central biasing hub.
- Approximate area estimate:  $80 \mu\text{m}^2$ .

# $G_m$ -C Filter Transconductor Stage: AC Response



# $G_m$ -C Filter Transconductor Stage: Input Common-Mode Sweep



# $G_m$ -C Filter Transconductor Stage: Temperature Sweep



# $G_m$ -C Filter Transconductor Stage: $V_{DD}$ Sweep



# $G_m$ -C Filter Transconductor Stage: Noise Analysis

- Frequency range of interest: 1 Hz - 100 kHz (bandwidth does not exceed this).
- Total integrated input-referred noise:  $443.1 \mu\text{V}_{\text{rms}}$ .
- Total integrated output noise:  $11.9 \text{ mV}_{\text{rms}}$ .
- Maximum peak-to-peak output signal of 1.2 V yields a peak SNR (dynamic range) of 37.1 dB.



# $G_m$ -C Filter Transconductor Stage: Summary

- All specifications listed at an input common-mode voltage of 1.1 V and an output common-mode voltage of 1.1 V.

| Specification                                  | Design Target | Achieved      |
|------------------------------------------------|---------------|---------------|
| Open-loop gain                                 | Not critical  | ~42 dB        |
| Output current                                 | ~1 nA         | 1 nA          |
| Input common-mode range                        | 0.7 V - 1.5 V | 0.4 V - 1.5 V |
| Unloaded bandwidth                             | 1 kHz         | ~2.0 kHz      |
| Total integrated output noise (up to 100 kHz)  | 10 mV         | 11.9 mV       |
| Phase margin                                   | > 70 degrees  | ~76 degrees   |
| PVT Robustness                                 | ✓             | ✓             |

# $G_m$ -C Filter Transconductor Stage: Layout

- Common-centroid and interdigitation techniques used for all transistors and compensation capacitors.
- Design changed slightly to increase the operating current (improves mismatch) at the expense of gain (not critical).
- RC parasitic extraction post-simulation results match pre-layout specifications almost exactly.

NMOS differential pairs (input pair + both common mode feedback pairs):  
(unit transistor: 2/1)

-ABBAABBA-

NMOS current mirror biasing:  
(unit transistor: 1/4)

-CGGB--GGGA-  
-DGGG--EGGF-

PMOS loads/current mirror biasing:  
(unit transistor: 1/1)

-AB-C--DE--FG-



# $G_m$ -C Filter Transconductor Stage: Post-RC-Extraction AC Response



# $G_m$ -C Filter Transconductor Stage: Post-RC-Extraction Input Common-Mode Sweep



# Analog Processing Block

## Input SE-Differential Converter/Amplifier



# Input SE-Differential Converter/Amplifier: Design



# Input SE-Differential Converter/Amplifier: Design

- Single-ended input from off-chip analog IMU.
  - 0 V - 1.8 V output from the accelerometer.
  - Off-chip AC coupling/high-pass filter and input common-mode bias reference voltage.
  - Output common-mode voltage reference also an off-chip input.
- Gain-of-1 input stage translates the single-ended input into a differential signal to minimize noise/offsets during filtering.
- Maximum input amplitude into the chip: 0.5 V.
- Second stage is a variable-gain amplifier/attenuator.
  - Two digital inputs select between a gain of 0.5, 1, 2, or 4.
- Transmission gates also provide reset capability.
- Total area estimate:  $8000 \mu\text{m}^2$ .



Transmission gate



# Input SE-Differential Converter/Amplifier: Transient Response

- Source and feedback capacitor switches activated in a gain-of-2 configuration.



# Input SE-Differential Converter/Amplifier: Layout



# Analog Processing Block

## Active Low-pass $G_m$ -C Biquad Filter



# Active Low-Pass $G_m$ -C Biquad Filter: Design



# Active Low-Pass $G_m$ -C Biquad Filter: Design

- Input vibration frequency range: 10 Hz - 50 Hz
- Goal: filter out high-frequency noise in the 500 Hz - kHz range.
  - Final design uses higher operating currents ( $10 \mu\text{A}$ ) to minimize mismatch and effect of noise, with large off-chip timing capacitors.
  - Original design used low operating currents in the individual transconductor stages ( $\ll 1 \mu\text{A}$ ) and on-chip timing capacitors.
- Want a second-order response for increased roll-off.
- Total area estimate:  $16000 \mu\text{m}^2$ .



# Active Low-Pass $G_m$ -C Biquad Filter: AC Response

- Achieves cutoff frequency of  $\sim 600$  Hz, as desired.
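
To see what a ~600 Hz second-order corner means for the 10 Hz - 50 Hz vibration band, the sketch below evaluates an ideal low-pass biquad. The Butterworth quality factor (Q = 1/√2) is an assumption; the slides do not state the filter Q.

```python
import math


def lowpass2_gain_db(f: float, f0: float = 600.0, q: float = 1 / math.sqrt(2)) -> float:
    """Magnitude (dB) of H(s) = w0^2 / (s^2 + s*w0/Q + w0^2) at frequency f."""
    x = f / f0
    mag = 1.0 / math.sqrt((1.0 - x * x) ** 2 + (x / q) ** 2)
    return 20.0 * math.log10(mag)


for f in (10.0, 50.0, 600.0, 6000.0):
    print(f"{f:7.0f} Hz: {lowpass2_gain_db(f):7.2f} dB")
```

Under this assumption the vibration band passes essentially unattenuated, the corner sits at -3 dB, and the response falls at 40 dB per decade above it.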



# Active Low-Pass $G_m$ -C Biquad Filter: Layout



# Chip Architecture

Continuous-time comparator to convert varying amplitude signal into square wave (with jitter).



# Core Analog Circuit

## Continuous-Time Comparator



# Continuous-Time Comparator: Design



# Continuous-Time Comparator: Design

- Goal: convert the vibration signal into a square wave that can be passed into the PLL frequency detector.
- Design targets:
  - Relatively low input offset voltage ( $< 1 \text{ mV}$ ): the effective offset also depends on noise, so it does not need to be precise.
  - Reasonable response time (not critical given the input frequency).
- Differential input, single-ended output with a bias current of  $10 \mu\text{A}$ .
  - Limiting the operating current saves power.
- Uses internal hysteresis to avoid bouncing.
- Output buffer to increase the gain.
- Bias current from same source as OTAs.
- Total area estimate:  $40 \mu\text{m}^2$ .



# Continuous-Time Comparator: AC Response



# Continuous-Time Comparator: Transient Response



**Response time: ~20 ns.**



# Continuous-Time Comparator: DC Response



# Continuous-Time Comparator: Summary

| Specification         | Design Target | Achieved   |
|-----------------------|---------------|------------|
| Open-loop gain        | Very high     | ~100 dB    |
| Input offset voltage  | < 1 mV        | ~5 $\mu$ V |
| Bias current          | 10 $\mu$ A    | 10 $\mu$ A |
| Response time         | < 1 $\mu$ s   | ~20 ns     |
| 10-90% rise/fall time | < 1 $\mu$ s   | ~1 ns      |

# Continuous-Time Comparator: Layout

NMOS input differential pair:  
(unit transistor: 1/1)

-ABBAABBA-

NMOS current mirror biasing:  
(unit transistor: 1/4)

-AB|BA-  
-BA|AB-

NMOS differential-to-single-ended conversion mirror:  
(unit transistor: 1/1)

-BAAB-

PMOS latch:  
(unit transistor: 1/1)

-CABD|CABD-  
-DBAC|DBAC-

PMOS output stage:  
(unit transistor: 1/1)

-BA|AB-  
-AB|BA-

Output inverter: single PMOS and NMOS.



# Chip Architecture

Requires a differential input, single-ended output OTA.



# Core Analog Circuit

## Single-Ended Output OTA



# Single-Ended Output OTA: Design

- Design goals:
  - High output impedance (driving primarily fF-range capacitive loads).
  - Wide input common-mode range spanning the maximum expected single-ended input signal amplitude (at least 0.7 V - 1.2 V).
  - Open-loop gain: ~60 dB to achieve close to ideal behavior.
    - Static closed-loop gain error: 0.1%.
  - Bias current:  $10 \mu\text{A}$  with  $g_m/I_D \sim 10 \text{ S/A}$ .
  - Bandwidth: not critical → aim for at least 1 kHz.
  - Unity-gain feedback phase margin: > 70 degrees.
  - Noise: not critical → aim for less than  $10 \text{ mV}_{\text{rms}}$  total integrated output noise in circuit bandwidth.
  - Speed: not critical, since working at low frequencies.
    - 0.1% settling time:  $< 1 \mu\text{s}$
  - Robust across process (SS, FF, TT), voltage ( $V_{DD} \pm 10\%$ ), and temperature (-40 °C to 125 °C).

# Single-Ended Output OTA: Design



# Single-Ended Output OTA: Design

- Differential input, single-ended output.
- Folded cascode design to maximize input common-mode range.
  - Input common-mode range also enhanced via complementary input differential pairs.
- Load compensation used to attain desired phase margin (5 pF load → reasonable area).
  - Layout uses stacked MiM capacitors on the metal 3, metal 4, and metal 5 layers to double the capacitance per unit area ( $2 \text{ fF}/\mu\text{m}^2$ ).
- Bias network consists of a series of Sooch cascode current mirrors.
- Transistor type and unit width/length matching.
- Long-channel lengths ( $4.8 \mu\text{m}$ ) to maximize gain at expense of speed/bandwidth.
- Bias current reference:  $10 \mu\text{A}$ .
- Low  $V_T$  transistors used on the upper PMOS bias transistors and differential pair to achieve the desired input common-mode range.
- Bias current generated from central biasing hub.
- Bias network area estimate:  $1700 \mu\text{m}^2$ .
- Transconductor core area estimate:  $2500 \mu\text{m}^2$ .



# Single-Ended Output OTA: AC Response



# Single-Ended Output OTA: Input Common-Mode Sweep



# Single-Ended Output OTA: Temperature Sweep



# Single-Ended Output OTA: $V_{DD}$ Sweep



# Single-Ended Output OTA: Noise Analysis

- Frequency range of interest: 1 Hz - 10 MHz (bandwidth does not exceed this).
- Total integrated input-referred noise:  $538.4 \mu\text{V}_{\text{rms}}$ .
- Total integrated output noise:  $15.6 \text{ mV}_{\text{rms}}$ .
- Maximum peak-to-peak output signal of 1.2 V yields a peak SNR (dynamic range) of 34.7 dB.



# Single-Ended Output OTA: Transient Response

- 0.1% settling time:  $1.852 \mu\text{s}$ .
  - For a small-signal step.
- Closed-loop gain static error: 1.60%.



# Single-Ended Output OTA: Summary

- All specifications listed for the input common-mode voltage range of 0.7 V - 1.4 V.

| Specification                  | Design Target | Achieved      |
|--------------------------------|---------------|---------------|
| Open-loop gain                 | 60 dB         | ~60 dB        |
| Static closed-loop gain error  | 1.00%         | 1.60%         |
| 0.1% settling time             | < 10 $\mu$ s  | 1.852 $\mu$ s |
| Bias current                   | 10 $\mu$ A    | 10 $\mu$ A    |
| Input common-mode range        | 0.7 V - 1.4 V | 0.7 V - 1.4 V |
| Bandwidth                      | 1 kHz         | ~1.5 kHz      |
| Total integrated output noise  | 10 mV         | 15.6 mV       |
| Phase margin                   | > 60 degrees  | >60 degrees   |
| PVT Robustness                 | ✓             | ✓             |

# Single-Ended Output OTA: Post-RC-Extraction AC Response



# Single-Ended Output OTA: Post-RC-Extraction Input Common-Mode Sweep



# Single-Ended Output OTA: Layout

- Stacked MiM cap used for large compensation capacitance (5 pF) with proper shielding via dummies.

NMOS input differential pair:  
(unit transistor: 3/4.8)

-ABBAABAA-  
-BAABBAABB-  
-BAABBAABB-  
-ABBAABAA-

PMOS input differential pair:  
(unit transistor: 3/4.8)

-AB-  
-BA-  
-BA-  
-AB-

PMOS cascode biasing (top):  
(unit transistor: 3/4.8)

-ADD--AA|BB--DDB-  
-C--DDDC|CCDD--C-  
-BDD--BB|AA--DDA-

PMOS cascode circuitry (top):  
(unit transistor: 3/4.8)

-BAABBA|ABBAAB-  
-ABBAAB|BAABBA-

PMOS cascode with bias circuitry (bottom):  
(unit transistor: 3/4.8)

-AAA---E---BBB-  
-EEE---CCC---DDD-  
-DDD---CCC---EEE-  
-BBB---D---AAA-

NMOS cascode with bias circuitry:  
(unit transistor: 3/4.8)

-GGGCCBBGG|GGBBCGGC-  
-BCCGGGGBB|BGGGGCCG-  
HDAACFFFF|FFCBBBEEBJ  
-FCCCBEEEE|EEBAAADDA-  
-CFCCCCADD|DDDDABBBE-  
-EBBADDDD|DDACCCFFC-  
-ADAAABEE|EEEEEBCCF-  
-BEEBBBCFF|FFFFCAAADH  
-GCCGGGGBB|BGGGGCCB-  
-CGGCCBBGG|GGBBCGGG-

NMOS current mirror biasing:  
(unit transistor: 3/4.8)

-ABBAABBA-

NMOS bias diode-connected device:  
(unit transistor: 3/4.8)

-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-  
-AAAAAAA-

Compensation capacitor:  
(unit capacitor: 24x26)

--  
-AA-  
-AA-  
--



# Chip Architecture

Detect vibration frequency from input square wave.



# Analog Processing Block

## Low-Frequency Phase-Locked Loop



# Low-Frequency PLL: Design

- Input vibration frequency range: 10 Hz - 50 Hz.
- PLL purpose: estimate the frequency of the vibration signal for later digitization and incorporation into the deconvolution kernel estimator.
- Design goals:
  - Need to be able to lock on to a frequency in this range → challenging in a CMOS IC given the necessity for a low loop bandwidth (i.e., a long loop filter time constant).
    - Also challenging to verify given long transient simulation times!
  - We expect very infrequent large steps in the vibration frequency, so the settling time of the PLL loop in response to large steps does not need to be fast (about a second is sufficient).
  - Primary purpose is to eliminate any jitter in the incoming frequency waveform to be able to accurately estimate the frequency of the vibration.
    - Need to track only low-frequency changes in signal frequency. If we can verify that the loop settles in response to large steps, then this target goal is satisfied.

# Low-Frequency PLL: Design

## Ring Oscillator



## Frequency Divider



Current-Starved Inverter



Phase-Frequency Detector / Charge Pump / Low-Pass Filter (PFD-CP-LPF)

Stanford University

# Current-Starved Ring Oscillator: Initial Design

- Uses a current-starved inverter as the primary stage.
  - Oscillation frequency is thus easily controlled by the input control voltage, which biases both the upper PMOS and lower NMOS transistors.
- Output inverters sharpen the edges of the output waveform (shorter rise/fall times).



# Current-Starved Ring Oscillator Initial Design: Linearity

- Original design used longer channel lengths for the inverters with lower currents.
- Fewer frequency divider stages needed.
- Linear range: 0.7 V - 1.0 V.



# Current-Starved Ring Oscillator: Final Design

- Final design increases the current to minimize mismatch (higher oscillation frequency) at the expense of more frequency divider stages in the PLL feedback loop.
- pF-magnitude MiM capacitor included after each stage to lower the oscillator frequency.
- Transistors stacked to achieve long channel lengths (unit transistor length of  $2 \mu\text{m}$ ).



# Current-Starved Ring Oscillator Stage: Layout

- Linear range: 0.7 V - 1.1 V.



# Current-Starved Ring Oscillator: Layout



# Frequency Divider: Design



# Frequency Divider: Transient Analysis



# Frequency Divider: Layout



# Phase-Frequency Detector / Charge Pump / Low-Pass Filter (PFD-CP-LPF): Initial Design

- Charge pump controlled via the output signals from the PFD.
- Pump current of 10 nA and a synthesized large capacitance ensure a low loop bandwidth (long loop time constant).
- Includes a capacitance multiplier for the loop filter to synthesize the large capacitance, as well as series resistor compensation to improve the phase margin.
  - Given a resistor ratio of  $K$ , the total capacitance seen at the negative terminal is  $K \cdot C_2$ .
  - Design uses  $K = 10^5$ .
- Reduces leakage via a transmission gate between capacitance multiplier terminals.
- Charge injection minimized since output is taken at op-amp negative terminal branch.
- Switched diodes mitigate leakage through the charge pump switches.
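
To get a feel for the scale involved, the sketch below combines the stated resistor ratio and pump current with a hypothetical physical capacitor. The 10 pF value for $C_2$ is an assumption made only for illustration, as are the function names.

```python
def effective_capacitance(k: float, c2: float) -> float:
    """Capacitance synthesised by the multiplier: K times the physical C2."""
    return k * c2


def control_slew(i_pump: float, c_eff: float) -> float:
    """Control-voltage ramp rate (V/s) while the pump sources i_pump into c_eff."""
    return i_pump / c_eff


K = 1e5            # resistor ratio from the slide
I_PUMP = 10e-9     # 10 nA pump current from the slide
C2 = 10e-12        # hypothetical 10 pF on-chip capacitor (not from the slides)

c_eff = effective_capacitance(K, C2)
print(f"C_eff = {c_eff * 1e6:.1f} uF, slew = {control_slew(I_PUMP, c_eff) * 1e3:.1f} mV/s")
```

With these numbers the loop filter behaves like a microfarad-scale capacitor, and the control voltage moves on the order of millivolts per second. That slow response is what lets the loop follow tones in the tens of hertz.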



# Phase-Frequency Detector / Charge Pump / Low-Pass Filter (PFD-CP-LPF): Initial Design Transient Response



# Phase-Frequency Detector / Charge Pump / Low-Pass Filter (PFD-CP-LPF): Final Design

- Final design eliminates the capacitance multiplier, requiring a larger timing capacitance for the charge pump (nF range).
- Final design also increases the bias current to minimize the effect of mismatch.
- The series resistor provides a compensation LHP zero to improve stability. The parallel capacitor damps the control-voltage spikes that occur each time current is injected into the loop filter (charge injection and clock feedthrough from the charge pump switches); at one-tenth the size of the charge pump capacitor, it is not large enough to create a low-frequency pole detrimental to stability.
- Entire RC compensation LPF network will be placed off-chip for ease of tuning (and chip area reduction).



# Phase-Frequency Detector / Charge Pump / Low-Pass Filter (PFD-CP-LPF): Layout

NMOS current switch: discrete transistor.

NMOS anti-leakage diodes:  
(unit transistor: 1/1)

-A--B-

PMOS current switch: discrete transistor.

PMOS anti-leakage diodes:  
(unit transistor: 2/1)

-A--B-

PMOS current mirror:  
(unit transistor: 1/4)

-ABBAABBA-  
-BAABBAAB-

NMOS current mirror:  
(unit transistor: 1/4)

-CCCA-CC-  
-CC-BCCC-



# Low-Frequency PLL: Initial Design

## Ring Oscillator



Current-Starved Inverter

## Frequency Divider



Phase-Frequency Detector / Charge Pump / Low-Pass Filter (PFD-CP-LPF)


# Low-Frequency PLL: Initial Design Transient Response



# Low-Frequency PLL: Final Design

## Ring Oscillator



## Frequency Divider



Current-Starved Inverter



Phase-Frequency Detector / Charge Pump / Low-Pass Filter (PFD-CP-LPF)


# Low-Frequency PLL: Final Design Transient Response



# Low-Frequency PLL: Layout



# Chip Architecture

Detect magnitude of vibration for deconvolution kernel quality factor estimation. Requires a differential-to-single-ended converter at the input. The peak detector reset pulse is generated by a pulse generator circuit.



# Analog Processing Block

## Differential-to-Single-Ended Converter



# Differential-to-Single-Ended Converter: Design

- Capacitive feedback provides a gain-of-1 differential to single-ended conversion.
- pF-magnitude MiM capacitors used (reasonable area).
- Transmission gates provide reset functionality.



# Differential-to-Single-Ended Converter: Layout

- Layout uses matched (common-centroid) MiM capacitors for the two pairs on the positive and negative terminals.



# Core Analog Circuit

## Peak Detector Reset Pulse Generator Circuit



# Peak Detector Reset Pulse Generator Circuit: Design

- Peak detector reset is triggered by a falling edge on the ADC sample signal, hence the active-low trigger input to the circuit (*trigb*).
- The output pulse is two clock cycles wide and starts two clock cycles after the input trigger edge.
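
The timing above can be illustrated with a small cycle-level behavioral model. This is a sketch only: the function name `reset_pulse` and the convention that the delay is counted from the cycle in which the falling edge is first sampled are assumptions made for illustration, not the gate-level implementation of the circuit.

```python
def reset_pulse(trigb_samples, delay=2, width=2):
    """Cycle-level model; trigb_samples[i] is trigb as sampled at clock cycle i."""
    pulse = [0] * len(trigb_samples)
    prev = 1  # assumption: the active-low trigger idles high
    for i, cur in enumerate(trigb_samples):
        if prev == 1 and cur == 0:  # falling edge first seen at cycle i
            start = i + delay
            stop = min(start + width, len(pulse))
            for k in range(start, stop):
                pulse[k] = 1
        prev = cur
    return pulse


trigb = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
print(reset_pulse(trigb))  # [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

In this model the falling edge is seen at cycle 2, so the reset pulse is high during cycles 4 and 5.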



# Peak Detector Reset Pulse Generator Circuit: Layout

- Simple standard cell layout.



# Analog Processing Block

## Peak/Envelope Detector Circuit



# Peak/Envelope Detector Circuit: Design

- Uses a current mirror as the rectifier element.
- An output buffer improves the drive capability given a variable capacitive load.
- A reset input clears the peak detector before each new signal amplitude measurement.



# Peak/Envelope Detector Circuit: Transient Response



# Peak/Envelope Detector Circuit: Layout



# Peak/Envelope Detector Circuit: Post-RC-Extraction Transient Response



# Chip Architecture

Sample the detected frequency and amplitude.



# Analog Processing Block

## Sample-and-Hold Circuit



# Sample-and-Hold Circuit: Design

- Uses an open-loop design.
- Includes clock feed-through and charge injection cancellation.
  - The  $C_{\text{hold}}$  -  $\text{clk}$  switch combination is included on both input terminals to achieve this cancellation.
  - The cancellation is not exact, since the impedances seen looking into the two terminals differ (the negative-terminal capacitor is not grounded), but it yields sufficient performance.



# Sample-and-Hold Circuit: Transient Response



# Sample-and-Hold Circuit: Layout

Sample transistors:

-AA--BBBB--AA-  
-BB--AAAA--BB-

Compensation capacitors:  
(unit capacitor: 10x10)

-----  
-AB-  
-BA-  
-BA-  
-AB-  
-----



# Chip Architecture

Requires a latched comparator.



# Core Analog Circuit

## Latched Comparator



# Latched Comparator: Design

- Two levels of regenerative latching for optimal response time.
  - Using a folded pre-amplifier input stage increases the power consumption in the latching stage but yields a wider common-mode input range on the lower end of the voltage spectrum.
  - Pre-amplifier stage provides added gain to reduce the input offset voltage.
- Comparator value is latched at the rising clock edge and reset at the falling edge.
- Input differential pair sized with a larger  $W \cdot L$  (gate area) to reduce the input-referred offset voltage.
- Charge injection and clock feedthrough effects mitigated with output buffer stages.



# Latched Comparator: Characteristics

- Falling edge  $clk \rightarrow Q$  delay: 0.84 ns.
  - Measured for a 5 mV differential input step.
- Rising edge  $clk \rightarrow Q$  delay: 1.05 ns.
  - Measured for a 5 mV differential input step.
- Input offset voltage < 0.1 mV.
  - Approximated via transient simulation at a common-mode input voltage of 0.9 V.
- Input common-mode voltage upper bound: 1.45 V.
  - Appropriate given the known range of analog output levels from the PLL and peak detector circuits (< 1.45 V).



# Latched Comparator: Transient Response



# Latched Comparator: Summary

| Specification                          | Design Target | Achieved   |
|----------------------------------------|---------------|------------|
| Input offset voltage                   | < 1 mV        | < 0.1 mV   |
| Bias current                           | 10 $\mu$ A    | 10 $\mu$ A |
| Falling edge $clk \rightarrow Q$ delay | < 1 $\mu$ s   | 0.84 ns    |
| Rising edge $clk \rightarrow Q$ delay  | < 1 $\mu$ s   | 1.05 ns    |
| 10-90% rise/fall time                  | < 1 $\mu$ s   | < 1 ns     |
| Input common-mode range upper bound    | > 1.2 V       | 1.45 V     |

# Latched Comparator: Layout

- Symmetric layout used to ensure matching of the positive and negative latch nodes.
- NMOS regenerative latching stage: uses a symmetric layout with an equal number of dummy devices for both the positive (*latchp*) and negative (*latchm*) latch nodes to ensure equal capacitance loading.
- PMOS reset transistor: odd number of fingers to ensure an equal number of dummies adjacent to *latchp* and *latchm*.
- Clock line isolated from the sensitive comparator latch nodes to mitigate clock feedthrough capacitance.
- Guard rings to mitigate coupling between digital lines and analog circuitry.

PMOS input differential pair:  
(unit transistor: 2/1)

-ABBAABBA-  
-BAABBAAB-

PMOS regenerative latch (with clocked reset transistors):  
(unit transistor: 2/1)

-CBABC-  
-BCDBA-

-CBBCD-  
-BCCBA-

PMOS current mirror bias:  
(unit transistor: 1/2)

-BAAB-

NMOS regenerative latch: ensure symmetric dummies for both transistors to yield equal capacitive loading on *vLatchm* and *vLatchp*.  
(unit transistor: 1/0.3)

-BA--AB-



# Chip Architecture

Digitize the estimated frequency and magnitude before passing into the digital IIR deconvolution kernel estimator.



# Analog Processing Block

## 8-Bit SAR ADC



# 8-Bit SAR ADC: Design



SAR Control Logic  
(Synthesized Verilog RTL)

## 8-Bit Binary-Weighted Capacitive DAC



Sample-and-Hold

# Analog Processing Sub-Circuit

## 8-Bit Binary-Weighted Capacitive DAC



# 8-Bit Binary-Weighted Capacitive DAC: Design

- Analog muxes select between the low-level (0.7 V), the reference level (1.2 V), and the analog input voltage from either the PLL or the peak detector circuit.
  - Low-level and reference voltages provided from off-chip circuitry.
- LSB capacitance value:  $\sim 0.1 \text{ pF}$  ( $7 \mu\text{m} \times 7 \mu\text{m}$  unit capacitor).
  - $\rightarrow$  MSB capacitance value:  $\sim 12.8 \text{ pF}$ .
  - Reasonable area.
- Buffer added on the common terminal to eliminate switching feedthrough to the comparator input (magnifies offset errors).
- Latched comparator used to compare the common terminal voltage with the low-level voltage.
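
The MSB value follows directly from the binary weighting of the array. The short calculation below is a sketch that assumes ideal power-of-two scaling of the ~0.1 pF unit capacitor; the names `C_UNIT_PF` and `caps_pf` are illustrative, and any terminating or dummy capacitors are not included in the total.

```python
C_UNIT_PF = 0.1  # approximate LSB (unit) capacitance quoted above
N_BITS = 8

# Bit k of an ideal binary-weighted array uses 2**k unit capacitors.
caps_pf = [C_UNIT_PF * (1 << bit) for bit in range(N_BITS)]
total_pf = sum(caps_pf)

print(f"MSB capacitor: {caps_pf[-1]:.1f} pF")     # MSB capacitor: 12.8 pF
print(f"Sum of bit capacitors: {total_pf:.1f} pF")  # Sum of bit capacitors: 25.5 pF
```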



# 8-Bit Binary-Weighted Capacitive DAC: Transient Response

- Example simulation: analog input voltage of 1.1 V.
- Given a reference of 1.2 V and a low level of 0.7 V, this yields an 8-bit ADC output of 11001100.
- Noise must also be considered, though it should only affect the least significant bit of the converted value → not an issue given the low frequency resolution of the IIR filter and Q-factor determination.
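
The expected code for this example can be reproduced with an ideal successive-approximation search. This is a minimal sketch, assuming a noise-free comparator and a DAC level of `v_low + code * LSB`; the function name `ideal_adc_code` and the comparison polarity are illustrative assumptions and may differ from the switching scheme of the actual circuit.

```python
def ideal_adc_code(v_in, v_low=0.7, v_ref=1.2, n_bits=8):
    """Ideal SAR search: trial each bit MSB-first and keep it if the DAC level <= v_in."""
    lsb = (v_ref - v_low) / (1 << n_bits)
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if v_low + trial * lsb <= v_in:
            code = trial
    return code


print(format(ideal_adc_code(1.1), "08b"))  # 11001100
```

With a 0.5 V span, one LSB is about 1.95 mV, and (1.1 V − 0.7 V) / 1.95 mV ≈ 204.8, which truncates to 204 = 11001100.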



# 8-Bit Binary-Weighted Capacitive DAC: Layout

- Common centroid configuration used for the capacitor array.
- Dummy unit capacitors on the perimeter.

Capacitor array layout:

```
- - - - -  
- H H H H H H H H H H H H H H H H H H H H  
- H H H H H H H H H H H H H H H H H H H H  
- H H H H H G G G G G G G G H H H H H H H  
- H H H G G G G G G G G G G G H H H H H H H  
- H H G G F F F F F F F F G H H H H H H H  
- H H G G F F E E E E F F G G H H H H H H  
- H H G G F E C D D C E F G G H H H H H H  
- H H G G F E D A B D E F G G H H H H H H  
- H H G G F E D B A D E F G G H H H H H H  
- H H G G F E C D D C E F G G H H H H H H  
- H H G G F F E E E E F F G G H H H H H H  
- H H H G F F F F F F F F G G H H H H H H  
- H H H G G G G G G G G G G H H H H H H H H  
- H H H H G G G G G G H H H H H H H H H H H H  
- H H H H H H H H H H H H H H H H H H H H H H  
- - - - -
```



# Digital Processing Sub-Circuit

## 8-Bit SAR ADC Controller



# 8-Bit SAR ADC Digital Controller

- Implemented in Verilog.
- Inputs:
  - *clk, rst\_n, adc\_start, comparator\_val*
- Outputs:
  - *run\_adc\_n, adc\_val[7:0], out\_valid*
- On reset: DAC select bits are reset to 0 and the run\_adc\_n line is pulled high, thereby switching all DAC capacitor nodes to the analog input.
- Moore state machine with 4 states.
  - SAMPLE state: sampling the analog input, DAC select bits are set to 0.
  - RESET state: run\_adc\_n is brought low to activate the ADC conversion process; the DAC mask is initialized with the highest bit activated.
  - ADC\_CONV state: DAC select bits are updated according to the input comparator value and the DAC mask is right shifted for each conversion step.
    - This state requires 8 cycles.
  - END\_CONV state: valid output signal is activated and DAC mask is reset.
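
The state sequence can be sketched as a behavioral model. This is an illustration under stated assumptions rather than the synthesized RTL: the comparator is modeled as an integer comparison against a hypothetical `target` code, one state transition occurs per clock edge, and the variable names (`dac_sel`, `mask`) are chosen for readability.

```python
from enum import Enum


class State(Enum):
    SAMPLE = 0
    RESET = 1
    ADC_CONV = 2
    END_CONV = 3


def run_conversion(target, n_bits=8):
    """Return (adc_val, cycles); cycles counts clock edges until END_CONV is entered."""
    state, dac_sel, mask, cycles = State.SAMPLE, 0, 0, 0
    while state is not State.END_CONV:
        cycles += 1  # one clock edge per loop iteration
        if state is State.SAMPLE:
            dac_sel, state = 0, State.RESET
        elif state is State.RESET:
            mask = 1 << (n_bits - 1)  # start with the highest bit
            dac_sel, state = mask, State.ADC_CONV
        else:  # ADC_CONV
            comparator_val = target >= dac_sel  # simulated comparator
            if not comparator_val:
                dac_sel &= ~mask  # reject the trial bit
            mask >>= 1
            if mask:
                dac_sel |= mask  # trial the next lower bit
            else:
                state = State.END_CONV  # out_valid would be asserted here
    return dac_sel, cycles


print(run_conversion(204))  # (204, 10)
```

Under these assumptions the model spends one edge in SAMPLE, one in RESET, and eight in ADC\_CONV, so END\_CONV is reached after 10 clock edges.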

# 8-Bit SAR ADC Digital Controller: Testbench

- Randomizes the analog input (represented in digital form).
- Simulates the comparator by comparing the current ADC output to the “analog” input value.
- Instantiates the SAR ADC module.
- Waits for the conversion to finish (*out\_valid* signal goes high 10 clock cycles after the start of the conversion cycle), then verifies the result with respect to the input value.

# 8-Bit SAR ADC Digital Controller: Layout



- Vertical power stripes routed in M5 and horizontal power stripes routed in M4.
- 2:1 aspect ratio used for the SAR ADC controller layout to fit nicely next to the corresponding analog DAC/comparator/sample-and-hold network for both the frequency and amplitude detection circuit pipelines.
- Layout was implemented separately from the rest of the digital processing unit to ensure that the controller blocks could be placed directly next to the SAR ADC analog blocks.
- Final size: 0.1 mm x 0.08 mm.

# Core Analog Circuit

## Bias Current Distribution



# Bias Current Distribution Circuit: Design + Layout



- Provides bias currents for each of the blocks in the chip.
  - All bias currents are sourced from two bias voltage inputs, one on the PMOS side and one on the NMOS side.



PMOS (unit transistor: 6/4) :

-AABBCCDDEEFF-  
-FFEEDDCCBAA-  
-GGHIIJJKKLL-  
-LLKKJJIIHHGG-  
-MMNNOOONNMM-

NMOS (unit transistor: 2/2) :

-ABBA-  
-BAAB-

# Analog Front-End Top-Level



# Analog Front-End: Design



# Top Level Analog Layout

- FINAL ANALOG AREA: 0.6 mm x 1.2 mm
- I/O pins routed manually to the harness.



# Agenda

- Chip Overview
  - Bird's Eye View
  - Introduction
  - Chip Architecture
- Analog Front-End: Designs and Layouts
- **Digital Processing Engine: Designs and Layouts**
- Complete Mixed-Signal Chip

# Digital Processing Engine

## Top-Level



# Top-Level Digital Processing Unit

- Consists of:
  - Two ADC controllers (one for the signal frequency ADC and one for the signal amplitude ADC).
  - Four groups of four 2 KB SRAMs.
    - SRAM group #1: stores phase/frequency vector
      - Unsigned 16-bit values, normalized to a  $[0, 2\pi)$  scale.
    - SRAM group #2: stores transfer function coefficients for various combinations of input signal amplitude and frequency.
      - The 12-bit address for this SRAM is constructed using the top 3 bits of the 8-bit amplitude value, followed by the top 7 bits of the 8-bit frequency value, followed by a 2-bit code indicating the transfer function coefficient.
      - 16-bit signed fixed-point data (13 fractional bits).
    - SRAM group #3: stores the computed deconvolution kernel magnitude data.
      - 16-bit unsigned fixed-point values (13 fractional bits).
    - SRAM group #4: stores the computed deconvolution kernel phase data.
      - Unsigned 16-bit values, normalized to a  $[0, 2\pi)$  scale.
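
The address construction and number formats above can be sketched as follows. The helper names are hypothetical, and the bit ordering (amplitude bits in the most significant address positions, coefficient code in the least significant) is an assumption based on the order in which the fields are listed.

```python
import math


def coeff_address(amplitude, frequency, coeff_code):
    """Pack {amplitude[7:5], frequency[7:1], coeff_code[1:0]} into a 12-bit address."""
    assert 0 <= amplitude < 256 and 0 <= frequency < 256 and 0 <= coeff_code < 4
    return ((amplitude >> 5) << 9) | ((frequency >> 1) << 2) | coeff_code


def to_q13(value):
    """Encode a real value as 16-bit two's complement with 13 fractional bits."""
    raw = round(value * (1 << 13))
    assert -(1 << 15) <= raw < (1 << 15), "out of range for a signed 16-bit word"
    return raw & 0xFFFF


def phase_to_u16(radians):
    """Map an angle onto the unsigned 16-bit scale covering [0, 2*pi)."""
    turns = (radians % (2 * math.pi)) / (2 * math.pi)
    return round(turns * 65536) % 65536


print(hex(coeff_address(0xFF, 0xFF, 3)))  # 0xfff
print(hex(to_q13(1.0)), hex(to_q13(-1.0)))  # 0x2000 0xe000
print(phase_to_u16(math.pi))  # 32768
```

A 12-bit address with 16-bit words corresponds to 4096 × 2 bytes = 8 KB, matching one group of four 2 KB SRAMs.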

# Top-Level Digital Processing Unit

- (continued):
  - SRAM interface for each memory block controls the input and output enables, read/write address, and read/write data flow.
    - Each SRAM can be read out sequentially for debugging, while SRAMs 1 and 2 can be loaded sequentially with external data.
    - SRAMs 1 and 2 are read, while SRAMs 3 and 4 are written during the kernel estimation process.
  - IIR deconvolution kernel estimator pipeline state machine.
    - Uses CORDIC units to compute the frequency response.
  - State machine controller: responds to trigger inputs from the ADC controllers and controls the timing and enables for each of the other blocks.
  - Input deserializer: deserializes the input data for loading SRAMs 1 and 2.
  - Output serializer: serializes the output data (select lines determine whether the read data from SRAM 1, 2, 3, or 4, is streamed out).
  - Debug read out and initial data writing/loading is done in “burst-mode”: the controller automatically increments the read/write addresses at the appropriate times (i.e. when the read or write enable signal is triggered).

# Top-Level Digital Processing Unit: Testbench

- The phase/frequency vector SRAM and transfer function coefficient SRAM are first populated with data by streaming in values via the input deserializer.
  - This data is generated in MATLAB and read in from a text file.
- Stimuli for the testbench include the ADC data, clock and reset signals, ADC done trigger signals, SRAM select signals, and load and debug signals.
- After loading the input data, the frequency evaluation is triggered and the testbench waits until the *freq\_eval\_done* signal goes high.
- At this point, the output magnitude and phase data have been written to SRAMs 3 and 4, respectively.
- The data is read out via the output serializer and compared with the gold model.
- Separate testbenches were used to verify the input deserializer, the output serializer, and each of the four SRAM interfaces.

# Top-Level Digital Processing Unit: Testbench

- Top-level digital processing unit output comparison:



# Top-Level Digital Processing Unit: Layout



- Uses a group of four 2 KB SRAMs for each 8 KB memory bank.
- Vertical power stripes routed on M5 and horizontal power stripes routed on M4.
- Synthesis and PNR implemented at 50 MHz (timing met).
  - Timing also met at higher clock frequencies, but unnecessary given the rate of incoming data.
- Layout does not include ADC controller blocks (integrated separately with the analog front-end unit).
- Final size: 2.5 mm x 2.5 mm.

# Digital Processing Block

## Second-Order IIR Notch Filter Deconvolution Kernel Estimator



# Second-Order IIR Notch Filter Deconvolution Kernel Estimator

- Goal: use the digitized input signal frequency and amplitude to estimate the deconvolution kernel capable of inverting the harmonic vibration effects witnessed in the radar FFT spectrum.
  - This entails estimating the coefficients for a second order IIR notch filter.

$$H(z) = b \, \frac{1 - 2 \cos(\omega_0)\, z^{-1} + z^{-2}}{1 - 2b \cos(\omega_0)\, z^{-1} + (2b - 1)\, z^{-2}}$$

- Fixed point coefficients for the transfer function are computed via a simple memory look-up; the memory values can then be reloaded if/when the radar sampling rate is changed.
- Magnitudes and real/imaginary components are represented in fixed point form → 16 bits total, 13 fractional bits (signed).
- Phases are represented in 16 bits normalized to a  $2\pi$  scale (unsigned).
- To compute the frequency response for the notch filter, need to evaluate the transfer function at  $e^{j\omega}$  across the desired frequency/phase spectrum.
  - Requires CORDIC rectangular-to-polar and polar-to-rectangular converters.
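
As a floating-point reference for what the hardware computes, the transfer function above can be evaluated directly on the unit circle. This is a sketch, not the on-chip fixed-point pipeline: `notch_response` is a hypothetical helper, `w0` denotes the notch frequency in $H(z)$, and the values of `b` and `w0` below are arbitrary example values.

```python
import cmath
import math


def notch_response(b, w0, w):
    """Evaluate the second-order notch H(z) at z = exp(j*w)."""
    z1 = cmath.exp(-1j * w)  # z^-1
    z2 = z1 * z1             # z^-2
    num = b * (1 - 2 * math.cos(w0) * z1 + z2)
    den = 1 - 2 * b * math.cos(w0) * z1 + (2 * b - 1) * z2
    return num / den


b = 0.9                        # example value only
w0 = 2 * math.pi * 50 / 789    # example: 50 Hz notch at a 789 Hz sampling rate

for w in (0.0, w0, math.pi):
    h = notch_response(b, w0, w)
    print(f"w = {w:.4f} rad: |H| = {abs(h):.6f}, phase = {cmath.phase(h):+.4f} rad")
```

For this form of $H(z)$ the magnitude is zero at the notch frequency and exactly one at both DC and the Nyquist frequency, which gives a quick sanity check for any fixed-point implementation.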

# Second-Order IIR Notch Filter Deconvolution Kernel Estimator: Coefficient Lookup



# Second-Order IIR Notch Filter Deconvolution Kernel Estimator

- Inputs:
  - *clk*
  - *rst\_n*
  - *config\_nfft*[11:0] → length of frequency vector (equivalent to number of FFT points)
  - *a\_coeffs*[2:0][15:0] → transfer function denominator coefficients
  - *b\_coeffs*[2:0][15:0] → transfer function numerator coefficients
  - *eval\_iir\_freq\_resp* → trigger the frequency response evaluation
  - *phase\_mem\_data*[1:0][15:0] → phase/frequency vector values read out from the phase/frequency vector SRAM.
- Outputs:
  - *tf\_val\_magnitude*[15:0] → 16-bit fixed-point magnitude output
  - *tf\_val\_phase*[15:0] → 16-bit phase output
  - *tf\_val\_valid* → indicates the computed output value is valid
  - *phase\_mem\_indices*[1:0][11:0] → addresses at which to read the phase/frequency vector data
  - *freq\_eval\_done* → indicates that the kernel transfer function evaluation is complete

# Second-Order IIR Notch Filter Deconvolution Kernel Estimator

- Three pipeline stages.
  - Rotation of the transfer function coefficients by the appropriate phase.
    - Sequential CORDIC polar to rectangular converters: 15 clock cycles.
    - Generates real and imaginary components.
    - The numerator and denominator real and imaginary components are summed in combinational logic.
  - Conversion of the real and imaginary components in the numerator and denominator to phase and magnitude.
    - Sequential CORDIC rectangular to polar converters: 15 clock cycles.
    - Generates the output phase and magnitude for the transfer function numerator and denominator.
  - Fixed-point transfer function magnitude division and phase subtraction.
    - Serial fixed-point divider: ~30 clock cycles.
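
The two CORDIC modes used in the first two stages can be sketched in floating point. This is an illustrative model, assuming one micro-rotation per clock cycle (15 iterations) and inputs within the basic CORDIC convergence range; the function names are hypothetical, and the fixed-point details and quadrant handling of the RTL are not modeled.

```python
import math

N_ITER = 15  # assumption: one micro-rotation per clock cycle
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_rotate(x, y, angle):
    """Rotation mode (polar to rectangular); the result is scaled by GAIN."""
    z = angle
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x, y


def cordic_vector(x, y):
    """Vectoring mode (rectangular to polar) for x > 0; returns (GAIN * magnitude, phase)."""
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y >= 0 else 1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x, z


re, im = cordic_rotate(1.0, 0.0, 0.5)
print(re / GAIN, im / GAIN)   # close to cos(0.5), sin(0.5)
mag, phase = cordic_vector(3.0, 4.0)
print(mag / GAIN, phase)      # close to 5.0, atan2(4, 3)
```

Both modes scale their outputs by the constant CORDIC gain (about 1.6468 for 15 iterations), so the results must be divided by `GAIN` afterwards.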

# Second-Order IIR Notch Filter Deconvolution Kernel Estimator

- Moore state machine with 5 states.
  - IDLE state: all registers are reset and the system waits for the *eval\_iir\_freq\_resp* signal to be activated.
  - START\_FILT\_EVAL state: the frequency response evaluation pipeline is triggered and the state is immediately updated to the RUN\_FILT\_EVAL state.
  - RUN\_FILT\_EVAL state: system waits for all four CORDIC modules and the fixed point divider to complete their computation.
    - System remains in this state until this has been verified.
  - DONE\_FREQ\_BIN state:
    - The numerator and denominator real and imaginary component outputs from the polar-to-rectangular CORDIC modules are scaled by the CORDIC gain factor.
    - The output magnitudes from the rectangular-to-polar CORDIC modules are scaled by the CORDIC gain factor.
    - The new complex phases are retrieved from the phase memory for the next frequency bin.
    - A counter tracks how many frequency bins we have processed.
    - Valid output signal is activated.
  - DONE\_FREQ\_VEC state: evaluation done signal is activated and the system returns to the IDLE state.

# Second-Order IIR Notch Filter Deconvolution Kernel Estimator: Testbench

- The testbench defines the input numerator and denominator coefficients for the IIR notch filter.
  - In reality, these will be computed on chip.
- The phase memory is loaded from an input file generated by MATLAB.
- Given the latency of 5 cycles from the IIR notch filter kernel estimator pipeline, the testbench does not start collecting the output data until it has waited this many cycles.
- Data outputs from each pipeline stage are collected to check against the gold model generated by MATLAB.
- Ideally, the estimator RTL outputs would exactly match the gold frequency response. However, the CORDIC modules introduce small magnitude and phase errors because they approximate nonlinear functions iteratively in fixed-point arithmetic.
  - Fortunately, these errors do not affect the shape or notch locations of the computed filter, which is all that matters here (the result does not need to be exact down to the final bit).

# Second-Order IIR Notch Filter Deconvolution Kernel Estimator: Testbench

- Example IIR notch filter spectrum for a radar sampling rate of 789 Hz, a radar window FFT length of 1275 (i.e. the length of frequency evaluation vector), a detected signal frequency of 50 Hz, and a quality factor of  $Q = 1$ .



# Second-Order IIR Notch Filter Deconvolution Kernel Estimator: Testbench

- Pipeline stage 1 output comparison:



# Second-Order IIR Notch Filter Deconvolution Kernel Estimator: Testbench

- Pipeline stage 2 output comparison:



# Second-Order IIR Notch Filter Deconvolution Kernel Estimator: Testbench

- Pipeline stage 3 output comparison:



# Agenda

- Chip Overview
  - Bird's Eye View
  - Introduction
  - Chip Architecture
- Analog Front-End: Designs and Layouts
- Digital Processing Engine: Designs and Layouts
- **Complete Mixed-Signal Chip**

# Complete Mixed-Signal Chip

## Top-Level Design + Harness Integration



# Secondary ESD Protection

- io\_analog[10:0] pins in the test harness do not include any ESD protection on the pads and thus need ESD protection manually added.
- ESD cell consists of complementary diode-connected transistors with their sources tied to their gates; any ESD-related current spikes are thus shunted to either the positive rail or ground, depending on the polarity.
  - Uses the high-voltage g5v0d10v5 devices in the Skywater library.



# Full Mixed-Signal Design: Top-Level



# Chip Harness Integration

- Includes ESD protection and pull-downs for the appropriate output enable and clamp pins.
- 38 I/O pins + separate analog and digital supplies (both 1.8 V power and ground).



# Final Chip Layout

- Final checks:
  - DRC clean.
  - LVS clean.
  - Well taps placed to avoid latch-up.
  - ESD protection on relevant I/O pads.
  - Post-parasitic extraction simulations match design specifications and exhibit full functionality.
  - Area: 3.5 mm x 2.9 mm
  - Average power consumption: ~45 mW.
    - ~43 mW for digital processing engine.
    - ~2 mW for analog front-end.



END

