



# Transforming Semiconductor Yield Management with AI



# CHIP SLEUTH

An AI-powered wafer defect classification system that uses machine learning and computer vision to improve semiconductor yield management.



# Members

Valerie Kigo

Joel Gitonga

Hudheyfa Mohamud

Lawrence Kamerino

# Introduction & Business Context



## Highly complex, capital-intensive semiconductor manufacturing process

Fabrication involves hundreds of steps, and a single microscopic defect can cause complete product failure, reducing yield and raising cost.



## Traditional manual inspection is slow, subjective, and cannot scale with modern production speeds

Manual inspection cannot keep up with high-volume production and struggles to identify subtle, complex defect patterns.



## Industry leaders are shifting to AI-driven systems

Leveraging Machine Learning and Computer Vision for early, accurate defect detection to enhance production efficiency

The semiconductor industry is facing challenges with traditional manual inspection methods, presenting an opportunity for AI-driven solutions to transform yield management and production efficiency.



## Problem Statement

Manufacturers lack an efficient, automated method to identify and classify wafer defects early in production. Key pain points include manual inspection that fails to scale, inaccuracy and subjectivity in defect identification, and delayed root-cause analysis - leading to reduced yield, increased costs, and longer time-to-market.

# Project Objectives & Expected Impact

## Primary Objective

Develop and deploy a deep learning-based system to automatically detect and classify wafer defect patterns.

## Operational Efficiency

Faster, more accurate defect detection.

## Cost Reduction

Reduced labor costs & fewer defective chips.

## Quality Improvement

Early detection minimizes yield loss.

## Decision Support

Data-driven insights for process optimization.

## Scalability

System integrates into production pipelines.

# Our Methodology: A Structured Approach





# Data Understanding: The WM811K Dataset

The WM811K dataset is a publicly available dataset from TSMC that contains over 811,000 wafer samples. Each sample includes a wafer map image and metadata such as defect type and lot name. However, the dataset poses a significant challenge due to severe class imbalance, with over 96% of the samples being 'None' or unlabeled, and the critical defect patterns (Scratch, Donut, Random) making up less than 1% of the data.
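
The imbalance described above can be checked with a few lines of Python. The per-class counts below are the figures commonly reported for the public WM-811K release; they are an assumption here, not numbers taken from this deck, so recompute them from your own copy of the data.

```python
# Label counts as commonly reported for the public WM-811K release.
# ASSUMPTION: these figures are not stated in the slides; verify them
# against your own copy of the dataset before relying on them.
LABEL_COUNTS = {
    "unlabeled": 638_507,
    "none": 147_431,
    "Edge-Ring": 9_680,
    "Edge-Loc": 5_189,
    "Center": 4_294,
    "Loc": 3_593,
    "Scratch": 1_193,
    "Random": 866,
    "Donut": 555,
    "Near-full": 149,
}


def class_shares(counts):
    """Return each class's fraction of the total sample count."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = class_shares(LABEL_COUNTS)
no_pattern_share = shares["unlabeled"] + shares["none"]
critical_share = sum(shares[k] for k in ("Scratch", "Donut", "Random"))

# Print classes from most to least frequent, then the two headline ratios.
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:<10} {share:8.3%}")
print(f"none or unlabeled:        {no_pattern_share:.1%}")
print(f"Scratch + Donut + Random: {critical_share:.2%}")
```

With these counts, the "none or unlabeled" share comes out above 96% and the three critical patterns together stay below 1%, consistent with the description above.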

# Data Preparation & Feature Engineering

| Task                                    | Description                                                                                                                                                                          |
|-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Data Cleansing & Standardization        | Unified inconsistent labels and filtered out rare classes. Standardized all wafer map dimensions to 32x32 pixels. Ensured data integrity and quality for modeling.                   |
| Feature Engineering for Enhanced Models | Extracted statistical and texture features (mean intensity, entropy, edge density). This provided richer input for traditional ML models (Random Forest, XGBoost) beyond raw pixels. |
| Class Balancing                         | Applied SMOTE to synthetically generate samples for rare defect types, preventing model bias.                                                                                        |
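
The feature engineering and class balancing rows above can be sketched in plain Python. This is a minimal illustration, not the project's code: the exact feature definitions (for example, edge density as the fraction of adjacent pixel pairs that differ) and the function names are assumptions, and the oversampler only shows the interpolation idea behind SMOTE. In practice the `SMOTE` class from imbalanced-learn (`fit_resample`) would typically be used.

```python
import math
import random


def mean_intensity(wafer):
    """Average pixel value over the whole map."""
    values = [v for row in wafer for v in row]
    return sum(values) / len(values)


def shannon_entropy(wafer):
    """Shannon entropy (in bits) of the pixel-value histogram."""
    values = [v for row in wafer for v in row]
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def edge_density(wafer):
    """Fraction of horizontally or vertically adjacent pixel pairs that differ.
    ASSUMPTION: one possible definition; the deck does not specify one."""
    rows, cols = len(wafer), len(wafer[0])
    differing = pairs = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                pairs += 1
                differing += wafer[r][c] != wafer[r][c + 1]
            if r + 1 < rows:
                pairs += 1
                differing += wafer[r][c] != wafer[r + 1][c]
    return differing / pairs


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy 4x4 map (hypothetical encoding: 0 = good die, 1 = defective die).
toy = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
features = [mean_intensity(toy), shannon_entropy(toy), edge_density(toy)]

# Hypothetical feature vectors for a rare class, oversampled to 5 new points.
donut = [[0.20, 0.70, 0.30], [0.25, 0.75, 0.35], [0.22, 0.68, 0.33]]
new_points = smote_like_oversample(donut, n_new=5)
```

Because each synthetic point lies on a line segment between two real minority samples, it stays inside the range already covered by that class. Oversampling is normally applied to the training split only, so that evaluation data contains no synthetic samples.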

# Modeling Approach: A Multi-Model Strategy

## Logistic Regression

Established a performance benchmark as the Baseline Model.

## Random Forest & XGBoost

Leveraged engineered features for robust performance as Feature-Based Models.

## Convolutional Neural Network (CNN)

Automatically learns spatial patterns from wafer maps as the Image-Based Model.
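
To make "learns spatial patterns" concrete, the sketch below applies a single 3x3 filter to a toy wafer map. It is an illustration of the basic operation inside a convolutional layer, not the project's architecture: the kernels here are hand-picked, whereas a CNN would learn such weights during training, and the toy map and its 0/1 encoding are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the
    element-wise products at each position, as a convolutional layer does."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy 5x5 wafer map with a vertical scratch in the middle column
# (hypothetical encoding: 1 = defective die, 0 = good die).
wafer = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hand-picked line detectors standing in for learned filters.
vertical_line = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]
horizontal_line = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]

vertical_response = relu(conv2d_valid(wafer, vertical_line))
horizontal_response = relu(conv2d_valid(wafer, horizontal_line))

print(vertical_response)    # strong response along the scratch
print(horizontal_response)  # no response: the pattern is not horizontal
```

The vertical filter responds only where the scratch is, while the horizontal filter stays at zero. Stacking many such learned filters is what lets a CNN tell patterns such as Scratch and Edge-Ring apart.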

# Model Evaluation & Performance Comparison

| Metric            | Logistic Regression | Random Forest | XGBoost | CNN |
|-------------------|---------------------|---------------|---------|-----|
| Accuracy          | 62%                 | 81%           | 81%     | 72% |
| Macro F1 score    | 50%                 | 64%           | 65%     |     |
| Weighted F1 score | 66%                 | 75%           | 79%     |     |
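
Macro and weighted F1 differ in how they treat rare classes, which matters on a dataset this imbalanced. The sketch below computes both on a small set of hypothetical labels (not project results): macro F1 averages per-class F1 equally, while weighted F1 weights each class by its number of true samples. As a simplification, classes are taken from the true labels only.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every class present in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, tp + fn)  # support = number of true samples
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Mean of per-class F1 weighted by each class's support."""
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


# Hypothetical labels: 8 majority "none" wafers and 2 rare "Scratch" wafers,
# with one Scratch wafer misclassified as "none".
y_true = ["none"] * 8 + ["Scratch"] * 2
y_pred = ["none"] * 8 + ["none", "Scratch"]

scores = per_class_f1(y_true, y_pred)
print(f"macro F1:    {macro_f1(scores):.3f}")
print(f"weighted F1: {weighted_f1(scores):.3f}")
```

On this toy data the weighted score is higher than the macro score because the missed rare-class wafer is diluted by the majority class. A macro F1 well below the weighted F1, as in the table above, is therefore a hint that the rare defect classes are the ones being misclassified.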


# CNN Model Architecture



# Key Findings & Business Interpretation



## AI is Viable

Deep learning can automate wafer defect classification; in our evaluation the CNN reached 72% accuracy.



## CNN is Optimal

For image-based patterns, CNNs learn spatial structure directly from the wafer map, which justifies the added architectural complexity.



## Data Quality is Critical

Cleaning, standardization, and balancing were fundamental to our success.



## Actionable Insights

The model doesn't just predict; it identifies **where** and **what** the defect is (e.g., Center, Edge-Ring, Scratch), enabling rapid root-cause analysis.

# Recommendations

- **Semiconductor Industry:**
  - Automates wafer defect detection
  - Cuts inspection time & production
- **Industrial Automation:**
  - Prevents downtime in robotic systems
  - Detects defects in other industrial components
- **Automotive:**
  - Ensures chip reliability in EVs & ADAS systems
  - Prevents failures caused by faulty sensors
- **Healthcare & Medical Devices:**
  - Guarantees defect-free chips in life-critical devices
  - Enhances equipment reliability & patient safety
- **Consumer Electronics:**
  - Improves smartphone & IoT chip quality
  - Reduces recalls and warranty costs



# Deployment: Interactive Defect Classification Dashboard

We have built a functional Streamlit prototype to demonstrate value and gather user feedback. The interactive dashboard allows process engineers to upload wafer map images and receive real-time defect classification predictions from the trained CNN model.
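
A minimal sketch of such an upload-and-classify page is shown below. It is not the project's dashboard code: `predict_fn` is a hypothetical callable wrapping the trained CNN, and `CLASS_NAMES` assumes the standard WM-811K label set, which may differ from the classes kept after data cleansing.

```python
# ASSUMPTION: standard WM-811K labels; the project's final class list may differ.
CLASS_NAMES = ["Center", "Donut", "Edge-Loc", "Edge-Ring", "Loc",
               "Near-full", "Random", "Scratch", "none"]


def top_prediction(probabilities, class_names=CLASS_NAMES):
    """Return (label, confidence) for the highest-probability class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]


def run_dashboard(predict_fn):
    """Render the upload-and-classify page.

    predict_fn is a hypothetical callable that takes a 32x32 grayscale
    PIL image and returns one probability per entry in CLASS_NAMES.
    """
    # Imported here so top_prediction stays usable without Streamlit installed.
    import streamlit as st
    from PIL import Image

    st.title("Wafer Defect Classification")
    uploaded = st.file_uploader("Upload a wafer map", type=["png", "jpg", "jpeg"])
    if uploaded is not None:
        # Match the 32x32 input size used during data preparation.
        image = Image.open(uploaded).convert("L").resize((32, 32))
        st.image(image, caption="Uploaded wafer map (resized to 32x32)")
        label, confidence = top_prediction(predict_fn(image))
        st.write(f"Predicted defect: {label} ({confidence:.1%} confidence)")
```

To try it, call `run_dashboard(your_predict_fn)` at the bottom of a script such as `app.py` (a hypothetical file name) and start it with `streamlit run app.py`.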

**The project successfully developed a robust AI-powered wafer defect classification system that leverages advanced deep learning techniques to address the key challenges in semiconductor manufacturing. By deploying this solution, semiconductor fabs can expect significant improvements in operational efficiency, cost reduction, quality enhancement, and data-driven decision support. The flexible and scalable CNN model demonstrated superior performance, paving the way for widespread adoption of AI in yield management.**

