Regression Model - Customer Acquisition Cost - A Feature Selection Approach

Table of Contents

About The Project
Business Objective
Business Metrics
Getting Started
Data Workflow
Conclusion
Prediction using API and Streamlit
- How To Run by API
- Data Input

About The Project

Introduction

The aim of this project is to explore the application of machine learning models for predicting customer acquisition costs (CAC) and to investigate the effectiveness of feature selection techniques in improving the accuracy of these models.
Customer acquisition cost is a crucial metric for businesses, as it directly affects their profitability and marketing strategies. By accurately estimating CAC, companies can optimize their marketing budgets and make informed decisions to maximize return on investment (ROI).

Feature selection plays a vital role in building accurate regression models. It involves identifying the most informative features that have a significant impact on the target variable (CAC). By discarding irrelevant or redundant features, feature selection techniques can enhance the model's performance, reduce overfitting, and improve interpretability.

The research focuses on various feature selection methods, including but not limited to:

Univariate feature selection: This approach evaluates each feature independently based on statistical measures such as chi-square test, ANNOVA test, mutual information, or correlation with the target variable.
Embedded methods: These techniques incorporate feature selection within the model building process itself. For instance, Lasso regression performs feature selection and regularization simultaneously.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
config		config
data		data
log		log
model		model
pics		pics
src		src
.gitignore		.gitignore
01-pipeline.ipynb		01-pipeline.ipynb
02-eda.ipynb		02-eda.ipynb
03-preprocessing.ipynb		03-preprocessing.ipynb
04-modelling.ipynb		04-modelling.ipynb
README.md		README.md
feature-selection.txt		feature-selection.txt
final_model.pkl		final_model.pkl

	Store_cost	total_children	avg_cars_at_home	num_children_at_home	net_weight	units_per_case	coffee_bar	video_store	prepared_food	florist
Type	float	float	float	float	float	float	float	float	float	float
Data Range	1700k - 97000k	0-5	0-4	0-5	3-21	1-36	0-1	0-1	0-1	0-1

DandiMahendris/regression-model-cac

Folders and files

Latest commit

History

Repository files navigation

Regression Model - Customer Acquisition Cost - A Feature Selection Approach

About The Project

Introduction

Business Objective

Business Metrics

Getting Started

Data Workflow

Data Preparation

EDA and Feature Selection

1. Statistical Inference (Univariate Analysis)

2. Parametric Assumption

2.1 Normality

2.2 Homogenity of Variance

3. One-Way ANOVA Test

4. Two-Group (T-Test or Welch's Test)

4.1 Independence T-Test

4.2 Welch's Test

4.3 Barplot of Two-Group

5. Pearson Correlation

6. Variance Inflation Factor

7. Lasso Method (Embedded)

8. Random Forest (Embedded)

9. Permutation Based (Embedded)

Data Preprocessing and Feature Engineering

1. Imputing

2. Encoding

3. Scaling

Data Modelling

1. Baseline Model

2. Cross Validation Score

Cross Validation score (CVS)

Scoring:

3. Hyperparameter Model

4. Model Performance on Test Dataset

Best Model

Conclusion

Prediction using API and Streamlit

How To Run by API

Data Input

About

Topics

Resources

Stars

Watchers

Forks

Languages