ileanadatamania/Data-Science-Portfolio

Data Science Portfolio

Portfolio Information
Language: Python
Libraries Used: sklearn, NLTK, statsmodels, NumPy, Pandas, re (Regex), Matplotlib, seaborn, wordcloud
Projects Count: 6
Author: Ileana Cabada
Dataset: Electronic_Pricedataset, AB Testing dataset

About Portfolio: product price and cross-price elasticities of advertisement demand; feature engineering for machine learning (product category labelling) with natural language processing; A/B testing of player retention and games-played probability; exploratory price analysis.

Content

Applying Econometrics for Pricing Strategies using Linear Modeling

In the following analysis, we use Best Buy products as the main data sample for the price elasticity analysis. The same model can be applied to any kind of vendor, e-commerce or brick-and-mortar, by measuring sales demand.

Hypothesis Proposed

Using Best Buy laptop sample data from 2017: is ad impression demand sensitive to changes in the product's own price? If so, how sensitive is it?

Statistical Model

  • Linear Regression

Libraries

  • statsmodels, NumPy, Pandas, Matplotlib
Laptop, Desktop Price Elasticity

Hypothesis Proposed

How much is ad impression demand influenced by the main competitors' price changes? This model helps us understand the nature of the price competition between our own advertised product and the main competitors' products.

Statistical Model

  • Multiple Linear Regression

Libraries

  • statsmodels, NumPy, Pandas, Matplotlib
Cross-Price Elasticity of 12 Mac Book

A/B Testing for Consumer Retention

Statistical Model

  • Poisson Distribution, Bootstrap Distribution
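
The two techniques named above can be sketched roughly as follows: a bootstrap distribution of the retention rate in each group, and a Poisson tail probability for the number of games played. The group sizes, retention counts, and helper names (`bootstrap_means`, `poisson_tail`) are assumptions for illustration and do not come from the A/B testing dataset.

```python
import math
import numpy as np

def bootstrap_means(values, n_boot, rng):
    """Resample `values` with replacement and return the mean of each resample."""
    samples = rng.choice(values, size=(n_boot, values.size), replace=True)
    return samples.mean(axis=1)

def poisson_tail(rate, k):
    """P(X >= k) for X ~ Poisson(rate)."""
    below_k = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k

rng = np.random.default_rng(42)

# Hypothetical retention flags (1 = player came back), 2,000 players per group.
group_a = np.array([1] * 900 + [0] * 1100)   # 45% retained
group_b = np.array([1] * 800 + [0] * 1200)   # 40% retained

boot_a = bootstrap_means(group_a, 1000, rng)
boot_b = bootstrap_means(group_b, 1000, rng)

# Share of bootstrap draws in which group A retains better than group B.
prob_a_better = float((boot_a > boot_b).mean())
print(f"P(retention A > retention B) is about {prob_a_better:.3f}")
```

For the games-played question, `poisson_tail(rate, k)` gives the probability of at least `k` games when the count is modelled as Poisson with mean `rate`; the rate would be estimated from the observed average per group.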

Libraries

  • statsmodels, NumPy, Pandas, Matplotlib
A/B Testing Distribution, Poisson Distribution

Feature Engineering for Machine Learning and Natural Language Processing

About Model Implemented

Because the dataset has no category labels for price analysis between similar products (e.g. tablets, headphones), an unsupervised text clustering model was implemented to create product category labels.

The model uses text preprocessing techniques such as lemmatization, regex cleaning, and tokenization, followed by TF-IDF vectorization and the K-means algorithm.

The Category_name and Cluster features were created from unique product names and their respective product descriptions.
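
A reduced sketch of the clustering pipeline might look like the following. It keeps the regex cleaning, TF-IDF vectorization, and K-means steps, but leaves out lemmatization so that it runs without downloading NLTK corpora. The product names, the `clean_text` helper, and the choice of two clusters are assumptions for this example only.

```python
import re
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def clean_text(text):
    """Lowercase and replace anything that is not a letter or whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())

# Hypothetical product names standing in for the unlabelled dataset.
product_names = [
    "Apple iPad Pro tablet 64GB",
    "Samsung Galaxy Tab tablet 32GB",
    "Amazon Fire HD tablet 16GB",
    "Sony wireless headphones noise cancelling",
    "Bose bluetooth headphones over-ear",
    "Beats wireless headphones on-ear",
]

# TF-IDF handles tokenization here; English stop words are dropped.
vectorizer = TfidfVectorizer(preprocessor=clean_text, stop_words="english")
tfidf = vectorizer.fit_transform(product_names)

# Two clusters are assumed for this toy sample.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)

for name, label in zip(product_names, labels):
    print(label, name)
```

The numeric cluster ids are arbitrary, so a readable category name still has to be assigned by inspecting each cluster, for example through its most heavily weighted terms.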

Machine Learning Model

  • Kmeans

Libraries

  • NLTK, sklearn, RE(Regex), WordCloud, Matplotlib, Pandas and Numpy
WordCloud Electronic Category Label Clusters

Exploratory Data Analysis (EDA)

This exploratory price analysis was carried out ahead of the price elasticity calculations with the multiple linear regression model, for the following reasons:

  • Product Condition Selection
  • Price Outlier Detection
  • Price Distribution Analysis
  • Discount Price Correlation with Impression Total Count per Category
  • Merchant (e-commerce) Impression Time Analysis
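
Two of the steps listed above, price outlier detection and the discount-to-impression correlation, can be sketched as below. The 1.5 × IQR rule is a common convention assumed here, not necessarily the rule used in the portfolio, and the column names and values are made up for illustration.

```python
import pandas as pd

def flag_price_outliers(prices, k=1.5):
    """Return a boolean Series marking prices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
    iqr = q3 - q1
    return (prices < q1 - k * iqr) | (prices > q3 + k * iqr)

# Hypothetical rows for a single product category.
df = pd.DataFrame({
    "price": [100.0, 110.0, 120.0, 130.0, 5000.0],
    "discount_pct": [0.0, 5.0, 10.0, 15.0, 20.0],
    "impressions": [10, 20, 30, 40, 50],
})

df["is_outlier"] = flag_price_outliers(df["price"])

# Correlate discount with impressions on the rows that are not price outliers.
clean = df[~df["is_outlier"]]
corr = clean["discount_pct"].corr(clean["impressions"])
print(df[["price", "is_outlier"]])
print(f"discount vs impressions correlation: {corr:.2f}")
```

In a multi-category dataset the same function could be applied per category, since a price that is normal for laptops may be an outlier for headphones.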

Libraries

  • seaborn, Matplotlib, Pandas and Numpy
Price Distribution Plot, Price Discount Correlation Heatmap

Data Cleaning and Preprocessing

Managing null values, dropping unused features, and text normalization.
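
These three steps could look roughly like the sketch below, which also builds a per-column summary of null counts, unique counts, and data types. The column names, sample rows, and the `normalize_text` helper are hypothetical, and dropping rows with a missing name is just one possible way to handle nulls.

```python
import re
import numpy as np
import pandas as pd

def normalize_text(text):
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())

# Hypothetical raw rows with a missing name, a missing price, and an unused column.
df = pd.DataFrame({
    "name": ["  Apple® iPad  Pro!! ", "Sony WH-1000XM2", None],
    "price": [649.99, np.nan, 199.0],
    "unused_id": [1, 2, 3],
})

# Per-column overview: null count, unique count, and dtype.
summary = pd.DataFrame({
    "nulls": df.isna().sum(),
    "unique": df.nunique(),
    "dtype": df.dtypes.astype(str),
})

# Drop the unused feature and rows without a name, then normalize the text.
clean = df.drop(columns=["unused_id"]).dropna(subset=["name"]).copy()
clean["name"] = clean["name"].map(normalize_text)

print(summary)
print(clean)
```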

Libraries

  • RE(Regex), Matplotlib, Pandas and Numpy
Null, Unique and Datatype column values table
Contact Source Information
e-mail ileana.cabada@gmail.com
Linkedin https://www.linkedin.com/in/ileana-c-24666159/