GitHub - TheMrityunjayPathak/FeatureEngineering: Feature Engineering with Python

Feature Engineering

Feature engineering is the process of using existing data to create new variables that are not present in the raw training set.
It can produce new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also improving model accuracy.
Feature engineering is needed for most machine learning models.
Regardless of the data or architecture, a poor feature will directly degrade your model's performance.

Importance of Feature Engineering

Feature engineering is a very important step in machine learning.
It refers to the process of designing derived features for an algorithm to use.
The algorithm then uses these features to improve its performance.

Getting Started

Clone the repository to your local machine using the following command:

git clone https://github.com/TheMrityunjayPathak/FeatureEngineering.git

Different Feature Engineering Techniques

Dummy Variable

Dummy variables are binary variables that represent categorical (qualitative) data: each one takes the value 0 or 1 to indicate the absence or presence of a specified attribute, respectively.
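
As a minimal sketch of the idea, the snippet below encodes a categorical column with pandas. The `city` column and its values are made-up illustration data, not taken from this repository.

```python
import pandas as pd

# Hypothetical toy data: "city" is a categorical column.
df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Pune"]})

# One column per category; dtype=int gives 0/1 instead of False/True.
dummies = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Show the original column next to its dummy encoding.
encoded = pd.concat([df, dummies], axis=1)
print(encoded)
```

Each row has exactly one dummy column set to 1. For linear models it is common to pass `drop_first=True` to `pd.get_dummies` so that one redundant column is dropped; whether that is appropriate depends on the model.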

Interquartile Range

In descriptive statistics, the interquartile range tells you the spread of the middle half of your distribution.
Quartiles segment any distribution that’s ordered from low to high into four equal parts.
The interquartile range (IQR) spans the second and third quartiles, or the middle half of your data set.

The interquartile range is found by subtracting the Q1 value from the Q3 value:

IQR = Q3 - Q1

where:

Q3 = 3rd quartile or 75th percentile
Q1 = 1st quartile or 25th percentile

Q1 is the value below which 25 percent of the distribution lies, while Q3 is the value below which 75 percent of the distribution lies.
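
The IQR is often used to flag outliers. The sketch below computes Q1, Q3 and the IQR with NumPy and then drops points outside fences placed `k` IQRs beyond the quartiles. The multiplier `k=1.5` is a widely used convention and an assumption here, as are the function name `iqr_bounds` and the sample data.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences; k=1.5 is an assumed, conventional choice."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1  # IQR = Q3 - Q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample with one obviously extreme value.
data = np.array([10, 12, 12, 13, 14, 15, 16, 18, 95])

lower, upper = iqr_bounds(data)
# Here Q1 = 12 and Q3 = 16, so the fences are 6 and 22; 95 is dropped.
filtered = data[(data >= lower) & (data <= upper)]
print(lower, upper, filtered)
```

Note that `np.percentile` interpolates linearly by default, so other quartile conventions can give slightly different fences on small samples.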

Z-Score

Z-score is a statistical measurement that describes a value's relationship to the mean of a group of values.
Z-score is measured in terms of standard deviations from the mean.
Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean.
A value's z-score is calculated using the following formula:

z = ( x - μ ) / σ

where:

z = the z-score
x = the value being evaluated
μ = the mean
σ = the standard deviation
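
The formula above can be sketched in a few lines of NumPy. The function name `z_scores` and the sample values are illustrative assumptions; the population standard deviation (NumPy's default, `ddof=0`) is used for σ.

```python
import numpy as np

def z_scores(values):
    """Compute z = (x - mu) / sigma for every value."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    sigma = values.std()  # population standard deviation (ddof=0)
    return (values - mu) / sigma

# Hypothetical sample: mean is 5 and standard deviation is 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)
print(z)  # values below the mean are negative, values above are positive
```

To flag outliers, a cutoff on `abs(z)` is usually applied; the cutoff value is a modelling choice and is not fixed by the formula.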

Modified Z-Score

A modified z-score is more robust than the standard z-score because it uses the median and the median absolute deviation instead of the mean and the standard deviation, which are known to be influenced by outliers.

Modified Z-Score = 0.6745 ( xi - x̃ ) / MAD

where:

xi = a single data value
x̃ = the median of the dataset
MAD = the median absolute deviation of the dataset

Values with modified z-scores less than -3.5 or greater than 3.5 are labeled as potential outliers.
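
A minimal sketch of this calculation with NumPy follows. The function name `modified_z_scores` and the sample data are assumptions for illustration; the 0.6745 constant and the 3.5 cutoff come from the text above.

```python
import numpy as np

def modified_z_scores(values):
    """Compute 0.6745 * (x_i - median) / MAD for every value."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # MAD: median of the absolute deviations from the median.
    mad = np.median(np.abs(values - median))
    # Caveat: if MAD is 0 (over half the values identical), this divides by zero.
    return 0.6745 * (values - median) / mad

# Hypothetical sample with one extreme value.
data = np.array([10, 12, 12, 13, 14, 15, 16, 18, 95])

scores = modified_z_scores(data)
outliers = data[np.abs(scores) > 3.5]
print(outliers)  # only 95 exceeds the 3.5 cutoff in this sample
```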

Data Standardization

Standardization is a scaling method that centers the values around the mean and rescales them to unit standard deviation.
This means that the mean of the attribute becomes zero, and the resulting distribution has a standard deviation of 1.
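
As a rough sketch, standardization can be applied column by column with NumPy. The function name `standardize` and the two-feature matrix are made up for illustration.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    # axis=0 computes one mean and one std per column (feature).
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

X_scaled = standardize(X)
print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

In practice the mean and standard deviation are usually computed on the training data only and then reused for the test data, so that no information leaks from the test set.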

Handling Imbalance Dataset

Imbalanced data refers to datasets where the target class has an uneven distribution of observations.
In an imbalanced dataset, one class label has a very high number of observations and the other has a very low number of observations.
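
The text above does not name a specific remedy. One common option, shown here purely as an assumed example, is random oversampling: rows of the smaller classes are duplicated at random until every class matches the largest one. The function name `random_oversample` and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until all classes are equal in size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    counts = Counter(y.tolist())
    target = max(counts.values())  # size of the largest class
    parts = []
    for label, count in counts.items():
        idx = np.flatnonzero(y == label)
        # Sample extra row indices with replacement to reach the target size.
        extra = rng.choice(idx, size=target - count, replace=True)
        parts.append(np.concatenate([idx, extra]))
    all_idx = np.concatenate(parts)
    return X[all_idx], y[all_idx]

# Hypothetical imbalanced data: 8 rows of class 0, 2 rows of class 1.
X = np.arange(10).reshape(-1, 1)
y = np.array([0] * 8 + [1] * 2)

X_bal, y_bal = random_oversample(X, y)
print(Counter(y.tolist()), Counter(y_bal.tolist()))
```

Resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.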
