Machine-Learning-Projects

Solved end-to-end machine learning projects

All State Insurance Claims Severity Prediction

In this data science project, you will develop automated methods for predicting the cost and severity of insurance claims.

Description:

When you've been devastated by a serious car accident, your focus is on the things that matter the most: family, friends, and other loved ones. Pushing paper with your insurance agent is the last place you want your time or mental energy spent. This is why Allstate, a personal insurer in the United States, is continually seeking fresh ideas to improve their claims service for the over 16 million households they protect.

Basic exploratory analysis using the claims data
Insights from exploratory data analysis
Factors to be considered for claims processing and severity prediction
Implementation of the model using R
Building smarter predictive models including XGBoost

Build an Image Classifier for Plant Species Identification

In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture, to accurately identify plant species using different benchmark classification techniques.

Description:

The objective of this machine learning project is to use binary leaf images and extracted features, including shape, margin, and texture, to accurately identify 99 species of plants. Leaves, due to their volume, prevalence, and unique characteristics, are an effective means of differentiating plant species. They also provide a fun introduction to applying techniques that involve image-based features. We are going to apply different classification techniques to benchmark how well each classifier performs on an image classification problem.

Image Processing
Feature selection
Classifier comparison
Benchmarking
Prediction
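The classifier comparison and benchmarking steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: each leaf has already been reduced to a numeric feature vector (for example, concatenated shape, margin, and texture features), and the two classifiers compared here, nearest centroid and 1-nearest-neighbour, are illustrative stand-ins rather than the models used in the project. The toy vectors and species names are hypothetical.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(train):
    """Fit: average the feature vectors of each species; predict the closest average."""
    sums, counts = {}, defaultdict(int)
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] += 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return lambda x: min(centroids, key=lambda lab: euclidean(x, centroids[lab]))

def one_nearest_neighbour(train):
    """Fit: memorise the training set; predict the label of the closest training leaf."""
    return lambda x: min(train, key=lambda row: euclidean(x, row[0]))[1]

def benchmark(factories, train, test):
    """Train each classifier on `train` and return its accuracy on `test`."""
    results = {}
    for name, fit in factories.items():
        predict = fit(train)
        correct = sum(predict(features) == label for features, label in test)
        results[name] = correct / len(test)
    return results

# Hypothetical 3-feature leaf descriptors (shape, margin, texture) for two species.
train = [([0.9, 0.1, 0.2], "acer"), ([0.8, 0.2, 0.1], "acer"),
         ([0.1, 0.9, 0.7], "quercus"), ([0.2, 0.8, 0.9], "quercus")]
test = [([0.85, 0.15, 0.2], "acer"), ([0.15, 0.85, 0.8], "quercus")]
print(benchmark({"nearest_centroid": nearest_centroid,
                 "1-NN": one_nearest_neighbour}, train, test))
```

Passing factories rather than fitted models keeps every classifier on the same train/test split, which is the point of a benchmark.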

Choosing the right Time Series Forecasting Methods

There are many time series forecasting methods for forecasting quantities such as stock prices and demand. In this machine learning project, you will learn how to decide which forecasting method to use and when, and how to apply it, through a worked time series forecasting example.

Description:

In this machine learning project, we will take publicly available open-source datasets and discuss various methods and techniques for time series forecasting. We will discuss traditional methods such as the Holt-Winters method, the autoregressive integrated moving average (ARIMA) method, and exponential smoothing methods, and we will also compare them with modern forecasting methods based on neural network models.

Understanding the importance of time series
Understanding the mathematics of time series
Discussion about methods/techniques
Application of the models using R or Python
Making conclusions
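Two of the traditional methods named above, simple exponential smoothing and Holt's linear-trend method (the non-seasonal core of Holt-Winters), fit in a few lines of plain Python. This is a sketch: the smoothing parameters `alpha` and `beta` are fixed by hand rather than estimated, the initialisation (first value as level, first difference as trend) is one common convention among several, and the seasonal component of full Holt-Winters is omitted. The demand series is hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the forecast from level_t = alpha*y_t + (1-alpha)*level_{t-1}."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # SES forecasts are flat: every horizon gets this value

def holt_linear(series, alpha, beta, horizon):
    """Holt's additive-trend method; returns forecasts for steps 1..horizon."""
    level, trend = series[0], series[1] - series[0]  # simple initialisation
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

demand = [12, 14, 16, 18, 20, 22]  # hypothetical demand with a steady upward trend
print(simple_exponential_smoothing(demand, alpha=0.5))
print(holt_linear(demand, alpha=0.5, beta=0.3, horizon=3))
```

On a perfectly linear series like this one, Holt's method continues the trend, while simple exponential smoothing lags behind the latest value, which is why the trend term matters for trending data.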

Credit Card Fraud Detection as a Classification Problem

In this data science project, we will predict credit card fraud in a transactional dataset using several predictive models.

Handle imbalanced data
Create classifiers
Compare accuracy
Use deep learning to classify
Implementation using R

The Credit Card Fraud Detection dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

The dataset was collected and analyzed during a research collaboration of Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available at http://mlg.ulb.ac.be/BruFence and http://mlg.ulb.ac.be/ARTML
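The imbalance figures above explain why plain accuracy is a poor yardstick here: a model that never predicts fraud is still right on more than 99.8% of transactions. The sketch below computes that baseline and shows random undersampling, one common way to handle imbalanced data; it is an illustration, not necessarily the resampling strategy used in this project, and the toy rows are hypothetical.

```python
import random

def imbalance_summary(n_positive, n_total):
    """Return the positive-class rate and the accuracy of always predicting 'not fraud'."""
    positive_rate = n_positive / n_total
    return positive_rate, 1.0 - positive_rate

def random_undersample(rows, labels, seed=0):
    """Keep every fraud row (label 1) plus an equally sized random sample of legitimate rows."""
    rng = random.Random(seed)
    frauds = [row for row, label in zip(rows, labels) if label == 1]
    legit = [row for row, label in zip(rows, labels) if label == 0]
    kept_legit = rng.sample(legit, k=min(len(frauds), len(legit)))
    balanced = [(row, 1) for row in frauds] + [(row, 0) for row in kept_legit]
    rng.shuffle(balanced)
    return balanced

rate, baseline_accuracy = imbalance_summary(492, 284_807)
print(f"fraud rate {rate:.4%}, all-legitimate baseline accuracy {baseline_accuracy:.4%}")
```

Undersampling discards most legitimate transactions, so in practice it is usually compared against alternatives such as class weighting, and models are judged with precision, recall, or area under the precision-recall curve rather than accuracy.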

Identifying Product Bundles from Sales Data Using R Language

Description:

The weekly sales transaction dataset consists of weekly purchased quantities of 800 products over 52 weeks; normalised values are also provided. The objective of this data science project in R is to find product bundles that can be put on sale together. Market Basket Analysis has typically been used to identify such bundles; here we assess how useful time series clustering is for identifying them.

Time series clustering
K-means
Hierarchical clustering (HC)
Model-based clustering
Comparison of clustering
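Time series clustering with K-means can be sketched as follows, under assumptions not stated in the project description: each product is represented by its vector of weekly quantities, each series is z-normalised so that products group by sales pattern rather than volume, and centroids are seeded with a deterministic farthest-point rule instead of random starts. The product data below are hypothetical and shortened to four weeks.

```python
import math

def z_normalise(series):
    """Rescale a series to mean 0 and standard deviation 1 (flat series become all zeros)."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return [0.0] * len(series) if std == 0 else [(v - mean) / std for v in series]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=100):
    """Plain K-means; returns a cluster label for each vector."""
    # Farthest-point seeding: start from the first vector, then repeatedly add the vector
    # farthest from every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

weekly_sales = {  # hypothetical products, four weeks each
    "P1": [1, 2, 3, 4], "P2": [2, 3, 4, 5], "P3": [10, 20, 30, 40],
    "P4": [4, 3, 2, 1], "P5": [50, 40, 30, 20],
}
names = list(weekly_sales)
labels = kmeans([z_normalise(weekly_sales[n]) for n in names], k=2)
print(dict(zip(names, labels)))
```

Because of the z-normalisation, the high-volume product P3 lands with the low-volume rising products, which is the behaviour wanted when bundles are defined by shared sales patterns.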

Instacart Market Basket Analysis

Build a recommendation engine that predicts which products an Instacart consumer will purchase again.

Read data from large files
Perform Exploratory Data Analysis (EDA)
Apply logic to derive insights
Create association rule model
Implementation using R

Description:

Whether you shop from meticulously planned grocery lists or let whimsy guide your grazing, our unique food rituals define who we are. Instacart, a grocery ordering and delivery app, aims to make it easy to fill your refrigerator and pantry with your personal favorites and staples when you need them. After you select products through the Instacart app, personal shoppers review your order and do the in-store shopping and delivery for you.

Instacart’s data science team plays a big part in providing this delightful shopping experience. Currently, they use transactional data to develop models that predict which products a user will buy again, try for the first time, or add to their cart next during a session. Recently, Instacart open-sourced this data; see their blog post "3 Million Instacart Orders, Open Sourced" (https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2).

In this data science project, we are going to use this anonymized data on customer orders over time to predict which previously purchased products will be in a user's next order.
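The association-rule step can be illustrated with the three standard rule metrics, support, confidence, and lift, computed here for item pairs in plain Python. This is a sketch, not the project's R implementation; the baskets are hypothetical, and a real run over the Instacart orders would use an Apriori-style miner (for example, the `arules` package in R) rather than enumerating every pair.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.0):
    """Return {(antecedent, consequent): (support, confidence, lift)} for all item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(set(basket)), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # estimate of P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1: bought together more than by chance
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk"}, {"bread", "eggs"}]
for (lhs, rhs), (s, c, l) in sorted(pair_rules(baskets).items()):
    print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Rules with lift above 1 and enough support are the candidates worth turning into reorder predictions or recommendations.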

Music Recommendation System Project using Python and R

Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Description:

The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. It is committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models. KKBOX currently uses a collaborative filtering based algorithm with matrix factorization and word embedding in its recommendation system but believes new machine learning techniques could lead to better results.
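The matrix factorization idea mentioned above can be sketched as follows: each user and each song gets a small latent vector, and their dot product predicts whether the user will replay the song. This is a minimal illustration trained with stochastic gradient descent on hypothetical 0/1 replay labels; it is not KKBOX's algorithm, and the learning rate, regularisation strength, and number of factors are arbitrary choices.

```python
import random

def train_mf(observations, n_factors=2, lr=0.05, reg=0.02, epochs=300, seed=0):
    """Fit user/item latent factors by SGD on squared error; returns (user_f, item_f)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in observations})  # sorted for reproducible initialisation
    items = sorted({i for _, i, _ in observations})
    user_f = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    item_f = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, target in observations:
            p, q = user_f[u], item_f[i]
            error = target - sum(a * b for a, b in zip(p, q))
            for k in range(n_factors):
                p_k, q_k = p[k], q[k]  # update both factors from the pre-step values
                p[k] += lr * (error * q_k - reg * p_k)
                q[k] += lr * (error * p_k - reg * q_k)
    return user_f, item_f

def predict(user_f, item_f, user, item):
    return sum(a * b for a, b in zip(user_f[user], item_f[item]))

def mse(user_f, item_f, observations):
    return sum((t - predict(user_f, item_f, u, i)) ** 2 for u, i, t in observations) / len(observations)

# Hypothetical (user, song, replayed) observations.
obs = [("u1", "s1", 1), ("u1", "s2", 0), ("u2", "s1", 1), ("u2", "s3", 1),
       ("u3", "s2", 0), ("u3", "s3", 1), ("u1", "s3", 1), ("u2", "s2", 0)]
user_f, item_f = train_mf(obs)
print(round(mse(user_f, item_f, obs), 3))
```

Unobserved user-song pairs are scored with the same `predict` call, which is what lets a factorization model recommend songs a user has not heard yet.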

In this machine learning project, you will predict the chance that a user listens to a song repeatedly after the first observable listening event within a time window was triggered. If there are recurring listening event(s) triggered within a month after the user's very first observable listening event, the target is marked 1 in the training set, and 0 otherwise. The same rule applies to the testing set.
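The labelling rule above can be written down directly. In this sketch, "within a month" is assumed to mean 30 days after the pair's first observable listening event, and the event tuples (user, song, Unix timestamp in seconds) are a hypothetical format, not the schema of the KKBOX files.

```python
from collections import defaultdict

MONTH_SECONDS = 30 * 24 * 60 * 60  # assumption: "a month" = 30 days

def replay_targets(events):
    """Map each (user, song) pair to 1 if it was played again within a month of its first play, else 0."""
    plays = defaultdict(list)
    for user, song, timestamp in events:
        plays[(user, song)].append(timestamp)
    targets = {}
    for pair, timestamps in plays.items():
        timestamps.sort()
        first = timestamps[0]
        targets[pair] = int(any(t - first <= MONTH_SECONDS for t in timestamps[1:]))
    return targets

day = 24 * 60 * 60
events = [("u1", "s1", 0), ("u1", "s1", 3 * day),   # replayed after 3 days -> 1
          ("u1", "s2", 0), ("u1", "s2", 45 * day),  # replayed only after 45 days -> 0
          ("u2", "s1", 10 * day)]                   # never replayed -> 0
print(replay_targets(events))
```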

KKBOX provides a music dataset containing the first observable listening event for each unique user-song pair within a specific time duration. Metadata for each unique user and song pair is also provided. Using public data to increase the accuracy of your predictions is encouraged.

The train and test data are selected from users' listening history in a given time period. Note that this time period is chosen to be before the WSDM-KKBox Churn Prediction time period. The train and test sets are split based on time, and the public/private split is based on unique user/song pairs.

Working with Music Data across Several Categories
EDA using several Visualization techniques
Building an Automated Recommendation Engine
Solve this use case using Python and R
Parameter Tuning for a Better Algorithm

Perform Time series modelling using Facebook Prophet

In this project, we use time series forecasting with Prophet to predict the electricity requirements of a particular house.

Description:

There are various methods for time series forecasting. Traditionally, people have used AR, MA, or ARIMA-based models. Prophet is an open-source forecasting tool built by Facebook that can be used for time series modeling and for forecasting future trends. The advantage of Prophet over traditional libraries is that one does not need to know the technicalities of time series modeling to produce a forecast. In this Hackerday, we benchmark Prophet against other methods.

Time series forecasting using ARIMA
Time series forecasting using Prophet
Implementing Prophet
Knowing advantages of Prophet
Using Bayesian methods for forecasting
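For the ARIMA side of the benchmark, the simplest member of the family, an AR(1) model y_t = c + φ·y_{t-1}, can be fitted by ordinary least squares in a few lines. This is a stand-in to show the idea, not the ARIMA or Prophet tooling used in the project; a library ARIMA implementation would add differencing and moving-average terms. The consumption values are hypothetical and generated from a known AR(1) recursion so the fit can be checked.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    return mean_y - phi * mean_x, phi

def forecast_ar1(last_value, c, phi, horizon):
    """Iterate the fitted recursion forward `horizon` steps."""
    out = []
    for _ in range(horizon):
        last_value = c + phi * last_value
        out.append(last_value)
    return out

# Hypothetical daily consumption (kWh) generated by y_t = 2 + 0.5 * y_{t-1}.
usage = [10.0]
for _ in range(7):
    usage.append(2 + 0.5 * usage[-1])
c, phi = fit_ar1(usage)
print(round(c, 3), round(phi, 3), forecast_ar1(usage[-1], c, phi, 3))
```

With |φ| < 1 the forecasts decay towards the long-run mean c / (1 - φ), here 4 kWh; Prophet instead builds the forecast from trend and seasonality components.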

Predict Big Mart Sales

The aim of this project is to build a predictive model that uses historical data to predict sales for each product in different stores.

Given sales data for 1,559 products across 10 stores of the Big Mart chain in various cities, I will try to understand the properties of products and stores that play a key role in increasing sales.

The train and test data can be found at Analytics Vidhya’s Big Mart Sales Prediction Challenge.

Predict Churn for a Telecom company using Logistic Regression

Predict customer churn in the telecom sector and identify the key drivers of churn. Learn how a logistic regression model built in R can be used to identify customer churn in a telecom dataset.

Description:

Customer churn refers to a customer's decision to end the business relationship. It is also referred to as the loss of clients or customers. Customer loyalty and customer churn always add up to 100%: if a firm has a 60% loyalty rate, its churn rate is 40%. According to the 80/20 customer profitability rule of thumb, roughly 20% of customers generate 80% of revenue. It is therefore important to predict which customers are likely to end the business relationship and which factors affect their decisions. Here we show how a logistic regression model built in R can be used to identify customer churn in the telecom dataset.

Understand the customer behavior
Understand reasons for churn
What are the top factors
How to retain customers
Apply multiple classification models
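The core of the logistic regression approach can be sketched outside R: the model estimates P(churn) = σ(w·x + b) and is fitted here by batch gradient descent on the mean log-loss. This is a minimal Python illustration with two hypothetical standardised features (monthly charges and support calls); the project itself uses R, where `glm(..., family = binomial)` fits the same model by maximum likelihood. The sign of each fitted weight shows whether a feature raises or lowers churn risk, which is how such a model points to the top churn drivers.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def churn_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical standardised features: [monthly_charges, support_calls]; label 1 = churned.
X = [[1.0, 0.8], [0.6, 1.2], [1.5, 0.4], [-1.0, -0.8], [-0.6, -1.2], [-1.5, -0.4]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print([round(v, 2) for v in w], round(b, 2))
print(round(churn_probability(w, b, [1.2, 1.0]), 3))
```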

Predict Credit Default | Give Me Some Credit Kaggle

In this data science project, you will predict borrowers' chances of defaulting on credit loans by building a credit score prediction model.

Description:

Banks often depend on credit score prediction models to approve or deny a loan request. A good prediction model lets a bank extend as much credit as possible without exceeding its risk threshold. This data science project uses a credit score dataset with a fairly large volume of data (about 250K records). The predictive models will be built using several approaches: random forests, gradient boosting, and logistic regression. By the end of the project, you will have built a predictive model that automatically scores each applicant with a credit score that is human-readable and easy to interpret.
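The last step, turning a model's probability of default into a human-readable credit score, is often done with points-to-double-the-odds (PDO) scaling: score = offset + factor · ln(odds of repayment), with factor = PDO / ln 2. The sketch below assumes that convention and uses hypothetical anchor values (600 points at 50:1 odds, 20 points to double the odds); the project may scale its scores differently.

```python
import math

def probability_to_score(p_default, base_score=600, base_odds=50, pdo=20):
    """Map a probability of default to scorecard points (PDO scaling, hypothetical anchors)."""
    odds_good = (1 - p_default) / p_default            # odds of repaying vs defaulting
    factor = pdo / math.log(2)                          # points per doubling of the odds
    offset = base_score - factor * math.log(base_odds)  # pins base_odds to base_score
    return offset + factor * math.log(odds_good)

for p in (1 / 51, 1 / 101, 0.2):
    print(f"P(default)={p:.3f} -> score {probability_to_score(p):.0f}")
```

Because points are linear in the log-odds, every extra 20 points means the odds of repayment have doubled, which is what makes the score easy to read.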

Predict Employee Computer Access Needs in Python

Given an employee's job role, predict their access needs using the Amazon employee database.

Description:

When an employee at any company starts work, they first need to obtain the computer access necessary to fulfill their role. This access may allow an employee to read/manipulate resources through various applications or web portals. It is assumed that employees fulfilling the functions of a given role will access the same or similar resources. It is often the case that employees figure out the access they need as they encounter roadblocks during their daily work (e.g. not able to log into a reporting portal). A knowledgeable supervisor then takes time to manually grant the needed access in order to overcome access obstacles. As employees move throughout a company, this access discovery/recovery cycle wastes a nontrivial amount of time and money.

There is a considerable amount of data regarding an employee’s role within an organization and the resources to which they have access. Given the data related to current employees and their provisioned access, models can be built that automatically determine access privileges as employees enter and leave roles within a company. In this data science project, we will build an auto-access model that minimizes the human involvement required to grant or revoke employee access.
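A very simple auto-access baseline follows from the assumption stated above, that employees in the same role need the same resources: estimate, for every (role, resource) pair, how often access was approved historically, and grant automatically only when that rate is high. The sketch below is that baseline with additive smoothing; the record layout, the smoothing prior, and the 0.9 threshold are hypothetical choices, not the Amazon dataset schema or the project's model.

```python
def fit_grant_rates(records, prior_rate=0.5, prior_weight=2.0):
    """Return a function giving the smoothed approval rate per (role, resource)."""
    approved, total = {}, {}
    for role, resource, was_approved in records:
        key = (role, resource)
        approved[key] = approved.get(key, 0) + was_approved
        total[key] = total.get(key, 0) + 1

    def rate(role, resource):
        key = (role, resource)
        # Unseen pairs fall back to prior_rate; rare pairs are pulled towards it.
        return (approved.get(key, 0) + prior_rate * prior_weight) / (total.get(key, 0) + prior_weight)

    return rate

def decide(rate, role, resource, threshold=0.9):
    """Grant automatically when the smoothed approval rate clears the threshold; else route to a human."""
    return "grant" if rate(role, resource) >= threshold else "manual review"

# Hypothetical (role, resource, approved) history.
history = ([("analyst", "reporting_portal", 1)] * 20
           + [("analyst", "payroll_db", 0)] * 5
           + [("hr_partner", "payroll_db", 1)] * 10)
rate = fit_grant_rates(history)
print(decide(rate, "analyst", "reporting_portal"), decide(rate, "intern", "payroll_db"))
```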

Predict Macro Economic Trends using Kaggle Financial Dataset

In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Description:

Two Sigma is a technology company dedicated to finding value in the world’s data. Since its founding in 2001, Two Sigma has built an innovative platform that combines extraordinary computing power, vast amounts of information, and advanced data science to produce breakthroughs in investment management, insurance, and related fields. Economic opportunity depends on the ability to deliver singularly accurate forecasts in a world of uncertainty.

By accurately predicting financial movements, you will learn about scientifically-driven approaches to unlocking significant predictive capability.

Two Sigma is excited to find predictive value and gain a better understanding of the skills offered by the global data science crowd.

Application of linear regression
Application of non-linear regression
Application of LASSO and elastic net regression
Application of XGBoost model
Interpretation of models
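LASSO and elastic net differ from ordinary linear regression only in their penalty, and both can be fitted by cyclic coordinate descent with a soft-thresholding step. The sketch below minimises (1/2n)·‖y − Xw‖² + alpha·l1_ratio·‖w‖₁ + ½·alpha·(1 − l1_ratio)·‖w‖², the objective scikit-learn documents for `ElasticNet` (l1_ratio = 1 gives LASSO). It assumes centred, standardised features with no intercept, and the data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink towards zero by `threshold`, snapping small values to exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, alpha, l1_ratio=1.0, sweeps=100):
    """Cyclic coordinate descent; X is a list of rows, returns the weight vector."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                      for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
    return w

# Hypothetical standardised predictors: the target depends only on the first one.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3 * row[0] for row in X]
print(elastic_net(X, y, alpha=0.5))                # LASSO
print(elastic_net(X, y, alpha=0.5, l1_ratio=0.5))  # elastic net
```

Both fits set the irrelevant second coefficient to exactly zero and shrink the true coefficient of 3; a large enough `alpha` zeroes every coefficient, which is how these penalties perform feature selection.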

Prediction of Flower Species

The Iris dataset is a classic dataset from the 1930s; it is one of the first modern examples of statistical classification.

The dataset is a collection of morphological measurements of several Iris flowers. These measurements will enable us to distinguish multiple species of the flowers. Today, species are identified by their DNA fingerprints, but in the 1930s, DNA's role in genetics had not yet been discovered.

Predictive Models in IoT - Energy Prediction Use Case

In this machine learning and IoT project, we will train and test various predictive models on experimental data and break down the energy usage.

Multiple linear regression
Support vector machine with radial kernel
Random forest and gradient boosting machines (GBM)
Statistical models tuned with repeated cross-validation and evaluated on a test set
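Repeated cross-validation, as listed above, means re-running k-fold cross-validation several times with a different random partition each time and averaging the scores. The generator below is a minimal sketch of the index splitting only; the fold and repeat counts are arbitrary choices, not the project's settings.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx); each repeat reshuffles and re-partitions."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # k nearly equal, disjoint folds
        for f, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            yield r, f, train_idx, sorted(test_idx)

splits = list(repeated_kfold(25, k=5, repeats=2))
print(len(splits), [len(test) for _, _, _, test in splits[:5]])
```

A model's cross-validated score is the average over all k × repeats held-out folds; the separate test set is touched only once, after model selection.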

Description:

This IoT project presents and discusses data-driven predictive models for the energy use of appliances. The data include measurements from temperature and humidity sensors in a wireless network, weather from a nearby airport station, and the recorded energy use of lighting fixtures. The project discusses feature ranking and data filtering to remove non-predictive parameters. The data set is sampled every 10 minutes over about 4.5 months. The house temperature and humidity conditions were monitored with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions roughly every 3.3 minutes, and the wireless data were then averaged over 10-minute periods. The energy data were logged every 10 minutes with m-bus energy meters. Weather from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data sets using the date and time column. Two random variables have been included in the data set for testing the regression models and for filtering out non-predictive attributes (parameters).
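The two random variables serve as a yardstick for feature ranking: any real feature that the model ranks no higher than pure noise is treated as non-predictive. A minimal sketch of that filtering rule, assuming importance scores have already been computed by some model (for example, a random forest's variable importance); the feature names, probe names, and scores below are hypothetical.

```python
def filter_by_random_probes(importances, probe_names):
    """Keep features whose importance strictly exceeds that of every random probe variable."""
    noise_ceiling = max(importances[name] for name in probe_names)
    return sorted((name for name, score in importances.items()
                   if name not in probe_names and score > noise_ceiling),
                  key=lambda name: importances[name], reverse=True)

importances = {"T_out": 0.21, "RH_kitchen": 0.17, "lights": 0.09,
               "Visibility": 0.04, "rv1": 0.05, "rv2": 0.06}  # hypothetical scores
print(filter_by_random_probes(importances, {"rv1", "rv2"}))
```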

Zillow's Home Value Prediction (Zestimate)

Build a machine learning algorithm to predict the future sale prices of homes.

Problem statement analysis
Exploratory Data Analysis
Input Data Visualization
Interpretation from Visualization

Description:

Zillow is asking you to predict the log-error between their Zestimate and the actual sale price, given all the features of a home. The log error is defined as logerror = log(Zestimate) - log(SalePrice), and it is recorded in the transactions file train.csv. In this project, you are going to predict the log error for the months in Fall 2017.
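The log-error target is straightforward to compute once both prices are known. A small sketch, assuming the natural logarithm (the base is not stated here) and hypothetical prices; a positive value means the Zestimate was above the sale price.

```python
import math

def log_error(zestimate, sale_price):
    """logerror = log(Zestimate) - log(SalePrice); natural log assumed."""
    return math.log(zestimate) - math.log(sale_price)

print(round(log_error(550_000, 500_000), 4))  # Zestimate 10% above the sale price
print(round(log_error(450_000, 500_000), 4))  # Zestimate 10% below the sale price
```

The two cases are not symmetric in magnitude, because the log scale measures relative rather than absolute error.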

"Zestimates" are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

In this data science project, we will develop a machine learning algorithm that makes predictions about the future sale prices of homes. We will also build a model to improve the Zestimate residual error. And finally, we'll build a home valuation algorithm from the ground up, using external data sources.

Regression on Boston Housing Dataset

Titanic EDA

Solving Multiple Classification use cases Using H2O

In this project, we explore H2O and its functionality for building machine learning models.

Description:

H2O.ai is focused on bringing AI to businesses through software.

H2O includes many common Machine Learning algorithms, such as generalized linear modeling (linear regression, logistic regression, etc.), Naive Bayes, principal components analysis, k-means clustering, and word2vec. H2O implements best-in-class algorithms at scale, such as distributed random forest, gradient boosting, and deep learning. H2O also includes a Stacked Ensembles method, which finds the optimal combination of a collection of prediction algorithms using a process known as stacking.

Data cleaning using H2O
Model Training using H2O
Model scalability using H2O in Hadoop environment
Driverless AI using H2O


shejz/Machine-Learning-Projects
