We build a model with the gradient boosting algorithm to reduce noise and improve performance. This work was done as an informal project under Prof. Yaganti while studying at BITS.
This repository is submitted in partial fulfilment of the requirements for the module MSIN0114: Business Analytics Consulting Project/Dissertation at the UCL School of Management.
The Crime and Incarceration in the United States dataset contains data on crimes committed and prisoner counts in all 50 states; the data are analyzed using various analytical methods.
Develop a classification model that can accurately diagnose the presence of kidney disease in a person based on their medical test results. The model will then identify which factors are the most influential in determining a person's chances of developing kidney disease.
An end-to-end project to analyze and model vehicle sale price data, and then productionize the best model to help people choose a price at which to sell their vehicle.
Computing the feature importance of categorical variables after converting them into dummy variables (one-hot encoding) can yield skewed or hard-to-interpret results. Here I present a method to get around this problem using H2O.
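The skew arises because one-hot encoding splits a single variable's importance across many dummy columns, so each dummy can rank low even when the variable as a whole matters. The repository's H2O-based method is not reproduced here; as a separate, generic workaround, the sketch below sums dummy importances back onto their source column. It assumes dummies are named `<column>_<level>` (the `pandas.get_dummies` default separator) and that the importance measure is roughly additive across dummies; the function name and the numbers are hypothetical.

```python
from typing import Dict, Iterable


def aggregate_dummy_importance(
    importances: Dict[str, float], categorical_columns: Iterable[str], sep: str = "_"
) -> Dict[str, float]:
    """Sum the importances of one-hot dummies back onto their source column.

    Assumes dummies are named "<column><sep><level>"; any feature that does
    not match a listed categorical column keeps its own name.
    """
    categorical_columns = list(categorical_columns)
    totals: Dict[str, float] = {}
    for name, value in importances.items():
        # Map each dummy to its parent column; numeric features map to themselves.
        source = next((c for c in categorical_columns if name.startswith(c + sep)), name)
        totals[source] = totals.get(source, 0.0) + value
    # Most important original variable first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical importances from a model trained on one-hot encoded data.
dummy_importances = {
    "age": 0.25,
    "color_red": 0.125,
    "color_blue": 0.25,
    "color_green": 0.0625,
    "engine_size": 0.3125,
}
combined = aggregate_dummy_importance(dummy_importances, ["color"])
print(combined)  # {'color': 0.4375, 'engine_size': 0.3125, 'age': 0.25}
```

No single `color` dummy outranks `engine_size`, yet the combined `color` variable comes out on top, which is exactly the distortion described above.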
Feature selection is used in nearly all data science pipelines. I have therefore created functions that perform a form of backward stepwise selection based on XGBoost classifier feature importance and a set of other input values, with the goal of returning the number of features to keep for a preferred AUC score.
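A minimal sketch of importance-driven backward stepwise selection, assuming a hypothetical `fit_and_score` callback that trains a classifier (XGBoost in the repository) on a feature subset and returns its validation AUC plus a per-feature importance map. This is not the repository's code: it drops the least important feature each round and returns the smallest subset that still meets the target AUC. The toy scorer only stands in for a real model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: trains a classifier on the given feature subset
# and returns (validation AUC, {feature: importance}).
FitAndScore = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def backward_select(
    features: Sequence[str], fit_and_score: FitAndScore, target_auc: float
) -> Tuple[List[str], List[Tuple[int, float]]]:
    """Backward stepwise selection driven by feature importance.

    Repeatedly refits, records the AUC, and drops the least important
    feature. Returns the smallest subset whose AUC reached target_auc
    (or all features if none did) plus the (n_features, auc) history.
    """
    current = list(features)
    best = list(current)
    history: List[Tuple[int, float]] = []
    while current:
        auc, importances = fit_and_score(current)
        history.append((len(current), auc))
        if auc >= target_auc:
            # Subsets only shrink, so the last passing one is the smallest.
            best = list(current)
        if len(current) == 1:
            break
        weakest = min(current, key=lambda f: importances.get(f, 0.0))
        current.remove(weakest)
    return best, history


# Toy stand-in for a real model: AUC grows with the weights of kept features.
WEIGHTS = {"a": 0.2, "b": 0.15, "c": 0.01, "d": 0.005}


def toy_fit_and_score(subset: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    auc = 0.5 + sum(WEIGHTS[f] for f in subset)
    return auc, {f: WEIGHTS[f] for f in subset}


selected, history = backward_select(list(WEIGHTS), toy_fit_and_score, target_auc=0.84)
print(selected)  # ['a', 'b']
```

Keeping the full `history` lets you plot AUC against the number of features and pick a different cut-off without refitting.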
Used data science to analyze the performance of golfers on the PGA Tour. Built ML models and compared Strokes Gained to traditional metrics, producing insightful findings and actionable recommendations for golfers at all levels. The project showcases advanced data analysis, decision trees, and visualizations.
High data dimensionality and irrelevant features can degrade the performance of machine learning algorithms. This repository implements the permutation feature importance method, which measures how much each feature contributes to a model's predictions, to improve the performance of several machine learning models.
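Permutation importance scores a feature by shuffling its column and measuring how much a chosen metric drops: a feature the model relies on causes a large drop, an irrelevant one causes none. The dependency-free sketch below illustrates the idea under stated assumptions; the `predict` callable, the toy data, and the accuracy metric are hypothetical stand-ins, not the repository's implementation.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(
    predict: Callable[[List[List[float]]], List[int]],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in `metric` when each feature column is shuffled."""
    rng = random.Random(seed)
    rows = [list(r) for r in X]
    baseline = metric(y, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and y
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - metric(y, predict(permuted)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict_demo(rows: List[List[float]]) -> List[int]:
    # Stands in for a fitted model that only looks at feature 0.
    return [int(r[0] > 0.5) for r in rows]


# Toy data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
X_demo = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_demo = [int(x0 > 0.5) for x0, _ in X_demo]

scores = permutation_importance(predict_demo, X_demo, y_demo, accuracy)
print(scores)  # feature 0 scores well above zero; feature 1 scores exactly 0.0
```

In practice the scores should be computed on held-out data; scikit-learn provides a maintained implementation as `sklearn.inspection.permutation_importance`.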