eaglewarrior/Data-analysis-python-projects

Data-analysis-star-projects-portfolio

These are my basic projects, built from scratch. I used Python for programming.

  1. Iris data set analysis - Machine learning project

In this project I worked on the Iris data set, which has 150 samples. I compared 6 different algorithms to find the best fit for the prediction model, then ran the chosen model on a test data set to check the correctness of its predictions. With 4 characteristics such as sepal length and petal width, we can predict the class of the flower.

  1. Naive Bayes data-analysis

In this project I applied the Naive Bayes algorithm to a weather dataset (predicting whether or not to go out and play), a diabetes prediction dataset, and a breast cancer detection dataset.
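As a rough illustration of the idea, here is a minimal categorical Naive Bayes sketch in plain Python with Laplace smoothing. The tiny weather table, the feature order (outlook, temperature), and the helper names `train` and `predict` are hypothetical and chosen only for this example; they are not taken from the project's notebooks.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label with the highest log posterior for one row."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("overcast", "hot"),
    ("rainy", "mild"), ("rainy", "cool"), ("overcast", "cool"),
]
labels = ["no", "no", "yes", "yes", "no", "yes"]

model = train(rows, labels)
print(predict(model, ("overcast", "hot")))
print(predict(model, ("sunny", "hot")))
```

For numeric data such as the diabetes or breast cancer sets, a Gaussian variant (per-class mean and variance for each feature) would likely be the more natural choice.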

  1. Parts of speech tagger

A part-of-speech tagger: passing a sentence to the pos_tag function returns the part of speech of each word in that sentence. POS taggers are used in grammar correction systems, sentiment analysis, etc.

  1. Product Recommender

I used Term Frequency and Inverse Document Frequency (TF-IDF) and cosine_similarities to measure the similarity between products in the database and recommend products similar to the one selected by the consumer.
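The approach can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the product descriptions are made up, the function names (`tfidf_vectors`, `cosine_similarity`, `recommend`) are hypothetical, and the IDF formula `log(N / df) + 1` is one common variant, not necessarily the one used in the project.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (TF times IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(selected, docs, top_n=2):
    """Return indices of the top_n documents most similar to docs[selected]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine_similarity(vectors[selected], vec), index)
        for index, vec in enumerate(vectors)
        if index != selected
    ]
    scores.sort(reverse=True)
    return [index for _, index in scores[:top_n]]


# Hypothetical product descriptions.
descriptions = [
    "blue cotton t-shirt",
    "red cotton t-shirt",
    "stainless steel water bottle",
    "blue denim jeans",
]

print(recommend(0, descriptions, top_n=1))
```

Here the selected product shares two terms with the red t-shirt and only one with the jeans, so the red t-shirt ranks first.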

  1. Stock market analysis using Apple stock data set

I used the latest Apple stock market data set from google.finance.com and applied regression models to it to check the predictions.

  1. Product recommender using image processing in MATLAB

I used HSV and Gabor-Radon algorithms to extract the color and texture features of an image, then calculated the Euclidean distance between the query vector and the database of image feature vectors. The 10 best-matching images are displayed.
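The retrieval step (ranking database images by Euclidean distance to the query) can be sketched as below. The project itself is in MATLAB; this Python version is only an illustration, feature extraction is left out, and the feature vectors and the `top_matches` helper are hypothetical.

```python
import math


def top_matches(query, database, k=10):
    """Return the names of the k database entries closest to the query vector."""
    ranked = sorted(
        database.items(),
        key=lambda item: math.dist(query, item[1]),  # Euclidean distance
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical, already-extracted feature vectors keyed by image name.
features = {
    "img_a": [1.0, 1.0],
    "img_b": [0.1, 0.0],
    "img_c": [5.0, 5.0],
}

print(top_matches([0.0, 0.0], features, k=2))
```

In practice the color and texture features would usually be normalized to comparable scales first, so that neither group dominates the distance.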

  1. Human activity recognition

The Human Activity Recognition database was built from the recordings of 30 study participants performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors. The objective is to classify activities into one of the six activities performed.

  1. Predicting Financial distress

Here I used a highly imbalanced dataset: 3.8% of the data consists of samples from companies under financial distress, and the remaining 96% is from companies in a stable financial state. I show several techniques used for handling imbalanced data, such as undersampling and oversampling. You can also fork my Kaggle kernel: https://www.kaggle.com/rinki24/financial-distress-prediction
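A minimal sketch of random oversampling and undersampling in plain Python is shown below. The data, the 96/4 split, and the function names are hypothetical stand-ins for this example; the kernel linked above may use different techniques or library helpers.

```python
import random
from collections import Counter


def _group_by_label(samples, labels):
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority samples until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        out_samples.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical imbalanced data: 96 stable companies (0), 4 distressed (1).
samples = list(range(100))
labels = [0] * 96 + [1] * 4

_, over_labels = random_oversample(samples, labels)
_, under_labels = random_undersample(samples, labels)
print(Counter(over_labels), Counter(under_labels))
```

Resampling should be applied to the training split only, so that the test set keeps the original class distribution.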

  1. Analytics Vidhya : Loan Prediction III

A classification problem: predict whether a person's loan application will be approved or rejected, i.e. whether the person is eligible for the requested loan amount (for example, if a bank wanted to automate the loan-granting process).

  1. WaterPump_Classification (top 30% among participating teams)

This DrivenData competition was about classifying water pumps from the Tanzanian government's water data. I used the CatBoost algorithm, which handles datasets with categorical values well; boosting algorithms also have the added advantage of working well on smaller amounts of data.

Score: 0.7261

Metric used:

$$\text{Classification Rate} = \frac{1}{N}\sum_{i=0}^{N} I(y_i = \hat{y}_i)$$
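The classification rate is the fraction of rows where the predicted label equals the true label, i.e. plain accuracy. A minimal sketch follows; the function name and the example labels are illustrative only and are not taken from the submission.

```python
def classification_rate(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    # The indicator I(y_i == y_hat_i) is 1 for a match and 0 otherwise.
    matches = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return matches / len(y_true)


# Illustrative labels: 3 of the 4 predictions are correct.
y_true = ["functional", "non functional", "functional", "functional"]
y_pred = ["functional", "non functional", "non functional", "functional"]

print(classification_rate(y_true, y_pred))  # 0.75
```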

Competition link: https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/

  1. Haptik data classification of small talk

Made a simple classifier to help a chatbot understand whether a chat is small talk or not, using Python, nltk, and sklearn.

  1. Udacity ML competitions (https://www.kaggle.com/c/udacity-mlcharity-competition)

Made a submission in the Udacity ML competition and ranked 64th on the leaderboard.