Data-analysis-star-projects-portfolio

These are my basic projects, built from scratch. I used Python for programming.

Iris Data set analysis - Machine learning project

In this project I worked on the Iris data set, which has 150 samples. I tried 6 different algorithms to find the best fit for the prediction model, then ran the chosen model on a test data set to check the correctness of its predictions. Using 4 characteristics such as sepal length and petal width, we can predict the class of the flower.

Naive Bayes data-analysis

In this project I applied the Naive Bayes algorithm to three datasets: a weather dataset (predicting whether or not to go out and play), a diabetes prediction dataset, and a breast cancer detection dataset.
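As a rough illustration of how Naive Bayes makes the play / don't play decision, here is a minimal from-scratch sketch for categorical features with Laplace smoothing. The toy rows, the feature choice (outlook, windy), and the function names `train` and `predict` are assumptions made for this example; they are not taken from the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical toy weather data: (outlook, windy, play). Not the notebook's dataset.
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "yes", "no"),
    ("overcast", "no", "yes"),
    ("rainy", "no", "yes"),
    ("overcast", "yes", "yes"),
    ("sunny", "no", "yes"),
    ("rainy", "yes", "no"),
]


def train(rows):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(label for *_, label in rows)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for *features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict(model, features):
    """Return the class with the highest prior * likelihood product."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # class prior
        for i, value in enumerate(features):
            # Laplace (add-one) smoothing avoids zero probabilities.
            seen = feature_counts[(i, label)][value]
            score *= (seen + 1) / (count + len(values[i]))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(rows)
prediction = predict(model, ("sunny", "no"))
print(prediction)
```

The "naive" part is the assumption that features are independent given the class, which is why the per-feature probabilities are simply multiplied.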

Part-of-speech tagger

A part-of-speech (POS) tagger: passing a sentence to the pos_tag function returns the part of speech of each word in that sentence. POS taggers are used in grammar correction systems, sentiment analysis, etc.

Product Recommender

I used Term Frequency and Inverse Document Frequency (TF-IDF) and cosine_similarities to measure the similarity between products in the database and recommend products similar to the one selected by the consumer.
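The idea can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: the product descriptions, the helper names, and the particular IDF variant (`log(n / df) + 1`) are assumptions chosen for the example, and libraries such as scikit-learn use slightly different weighting.

```python
import math
from collections import Counter

# Hypothetical product descriptions, for illustration only.
products = {
    "p1": "red cotton shirt",
    "p2": "blue cotton shirt",
    "p3": "steel water bottle",
}


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = {key: text.lower().split() for key, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for key, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[key] = {
            term: (count / len(tokens)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(selected, vectors, top_n=1):
    """Rank every other product by similarity to the selected one."""
    scores = {
        key: cosine(vectors[selected], vec)
        for key, vec in vectors.items()
        if key != selected
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


vectors = tfidf_vectors(products)
top = recommend("p1", vectors)
print(top)
```

Products that share no terms with the selected one get a similarity of zero, so they fall to the bottom of the ranking.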

Stock market analysis using Apple stock data

I used the latest Apple stock market data set from google.finance.com and applied regression models to it to evaluate their predictions.
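As a small sketch of what a regression on closing prices looks like, the snippet below fits a straight line by ordinary least squares and extrapolates one day ahead. The prices are made-up, perfectly linear values used only to keep the arithmetic easy to follow, and `fit_line` is a hypothetical helper; the models in `stock.py` may differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical closing prices, one per trading day (not real Apple data).
closes = [150.0, 152.0, 154.0, 156.0, 158.0]
days = list(range(len(closes)))

slope, intercept = fit_line(days, closes)
next_close = slope * len(closes) + intercept  # extrapolate to the next day
print(slope, intercept, next_close)
```

Real price series are noisy and rarely linear, so a fit like this is best treated as a baseline to compare other regression models against.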

Product recommender using image processing in MATLAB

I used HSV and Gabor radon algorithms to extract the texture and color features of an image, then calculated the Euclidean distance between the query vector and the feature vectors of the images in the database. The 10 best-matching images are displayed.
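The retrieval step (ranking database images by Euclidean distance to the query) can be sketched as below. The project itself is in MATLAB; Python is used here only to stay consistent with the rest of this README. The feature vectors, image names, and the `top_matches` helper are hypothetical, and feature extraction is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_matches(query, database, k=10):
    """Return the names of the k images closest to the query vector."""
    ranked = sorted(database, key=lambda name: euclidean(query, database[name]))
    return ranked[:k]


# Hypothetical precomputed feature vectors (color + texture), for illustration.
database = {
    "img_a": [0.10, 0.20, 0.30],
    "img_b": [0.90, 0.80, 0.70],
    "img_c": [0.15, 0.20, 0.35],
}
query = [0.10, 0.20, 0.30]

best = top_matches(query, database, k=2)
print(best)
```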

Human activity recognition

The Human Activity Recognition database was built from the recordings of 30 study participants performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors. The objective is to classify activities into one of the six activities performed.

Predicting Financial distress

Here I used a highly imbalanced dataset: 3.8% of the samples are companies under financial distress, and the remaining 96.2% are companies in a stable financial state. I have shown several techniques for handling imbalanced data, such as undersampling and oversampling. You can also fork my Kaggle kernel: https://www.kaggle.com/rinki24/financial-distress-prediction
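For readers unfamiliar with resampling, here is a minimal sketch of random oversampling and random undersampling using only the standard library. The row format, the 96/4 split, and both function names are assumptions for illustration; the kernel linked above may use other techniques or libraries.

```python
import random


def random_oversample(minority, majority, rng):
    """Duplicate random minority rows until both classes have equal size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def random_undersample(minority, majority, rng):
    """Keep a random subset of majority rows equal in size to the minority."""
    return minority + rng.sample(majority, len(minority))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical rows: (label, row id). Assumes the majority class is larger.
majority = [("stable", i) for i in range(96)]
minority = [("distressed", i) for i in range(4)]

over = random_oversample(minority, majority, rng)
under = random_undersample(minority, majority, rng)

print(len(over), len(under))
```

Resampling should be applied to the training split only, so that the test set keeps the original class balance.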

Analytics Vidhya : Loan Prediction III

A classification problem: predict whether a person's loan application will be approved or rejected, that is, whether the person is eligible for the loan amount requested (for example, if a bank wanted to automate the loan granting process).

WaterPump_Classification (Top 30% among participating teams)

This DrivenData competition was about classifying water pumps using the Tanzanian government's water data. I used the CatBoost algorithm, which is well suited to datasets with categorical values; boosting algorithms also have the added advantage of working well on smaller datasets.

Score: 0.7261

Metric used:

$$\text{Classification Rate} = \frac{1}{N} \sum_{i=0}^{N} I(y_i = \hat{y}_i)$$
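The classification rate is simply the fraction of predictions that match the true labels. A minimal sketch, with illustrative labels and a hypothetical `classification_rate` helper:

```python
def classification_rate(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative labels only, not competition data.
y_true = ["functional", "non functional", "functional", "functional needs repair"]
y_pred = ["functional", "functional", "functional", "functional needs repair"]

rate = classification_rate(y_true, y_pred)
print(rate)
```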

Competition link: https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/

Haptik data classification of small talk

Made a simple classifier to help a chatbot understand whether a chat is small talk or not. Used Python, nltk and sklearn.

Udacity ML competition (https://www.kaggle.com/c/udacity-mlcharity-competition)

Made a submission in the Udacity ML competition and ranked 64th on the leaderboard.

