This repository contains a portfolio of data science projects and mini-projects that I completed for academic and self-enrichment purposes, written as IPython Notebooks.
P.S. Pyplot and Choropleth plots cannot be displayed in the GitHub code viewer and have to be viewed with Notebook Viewer (nbviewer).
Tools:
- Multi-Functional: Math, NumPy, Pandas, SciPy, String.
- Data Visualization: Cufflinks, Matplotlib, Plotly, Seaborn.
- Machine Learning: Scikit-learn, Statsmodels.api.
- Natural Language Processing: NLTK.
- Deep Learning: TensorFlow, Keras.
Categories:
- Data Analysis: Calculus, Data Mining, Data Preprocessing, Data Visualization, Data Wrangling, Econometrics, Geographical Plotting, Linear Algebra, Linear Model, Statistics.
- Machine Learning: Cross Validation, Decision Tree Classifier, K-Nearest Neighbors Classifier (KNN), Linear Regression, Logistic Regression, Naïve Bayes, Pipeline, Random Forest, Support Vector Machine (SVM).
- Natural Language Processing: Count Vectorizer, Term Frequency-Inverse Document Frequency (TF-IDF), Text Processing.
- Deep Learning: Callbacks Method, Dropout Method.
-
Exploring 911 emergency call data for all townships and boroughs in Montgomery County, Pennsylvania, USA, across three departments: EMS (Emergency Medical Services), Fire, and Traffic. The data is provided by montcoalert.org.
-
Exploring the variables that affected housing prices in Boston in 1978 in order to find out which of them influence price the most. From crime rate to nitric oxides concentration, all the variables play a part in the price variation.
-
Gaining insight into how the current e-commerce system has affected customers' purchases and consumption, and thus finding a way to improve the system. The analysis involves two datasets, customer data and purchase data, which are analyzed from the customers' and the purchases' points of view respectively.
-
Classifying flowers of the genus Iris into three species: Iris setosa, Iris versicolor, and Iris virginica, using the dataset introduced by Sir Ronald Fisher in 1936. The classification uses four properties: sepal length, sepal width, petal length, and petal width.
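As a rough illustration of how four measurements can separate the three species, here is a minimal K-Nearest Neighbors sketch in plain Python. It is not the notebook's actual code: the function name `knn_predict`, the choice of `k=3`, and the handful of sample rows are assumptions made for demonstration only.

```python
import math
from collections import Counter

# A few illustrative rows: (sepal length, sepal width, petal length, petal width) in cm.
# Assumed sample values for demonstration, not the project's train/test split.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def knn_predict(sample, train=TRAIN, k=3):
    """Return the majority label among the k training rows closest to sample."""
    # Sort training rows by Euclidean distance to the sample and keep the k nearest.
    nearest = sorted(train, key=lambda row: math.dist(sample, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict((5.0, 3.4, 1.5, 0.2)))
    print(knn_predict((6.5, 3.0, 5.8, 2.2)))
```

With real data, a library implementation such as scikit-learn's `KNeighborsClassifier` would normally replace the hand-written distance sort.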
-
Identifying and classifying loan status at Lending Club based on more than 100 criteria and pieces of information about the borrowers. The project also runs a simulation on a new set of data to predict whether the borrowers will default on a loan or not.
-
Creating a system that uses users' ratings to filter and recommend the most "similar" movies by implementing Memory-Based Collaborative Filtering and Model-Based Collaborative Filtering. The dataset was obtained from MovieLens.
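The memory-based variant can be sketched as an item-to-item cosine similarity over ratings. The snippet below is only one possible reading of that idea, not the notebook's implementation: the ratings, user names, and helper names (`item_vector`, `cosine_similarity`, `most_similar`) are all hypothetical, and the real MovieLens data is not used.

```python
import math

# Hypothetical user -> {movie: rating} data, made up for illustration.
RATINGS = {
    "ann": {"Alien": 5, "Aliens": 5, "Amelie": 1},
    "bob": {"Alien": 4, "Aliens": 5, "Amelie": 2},
    "cara": {"Alien": 1, "Aliens": 2, "Amelie": 5},
    "dan": {"Alien": 5, "Aliens": 4},
}


def item_vector(movie, ratings=RATINGS):
    """Collect the ratings given to one movie, keyed by user."""
    return {user: r[movie] for user, r in ratings.items() if movie in r}


def cosine_similarity(a, b):
    """Cosine similarity restricted to the users who rated both movies."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in shared))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in shared))
    return dot / (norm_a * norm_b)


def most_similar(movie, ratings=RATINGS):
    """Rank every other movie by its similarity to the given one, best first."""
    others = {m for r in ratings.values() for m in r} - {movie}
    target = item_vector(movie, ratings)
    scored = [(cosine_similarity(target, item_vector(m, ratings)), m) for m in others]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, title in most_similar("Alien"):
        print(f"{title}: {score:.3f}")
```

Raw ratings are all positive, so similarities stay above zero. Subtracting each user's mean rating before comparing is a common refinement that this sketch leaves out.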
-
Using data from a corpus collected for research at the Department of Computer Science at the National University of Singapore (NUS), this project unravels two types of messages: ham, the normal non-spam text message, and spam, the irrelevant or unsolicited message sent over the Internet (typically to a large number of users, for the purposes of advertising, phishing, spreading malware, etc.). The exploration runs from the characteristics of a text message to its classification.
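The step from text to classification can be sketched as word counts fed into a multinomial Naïve Bayes classifier. This is a simplified stand-in written in plain Python, not the notebook's pipeline: the toy messages are invented (they do not come from the corpus), and `tokenize`, `train`, and `predict` are hypothetical helper names.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy messages, made up for illustration (not from the corpus).
TRAIN = [
    ("ham", "are we still meeting for lunch today"),
    ("ham", "call me when you get home"),
    ("ham", "see you at the lecture tomorrow"),
    ("spam", "win a free prize call now"),
    ("spam", "free entry claim your cash prize now"),
    ("spam", "urgent claim your free phone today"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(messages):
    """Count words per label, the same idea as a bag-of-words count vectorizer."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Pick the label with the highest log-probability, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(TRAIN)
    print(predict("claim your free prize", model))
    print(predict("see you at lunch", model))
```

TF-IDF weighting, listed under the categories above, would replace the raw counts with weighted values. It is omitted here to keep the sketch short.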
-
An analysis of the Titanic tragedy of 1912 that explores the general idea of what the dataset is about. While unraveling the causes of passengers' survival, this analysis also shows how variables such as sex, age, and socio-economic status determine the passengers' survival.