
A GitHub repository containing a portfolio of data science projects and mini-projects that I made for academic and self-enrichment purposes using IPython Notebook.

fawiyogo001/Data-Science-Portfolio-Python


Data Science Portfolio


Overview

This repository contains a portfolio of data science projects and mini-projects that I made for academic and self-enrichment purposes using IPython Notebook.

P.S. Pyplot and Choropleth plots cannot be displayed in the GitHub code viewer and have to be viewed using Notebook Viewer (nbviewer).
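As a convenience, the small helper below sketches how an nbviewer link for a notebook in this repository could be built. It assumes the usual `nbviewer.org/github/<owner>/<repo>/blob/<branch>/<path>` URL layout; the branch name `master` and the notebook file name are assumptions used only for illustration, so substitute the real values.

```python
from urllib.parse import quote


def nbviewer_url(owner, repo, notebook_path, branch="master"):
    """Build an nbviewer link for a notebook hosted on GitHub."""
    # quote() percent-encodes spaces and other unsafe characters but keeps "/".
    return "https://nbviewer.org/github/{}/{}/blob/{}/{}".format(
        owner, repo, branch, quote(notebook_path)
    )


# "911 Calls.ipynb" is a hypothetical file name, not confirmed by the repository.
example = nbviewer_url(
    "fawiyogo001", "Data-Science-Portfolio-Python", "911 Calls.ipynb"
)
print(example)
```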




Tools:

  • Multi-Functional: Math, NumPy, Pandas, SciPy, String.
  • Data Visualization: Cufflinks, Matplotlib, Plotly, Seaborn.
  • Machine Learning: Scikit-learn, Statsmodels.api.
  • Natural Language Processing: NLTK.
  • Deep Learning: TensorFlow, Keras.

Categories:

  • Data Analysis: Calculus, Data Mining, Data Preprocessing, Data Visualization, Data Wrangling, Econometrics, Geographical Plotting, Linear Algebra, Linear Model, Statistics.
  • Machine Learning: Cross Validation, Decision Tree Classifier, K-Nearest Neighbors Classifier (KNN), Linear Regression, Logistic Regression, Naïve Bayes, Pipeline, Random Forest, Support Vector Machine (SVM).
  • Natural Language Processing: Count Vectorizer, Term Frequency-Inverse Document Frequency (TF-IDF), Text Processing.
  • Deep Learning: Callbacks Method, Dropout Method.
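To make the Term Frequency-Inverse Document Frequency entry above concrete, here is a minimal sketch in plain Python. It assumes the textbook weighting tf × log(N / df) with whitespace tokenization; library implementations typically use smoothed variants, so the numbers will not match theirs exactly. The sample messages are made up for illustration and are not taken from the projects' datasets.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Term frequency is the term's share of the document's tokens.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up messages, used only to exercise the function.
messages = ["free prize call now", "call me later", "free entry win prize now"]
weights = tf_idf(messages)
print(weights[1])
```

Terms that occur in fewer documents receive a larger weight: in the second message, "me" outweighs "call" because "call" also appears in the first message.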

Contents:

  • 911 Calls

    Exploring 911 emergency call data from all townships and boroughs in Montgomery County, Pennsylvania, United States, across three departments: EMS (Emergency Medical Services), Fire, and Traffic. The data is provided by montcoalert.org.

  • Boston Housing Price

    Exploring the variables that affected housing prices in Boston in 1978 in order to find out which of them affect the price the most. From crime rate to nitric oxides concentration, every variable plays a part in the price variation.

  • E-Commerce Purchases

    Gaining insight into how the current e-commerce system has affected customers' purchases and consumption, and thus finding a way to improve the system. The analysis involves two datasets, customer data and purchase data, which are analyzed from the customers' and the purchases' points of view respectively.

  • Iris Classification

    Classifying flowers of the genus Iris into three species, Iris setosa, Iris versicolor, and Iris virginica, using the dataset introduced by Sir Ronald Fisher in 1936. The classification uses four features: sepal length, sepal width, petal length, and petal width.

  • Lending Club Loan

    Identifying and classifying loan status at Lending Club based on more than 100 criteria and pieces of information about the borrowers. The project also runs a simulation on a new set of data to predict whether borrowers will default on a loan.


  • Movies Recommender System

    Creating a system that uses users' ratings to filter and recommend the most "similar" movies by implementing Memory-Based Collaborative Filtering and Model-Based Collaborative Filtering. The dataset was obtained from MovieLens.

  • Spam Detection Filter

    Using a corpus collected for research at the Department of Computer Science at the National University of Singapore (NUS), this project examines two types of message: ham, the normal non-spam text message, and spam, the irrelevant or unsolicited message sent over the Internet (typically to a large number of users, for the purposes of advertising, phishing, spreading malware, etc.). The exploration runs from the characteristics of a text message to its classification.

  • Titanic Survival Analysis

    An analysis of the Titanic tragedy of 1912 that explores the general idea of what the dataset is about. While unraveling the causes of passengers' survival, the analysis also shows how variables such as sex, age, and socio-economic status affected the passengers' survival.
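As an illustration of the memory-based collaborative filtering mentioned under Movies Recommender System, the sketch below ranks movies by item-to-item cosine similarity over the ratings of users who rated both. This is only one common memory-based approach and is not necessarily what the notebook implements; the ratings table, the user and movie labels, and the function names are all hypothetical and not taken from MovieLens.

```python
import math

# Hypothetical ratings: {user: {movie: rating}}; not taken from MovieLens.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "B": 5},
}


def item_vector(ratings, movie):
    """Collect every user's rating of one movie as {user: rating}."""
    return {user: r[movie] for user, r in ratings.items() if movie in r}


def cosine_similarity(a, b):
    """Cosine similarity restricted to users who rated both movies."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def most_similar(ratings, movie):
    """Rank all other movies by similarity to the given one, best first."""
    movies = {m for r in ratings.values() for m in r}
    target = item_vector(ratings, movie)
    scores = [
        (other, cosine_similarity(target, item_vector(ratings, other)))
        for other in movies
        if other != movie
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar(ratings, "A")
print(ranking)
```

With these made-up ratings, movie "B" ranks above "C" for movie "A", because the users who liked "A" also tended to rate "B" highly.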

