richhuwtaylor/README.md

3 years of Japanese language flashcard reviews

πŸ™‹β€β™‚οΈ Introducing Myself

Hello! I'm Rich, a data analyst with a background in data product management and software development.

This portfolio page is where I track my personal development and showcase what I've been working on.

🌱 Portfolio Projects

A project demonstrating techniques for understanding and predicting customer churn for a simulated social network. By combining product usage analytics event data with customer subscription data, we can reveal which usage behaviours have the biggest impact on churn probability and predict the churn probability of individual accounts.

Cohort Analysis | Logistic Regression | Postgres | SQL
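To illustrate how a logistic regression turns usage behaviours into a churn probability, here is a minimal sketch in plain Python. It is not the project's code: the two features (`posts_per_month`, `days_since_last_login`), the toy data, and the helper names `fit_logistic` and `churn_probability` are all assumptions made up for this example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def churn_probability(x, weights, bias):
    """Predicted probability that an account with features x churns."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical accounts: [posts_per_month, days_since_last_login], scaled to 0-1.
usage = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.6, 0.3],
    [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.2, 0.6],
]
churned = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = the account churned

weights, bias = fit_logistic(usage, churned)
active = churn_probability([0.85, 0.15], weights, bias)
lapsed = churn_probability([0.15, 0.85], weights, bias)
print(f"active account: {active:.2f}, lapsed account: {lapsed:.2f}")
```

The sign of each fitted weight indicates whether a behaviour raises or lowers churn probability, which is the kind of reading the project description refers to.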

Uses AWS components to combine a SQL database with streaming event data and transform it for analysis with Amazon Athena.

Data Engineering | Kinesis Firehose | S3 | AWS Glue | AWS Lambda

A data cleaning and logistic regression pipeline, implemented in PySpark, that examines which attributes of an Epicurious recipe matter most in determining whether it is a dessert. The final pyspark.ml Pipeline uses custom Transformers and Estimators for missing-value imputation and outlier capping.

PySpark ML | Logistic Regression | Data Cleaning
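The fit-then-transform idea behind imputation and outlier capping can be sketched without Spark. The plain-Python classes below are a stand-in for illustration only, not the project's pyspark.ml Transformers and Estimators; the class names, the nearest-rank percentile rule, and the `calories` values are assumptions.

```python
from statistics import median


class MedianImputer:
    """Learns the median of the observed values, then fills None with it."""

    def fit(self, values):
        self.fill_value = median(v for v in values if v is not None)
        return self

    def transform(self, values):
        return [self.fill_value if v is None else v for v in values]


class OutlierCapper:
    """Learns lower and upper bounds from the data, then clips values to them."""

    def __init__(self, lower_q=0.1, upper_q=0.9):
        self.lower_q = lower_q
        self.upper_q = upper_q

    def fit(self, values):
        ordered = sorted(values)
        last = len(ordered) - 1
        # Nearest-rank percentiles: a simple rule chosen for this sketch.
        self.floor = ordered[round(self.lower_q * last)]
        self.cap = ordered[round(self.upper_q * last)]
        return self

    def transform(self, values):
        return [min(max(v, self.floor), self.cap) for v in values]


# Hypothetical calorie column with missing values and two implausible entries.
calories = [250.0, 310.0, None, 180.0, 420.0, 295.0, None, 30000.0, 365.0, 5.0, 275.0]

imputed = MedianImputer().fit(calories).transform(calories)
capped = OutlierCapper(lower_q=0.1, upper_q=0.9).fit(imputed).transform(imputed)
print(capped)
```

Separating `fit` (learn the fill value or bounds) from `transform` (apply them) mirrors the Estimator and Transformer split in pyspark.ml, so bounds learned on training data can be reused unchanged on new data.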

A data pipeline orchestrated across AWS and Google Cloud Services using mage.ai for data transformation. The project uses Looker Studio to visualise a month of trips made by licensed yellow cabs in New York in January 2023.

Data Engineering | EC2 | BigQuery | Looker Studio

A Microsoft Power BI business intelligence dashboard for AdventureWorks, a fictional global manufacturing company that produces cycling equipment and accessories. The data was derived from the AdventureWorks sample databases available from Microsoft.

Power BI | M Formula Language | Power Query | DAX

📖 Materials I've Found Helpful

Where appropriate, I include links to my own solutions to "end of chapter" exercises.

SQL

Anthony DeBarros - Practical SQL, 2nd Edition
(my chapter solutions)

Statistics

James, Witten, Hastie, Tibshirani and Taylor - An Introduction to Statistical Learning with Applications in Python
(my solutions for labs and end-of-chapter exercises)

Thomas Haslwanter - An Introduction to Statistics with Python
(my chapter solutions)

Maven Analytics - Statistics for Data Analysis
(my solutions and notes from mid-course projects)

Spark

Jonathan Rioux - Data Analysis with Python and PySpark
(my chapter solutions)

πŸ‘‹πŸ» Connect with Me on LinkedIn

Pinned

  1. fighting-churn

    A project demonstrating how to predict customer churn based on product usage behaviours.

    Jupyter Notebook

  2. dessert-or-not

    A PySpark ML pipeline for working out which attributes of an Epicurious recipe are important in determining whether or not it is a dessert.

    Jupyter Notebook

  3. adventure-works

    A business intelligence dashboard for the fictional AdventureWorks bike manufacturer, built in Microsoft Power BI.

  4. analysis-with-python-and-pyspark

    My solutions and notes to the mid-chapter and end-of-chapter exercises in Jonathan Rioux's Data Analysis with Python and PySpark.

    Jupyter Notebook

  5. practical-sql

    My solutions to the exercises and notes from Anthony DeBarros' Practical SQL (2nd Edition).

    PLpgSQL

  6. statsintro_python

    My solutions to the end-of-chapter exercises from Thomas Haslwanter's An Introduction to Statistics with Python.

    Jupyter Notebook