This repository contains a collection of data analysis and predictive modelling projects I completed for The Shepherd Centre. It includes four examples of the types of projects I proactively initiated and delivered.
1. An A/B test of ask amounts in the Christmas direct mail appeal.

In this project, we conducted an A/B test on a Christmas direct mail appeal for donations, analysing the effectiveness and profitability of an 'inflated' ask amount grid versus a standard ask amount grid across our donor segmentation model. We concluded that the 'inflated' ask amounts were effective, but the analysis is limited to the immediate campaign; more work would be required to assess the effect of the test on longer-term donor behaviour. This analysis was conducted in Python, using the pandas, NumPy, SciPy stats and statsmodels libraries.
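The core comparison in an A/B test like this can be sketched as a two-sample test on donation amounts between the two ask grids. The sketch below uses simulated data and illustrative column names (the real campaign data and the exact test used are not shown in this README); Welch's t-test is one reasonable choice, as it does not assume equal variances.

```python
# Hypothetical sketch of the A/B comparison: a two-sample Welch t-test on
# donation amounts for the standard vs. 'inflated' ask groups.
# Column names, group sizes and distributions are illustrative only.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)

# Simulated campaign results: one row per donor who received the mailing
df = pd.DataFrame({
    "ask_grid": ["standard"] * 500 + ["inflated"] * 500,
    "donation": np.concatenate([
        rng.gamma(shape=2.0, scale=25.0, size=500),   # standard ask grid
        rng.gamma(shape=2.0, scale=30.0, size=500),   # 'inflated' ask grid
    ]),
})

standard = df.loc[df["ask_grid"] == "standard", "donation"]
inflated = df.loc[df["ask_grid"] == "inflated", "donation"]

# Welch's t-test does not assume equal variances between the two groups
t_stat, p_value = stats.ttest_ind(inflated, standard, equal_var=False)
print(f"mean standard donation: {standard.mean():.2f}")
print(f"mean inflated donation: {inflated.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In practice donation amounts are heavily skewed, so a non-parametric alternative (e.g. a Mann-Whitney U test) or a comparison of response rates may also be appropriate.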
2. An exploration of binary classification supervised learning models of donors to the Tax Time direct marketing campaign.
In this project, we asked the question: 'Is it possible to develop a machine learning model that can select the "best" donors for the tax time marketing campaign?' To answer this, we evaluated the effectiveness of three popular classification algorithms: Logistic Regression, Random Forest, and K-Nearest Neighbours. We found that, in general, the models did not perform significantly better than our existing donor segmentation model. This analysis was conducted in Python using the scikit-learn and statsmodels libraries.
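A comparison of the three algorithms can be sketched with cross-validation in scikit-learn. The data below is synthetic and stands in for the donor features (which are not shown in this README), and the hyperparameters are illustrative defaults rather than the tuned values from the actual project.

```python
# Illustrative sketch: cross-validated accuracy for logistic regression,
# random forest and k-nearest neighbours on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for donor features and a donated/did-not-donate label
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Scale features for the models that are sensitive to feature magnitudes
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

scores = {}
for name, model in models.items():
    cv = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = cv.mean()
    print(f"{name}: mean CV accuracy = {cv.mean():.3f}")
```

For an imbalanced donate/don't-donate label, a metric such as ROC AUC or precision at the mailing cutoff would usually be more informative than plain accuracy.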
3. Does language ability at 3 years of age predict language ability at 5 years of age?

It was considered common knowledge amongst the clinical team at The Shepherd Centre that a child's language ability at 3 years of age is an indicator of their language ability at 5 years of age. In this project, we conducted a simple linear regression to see whether this assumption is generally correct based on available data. We concluded that there is evidence to suggest that this statement is valid. This work was conducted in R.
To view the language prediction html file in rendered form visit: http://htmlpreview.github.io/?https://github.com/yeekayau/tsc_data_analysis/blob/master/LanguageAt3PredictLanguageAt5.html
4. A validity evaluation of the Functional Listening Index.

In this project, we evaluated the validity of an in-house clinical tool called the Functional Listening Index. We examined three forms of validity: internal, concurrent and predictive. To study the internal validity of the tool, we conducted two non-parametric statistical tests; to study concurrent and predictive validity, we used linear regression modelling. We found that the Functional Listening Index showed evidence of all three forms of validity. This work was conducted in Python using the pandas, NumPy, scikit-learn and statsmodels libraries.