Personal_Projects

This repository holds the personal projects I have worked on as a pastime. Projects are mainly focused on Data Science and the Insurance Pricing & Reserving fields.
A mapping of these is laid out below.

Mapping

| Scraped Data Analysis | Articles | Simulators & Kernels |
| --- | --- | --- |
| *Python Scripts* | *Article Profile* | *Python Scripts* |
| Geo-Visual SG Housing Prices | Predictive Modelling | Claims Simulator |
| Box-Plot SG Housing Prices | Web-Scraping Workflow | FOREX & ML Algorithms |
| Private Insurance 14-Years | FOREX ML Algorithms Workflow | Pending Stock Screener |
| Public Insurance 14-Years | | |


For more info..


Insurance (Pricing) & Data Science


What is Predictive Modelling?


It is simply a framework that integrates past data & statistics to predict future outcomes or project liabilities. There are four main techniques: Bayesian methods, Decision Trees, Support Vector Machines & Neural Networks. My projects mainly use Bayesian & Decision Tree techniques, and are therefore focused primarily on linear regression models.


An article publication aimed at explaining:
1. A generalised structure for Predictive Modelling
2. Alternative interpretations of various statistical model metrics

The article follows the generalized framework of:

    Data Preparation
    - Preliminary data analysis, executing 4 tiers of data cleaning (Correct, Complete, Create, Convert)
    Exploratory Data Analysis
    - Uni-, Bi- & Multi-variate Analysis
    Model Preparation
    - Stratified Train/Test splits, hyperparameter tuning, parameter evaluation metrics
    - Feature Engineering (Quantity & Quality), feature evaluation metrics
    Predictive Modelling (Classification Problem)
    - Ensembles (Hard & Soft Voting)
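The stratified split in the Model Preparation step can be sketched in plain Python. This is a minimal illustration (the helper name and data are hypothetical); in practice a library routine such as scikit-learn's `train_test_split` with `stratify` would do this.

```python
# Minimal sketch of a stratified train/test split: each class keeps the
# same proportion in the train and test sets.
import random
from collections import defaultdict

def stratified_split(indices, labels, test_frac=0.25, seed=42):
    """Return (train, test) index lists with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in zip(indices, labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Toy imbalanced labels: 20% "claim", 80% "no_claim"
labels = ["claim"] * 20 + ["no_claim"] * 80
train, test = stratified_split(list(range(100)), labels)
```

A plain random split on data this imbalanced could easily leave the test set with too few minority-class rows; stratifying guarantees the 20/80 mix in both halves.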

Click To View

What is Web Scraping?


In short, it is simply the automated process of extracting data from the web, then cleaning any irregularities & conducting Exploratory Data Analysis to spot trends & patterns.


Python Web Scraping PDF & Data Cleaning (Part 1)

Article or Python Code


A Python Kernel written to automate the repetitive clicking of ~1,228 URLs & the conversion of ~1,000 PDF tables into CSV to compile the data.


Contents:

    1. Collate online source code URLs & sub-page URLs
    2. Download online data via URLs
    3. Convert & Neaten PDF Table into CSV
    4. Compile all CSV Tables
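The final compilation step (4) can be sketched with the standard library alone, assuming each extracted table is a CSV file sharing a common header row (the file-name pattern below is hypothetical):

```python
# Compile many extracted CSV tables into one file. The header is written
# once, taken from the first file; subsequent files contribute rows only.
import csv
import glob

def compile_csvs(pattern, out_path):
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                if writer is None:
                    writer = csv.writer(out)
                    writer.writerow(header)   # header from the first table only
                writer.writerows(reader)      # data rows from every table
```

Sorting the matched paths keeps the compiled output deterministic across runs.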

Click To View


After extracting the Annual Insurance Data Returns in the Part 1 series, we proceed to analyse the data.


Contents:

    Patterns
    1. Benchmark Range of ROC on Expense & Loss Ratios
    Trends
    2. Growing reinsurance ceded abroad beyond the ASEAN region
    3. Declining averages for Earned Premiums & Claims Incurred (with falling inflation rates)
    4. Average ROC, Expense & Loss Ratios
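The expense & loss ratios analysed above can be sketched as follows. The definitions used here are the standard textbook ones (claims incurred over earned premiums, expenses over earned premiums), assumed rather than taken from the article, and the figures are made up:

```python
# Standard (assumed) insurance ratio definitions on illustrative figures.
def loss_ratio(claims_incurred, earned_premiums):
    return claims_incurred / earned_premiums

def expense_ratio(expenses, earned_premiums):
    return expenses / earned_premiums

# Example: 60m claims and 25m expenses on 100m earned premium
lr = loss_ratio(60.0, 100.0)
er = expense_ratio(25.0, 100.0)
combined = lr + er  # combined ratio; below 1.0 indicates an underwriting profit
```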

Click To View

What is Exploratory Data Analysis?


It is simply the analysis of data sets to summarise their characteristics & patterns. These include Uni-, Bi- & Multi-variate Analysis, often uncovering underlying relationships that conventional models overlook.



EDA Summary


1. Those with past experience of financial distress (target variable):
> Made fewer loans or exceeded deadlines
> Tend to have fewer dependents, a lower debt ratio & lower net worth
> As expected are of lower-tier income, but with a lower debt ratio


2. Ignoring mortality and the time value of money (i.e. annuities):
> Debt ratio & net worth show a Gaussian distribution against age


3. Those with acts of debt delinquency (made loans or exceeded deadlines):
> Tend to be from the higher-tier income group or retired


4. Others:
> The higher the income, the higher the debt ratio
> The higher the income, the fewer the dependents
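A bivariate relationship such as "higher income, higher debt ratio" can be quantified with a Pearson correlation. The sketch below computes it from first principles on made-up illustrative data, not the project's actual dataset:

```python
# Pearson correlation from first principles: covariance of the two
# variables divided by the product of their standard deviations.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: income (thousands) vs debt ratio
income = [30, 45, 60, 80, 120]
debt_ratio = [0.20, 0.28, 0.35, 0.41, 0.55]
r = pearson(income, debt_ratio)  # near +1: strong positive relationship
```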


Click To View

What is General Linear Modelling?


It is simply the application of the fundamental straight-line concept, y = mx + c. In other words, the idea that variable relationships are one-dimensional (positive or negative).



A Python Kernel aimed to:

    1. Get a better understanding of the simplified predictive modelling framework
    2. Grasp the logic behind different coding methods & concise techniques used
    3. Compare different models

    Coding Techniques:
    A. List comprehensions
    B. Sampling to reduce computational cost
    C. Concise 'def' functions that can be reused
    D. Pivoting using groupby
    E. When & how to convert and reshape dictionaries into lists or dataframes
    F. Quickly splitting dataframe columns
    H. Looping sub-plots
    I. Quick lambda formula functions
    J. Quick looping print or DataFrame conversion of summative scores
    K. Ordering plot components
    L. Creating & plotting bulk ensemble comparative results
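The straight-line idea y = mx + c can be made concrete with a minimal ordinary-least-squares fit, written from first principles on toy data (real kernels would use a library such as scikit-learn or statsmodels):

```python
# Ordinary least squares for y = mx + c: the slope is the covariance of
# x and y over the variance of x; the intercept passes through the means.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    c = my - m * mx
    return m, c

# Toy data generated roughly along y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c = fit_line(xs, ys)  # slope near 2, intercept near 0
```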

Click To View



Insurance (Reserving)


Claim Simulations


In short, this project contains a Python Kernel to automate the probabilistic claims simulation process for actuarial reserving calculations.
Reserving method used: Inflation-Adjusted Chain Ladder


Claims Simulation

Article or Python Code Guide or Python Code v2



Present: the simulation supports claim numbers (Poisson, Negative Binomial) & claim amounts (Gaussian, LogNormal).
Ongoing:
1. Support for the Bornhuetter-Ferguson (BF) method.
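One simulated origin year under the Poisson/LogNormal combination above can be sketched with the standard library. The parameters are made up for illustration; the stdlib has no Poisson sampler, so Knuth's algorithm stands in:

```python
# Simulate one origin year: Poisson claim count, LogNormal severities.
import math
import random

def poisson(lam, rng):
    """Knuth's algorithm: draw from a Poisson distribution with mean lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(0)                    # seeded for reproducibility
n_claims = poisson(lam=50, rng=rng)       # frequency: expected 50 claims
amounts = [rng.lognormvariate(mu=8.0, sigma=0.5)  # severity per claim
           for _ in range(n_claims)]
total = sum(amounts)                      # aggregate loss for the year
```

Repeating this across many trials and origin years yields the distribution of aggregate losses that feeds the reserving triangle.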


Contents:

    0. Assumptions
    1. Development-Year lags
    2. Incremental & Cumulative claim amounts
    3. Uplift past inflation for incremental amounts & Derive cumulative
    4. Individual Loss Development Factors (LDFs)
    5. Raw preliminary view of triangle
    6. Establish predicted lag years data frame
    7. Impute latest cumulative amounts
    8. Simple Mean & Volume Weighted LDFs & 5/3 Year Averages & Select
    9. Predict future cumulative amounts
    10. Calculate incremental amounts
    11. Project future inflation for incremental amounts
    12. Reserve summation
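Steps 8-10 can be sketched on a toy cumulative triangle: compute volume-weighted loss development factors (LDFs), roll each origin year forward to ultimate, and sum the reserve. The triangle below is invented for illustration and ignores the inflation adjustments of steps 3 and 11:

```python
# Toy cumulative claims triangle: one list per origin year, oldest first.
cumulative = [
    [100, 180, 200, 210],  # fully developed
    [110, 200, 225],
    [120, 215],
    [130],
]

def volume_weighted_ldfs(tri):
    """LDF for lag j = sum of column j+1 over sum of column j,
    using only origin years observed at both lags."""
    ldfs = []
    for j in range(len(tri[0]) - 1):
        num = sum(row[j + 1] for row in tri if len(row) > j + 1)
        den = sum(row[j] for row in tri if len(row) > j + 1)
        ldfs.append(num / den)
    return ldfs

def project(tri, ldfs):
    """Extend every row to ultimate by chaining the selected LDFs."""
    full = [row[:] for row in tri]
    for row in full:
        for j in range(len(row) - 1, len(ldfs)):
            row.append(row[j] * ldfs[j])
    return full

ldfs = volume_weighted_ldfs(cumulative)
full = project(cumulative, ldfs)
# Reserve = projected ultimate minus latest known cumulative, summed.
reserve = sum(frow[-1] - crow[-1] for frow, crow in zip(full, cumulative))
```

The last lag's LDF here is 210/200 = 1.05; the fully developed oldest year contributes nothing to the reserve, as expected.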

Click To View



Microsoft Package


Prior to learning the Python coding language, I had to refine the basics. Since Excel & VBA are broadly deemed essential skill-sets, I thought I would build some personal models. The ideas were inspired by my work placement tenure at a consultancy company. The main objective was to ease manual & repetitive tasks.


Word Documentations

Spreadsheet or Excel VBA Code




A reproducible Excel VBA programme that automates bulk, simultaneous Word document mail merges. Data entry checks (file exists, etc.) & cleaning (excess spaces, invalid file directories, ...) are handled by the code as well. This code does NOT use the standard mail merge function, which operates on only a single document; instead it runs across many Word documents at once.


Inspiration:
Whilst assisting my previous employer in preparing clients for the European General Data Protection Regulation (GDPR) privacy documentation, I created this programme to streamline over 30 hours of manual work.


Outlook Communications

Spreadsheet or Excel VBA Code




A reproducible Excel VBA programme that automates multiple simultaneous email communications where recipients receive overlapping or identical attachments or spreadsheet tables.


Inspiration:
One of my responsibilities at a previous company involved weekly roll-forward projection updates. I found this repetitive & built this model to automate the job. It mitigated manual human input errors & eased the handover process.