Machine Learning ETL Pipeline

A data transformation job on data sourced from a MongoDB database.
The input was deeply nested JSON from the MongoDB source system.
During the transformation stage of the ETL, the data was normalised into a structured relational format
for subsequent feature engineering and analysis to build a prediction model.
The resulting dataset had over 56,000 features (i.e. columns).
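The flattening step can be sketched as follows. This is a minimal illustration of the kind of key-joining that the flatten_json library performs (nested keys concatenated with an underscore separator); the sample document and the `flatten` helper here are hypothetical, not the repository's actual code.

```python
from typing import Any, Dict

def flatten(record: Dict[str, Any], sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested dicts and lists into a single-level
    dict, joining key paths with `sep` (the same convention flatten_json
    uses). Each flat key then becomes one column of the relational table."""
    out: Dict[str, Any] = {}

    def _walk(value: Any, prefix: str) -> None:
        if isinstance(value, dict):
            for key, val in value.items():
                _walk(val, f"{prefix}{sep}{key}" if prefix else key)
        elif isinstance(value, list):
            # List positions become numeric key segments, so variable-length
            # arrays fan out into many columns -- one source of the 56,000+.
            for i, val in enumerate(value):
                _walk(val, f"{prefix}{sep}{i}" if prefix else str(i))
        else:
            out[prefix] = value

    _walk(record, "")
    return out

# Hypothetical MongoDB document with nesting and an embedded array.
doc = {"_id": "a1", "user": {"name": "Ada", "tags": ["ml", "etl"]}}
flat = flatten(doc)
# → {"_id": "a1", "user_name": "Ada", "user_tags_0": "ml", "user_tags_1": "etl"}
```

Applying this to every document and collecting the flat dicts into a Pandas DataFrame yields the wide relational table described above, since the union of all flattened key paths across documents becomes the column set.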

Languages and Libraries

  • Python
  • Jupyter Notebook
  • Pandas
  • flatten_json

About

A Jupyter notebook documenting an ETL (extract -> transform -> load) data pipeline
