Project: STEDI-Human-Balance-Analytics

Problem Statement

The STEDI Team has been hard at work developing a hardware STEDI Step Trainer that:

trains the user to do a STEDI balance exercise
has sensors on the device that collect data to train a machine-learning algorithm to detect steps
has a companion mobile app that collects customer data and interacts with the device sensors

STEDI has heard from millions of early adopters who are willing to purchase the STEDI Step Trainers and use them.

Several customers have already received their Step Trainers, installed the mobile application, and begun using them together to test their balance. The Step Trainer is just a motion sensor that records the distance of the object detected. The app uses a mobile phone accelerometer to detect motion in the X, Y, and Z directions.

The STEDI team wants to use the motion sensor data to train a machine learning model to detect steps accurately in real-time. Privacy will be a primary consideration in deciding what data can be used.

Some of the early adopters have agreed to share their data for research purposes. Only these customers’ Step Trainer and accelerometer data should be used in the training data for the machine learning model.

Project Description

In this project I extracted data produced by the STEDI Step Trainer sensors and the mobile app, and curated them into a data lakehouse solution on AWS. The intent is for Data Scientists to use the solution to train machine learning models.

The data lake solution is built with AWS Glue, AWS S3, Python, and Spark, and prepares sensor data for training machine learning algorithms.

AWS infrastructure is used to create the storage zones (landing, trusted, and curated), the data catalog, the data transformations between zones, and queries on semi-structured data.

Project Datasets

Customer Records: from fulfillment and the STEDI website.
Step Trainer Records: data from the motion sensor.
Accelerometer Records: data from the mobile app.

Implementation

Landing Zone

In the Landing Zone, I stored the raw customer, accelerometer, and step trainer data in an AWS S3 bucket.

Using the AWS Glue Data Catalog, I created Glue tables so that I could query the data with AWS Athena.

1- Customer Landing Table:

2- Accelerometer Landing Table:

3- Step Trainer Landing Table:
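
As an illustration of how a landing table can be declared for Athena, the sketch below builds a CREATE EXTERNAL TABLE statement in Python. It is not the DDL from this repository: the column names and types, the bucket path, and the choice of JSON files read with the OpenX JSON SerDe are all assumptions made for the example.

```python
def landing_table_ddl(table, columns, s3_location):
    """Build an Athena CREATE EXTERNAL TABLE statement for JSON files in S3."""
    # Backticks keep reserved words such as `user` usable as column names.
    column_sql = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"{column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical schema and bucket path, used only for illustration.
accelerometer_columns = [
    ("user", "string"),
    ("timeStamp", "bigint"),
    ("x", "float"),
    ("y", "float"),
    ("z", "float"),
]

ddl = landing_table_ddl(
    "accelerometer_landing",
    accelerometer_columns,
    "s3://example-bucket/accelerometer/landing/",
)
print(ddl)
```

The same helper could be called once per landing table with that table's own column list.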

Trusted Zone

In the Trusted Zone, I created AWS Glue jobs to apply transformations to the raw data in the landing zone.

Glue job scripts

1. customer_landing_to_trusted.py - This script moves customer data from the 'landing' zone to the 'trusted' zone. It keeps only customers who have agreed to share data with researchers.

2. accelerometer_landing_to_trusted_zone.py - This script moves accelerometer data from the 'landing' zone to the 'trusted' zone. Using a join on customer_trusted and accelerometer_landing, it keeps only accelerometer readings from customers who have agreed to share data with researchers.

3. Trainer_landing_to_trusted.py - This script moves Step Trainer data from the 'landing' zone to the 'trusted' zone. Using a join on customer_curated and step_trainer_landing, it keeps only Step Trainer readings from customers who have accelerometer data and have agreed to share their data for research.
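
The filtering and join logic of the three scripts above can be sketched in plain Python. This is a simplified illustration that runs without Spark or Glue, not the code of the Glue jobs themselves. The field names (shareWithResearchAsOfDate, email, user, serialNumber) and the sample records are assumptions chosen for the example.

```python
def customers_to_trusted(customers):
    """Keep only customers that have a research-consent date set."""
    return [c for c in customers if c.get("shareWithResearchAsOfDate") is not None]


def accelerometer_to_trusted(readings, trusted_customers):
    """Keep readings whose user matches the email of a trusted customer."""
    consenting_emails = {c["email"] for c in trusted_customers}
    return [r for r in readings if r["user"] in consenting_emails]


def step_trainer_to_trusted(readings, curated_customers):
    """Keep Step Trainer readings whose device belongs to a curated customer."""
    serial_numbers = {c["serialNumber"] for c in curated_customers}
    return [r for r in readings if r["serialNumber"] in serial_numbers]


# Hypothetical sample records.
customer_landing = [
    {"email": "a@example.com", "serialNumber": "SN-1",
     "shareWithResearchAsOfDate": 1655296571764},
    {"email": "b@example.com", "serialNumber": "SN-2",
     "shareWithResearchAsOfDate": None},
]
accelerometer_landing = [
    {"user": "a@example.com", "timeStamp": 1000, "x": 1.0, "y": 0.0, "z": -1.0},
    {"user": "b@example.com", "timeStamp": 1000, "x": 0.0, "y": 1.0, "z": 0.0},
]
step_trainer_landing = [
    {"serialNumber": "SN-1", "sensorReadingTime": 1000, "distanceFromObject": 250},
    {"serialNumber": "SN-2", "sensorReadingTime": 1000, "distanceFromObject": 310},
]

customer_trusted = customers_to_trusted(customer_landing)
accelerometer_trusted = accelerometer_to_trusted(accelerometer_landing, customer_trusted)
# Stand-in for customer_curated: the only trusted customer here also has readings.
customer_curated = customer_trusted
step_trainer_trusted = step_trainer_to_trusted(step_trainer_landing, customer_curated)
```

Each function acts as an inner join that keeps only the columns of the sensor table, so no customer attributes leak into the trusted sensor data.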

The customer_trusted table was queried in Athena to show that it only contains customer records from people who agreed to share their data.
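
A check of this kind amounts to counting rows that lack a consent value and confirming the count is zero. The sketch below shows the idea in plain Python; the consent field name shareWithResearchAsOfDate and the sample rows are assumptions, and the SQL in the comment is a hypothetical equivalent, not the exact query that was run.

```python
# Hypothetical Athena equivalent:
#   SELECT COUNT(*) FROM customer_trusted
#   WHERE shareWithResearchAsOfDate IS NULL;

def count_missing_consent(rows, consent_field="shareWithResearchAsOfDate"):
    """Count rows where the consent field is absent or null."""
    return sum(1 for row in rows if row.get(consent_field) is None)


# Hypothetical sample rows.
trusted_rows = [
    {"email": "a@example.com", "shareWithResearchAsOfDate": 1655296571764},
    {"email": "c@example.com", "shareWithResearchAsOfDate": 1655296600000},
]
unfiltered_rows = trusted_rows + [{"email": "b@example.com"}]

missing_in_trusted = count_missing_consent(trusted_rows)
missing_in_unfiltered = count_missing_consent(unfiltered_rows)
```

A correctly filtered trusted table gives a count of zero, while the unfiltered rows do not.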

Curated Zone

In the Curated Zone, I created AWS Glue jobs to apply further transformations that meet the needs of a particular analysis.

Glue job scripts

customer_trusted_to_curated.py - This script moves customer data from the 'trusted' zone to the 'curated' zone. Using a join on customer_trusted and accelerometer_landing, it keeps only customers who have accelerometer readings and have agreed to share data with researchers.

Trainer_trusted_to_curated.py - This script builds an aggregated table that contains each Step Trainer reading and the associated accelerometer reading for the same timestamp, but only for customers who have agreed to share their data.
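
The customer step can be pictured as a semi-join: a customer is kept if at least one accelerometer reading refers to them, and each customer appears once no matter how many readings they have. The plain-Python sketch below illustrates that idea under assumed field names (email, user). It is not the Glue job itself.

```python
def customers_to_curated(trusted_customers, accelerometer_readings):
    """Keep each trusted customer once if they have at least one reading."""
    users_with_readings = {r["user"] for r in accelerometer_readings}
    curated, seen = [], set()
    for customer in trusted_customers:
        email = customer["email"]
        if email in users_with_readings and email not in seen:
            seen.add(email)
            curated.append(customer)
    return curated


# Hypothetical sample records.
trusted = [
    {"email": "a@example.com", "serialNumber": "SN-1"},
    {"email": "c@example.com", "serialNumber": "SN-3"},
]
readings = [
    {"user": "a@example.com", "timeStamp": 1000},
    {"user": "a@example.com", "timeStamp": 2000},
]

curated = customers_to_curated(trusted, readings)
```

Deduplicating matters because a plain inner join would repeat the customer row once per matching reading.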
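
The timestamp match described above can be sketched as an inner join between Step Trainer readings and accelerometer readings. The plain-Python version below is an illustration only. The field names sensorReadingTime and timeStamp and the sample records are assumptions, and both inputs are taken to be trusted-zone tables that already exclude customers without consent.

```python
def build_ml_table(step_trainer_trusted, accelerometer_trusted):
    """Pair each Step Trainer reading with accelerometer readings at the same time."""
    # Index accelerometer readings by timestamp for constant-time lookup.
    by_time = {}
    for reading in accelerometer_trusted:
        by_time.setdefault(reading["timeStamp"], []).append(reading)

    rows = []
    for step in step_trainer_trusted:
        for accel in by_time.get(step["sensorReadingTime"], []):
            # Merge both records into one row of the aggregated table.
            rows.append({**step, **accel})
    return rows


# Hypothetical sample records.
step_rows = [
    {"serialNumber": "SN-1", "sensorReadingTime": 1000, "distanceFromObject": 250},
    {"serialNumber": "SN-1", "sensorReadingTime": 2000, "distanceFromObject": 240},
]
accel_rows = [
    {"user": "a@example.com", "timeStamp": 1000, "x": 1.0, "y": 0.0, "z": -1.0},
]

ml_rows = build_ml_table(step_rows, accel_rows)
```

Step Trainer readings with no accelerometer reading at the same timestamp are dropped, which is the behavior of an inner join.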

ibrahimmoursy/STEDI-Human-Balance-Analytics
