Locomotion data lake


This work is embedded within the Big Data project of Breed4Food (http://breed4food.com). We experiment with a data lake stack for storing and analysing sensor data, using an animal experiment as a use case, aiming for improved scalability, modularity, and interoperability. This repository contains the code (notebooks and scripts) for the corresponding paper, currently under review.

The use case was an animal experiment in which the gait score of 84 turkeys was determined.

Gait scoring is traditionally performed by an expert. In this experiment, different types of sensors were used to explore to what extent sensors can describe or mirror the gait score of an expert.

Data & Sensors

  • Gait score (visually trained person)
  • Body weight (weighing scale)
  • Force plate (Kistler)
  • Accelerometers / inertial measurement units (IMUs) (Xsens MTw Awinda)
  • 3D video camera (Intel RealSense D415)
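One way to picture the multimodal data is a record per animal that links the expert score to each sensor's output. The sketch below is purely illustrative: the keys, identifiers, and file names are hypothetical and are not the repository's actual schema.

```python
# Illustrative only: grouping one turkey's measurements across modalities.
# All keys and values are hypothetical, not the experiment's real schema.
turkey_record = {
    "animal_id": "T01",               # hypothetical identifier
    "gait_score": 3,                  # expert visual score
    "body_weight_kg": 5.4,           # weighing scale
    "force_plate_file": "T01.tdms",  # Kistler force plate (TDMS)
    "imu_files": ["T01_imu_1.dat"],  # Xsens MTw Awinda IMUs
    "video_file": "T01_d415.bag",    # Intel RealSense D415 3D camera
}

# Each modality can then be processed by its own ETL branch.
for key in sorted(turkey_record):
    print(key, "->", turkey_record[key])
```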

Extract, Transform, and Load (ETL) is the general procedure of copying data from one or more sources into a destination system that represents the data differently from the source(s):

  • Extract: retrieve data from a source
  • Transform: convert the retrieved data according to rules and lookup tables, or combine data from different sources
  • Load: save the data in a different location

The ETL procedure will only become more important, because we need to handle ever-increasing datasets, varying data structures, and heterogeneous, multimodal data. In the animal experiment, each sensor produced a different data type. The force plate (Kistler), for example, produced binary files, so-called Technical Data Management Streaming (TDMS) files, a format designed to help engineers and scientists properly store the large amounts of data generated during simulations and tests. In our data lake stack we want to be able to scale up the ETL procedure for each sensor, so that when large numbers of animals are investigated with a certain sensor, the time needed for the ETL procedure stays minimal.
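The three ETL steps can be sketched as three small functions chained together. This is a toy example over an in-memory CSV source using only the standard library; the field names (`animal_id`, `mass_g`) and the unit conversion are made up for illustration and are not the repository's actual pipeline.

```python
import csv
import io
import json

# Illustrative only: a toy ETL chain. Field names are hypothetical.
SOURCE = "animal_id,mass_g\nT01,5400\nT02,6100\n"

def extract(text):
    """Extract: retrieve raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: rename fields and convert units (g -> kg)."""
    return [{"id": r["animal_id"], "mass_kg": int(r["mass_g"]) / 1000}
            for r in rows]

def load(rows):
    """Load: serialise to the destination format (JSON here)."""
    return json.dumps(rows)

result = load(transform(extract(SOURCE)))
print(result)
```

In a real pipeline each sensor would get its own `extract`/`transform` pair (e.g. a TDMS reader for the force plate), while `load` writes into the shared data lake; keeping the steps separate is what lets each branch scale independently.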

In short, we want to combine data from different sensors at scale (and extract features), and move from proprietary formats to FAIR data. Once these data are loaded, it becomes possible to visualize them and perform a linear regression (and machine learning).
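As a minimal sketch of the final analysis step, the snippet below fits an ordinary least-squares line relating a single sensor-derived feature to a score, using only the standard library. The numbers are made up and do not come from the experiment.

```python
# Illustrative only: ordinary least squares for one feature vs. one score.
# Values are fabricated for the example, not experiment data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. a force-plate feature
y = [1.2, 1.9, 3.1, 3.9, 5.2]   # e.g. an expert gait score

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# slope = cov(x, y) / var(x); intercept follows from the means
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

print(f"y = {slope:.3f} * x + {intercept:.3f}")
```

With more features or animals the same idea would typically be handed to a library such as scikit-learn rather than written by hand.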

Original data are proprietary. A subset of 3 turkeys has been released as open access on Zenodo and can be used to run the code in this repository. The whole solution can be launched through Binder: to launch the notebooks, click the Binder button.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
