EyeTism - Eye Movement Based Autism Diagnostics

Authors: Elena Ockfen, Dennis Dombrovskij, Mariano Santoro, Stefan Schlögl, Adam Zabicki

This repository contains the work of the Capstone project "EyeTism - Eye Movement Based Autism Diagnostics", developed within the intensive Data Science Bootcamp provided by neuefische GmbH.

Description

Our "EyeTism" project focused on developing a tool for diagnosing Autism Spectrum Disorder (ASD) in children aged 8 to 15 years. ASD is a developmental disability that affects social interaction and learning, so early diagnosis is crucial for the development of affected children. Although individuals with ASD often exhibit gaze behavior that differs from that of typically developing (TD) individuals, detecting ASD remains challenging. Our tool applies machine learning to eye-tracking data from high-functioning ASD and TD children. It is intended as an integrative tool that supports pediatricians in diagnosing ASD based on patients' visual attention patterns on a selected subset of images.

Data source

The gaze behavior of 14 children with ASD and 14 TD children was analyzed while they viewed diverse visual stimuli. The Saliency4ASD dataset (https://saliency4asd.ls2n.fr/datasets/) consists of 300 images featuring diverse scenes:

40 images featuring animals
88 with buildings or objects
20 depicting natural scenes
36 portraying multiple people in one image
41 displaying multiple people and objects in one image
32 with a single person
43 with a single person and objects in one image
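As a quick consistency check, the seven category counts above add up to the full 300 images. The short sketch below also shows each category's share of the dataset; the dictionary keys are our own shorthand labels, not file or folder names from the dataset.

```python
# Image counts per category as listed above (Saliency4ASD dataset).
# The keys are shorthand labels chosen for this example only.
CATEGORY_COUNTS = {
    "animals": 40,
    "buildings or objects": 88,
    "natural scenes": 20,
    "multiple people": 36,
    "multiple people and objects": 41,
    "single person": 32,
    "single person and objects": 43,
}

total = sum(CATEGORY_COUNTS.values())
assert total == 300  # the seven categories cover all 300 images

# Share of each category in the full dataset, in percent (one decimal)
shares = {name: round(100 * n / total, 1) for name, n in CATEGORY_COUNTS.items()}
print(shares["buildings or objects"])  # 29.3
```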

Reference dataset: H. Duan, G. Zhai, X. Min, Z. Che, Y. Fang, X. Yang, J. Gutiérrez, P. Le Callet, “A Dataset of Eye Movements for the Children with Autism Spectrum Disorder”, ACM Multimedia Systems Conference (MMSys’19), Jun. 2019

Roadmap - from data to final models

0. clone this repo

git clone git@github.com:eockfen/EyeTism.git
cd EyeTism

1. Python Environment

Open the terminal
Create a virtual environment with the tool of your choice
Install Python 3.11.3
Depending on how you manage your virtual environments, install all dependencies either
- via conda:
  - conda env create -f environment.yml
- or via pip:
  - pip install -r requirements.txt
important notes:
- to install dlib you need to have CMake and a working C++ compiler installed
- if you are on a Mac and run into problems while pip-installing lightgbm, running brew install libomp may help
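Before moving on, it can help to verify the environment. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it only checks the Python version and the two packages singled out in the notes above (dlib and lightgbm). The authoritative dependency list remains requirements.txt / environment.yml.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 11)  # the setup above asks for Python 3.11.3

# Packages singled out in the notes above; this short list is an assumption,
# the full list lives in requirements.txt / environment.yml
CRITICAL_PACKAGES = ["dlib", "lightgbm"]


def python_version_ok(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """Return True if the major.minor version matches the required one."""
    return tuple(version_info[:2]) == tuple(required)


def missing_packages(names):
    """Return the names of top-level packages that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Missing packages:", missing_packages(CRITICAL_PACKAGES) or "none")
```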

2. Extract data from .zip files

Download the following .zip archives and store them in the /source folder:
Then run the Python script in the /scripts folder:

cd ./scripts
python unzip_data.py

This extracts the full Saliency4ASD dataset, as well as the saliency predictions for the 300 images from three visual attention models: DeepGazeIIE and the ResNet and VGG versions of SAM.
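The extraction step boils down to unpacking every archive found in /source. The sketch below illustrates the idea with the standard-library zipfile module; it is not the code of unzip_data.py, and the function name and the target folder in the usage note are assumptions.

```python
import zipfile
from pathlib import Path


def extract_archives(source_dir, target_dir):
    """Extract every .zip archive in source_dir into target_dir.

    Returns the sorted list of extracted file names (relative to target_dir).
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(source_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
            # namelist() also lists directory entries (ending in "/"); skip them
            extracted += [n for n in zf.namelist() if not n.endswith("/")]
    return sorted(extracted)
```

Called from /scripts, a hypothetical usage would be extract_archives("../source", "../data").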

Re-do saliency predictions

The extracted zip files already contain the saliency maps predicted by DeepGazeIIE and SAM, but you can reproduce the steps we took to obtain these maps.
The originally downloaded SAM saliency maps had different file names from the images in the Saliency4ASD dataset, so the following steps were performed:
- Matching the differently named files to the Saliency4ASD files
- Renaming / copying the predicted saliency maps
The DeepGazeIIE predictions were generated by running the original DeepGazeIIE model.
To re-do these steps, run the following code:

cd ./scripts
python unzip_data.py sam
python prepare_saliency_maps.py sam
python prepare_saliency_maps.py dg
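The renaming / copying step can be pictured as applying a lookup table from SAM file names to Saliency4ASD file names. How prepare_saliency_maps.py actually matches the files is not described here, so this sketch simply assumes that such a mapping already exists; the function name and the example file names are hypothetical.

```python
import shutil
from pathlib import Path


def copy_with_mapping(src_dir, dst_dir, mapping):
    """Copy files from src_dir to dst_dir, renaming them according to mapping.

    mapping: dict {original SAM file name: Saliency4ASD file name} (assumed given).
    Returns the source names that were not found, so gaps can be inspected.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for src_name, dst_name in mapping.items():
        src = src_dir / src_name
        if not src.exists():
            missing.append(src_name)
            continue
        # copy instead of move, so the original download stays untouched
        shutil.copy2(src, dst_dir / dst_name)
    return missing
```

Copying rather than renaming in place keeps the original download intact, so the step can be re-run safely.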

3. Extract all features

Check out and run the extract_features.ipynb notebook in the /notebooks folder.
Extracted features are saved in the /data/df_deep_sam.csv file. This process may take a few hours, depending on your machine.
If you are as impatient as me and don't want to spend precious time waiting for our (slow) script to calculate all the features, you can download df_deep_sam.csv and store it in the /data folder.
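To give an idea of what "features" means here, the sketch below computes a few typical per-image gaze features from a list of fixations, including the mean predicted saliency at the fixated positions. The actual feature set of extract_features.ipynb is not listed in this README, so the feature names, the fixation format (x, y, duration in ms) and the saliency-map layout are all assumptions.

```python
import math
from statistics import mean


def scanpath_features(fixations, saliency_map=None):
    """Compute a few simple gaze features for one participant and one image.

    fixations: list of (x, y, duration_ms) tuples in pixel coordinates, in viewing order.
    saliency_map: optional 2D list indexed as saliency_map[y][x], values in [0, 1].
    """
    if not fixations:
        raise ValueError("need at least one fixation")
    xs, ys, durations = zip(*fixations)
    # saccade amplitude approximated as the distance between consecutive fixations
    amplitudes = [math.dist(a[:2], b[:2]) for a, b in zip(fixations, fixations[1:])]
    features = {
        "n_fixations": len(fixations),
        "mean_fix_duration": mean(durations),
        "mean_saccade_amplitude": mean(amplitudes) if amplitudes else 0.0,
    }
    if saliency_map is not None:
        # average predicted saliency at the (rounded) fixation positions
        features["mean_saliency_at_fix"] = mean(
            saliency_map[round(y)][round(x)] for x, y in zip(xs, ys)
        )
    return features
```

For example, fixations [(0, 0, 200), (3, 4, 300), (3, 4, 100)] yield 3 fixations, a mean duration of 200 ms and a mean saccade amplitude of 2.5 px.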

After running the notebook, three outputs are generated:

All individual scanpaths are overlaid onto the stimulus images.
All detected objects (whose probability scores are saved in a .txt file) and faces are overlaid onto the stimulus images.
Individual scanpaths, detected objects and faces are overlaid onto the stimulus images.

The outputs are saved in the /data/obj_detection and /data/individual_scanpaths folders.
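Combining scanpaths with detected faces or objects naturally leads to features such as "what fraction of fixations landed on a face". The following sketch assumes bounding boxes in (x_min, y_min, x_max, y_max) pixel format; whether the notebook computes exactly this quantity is not stated here, so treat it as an illustration only.

```python
def fraction_in_boxes(fixations, boxes):
    """Fraction of fixations that land inside at least one bounding box.

    fixations: list of (x, y) pixel coordinates.
    boxes: list of (x_min, y_min, x_max, y_max) tuples, e.g. detected faces (assumed format).
    """
    if not fixations:
        return 0.0

    def inside(x, y, box):
        x_min, y_min, x_max, y_max = box
        return x_min <= x <= x_max and y_min <= y <= y_max

    hits = sum(any(inside(x, y, box) for box in boxes) for x, y in fixations)
    return hits / len(fixations)
```

With fixations [(10, 10), (50, 50), (200, 200), (205, 190)] and boxes [(0, 0, 20, 20), (190, 180, 210, 210)], three of four fixations hit a box, giving 0.75.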

4. Exploratory Data Analysis

To be provided.

5. Baseline model

Check out the notebook baseline.ipynb in the /notebooks folder to run the baseline model and see the results.

6. Construction of Classifiers

The final models were selected by evaluating them on the 30-image test set and identifying the best model-image pairs, as detailed in the notebooks in the /modeling folder.

The models were built in the following steps:

In notebook create_basemodel_pipelines.ipynb
- Each model uses a different set of features, so each model is wrapped in its own pipeline; this also makes it possible to combine them into stacking and voting classifiers
- This results in uncalibrated base models of RF, XGBoost and SVC, which are saved in /models/uncalibrated_pipelines/<MODEL>_uncalib.pickle
In notebook calib_RF_XGB_SVC_threshold.ipynb
- the uncalibrated base models from create_basemodel_pipelines.ipynb are calibrated
- a threshold analysis is performed to find, for each model, the decision threshold that maximizes the F1 score
- calibrated models are saved in /models/calibrated/<MODEL>_calib.pickle
In notebook voting_RF_XGB_SVC_threshold.ipynb
- voting classifier is built on top of the previous calibrated models (RF, XGBoost and SVC)
- also, the optimal (max. F1) threshold is found for this voting classifier
- voting model is saved in /models/calibrated/VTG_calib.pickle
In notebook stacking_<MODEL>_calib.ipynb
- stacking classifiers are built for 4 different final estimators
  - Logistic regression (LR)
  - K-nearest neighbors (KNN)
  - Light gradient boosting machine (LGBM)
  - Naive Bayes (NB)
- base estimators are the calibrated base models RF, XGBoost and SVC
- stacking models are saved in /models/calibrated/stacking_<MODEL>_calib.pickle
In notebook stacking_thresholding.ipynb
- threshold analysis is performed for the four stacking models
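The voting and thresholding steps can be illustrated without any ML library. The sketch below shows soft voting (averaging the calibrated positive-class probabilities of several models) and a grid search for the F1-maximizing threshold. It is a simplified stand-in for what the notebooks do, not their code; the function names, the 0.01-step grid and the toy probabilities are assumptions.

```python
def soft_vote(probas_per_model):
    """Average the positive-class probabilities of several calibrated models."""
    n_models = len(probas_per_model)
    return [sum(p) / n_models for p in zip(*probas_per_model)]


def f1_at(y_true, proba, threshold):
    """F1 score when predicting class 1 for every probability >= threshold."""
    pred = [int(p >= threshold) for p in proba]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(y_true, proba, grid=None):
    """Return (threshold, f1) with the highest F1; ties go to the lowest threshold."""
    grid = grid if grid is not None else [i / 100 for i in range(1, 100)]
    return max(((t, f1_at(y_true, proba, t)) for t in grid), key=lambda tf: tf[1])


# toy example: labels (1 = ASD) and probabilities from three hypothetical models
y = [1, 1, 0, 0]
voted = soft_vote([[0.9, 0.6, 0.4, 0.1], [0.8, 0.4, 0.5, 0.2], [0.7, 0.5, 0.3, 0.3]])
threshold, f1 = best_threshold(y, voted)
```

Calibrating the base models first matters here: averaging only makes sense if each model's scores are comparable probabilities.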

7. Final evaluation on 30 test images

The 8 models developed are then evaluated on our 30-image-test-set as reported in the notebook FINAL_EVALUATION.ipynb.

We selected 9 images and determined the optimal model for classifying the eye-tracking data of each image. The following figure shows the model performance for each of these selected images:

Overall, the performance metrics for our diagnostic tool are:

F2 score: 90.5 %
accuracy: 82.1 %
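For reference, the F2 score is the F-beta score with beta = 2: F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP). With beta = 2, a false negative (a missed ASD case) weighs four times as much as a false positive. The sketch below computes both reported metrics from confusion-matrix counts; the counts used in the usage note are made up for illustration and are not our results.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion-matrix counts; beta > 1 weights recall higher."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of correctly classified cases."""
    return (tp + tn) / (tp + fp + fn + tn)
```

For example, with 1 true positive, 1 false positive and no false negatives, f_beta(1, 1, 0) = 5/6, whereas the F1 score would only be 2/3, reflecting that F2 forgives false alarms more than missed cases.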

Dashboard

To showcase the basic functionality of our diagnostic tool, we've constructed a Streamlit application. If you're inclined towards practical demonstrations rather than delving into intricate code details, this application is tailor-made for you. Feel free to explore and experience the practical side of our project!

To delve into its workings, you have two options:

Local Installation:
- clone this repository onto your system
- next, create a virtual environment to ensure a clean and isolated setup
- finally, launch the dashboard by running cd Dashboard and then streamlit run app.py in your terminal
- this method allows you to explore the tool's capabilities firsthand, right from the comfort of your own machine
Online Access:
- Prefer a hassle-free experience? Look no further!
- Simply visit EyeTism to access the application online.

Whichever route you choose, we hope this demonstration offers valuable insights into the potential of diagnostic tools and inspires further exploration in the realm of data-driven solutions.

Presentation

We had the opportunity to present our Capstone project at the graduation event of the neuefische Data Science Bootcamp. You can download the slides or watch our presentation on YouTube.

Acknowledgements

All authors express their profound gratitude to the coaches and the organization of neuefische GmbH.

Folder navigation

/CNN

This folder contains the work done for the CNN modelling part (not integrated in the workflow)
The README.md in this folder guides you through its content

/Dashboard

This folder contains the Streamlit application we designed to demonstrate what a simple version of a diagnostic tool could look like.
You can either clone this repository, create a virtual environment and run the dashboard yourself via streamlit run Dashboard/app.py
or, you can visit the online version at LINK WILL FOLLOW

/data

All data generated while running the scripts and notebooks is saved here.

/images

In this folder you will find:
- final_set.png contains the final set of images
- test_set.png contains the set of images used for generating the predictions of the models
- val_set.png was another candidate for the test set
- figures which are used in this README

/modeling

In this folder you will find:
- the subfolder /dev, where several models were developed, trained and tested but did not make it into the final set of models
- the notebooks generated for the 8 final models, containing the pipelines to realize voting and stacking classifiers (see Roadmap above)
- the final evaluation of the models FINAL_EVALUATION.ipynb

/models

In this folder you will find the subfolders:
- /dev contains subfolders with all the models generated during the development, fine-tuning and optimization phase as pickle files
- /mediapipe contains mediapipe models used for object detection
- /uncalibrated_pipelines contains uncalibrated models as pickle files
- /calibrated contains the calibrated models as pickle files
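To reuse a saved model outside the notebooks, the pickle files can be loaded with the standard library. This is a minimal sketch: load_model is a hypothetical helper, the default folder and suffix follow the <MODEL>_calib.pickle pattern described in the roadmap (for example VTG_calib.pickle), and it assumes you run it from the repository root. Unpickling the real models also requires the libraries they were built with (e.g. XGBoost), ideally in the versions from requirements.txt.

```python
import pickle
from pathlib import Path


def load_model(name, folder="models/calibrated", suffix="_calib.pickle"):
    """Load a pickled model, e.g. load_model("VTG") -> models/calibrated/VTG_calib.pickle."""
    path = Path(folder) / f"{name}{suffix}"
    with path.open("rb") as fh:
        # only unpickle files from a source you trust: pickle can execute code
        return pickle.load(fh)
```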

/notebooks

In this folder you will find the notebooks for the EDA, the baseline modeling part, and the feature extraction.

/scripts

This folder contains all scripts and functions used by the different notebooks.
