
Curriculum Learning With Infant Egocentric Videos (NeurIPS 2023 Spotlight)

Saber Sheybani, Himanshu Hansaria, Justin Newell Wood, Linda B. Smith, Zoran Tiganj

Links: Paper

Abstract:

Infants possess a remarkable ability to rapidly learn and process visual inputs. As an infant's mobility increases, so does the variety and dynamics of their visual inputs. Is this change in the properties of the visual inputs beneficial or even critical for the proper development of the visual system? To address this question, we used video recordings from infants wearing head-mounted cameras to train a variety of self-supervised learning models. Critically, we separated the infant data by age group and evaluated the importance of training with a curriculum aligned with developmental order. We found that initiating learning with the data from the youngest age group provided the strongest learning signal and led to the best learning outcomes in terms of downstream task performance. We then showed that the benefits of the data from the youngest age group are due to the slowness and simplicity of the visual experience. The results provide strong empirical evidence for the importance of the properties of the early infant experience and developmental progression in training. More broadly, our approach and findings take a noteworthy step towards reverse engineering the learning mechanisms in newborn brains using image-computable models from artificial intelligence.
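
For intuition, the staged training described above can be sketched as follows: pretrain sequentially on video data from one age group at a time, youngest first. This is a minimal illustration, not the repository's actual API; load_age_group, ssl_loss, and the toy linear model are placeholders for the real datasets, objectives, and architectures under pretraining/.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def load_age_group(group):
    # Placeholder: the real pipeline loads head-camera video clips
    # recorded from infants in the given age group.
    return TensorDataset(torch.randn(64, 128))

def pretrain_with_curriculum(model, ssl_loss, groups=("youngest", "middle", "oldest")):
    # Train on one age group at a time, in developmental order.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for group in groups:
        loader = DataLoader(load_age_group(group), batch_size=16, shuffle=True)
        for (x,) in loader:
            loss = ssl_loss(model, x)  # generative, predictive, or contrastive objective
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

# Stand-in usage: a toy encoder and a dummy self-supervised loss.
model = pretrain_with_curriculum(torch.nn.Linear(128, 128),
                                 lambda m, x: m(x).pow(2).mean())
```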

Codebase Organization

```
baby-vision-curriculum
├── pretraining: Python code for pretraining the models with various objectives and architectures
│   ├── generative
│   │   └── pretrain_videomae.py
│   ├── predictive
│   │   └── pretrain_jepa.py
│   └── contrastive
│       └── pretrain_simclr.py
├── benchmarks: Python code for benchmarking any checkpoint on the downstream tasks
│   ├── compute_embeddings_videomae.py
│   ├── compute_embeddings_jepa.py
│   └── compute_embeddings_simclr.py
├── slurmscripts: Bash scripts for submitting SLURM jobs that train and evaluate models
└── notebooks: Jupyter notebooks used to create the figures in the manuscript
    └── EvaluateEmbeddings.ipynb
```
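
The benchmarking flow this layout implies is two steps: a compute_embeddings_* script runs a frozen checkpoint over a benchmark dataset and saves the embeddings, which the notebook then scores. Below is a rough sketch of that flow with placeholder names; a linear probe is shown as one common readout, though the notebook's exact procedure may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def compute_embeddings(encoder, loader, device="cpu"):
    # Step 1: run a frozen checkpoint over a dataset and collect features,
    # as the compute_embeddings_* scripts do for each architecture.
    encoder.eval().to(device)
    feats, labels = [], []
    for x, y in loader:
        feats.append(encoder(x.to(device)).cpu())
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def linear_probe_accuracy(train_X, train_y, test_X, test_y, num_classes, steps=200):
    # Step 2: fit a linear readout on the frozen embeddings and report
    # test accuracy. Labels are expected as integer class indices.
    probe = torch.nn.Linear(train_X.shape[1], num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = F.cross_entropy(probe(train_X), train_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (probe(test_X).argmax(dim=1) == test_y).float().mean().item()
```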

Dependencies:

  • PyTorch
  • torchvision
  • Hugging Face transformers (for VideoMAE)
  • tqdm
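
One way to install these dependencies (exact versions are not pinned here):

```
pip install torch torchvision transformers tqdm
```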

VideoMAE models require substantial GPU memory, so they need to be pretrained on multiple GPUs. We use PyTorch DistributedDataParallel (DDP) for this.
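
A minimal, self-contained sketch of that DDP setup is shown below (launch with, e.g., torchrun --nproc_per_node=4 script.py). The linear model and random tensors stand in for the actual VideoMAE model and video data; they are not the repository's code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group("nccl")       # one process per GPU, set up by torchrun
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    # Stand-ins: the real code builds a VideoMAE model and an infant-video dataset.
    model = DDP(torch.nn.Linear(512, 512).to(rank), device_ids=[rank])
    data = TensorDataset(torch.randn(256, 512))
    sampler = DistributedSampler(data)    # shards the data across GPUs
    loader = DataLoader(data, batch_size=8, sampler=sampler)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for epoch in range(2):
        sampler.set_epoch(epoch)          # reshuffle shards each epoch
        for (x,) in loader:
            loss = model(x.to(rank)).pow(2).mean()  # dummy self-supervised loss
            optimizer.zero_grad()
            loss.backward()               # DDP averages gradients across GPUs
            optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```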

Citation

```
@inproceedings{sheybani2023curriculum,
  title={Curriculum Learning With Infant Egocentric Videos},
  author={Sheybani, Saber and Hansaria, Himanshu and Wood, Justin Newell and Smith, Linda B and Tiganj, Zoran},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
```
