
A toolkit for vision-language processing to support the increasing popularity of multi-modal transformer-based models.


Installation

To install from source (use the -e editable flag for local customization):

git clone https://github.com/eltoto1219/vltk.git && cd vltk && pip install -e .

Alternatively, install from PyPI:

pip install vltk

Documentation

The documentation is up! You can find it at vltk documentation.

It is pretty bare-bones for now; first on the agenda to be added are:

  1. Usage of adapters to rapidly create datasets.
  2. An overview of all the config options for automatically instantiating PyTorch dataloaders from one or many datasets at once (see the sketch after this list).
  3. An overview of how dataset metadata is automatically and deterministically collected from multiple datasets.
  4. Usage of modality processors for language, vision, and language x vision, which make it possible to universally load any vision, language, or vision-language dataset.
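
The config-driven loader setup in item 2 follows the standard PyTorch pattern of combining several datasets behind a single DataLoader. The sketch below is plain PyTorch rather than vltk's actual API; it only illustrates the kind of boilerplate the toolkit is meant to automate, with toy TensorDatasets standing in for real vision-language corpora.

import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Two toy datasets standing in for separate vision-language corpora.
dataset_a = TensorDataset(torch.randn(100, 16), torch.randint(0, 2, (100,)))
dataset_b = TensorDataset(torch.randn(50, 16), torch.randint(0, 2, (50,)))

# Combine them behind a single loader; vltk's config-driven builder is meant
# to handle this step (plus metadata collection and modality processing).
loader = DataLoader(ConcatDataset([dataset_a, dataset_b]), batch_size=8, shuffle=True)

for features, labels in loader:
    print(features.shape, labels.shape)  # torch.Size([8, 16]) torch.Size([8])
    break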

Collaboration

There are many exciting directions and improvements I have in mind for vltk. While this is the "official" beginning of the project, please email me with any suggestions or collaboration ideas: antonio36764@gmail.com
