Vision dataset downloading #2910

Merged
merged 10 commits into from
May 21, 2024

@rdondera-microsoft (Contributor) commented May 15, 2024

Support downloading vision + multimodal datasets in the dataset downloader component.

Implementation
Vision HF datasets cannot be saved directly to JSONL (as is done for NLP datasets), because binary image data does not fit that format. The solution keeps the JSONL format but adds a URL field that refers to the image data, which is saved to the default datastore. For each HF dataset, a dedicated adapter class extracts a label and a PIL image from each dataset instance; common code then uploads the image, passes the remaining fields through, and so on.
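A minimal sketch of the adapter-plus-common-code flow described above, for illustration only and not the code in this PR. The names `VisionDatasetAdapter`, `Food101Adapter`, `to_jsonl_rows`, `upload_image`, and the `image`/`label`/`image_url` field names are all assumptions. The image only needs to offer a `save(fp, format=...)` method, as PIL images do, and `upload_image` stands in for whatever writes bytes to the default datastore and returns a URL.

```python
import io
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


class VisionDatasetAdapter(ABC):
    """Hypothetical base class: maps one dataset instance to (label, image)."""

    # Name of the field holding the binary image (assumed, dataset-specific).
    image_field = "image"

    @abstractmethod
    def extract(self, instance: Dict[str, Any]) -> Tuple[Any, Any]:
        """Return (label, image); image must support save(fp, format=...)."""


class Food101Adapter(VisionDatasetAdapter):
    """Illustrative adapter; the field names are assumptions."""

    def extract(self, instance: Dict[str, Any]) -> Tuple[Any, Any]:
        return instance["label"], instance["image"]


def to_jsonl_rows(
    instances: Iterable[Dict[str, Any]],
    adapter: VisionDatasetAdapter,
    upload_image: Callable[[bytes, str], str],
) -> Iterator[str]:
    """Common code: upload each image and emit one JSON line per instance."""
    for index, instance in enumerate(instances):
        label, image = adapter.extract(instance)

        # Serialize the image to PNG bytes in memory.
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")

        # Hypothetical upload hook: stores the bytes and returns a URL.
        url = upload_image(buffer.getvalue(), f"image_{index}.png")

        # Pass through every field except the binary image itself.
        row = {k: v for k, v in instance.items() if k != adapter.image_field}
        row.update({"label": label, "image_url": url})
        yield json.dumps(row)
```

Each yielded string is one JSONL line, so the binary data never enters the file; only the URL does.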

Extensions
Support for further vision/multimodal datasets from HF can be added by writing new dataset adapters.
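One way such an extension point could look is a name-to-adapter registry, sketched below. This is an assumption about a possible design, not a description of how the component resolves adapters; `ADAPTERS`, `register_adapter`, `get_adapter`, `GTSRBAdapter`, and the field names are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry mapping a dataset name to its adapter class.
ADAPTERS: Dict[str, type] = {}


def register_adapter(dataset_name: str) -> Callable[[type], type]:
    """Class decorator that records an adapter under a dataset name."""
    def decorator(cls: type) -> type:
        ADAPTERS[dataset_name] = cls
        return cls
    return decorator


@register_adapter("gtsrb")
class GTSRBAdapter:
    """Illustrative adapter; the field names are assumptions."""

    def extract(self, instance: Dict[str, Any]) -> Tuple[Any, Any]:
        return instance["label"], instance["image"]


def get_adapter(dataset_name: str) -> Any:
    """Instantiate the adapter registered for a dataset, or fail clearly."""
    try:
        return ADAPTERS[dataset_name]()
    except KeyError:
        raise ValueError(
            f"No adapter registered for dataset {dataset_name!r}"
        ) from None
```

With this shape, supporting another dataset would only require one new decorated class; the shared upload and pass-through code would stay untouched.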

Tests
Ran the benchmarking pipeline end to end in the cloud, with runs for "resisc45", "food101", "patch_camelyon", and "gtsrb": https://ml.azure.com/experiments/id/9f7237b4-3232-49d3-ae02-e07326242835?wsid=/subscriptions/dbd697c3-ef40-488f-83e6-5ad4dfb78f9b/resourcegroups/rdondera/providers/Microsoft.MachineLearningServices/workspaces/benchmarking&tid=72f988bf-86f1-41af-91ab-2d7cd011db47

github-actions bot commented May 15, 2024

Test Results for assets-test

250 tests: 249 ✅, 1 💤, 0 ❌
11 suites, 11 files
Duration: 4h 37m 2s ⏱️

Results for commit b788660.

♻️ This comment has been updated with latest results.

@iamrk04 iamrk04 merged commit a212aed into main May 21, 2024
27 of 28 checks passed
@iamrk04 iamrk04 deleted the rdondera/downloader_vision branch May 21, 2024 08:42