[Feature request] Support for ImageNet format annotations #86

Open
amirmk89 opened this issue Mar 2, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@amirmk89
Contributor

amirmk89 commented Mar 2, 2023

This will be supported natively in an upcoming release; sharing it here to help unblock users and collect feedback.

ImageNet image annotation format

The ImageNet format uses the directory structure to divide images into splits and classes. Class names are encoded as WordNet IDs, e.g. n02979186 is 'cassette player'.
The general structure is:

  • data:
    • train
      • n02979186
      • n03417042
      • ...
    • val
      • n02979186
      • n03417042
      • ...
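To try the parsing without downloading the full dataset, a small tree in this layout can be generated locally. This is a minimal sketch, assuming Python 3.9+: the helper `make_toy_imagenet`, its default class codes and the empty placeholder files are hypothetical and only mimic the structure above; they are not real images and not part of fastdup.

```python
import tempfile
from pathlib import Path


def make_toy_imagenet(root: Path, splits=("train", "val"),
                      wnids=("n02979186", "n03417042"), per_class=2) -> list[Path]:
    """Create an ImageNet-style tree: root/data/<split>/<wnid>/<i>.jpg.

    Hypothetical helper for testing; the files are empty placeholders, not images.
    """
    created = []
    for split in splits:
        for wnid in wnids:
            class_dir = root / "data" / split / wnid
            class_dir.mkdir(parents=True, exist_ok=True)
            for i in range(1, per_class + 1):
                path = class_dir / f"{i}.jpg"
                path.touch()  # create an empty file
                created.append(path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = make_toy_imagenet(Path(tmp))
        # Show a few of the generated paths relative to the temporary root
        for f in files[:3]:
            print(f.relative_to(tmp))
```

Pointing `data_root` at such a temporary directory is a quick way to check the annotation columns before running on the real data.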

For example, the training-set images of the cassette player class would be:

  • data/train/n02979186/1.jpg
  • data/train/n02979186/2.jpg
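The split and class code can be read directly from each relative path. The sketch below uses a hypothetical helper, `parse_imagenet_path`, that takes the last three path components, so it works whether paths start at `data/` or at the split directory. It assumes forward-slash paths and that every class code matches the `n` + eight digits pattern of WordNet IDs.

```python
import re
from pathlib import PurePosixPath

# ImageNet class codes are WordNet IDs: "n" followed by eight digits.
WNID_RE = re.compile(r"n\d{8}")


def parse_imagenet_path(rel_path: str) -> tuple[str, str, str]:
    """Split '<...>/<split>/<wnid>/<file>' into (split, wnid, file).

    Reads the last three components, so 'data/train/...' and 'train/...' both work.
    """
    parts = PurePosixPath(rel_path).parts
    if len(parts) < 3:
        raise ValueError(f"expected <split>/<wnid>/<file>, got {rel_path!r}")
    split, wnid, filename = parts[-3:]
    if not WNID_RE.fullmatch(wnid):
        raise ValueError(f"{wnid!r} does not look like a WordNet ID")
    return split, wnid, filename


print(parse_imagenet_path("data/train/n02979186/1.jpg"))
# ('train', 'n02979186', '1.jpg')
```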

This snippet takes the dataset root directory as input and parses the split and class code of each image from its path. Converting class codes to class names is not covered in this snippet; the full list of codes and names can be found here.
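If readable names are needed, one option is to load a code-to-name mapping file. This is a sketch under an assumed file format: one `<code> <name>[, <synonym>...]` entry per line, as in the synset mapping text files distributed with some ImageNet copies; check the format of the file you actually use. `load_wnid_names` is a hypothetical helper that keeps only the first name of each entry.

```python
def load_wnid_names(lines) -> dict[str, str]:
    """Parse lines like '<wnid> <name>[, <synonym>...]' into {wnid: first name}.

    The line format is an assumption; adapt the parsing to your mapping file.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        wnid, _, names = line.partition(" ")
        # Keep only the primary name from a comma-separated synonym list
        mapping[wnid] = names.split(",")[0].strip()
    return mapping


sample = [
    "n01440764 tench, Tinca tinca",
    "n02979186 cassette player",
]
names = load_wnid_names(sample)
print(names["n02979186"])  # cassette player
```

With the DataFrame from the parsing snippet, `df['label'].map(names)` would then give one readable name per row (codes missing from the mapping become NaN).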

Parsing snippet

Assumes relative paths follow the data/train/n02979186/1.jpg format, i.e. the full path is /path/to/imagenet/data/train/n02979186/1.jpg.

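import fastdup
import pandas as pd
from pathlib import Path

data_root = '/path/to/imagenet'
work_dir = '/path/to/work_dir'  # placeholder: directory where fastdup writes its outputs

# Collect images; ImageNet files end in .JPEG, so match suffixes case-insensitively
img_exts = {'.jpg', '.jpeg'}
img_list = [p for p in Path(data_root).rglob('*') if p.suffix.lower() in img_exts]

# Paths end in <split>/<class code>/<file>, e.g. data/train/n02979186/1.jpg;
# indexing from the end works whether data_root points at imagenet/ or imagenet/data/
df = pd.DataFrame({'img_filename': [str(p.relative_to(data_root)) for p in img_list]})
df['split'] = df.img_filename.apply(lambda x: Path(x).parts[-3])
df['label'] = df.img_filename.apply(lambda x: Path(x).parts[-2])

# Run fastdup
fd = fastdup.create(work_dir, data_root)
fd.run(annotations=df)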
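Before calling `fd.run`, it may be worth checking that the parsed splits and class codes look as expected, e.g. that every class appears in both train and val. A stdlib-only sketch; `summarize_splits` is a hypothetical helper, and it assumes forward-slash relative paths in the data/<split>/<class code>/<file> layout described above.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_splits(rel_paths):
    """Count images per (split, class) and list classes absent from some split.

    Split and class code are the third- and second-to-last path components;
    every path is assumed to have at least <split>/<class>/<file>.
    """
    counts = Counter()
    for p in rel_paths:
        split, wnid = PurePosixPath(p).parts[-3:-1]
        counts[(split, wnid)] += 1
    splits = {s for s, _ in counts}
    classes = {c for _, c in counts}
    # A class is missing from a split if no image of it appears there
    missing = {s: sorted(c for c in classes if (s, c) not in counts) for s in splits}
    return counts, {s: m for s, m in missing.items() if m}


paths = [
    "data/train/n02979186/1.jpg",
    "data/train/n02979186/2.jpg",
    "data/train/n03417042/1.jpg",
    "data/val/n02979186/1.jpg",
]
counts, missing = summarize_splits(paths)
print(counts[("train", "n02979186")])  # 2
print(missing)  # {'val': ['n03417042']}
```

On POSIX systems, `summarize_splits(df.img_filename)` would run the same check on the snippet's DataFrame.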

Please let us know if you see any issues or want to request additional features.

amirmk89 added the enhancement (New feature or request) label Mar 2, 2023
amirmk89 self-assigned this Mar 2, 2023