Add support for IPTC Subject Codes #4024

Open
paperboyo opened this issue Jan 24, 2023 · 2 comments

@paperboyo
Contributor

paperboyo commented Jan 24, 2023

IPTC Subject Code is a newer version of the old IPTC IIM Category and Supplemental Category fields. Currently, Grid only understands those ancient IIM fields, and only when they are stored in the IIM part of the metadata. Grid should be made to:

  1. understand their XMP versions (xmp.photoshop:Category and xmp.photoshop:SupplementalCategory respectively) and read them here
  2. understand the newer (but still legacy) IPTC Subject Codes and map those to the existing list of Grid Subjects. Mappings are available here.
  3. We could also spend some time looking at whether the list of Grid subjects could be made more useful, and whether the metadata on imagery from suppliers would allow for a more useful list. I don’t think Subjects should provide any extensive ontology. They are far more useful as a quick, short list to exclude/include big unwanted chunks of the corpus. The fact that IPTC vocabularies have become more and more extensive, combined with the fact that newer and newer schemas enjoy less and less support, seems to back up this view. But I can see one useful fix: separating fashion and catwalk from Arts (if possible via metadata sent to us).
  4. The current IPTC recommendation is to use the even more extensive CV-Term About Image. But, among ~53 million images, the single one having this property is… an IPTC test image, so we have another decade to worry about that, I guess 😜.
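
To illustrate item 2, here is a minimal sketch of collapsing IPTC Subject Codes into a short subject list. It relies on the Subject Code being an 8-digit number whose first two digits identify the top-level subject. Everything else is an assumption made for illustration: the names `TOP_LEVEL_SUBJECTS` and `subjects_for` are hypothetical, the short labels are placeholders rather than Grid's actual subject names, and the real mapping would come from the mappings linked above. Python is used only to keep the sketch short.

```python
import re

# Top-level IPTC Subject Codes, keyed by the first two digits of the 8-digit code.
# The short labels are placeholders for illustration, NOT Grid's real subject names.
TOP_LEVEL_SUBJECTS = {
    "01": "arts",         # arts, culture and entertainment
    "02": "crime",        # crime, law and justice
    "03": "disaster",     # disaster and accident
    "04": "finance",      # economy, business and finance
    "05": "education",
    "06": "environment",  # environmental issue
    "07": "health",
    "08": "human",        # human interest
    "09": "labour",
    "10": "lifestyle",    # lifestyle and leisure
    "11": "politics",
    "12": "religion",     # religion and belief
    "13": "science",      # science and technology
    "14": "social",       # social issue
    "15": "sport",
    "16": "war",          # unrest, conflicts and war
    "17": "weather",
}

# An 8-digit run that is not part of a longer number, so values that carry
# extra text around the code (prefixes, names) are still accepted.
_CODE = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def subjects_for(raw_values):
    """Return the de-duplicated short subjects for raw Subject Code values.

    Values without an 8-digit code, or with an unknown top-level prefix,
    are skipped rather than raising.
    """
    found = []
    for raw in raw_values:
        match = _CODE.search(raw)
        if match is None:
            continue
        label = TOP_LEVEL_SUBJECTS.get(match.group(1)[:2])
        if label is not None and label not in found:
            found.append(label)
    return found
```

Keeping only the two-digit prefix is what keeps the list short: any number of detailed codes under the same top-level subject collapse into one entry. Splitting fashion out of Arts (item 3) would need an extra lookup on the full 8-digit code before falling back to the prefix.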
@honorcb
Collaborator

honorcb commented Feb 6, 2023

You should be looking at https://iptc.org/standards/media-topics/, the replacement for the Subject Codes. The work to replace the Subject Codes was started by a small group of members from BBC Scotland, AP and PA in about 2003.

@paperboyo
Contributor Author

paperboyo commented Feb 6, 2023

IIUC, Media Topics are supposed to be the newest version of Categories/Subject Codes. They use controlled-vocabulary NewsCodes. But those (for Media Topics) are written into CV-Term About Image, right? And not one of the XMP fields in the CV-Term structure is present in even a single image in our corpus.

Am I wrong, and are they saved into some other XMP field? Have you seen them anywhere in the wild?

In any case, I still think for those to be useful, we need some short and manageable list. Sadly, not even this is supplied by some of our biggest suppliers…

2 participants