
Handle duplicates in property values #132

Open
simleo opened this issue Jul 20, 2022 · 2 comments

@simleo
Collaborator

simleo commented Jul 20, 2022

```python
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person

crate = ROCrate()
john = crate.add(Person(crate, "#johndoe"))
jane = crate.add(Person(crate, "#janedoe"))
crate.root_dataset["author"] = [john, jane, john]
crate.root_dataset.properties()
# {'@id': './',
#  '@type': 'Dataset',
#  'datePublished': '2022-07-20T10:25:39+00:00',
#  'author': [{'@id': '#johndoe'}, {'@id': '#janedoe'}, {'@id': '#johndoe'}]}
```

I.e., the JSON-LD is not properly flattened. Note that, while in the above example the API user can easily avoid generating the duplicate, in the general case it may be much trickier to even notice that one is being generated (e.g., subsequent calls to Entity.append_to in different sections of the code).
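To make such duplicates easier to notice, a check could be run on the JSON of a property value, like the output of properties() above. The sketch below works on plain JSON-LD dictionaries only; find_duplicate_refs is a hypothetical helper, not part of ro-crate-py.

```python
from collections import Counter


def find_duplicate_refs(value):
    """Return the @ids that occur more than once in a property value.

    Hypothetical helper. `value` is assumed to be either a single reference
    ({"@id": ...}) or a list of references, as in the JSON-LD shown above.
    """
    refs = value if isinstance(value, list) else [value]
    # Counter keeps first-seen order, so results follow the original list
    counts = Counter(r["@id"] for r in refs if isinstance(r, dict) and "@id" in r)
    return [ref_id for ref_id, n in counts.items() if n > 1]


author = [{"@id": "#johndoe"}, {"@id": "#janedoe"}, {"@id": "#johndoe"}]
print(find_duplicate_refs(author))  # ['#johndoe']
```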

This should be dealt with in "real time", so that the crate stays flattened at all times and assertions like len(crate.root_dataset["author"]) == 2 don't fail while one is still working on it. Since lookup by value in a list is O(n), checking for duplicates on every append_to call would make extending a property through n successive calls quadratic overall. We should therefore switch to sets for property values, which is also closer to their actual semantics, since they have no predefined order. Should we then add support for JSON-LD lists? Are they supported, and do they make sense, in Schema.org / RO-Crate?
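A possible middle ground between a list and a set is a Python dict used as an ordered set: dicts preserve insertion order (guaranteed since Python 3.7) and key lookup is O(1) on average, so n appends cost O(n) in total rather than O(n²). The RefSet class below is a hypothetical sketch keyed on @id, not ro-crate-py code.

```python
class RefSet:
    """Ordered, duplicate-free collection of entity references keyed by @id.

    Hypothetical sketch: the dict keeps first-insertion order and gives
    average O(1) membership tests, avoiding a linear scan per append.
    """

    def __init__(self, refs=()):
        self._by_id = {}
        for ref in refs:
            self.append(ref)

    def append(self, ref):
        # setdefault keeps the first occurrence and ignores later duplicates
        self._by_id.setdefault(ref["@id"], ref)

    def __len__(self):
        return len(self._by_id)

    def __contains__(self, ref):
        return ref["@id"] in self._by_id

    def to_jsonld(self):
        # Emit a plain JSON array in first-insertion order
        return list(self._by_id.values())


authors = RefSet([{"@id": "#johndoe"}, {"@id": "#janedoe"}])
authors.append({"@id": "#johndoe"})  # e.g., a later append elsewhere in the code
print(len(authors))         # 2
print(authors.to_jsonld())  # [{'@id': '#johndoe'}, {'@id': '#janedoe'}]
```

Unlike a built-in set, this keeps values in the order in which they were first added.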

@simleo
Collaborator Author

simleo commented Aug 26, 2022

We discussed ordering for multiple-value properties at yesterday's RO-Crate meeting.

  • We should support @list. Sometimes order matters, e.g., authors in a Workflow RO-Crate.
  • Libraries should keep JSON list order by default anyway. If we don't do that, the metadata file could change just by reading and then writing an RO-Crate. That would be odd and really annoying when doing comparisons for testing etc.
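In JSON-LD, a plain array of values is unordered, while an object of the form {"@list": [...]} is an ordered list in which repeated items are meaningful. A normalization step consistent with both points above might deduplicate plain arrays while keeping their original order, and leave @list values untouched. normalize_value is a hypothetical helper sketched under these assumptions; it compares items by their canonical JSON serialization.

```python
import json


def _key(item):
    # Canonical JSON string so dicts (unhashable) can be compared by value
    return json.dumps(item, sort_keys=True)


def normalize_value(value):
    """Deduplicate a JSON-LD property value, keeping first-occurrence order.

    Plain arrays are treated as sets: duplicates are dropped, but order is
    kept so that reading and writing a crate does not reorder them.
    {"@list": [...]} values are ordered lists and are returned unchanged.
    """
    if isinstance(value, dict) and "@list" in value:
        return value
    if not isinstance(value, list):
        return value
    seen = set()
    result = []
    for item in value:
        k = _key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


plain = [{"@id": "#johndoe"}, {"@id": "#janedoe"}, {"@id": "#johndoe"}]
ordered = {"@list": [{"@id": "#johndoe"}, {"@id": "#janedoe"}, {"@id": "#johndoe"}]}
print(normalize_value(plain))               # [{'@id': '#johndoe'}, {'@id': '#janedoe'}]
print(normalize_value(ordered) == ordered)  # True
```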

@simleo
Collaborator Author

simleo commented Mar 20, 2023

> We should therefore switch to sets for property values

This is harder than it looks, since Entity uses the underlying JSON dictionary (self._jsonld) for storage: __getitem__ and __setitem__ perform conversions as needed when a property value is read or written.
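One conceivable workaround, sketched here as an assumption rather than a description of ro-crate-py's actual Entity, is to keep the JSON dictionary as the single source of truth and maintain a side index of @ids per property, so duplicate checks on append are O(1) without changing the storage format. ToyEntity and its simplified append_to are hypothetical; only the names _jsonld, __getitem__ and __setitem__ mirror the comment above.

```python
class ToyEntity:
    """Hypothetical entity that stores properties in a JSON-LD dict.

    A side index (key -> set of @ids) mirrors the reference lists in
    self._jsonld. A real implementation would also have to rebuild this
    index when loading JSON or when _jsonld is edited directly.
    """

    def __init__(self, id_):
        self.id = id_
        self._jsonld = {"@id": id_}
        self._index = {}

    def __setitem__(self, key, entities):
        # Store references as {"@id": ...}; repeats are dropped, first wins
        self._jsonld[key] = []
        self._index[key] = set()
        for e in entities:
            self.append_to(key, e)

    def __getitem__(self, key):
        # Convert stored references to @id strings (a real library would
        # return entity objects instead)
        return [ref["@id"] for ref in self._jsonld[key]]

    def append_to(self, key, entity):
        # Simplified, hypothetical stand-in for Entity.append_to
        ids = self._index.setdefault(key, set())
        refs = self._jsonld.setdefault(key, [])
        if entity.id not in ids:
            ids.add(entity.id)
            refs.append({"@id": entity.id})


root = ToyEntity("./")
john, jane = ToyEntity("#johndoe"), ToyEntity("#janedoe")
root["author"] = [john, jane, john]
root.append_to("author", jane)
print(root["author"])          # ['#johndoe', '#janedoe']
print(root._jsonld["author"])  # [{'@id': '#johndoe'}, {'@id': '#janedoe'}]
```

Keeping the index in sync with every path that touches _jsonld is exactly the bookkeeping that makes this change harder than it first appears.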
