Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental: defining "extension" slot as a list of key-value pairs #263

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

matentzn
Copy link
Collaborator

@matentzn matentzn commented Mar 7, 2023

Resolves [#ISSUE, #ISSUE]

  • docs/ have been added/updated if necessary
  • make test has been run locally
  • tests have been added/updated (if applicable)
  • CHANGELOG.md has been updated.

[Description, mentioning at least relevant #ISSUE and how it was addressed. A bulleted list of all changes performed by the PR is is helpful.]

range: string
range: key value pair
multivalued: true
inlined_as_list: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is false it makes the serialized dict slightly more idiomatic

@matentzn matentzn changed the title Experimental: defining "other" slot more formally is a list of key-value pairs Experimental: defining "extension" slot as a list of key-value pairs Mar 11, 2023
matentzn added a commit that referenced this pull request Jul 27, 2023
This needs to be dealt with in in #263
# value: MYINTERNALNAMESPACE:mapping_id
# - key: funded_by
# value: foaf:fundedBy
subject_id predicate_id object_id mapping_justification modified mapping_id funded_by
Copy link
Contributor

@gouttegd gouttegd Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the risk of clashes if a future version of SSSOM defines, say, mapping_id as a standard field?

I’d suggest mandating that columns defined by this extension mechanism should be prefixed, e.g. with ext_ or similar.

That is, if an extension is defined as follows:

#extension_definition:
#  - key: mapping_id
#    value: MYINTERNALNAMESPACE:mapping_id

Then in the TSV file it should appear as:

subject_id   predicate_id   [...] ext_mapping_id

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either that or adding a SSSOM version number in the mapping set metadata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I was musing about before I stopped working on it. The downside of using "ext" prefixes for column names is that I already know projects using SSSOM that really don't want to change their column names which are part of a large metadata model for mappings (with a few organisation specific columns like, ehem, species, Grant_id, and other Shananigans. I would have much liked the prefix solution for cleanliness.

My thought was that if a custom column is defined in the mapping set header, it trumps any definitions in sssom schema. This is super messy, but it's unclear to me how I should even approach this question.

Copy link
Contributor

@gouttegd gouttegd Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's unclear to me how I should even approach this question.

My suggestion:

  1. Make a list of those extra column names that you already know are used in the wild.
  2. Mark those column names in the spec as being reserved, meaning that future SSSOM versions will never reuse those names.
  3. From that moment on, new extension columns MUST use prefixed names. Existing projects that are currently using “reserved” names MAY continue to do so, though they are encouraged to switch to prefixed versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not be the cleanest solution, but it’s a pretty common way of dealing with existing legacy stuff in specifications and standards.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very French way to go about it, while the real world clearly works under Mediterranean principles.. People will just not care about this constraint in the spec - the only consequence it will have is that we can't use normal processing tools on many mapping sets in the wild. I know I sound negative but man, after 10 years of lobbying with great charme and all we barely managed to get people to agree to publish ontologies in a common syntax - it has proven nearly impossible to tell these same people to use a common set of annotation properties to describe their metadata.

On the other hand, I cant see a different solution either! I am 51% inclined to go your way. I wouldn't even bother to enumerate existing columns, they are too messy anyways. Just force the ext_.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a real-world approach. Look at any IETF RFCs, any IANA registry: they are littered with this kind of “reserved” attributes, keywords, identifiers. Most of them exist for only one reason: to deal with this kind of mess and the fact that many people don’t give a shit about standards, and standards have to take that into account.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants