
Codemeta content negotiation is not correct #216

Open
dgarijo opened this issue Jun 29, 2019 · 37 comments
Labels
discussion A discussion is still needed to analyze, resolve and implement enhancement

Comments

@dgarijo
Contributor

dgarijo commented Jun 29, 2019

Hello,
I want to use Codemeta on our next version of OntoSoft. However, the content negotiation of the schema is not correct. When I do
curl -sH "accept:application/json+ld" -L https://codemeta.github.io/terms/
I expect to obtain https://raw.githubusercontent.com/codemeta/codemeta/1.0/codemeta.jsonld, i.e., a machine-readable description of the schema. Instead, I am redirected to the human-readable HTML page.
I don't think you can enable content negotiation on GitHub pages, but maybe having a w3id would be desirable.
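
As a rough illustration, the same check can be sketched in Python with only the standard library. The helper names `build_negotiated_request` and `is_jsonld` are hypothetical and not part of any codemeta tooling. Note that the media type registered for JSON-LD is `application/ld+json`, while the curl command above sends `application/json+ld`; the sketch asks for the registered type first and keeps the spelling used in this thread as a fallback.

```python
from urllib.request import Request

# Registered JSON-LD media type first, then the spelling used in this thread.
JSONLD_TYPES = ("application/ld+json", "application/json+ld")


def build_negotiated_request(url: str) -> Request:
    """Build (but do not send) a GET request that asks for JSON-LD."""
    return Request(url, headers={"Accept": ", ".join(JSONLD_TYPES)})


def is_jsonld(content_type: str) -> bool:
    """Return True if a Content-Type header value names a JSON-LD media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in JSONLD_TYPES


req = build_negotiated_request("https://codemeta.github.io/terms/")
# To actually perform the check one would call urllib.request.urlopen(req)
# and pass response.headers.get("Content-Type", "") to is_jsonld().
```

If `is_jsonld` returns False for the final response, the namespace URI is serving only the human-readable page, which is the behaviour reported here.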

@mbjones
Collaborator

mbjones commented Jun 30, 2019

@dgarijo you make a good point. We published the 2.0 version using the doi https://doi.org/10.5063/schema/codemeta-2.0, and our intention was that would be the context for the terms. So this resolves properly:

curl -sH "accept:application/json+ld" -L https://doi.org/10.5063/schema/codemeta-2.0

We meant to switch the namespace to that, but it seems we didn't and we are now going to have the github.io URL as the context URI, which could cause issues in the long run for us. So, @cboettig, what do you think of a new release using the doi as the context? Alternatively, we live with it at the current github.io URL, and accept that we can't redirect it to the schema, and that we may not be able to maintain it over the long term depending on what happens with github.io. Other options?

@dgarijo
Contributor Author

dgarijo commented Jun 30, 2019

Some suggestions based on my experience. Maybe they will help:

I think that a raw.githubusercontent URL may not be the best for production. It would be better to give it a more stable URI, even if you use a github page. Or even better, you tag a release and have the json representation there, so you have a permanent link such as https://github.com/codemeta/codemeta/archive/2.0.jsonld

Do you have control over the content negotiation for https://doi.org/10.5063/schema/codemeta-2.0? If so, in the .htaccess you may add a 303 redirect to the github.io HTML documentation or the jsonld file, depending on the Accept header of the request. Right now doing:

curl -sH "accept:text/html" -L https://doi.org/10.5063/schema/codemeta-2.0

also returns the json ld file, instead of https://codemeta.github.io/terms/

If you decide to move towards the DOI as a namespace, then it would be nice to design a vocabulary URI independent from the version. Something like https://doi.org/10.5063/schema/codemeta#. Having a github namespace usually doesn't work, because you don't have control over the redirects.
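
The 303 redirect suggested above can be sketched as a small decision function. This is a simplified Python model of the logic, not an actual .htaccess rule: it ignores q-values, the function name `choose_redirect` is hypothetical, and the 1.0 raw URL from the first comment is used only as a stand-in for whichever JSON-LD file the maintainers would serve.

```python
# Targets taken from this thread; the JSON-LD one is only a stand-in.
HTML_TARGET = "https://codemeta.github.io/terms/"
JSONLD_TARGET = "https://raw.githubusercontent.com/codemeta/codemeta/1.0/codemeta.jsonld"

JSONLD_TYPES = ("application/ld+json", "application/json+ld")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def choose_redirect(accept_header: str) -> tuple[int, str]:
    """Return (status, Location) for a request, based on its Accept header.

    Simplification: media types are checked in the order listed and
    q-values are ignored.
    """
    for part in accept_header.split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in JSONLD_TYPES:
            return 303, JSONLD_TARGET
        if media_type in HTML_TYPES:
            return 303, HTML_TARGET
    # Browsers and generic clients (e.g. "*/*") get the documentation.
    return 303, HTML_TARGET
```

With this logic, `choose_redirect("text/html")` points to the documentation and `choose_redirect("application/ld+json")` points to the context file, which is the behaviour the DOI currently lacks.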

@mbjones
Collaborator

mbjones commented Nov 14, 2019

Thanks @dgarijo Personally I much prefer to keep the version in the namespace, as it makes it much clearer which version of a vocabulary people are using in particular instance documents. But I agree we have some namespace cleanup to do here. This partly depends on the recent changes at DataCite on how much control we have over content negotiation for the json-ld document -- I need to look into that further.

@dgarijo
Contributor Author

dgarijo commented Nov 14, 2019

All right, please keep me posted :)
The version in the URI is informative, but it kind of breaks your queries if you store your codemeta information in a knowledge base... Take into account that each time you change the version it would be like a property rename.
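
A toy Python illustration of the "property rename" effect. The two versioned namespaces and the subjects are hypothetical (example.org placeholders), and the list of tuples merely stands in for a real knowledge base.

```python
# Hypothetical versioned namespaces, for illustration only.
NS_V2 = "https://example.org/codemeta/2.0/"
NS_V3 = "https://example.org/codemeta/3.0/"

# A toy knowledge base of (subject, predicate, object) triples,
# loaded from documents that used different vocabulary versions.
TRIPLES = [
    ("urn:example:toolA", NS_V2 + "issueTracker", "https://example.org/a/issues"),
    ("urn:example:toolB", NS_V3 + "issueTracker", "https://example.org/b/issues"),
]


def subjects_with(predicate: str) -> list[str]:
    """Return subjects that have a triple with exactly this predicate IRI."""
    return [s for (s, p, _) in TRIPLES if p == predicate]


def subjects_with_any(local_name: str, namespaces: list[str]) -> list[str]:
    """Work around the rename by querying every known versioned namespace."""
    wanted = {ns + local_name for ns in namespaces}
    return [s for (s, p, _) in TRIPLES if p in wanted]


only_v2 = subjects_with(NS_V2 + "issueTracker")
both = subjects_with_any("issueTracker", [NS_V2, NS_V3])
```

A query written against one version silently misses data recorded with the other, so every query has to enumerate all versions ever published.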

@mbjones
Collaborator

mbjones commented Nov 15, 2019

That's a very good point about the property rename. For other vocabularies that we manage like ECSO, the new versions only contain new terms -- once minted, old term URIs never change, whereas new terms would have the new namespace version. I'm not so sure that applies here, and all may be moot if our push to get all of these terms into schema.org directly is successful.

@dgarijo
Contributor Author

dgarijo commented Jun 3, 2021

Now that v3 is under discussion, maybe it's a good time to bring back this issue, @mbjones?
Would it be possible to have a permanent URI for codemeta?

@tmorrell
Contributor

tmorrell commented Jul 1, 2021

DataCite removed native custom content negotiation for DOIs, so there is not an easy way to have both the html and json ld on the same DOI. I'm not sure that we need both formats in the same url. If we do, content negotiation would have to happen on a separate service, which brings up sustainability issues.

@dgarijo
Contributor Author

dgarijo commented Jul 2, 2021

Having content negotiation is desirable, but not a requirement. You are right that external services bring sustainability issues, although there are permanent url services (purl.org, w3id.org) that may help. Of course, this would mean changing the github URI to something like purl.org/codemeta or w3id.org/codemeta, which is a huge change.

I think the main issue right now is that if you translate the JSON-LD to other serializations, the context is lost and may not resolve. For example, take the following codemeta file https://tinyurl.com/yevqrs2s and translate it to n-quads in the application. You will see that schema.org terms will resolve, but things like https://codemeta.github.io/terms/issueTracker will return a 404 :(
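
To make the point concrete, here is a deliberately simplified Python sketch of what happens to a term during such a translation. It is not a real JSON-LD processor: the context is a tiny assumed excerpt using the namespace from this thread, the document and subject are hypothetical, and only string-valued properties are handled.

```python
# Tiny assumed excerpt of a context; real contexts are richer than this.
CONTEXT = {
    "schema": "http://schema.org/",
    "codemeta": "https://codemeta.github.io/terms/",
    "name": "schema:name",
    "issueTracker": "codemeta:issueTracker",
}

# Hypothetical document body (the @context itself is applied via CONTEXT).
DOC = {"name": "Example tool", "issueTracker": "https://example.org/issues"}


def expand_term(term: str) -> str:
    """Expand a term or prefix:suffix pair to an absolute IRI using CONTEXT."""
    value = CONTEXT.get(term, term)
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in CONTEXT:
        return CONTEXT[prefix] + suffix
    return value


def to_statements(subject: str, doc: dict) -> list[str]:
    """Emit N-Triples-style lines, treating every value as a plain literal."""
    return [f'<{subject}> <{expand_term(key)}> "{val}" .' for key, val in doc.items()]


lines = to_statements("urn:example:tool", DOC)
```

After the translation the output contains only absolute IRIs such as https://codemeta.github.io/terms/issueTracker, with no trace of the context URL, so a consumer's only way to learn about the term is to dereference that IRI.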

@moranegg
Contributor

This issue will be open for discussion until March 15th and we will start a vote from March 15th until March 25th 2023

This is an urgent matter which should be a candidate for the v3.0.

An option would be to use a SWHID for the json-ld file:
swh:1:cnt:2f3ea1e82e6df8cdd1bf12c25dc26ef20e8d6642;origin=https://github.com/codemeta/codemeta;visit=swh:1:snp:d8b491dfab4319441ad438db7c693026f6d1b3e3;anchor=swh:1:rev:22cf4c2b836a0026565792b5d1b3c5ff6c1fc82b;path=/codemeta.jsonld

https://archive.softwareheritage.org/swh:1:cnt:2f3ea1e82e6df8cdd1bf12c25dc26ef20e8d6642;origin=https://github.com/codemeta/codemeta;visit=swh:1:snp:d8b491dfab4319441ad438db7c693026f6d1b3e3;anchor=swh:1:rev:22cf4c2b836a0026565792b5d1b3c5ff6c1fc82b;path=/codemeta.jsonld

Some more information about SWHIDs can be found here:
https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/
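
For readers unfamiliar with the format, the identifier above splits into a core part (`swh:1:cnt:<hash>`) followed by `;`-separated `key=value` qualifiers. A minimal Python sketch of that split, with a hypothetical helper name and no validation of the hash or the qualifier keys:

```python
SWHID = (
    "swh:1:cnt:2f3ea1e82e6df8cdd1bf12c25dc26ef20e8d6642"
    ";origin=https://github.com/codemeta/codemeta"
    ";visit=swh:1:snp:d8b491dfab4319441ad438db7c693026f6d1b3e3"
    ";anchor=swh:1:rev:22cf4c2b836a0026565792b5d1b3c5ff6c1fc82b"
    ";path=/codemeta.jsonld"
)


def parse_swhid(swhid: str) -> dict:
    """Split a SWHID into its core fields and its qualifiers (no validation)."""
    core, *qualifiers = swhid.split(";")
    scheme, version, object_type, object_hash = core.split(":")
    parsed = {
        "scheme": scheme,
        "version": version,
        "type": object_type,
        "hash": object_hash,
        "qualifiers": {},
    }
    for qualifier in qualifiers:
        # Split on the first "=" only, since values may contain "=" or ":".
        key, _, value = qualifier.partition("=")
        parsed["qualifiers"][key] = value
    return parsed


parsed = parse_swhid(SWHID)
```

The core part identifies the file content itself, while the qualifiers record where and in which revision it was found.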

@dgarijo
Contributor Author

dgarijo commented Feb 16, 2023

Thanks, @moranegg. I think this would solve the versioning aspect, but not the main identity problem with the vocabulary. Plus, the ids are too long to use in a vocabulary directly. Let me elaborate below.

Codemeta namespace URI is: https://codemeta.github.io/terms/

When I resolve this, I don't get a machine-readable representation of the vocabulary. I get HTML. So I cannot use it in my apps.

Since a lot of time has passed, I would like to propose the following for version 3:

Create a w3id for codemeta. Something like https://w3id.org/codemeta, which always resolves to the latest version of the vocabulary. This can be a DOI, a SWHID, or other. The HTML version would not change, as https://w3id.org/codemeta would redirect to https://codemeta.github.io/terms/. On the surface nobody would notice a change.

This approach supports direct versioning, as we can declare version IRIs like https://w3id.org/codemeta/1.0/ and then redirect to https://raw.githubusercontent.com/codemeta/codemeta/1.0/codemeta.jsonld, or its corresponding SWHID

Note: This would be a MAJOR change, as it would mean moving the ID of codemeta from https://codemeta.github.io/terms/ to https://w3id.org/codemeta. This would be a superficial change for humans, but would help enormously working with codemeta on the technical end.

@stain

stain commented Sep 28, 2023

Considering #321, I would propose https://w3id.org/codemeta# as the namespace, in agreement with @dgarijo. As this is a v3.0 (a major change in https://semver.org/ terms) and only affects a couple of additional terms like softwareSuggestions (not the majority, which remain in http://schema.org), v3 would be the right time to do it.

Vocabulary namespaces should not be versioned, so that the semantic meaning of their terms can stay the same over time, although the exact interpretation/definition may change slightly as the vocabulary evolves. In this case we only have a textual representation of the vocabulary anyway (no stringent OWL file etc.), so it shouldn't matter.

@stain

stain commented Sep 28, 2023

BTW, in RO-Crate call today we agreed to add the additional codemeta terms to our context, so we'll use the newer w3id namespace for now in anticipation of this issue (as I would be reluctant to add a github.io namespace)

stain added a commit to ResearchObject/ro-crate that referenced this issue Sep 29, 2023
Codemeta Namespace boldly assumed to be w3id-based rather than github.io
see codemeta/codemeta#216

Added Biochemas namespace comment, see
BioSchemas/specifications#653
@moranegg
Contributor

moranegg commented Oct 2, 2023

I'm inclined to drop the https://codemeta.github.io/terms/ redirection because it is not persistent, since it is the location of the last release, which will evidently evolve with time.

What would be the best approach here in terms of a replacement?
This is an open question to the community, thanks all for your help in deciding on this point.

This discussion will be open until October 15th.
After this date, the PMC will choose the best candidate and move forward with the finalization of the v3.0.

@stain

stain commented Oct 6, 2023

I'll try to clarify the proposal:

One problem with # is what to do about terms that are now (or in the future) in schema.org -- like maintainer -- if they are removed from the page then we can't redirect to an earlier version as the term is after the #anchor. An easy solution is to have a section for "Obsolete terms" which would also be beneficial for users. The more detailed version is to use https://w3id.org/codemeta/maintainer etc. "slash-style" namespaces which generically could redirect to the HTML section but could be overridden for each deprecated term.

These considerations are explained in https://www.w3.org/TR/swbp-vocab-pub/ -- perhaps @dgarijo has some more modern sources.
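
The difference between the two namespace styles comes from the fact that the fragment of a URI is handled by the client and is not sent to the server in the HTTP request. A small Python sketch using the standard library, with the w3id URIs proposed in this thread and a hypothetical helper name:

```python
from urllib.parse import urlsplit


def server_and_client_parts(iri: str) -> tuple[str, str]:
    """Return (path sent to the server, fragment kept by the client)."""
    parts = urlsplit(iri)
    return parts.path, parts.fragment


hash_style = server_and_client_parts("https://w3id.org/codemeta#maintainer")
slash_style = server_and_client_parts("https://w3id.org/codemeta/maintainer")
```

With the hash style every term reaches the server as the same path, so redirects cannot differ per term; with the slash style each term has its own path and can be redirected individually, for example to an older version once it is deprecated.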

@dgarijo
Contributor Author

dgarijo commented Oct 6, 2023 via email

@stain

stain commented Oct 6, 2023

It may be tricky to add <tr id="contIntegration"> anchors without redoing the knitr magic of terms.Rmd.

https://github.com/yihui/knitr/blob/v1.44/R/table.R#L370 does not have any way to add anchor attributes (beyond align).

Perhaps these terms should be loaded from elsewhere than the crossmap? Also I wonder why they have the type codemeta:SoftwareSourceCode -- is this an implied subclass of http://schema.org/SoftwareSourceCode we should/should not formally define, or more of a trick for this term selection?

@progval
Member

progval commented Oct 6, 2023

> It may be tricky to add <tr id="contIntegration"> anchors without redoing the knitr magic of terms.Rmd.

Shouldn't be an issue, we rewrote other parts of the .Rmd-based website generation to Python and Hugo templates in recent months.

@progval
Member

progval commented Oct 6, 2023

> Also I wonder why they have the type codemeta:SoftwareSourceCode

What does? codemeta:SoftwareSourceCode doesn't exist, it's most likely a mistake.

@stain

stain commented Oct 16, 2023

> Also I wonder why they have the type codemeta:SoftwareSourceCode
>
> What does? codemeta:SoftwareSourceCode doesn't exist, it's most likely a mistake.

From crosswalk.csv as read by terms.Rmd:

codemeta:SoftwareSourceCode,softwareSuggestions,SoftwareSourceCode,,,,(references),,,,,,,,identificationInfo.additionalDocumentation*,,,,devDependencies / optionalDependencies,,BuildDepends,,,,,,Suggests,add_development_dependency,,,,,,,suggests,softwareSuggestions
codemeta:SoftwareSourceCode,maintainer,Person,,maintainer,package.authors,,prov:qualifiedAttribution,maintainer,,,,,,identificationInfo.pointOfContact,,,,,,,,,,maintainer / maintainer_email,,Maintainer,,,,,,,,uploadedBy,maintainer
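
A short Python sketch for spotting such rows programmatically. It only uses the first three columns of the two rows quoted above, and it assumes (this is not confirmed in the thread) that those columns mean parent type, property name, and value type. The function name is hypothetical.

```python
import csv
import io

# First three columns of the two rows quoted above.
ROWS = """\
codemeta:SoftwareSourceCode,softwareSuggestions,SoftwareSourceCode
codemeta:SoftwareSourceCode,maintainer,Person
"""


def rows_with_codemeta_parent(text: str) -> list[tuple[str, str]]:
    """Return (property, parent type) for rows whose first column uses the codemeta: prefix."""
    found = []
    for row in csv.reader(io.StringIO(text)):
        if row and row[0].startswith("codemeta:"):
            found.append((row[1], row[0]))
    return found


flagged = rows_with_codemeta_parent(ROWS)
```

Running the same filter over the whole crosswalk would list every property filed under a `codemeta:`-prefixed parent type, which helps decide whether the prefix is a selection trick or an unintended class.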

@stain

stain commented Oct 16, 2023

> It may be tricky to add <tr id="contIntegration"> anchors without redoing the knitr magic of terms.Rmd.
>
> Shouldn't be an issue, we rewrote other parts of the .Rmd-based website generation to Python and Hugo templates in recent months.

Can try to help on this, I have done a couple of Hugo templates for my own website. In this way the term listing could also include sufficient RDFa description of the new terms.

@progval
Member

progval commented Oct 16, 2023

> From crosswalk.csv as read by terms.Rmd:
>
> codemeta:SoftwareSourceCode,softwareSuggestions,SoftwareSourceCode,,,,(references),,,,,,,,identificationInfo.additionalDocumentation*,,,,devDependencies / optionalDependencies,,BuildDepends,,,,,,Suggests,add_development_dependency,,,,,,,suggests,softwareSuggestions
> codemeta:SoftwareSourceCode,maintainer,Person,,maintainer,package.authors,,prov:qualifiedAttribution,maintainer,,,,,,identificationInfo.pointOfContact,,,,,,,,,,maintainer / maintainer_email,,Maintainer,,,,,,,,uploadedBy,maintainer

That looks like a mistake

> Can try to help on this, I have done a couple of Hugo templates for my own website. In this way the term listing could also include sufficient RDFa description of the new terms.

Thanks!

@dgarijo
Contributor Author

dgarijo commented May 9, 2024

Hello,
the conversation in this thread seems to have stopped a bit. With the v3 release, I can resolve https://w3id.org/codemeta to the right context. However, when I get the context I still get that codemeta terms have the identifier: https://codemeta.github.io/terms/. These do not resolve yet to anything

Would it be possible to change this id in the context file too, so we can resolve terms like https://w3id.org/codemeta/contIntegration to the right term? This is a breaking decision, as it would make older versions of codemeta incompatible with new ones (the URI would change). However, it would help resolve any vocabulary term.

@mbjones
Collaborator

mbjones commented May 9, 2024

@dgarijo I don't see any issues with making the full term URIs resolve without a breaking change -- we just need https://w3id.org/codemeta/contIntegration to first resolve to a domain we control, and then use rewrite rules to send back the term URI with a fragment identifier. Seems very doable. Can you clarify why you think this would be breaking so I can understand?

@progval
Member

progval commented May 9, 2024

switching from https://codemeta.github.io/terms/contIntegration to https://w3id.org/codemeta/contIntegration is a breaking change because it's a different URI, so JSON-LD parsers see it as a completely different property.

@progval progval added enhancement discussion A discussion is still needed to analyze, resolve and implement labels May 9, 2024
@dgarijo
Contributor Author

dgarijo commented May 9, 2024

Exactly, the problem is what @progval states. We would need to open a community consultation and take a decision on this.
I would prefer that the vocabulary terms resolve properly. I know some adopters who are reluctant to use codemeta because the terms do not resolve. You kind of have to know where to find the context URI.

@mbjones we already control https://w3id.org/codemeta/ and it's redirected to the latest context with content negotiation :)

@progval
Member

progval commented May 9, 2024

> I would prefer that the vocabulary terms resolve properly. I know some adopters who are reluctant to using codemeta because the terms do not resolve.

I don't understand this. What should property URIs resolve to?

schema.org's property URIs also don't resolve to anything:

$ curl -sH "accept:application/json+ld" http://schema.org/name -I
HTTP/1.1 301 Moved Permanently
Location: https://schema.org/name
X-Cloud-Trace-Context: aa3c0a5da08b832625c109858a1b9c6e
Date: Thu, 09 May 2024 17:03:14 GMT
Content-Type: text/html
Server: Google Frontend
Transfer-Encoding: chunked

$ curl -sH "accept:application/json+ld" https://schema.org/name -I
HTTP/2 200 
[...]
content-type: text/html

in the first message of this thread, you mentioned you wanted https://codemeta.github.io/terms/ (ie. namespace of Codemeta's property URIs) to resolve to the v1.0 context; which is a different thing.

I am also not convinced it is that useful to make a namespace redirect to the context (which is not a schema, btw) because it has no meaning in and of itself. Wikidata's namespace also resolves only to an HTML page (and it's a 404):

$ curl -sH "accept:application/json+ld" http://www.wikidata.org/prop/direct/ -I
HTTP/1.1 301 Moved Permanently
content-length: 0
location: https://www.wikidata.org/prop/direct/
[...]

$ curl -sH "accept:application/json+ld" https://www.wikidata.org/prop/direct/ -I
HTTP/2 303 
date: Thu, 09 May 2024 17:07:55 GMT
server: mw-web.eqiad.main-5ffd5d6f88-k56tc
location: https://www.wikidata.org/wiki/Property:
[...]

$ curl -sH "accept:application/json+ld" https://www.wikidata.org/wiki/Property: -I
HTTP/2 404 
date: Thu, 09 May 2024 17:08:08 GMT
server: mw-web.eqiad.main-5ffd5d6f88-6rplt
[...]
content-type: text/html; charset=UTF-8
[...]

@dgarijo
Contributor Author

dgarijo commented May 9, 2024

Schema.org supports content negotiation in a special way, and not for terms, that's true (schemaorg/schemaorg#3500).

I would like the vocabulary URI (and the respective terms) to resolve to the specification following W3C best practices. We only have the context right now, so that would be ok. Ideally we should return the new codemeta properties in a graph stating the schema:domainIncludes and schema:rangeIncludes (we have a separate issue open for that). Example: https://github.com/BioSchemas/specifications/blob/master/ScholarlyArticle/jsonld/ScholarlyArticle_v0.2-DRAFT-2020_12_03.json (disclaimer: I have not verified if the content negotiation works in bioschemas, but I like the JSON-LD spec)

At the moment https://codemeta.github.io/terms/ returns nothing in a machine-readable manner.
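
A rough Python sketch of what such a machine-readable description of one property could look like. Everything specific here is an assumption for illustration: the helper name is hypothetical, and the domain and range given for softwareSuggestions are guessed from the crosswalk row quoted earlier in this thread, not taken from an official definition.

```python
import json

CODEMETA_NS = "https://codemeta.github.io/terms/"  # namespace currently in the context


def describe_property(name: str, domain: str, range_: str) -> dict:
    """Build a JSON-LD node describing one property with schema.org-style domain/range."""
    return {
        "@context": {
            "schema": "http://schema.org/",
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        },
        "@id": CODEMETA_NS + name,
        "@type": "rdf:Property",
        "schema:domainIncludes": {"@id": "schema:" + domain},
        "schema:rangeIncludes": {"@id": "schema:" + range_},
    }


# Assumed domain and range, for illustration only.
node = describe_property("softwareSuggestions", "SoftwareSourceCode", "SoftwareSourceCode")
serialized = json.dumps(node, indent=2)
```

A file made of such nodes is the kind of payload that a request for JSON-LD on the vocabulary URI could return instead of, or next to, the context.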

@progval
Member

progval commented May 9, 2024

I see. Any chance we could somehow serve the machine-readable data within the HTML, ie. as microdata?

@dgarijo
Contributor Author

dgarijo commented May 9, 2024

Ah that's interesting. We could embed it in a JSON-LD snippet in the html doc, I had not considered that option. Still, this would not support content negotiation and you would need a specific extractor to get the snippet...

I'd still prefer going for a URI rename because w3ids allow for more control, but I would be keen to see what the rest of the community thinks about it. The solution above would be a compromise (unless github decides that github.io needs a URI change in the future and we need the rename anyways)
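
For a sense of what the "specific extractor" for the embedded-snippet option would involve, here is a minimal Python sketch using only the standard library. The class name and the sample page are hypothetical; the page merely imitates a terms page carrying a `<script type="application/ld+json">` block.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical page content, for illustration only.
PAGE = """<html><head>
<script type="application/ld+json">
{"@id": "https://codemeta.github.io/terms/issueTracker"}
</script>
</head><body><h1>Terms</h1></body></html>"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
```

Every consumer would need a step like this before it can read the vocabulary, whereas content negotiation lets a generic JSON-LD client fetch it directly.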

@mbjones
Collaborator

mbjones commented May 9, 2024

I see where my confusion came from -- I wrongly assumed we were already using the w3id in the context file. I thought that was the whole point of the switch away from a DOI, to provide long term continuity and to enable redirects to work properly. So, currently, where the context file has:

  "@context": {
      "type": "@type",
      "id": "@id",
      "schema":"http://schema.org/",
      "codemeta": "https://codemeta.github.io/terms/",
...

If we were to change this to:

  "@context": {
      "type": "@type",
      "id": "@id",
      "schema":"http://schema.org/",
      "codemeta": "https://w3id.org/codemeta/",
...

then I think we would be well served. That URI space could 1) provide redirects to the context file for requests for JSON-LD, 2) provide redirects to human readable docs for the context and terms overall, and 3) provide redirects to HTML for individual terms at term anchors in the html. Honestly, I thought we had done this before, which is why I thought 3.0 was a breaking change. I think this is still an important change, even if it is breaking again. Once we make this change, the website at codemeta.github.io could get moved/renamed with impunity because the redirects can be updated. So I support that the new official term URIs for codemeta would have the form that @dgarijo proposed -- https://w3id.org/codemeta/contIntegration.

@progval
Member

progval commented May 9, 2024

> I wrongly assumed we were already using the w3id in the context file

indeed, we are using w3id as the URL for the context file, but not as the URI namespace inside the context file. It's the properties that are still on the codemeta.github.io domain.

> I thought we had done this before, which is why I thought 3.0 was a breaking change.

It was a breaking change because we renamed some properties within the namespace, though we kept the same namespace. (see the change log)

@mbjones
Collaborator

mbjones commented May 9, 2024

Good to know. So, are you supportive of changing the properties to use the form https://w3id.org/codemeta/contIntegration?

@progval
Member

progval commented May 9, 2024

I don't know, there is merit to both. (and if we do, probably https://w3id.org/codemeta/terms/continuousIntegration instead of https://w3id.org/codemeta/continuousIntegration so the https://w3id.org/codemeta/terms/.* wildcard for property URIs does not conflict with the existing https://w3id.org/codemeta/.* wildcard for context versions; otherwise we would need to switch the version wildcard to https://w3id.org/codemeta/v[0-9].* which might be trickier for w3id to support)
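
The wildcard conflict can be shown with two regular expressions in Python. The patterns are assumed shapes written for this illustration, not the actual w3id rewrite rules, and `route` is a hypothetical helper.

```python
import re

# Assumed shapes of the rules, for illustration only.
VERSION_WILDCARD = re.compile(r"^/codemeta/(.*)$")      # catch-all for context versions
TERMS_WILDCARD = re.compile(r"^/codemeta/terms/(.+)$")  # proposed prefix for property URIs


def route(path: str) -> tuple[str, str]:
    """Classify a request path, checking the more specific terms rule first."""
    match = TERMS_WILDCARD.match(path)
    if match:
        return "term", match.group(1)
    match = VERSION_WILDCARD.match(path)
    if match:
        return "context-version", match.group(1)
    return "unknown", path


# Without the /terms/ prefix, a property URI is swallowed by the version catch-all.
conflict = route("/codemeta/continuousIntegration")
term = route("/codemeta/terms/continuousIntegration")
version = route("/codemeta/3.0")
```

Here `conflict` is classified as a context version, which is the clash described above, while the `/terms/` prefix keeps property URIs and version URIs apart as long as the terms rule is evaluated first.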

@mbjones
Collaborator

mbjones commented May 9, 2024

The w3id config is an apache rewrite config, so it is very flexible and could handle those regexes, but I agree it might make sense to keep the parallel URL structure in the new namespace. I like that /codemeta/terms/ is the version-independent root of the term URIs and is distinguishable from the versioned URIs.

@dgarijo
Contributor Author

dgarijo commented May 12, 2024

I am happy to mimic the current structure and use the /codemeta/terms/ for the vocabulary specification and the root for the context. We can also add codemeta/context to return the context.

This discussion is a little buried in the issue stack. If you think we should prioritize it, we can make a concrete proposal with how the new w3id/redirections would be and clearly state what would be the changes. Then we can open it up to the rest of the community to be incorporated in the next release?

pinging @moranegg and @stain for their thoughts too.

@moranegg
Contributor

In the SciCodes consortium, we have decided to open this as a discussion to get a community vote.
@dgarijo will summarize the content in the discussion and I will communicate this to the community with the deadline on June 20th 2024.

@dgarijo
Contributor Author

dgarijo commented May 22, 2024

Now in #360


6 participants