schema.org http or https? #21

Open
NoelDeMartin opened this issue Dec 18, 2021 · 10 comments

@NoelDeMartin

I've noticed that schema is declared as http://schema.org/, which may cause some issues for applications that are using https://schema.org/ instead. Looking at their documentation, it seems like https is the correct one: https://schema.org/Recipe (even if you visit using http://, you are redirected and it says Canonical URL: https://schema.org/Recipe).
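
A minimal illustration of the mismatch. It assumes a prefix helper that simply concatenates the namespace and the local name. The `schema` function below is a hypothetical stand-in, not this library's actual export. RDF 1.1 compares IRIs by simple string comparison, so the scheme difference alone is enough to make the terms unequal:

```typescript
// Hypothetical stand-in for a prefix dictionary entry: namespace + local name.
const schema = (localName: string): string => `http://schema.org/${localName}`;

// A type IRI as written by an application that followed the https documentation.
const typeInData = "https://schema.org/Recipe";

// IRIs are compared as plain strings, so these two are different terms.
console.log(schema("Recipe"));                // http://schema.org/Recipe
console.log(schema("Recipe") === typeInData); // false
```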

But I realize it's not as easy as just changing it, because it may break some existing code. What do you think about this? Should we change it for the next major release (0.6.0 or even 1.0.0?) and document it as a breaking change?

@bourgeoa
Member

The problem to solve is: how do we query a document using both the http and https URI versions of the same ontology?

I don't see how we can have a breaking change each time an http ontology becomes an https ontology.
Actually, all http URIs that are redirected to https are still valid. Why not stay with HTTP until there is a W3C migration recommendation?
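
One way to query a document under both spellings is to expand each term into its http and https variants and accept a match on either. The sketch below does this over an in-memory list of triples. The `Triple` shape and the helper names are hypothetical and are not part of this library or rdflib:

```typescript
// Hypothetical minimal triple shape (real stores such as rdflib use richer term objects).
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Return both the http: and https: spellings of an IRI; other IRIs are returned unchanged.
function schemeVariants(iri: string): string[] {
  if (iri.startsWith("http://")) return [iri, "https://" + iri.slice("http://".length)];
  if (iri.startsWith("https://")) return [iri, "http://" + iri.slice("https://".length)];
  return [iri];
}

// Find subjects whose rdf:type matches the given class under either scheme.
function subjectsOfType(triples: Triple[], classIri: string): string[] {
  const accepted = new Set(schemeVariants(classIri));
  return triples
    .filter((t) => t.predicate === RDF_TYPE && accepted.has(t.object))
    .map((t) => t.subject);
}

// One resource typed with the https IRI, one with the http IRI.
const data: Triple[] = [
  { subject: "#pancakes", predicate: RDF_TYPE, object: "https://schema.org/Recipe" },
  { subject: "#tortilla", predicate: RDF_TYPE, object: "http://schema.org/Recipe" },
];

console.log(subjectsOfType(data, "http://schema.org/Recipe")); // both #pancakes and #tortilla
```

This widens every http/https IRI passed to it. It is only appropriate for vocabularies known to mean the same thing under both schemes.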

@NoelDeMartin
Author

> The problem to solve is: how do we query a document using both the http and https URI versions of the same ontology?

Yes, I guess that's the main problem, but do you think that can be easily solved with this library? It seems like it's just a dictionary, it isn't using any complicated logic. So I'm not sure how easy it would be to solve that, I'm not too familiar with this library.

> I don't see how we can have a breaking change each time an http ontology becomes an https ontology.

If their official documentation changes, I think we should as well. I don't think this happens every day anyway. At the very least, maybe we should have two: schema and schemasecure, or something like that? I agree that the ideal scenario would be to handle both, but until that happens, we should at least support the canonical URL in some way.
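
The two-entry idea could look like the sketch below. The `schemasecure` name comes from the suggestion above. The `namespace` factory and the object layout are assumptions for illustration, not this library's actual structure:

```typescript
// Hypothetical factory: returns a function that appends a local name to a namespace IRI.
function namespace(base: string): (localName?: string) => string {
  return (localName = "") => base + localName;
}

// Keep the existing http entry so current code does not break,
// and add a separate entry for the https IRIs shown in the schema.org docs.
const ns = {
  schema: namespace("http://schema.org/"),
  schemasecure: namespace("https://schema.org/"),
};

console.log(ns.schema("Recipe"));       // http://schema.org/Recipe
console.log(ns.schemasecure("Recipe")); // https://schema.org/Recipe
```

Adding an entry is non-breaking. Changing the existing `schema` entry would change every IRI that existing code produces.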

> Actually, all http URIs that are redirected to https are still valid. Why not stay with HTTP until there is a W3C migration recommendation?

For existing applications, I guess it's ok to stay on http, but new applications will use https (because that's what they see when they look at the documentation on the schema.org website). In practice, yes, if you as a user open a URL in the browser using http, it'll be redirected to https. But for applications, unless they handle both schemes, it's as if the data doesn't exist. That's what I ran into while working on the hello world: it wasn't getting the data, and I didn't know why until I noticed this issue.

In any case, it's not an easy problem to tackle... I just realized I'm not handling this in my own apps either, so this is something else to add to my never-ending list of improvements 😅

@timbl
Contributor

timbl commented Jan 28, 2022

What we did with the many w3.org/ns vocabularies which all use http: was:

  • Keep using 'http' in the RDF
  • Actually retrieve the ontologies using https:

I think the http: URIs redirect to https:, so we can do it like that, or we can put code in rdflib to always add the 's' except for localhost.
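
The second option keeps the identifier as written and only upgrades the URL used for retrieval. A minimal sketch follows. The function name `retrievalUrl` and the list of local hosts are hypothetical, and this is not rdflib's actual fetcher code:

```typescript
// Hosts assumed not to serve TLS (hypothetical list; "localhost" is from the comment above).
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// The IRI in the RDF is left untouched; only the URL we fetch from is upgraded.
function retrievalUrl(iri: string): string {
  const url = new URL(iri);
  if (url.protocol === "http:" && !LOCAL_HOSTS.has(url.hostname)) {
    // An explicit non-default port (e.g. :8080) is kept as-is by this change.
    url.protocol = "https:";
  }
  url.hash = ""; // the fragment is never sent when retrieving a document
  return url.toString();
}

console.log(retrievalUrl("http://www.w3.org/ns/ldp#Container"));    // https://www.w3.org/ns/ldp
console.log(retrievalUrl("http://localhost:3000/profile/card#me")); // http://localhost:3000/profile/card
```

The terms compared in the store are still the http IRIs. Only the network request goes over TLS.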

If the schema.org docs talk about the terms in the ontology with an 'S', then that is a different way to go, and yet it threatens all the RDF we have all over the place.

Another dramatic thing would be to say that for RDF they are always the same and canonicalize all in the store to one or the other. That would be a big change to the RDF model.
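
To make the trade-off concrete, canonicalizing at load time could look like the sketch below. It rewrites one namespace's https spelling to http before triples enter the store. The `CANONICAL` table, the direction of the rewrite, and the function names are hypothetical. Applied generally, this would treat IRIs that differ as strings as the same term, which is exactly the change to the RDF model described above:

```typescript
// Hypothetical table: non-canonical namespace prefix -> canonical prefix.
const CANONICAL: ReadonlyArray<[string, string]> = [
  ["https://schema.org/", "http://schema.org/"],
];

// Rewrite an IRI to its canonical spelling if it falls in a listed namespace.
function canonicalize(iri: string): string {
  for (const [from, to] of CANONICAL) {
    if (iri.startsWith(from)) return to + iri.slice(from.length);
  }
  return iri;
}

// Apply to each position of a triple before adding it to the store.
// Strings here stand in for IRI terms; a real store must not rewrite literals.
function canonicalizeTriple(t: [string, string, string]): [string, string, string] {
  return [canonicalize(t[0]), canonicalize(t[1]), canonicalize(t[2])];
}

console.log(canonicalize("https://schema.org/Recipe")); // http://schema.org/Recipe
console.log(canonicalize("https://example.org/x"));     // https://example.org/x
```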

@bourgeoa
Member

Are there any recommendations or discussions at the W3C level?

@ericprud

> Another dramatic thing would be to say that for RDF they are always the same and canonicalize all in the store to one or the other. That would be a big change to the RDF model.

Maybe a /.well-known/ location for general server config that allows one to say "this server is one which serves the same content over http and https"? Such parallel access isn't the default behavior for, e.g., Apache (which can specify document roots per listening port). I don't know if lighttpd allows a server.document-root inside $SERVER["socket"] == ":443" {...}. Regardless, even if people have to be diligent to configure parallel access, it's something they do the vast majority of the time.

@TallTed

TallTed commented Jan 31, 2022

It would be necessary to be able to advertise that "this server serves HTTP on port n and HTTPS on port m", as there is no reason that HTTP must be on 80 nor that HTTPS must be on 443; these are just frequently followed conventions/defaults.
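
No such well-known resource exists today. Purely as an illustration of the idea above, a server might publish a small document listing the ports on which it serves the same content over each scheme. A client could then use it to compute the alternate URL. The path `/.well-known/parallel-access`, the field names, and the `alternateUrl` helper are all hypothetical:

```typescript
// Hypothetical content of e.g. /.well-known/parallel-access (not a registered well-known URI).
interface ParallelAccess {
  httpPort: number;
  httpsPort: number;
}

// Return the same resource on the other scheme, using the advertised ports
// rather than assuming 80 and 443.
function alternateUrl(href: string, cfg: ParallelAccess): string {
  const url = new URL(href);
  if (url.protocol === "http:") {
    url.protocol = "https:";
    url.port = String(cfg.httpsPort); // a default port (443) is dropped from the output
  } else if (url.protocol === "https:") {
    url.protocol = "http:";
    url.port = String(cfg.httpPort); // a default port (80) is dropped from the output
  }
  return url.toString();
}

const cfg: ParallelAccess = { httpPort: 8080, httpsPort: 8443 };
console.log(alternateUrl("http://example.org:8080/vocab#Term", cfg)); // https://example.org:8443/vocab#Term
```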

@bourgeoa
Copy link
Member

@csarven Is this something to be addressed in specification?

@kjetilk
Member

kjetilk commented Jan 31, 2022

My feeling here is that the vocab itself has to define its URI, and historically that has defaulted to the http scheme, and so, that's what it is in the majority of cases, regardless of whether it is eventually served through a TLS tunnel. Thus, I also think it isn't something we can define, it falls upon whoever controls those URIs.

The case of schema.org is a tricky one, because it has never made that very clear, so I have defaulted to http in my code, but I recognize that https has been used since the very early days, so there is invariably going to be breakage somewhere. It is very unfortunate.

@csarven
Member

csarven commented Jan 31, 2022

I ran into the issue of flipping between schema.org's http/https in my application. I stuck with http in the end. It is not something we can address in our specs; as Kjetil mentioned, the vocab will make that call.

See https://www.w3.org/blog/2016/05/https-and-the-semantic-weblinked-data/ for what Tim is referring to.

@angelo-v
Contributor

I am not convinced that schema.org switched to https for the RDF vocabulary.

Following our nose from JSON-LD @context to the context document and the @vocab reveals http://schema.org/ as the prefix used for this vocabulary, and therefore http://schema.org/Person as e.g. the correct identifier for a schema:Person.

▶ curl -s -I https://schema.org/ | grep link:
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"

▶ curl -s https://schema.org/docs/jsonldcontext.jsonld | grep @vocab
        "@vocab": "http://schema.org/",

I think https is only canonical for the HTML documentation pages.
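
The same follow-your-nose check can be done in code once the context document has been retrieved and parsed with `JSON.parse`. The sketch below reads `@vocab` from a parsed context document. It assumes the document has a `"@context": { "@vocab": ... }` shape, which the grep output above suggests but does not show in full. The helper name is hypothetical, and no JSON-LD library is used:

```typescript
// Read "@vocab" from a parsed JSON-LD context document, if present.
function vocabOf(doc: unknown): string | undefined {
  if (typeof doc !== "object" || doc === null) return undefined;
  const ctx = (doc as Record<string, unknown>)["@context"];
  // Only handles a single object-valued @context, not arrays of contexts.
  if (typeof ctx !== "object" || ctx === null || Array.isArray(ctx)) return undefined;
  const vocab = (ctx as Record<string, unknown>)["@vocab"];
  return typeof vocab === "string" ? vocab : undefined;
}

// Trimmed excerpt with an assumed shape; only the @vocab value is taken from the output above.
const excerpt = { "@context": { "@vocab": "http://schema.org/" } };

console.log(vocabOf(excerpt));            // http://schema.org/
console.log(vocabOf(excerpt) + "Person"); // http://schema.org/Person
```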
