Allow all URI characters in prefixed names #89

Open
dbooth-boston opened this issue Sep 28, 2021 · 17 comments

@dbooth-boston
Collaborator

In Turtle and SPARQL, prefixed names like fhir:patient have a very limited syntax after the prefix. This means that prefix definitions can only be used to shorten URIs in very limited ways that conform to the syntax rules for local names.

For example, suppose I have these URIs:

<http://example.org/Encounter/f201>
<http://example.org/Patient/f201> 
<http://example.org/Practitioner/d444> 
<http://example.org/Practitioner/f201> 

Since they all have the http://example.org/ part in common as a prefix, it would be nice if I could define a prefix ex: like this to shorten them all:

@prefix ex: <http://example.org/> .

ex:Encounter/f201
ex:Patient/f201
ex:Practitioner/d444
ex:Practitioner/f201

But those prefixed names are not allowed in Turtle or SPARQL because slash ("/") is not allowed in a local name. It would be helpful if a prefixed name would allow any valid URI syntax after the prefix, so that URIs could be shortened more flexibly.

One might claim that the above URIs should have been designed differently, to avoid using a slash in that part of the URI, but often we do not control how the URIs are designed: they are given to us and we must deal with them as they are.
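
As an illustration of the restriction described above, here is a minimal Python sketch that checks whether a string can be written as an unescaped local name. The regular expression is an ASCII-only simplification of the Turtle 1.1 `PN_LOCAL` production (it ignores percent-encoding, backslash escapes, and non-ASCII letters), and the function name `is_plain_local_name` is made up for this example.

```python
import re

# ASCII-only approximation of Turtle 1.1 PN_LOCAL (an assumption for this sketch):
#   first char: letter, digit, '_' or ':'
#   middle:     letters, digits, '_', '-', '.', ':'
#   last char:  same as middle, but not '.'
PN_LOCAL_ASCII = re.compile(
    r"[A-Za-z0-9_:](?:[A-Za-z0-9_\-.:]*[A-Za-z0-9_\-:])?"
)


def is_plain_local_name(local: str) -> bool:
    """Return True if `local` is a non-empty, unescaped local name
    under the simplified ASCII rules above."""
    return PN_LOCAL_ASCII.fullmatch(local) is not None


if __name__ == "__main__":
    for candidate in ["f201", "Encounter/f201", "Patient/f201"]:
        print(candidate, is_plain_local_name(candidate))
```

Under these simplified rules, `f201` is accepted while `Encounter/f201` and `Patient/f201` are rejected, because `/` is not in any of the allowed character classes.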

@afs
Contributor

afs commented Sep 28, 2021

This is a clash in SPARQL. / is in use elsewhere in property paths.

Some of the other characters are possible.

But maybe have delimiters for extended prefix names, cf. CURIEs?

(The other concern is making it too easy to create URIs that break RDF/XML. YMMV.)
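
To make the property-path clash concrete, here is a rough Python sketch of a toy tokenizer. It is not a SPARQL lexer: the character classes are simplified ASCII assumptions, and `CURRENT_LOCAL`, `EXTENDED_LOCAL`, and `tokenize` are names invented for this example. It only shows what happens to the path `ex:Practitioner/ex:name` if `/` becomes a legal local-name character.

```python
import re

# Simplified ASCII local-name character classes (assumptions, not the real grammar).
CURRENT_LOCAL = r"[A-Za-z0-9_:\-]+"     # roughly today's rule: no '/'
EXTENDED_LOCAL = r"[A-Za-z0-9_:\-/]+"   # hypothetical rule: '/' allowed


def tokenize(path: str, local_chars: str) -> list[str]:
    """Split a path expression into prefixed names and '/' operators,
    using greedy matching for the local part."""
    token = re.compile(rf"[A-Za-z][A-Za-z0-9]*:{local_chars}|/")
    return token.findall(path)


if __name__ == "__main__":
    path = "ex:Practitioner/ex:name"
    print(tokenize(path, CURRENT_LOCAL))   # ['ex:Practitioner', '/', 'ex:name']
    print(tokenize(path, EXTENDED_LOCAL))  # ['ex:Practitioner/ex:name']
```

With the extended character class the whole path is swallowed as one prefixed name, since `:` is already allowed inside local names, so the sequence operator disappears.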

@TomConlin

TomConlin commented Oct 4, 2021

Although I have often wanted bare top level "site" prefixes
for those cases where an item is attributed with
"came from waves hand vaguely ..." example.org
I do not see this as the best way forward.

The local-ID portion of a curie in the wild has more than enough variability as-is
without including what can more succinctly be viewed as the prefix "type"
or path refinement and is already accommodated in the
prefix generation process as a leading uri fragment.

Also if the number of distinct prefixes matters at all in comparison with
repeating the url path fragment with every data item then
I urge consideration of datasets with more rows.

tl;dr: qualify the prefix, not the item identifier; otherwise anyone can qualify it differently and it is no longer an identifier.

@dbooth-boston
Collaborator Author

dbooth-boston commented Oct 4, 2021

@TomConlin, am I understanding properly? Are you suggesting that CURIEs (which neither Turtle nor SPARQL currently support) would be an adequate solution to the problem?

@TomConlin

I am recommending that, when partitioning strings,
the URI-characters portion stay with the prefix,
so as not to break existing uses of CURIEs nor impede
the extension of SPARQL/Turtle to include them,
or the adoption of some practical format which would
expect ntriples/quads/... as CURIEs.

@dbooth-boston
Collaborator Author

@TomConlin, sorry for my continued confusion, but could you perhaps give an example? Specifically, how would you propose to solve the problem of having the following URIs -- which you do not control, so you cannot change them -- and you want to define some kind of prefix/CURIE/whatever to shorten the references to them. How do you propose to shorten them?

<http://example.org/Encounter/f201>
<http://example.org/Patient/f201> 
<http://example.org/Practitioner/d444> 
<http://example.org/Practitioner/f201> 

@TomConlin

TomConlin commented Oct 12, 2021

Does this help?

@prefix exEncounter:   <http://example.org/Encounter/>  .
@prefix exPatient:         <http://example.org/Patient/>  .
@prefix exPractitioner:  <http://example.org/Practitioner/>  . 


exPatient:f201 exEncounter:f201 exPractitioner:d444;  exPractitioner:f201   .

Took some liberties to rearrange the list into a plausible statement.

@dbooth-boston
Collaborator Author

Yes, thanks for the clarification. Indeed that is an option, since that is exactly what we currently have to do in Turtle and SPARQL. But it causes namespace proliferation, so the point of the issue is to come up with a way to define a single prefix for those URIs, to avoid that namespace proliferation.
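
A small Python sketch of the trade-off, under stated assumptions: `shorten` is a hypothetical helper, and it only emits a prefixed name when the remaining local part is plain ASCII letters and digits (a deliberately conservative stand-in for the real local-name grammar). With one prefix per path segment every example URI can be shortened; with a single `ex:` prefix none can.

```python
PER_TYPE_PREFIXES = {
    "exEncounter": "http://example.org/Encounter/",
    "exPatient": "http://example.org/Patient/",
    "exPractitioner": "http://example.org/Practitioner/",
}
SINGLE_PREFIX = {"ex": "http://example.org/"}


def shorten(iri: str, prefixes: dict[str, str]) -> str:
    """Return a prefixed name using the longest matching namespace,
    or the full <IRI> form if no usable prefix exists."""
    matches = [(len(ns), name) for name, ns in prefixes.items() if iri.startswith(ns)]
    for ns_len, name in sorted(matches, reverse=True):
        local = iri[ns_len:]
        # Conservative check: only letters and digits count as a safe local name.
        if local.isascii() and local.isalnum():
            return f"{name}:{local}"
    return f"<{iri}>"


if __name__ == "__main__":
    iri = "http://example.org/Patient/f201"
    print(shorten(iri, PER_TYPE_PREFIXES))  # exPatient:f201
    print(shorten(iri, SINGLE_PREFIX))      # <http://example.org/Patient/f201>
```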

@pchampin

Could we solve this the other way around, i.e. by making it easier to handle and reuse big lists of prefixes?
What if we allowed a set of prefix definitions to be "imported" by URL, in the same way as JSON-LD contexts?
Something like

@prefixes <http://example.org/prefixes.ttl> .
# assuming that URL contains  the list of prefixes given by @TomConlin above

exPatient:f201 exEncounter:f201 exPractitioner:d444;  exPractitioner:f201   .

It would not matter so much that we are using a lot of prefixes if the burden of declaring them is offloaded to a single resource.

@TomConlin

proliferates prefixes, yes it does;
although very true, it is a relative thing
when I worked with hundreds of millions of statements
there were still only hundreds of prefixes and that number
could have been reduced with less baroque modeling.

As I see it,
the trade-off is to perturb hundreds of millions of local identifiers
or hundreds of prefixes.

I would still like to see a way to get at the base uri
maybe something like the not well considered following ...

@prefix ex:   <http://example.org/>  .
@prefix Encounter:    ex:Encounter/ .
@prefix Patient:         ex:Patient/ .
@prefix Practitioner:  ex:Practitioner/ . 

Patient:f201 Encounter:f201 Practitioner:d444;  Practitioner:f201   .

which of course is not any type of valid `@prefix` syntax I know of
but does allow access to and reuse of the base or root url
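
For what it is worth, the chained declarations above could be resolved along these lines. This is only a Python sketch of the hypothetical syntax in this comment, not anything Turtle or SPARQL supports; `resolve_prefixes` and its input format (a list of name/value pairs, already split out of the declarations) are assumptions made for illustration.

```python
def resolve_prefixes(declarations: list[tuple[str, str]]) -> dict[str, str]:
    """Resolve prefix declarations in order. A value is either a full
    IRI in angle brackets or 'earlierPrefix:rest' (hypothetical syntax)."""
    resolved: dict[str, str] = {}
    for name, value in declarations:
        if value.startswith("<") and value.endswith(">"):
            # Ordinary declaration: strip the angle brackets.
            resolved[name] = value[1:-1]
        else:
            # Chained declaration: expand the earlier prefix, append the rest.
            base, _, rest = value.partition(":")
            if base not in resolved:
                raise KeyError(f"prefix {base!r} used before it was declared")
            resolved[name] = resolved[base] + rest
    return resolved


if __name__ == "__main__":
    prefixes = resolve_prefixes([
        ("ex", "<http://example.org/>"),
        ("Encounter", "ex:Encounter/"),
        ("Patient", "ex:Patient/"),
        ("Practitioner", "ex:Practitioner/"),
    ])
    print(prefixes["Patient"])  # http://example.org/Patient/
```

The root URL is written once, and each derived prefix still ends in an ordinary namespace, so the local names stay as simple as `f201`.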

@HughGlaser
Collaborator

HughGlaser commented Oct 13, 2021 via email

@dbooth-boston
Collaborator Author

What if we allowed a set of prefix definitions to be "imported" by URL, in the same way as JSON-LD contexts?

That seems worth considering. I think that also raises the question of whether a general-purpose "include" or "import" capability should be added to RDF. RDF serializations currently lack such capability, because a design principle was to make each RDF file be completely self-contained. OWL has an import statement, but I don't know how well suited it would be for RDF that does not otherwise use OWL, and it doesn't work at the syntactic level that would be needed for prefix definitions.

BTW, this issue is related to issue #13 (Namespace proliferation) and issue #12 (IRI allocation) .

@afs
Contributor

afs commented Oct 13, 2021

Re:

PREFIX exEncounter:    ex:Encounter/

Technical points: having an extended set of characters at this point is a little tricky for tokenizing. Not impossible but it is not a simple matter of allowing [136s] PrefixedName as well as URIs.

That would be:

PREFIX exEncounter:    ex:Encounter\/

Is that adequate? Does having a less than perfect solution in the PREFIX to give better appearance in the data provide a practical tradeoff?
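
The escaped form `ex:Encounter\/` relies on the backslash escapes that Turtle 1.1 and SPARQL 1.1 allow in local names (the `PN_LOCAL_ESC` production). A minimal Python sketch of producing such names follows; it handles ASCII input only, and `escape_local` and `to_prefixed_name` are hypothetical helper names, not part of any RDF library.

```python
# Characters that must be backslash-escaped wherever they occur in a local name.
ALWAYS_ESCAPE = set("~!$&'()*+,;=/?#@%")


def escape_local(local: str) -> str:
    """Backslash-escape an ASCII local name so it can follow a prefix."""
    out = []
    last = len(local) - 1
    for i, ch in enumerate(local):
        needs_escape = (
            ch in ALWAYS_ESCAPE
            or (ch == "-" and i == 0)          # '-' may not start a local name
            or (ch == "." and i in (0, last))  # '.' may not start or end one
        )
        if needs_escape:
            out.append("\\" + ch)
        elif (ch.isascii() and ch.isalnum()) or ch in "_-.:":
            out.append(ch)
        else:
            # Spaces, angle brackets, non-ASCII etc. are out of scope here.
            raise ValueError(f"cannot represent {ch!r} in this sketch")
    return "".join(out)


def to_prefixed_name(iri: str, prefix: str, namespace: str) -> str:
    """Shorten `iri` with a single prefix, escaping the remainder."""
    if not iri.startswith(namespace):
        raise ValueError("IRI is not in the given namespace")
    return f"{prefix}:{escape_local(iri[len(namespace):])}"


if __name__ == "__main__":
    print(to_prefixed_name("http://example.org/Encounter/f201",
                           "ex", "http://example.org/"))
    # prints: ex:Encounter\/f201
```

So a single `ex:` prefix is already usable for the example URIs today, at the cost of a backslash before every slash in the data.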

We have CURIEs with delimited syntax form: '[' curie ']' (yes, it reuses a delimiter pair - can't have everything).


Imports: You can do this today by concatenating Turtle files :-) "a local data management" issue.

@TallTed
Member

TallTed commented Oct 13, 2021

@TomConlin wrote

Patient:f201 Encounter:f201 Practitioner:d444; Practitioner:f201 .

That's invalid Turtle, and very distracting, especially since it's been copied several times. The semicolon should be a comma.

Errors like this are part of why I include lots of extra whitespace in my examples, and use lots of spaces (no tabs!) to indent things into recognizable subject predicate object columns.


@HughGlaser wrote a terrible email reply, which doesn't show up right and loses much of its meaning, so I won't bother replicating it here, because GitHub doesn't handle Markdown in email comments.

Even worse, since there are no codefences (and they wouldn't work if they were present), @HughGlaser pinged the @prefix user of GitHub, as @TomConlin did earlier in #89 (comment)

I'm quite sure that user @prefix does not care about this discussion. Please take care to always wrap any @entity in some kind of codefence, or separate that @ from the rest of the entity name (e.g., @ name or `@name`), whenever they come up in GitHub discussions!

@dbooth-boston
Collaborator Author

Alas, I tried to edit @HughGlaser's post, to correct the formatting, but GitHub would not let me do so, because it originated as an email reply and markdown is therefore disabled in it. :( I guess the lesson here is: Don't use GitHub email replies for anything but the simplest of plain text responses, because the formatting cannot later be corrected.

@namedgraph

Is this one of the pressing issues in the RDF ecosystem?..

@HughGlaser
Collaborator

HughGlaser commented Oct 14, 2021

@TallTed wrote

@HughGlaser [wrote a terrible email reply]

Is that a comment on the content, or just the formatting?


Off topic:
My comments on the formatting.

I find you can't attach images in email replies either, it seems - so I deleted my reply that had one.

I did nothing but cut, paste and add simple plain text to an email I received from the mailing list.
It hadn't crossed my mind that a simple reply to an email would get screwed up - I guess we all know better now.
I am pretty pissed off that the system is so bad that I would generate this level of distraction.

David Booth, thanks ever so for taking the trouble to try to fix my post - it looks like you succeeded pretty well, I see.
Your opening comment in the thread is still the best description, I think.

@dbooth-boston
Collaborator Author

Is this one of the pressing issues in the RDF ecosystem?..

I don't think I'd personally put it in the top three, but if one were to design a new higher-level RDF serialization -- hint hint -- then it definitely should be considered. I think it is important to remove every bit of unnecessary complexity that we can from RDF usage, because complexity always seems to multiply.
