Database Paths: compliance with resolve-uri(), base-uri(), … #1172

Open
drmacro opened this issue Jul 13, 2015 · 10 comments

@drmacro

drmacro commented Jul 13, 2015

Given a database named "foo^bar" and this document at /docs/doc01.xml:

<doc xml:base="foo/bar/">
    <link href="doc2.xml"/>
</doc>

This query:

let $uri := 'foo^bar/docs/doc01.xml'
let $doc1 := collection('foo^bar/docs/doc01.xml')
let $doc2 := doc($uri)
let $doc3 := doc('foo^bar/docs/doc01.xml')
let $link := $doc3/*/link
let $baseURIdoc := base-uri($doc1)
let $baseURILink := base-uri($link)
let $resolvedURI := () (: resolve-uri($baseURI, string($link/@href)) :)

return <result>
  <doc1>{ $doc1 }</doc1>
  <doc2>{ $doc2 }</doc2>
  <doc3>{ $doc3 }</doc3>
  <link-elem base-uri-doc="{$baseURIdoc}"
             base-uri-link="{$baseURILink}"
             resolved-uri="{$resolvedURI}">
    { $link }
  </link-elem>
</result>

Fails with the message:

URI argument is invalid: Illegal character in path at index 3: foo^bar/docs/doc01.xml.

The failure is in the baseURILink value: if you replace it with "()" (commenting out the call to base-uri()), the query succeeds. In particular, the call to base-uri() on the document node itself succeeds.

Thus there is an issue with getting the base-uri() of an element within the document.

@ChristianGruen
Member

You are completely right, the BaseX extensions for handling databases don't go 100% hand in hand with standard XQuery functions, and I am not sure if it's that easy to bring them together. We usually suggest using the helper functions of our Database Module (db:name, db:path).
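
A minimal sketch of what that suggestion might look like, assuming the database "foo^bar" and the document docs/doc01.xml from the report above exist. db:name and db:path are functions of the BaseX Database Module; the variable names and the location element are illustrative only.

```xquery
(: Sketch only: assumes the 'foo^bar' database from the report exists. :)
let $doc  := doc('foo^bar/docs/doc01.xml')
let $link := $doc/*/link
(: Ask BaseX for the database name and document path separately,
   instead of relying on base-uri() to return one combined URI. :)
return <location
  database="{ db:name($link) }"
  path="{ db:path($link) }"/>
```

This keeps the database name and the document path as two separate strings, so the "^" in the name never has to pass through URI parsing. The trade-off is that the code becomes BaseX-specific.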

@drmacro
Author

drmacro commented Jul 13, 2015

While I can work around the issue I think it's a potentially serious problem for BaseX because it means that certain processing cannot be implemented using generic XQuery functions.

If I were maintaining a common DITA support package, for example, that would be a problem, and I would be tempted not to support BaseX simply because I don't have the bandwidth to maintain two versions of the same code.

For my DITA for Small Teams project I'm committed to using BaseX because it offers significant advantages otherwise (lightweight, easy to install, good support for DTD-based processing, etc.) and I can't change now.

But at some point the base code I'm implementing will need to be generalized for use in any XQuery database, and at that point the issue will come to the fore. For DITA in particular, URI resolution is a large part of what the code is doing (because DITA is all about linking).

I understand the challenge in modifying or extending the way BaseX constructs and uses URIs at this point, but I think it's something you must plan for because things are definitely broken as they stand today.

@ChristianGruen
Member

> But at some point the base code I'm implementing will need to be generalized for use in any XQuery database, and at that point the issue will come to the fore. For DITA in particular, URI resolution is a large part of what the code is doing (because DITA is all about linking).

I guess that, at least today, there will be no chance to do it all without vendor-specific code. Historically, this is partly because XQuery is completely database-agnostic; as a consequence, every XML database has implemented its own way to retrieve, store, and update XML at the document, collection, or database level.

But of course I agree that it would be nice to be able to rely completely on the standards at some point in the future.

@ChristianGruen
Member

ChristianGruen commented Sep 17, 2015

Copied from #1171:

Given this document in a repo named "uri-test" at the location "/docs/doc02.xml":

<doc xml:base="foo/bar/">
    <link href="doc2.xml"/>
</doc>

and this query run from the admin panel:

let $uri := 'uri-test/docs/doc-01.xml'
let $doc1 := collection('uri-test/docs/doc-01.xml')
let $doc2 := doc($uri)
let $doc3 := doc('uri-test/docs/doc02.xml')
let $link := $doc3/*/link
let $baseURI := base-uri($link)
let $resolvedURI := resolve-uri($baseURI, string($link/@href))

return <result>
  <doc1>{ $doc1 }</doc1>
  <doc2>{ $doc2 }</doc2>
  <doc3>{ $doc3 }</doc3>
  <link-elem base-uri="{$baseURI}"
             resolved-uri="{$resolvedURI}">
    { $link }
  </link-elem>
</result>

I get this failure from resolve-uri():

Base URI is not absolute: "doc2.xml".

The problem is that document-uri() returns "uri-test/docs/doc02.xml", which is not an absolute URI.

But it needs to be one for the normal XPath functions that expect absolute URIs to work.

Either the built-in implementation of document-uri() needs to recognize this URI as being absolute (because it starts with the name of a repository) or, better, BaseX needs to provide a URL scheme that can be used in this case, e.g. "basex://uri-test/docs/doc02.xml".

Without this, there's no way to use normal URI-manipulation functions.

For example, I'm trying to use @xml:base to determine the effective URL for a relative reference made within the scope of the @xml:base, but that fails within BaseX because of this issue.
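
To illustrate what an absolute base would enable, here is a minimal sketch using the "basex://" scheme proposed above. That scheme is hypothetical and is not something BaseX defines; the URIs are plain string literals, so no database is needed. Note that the standard signature is fn:resolve-uri($relative, $base), with the relative reference first and the base second.

```xquery
(: Sketch only: 'basex://' is the hypothetical scheme proposed above,
   not something BaseX defines. :)
let $doc-uri := 'basex://uri-test/docs/doc02.xml'
(: Apply the xml:base value of the <doc> element to the document URI. :)
let $base    := resolve-uri('foo/bar/', $doc-uri)
(: fn:resolve-uri expects the relative reference first, the base second. :)
return resolve-uri('doc2.xml', $base)
```

Under the RFC 3986 resolution rules this should give basex://uri-test/docs/foo/bar/doc2.xml, which is the effective URL the @xml:base use case is after.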

ChristianGruen changed the title from "Non-URI characters in database names cause base-uri() to fail" to "Database Paths: compliance with resolve-uri(), base-uri(), …" on Sep 17, 2015
ChristianGruen added a commit that referenced this issue May 4, 2016
* Base URIs of databases now start with slash (#1172)
* Specified database URIs may now start with slash
* If IO instance has no name, it will now only be created via dbName()
@ChristianGruen
Member

I spent some time working through our database URI handling to facilitate the use of standard XQuery functions. This is the new status quo:

  • Database URIs may now start with a slash
  • fn:resolve-uri will now also accept such base URIs

A new snapshot is online. @drmacro: If you have time for it, your feedback will be welcome. The handling of invalid URI characters (such as ^) hasn’t changed so far, as it might introduce some incompatibilities with existing databases.
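
One way to try the snapshot might be the following sketch. It assumes the "uri-test" database and the document docs/doc02.xml from the earlier report exist; the exact strings returned by base-uri() and resolve-uri() are not confirmed here.

```xquery
(: Sketch only: assumes a BaseX snapshot containing the change above
   and the 'uri-test' database from the earlier report. :)
let $link := doc('uri-test/docs/doc02.xml')/*/link
(: Relative reference first, base URI second. :)
return resolve-uri(string($link/@href), base-uri($link))
```

If the base URI of the link element now starts with a slash and fn:resolve-uri accepts it, this query should return a resolved path instead of raising the "Base URI is not absolute" error.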

ChristianGruen added this to the 8.7 milestone on Dec 7, 2016
@ChristianGruen
Member

The issue applies to various other characters that are not valid URI characters (see #1464).
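
For reference, the standard fn:encode-for-uri function shows what percent-encoding such a name looks like. This is a caller-side sketch only: it is not something BaseX applies itself, and whether a database can be addressed under the escaped name is not confirmed here.

```xquery
(: Sketch only: caller-side escaping, not something BaseX applies itself. :)
let $db   := 'foo^bar'
let $path := 'docs/doc01.xml'
(: Escape only the database name; the slashes in the path stay intact. :)
return string-join((encode-for-uri($db), $path), '/')
```

This yields the string foo%5Ebar/docs/doc01.xml, in which "^" has been replaced by its percent-encoded form.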

@drmacro
Author

drmacro commented Jun 22, 2017

Somehow I didn't see the May 4 update for this issue. I'll see if I can evaluate this fix in the context of my current D4ST code.

ChristianGruen removed this from the 9.0 milestone on Oct 23, 2017
@ChristianGruen
Member

Related feedback from the mailing list:
https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg12563.html

@ChristianGruen
Member

Related: https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg13809.html

Could possibly be tackled with BaseX 10.

ChristianGruen added this to the 10 milestone on Oct 25, 2021
@ChristianGruen
Member

We came across too many implications that would need to be tackled, so we’ll postpone this to a later version.
