
IRI expansion with missing @base does not conform to RFC 3986 #187

Open
RinkeHoekstra opened this issue Nov 17, 2023 · 1 comment

RFC 3986 section 5.1 specifies that relative references are resolved against the document's base URI. When no base is given explicitly, the RFC prescribes the following steps, in order of precedence, to establish the base URI of a document:

5.1.1. Base URI Embedded in Content
5.1.2. Base URI from the Encapsulating Entity
5.1.3. Base URI from the Retrieval URI
5.1.4. Default Base URI

The current implementation in pyLD ignores the last two requirements. For 5.1.3 this is understandable, as the library only operates on a data payload. However, 5.1.4 is the catch-all that would ensure that @id values are always expanded to absolute IRIs.
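The precedence order above can be sketched as a simple fall-through. This is a hypothetical helper, not part of pyLD; `None` here means "not available at this level", so it does not model JSON-LD's explicit `"@base": null`, which deliberately disables resolution.

```python
from typing import Optional


def determine_base(
    embedded: Optional[str],
    encapsulating: Optional[str],
    retrieval_uri: Optional[str],
    default_base: str,
) -> str:
    """Hypothetical sketch of RFC 3986 section 5.1 base selection."""
    # 5.1.1 to 5.1.3: the first base that is actually available wins
    for candidate in (embedded, encapsulating, retrieval_uri):
        if candidate is not None:
            return candidate
    # 5.1.4: the application-defined default guarantees a base always exists
    return default_base
```

With nothing embedded and no retrieval URI, as when pyLD is handed a bare JSON payload, `determine_base(None, None, None, "https://example.org/")` still yields `https://example.org/`.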

Without such a default, relative @id values in documents that do not specify a base in a context are never expanded to absolute IRIs. As a result, the to_rdf function silently drops them when producing N-Quads output. This is a showstopper for RDFLib/rdflib#2308.

The JSON-LD spec does provide a way to prevent expansion against a base, namely setting @base to null (see https://www.w3.org/TR/json-ld/#base-iri), but it does not specify that null is the default.

This violates test t0060 in and t0060.

The output should be something similar to (with a different application-specific base):

[
  {
    "@id": "https://w3c.github.io/json-ld-api/tests/document-relative",
    "@type": [ "https://w3c.github.io/json-ld-api/tests/expand/0060-in.jsonld#document-relative" ],
    "http://example.com/vocab#property": [
      {
        "@id": "http://example.org/document-base-overwritten",
        "@type": [ "http://example.org/test/#document-base-overwritten" ],
        "http://example.com/vocab#property": [
          {
            "@id": "https://w3c.github.io/json-ld-api/tests/document-relative",
            "@type": [ "https://w3c.github.io/json-ld-api/tests/expand/0060-in.jsonld#document-relative" ]
          },
          {
            "@id": "../document-relative",
            "@type": [ "#document-relative" ],
            "http://example.com/vocab#property": [ { "@value": "only @base is cleared" } ]
          }
        ]
      }
    ]
  }
]
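The absolute IRIs in the expected output are plain RFC 3986 reference resolution of the relative values that pyLD leaves untouched (see its output below). A minimal check, using Python's `urllib.parse.urljoin` as a stand-in resolver and assuming the test input's URL, as it appears in the expected output, is the document base:

```python
from urllib.parse import urljoin

# Assumed document base: the retrieval URL of the test input
DOC_BASE = "https://w3c.github.io/json-ld-api/tests/expand/0060-in.jsonld"

# "0060-in.jsonld" is replaced by the reference, then ".." climbs out of "expand/"
node_id = urljoin(DOC_BASE, "../document-relative")
# a fragment-only reference keeps the full base path and only sets the fragment
node_type = urljoin(DOC_BASE, "#document-relative")

print(node_id)    # https://w3c.github.io/json-ld-api/tests/document-relative
print(node_type)  # https://w3c.github.io/json-ld-api/tests/expand/0060-in.jsonld#document-relative
```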

But the output of pyld is:

[
  {
    "@id": "../document-relative",
    "@type": [
      "#document-relative"
    ],
    "http://example.com/vocab#property": [
      {
        "@id": "http://example.org/document-base-overwritten",
        "@type": [
          "http://example.org/test/#document-base-overwritten"
        ],
        "http://example.com/vocab#property": [
          {
            "@id": "../document-relative",
            "@type": [
              "#document-relative"
            ]
          },
          {
            "@id": "../document-relative",
            "@type": [
              "#document-relative"
            ],
            "http://example.com/vocab#property": [
              {
                "@value": "only @base is cleared"
              }
            ]
          }
        ]
      }
    ]
  }
]

The resulting N-Quads output contains only a single triple:

<http://example.org/document-base-overwritten> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/test/#document-base-overwritten> .
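Why only one triple survives can be reproduced by filtering the IRI-valued statements from pyLD's output on "is this an absolute IRI?". The check below is a hypothetical approximation based on the RFC 3986 scheme grammar, not pyLD's own `_is_absolute_iri`, and it ignores blank nodes and literals for brevity.

```python
import re

# RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROP = "http://example.com/vocab#property"


def is_absolute_iri(value: str) -> bool:
    """Hypothetical approximation: true if the value starts with a URI scheme."""
    return bool(_SCHEME.match(value))


# IRI-valued statements read off the pyLD output above
statements = [
    ("../document-relative", RDF_TYPE, "#document-relative"),
    ("../document-relative", PROP, "http://example.org/document-base-overwritten"),
    ("http://example.org/document-base-overwritten", RDF_TYPE,
     "http://example.org/test/#document-base-overwritten"),
    ("http://example.org/document-base-overwritten", PROP, "../document-relative"),
]

# Statements whose subject or object is still relative cannot be serialized
kept = [(s, p, o) for s, p, o in statements if is_absolute_iri(s) and is_absolute_iri(o)]
for s, p, o in kept:
    print(f"<{s}> <{p}> <{o}> .")
```

Only the `rdf:type` statement of the node with an absolute @id passes, matching the single line of N-Quads above.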

This is not a duplicate of #143, as that issue concerns a case where @base is specified.

The problem appears to reside here:

pyld/lib/pyld/jsonld.py

Lines 3186 to 3202 in 316fbc2

    # handle @base
    if '@base' in ctx:
        base = ctx['@base']
        if base is None:
            base = None
        elif _is_absolute_iri(base):
            base = base
        elif _is_relative_iri(base):
            base = prepend_base(active_ctx.get('@base'), base)
        else:
            raise JsonLdError(
                'Invalid JSON-LD syntax; the value of "@base" in a '
                '@context must be a string or null.',
                'jsonld.SyntaxError', {'context': ctx},
                code='invalid base IRI')
        rval['@base'] = base
        defined['@base'] = True

Here, in the absence of @base (as opposed to an explicit null @base, see https://www.w3.org/TR/json-ld/#base-iri), a default base needs to be set.
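One way to express the requested behaviour is to distinguish "no base given" from "base explicitly null" when the initial base is chosen, and only then resolve relative IRIs. This is a hypothetical sketch, not pyLD code: the `"base"` options key, `DEFAULT_BASE`, and both helper names are assumptions, and `urljoin` stands in for pyLD's `prepend_base`.

```python
from typing import Optional
from urllib.parse import urljoin

# Hypothetical application default (RFC 3986 section 5.1.4)
DEFAULT_BASE = "https://example.org/"


def initial_base(options: dict, default_base: str = DEFAULT_BASE) -> Optional[str]:
    """Key absent -> default base; key present but None -> no base at all."""
    return options.get("base", default_base)


def resolve_document_relative(iri: str, base: Optional[str]) -> str:
    """Resolve a relative IRI, leaving it untouched when the base is null."""
    return urljoin(base, iri) if base is not None else iri


# No base supplied: the default applies, so the @id becomes absolute
print(resolve_document_relative("../document-relative", initial_base({})))
# Explicit null base: the relative IRI is kept as-is, as the spec allows
print(resolve_document_relative("../document-relative", initial_base({"base": None})))
```

Keeping null and "absent" distinct preserves the spec's opt-out while still guaranteeing an absolute @id in the default case.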


RinkeHoekstra commented Nov 17, 2023

I started wondering why the test suite doesn't catch this; the explanation is in the runtests.py file:

pyld/tests/runtests.py

Lines 259 to 264 in 316fbc2

    # expand @id and input base
    if 'baseIri' in manifest.data:
        data['@id'] = (
            manifest.data['baseIri'] +
            os.path.basename(str.replace(manifest.filename, '.jsonld', '')) + data['@id'])
        self.base = self.manifest.data['baseIri'] + data['input']

Because the manifest files specify a baseIri value, every test runs with a base set. The situation reported in this issue is therefore never exercised.
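The same expressions can be evaluated by hand to see which base a test receives. The manifest values below are assumptions modelled on the W3C json-ld-api expand manifest (the input path is chosen to match the expected output URL earlier in this issue); the string operations mirror runtests.py.

```python
import os

# Assumed manifest values, not read from the real manifest
manifest_data = {"baseIri": "https://w3c.github.io/json-ld-api/tests/"}
manifest_filename = "expand-manifest.jsonld"
data = {"@id": "#t0060", "input": "expand/0060-in.jsonld"}

# Same expressions as runtests.py: both results always start with baseIri
test_id = (manifest_data["baseIri"] +
           os.path.basename(str.replace(manifest_filename, ".jsonld", "")) + data["@id"])
base = manifest_data["baseIri"] + data["input"]

print(test_id)  # https://w3c.github.io/json-ld-api/tests/expand-manifest#t0060
print(base)     # https://w3c.github.io/json-ld-api/tests/expand/0060-in.jsonld
```

Since `base` is always a non-empty absolute URL, the code path with no base never runs under the suite.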

Simply rewriting the test is not an option: with an unspecified base IRI, the expected output is application-specific.
