Resource ID/source #8

jankaszel · 2020-07-27T13:52:29Z

Hi! As far as I know, RecogitoJS currently does not support specifying an identifier for the annotated resource. While trying to integrate a general-purpose annotation server with RecogitoJS, I found that these identifiers become crucial for filtering annotations from an LDP annotation container.

The Web Annotation Data Model specification states in regard to target selector sources:

The relationship between a Specific Resource and the resource that it is a more specific representation of.
There must be exactly 1 source relationship associated with a Specific Resource. The source resource may be described in detail, as defined above, or be just the resource's IRI.

So, to be compliant with the WADM, RecogitoJS should possibly adhere to that requirement and add a source field to the target object. However, I imagine it being difficult to realise it with the current API design, as RecogitoJS selects only a DOM fragment.

For now, it should suffice to provide the target source externally from RecogitoJS (e.g., during the createAnnotation event). I can however imagine a versatile text referencing mechanism being built into RecogitoJS: One that can annotate textual parts of an HTML document, but reference the annotation selection globally within the document IRI.
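
A minimal sketch of supplying the source from outside RecogitoJS. The helper name withSource, the example IRI and the sample annotation are assumptions made for illustration; only the target/source structure comes from the Web Annotation Data Model.

```javascript
// Hypothetical helper: returns a copy of the annotation whose target carries
// a `source` IRI, as the WADM requires for a Specific Resource.
function withSource(annotation, sourceIri) {
  return {
    ...annotation,
    target: { ...annotation.target, source: sourceIri }
  };
}

// Sample annotation, shaped like a text annotation with two selectors
// and no source (illustrative data, not captured from RecogitoJS).
const created = {
  '@context': 'http://www.w3.org/ns/anno.jsonld',
  type: 'Annotation',
  body: [{ type: 'TextualBody', value: 'A comment' }],
  target: {
    selector: [
      { type: 'TextQuoteSelector', exact: 'annotated text' },
      { type: 'TextPositionSelector', start: 10, end: 24 }
    ]
  }
};

// The original object is left untouched; the copy gains target.source.
const stored = withSource(created, 'https://example.org/documents/42');
console.log(JSON.stringify(stored.target, null, 2));
```

In a page, this could be called from the createAnnotation handler before sending the annotation to the server, assuming the handler receives the annotation as a plain object.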

Maybe the following best practices report can be of help. I'd be happy to give this some further thought! https://www.w3.org/TR/fragid-best-practices/

P.S.: I just stumbled upon §4.2.9 "Refinement of Selection", which relates to the above issue. Multiple selectors can be combined to refine the selection: https://www.w3.org/TR/annotation-model/#refinement-of-selection


rsimon · 2020-07-27T17:52:19Z

My (naive) understanding is that the source attribute can/should simply be the URI of the document. So perhaps it would be sufficient to automatically set it to the current URL?

FWIW: Annotorious automatically sets the source to the image URI. I was thinking that we could also add a config parameter which allows implementers to override it with a custom value. Would that be a suitable approach?
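
A rough sketch of how such an override could resolve the source. The option name source and the function resolveSource are hypothetical; neither is an existing RecogitoJS parameter.

```javascript
// `config.source` is a hypothetical option name used for illustration only.
// Falls back to the current page URL when no override is configured.
function resolveSource(config = {}, currentUrl = globalThis.location?.href) {
  if (typeof config.source === 'string' && config.source.length > 0) {
    return config.source;
  }
  if (!currentUrl) {
    throw new Error('No source configured and no current URL available');
  }
  return currentUrl;
}

// Default: the document URL is used.
console.log(resolveSource({}, 'https://example.org/page.html'));
// Override: the implementer-supplied value wins.
console.log(resolveSource({ source: 'urn:doc:42' }, 'https://example.org/page.html'));
```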

jankaszel · 2020-07-28T08:10:22Z

Your assumption on source is true—it's the IRI of the annotated resource. RecogitoJS could take the current document URI for that.

However, the selectors RecogitoJS uses don't uniquely identify the selection within the DOM of the resource, since (to my understanding) they specify text ranges relative to the container DOM node that has been supplied to RecogitoJS.

With the above technique of refining selections, we could identify that container node, either...

by requiring the container node to have an ID. With a fragment selector, we could refer to that ID and refine the selection with the currently used text quote and text range selectors. Or...
by using an XPath selector to (hopefully) uniquely identify the node within the DOM. Here, again, the selection could be refined with the currently used text selectors within the specified node.
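
Both options can be sketched as plain data. The helper names and example values below are assumptions. Each text selector is wrapped in its own copy of the container selector, so that every alternative carries a single refinedBy; other arrangements may also be valid under the spec.

```javascript
// Small constructors for the two container selector types discussed above.
const fragmentSelector = (id) => ({ type: 'FragmentSelector', value: id });
const xpathSelector = (xpath) => ({ type: 'XPathSelector', value: xpath });

// Wraps each text selector in a copy of the container selector,
// linking the two via `refinedBy`.
function refineByContainer(containerSelector, textSelectors) {
  return textSelectors.map((selector) => ({
    ...containerSelector,
    refinedBy: selector
  }));
}

// Text selectors relative to the container node (illustrative values).
const textSelectors = [
  { type: 'TextQuoteSelector', exact: 'annotated text' },
  { type: 'TextPositionSelector', start: 10, end: 24 }
];

// Option 1: the container has an ID.
const byId = {
  source: 'https://example.org/page.html',
  selector: refineByContainer(fragmentSelector('content'), textSelectors)
};

// Option 2: the container is addressed by an XPath.
const byXPath = {
  source: 'https://example.org/page.html',
  selector: refineByContainer(xpathSelector('/html/body/div[2]'), textSelectors)
};

console.log(JSON.stringify(byId, null, 2));
console.log(JSON.stringify(byXPath, null, 2));
```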

rsimon · 2020-07-28T13:22:49Z

Both solutions look good, and I think you could just automatically pick one, depending on whether the annotated DOM node has an id or not.

What worries me a bit is that this will break compatibility with the current implementation, since we now would have a Fragment-/XPathSelector refined by TextQuote and OffsetSelectors, rather than the TextQuote-/OffsetSelector directly. Nothing that couldn't be handled with a few basic ifs, of course. But not a one-liner either.
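
The "few basic ifs" might look roughly like this sketch, which reads text selectors from both the current flat shape and a refined shape. The function name textSelectorsOf and the sample targets are assumptions, and TextPositionSelector is assumed to be the concrete type behind the offset selector.

```javascript
const TEXT_TYPES = ['TextQuoteSelector', 'TextPositionSelector'];

// Returns the text selectors of a target, whether they sit directly in
// `selector` (current shape) or under `refinedBy` (refined shape).
function textSelectorsOf(target) {
  const selectors = Array.isArray(target.selector) ? target.selector : [target.selector];
  return selectors
    .filter(Boolean)
    .map((s) => (TEXT_TYPES.includes(s.type) ? s : s.refinedBy))
    .filter((s) => s && TEXT_TYPES.includes(s.type));
}

// Current shape: text selectors directly on the target.
const legacyTarget = {
  selector: [
    { type: 'TextQuoteSelector', exact: 'annotated text' },
    { type: 'TextPositionSelector', start: 10, end: 24 }
  ]
};

// Refined shape: a container selector refined by each text selector.
const refinedTarget = {
  source: 'https://example.org/page.html',
  selector: [
    { type: 'FragmentSelector', value: 'content', refinedBy: legacyTarget.selector[0] },
    { type: 'FragmentSelector', value: 'content', refinedBy: legacyTarget.selector[1] }
  ]
};

// Both calls yield the same two text selectors.
console.log(textSelectorsOf(legacyTarget));
console.log(textSelectorsOf(refinedTarget));
```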

Another question is whether Recogito should verify the validity of the XPath/Fragment info... (which I think would be pointless, since it's up to the implementer to init RecogitoJS on the right DOM node anyway?)

rsimon · 2020-07-28T13:34:13Z

PS: if you want to give this a try, the code that generates targets from a user selection is here:

https://github.com/recogito/recogito-client-core/blob/master/src/selection/SelectionUtils.js#L40

The code that renders annotations from their WebAnno form is, essentially, here:

https://github.com/recogito/recogito-client-core/blob/master/src/highlighter/Highlighter.js#L44

However this would need some revising since that line relies on built-in helper functions (.start, .end) that are part of the WebAnnotation class. But these should rather be in some external helper function (perhaps part of Highlighter), in order to keep WebAnnotation clean.
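
One possible shape for such external helpers, as a sketch only. The names startOf and endOf are hypothetical, and the code assumes the offsets live in a TextPositionSelector placed directly on the target; it is not taken from the linked files.

```javascript
// Finds the position selector on a plain annotation object, if any.
function findPositionSelector(annotation) {
  const { selector } = annotation.target || {};
  const selectors = Array.isArray(selector) ? selector : [selector];
  return selectors.find((s) => s && s.type === 'TextPositionSelector');
}

// Stand-alone replacements for the .start / .end accessors; they return
// undefined when the annotation has no position selector.
const startOf = (annotation) => findPositionSelector(annotation)?.start;
const endOf = (annotation) => findPositionSelector(annotation)?.end;

// Illustrative annotation.
const example = {
  type: 'Annotation',
  target: {
    selector: [
      { type: 'TextQuoteSelector', exact: 'annotated text' },
      { type: 'TextPositionSelector', start: 10, end: 24 }
    ]
  }
};

console.log(startOf(example), endOf(example));
```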

jankaszel · 2020-07-30T13:59:12Z

Thanks for the input! On the question of validating XPath/fragment selectors: I'm also unsure about this. On the one hand, RecogitoJS is a general-purpose annotation library, on the other it adheres to the Web Annotation spec which asks for unique identification.

Maybe we could strike a trade-off between the two: either strictly require an ID attribute on the container, or issue a console.warn() if no ID is present and use the XPath selector as a fallback.
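
A sketch of that trade-off. The strict option and both function names are hypothetical, and the XPath builder is a deliberately simple positional one rather than a robust implementation.

```javascript
// Builds a positional XPath such as /html[1]/body[1]/div[2] by walking
// up the element's ancestors and counting same-tag siblings.
function xpathOf(el) {
  const steps = [];
  for (let node = el; node && node.tagName; node = node.parentElement) {
    const tag = node.tagName.toLowerCase();
    const siblings = node.parentElement
      ? Array.from(node.parentElement.children).filter((c) => c.tagName === node.tagName)
      : [node];
    steps.unshift(`${tag}[${siblings.indexOf(node) + 1}]`);
  }
  return '/' + steps.join('/');
}

// Prefers the container's ID; otherwise throws (strict mode) or warns
// and falls back to an XPath selector.
function containerSelectorFor(el, { strict = false } = {}) {
  if (el.id) {
    return { type: 'FragmentSelector', value: el.id };
  }
  if (strict) {
    throw new Error('Container element needs an id attribute');
  }
  console.warn('Container has no id; falling back to an XPathSelector');
  return { type: 'XPathSelector', value: xpathOf(el) };
}
```

In a browser this would be called with the element RecogitoJS was initialised on, for example containerSelectorFor(document.getElementById('content')).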

Either way, recogito.addAnnotations() could include annotations that address a different target (e.g., annotations on multiple targets within the same LDP annotation container). It'd be easy for RecogitoJS to verify the fragment selector ("does it match the container ID?"), but it could introduce entropy for users who use RecogitoJS differently (i.e., with different backends) and don't care about the resource IRI. For backwards compatibility, we could simply render annotations without XPath/fragment selector and ignore those that explicitly target a different fragment/XPath (or introduce an option for that).
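
The backwards-compatible filtering could be sketched as below. The function names and sample data are assumptions; only fragment selectors are compared, and XPath comparison is left out.

```javascript
// Collects the FragmentSelectors found directly on an annotation's target.
function containerSelectorsOf(annotation) {
  const { selector } = annotation.target || {};
  const selectors = Array.isArray(selector) ? selector : [selector];
  return selectors.filter((s) => s && s.type === 'FragmentSelector');
}

// Keeps annotations without a fragment selector (current behaviour) and
// those whose fragment matches the container ID; drops the rest.
function annotationsForContainer(annotations, containerId) {
  return annotations.filter((annotation) => {
    const fragments = containerSelectorsOf(annotation);
    return fragments.length === 0 || fragments.some((f) => f.value === containerId);
  });
}

// Illustrative data: one legacy annotation, one matching, one for another node.
const quote = { type: 'TextQuoteSelector', exact: 'annotated text' };
const loaded = [
  { id: 'a1', target: { selector: [quote] } },
  { id: 'a2', target: { selector: [{ type: 'FragmentSelector', value: 'content', refinedBy: quote }] } },
  { id: 'a3', target: { selector: [{ type: 'FragmentSelector', value: 'sidebar', refinedBy: quote }] } }
];

console.log(annotationsForContainer(loaded, 'content').map((a) => a.id));
```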

jankaszel · 2020-09-13T10:37:00Z

After giving this issue some further thought, I suspect that resource ID validation doesn't need to be part of RecogitoJS itself after all. My main concerns:

Introducing resource ID validation could lead to backwards incompatibilities.
The actual requirement of resource IDs might be tied to specific use cases. Platform-specific storage like Firebase and platform-independent storage like LDP annotation servers use resource IDs in different ways: A platform might need to identify a piece of text, while decoupled storage will need to deterministically identify a DOM node on a website.
Resource IDs can't be generated deterministically: DOM node ID or XPath? What's the exact hostname? And so on.
Comparing and validating XPath selectors could lead to some non-trivial work (e.g., which DOM attributes to include?).

With that in mind, I'm happy to solve the resource ID issue outside of RecogitoJS, e.g. with the web annotation adapter. While the Web Annotation data model specification demands a resource ID, its necessity depends on the demand for interoperability of the given platform (as mentioned above with Firebase).

@rsimon If you agree, feel free to close this issue!

rsimon · 2020-09-14T06:22:15Z

Thanks for the update! I agree that enforcing the standard might be beyond the job of RecogitoJS. However, I still think you're absolutely right on the issue of the (refined) FragmentSelector vs. the current situation, which uses only the TextOffset selector. Therefore I'm in favor of keeping the issue open. (I've been focusing predominantly on Annotorious recently, so the work on RecogitoJS has slowed down a bit. But it will get picked up again eventually ;-)
