Implement Chromium/WICG’s Text Fragment specification #60

Open
tilgovi opened this issue Feb 27, 2020 · 4 comments

tilgovi commented Feb 27, 2020

Spec: https://wicg.github.io/ScrollToTextFragment/#parsing-the-fragment-directive

tilgovi commented Feb 27, 2020

We might also polyfill just the window.location.fragmentDirective as a building block for this: https://wicg.github.io/ScrollToTextFragment/#feature-detectability
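
As a rough illustration of that building block, the sketch below separates the fragment directive from the ordinary fragment at the `:~:` delimiter and checks for a `fragmentDirective` property. The names `splitFragment` and `hasNativeSupport` are hypothetical, not part of the spec or of Annotator, and the property location follows the linked spec section as written at the time. Note that a browser with native support strips the directive before page scripts can read it, so this only helps with URL strings obtained some other way.

```typescript
// Delimiter between the ordinary fragment and the fragment directive.
const DIRECTIVE_DELIMITER = ':~:';

interface SplitFragment {
  fragment: string;          // the part a page script would normally see
  directive: string | null;  // raw directive string, or null if absent
}

// Hypothetical helper: split a URL hash into fragment and directive.
function splitFragment(hash: string): SplitFragment {
  const raw = hash.startsWith('#') ? hash.slice(1) : hash;
  const index = raw.indexOf(DIRECTIVE_DELIMITER);
  if (index === -1) return { fragment: raw, directive: null };
  return {
    fragment: raw.slice(0, index),
    directive: raw.slice(index + DIRECTIVE_DELIMITER.length),
  };
}

// Hypothetical helper: feature detection by property presence.
// Pass window.location (or any object) as `loc`.
function hasNativeSupport(loc: object): boolean {
  return 'fragmentDirective' in loc;
}
```

For example, `splitFragment('#intro:~:text=foo')` yields the fragment `intro` and the directive `text=foo`.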

Treora commented Jun 18, 2020

Is this still a plan? We have ditched our other fragment identifier effort (see PR #71).

Besides making a parser for the fragment identifier syntax itself, I suppose we would want to implement the algorithm for resolving it. May well be worthwhile if others are not already doing this!
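
To give an idea of the parsing half, here is a simplified sketch for the value of a single `text=` directive, which has the shape `[prefix-,]textStart[,textEnd][,-suffix]`. The name `parseTextDirective` and the `TextDirective` type are made up for illustration; the spec's own algorithm has more edge cases than this handles, and resolving the result against a document is a separate and larger task.

```typescript
interface TextDirective {
  prefix?: string;
  textStart: string;
  textEnd?: string;
  suffix?: string;
}

// Hypothetical, simplified parser. `value` is the part after "text=",
// e.g. "an%20example-,text,-here".
function parseTextDirective(value: string): TextDirective | null {
  const tokens = value.split(',');
  let prefix: string | undefined;
  let suffix: string | undefined;

  // A first token ending in "-" is the prefix.
  if (tokens.length > 1 && tokens[0].endsWith('-')) {
    prefix = tokens.shift()!.slice(0, -1);
  }
  // A last token starting with "-" is the suffix.
  if (tokens.length > 1 && tokens[tokens.length - 1].startsWith('-')) {
    suffix = tokens.pop()!.slice(1);
  }
  // What remains must be textStart, optionally followed by textEnd.
  if (tokens.length < 1 || tokens.length > 2 || tokens[0] === '') return null;

  try {
    const directive: TextDirective = { textStart: decodeURIComponent(tokens[0]) };
    if (tokens.length === 2) directive.textEnd = decodeURIComponent(tokens[1]);
    if (prefix !== undefined) directive.prefix = decodeURIComponent(prefix);
    if (suffix !== undefined) directive.suffix = decodeURIComponent(suffix);
    return directive;
  } catch {
    return null; // malformed percent-encoding
  }
}
```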

Also, I added this topic to the new Tech Radar page on the wiki.

Treora commented Sep 3, 2020

> Besides making a parser for the fragment identifier syntax itself, I suppose we would want to implement the algorithm for resolving it. May well be worthwhile if others are not already doing this!

Update: last month, I decided to take a stab at this and implemented the algorithm in TypeScript: https://code.treora.com/gerben/text-fragments-ts

See also: WICG/scroll-to-text-fragment#135

Perhaps adopting this implementation in Annotator could be considered some day, but I suppose it is a bit early: the spec is still in flux, and so far my impression is that few people adopt it or want to.

I expect that a significant disadvantage for many annotation-ish tools would be that, as it is currently defined, the expressivity of the text fragment identifier is limited by only being able to point at whole words. See my issue #37 on the spec’s repo.

To use the fragment syntax as a standard within, and for exchange between, annotation tools, one could of course choose to interpret it differently (except when activating a ‘browser compat mode’), e.g. as a WA RangeSelector containing two TextQuoteSelectors. But that might lead to a misleading situation of half-interoperability, which seems better to avoid.

Nevertheless, for the goal of making annotations (ex)portable from annotation tools, it could be valuable to have a tool that helps convert (where possible) annotation targets to browser-compatible text-fragment URLs. And vice versa to import them.

Treora commented Nov 5, 2020

We briefly discussed this topic in today’s call while looking over the open issues. We agreed that just parsing the syntax is not much use, as it comes together with a specific algorithm for finding the target text, which differs from the Web Annotation model.

A quick overview of things we could provide:

  • anchoring of a fragment directive: I implemented the essence of this already (see above comment); we could provide a function that simply wraps my implementation. (we even discussed the option of importing my whole implementation into this repo, though to me it feels cleaner to keep these as separate projects)

  • describing a selection (a Range or perhaps a list of Ranges) as a fragment directive: this would need a custom adaptation of describeTextQuote, modified to ensure that the total quote (including prefix&suffix) ends at word boundaries (note that at least this is possible now, since a recent change in the spec). Also, it should use a textStart,textEnd pair (again to be cut at word boundaries) instead of an exact quote when the selection crosses block elements. And perhaps there are more hurdles.

  • convert fragment directive ⇒ Selector: If the document is available, we could simply anchor it and describe it in the other format. Without the document at hand, we could also convert it, although with a (hopefully small) risk that the differences in specifications will make it fail to anchor or (worse) point at something else. I think the conversion could, after syntax parsing, be done with more or less this simple code:

      // Convert a parsed text directive to a Web Annotation selector.
      ({ prefix, textStart, textEnd, suffix }) => textEnd
          ? {
              type: 'RangeSelector',
              start: { type: 'TextQuoteSelector', prefix, exact: textStart },
              // textEnd goes in the prefix so that it falls inside the range.
              end: { type: 'TextQuoteSelector', prefix: textEnd, exact: '', suffix }
            }
          : { type: 'TextQuoteSelector', prefix, exact: textStart, suffix }
    

    (note the little hack of using prefix: textEnd, exact: '' because RangeSelector’s end is exclusive and textEnd should nevertheless be included in the target)

  • convert Selector ⇒ fragment directive: the reverse of the above. Again, if the document is available, we could simply anchor it and describe it in the other format. But in case the document is not available, conversion in this direction would only be possible if the selector is of the type/shape shown in the above example code.
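
A minimal sketch of that last, document-less direction, assuming selectors of exactly the shape produced by the conversion code above. The names `selectorToDirective` and `serializeDirective` and the type definitions are hypothetical. The sketch does not check that the quoted text ends at word boundaries, so its output may still fail to anchor in a browser.

```typescript
interface TextQuoteSelector {
  type: 'TextQuoteSelector';
  exact: string;
  prefix?: string;
  suffix?: string;
}
interface RangeSelector {
  type: 'RangeSelector';
  start: TextQuoteSelector;
  end: TextQuoteSelector;
}
interface TextDirective {
  prefix?: string;
  textStart: string;
  textEnd?: string;
  suffix?: string;
}

// Hypothetical reverse conversion; returns null for shapes it cannot express.
function selectorToDirective(
  selector: TextQuoteSelector | RangeSelector,
): TextDirective | null {
  if (selector.type === 'TextQuoteSelector') {
    const { prefix, exact, suffix } = selector;
    return { prefix, textStart: exact, suffix };
  }
  const { start, end } = selector;
  // Only the "prefix: textEnd, exact: ''" shape can be reversed.
  if (end.exact !== '' || !end.prefix || start.suffix !== undefined) return null;
  return {
    prefix: start.prefix,
    textStart: start.exact,
    textEnd: end.prefix,
    suffix: end.suffix,
  };
}

// Serialize to "text=[prefix-,]textStart[,textEnd][,-suffix]".
// encodeURIComponent already escapes "," and "&"; "-" is escaped by hand
// because a literal dash would be read as a prefix/suffix marker.
function serializeDirective(d: TextDirective): string {
  const enc = (s: string) => encodeURIComponent(s).replace(/-/g, '%2D');
  const parts: string[] = [];
  if (d.prefix) parts.push(enc(d.prefix) + '-');
  parts.push(enc(d.textStart));
  if (d.textEnd) parts.push(enc(d.textEnd));
  if (d.suffix) parts.push('-' + enc(d.suffix));
  return 'text=' + parts.join(',');
}
```

The resulting string would be appended to a URL after `#:~:` (or after an existing fragment followed by `:~:`).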

I suppose it is mainly a matter of demand and priority whether we’ll implement any of these. I might actually try to tackle some of these points soon, as I would like to use these features myself.

Treora changed the title from “Implement a parser for ScrollToTextFragment syntax” to “Implement Chromium/WICG’s Text Fragment specification” on Nov 5, 2020
Treora added a commit that referenced this issue Jan 8, 2021
A TextQuoteSelector can add as much prefix and suffix as desired. Until now, we only added prefix and suffix as much as was strictly necessary to disambiguate the target from other occurrences of the exact same text in the same document. When an annotation should still anchor on a modified version of the document, it can be helpful to add a little more context, in order to be robust against the ambiguity that would result if after such a modification the quoted text appears in more places than before.

Also, it seems neat to have the prefix and suffix contain whole words instead of stopping halfway inside a word. This makes it pleasant to read when user interfaces expose the prefix&suffix. Also it makes the implementation closer to being compatible with the WICG TextFragments spec (see #60).

This PR thus adds two ways to generate less minimal prefixes&suffixes:

    - Round them up to the next whitespace.
    - Optionally add prefix&suffix around a short quote even if it is not ambiguous.

I made rounding up to whitespace the default behaviour, while the previous behaviour can still be obtained using the option minimalContext. For the context around short quotes I would not know what would be a good default (might depend on use case and document length?); so I left it at 0 for now, i.e. the feature is turned off by default.
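
The rounding idea can be illustrated on a plain string as below. This is only a sketch and not the seeker-based code from the commit: `roundContext` and its index parameters are hypothetical, and only the option name `minimalContext` is taken from the description above.

```typescript
interface ContextOptions {
  minimalContext?: boolean; // true: keep the minimal prefix/suffix as given
}

// Hypothetical helper. `prefixStart` and `suffixEnd` are the bounds of the
// minimal context (prefix + quote + suffix) within `text`. Returns bounds
// widened so the prefix starts and the suffix ends at whitespace (or at the
// start/end of the text).
function roundContext(
  text: string,
  prefixStart: number,
  suffixEnd: number,
  options: ContextOptions = {},
): [number, number] {
  if (options.minimalContext) return [prefixStart, suffixEnd];
  // Walk backwards until the preceding character is whitespace.
  while (prefixStart > 0 && !/\s/.test(text[prefixStart - 1])) prefixStart--;
  // Walk forwards until the next character is whitespace.
  while (suffixEnd < text.length && !/\s/.test(text[suffixEnd])) suffixEnd++;
  return [prefixStart, suffixEnd];
}

// Usage: the quote "brown" spans indices 10..15; suppose the minimal
// context is "ck brown f", i.e. indices 7..17.
const sample = 'the quick brown fox';
const [from, to] = roundContext(sample, 7, 17);
const roundedPrefix = sample.slice(from, 10); // "quick "
const roundedSuffix = sample.slice(15, to);   // " fox"
```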

This PR also refactors the implementation a bit, reusing the seekers instead of creating new ones on every match.

To pass options, I added an options object as the last function parameter. I thought we might want to move the scope parameter into this option object too, but scope is specific to the DOM implementation, so I’m not sure if that is desirable.

I added options for anything that would otherwise feel like we’re hardcoding a ‘magic number’, but of course quite a few choices about how exactly the algorithm works are hardcoded opinions too. I wavered between a few variations, but thought this one the most straightforward, with (I hope) generally sensible results. To be seen in practice, I guess.

I added basic tests for each of the new behaviours. Currently these tests are still in the dom package, but should be refactored and moved into the selector package as the actual algorithms being tested reside there.