Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TextQuoteSelector selector yielding an infinite number of matches #112

Open
ziodave opened this issue Jun 25, 2021 · 2 comments
Open

TextQuoteSelector selector yielding an infinite number of matches #112

ziodave opened this issue Jun 25, 2021 · 2 comments
Labels
🦟 Bug Something isn't working

Comments

@ziodave
Copy link

ziodave commented Jun 25, 2021

Hello,

Can you help me out, I tried to understand what's going on here: I created the following, which indeed works, i.e. the text is highlighted in DOM.

However it enters an infinite loop in for await (const match of matches) and I couldn't understand the cause:

async function highlightMatcher(matcher) {
  const matches = matcher(document.body);
  for await (const match of matches) {
    highlightRange(match); //    <----- this is called an infinite number of times
  }
}

const createMatcher = makeRefinable((selector) => {
  const innerCreateMatcher = {
    TextQuoteSelector: createTextQuoteSelectorMatcher,
    TextPositionSelector: createTextPositionSelectorMatcher,
    CssSelector: createCssSelectorMatcher,
    RangeSelector: makeCreateRangeSelectorMatcher(createMatcher)
  }[selector.type];
  if (!innerCreateMatcher) {
    throw new Error(`Unsupported selector type: ${selector.type}`);
  }
  return innerCreateMatcher(selector);
});

highlightMatcher(createMatcher({
  value: "#my-div",
  type: "CssSelector",
  refinedBy: {
    type: "TextQuoteSelector",
    exact: "My Text"
  }
}));
@Treora
Copy link
Contributor

Treora commented Jun 25, 2021

Can reproduce. The issue is that if the DOM is changed (by highlightRange/highlightText) while the TextQuoteSelector is being matched, it can end up finding the quote again and again. (note that the refinedBy part is not relevant here, the TextQuoteSelector could be passed to createMatcher directly)

I would like our text matching to be resistant to DOM changes that do not modify the text content. I am not sure how easy or hard this may be to achieve. If we cannot fix this, it seems unhelpful to return an (async) iterable, and we’d better just return an array with the matches.

For now, best is to avoid changing the DOM while running the matcher, and first collect the matches in an array. For example, we do this in the demo:

  const matchAll = createMatcher(selector);
  const ranges = [];

  for await (const range of matchAll(target)) {
    ranges.push(range);
  }

  for (const range of ranges) {
    const removeHighlight = highlightText(range);
    moduleState.cleanupFunctions.push(removeHighlight);
  }

I see now that I gave the simpler but broken example in the getting started page. I will add update and a note!

Treora added a commit that referenced this issue Jun 25, 2021
Treora added a commit that referenced this issue Jul 9, 2021
@Treora Treora changed the title refinedBy selector yielding an infinite number of matches TextQuoteSelector selector yielding an infinite number of matches Jul 9, 2021
@Treora
Copy link
Contributor

Treora commented Jul 9, 2021

For now, I just added warnings to the documentation. Longer term, I’d really like to fix this, but it is not trivial to deal with search in a changing DOM. However, while robustness to arbitrary DOM changes may be hard, it gets easier if we limit ourselves to limited changes. In any case, it seems fair to limit our effort to DOM changes that keep the scope’s textContent unchanged. Within these changes, we could consider:

  1. changes that split text nodes (as done by highlightText)
  2. changes that merge text nodes
  3. changes that wrap text nodes in elements, or unwrap them (also done in highlightText; but I suppose this is not a problem for us anyway, as we simply iterate through text nodes)
  4. changes that replace a node with another that has the same text content.

Besides these kinds of changes, we could distinguish between changes on the nodes where the search is currently looking, changes before it, and changes ahead of it. Perhaps changes before and after it need not be a problem either, but we’d probably more often deal with changes at the nodes that are currently searched in (for highlighting and such).

I would probably start with changing the approach of the abstract text quote matcher implementation. Currently, it manually walks through chunks, but I think it could be simpler by using a TextSeeker to do this. Then, perhaps we can create robustness to split nodes/chunks in there, by e.g. always checking if this.offsetInChunk < chunk.length at the start of any function, and walking to the correct chunk if not. Perhaps there will be more problems though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🦟 Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants