Turn AsyncGenerators into Generators #115

Open
DellCliff opened this issue Jul 18, 2021 · 5 comments
Labels
⭐️ Enhancement Improvement or new feature for users

Comments

@DellCliff

What is the reason behind matchers producing AsyncGenerators? They don't do I/O or wait on callbacks as far as I can see. Having an async function which doesn't need to be one has serious disadvantages, like async infection up the call chain, race conditions, and so on.

A simple modification of packages/selector/src/text/match-text-position.ts removes the unnecessary async. Current implementation (async):

export function textPositionSelectorMatcher(
  selector: TextPositionSelector,
): <TChunk extends Chunk<any>>(
  scope: Chunker<TChunk>,
) => AsyncGenerator<ChunkRange<TChunk>, void, void> {
  const { start, end } = selector;

  return async function* matchAll<TChunk extends Chunk<string>>(
    textChunks: Chunker<TChunk>,
  ) {
    const codeUnitSeeker = new TextSeeker(textChunks);
    const codePointSeeker = new CodePointSeeker(codeUnitSeeker);

    codePointSeeker.seekTo(start);
    const startChunk = codeUnitSeeker.currentChunk;
    const startIndex = codeUnitSeeker.offsetInChunk;
    codePointSeeker.seekTo(end);
    const endChunk = codeUnitSeeker.currentChunk;
    const endIndex = codeUnitSeeker.offsetInChunk;

    yield { startChunk, startIndex, endChunk, endIndex };
  };
}

Modified version (sync):

export function textPositionSelectorMatcher(
  selector: TextPositionSelector,
): <TChunk extends Chunk<any>>(
  scope: Chunker<TChunk>,
) => Generator<ChunkRange<TChunk>, void, void> {
  const { start, end } = selector;

  return function* matchAll<TChunk extends Chunk<string>>(
    textChunks: Chunker<TChunk>,
  ) {
    const codeUnitSeeker = new TextSeeker(textChunks);
    const codePointSeeker = new CodePointSeeker(codeUnitSeeker);

    codePointSeeker.seekTo(start);
    const startChunk = codeUnitSeeker.currentChunk;
    const startIndex = codeUnitSeeker.offsetInChunk;
    codePointSeeker.seekTo(end);
    const endChunk = codeUnitSeeker.currentChunk;
    const endIndex = codeUnitSeeker.offsetInChunk;

    yield { startChunk, startIndex, endChunk, endIndex };
  };
}
@Treora
Contributor

Treora commented Jul 19, 2021

You are completely correct that there is no need for the current functions to be async. The reason they are async is to make it easier to do async stuff in the future, without needing to change the API. For example, fuzzy text search may be computationally expensive, and could be offloaded to a worker thread.

Note that e.g. a TextPositionSelector can never produce multiple results, so besides not needing to be async, its matcher would not even have to be a generator. However, the idea is to have coherence in the function signatures, and making all matchers return async generators seemed the most flexible option, allowing for easy composition of matchers; that is, to have a single function able to handle various types of Selectors, dispatching them to the appropriate functions, as we do in the demo.
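
Roughly, that composition looks like the sketch below; the dispatcher itself is hypothetical and only textPositionSelectorMatcher is taken from the snippet above (the demo's actual code may differ):

// Hypothetical dispatcher: because every matcher shares the AsyncGenerator
// signature, one function can route any supported Selector to the right matcher.
function matcherForSelector(
  selector: TextPositionSelector, // other selector types would widen this parameter
): <TChunk extends Chunk<any>>(
  scope: Chunker<TChunk>,
) => AsyncGenerator<ChunkRange<TChunk>, void, void> {
  switch (selector.type) {
    case 'TextPositionSelector':
      return textPositionSelectorMatcher(selector);
    // other selector types would dispatch to their own matchers here
    default:
      throw new Error('Unsupported selector type');
  }
}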

We are aware that this choice of generality does impose constraints on the users of the functions, who need to await each result, and whose functions thus have to become async themselves; ‘async infection’ as you call it. We have been thinking about a practical way to provide a synchronous API in addition to an asynchronous API. See the discussion in #81; suggestions are welcome!

@DellCliff
Author

DellCliff commented Jul 19, 2021

This kinda smells like speculative generality. IMO I would not make those matchers adhere to some common interface and would let them have their own signatures. A user can then pick and choose. Maybe supply helper functions which take a matcher and produce one with the desired signature, or offer both sync and async out of the box, or fuzzy/non-fuzzy. One can always make a sync function into an async one, but not the other way around.
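
For instance, such a helper could be a minimal sketch like this, reusing the Chunk/Chunker/ChunkRange types from the snippet above (the helper itself is hypothetical):

// Hypothetical adapter: wrap a synchronous matcher so it satisfies the
// AsyncGenerator-based signature the library currently exposes.
function toAsyncMatcher<TChunk extends Chunk<any>>(
  matcher: (scope: Chunker<TChunk>) => Generator<ChunkRange<TChunk>, void, void>,
): (scope: Chunker<TChunk>) => AsyncGenerator<ChunkRange<TChunk>, void, void> {
  return async function* (scope: Chunker<TChunk>) {
    // yield* inside an async generator accepts synchronous iterables too.
    yield* matcher(scope);
  };
}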

@tilgovi
Contributor

tilgovi commented Jul 20, 2021

One can always make a sync function into an async one, but not the other way around.

Yes, that's the reason for making everything async now. If we compose higher-level pipelines with sync interfaces then we won't be able to incorporate async building blocks, but if all the composition is asynchronous then it admits synchronous or asynchronous building blocks.

I hear the concern about speculative generality, but it's not baseless speculation. I had use cases like streaming corpuses in mind as we designed this.

Initial designs had an interface that allowed selector implementations to implement synchronous or asynchronous functions (or both), but I dropped that to make the surface area smaller.

In any case, feedback is appreciated and I'm open to changing this, but I don't want to jump to conclusions. If you find you're really working with the APIs a lot and async is really proving cumbersome, I'd love to see examples of the kinds of code that feel hard to modify to be asynchronous.

@DellCliff
Author

One of my use cases is a Vue.js component. The only way I can get it to work with async is by using a cache (a "computed" property in Vue.js), since Vue.js needs functions to return the actual value (a string) and not a Promise. The function now has to check the cache on each call, return the computed value if it already exists, or return an empty string and trigger the async computation. It adds another layer of complexity on the side of the user of the library, to what could have been 3 lines of sync code.
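
Roughly, the workaround looks like the following sketch (the names and the computeMatch placeholder are illustrative, not the library's API):

import { computed, ref } from 'vue';

// Placeholder for the async matching work done with the library.
declare function computeMatch(input: string): Promise<string>;

const input = ref('some text');
const cache = ref<string | null>(null);

// The computed must return a plain string, so it falls back to '' and kicks
// off the async work, which later fills the cache and triggers a re-render.
const matchedText = computed(() => {
  if (cache.value !== null) return cache.value;
  computeMatch(input.value).then((result) => (cache.value = result));
  return '';
});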

I understand the pipeline idea, but I'd rather keep the function signatures true and minimal to what they actually need, and handle pipelines via composition with helper functions. That way users can opt in. Right now, there is no way for them to opt out.

@tilgovi
Contributor

tilgovi commented Aug 18, 2021

I've been thinking more about this and I think we should try to tackle it for the next release.

One option would be to make sure that our API allows context to be passed in that makes it possible to process things (synchronously) in chunks, so that higher-level async APIs could limit the amount of work they do before returning to the event loop.
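
As a rough sketch of that idea (not the library's API), an async wrapper could drive a synchronous generator in bounded batches, yielding to the event loop in between:

// Hypothetical helper: consume a synchronous generator in batches so a
// higher-level async API never blocks the event loop for too long.
async function* inBatches<T>(
  source: Generator<T, void, void>,
  batchSize = 100,
): AsyncGenerator<T, void, void> {
  let count = 0;
  for (const value of source) {
    yield value;
    if (++count % batchSize === 0) {
      // Give other tasks a chance to run before the next batch.
      await new Promise((resolve) => setTimeout(resolve, 0));
    }
  }
}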

@reckart added the ⭐️ Enhancement label on Oct 3, 2023