NPM Package for gritql #214

Open
transitive-bullshit opened this issue Apr 12, 2024 · 2 comments

Comments

@transitive-bullshit

Hey hey 👋

Had a great chat w/ Morgante yesterday & am considering integrating gritql into our LLM-based linting pipeline.

I think the main blocker will be exposing gritql as an NPM package so we can use it programmatically from JS/TS instead of invoking gritql as a subprocess.

I'd be happy to work on this after we launch. Just wanted to open an issue to discuss and get feedback on 1) whether your team thinks this is a good idea and 2) any potential blockers we're likely to run into.

Really really love gritql && excited for its potential!

Thanks 🙏

@morgante
Contributor

We definitely want to release a Node SDK at some point, though I'm undecided on which of a few approaches to take:

  1. Make the SDK a wrapper around calling the grit binary (as a subprocess). There's nothing fundamentally wrong with this apart from the large binary size, and it avoids needing to optimize another API surface.
  2. Use napi to create a native Node addon. This has the advantage of native interfacing, but I don't know how well it will work with the (extensive) multithreading we do for performance.
  3. Use our wasm bindings from node. These already exist and avoid needing to distribute native addons for each platform, but I don't think we can do multi-threading.

The other thing to decide would be which API to surface.

@transitive-bullshit
Author

transitive-bullshit commented Apr 14, 2024

For now, I ended up going with option 1.

You can see my PoC here with more info on the PR and motivations here. Note that this project is not OSS yet, but it will be hopefully by end of week, so I'm just documenting my exploration as I go for anyone else who's interested in this type of integration.

So far, using grit as a subprocess with @getgrit/launcher as an optional dependency seems like a good solution to maximize compatibility, with the linting engine still working even if grit fails to install on some platforms. I'm currently only using grit apply --dry-run --jsonl with a readonly pattern, and this is only enabled for any markdown rules which contain an optional grit or gritql code block like this one: https://github.com/gptlint/gptlint/blob/feature/gritql/.gptlint/prefer-array-at-negative-indexing.md.
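For anyone following along, here is a rough sketch of what the subprocess approach could look like from Node. It is not the actual gptlint code. The helper names (`parseJsonl`, `runGritDryRun`) are made up for illustration, the argument order passed to `grit apply` is an assumption, and each output line is treated as an opaque JSON object because the record fields aren't described in this thread.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical shape: each JSONL line is kept as an opaque JSON object,
// since this thread does not document the fields grit emits.
type GritRecord = Record<string, unknown>;

/** Parse JSON Lines output, skipping blank lines and anything that is not a JSON object. */
function parseJsonl(stdout: string): GritRecord[] {
  const records: GritRecord[] = [];
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const value: unknown = JSON.parse(trimmed);
      if (typeof value === "object" && value !== null && !Array.isArray(value)) {
        records.push(value as GritRecord);
      }
    } catch {
      // Ignore non-JSON lines (for example, progress or log output).
    }
  }
  return records;
}

/**
 * Run `grit apply --dry-run --jsonl` as a subprocess.
 * Returns null when the binary cannot be run, so a caller can fall back to
 * linting without grit. Treating any non-zero exit as "unavailable" is an
 * assumption of this sketch, not documented grit behaviour.
 */
function runGritDryRun(
  pattern: string,
  paths: string[],
  bin: string = "grit" // assumed to be on PATH, e.g. installed via @getgrit/launcher
): GritRecord[] | null {
  const result = spawnSync(
    bin,
    ["apply", pattern, ...paths, "--dry-run", "--jsonl"], // argument order is assumed
    { encoding: "utf8" }
  );
  if (result.error || result.status !== 0) return null;
  return parseJsonl(result.stdout);
}
```

Returning `null` instead of throwing is what would let the linter keep working when grit is only an optional dependency and failed to install on a given platform.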

I've done an initial analysis on using gritql to filter the source file context sent to our LLM-based linting engine, and it reduces the context significantly for several of the built-in rules, which is amazing. I still need to do a full suite of evals w/ gritql ablations to see how this impacts rule accuracy / precision / recall / cost / speed at scale, but I'm really happy with the results so far. A lot of files which we were previously naively linting can be quickly ruled out without doing any LLM calls if there are no gritql matches.
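To make the filtering idea concrete, here is a minimal sketch of a pre-filter step. Everything in it is hypothetical: the `MatchRange` shape (1-based inclusive line ranges), the `selectContext` name, and the `padding` parameter are illustrative assumptions rather than gritql's output format or gptlint's implementation.

```typescript
interface SourceFile {
  path: string;
  content: string;
}

// Hypothetical match shape: a 1-based, inclusive line range.
interface MatchRange {
  startLine: number;
  endLine: number;
}

/**
 * Reduce a file to the lines around its matches.
 * Returns null when there are no matches, signalling that the file can be
 * skipped without making any LLM call.
 */
function selectContext(
  file: SourceFile,
  matches: MatchRange[],
  padding: number = 2 // extra lines kept before and after each match
): string | null {
  if (matches.length === 0) return null;

  const lines = file.content.split("\n");
  const keep = new Set<number>();
  for (const m of matches) {
    const from = Math.max(1, m.startLine - padding);
    const to = Math.min(lines.length, m.endLine + padding);
    for (let n = from; n <= to; n++) keep.add(n);
  }

  // Emit the kept lines in order, prefixed with their original line numbers.
  return Array.from(keep)
    .sort((a, b) => a - b)
    .map((n) => `${n}: ${lines[n - 1]}`)
    .join("\n");
}
```

A caller would run the rule's pattern over each file, pass the resulting ranges to `selectContext`, and only invoke the LLM for files where the result is not `null`.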

btw if anyone else is exploring similar integrations and wants to chat, feel free to DM me. Cheers 👋
