Support ignoring differences that only consist of variable/function name changes (eg. within minified JavaScript) #819

Open
0xdevalias opened this issue Jan 31, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@0xdevalias
Contributor

0xdevalias commented Jan 31, 2024

Is your feature request related to a problem? Please describe.

Currently, when diffing minified, bundled JavaScript, the diff contains a significant amount of 'noise' because the bundler often changes the minified variable names between builds. This obscures the real changes and makes the diff output less useful for understanding what actually changed.

Describe the solution you'd like

I propose adding a feature to diffsitter that ignores changes in variable/function names within minified JavaScript code. This would drastically reduce the noise in diffs of minified builds, keeping the focus on the actual code changes rather than on the churn of variable names.

Describe alternatives you've considered

As a workaround, I've experimented with git's alternative diff algorithms (patience, histogram, and minimal) to reduce the diff size. Changing the algorithm can significantly alter the number of lines in the diff output; for instance, default versus patience:

⇒ git diff --diff-algorithm=default -- unpacked/_next/static/chunks/pages/_app.js | wc -l
  116000

⇒ git diff --diff-algorithm=patience -- unpacked/_next/static/chunks/pages/_app.js | wc -l
   35826

Nonetheless, these approaches still capture variable name changes, which can introduce a substantial amount of 'noise', especially in larger files.

Other potential solutions include pre-processing the files to normalize variable/function names or post-processing the diff output to filter out sections where the only changes involve variable/function names.
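
A minimal sketch of the pre-processing idea, for illustration only. It is a regex tokenizer, not a JavaScript parser: regex literals and names inside template literals are not handled. The function name `normalize_names`, the `max_len` heuristic (treat identifiers of at most two characters as minified), and the abbreviated keyword list are all assumptions made for this example, not part of any existing tool.

```python
import re

# Assumption: an abbreviated keyword list; a real tool would rely on a full JS parser.
JS_KEYWORDS = {
    "as", "async", "await", "break", "case", "catch", "class", "const", "do",
    "else", "for", "function", "if", "in", "let", "new", "of", "return",
    "this", "try", "var", "void", "while", "yield",
}

# Strings, comments and numbers come first so names inside them are left alone.
TOKEN = re.compile(
    r"""
    (?P<skip>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`
             |//[^\n]*|/\*.*?\*/|\d[\w.]*)
  | (?P<prop>(?<!\.)\.\s*[A-Za-z_$][\w$]*)
  | (?P<newline>\n)
  | (?P<ident>[A-Za-z_$][\w$]*)
    """,
    re.VERBOSE | re.DOTALL,
)


def normalize_names(source: str, max_len: int = 2) -> str:
    """Replace short identifiers with placeholders numbered per line."""
    names: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        if match.lastgroup == "newline":
            names.clear()  # restart numbering so one edit cannot shift later lines
            return match.group()
        if match.lastgroup != "ident":
            return match.group()  # strings, comments, numbers, property accesses
        name = match.group()
        if name in JS_KEYWORDS or len(name) > max_len:
            return name  # keywords and longer (presumably unminified) names stay
        return names.setdefault(name, f"_{len(names)}")

    return TOKEN.sub(replace, source)


old_build = "function a(b,c){return b+c.length}"
new_build = "function x(y,z){return y+z.length}"
print(normalize_names(old_build))  # function _0(_1,_2){return _1+_2.length}
print(normalize_names(old_build) == normalize_names(new_build))  # True
```

Running both versions of a file through such a normalizer before handing them to `git diff` would leave only structural changes in the output. Restarting the numbering on every line is a trade-off: it keeps one inserted variable from renumbering the rest of the file, but it assumes the bundle has been pretty-printed first and it loses the link between a name's uses on different lines.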

Additional context

The ideal solution would provide diff output in text format, but the actual diffing would occur at the AST level, ignoring variable/function name changes.

I suspect this may already be possible, at least to some degree, with the following, though I haven't yet found any good examples/docs that explain how to use it:

  • https://github.com/afnanenayet/diffsitter
    • A tree-sitter based AST difftool to get meaningful semantic diffs

    • You can also filter which tree sitter nodes are considered in the diff through the config file.

    • https://github.com/afnanenayet/diffsitter#node-filtering
      • You can filter the nodes that are considered in the diff by setting include_nodes or exclude_nodes in the config file. exclude_nodes always takes precedence over include_nodes, and the type of a node is the kind of a tree-sitter node.

        This feature currently only applies to leaf nodes, but we could exclude nodes recursively if there's demand for it.
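
To make the quoted filtering rules concrete, here is a toy model of them: only leaf nodes are filtered, and `exclude_nodes` wins over `include_nodes`. This is not diffsitter's implementation. The `Node` class, the `diffable_leaves` and `expr` helpers, and the node kind names used (`identifier`, `number`, `binary_expression`) are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a tree-sitter node; `kind` mirrors the node's type name."""
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def diffable_leaves(root, include_nodes=None, exclude_nodes=frozenset()):
    """Collect (kind, text) for the leaves that survive filtering, in source order."""
    if not root.children:
        if root.kind in exclude_nodes:
            return []  # exclusion takes precedence over inclusion
        if include_nodes is not None and root.kind not in include_nodes:
            return []
        return [(root.kind, root.text)]
    leaves = []
    for child in root.children:  # inner nodes are never filtered, only walked
        leaves.extend(diffable_leaves(child, include_nodes, exclude_nodes))
    return leaves


def expr(name, value):
    """Hypothetical helper: build the tree for `<name> + <value>`."""
    return Node("binary_expression", children=[
        Node("identifier", name), Node("+", "+"), Node("number", value),
    ])


original, renamed, changed = expr("a", "1"), expr("b", "1"), expr("b", "2")
ignore_names = {"identifier"}

# A pure rename disappears once identifier leaves are excluded...
print(diffable_leaves(original, exclude_nodes=ignore_names)
      == diffable_leaves(renamed, exclude_nodes=ignore_names))
# ...while a changed literal still shows up as a difference.
print(diffable_leaves(original, exclude_nodes=ignore_names)
      == diffable_leaves(changed, exclude_nodes=ignore_names))
```

Under this model, excluding identifier leaves would hide renames, but it would also hide a real change that swaps one variable for another, so it is a coarser tool than name-aware matching.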

I'm hoping to play around with it a bit more now, but wanted to capture this while it was fresh in my mind.

See Also

@afnanenayet
Owner

So this works well with an idea I had before: allow users to supply tree-sitter queries to filter which nodes are diffed. That is general enough that you could filter for or against certain node types and, for example, ignore variable names.
