How should I measure the similarity of 2 SQLs? #2870

Closed, answered by izeigerman
CBQu asked this question in Q&A

The `diff` function (`sqlglot.diff.diff`) returns an edit script that also contains every node that was left unchanged. Those nodes are wrapped in `Keep`.

So the denominator (the upper bound) would be the length of the edit script, and the numerator would be the number of `Keep` entries in it. For example:

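```python
from sqlglot import parse_one
from sqlglot.diff import Keep, diff

# source_sql / target_sql are placeholder names for the two SQL strings being compared
edit_script = diff(parse_one(source_sql), parse_one(target_sql))
numerator = len([e for e in edit_script if isinstance(e, Keep)])  # unchanged nodes
denominator = len(edit_script)  # all edits, including Keep
score = numerator / denominator
```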
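The same calculation can be wrapped in a small reusable helper. This is only a sketch under assumptions: the name `keep_ratio` is hypothetical, the `keep_type` parameter is there so you can pass sqlglot's `Keep` class (or a stand-in class when testing), and returning 1.0 for an empty edit script is an arbitrary choice made here, not sqlglot behavior.

```python
from typing import Iterable, Type


def keep_ratio(edit_script: Iterable[object], keep_type: Type) -> float:
    """Fraction of edit-script entries that are unchanged (keep_type) nodes.

    Hypothetical helper: an empty script is treated as "identical" and
    scores 1.0 (an assumption; choose 0.0 or raise if you prefer).
    """
    edits = list(edit_script)
    if not edits:
        return 1.0
    # Count the entries that represent nodes left unchanged
    kept = sum(1 for edit in edits if isinstance(edit, keep_type))
    return kept / len(edits)
```

With sqlglot you would call it as `keep_ratio(diff(parse_one(sql_a), parse_one(sql_b)), Keep)`, where `sql_a` and `sql_b` are placeholder names. A score of 1.0 means every node was kept; lower scores mean a larger share of the script consists of inserts, removals, moves, or updates.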

Replies: 1 comment

Answer selected by georgesittas
Category: Q&A
2 participants