Use proptest in codegen tests #80

nicoabie · 2023-12-14T20:53:09Z

What kind of change does this PR introduce?

POC of using proptest to help finish the parser #51
What I want to show with this little PR is how easy it is to generate families of tests.

What is the current behavior?

Static unit tests

What is the new behavior?

Generated unit tests

Additional context

There are two crates, quickcheck and proptest; I found proptest's documentation easier for newcomers to understand. I've used property-based testing (PBT) in other languages (Prolog, Scheme, JavaScript and Python) but never in Rust, so this is my first time.

https://altsysrq.github.io/proptest-book/intro.html

psteinroe

Thanks for the contribution!

Unfortunately, there is not much value in these tests, because the tokens we receive from the lexer are exactly the same. For our implementation, whether we parse "select 1 from contact" or "select 48 from apple" makes no difference; it only makes a difference when the semantic meaning changes. I think property tests can still be useful, but not at this level. Do you have an idea how to apply them in a meaningful way?

nicoabie · 2023-12-17T14:33:18Z

The value is that you don't need to write out by hand every scenario that produces the same output from the lexer.

What are all the possible statements that produce vec![TokenProperty::from(SyntaxKind::Select)]?

select *;
select *
select 1;
select 1
select 'a';
select 'a'
select 1 as alias;
select 1 as alias
select 'a' as alias;
select 'a' as alias

And then all the combinations of the above that select more than one field.
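To put a number on the combinations just listed, here is a small dependency-free Rust sketch that enumerates the ten single-field statements above (`*`, `1`, `'a'`, the two aliased literals, each with and without a trailing `;`) and then their multi-field combinations. `enumerate_selects` and `FIELDS` are hypothetical names for illustration, not code from this PR.

```rust
// Hypothetical sketch: names are illustrative, not from the project.

/// The single-field shapes from the list above; `*` is never given an alias.
const FIELDS: [&str; 5] = ["*", "1", "'a'", "1 as alias", "'a' as alias"];

/// Every `select` statement with exactly `n_fields` fields drawn from FIELDS
/// (repetition allowed), each rendered with and without a trailing `;`.
fn enumerate_selects(n_fields: usize) -> Vec<String> {
    assert!(n_fields >= 1, "a select list needs at least one field");
    // Cartesian product, built one field position at a time.
    let mut lists: Vec<Vec<&str>> = vec![Vec::new()];
    for _ in 0..n_fields {
        let mut next = Vec::with_capacity(lists.len() * FIELDS.len());
        for list in &lists {
            for &field in FIELDS.iter() {
                let mut extended = list.clone();
                extended.push(field);
                next.push(extended);
            }
        }
        lists = next;
    }
    // Render each field list twice: with and without the semicolon.
    let mut statements = Vec::with_capacity(lists.len() * 2);
    for list in &lists {
        let body = format!("select {}", list.join(", "));
        statements.push(format!("{};", body));
        statements.push(body);
    }
    statements
}
```

For k fields this yields 2 × 5^k statements (10, 50, 250, ...), and that is before any other number, string or alias is substituted in.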

Maybe more? I didn't get into the details of how it works.

And now:

1 can be any number of 1, 2, 3, ..., N digits
'a' can be any valid sequence of chars
alias can be any valid sequence of chars
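A dependency-free sketch of generators for the first two of those values, assuming plain decimal integers and single-quoted string literals in which an embedded quote is written twice (standard SQL escaping, which Postgres follows). The tiny xorshift PRNG stands in for proptest's strategies, and the check is a round-trip property (render a value as SQL, read it back, get the same value), which does not depend on the lexer's token output. All names here are hypothetical; aliases are identifiers and would follow the same rules as table names.

```rust
// Hypothetical sketch: names are illustrative, not from the project.

/// Minimal xorshift64 PRNG so the sketch needs no crates; the seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// A value in 0..n (the slight modulo bias is fine for a sketch).
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// An integer literal with 1..=max_digits digits and no leading zero.
fn gen_int_literal(rng: &mut Rng, max_digits: u64) -> String {
    let digits = 1 + rng.below(max_digits);
    (0..digits)
        .map(|i| {
            let d = if i == 0 && digits > 1 { 1 + rng.below(9) } else { rng.below(10) };
            char::from(b'0' + d as u8)
        })
        .collect()
}

/// Arbitrary string contents, deliberately including single quotes.
fn gen_string(rng: &mut Rng, max_len: u64) -> String {
    const ALPHABET: &[u8] = b"ab z09'_";
    let len = rng.below(max_len + 1);
    (0..len)
        .map(|_| char::from(ALPHABET[rng.below(ALPHABET.len() as u64) as usize]))
        .collect()
}

/// Renders a string literal: wrap in quotes and double any embedded quote.
fn render_string_literal(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

/// Reads one string literal from the start of `sql`, returning its value and the rest.
fn read_string_literal(sql: &str) -> Option<(String, &str)> {
    let mut chars = sql.char_indices().peekable();
    if chars.next()?.1 != '\'' {
        return None;
    }
    let mut value = String::new();
    while let Some((i, c)) = chars.next() {
        if c == '\'' {
            if let Some(&(_, '\'')) = chars.peek() {
                chars.next(); // a doubled quote is an escaped quote
                value.push('\'');
            } else {
                return Some((value, &sql[i + 1..])); // closing quote
            }
        } else {
            value.push(c);
        }
    }
    None // unterminated literal
}

/// Property: reading back a rendered literal yields the original string and nothing more.
fn string_roundtrip_holds(seed: u64, cases: usize) -> bool {
    let mut rng = Rng(seed);
    (0..cases).all(|_| {
        let s = gen_string(&mut rng, 12);
        let rendered = render_string_literal(&s);
        let read_back = read_string_literal(&rendered);
        read_back == Some((s, ""))
    })
}
```

The reader is written independently of the renderer on purpose: a round-trip check is only informative when the two sides are not the same code inverted.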

Do 'contact' or 'apple' represent all the possible table names? Not really, so you can write a custom arbitrary that generates valid names respecting the Postgres constraints:

Length: Up to 63 characters.
Characters: Start with a letter or an underscore, followed by letters, numbers, or underscores.
Case: Case-insensitive for unquoted names (Postgres folds them to lowercase), but it's good practice to use lowercase to avoid confusion.
Reserved Words: Avoid using reserved words like "select," "insert," "update," etc.
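A dependency-free sketch of that custom generator, together with a checker that encodes the four rules above, so the property "every generated name is a valid identifier" can be run over many random cases. It assumes unquoted ASCII names generated in lowercase and uses a short, deliberately incomplete list of words to avoid; with proptest itself the generator would more likely be a regex string strategy such as "[a-z_][a-z0-9_]{0,62}" combined with a filter. Function and constant names are hypothetical.

```rust
// Hypothetical sketch: names are illustrative, not from the project.

/// Words to avoid: the examples from the list above plus a few more keywords.
/// Deliberately incomplete.
const AVOID: &[&str] = &["select", "insert", "update", "from", "where", "table", "and", "or"];
const FIRST: &[u8] = b"abcdefghijklmnopqrstuvwxyz_";
const REST: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789_";

/// Minimal xorshift64 PRNG so the sketch needs no crates; the seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Generates a lowercase, unquoted identifier of 1..=63 bytes that is not in AVOID.
fn gen_identifier(rng: &mut Rng) -> String {
    loop {
        let len = 1 + rng.below(63) as usize;
        let mut name = String::with_capacity(len);
        name.push(char::from(FIRST[rng.below(FIRST.len() as u64) as usize]));
        while name.len() < len {
            name.push(char::from(REST[rng.below(REST.len() as u64) as usize]));
        }
        // Retry rather than emit a word to avoid (a filter, in proptest terms).
        if !AVOID.contains(&name.as_str()) {
            return name;
        }
    }
}

/// Checks the rules listed above for an unquoted, lowercase identifier.
fn is_valid_identifier(name: &str) -> bool {
    let bytes = name.as_bytes();
    !bytes.is_empty()
        && bytes.len() <= 63
        && (bytes[0].is_ascii_lowercase() || bytes[0] == b'_')
        && bytes[1..].iter().all(|&b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_')
        && !AVOID.contains(&name)
}

/// Property: every generated identifier passes the checker.
fn identifiers_are_valid(seed: u64, cases: usize) -> bool {
    let mut rng = Rng(seed);
    (0..cases).all(|_| is_valid_identifier(&gen_identifier(&mut rng)))
}
```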

How many unit tests would you need to really make sure test_select_with_where works?
Let's look at all the possible combinations:

all the combinations of columns that go into test_simple_select
all the combinations of different table names that are valid in Postgres, as described in the previous section
all the different comparison operators (you can have a custom arbitrary)
right operands that can be numbers or strings (you can have a custom arbitrary)
that WHERE clause is a very simple one; it could have ANDs, ORs, etc. (you can have a custom arbitrary)
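The WHERE-clause dimension can be sketched as a recursive generator: leaves compare a column against a number or string literal, inner nodes join two sub-clauses with AND or OR, and a depth limit keeps inputs finite. This is dependency-free Rust again; the operator set, the fixed column names and the depth limit are illustrative assumptions. The property checked here concerns the generator itself (balanced parentheses, one connective fewer than comparisons); in a real test the rendered clause would be fed to the parser, whose expected output is not reproduced here.

```rust
// Hypothetical sketch: names are illustrative, not from the project.

/// Minimal xorshift64 PRNG so the sketch needs no crates; the seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

const OPS: &[&str] = &["=", "<>", "<", ">", "<=", ">="];
const COLUMNS: &[&str] = &["id", "name", "price"];

enum Where {
    Cmp { column: &'static str, op: &'static str, rhs: String },
    And(Box<Where>, Box<Where>),
    Or(Box<Where>, Box<Where>),
}

/// Generates a clause tree at most `depth` connectives deep.
fn gen_where(rng: &mut Rng, depth: u32) -> Where {
    // At depth 0, or with probability 1/2, emit a leaf comparison.
    if depth == 0 || rng.below(2) == 0 {
        let column = COLUMNS[rng.below(COLUMNS.len() as u64) as usize];
        let op = OPS[rng.below(OPS.len() as u64) as usize];
        // Right operand: a number or a (lowercase) string literal.
        let rhs = if rng.below(2) == 0 {
            rng.below(1000).to_string()
        } else {
            format!("'v{}'", rng.below(100))
        };
        return Where::Cmp { column, op, rhs };
    }
    let left = Box::new(gen_where(rng, depth - 1));
    let right = Box::new(gen_where(rng, depth - 1));
    if rng.below(2) == 0 { Where::And(left, right) } else { Where::Or(left, right) }
}

/// Renders SQL with explicit parentheses; keywords are uppercase, names lowercase.
fn render(w: &Where) -> String {
    match w {
        Where::Cmp { column, op, rhs } => format!("{} {} {}", column, op, rhs),
        Where::And(l, r) => format!("({} AND {})", render(l), render(r)),
        Where::Or(l, r) => format!("({} OR {})", render(l), render(r)),
    }
}

fn leaves(w: &Where) -> usize {
    match w {
        Where::Cmp { .. } => 1,
        Where::And(l, r) | Where::Or(l, r) => leaves(l) + leaves(r),
    }
}

fn balanced(s: &str) -> bool {
    let mut depth: i64 = 0;
    for c in s.chars() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth < 0 {
                    return false;
                }
            }
            _ => {}
        }
    }
    depth == 0
}

/// Property: rendered clauses are balanced and have one connective fewer than comparisons.
fn where_props_hold(seed: u64, cases: usize) -> bool {
    let mut rng = Rng(seed);
    (0..cases).all(|_| {
        let w = gen_where(&mut rng, 4);
        let sql = render(&w);
        let connectives = sql.matches(" AND ").count() + sql.matches(" OR ").count();
        balanced(&sql) && connectives + 1 == leaves(&w)
    })
}
```

Keeping keywords uppercase and generated names lowercase is what makes counting " AND " and " OR " in the rendered text unambiguous.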

That is the value of property testing: there is no way to write all those combinations by hand.
I guess it depends on how much confidence you want or need.

The question is: how do you make the LLM achieve that coverage of the problem domain?

nicoabie added 2 commits December 14, 2023 17:25

add proptest crate · 246c72d

add test with where · c5de57e

psteinroe requested changes Dec 17, 2023
