Use Local Type Inference approach to typecheck polymorphic calls #564

xxdavid · 2024-05-03T16:13:55Z

Hi!

After months of working on this, I finally come up with a PR that implements the ideas captured in #553. It behaves in the same way as I described in #553 (comment), I only changed the second error message a tiny bit.

The main changes are in the commit b00cadd. It implements the new approach to typechecking polymorphic calls based on Local Type Inference. I think it is quite straightforward but I already know the algorithm well so I am probably biased. However, it should be at least much simpler than the original constraint solver. I tried to document it in the wiki (together with the rigid type variables), so you may read it as documentation to the code. I'm also writing a thesis about this change, so I will link it to the wiki when it's out, to provide another source of documentation.

The main benefit of this change should be error messages that are understandable even when polymorphism is involved. Only in a few cases, a more complex error message shows up but even then, it should be clear what's going on. I think that if this new approach proves successful, we can even enable --solve_constraints by default in the future.

As a byproduct, all the "shape limitations" of the original constraint solver are gone (this could have been also solved by fixing the constraint generation). A type variable can now be instantiated to both a tuple and an atom without any problems.

Other big changes are in a3029e1. In this commit, I remove the constraints and clauses for flexible type variables from most of the code. Constraints are now generated and propagated only in functions related to typechecking polymorphic calls and functions related to subtyping. This should make the code simpler and easier to read. It also trivially solves all but one "don't drop the constraints" TODOs because constraints are now combined only in the subtyping functions and in type_check_poly_call/4. (Please do not squash commits when merging the PR as this commit would make the resulting commit unreadable.)

I also fixed some things along the way (fc432f2, 2323f10, e9a0afa) and added a lot of tests.

While the new approach is better in some cases (like the minus/1 in poly_fail.erl), it is also weaker in some other cases. I see the biggest limitation in that the type of anonymous functions passed to polymorphic functions is now inferred via type_check_expr/2 and not type_check_expr_in/2 as before, and thus some errors may not be discovered. This can, however, be solved in the future by solving the constraints gathered from all other arguments first and then typechecking the anonymous function using the information from the solved constraints. But I think it is better to have understandable errors than just more of them. Once we have this, let's then focus on inferring the type of anonymous functions passed to polymorphic functions.

The function allows it, only the spec didn't.

Consider a function spec: -spec negate(bool()) -> bool(); (integer()) -> integer(). Sometimes, we want to make a single function type out of these two clauses. Using an outer union like fun((bool()) -> bool()) | fun((integer()) -> integer()) leads to a loss of precision, because this type says that we either have a function that can take a bool, or we have a function that can take an integer (and we do not know which case it is). This means that we cannot apply a function of this type to anything unless we assert type or something like that. But we know for sure that it can take both, bools and integers! Therefore, using an inner union like fun((bool() | integer()) -> bool() | integer()) seems better to me as this way we know that it can take both. Of course, an ideal solution would be to use intersection types like fun((bool()) -> bool()) & fun((integer()) -> integer()), but Erlang doesn't have them in its syntax and so don't we in Gradualizer.

Union function types (`fun_ty_union`) do not seem to work. These simple examples should be perfectly valid. I think it throws that error because `expect_fun_type/3` inputs a list of function types (i.e., a clause), containing only a single element (the union of two function types). This leads to expecting an intersection function type later on (which is wrong). Note that these union function types aren't that much useful in practice as when you want to call them, all your argument types must be subtypes of parameter types of all the possible function types. That is, when you have a function of the type fun((integer()) -> integer()) | fun((atom()) -> atom()) then you can only call it with `none()` as we don't know whether the function will accept integers or bools and no other type is a subtype of `integer()` and `atom()`.

xxdavid · 2024-05-23T16:16:45Z

@zuiderkwast @erszcz I know you're busy, but do you think you'll have time for a code review?

erszcz · 2024-05-23T19:23:20Z

@xxdavid Sorry for the lack of responsiveness. Life got surprisingly busy the last few months. I can imagine it being quite frustrating given the terrific effort you've put into this work.

That being said, I've just gone through the wiki description - it's excellent!
I'll also go over the code of the PR, in the next 1-2 days, and try it out on some examples. Please bear with me.

xxdavid · 2024-05-23T20:19:05Z

No need to apologize! :) It's a volunteer project after all and I can imagine life getting busy.

Thank you for your kind words and for planning to go over it. I'll stay tuned.

erszcz

This PR is a fantastic piece of work, it's truly impressive!

I've left a few superficial comments, there's nothing serious I could find.

I've also tried it out one some examples and the new and improved error messages are great!

src/gradualizer_fmt.erl

src/constraints.hrl

src/gradualizer_lib.erl

src/typechecker.erl

We use the Local Type Argument Synthesis algorithm from https://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf (a bit extended for our use case). This replaces the global constraint generation and constraint solver. It should result in a much more precise type error messages (pinpointing the exact location, not just the function) and more streamlined code (refactor will be made in a subsequent commit). For more details, see the original paper or https://github.com/josefs/Gradualizer/wiki/Polymorphism#typechecking-polymorphic-calls

The following type spec from gradualizer_db uses a type variables (K) as a bound for another type variable (Key): -spec add_entries_to_map([{Key, Value}], #{K => V}) -> #{K => V} when Key :: K, Value :: V. Until now, we converted all Ks and Vs to rigid vars, but then Key and Value got replaced to plain vars K and S in solve_bounds/2. We must therefore convert the type variables to rigid type variables *after* the call to solve_bounds/2, which is what this commit does.

Flexible type variables may now appear only when typechecking a polymorphic call, in which case they are only in the type of that polymorphic function. No other expression may of be a type that contains a flexible type variable. Constraints are thus needed only in the subtype_with_constraints/3 function that gets called only from type_check_poly_call/4. This means that nothing but the subtyping functions and the type_check_poly_call/4 function needs to deal with constraints. Therefore, I am removing constraints from all the other places. I believe this is a great simplification of the code. It also trivially solves all the TODOs like "Don't drop the constraints". In such a huge refactoring, Gradualizer has proved to be an invaluable help. Via self-gradualization, it discovered numerous errors that I didn't notice. Hopefully, there are no undetected errors left. All tests are passing, the self-gradualization does not crash and 100_000 proptests have also passed.

And also remove the assert_type uses that were needed because of this error.

This quite a trivial change enables us to typecheck polymorphic functions like -spec generic_hd([A,...] | {A, any()}) -> A. or functions from Elixir's Enum module that use protocol polymorphism in addition to the parametric polymorphism: -spec enum_map([A] | map(), fun ((A) -> B)) -> [B]. which is really handy for Gradient and other Elixir frontends! Support for intersection polymorphic functions like -spec generic_hd([A,...]) -> A; ({A, any()}) -> A. is of course still missing. But at least some of these can now be typechecked when using a spec with the unions.

Since now we do not generate type variables, they don't need to be strings anymore.

It is already there since December 2022 (unless "subtype polymorphism" means bounded quantification).

They probably originate from the `fmt_location` description that uses hyphens to list possible values.

Removes the hyphens and add `--solve_constraints` flag.

lists:uniq/1 was introduced in OTP 25.

xxdavid · 2024-05-26T12:32:26Z

Thank you very much for the review! :) I've fixed all the issues and resolved the threads.

Can I merge this pull request now? Or would you also like to take a look, @zuiderkwast?

zuiderkwast

I don't have time to review properly, but I agree it's impressive.

zuiderkwast · 2024-05-26T12:44:47Z

src/typechecker.erl

 subtype(Ty1, Ty2, Env) ->
+    case subtype_with_constraints(Ty1, Ty2, Env) of
+        {true, _Cs} ->
+            %% There shouldn't be any constraint really.
+            %% We could assert that the constraints are empty but this could crash
+            %% Gradualizer even for end users if there was some error in our code,
+            %% it's probably safer not to assert and just drop the constraints
+            true;


Debug logging in verbose mode, would that be useful?

xxdavid added 4 commits April 19, 2024 15:32

Allow subst_ty/2 to take empty substitutions

16ad12e

The function allows it, only the spec didn't.

Document a known problem related to different normalization levels

8d70850

zuiderkwast assigned zuiderkwast and unassigned zuiderkwast May 3, 2024

erszcz approved these changes May 24, 2024

View reviewed changes

xxdavid added 13 commits May 26, 2024 14:25

Fix newly found self-check errors

35ab4fe

Fix type inference of try ... of expressions

b10b5fb

And also remove the assert_type uses that were needed because of this error.

Make type variables only atoms again

7d45b1b

Since now we do not generate type variables, they don't need to be strings anymore.

Remove code that was due to the constraint solver limitations

31caf2d

Remove the remark on missing subtype polymorpism support

a912cef

It is already there since December 2022 (unless "subtype polymorphism" means bounded quantification).

Make the --solve_constraints flag description a bit clearer

66bd8b5

Remove hyphens from flag descriptions

cad1047

They probably originate from the `fmt_location` description that uses hyphens to list possible values.

Update CLI synopsis in the README

8efca04

Removes the hyphens and add `--solve_constraints` flag.

Use lists:usort/1 instead of lists:uniq/1 on OTP < 25

aac8000

lists:uniq/1 was introduced in OTP 25.

xxdavid force-pushed the local_type_inference branch from 1ea043c to aac8000 Compare May 26, 2024 12:27

zuiderkwast approved these changes May 26, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use Local Type Inference approach to typecheck polymorphic calls #564

Use Local Type Inference approach to typecheck polymorphic calls #564

xxdavid commented May 3, 2024

xxdavid commented May 23, 2024

erszcz commented May 23, 2024

xxdavid commented May 23, 2024

erszcz left a comment

xxdavid commented May 26, 2024

zuiderkwast left a comment

zuiderkwast May 26, 2024

Use Local Type Inference approach to typecheck polymorphic calls #564

Are you sure you want to change the base?

Use Local Type Inference approach to typecheck polymorphic calls #564

Conversation

xxdavid commented May 3, 2024

xxdavid commented May 23, 2024

erszcz commented May 23, 2024

xxdavid commented May 23, 2024

erszcz left a comment

Choose a reason for hiding this comment

xxdavid commented May 26, 2024

zuiderkwast left a comment

Choose a reason for hiding this comment

zuiderkwast May 26, 2024

Choose a reason for hiding this comment