Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Local Type Inference approach to typecheck polymorphic calls #564

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

xxdavid
Copy link
Collaborator

@xxdavid xxdavid commented May 3, 2024

Hi!

After months of working on this, I finally come up with a PR that implements the ideas captured in #553. It behaves in the same way as I described in #553 (comment), I only changed the second error message a tiny bit.

The main changes are in the commit b00cadd. It implements the new approach to typechecking polymorphic calls based on Local Type Inference. I think it is quite straightforward but I already know the algorithm well so I am probably biased. However, it should be at least much simpler than the original constraint solver. I tried to document it in the wiki (together with the rigid type variables), so you may read it as documentation to the code. I'm also writing a thesis about this change, so I will link it to the wiki when it's out, to provide another source of documentation.

The main benefit of this change should be error messages that are understandable even when polymorphism is involved. Only in a few cases, a more complex error message shows up but even then, it should be clear what's going on. I think that if this new approach proves successful, we can even enable --solve_constraints by default in the future.

As a byproduct, all the "shape limitations" of the original constraint solver are gone (this could have been also solved by fixing the constraint generation). A type variable can now be instantiated to both a tuple and an atom without any problems.

Other big changes are in a3029e1. In this commit, I remove the constraints and clauses for flexible type variables from most of the code. Constraints are now generated and propagated only in functions related to typechecking polymorphic calls and functions related to subtyping. This should make the code simpler and easier to read. It also trivially solves all but one "don't drop the constraints" TODOs because constraints are now combined only in the subtyping functions and in type_check_poly_call/4. (Please do not squash commits when merging the PR as this commit would make the resulting commit unreadable.)

I also fixed some things along the way (fc432f2, 2323f10, e9a0afa) and added a lot of tests.

While the new approach is better in some cases (like the minus/1 in poly_fail.erl), it is also weaker in some other cases. I see the biggest limitation in that the type of anonymous functions passed to polymorphic functions is now inferred via type_check_expr/2 and not type_check_expr_in/2 as before, and thus some errors may not be discovered. This can, however, be solved in the future by solving the constraints gathered from all other arguments first and then typechecking the anonymous function using the information from the solved constraints. But I think it is better to have understandable errors than just more of them. Once we have this, let's then focus on inferring the type of anonymous functions passed to polymorphic functions.

The function allows it, only the spec didn't.
Consider a function spec:
    -spec negate(bool()) -> bool();
                (integer()) -> integer().

Sometimes, we want to make a single function type out of these two clauses.
Using an outer union like fun((bool()) -> bool()) | fun((integer()) -> integer())
leads to a loss of precision, because this type says that we either have
a function that can take a bool, or we have a function that can take an integer
(and we do not know which case it is). This means that we cannot apply a function
of this type to anything unless we assert type or something like that.

But we know for sure that it can take both, bools and integers! Therefore, using
an inner union like fun((bool() | integer()) -> bool() | integer()) seems better
to me as this way we know that it can take both.

Of course, an ideal solution would be to use intersection types like
fun((bool()) -> bool()) & fun((integer()) -> integer()), but Erlang doesn't have
them in its syntax and so don't we in Gradualizer.
Union function types (`fun_ty_union`) do not seem to work. These simple
examples should be perfectly valid. I think it throws that error because
`expect_fun_type/3` inputs a list of function types (i.e., a clause),
containing only a single element (the union of two function types).
This leads to expecting an intersection function type later on
(which is wrong).

Note that these union function types aren't that much useful in practice
as when you want to call them, all your argument types must be subtypes
of parameter types of all the possible function types. That is, when
you have a function of the type
    fun((integer()) -> integer()) | fun((atom()) -> atom())
then you can only call it with `none()` as we don't know whether the
function will accept integers or bools and no other type is a subtype
of `integer()` and `atom()`.
@xxdavid
Copy link
Collaborator Author

xxdavid commented May 23, 2024

@zuiderkwast @erszcz I know you're busy, but do you think you'll have time for a code review?

@erszcz
Copy link
Collaborator

erszcz commented May 23, 2024

@xxdavid Sorry for the lack of responsiveness. Life got surprisingly busy the last few months. I can imagine it being quite frustrating given the terrific effort you've put into this work.

That being said, I've just gone through the wiki description - it's excellent!
I'll also go over the code of the PR, in the next 1-2 days, and try it out on some examples. Please bear with me.

@xxdavid
Copy link
Collaborator Author

xxdavid commented May 23, 2024

No need to apologize! :) It's a volunteer project after all and I can imagine life getting busy.

Thank you for your kind words and for planning to go over it. I'll stay tuned.

Copy link
Collaborator

@erszcz erszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a fantastic piece of work, it's truly impressive!

I've left a few superficial comments, there's nothing serious I could find.

I've also tried it out one some examples and the new and improved error messages are great!

src/gradualizer_fmt.erl Outdated Show resolved Hide resolved
src/constraints.hrl Show resolved Hide resolved
src/gradualizer_lib.erl Show resolved Hide resolved
src/typechecker.erl Outdated Show resolved Hide resolved
src/typechecker.erl Outdated Show resolved Hide resolved
src/typechecker.erl Outdated Show resolved Hide resolved
src/typechecker.erl Outdated Show resolved Hide resolved
xxdavid added 13 commits May 26, 2024 14:25
We use the Local Type Argument Synthesis algorithm from
https://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf
(a bit extended for our use case).

This replaces the global constraint generation and constraint
solver. It should result in a much more precise type error
messages (pinpointing the exact location, not just the
function) and more streamlined code (refactor will be made
in a subsequent commit).

For more details, see the original paper or
https://github.com/josefs/Gradualizer/wiki/Polymorphism#typechecking-polymorphic-calls
The following type spec from gradualizer_db uses a type variables (K)
as a bound for another type variable (Key):

    -spec add_entries_to_map([{Key, Value}], #{K => V}) -> #{K => V}
                                             when Key :: K, Value :: V.

Until now, we converted all Ks and Vs to rigid vars, but then Key
and Value got replaced to plain vars K and S in solve_bounds/2.

We must therefore convert the type variables to rigid type variables
*after* the call to solve_bounds/2, which is what this commit does.
Flexible type variables may now appear only when typechecking a polymorphic call,
in which case they are only in the type of that polymorphic function. No other
expression may of be a type that contains a flexible type variable. Constraints
are thus needed only in the subtype_with_constraints/3 function that gets called
only from type_check_poly_call/4. This means that nothing but the subtyping
functions and the type_check_poly_call/4 function needs to deal with constraints.
Therefore, I am removing constraints from all the other places.

I believe this is a great simplification of the code. It also trivially solves
all the TODOs like "Don't drop the constraints".

In such a huge refactoring, Gradualizer has proved to be an invaluable help.
Via self-gradualization, it discovered numerous errors that I didn't notice.
Hopefully, there are no undetected errors left. All tests are passing, the
self-gradualization does not crash and 100_000 proptests have also passed.
And also remove the assert_type uses that were needed because of this error.
This quite a trivial change enables us to typecheck polymorphic
functions like

    -spec generic_hd([A,...] | {A, any()}) -> A.

or functions from Elixir's Enum module that use protocol
polymorphism in addition to the parametric polymorphism:

    -spec enum_map([A] | map(), fun ((A) -> B)) -> [B].

which is really handy for Gradient and other Elixir frontends!

Support for intersection polymorphic functions like

    -spec generic_hd([A,...]) -> A;
                    ({A, any()}) -> A.

is of course still missing. But at least some of these can now
be typechecked when using a spec with the unions.
Since now we do not generate type variables,
they don't need to be strings anymore.
It is already there since December 2022 (unless "subtype polymorphism"
means bounded quantification).
They probably originate from the `fmt_location` description
that uses hyphens to list possible values.
Removes the hyphens and add `--solve_constraints` flag.
lists:uniq/1 was introduced in OTP 25.
@xxdavid
Copy link
Collaborator Author

xxdavid commented May 26, 2024

Thank you very much for the review! :) I've fixed all the issues and resolved the threads.

Can I merge this pull request now? Or would you also like to take a look, @zuiderkwast?

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to review properly, but I agree it's impressive.

Comment on lines 186 to +193
subtype(Ty1, Ty2, Env) ->
case subtype_with_constraints(Ty1, Ty2, Env) of
{true, _Cs} ->
%% There shouldn't be any constraint really.
%% We could assert that the constraints are empty but this could crash
%% Gradualizer even for end users if there was some error in our code,
%% it's probably safer not to assert and just drop the constraints
true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug logging in verbose mode, would that be useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants