Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing dependencies and extra info to optimizer step docs #15054

Draft
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

cameel
Copy link
Member

@cameel cameel commented Apr 25, 2024

Depends on #15053. Merged.

The current optimizer docs are pretty incomplete when it comes to documenting prerequisites and recommended steps to run before/after. I need up to date info to verify that the new sequence (#15030) satisfies these requirements so I went over the docs and the docstrings of the steps in the code, compared them and amended the docs where necessary.

I also compiled a table listing them, which is what I will actually refer to when analyzing the sequence - doing it by reading the docs is too unwieldy.

@cameel cameel added documentation 📖 optimizer has dependencies The PR depends on other PRs that must be merged first labels Apr 25, 2024
@cameel cameel self-assigned this Apr 25, 2024
@cameel cameel marked this pull request as draft April 25, 2024 17:12
Comment on lines +174 to +179
=========================================== ===================================================== =============================================================== ===========================================
Step Hard prerequisites Steps that should run before Steps that should run after
=========================================== ===================================================== =============================================================== ===========================================
:ref:`a <ssa-transform>` Disambiguator, ForLoopInitRewriter ExpressionSplitter, CommonSubexpressionEliminator UnusedAssignEliminator
:ref:`C <conditional-simplifier>` Disambiguator SSATransform, DeadCodeEliminator
:ref:`c <common-subexpression-eliminator>` Disambiguator, ForLoopInitRewriter ExpressionSplitter, SSATransform UnusedPruner, UnusedAssignEliminator
Copy link
Member Author

@cameel cameel Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I wanted to just add the extra info to the table we already have in the optimizer docs, but our current CSS only allows very narrow tables. I even had to cut full names here to avoid the scrollbar hiding most of the info.

The best compromise I found was to create a new table in the cheatsheet. I don't like it very much but I like other options even less. I could maybe put it in the optimizer docs, but having two tables side by side seems weird. This is the kind of summarized information that actually fits the purpose of the cheatsheet, it's just that this is not the info users will be reaching for often, it will be mostly useful to us.

The nice thing is that here I can sort the steps alphabetically by their abbreviations, which is much more useful than the original order when decoding the sequences.

Copy link
Member

@ekpyron ekpyron May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you have the large comment exactly on the first line I wanted to comment on :-D...
Why the SSATransform here? There could be different implementations for the data flow analyzer, which might mean that pseudo-SSA form helps - but currently, if anything, it may get more slower due to larger numbers of variables, won't it?

EDIT: Ah, it's in the previous docs about it. And I see the point made there - not sure that makes much of a difference in practice and in almost all cases having ever run an SSA transform will do that (rather than having to be in pseudo-SSA form), but alright.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this table is not new information. It only summarizes info already available in step docs (in some cases amended with some extra info I pulled from the docstrings). This is what it says currently:

This step is especially efficient if the ExpressionSplitter is run
before. If the code is in pseudo-SSA form,
the values of variables are available for a longer time and thus we
have a higher chance of expressions to be replaceable.

So you're basically saying that there's some nuance missing in the current docs - because they do make it sound like you might want to use SSATransform here. In that case I'll try to adjust them based on what you said and then remove this from the table.

Now you have the large comment exactly on the first line I wanted to comment on :-D...

You know that you can make multiple separate review comments for the same line, right?

@@ -411,6 +411,8 @@ its visibility and it is impossible to reference variables defined in a differen
The benefit of this stage is that function definitions can be looked up more easily
and functions can be optimized in isolation without having to traverse the AST completely.

Prerequisites: Disambiguator.
Copy link
Member Author

@cameel cameel Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that not all steps list disambiguator as a prerequisite even though its own description above says that all of them depend on it.

BTW, not sure if listing prerequisites is all that useful since we have very few hard prerequisites and those are hard-coded anyway. Maybe it would be fine to remove this info. For now though we do list that info so I amended it for steps where it was available in the docstrings.


Prerequisites: Disambiguator.

Important: Can only be used on EVM code.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about usefulness of this either now that we only have an EVM backend, but since other steps already list this kind of info (e.g. Important: Introduces EVM opcodes and thus can only be used on EVM code for now for ControlFlowSimplifier) I added it where it was available for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code only has that, because we haven't removed all of it when removing the EWASM parts :-).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, this still info makes some sense unless we know for sure that there will never be backends other than EVM. I suspect it's incomplete so it may be not very useful even then, but I don't know for sure so I didn't want to remove it without some confirmation.

Comment on lines +1021 to +1028
Works best on code without literal arguments, which is the case when the code is in the
:ref:`expression-split <expression-splitter>` form.
Copy link
Member Author

@cameel cameel Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I based this addition on the docstrings, which says this:

* Works best if the code is in SSA form - without literal arguments.

Since SSATransform on its own does nothing about literal arguments, this basically means that that ExpressionSplitter is recommended before, right?

:ref:`o <for-loop-init-rewriter>` Disambiguator
:ref:`p <unused-function-parameter-pruner>` Disambiguator, FunctionHoister LiteralRematerialiser ExpressionInliner
:ref:`r <unused-assign-eliminator>` Disambiguator, ForLoopInitRewriter
:ref:`S <unused-store-eliminator>` Disambiguator, ForLoopInitRewriter SSATransform
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion we had with @nikola-matic in #14779 (comment), I think that CommonSubexpressionEliminator should be recommended before UnusedStoreEliminator. Anyone can confirm/deny?

Comment on lines 1168 to 1175
The inliner should be run afterwards to make sure that all references to ``f2`` are replaced by
``f``.
Copy link
Member Author

@cameel cameel Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this means ExpressionInliner, because it inlines functions of exactly this form that this step introduces. Is that correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's just coincidence here - if you have a function with three return arguments and only one of them is unused, so two remain, only the full inliner can inline this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That's very useful to know. I'll update the docs here.

@cameel cameel changed the title Optimizer docs add missing dependencies and info Add missing dependencies and extra info to optimizer step docs Apr 25, 2024
@cameel cameel force-pushed the optimizer-docs-style-cleanup branch from af0af5c to 125225c Compare April 27, 2024 12:06
@cameel cameel force-pushed the optimizer-docs-add-missing-dependencies-and-info branch 2 times, most recently from a7498de to cd9c124 Compare April 27, 2024 19:54
@cameel cameel force-pushed the optimizer-docs-style-cleanup branch from 125225c to 6702af2 Compare April 27, 2024 19:54
@cameel cameel force-pushed the optimizer-docs-add-missing-dependencies-and-info branch from cd9c124 to 9747d70 Compare April 27, 2024 20:02

To avoid unnecessary rewriting, it is recommended to run this step after StructuralSimplifier.

May destroy the :ref:`expression-split <expression-splitter>` form.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I inferred the info about which steps preserve the expression-split and SSA forms from the current docs. Please double check that I did not make a mistake here. Since most steps seem to preserve the forms, I used the convention of stating it only if the step can destroy it.

For example this step (ForLoopConditionIntoBody) may destroy expression-split form because it would transform something like this:

for { ... } iszero(a) { ... } {
    ...
}

into this:

for { ... } 1 { ... } {
    if iszero(iszero(a)) { break }
}

ExpressionSimplifier also seems to mostly preserve it but also has some rules that don't. For example the exp rule will transform this:

let _0 := 0
let a := calldataload(_0)
let b := not(_0)
let c := exp(b, a)
sstore(a, c)

into this:

let _0 := 0
let a := calldataload(_0)
let b := 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
let c := sub(iszero(and(a, 1)), and(a, 1))
sstore(a, c)

Comment on lines +1302 to +1329
During inlining, a heuristic is used to tell if the function call should be inlined or not.

A function call will never be inlined if:

- The function definition contains the ``leave`` opcode.
- The function is large and inlining is not *aggressive* (details below).
- The function definition contains any direct, recursive calls to itself or the call being inlined is recursive.
Note that indirect recursive calls (going through other functions) do not disqualify the function from inlining.
- Argument expressions passed into the call can potentially have side effects.
This check is very broad, rejecting anything that is not simply a literal or an identifier, since
the condition is easy to satisfy by running ExpressionSplitter beforehand.

Otherwise, if any of the following conditions is met, the call is inlined unconditionally:

- The function is tiny.
- The function is only ever called once.

In the remaining cases the decision to inline or not depends on the size of the function.
The size used here is not the actual bytecode size (which is not known to the optimizer), but rather
a score calculated at the Yul level, which approximates that size.
Inlining is performed if the size does not exceed a predefined threshold, which is a little higher
if the inlining is aggressive and yet higher when the call has some constant arguments.

The aggressive inlining is performed when:

- The :ref:`yul-memoryguard` is present.
- The function is not recursive (not even indirectly).
- The legacy Yul->EVM transform is not used (i.e. the target is not the ``homestead`` :ref:`EVM version <evm-version>`).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a summary of what FullInliner::shallInline() does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think it is good to document things into this level of detail :-). There is little merit in it, it only leads to people relying on some behaviour we want to be able to change and it's a maintenance hassle.

Copy link
Member Author

@cameel cameel May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I needed to know multiple times in the past and each time I just looked at the code of shallInline(), but it's nowhere as clear as having it spelled out in the docs and I never had time to dive into details so I actually never fully understood how it really worked. For example, that the m_noInlineFunctions check is simply a check for functions using leave. Or how some indirect recursive calls may actually be subject to inlining (there are three separate checks for recursive calls and it's not obvious how they differ). This time I needed it again to figure out why a function was not being inlined in one of my snippets. It always felt like something that should have been documented somewhere.

I could put it the docstring or inside FullInliner::shallInline(), but I really don't see much problem with having it in the docs proper. It's our main resource to understand how the optimizer works, not just for users but also for us and that fully justifies some extra effort spent keeping it up to date.

At the same time I don't think the extra detail really hurts users. It does not go into tiny details of how the inliner is implemented. It just describes the heuristic, which is useful to know when you're optimizing your code. But even then it omits all the specifics that could easily change, like numbers or details of the metric. There are step descriptions that are much more detailed than that. So if you're worried about users relying on them - we already have that problem. I don't think it's a good reason to avoid documenting things. I think that we in general lean too hard on that and I think it's one of the reasons why the code is so hard to get into.

In any case, I feel very strongly that the inliner heuristic should be documented.

@ekpyron
Copy link
Member

ekpyron commented May 6, 2024

The main problem is: do you really want to guarantee that, assuming we satisfy these ordering requirements, arbitrary sequences don't lead to buggy cases :-)? We did miss one such case before and filed it as low severity bug, since custom sequences are rarely used and a "you-need-to-know-what-you-do" case reflected in lack of docs. If we document this in detail now, the situation changes - and we don't test any of this for correctness (and also can't - there's no way we can cover the space here with tests - we do fuzz this, but the fuzzer here can't catch everything...)

I'll in any case, still have a look and check if anything strikes me as odd since this affects #15030.

We could also consider documenting this, but clearly stating that custom sequences are "at your own risk" anyways - even though that'd be a bit weird to start doing now out of the blue :-)... not sure yet :-).

:ref:`u <unused-pruner>` Disambiguator
:ref:`V <ssa-reverser>` Disambiguator CommonSubexpressionEliminator, UnusedPruner
:ref:`v <equivalent-function-combiner>` Disambiguator, FunctionHoister UnusedPruner
:ref:`x <expression-splitter>` ForLoopConditionIntoBody
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it may be true, it's a stretch to assume that this doesn't depend on the Disambiguator - the idea is that all steps do and there's no value on trying to be more precise than that. The same goes for the block flattener and the function hoister and the for loop init rewriter - basically the hgfo sequence we have due to that hardcoded and non-configurable in the sequence.
There's also not much point in omitting them for some steps as hard dependency, since disabling them is not a configuration option anyways :-).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And: slightly overapproximating is fully harmless here - while understating may lead to damage :-).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the info about prerequisites is not something my PR introduces. It only fills in the missing bits from the docstrings and also organizes them in a table so that it's now easy to how spotty this is, but most of that info was already there. Anyone reading the docs now will already come away with an impression that some steps don't have this prerequisite, because otherwise, why would it be mentioned for some of them.

So I guess I should take it as a confirmation of my suggestion in #15054 (comment) that it would be better to remove this info instead and add a general note that all steps should be assumed to depend on hgfo?

Copy link
Member Author

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem is: do you really want to guarantee that, assuming we satisfy these ordering requirements, arbitrary sequences don't lead to buggy cases :-)?

I really don't see how you can think that this PR changes anything in that regard. This long and detailed page with descriptions of optimizer steps already exists. We even have documented options for supplying arbitrary sequences, we already mostly document prerequisites, etc. The PR merely makes the info more consistent and summarizes it in a table. It does add some more detailed info from docstrings, but not that much of it and some steps already have even more detailed descriptions.

We could also consider documenting this, but clearly stating that custom sequences are "at your own risk" anyways - even though that'd be a bit weird to start doing now out of the blue :-)... not sure yet :-).

Then we should just add such a warning. I think that the page is already more detailed than a typical user needs, but I don't think we should remove all the useful info from it. That would just hurt us. For me, for example, it's the primary resource I refer to when I need to understand how the optimizer works. Maybe it should not be a part of the main docs if that prevents us from documenting things, but having this info collected in one place like that is immensely useful. Thanks to this I know more about the optimizer than about some other components, e.g. the EVM transform. I wish we had even more pages about internals like that somewhere.

For the time being it's a part of the docs, so I think it should be kept up to date there until/if we decide to move it elsewhere.

:ref:`u <unused-pruner>` Disambiguator
:ref:`V <ssa-reverser>` Disambiguator CommonSubexpressionEliminator, UnusedPruner
:ref:`v <equivalent-function-combiner>` Disambiguator, FunctionHoister UnusedPruner
:ref:`x <expression-splitter>` ForLoopConditionIntoBody
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the info about prerequisites is not something my PR introduces. It only fills in the missing bits from the docstrings and also organizes them in a table so that it's now easy to how spotty this is, but most of that info was already there. Anyone reading the docs now will already come away with an impression that some steps don't have this prerequisite, because otherwise, why would it be mentioned for some of them.

So I guess I should take it as a confirmation of my suggestion in #15054 (comment) that it would be better to remove this info instead and add a general note that all steps should be assumed to depend on hgfo?

Comment on lines +174 to +179
=========================================== ===================================================== =============================================================== ===========================================
Step Hard prerequisites Steps that should run before Steps that should run after
=========================================== ===================================================== =============================================================== ===========================================
:ref:`a <ssa-transform>` Disambiguator, ForLoopInitRewriter ExpressionSplitter, CommonSubexpressionEliminator UnusedAssignEliminator
:ref:`C <conditional-simplifier>` Disambiguator SSATransform, DeadCodeEliminator
:ref:`c <common-subexpression-eliminator>` Disambiguator, ForLoopInitRewriter ExpressionSplitter, SSATransform UnusedPruner, UnusedAssignEliminator
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this table is not new information. It only summarizes info already available in step docs (in some cases amended with some extra info I pulled from the docstrings). This is what it says currently:

This step is especially efficient if the ExpressionSplitter is run
before. If the code is in pseudo-SSA form,
the values of variables are available for a longer time and thus we
have a higher chance of expressions to be replaceable.

So you're basically saying that there's some nuance missing in the current docs - because they do make it sound like you might want to use SSATransform here. In that case I'll try to adjust them based on what you said and then remove this from the table.

Now you have the large comment exactly on the first line I wanted to comment on :-D...

You know that you can make multiple separate review comments for the same line, right?


Prerequisites: Disambiguator.

Important: Can only be used on EVM code.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, this still info makes some sense unless we know for sure that there will never be backends other than EVM. I suspect it's incomplete so it may be not very useful even then, but I don't know for sure so I didn't want to remove it without some confirmation.

Comment on lines 1168 to 1175
The inliner should be run afterwards to make sure that all references to ``f2`` are replaced by
``f``.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That's very useful to know. I'll update the docs here.

Comment on lines +1302 to +1329
During inlining, a heuristic is used to tell if the function call should be inlined or not.

A function call will never be inlined if:

- The function definition contains the ``leave`` opcode.
- The function is large and inlining is not *aggressive* (details below).
- The function definition contains any direct, recursive calls to itself or the call being inlined is recursive.
Note that indirect recursive calls (going through other functions) do not disqualify the function from inlining.
- Argument expressions passed into the call can potentially have side effects.
This check is very broad, rejecting anything that is not simply a literal or an identifier, since
the condition is easy to satisfy by running ExpressionSplitter beforehand.

Otherwise, if any of the following conditions is met, the call is inlined unconditionally:

- The function is tiny.
- The function is only ever called once.

In the remaining cases the decision to inline or not depends on the size of the function.
The size used here is not the actual bytecode size (which is not known to the optimizer), but rather
a score calculated at the Yul level, which approximates that size.
Inlining is performed if the size does not exceed a predefined threshold, which is a little higher
if the inlining is aggressive and yet higher when the call has some constant arguments.

The aggressive inlining is performed when:

- The :ref:`yul-memoryguard` is present.
- The function is not recursive (not even indirectly).
- The legacy Yul->EVM transform is not used (i.e. the target is not the ``homestead`` :ref:`EVM version <evm-version>`).
Copy link
Member Author

@cameel cameel May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I needed to know multiple times in the past and each time I just looked at the code of shallInline(), but it's nowhere as clear as having it spelled out in the docs and I never had time to dive into details so I actually never fully understood how it really worked. For example, that the m_noInlineFunctions check is simply a check for functions using leave. Or how some indirect recursive calls may actually be subject to inlining (there are three separate checks for recursive calls and it's not obvious how they differ). This time I needed it again to figure out why a function was not being inlined in one of my snippets. It always felt like something that should have been documented somewhere.

I could put it the docstring or inside FullInliner::shallInline(), but I really don't see much problem with having it in the docs proper. It's our main resource to understand how the optimizer works, not just for users but also for us and that fully justifies some extra effort spent keeping it up to date.

At the same time I don't think the extra detail really hurts users. It does not go into tiny details of how the inliner is implemented. It just describes the heuristic, which is useful to know when you're optimizing your code. But even then it omits all the specifics that could easily change, like numbers or details of the metric. There are step descriptions that are much more detailed than that. So if you're worried about users relying on them - we already have that problem. I don't think it's a good reason to avoid documenting things. I think that we in general lean too hard on that and I think it's one of the reasons why the code is so hard to get into.

In any case, I feel very strongly that the inliner heuristic should be documented.

@cameel cameel force-pushed the optimizer-docs-style-cleanup branch from 6702af2 to 9599d85 Compare May 8, 2024 15:17
@cameel cameel force-pushed the optimizer-docs-add-missing-dependencies-and-info branch from 9747d70 to 098075d Compare May 8, 2024 15:25
Base automatically changed from optimizer-docs-style-cleanup to develop May 8, 2024 15:57
@cameel cameel removed the has dependencies The PR depends on other PRs that must be merged first label May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants