Remove superfluous initial iteration of repeating parts of optimizer sequence #14854
Conversation
Looks like the actual differences are minimal, if any, and go in both directions. For most jobs they are within the margin of error.
The only one that stands out is the OpenZeppelin external test, where it's a 12% reduction, though this could still be just CI variance.
Force-pushed from 0a00253 to 14699de
@@ -38,7 +38,7 @@ contract test {
 }
 // ----
 // constructor(), 20 wei ->
-// gas irOptimized: 252642
+// gas irOptimized: 260928
The change does affect some gas costs, but very minimally. The only test where the difference is somewhat significant is this one (and the similar gas_and_value_brace_syntax.sol), where it's ~3%.
Have you looked at the diff of optimized yul? Not sure how well it'd compare, though...
It's also generally a problem that the code size metric used for this is merely a rather coarse estimate in itself...
Not yet. I'll check this.
Given the very few gas changes in this PR - could it be that the first pass retains the same code size, whilst actually optimizing the code - i.e. altering it, and then only reduces it during the second iteration?
This is the relevant part of the diff:
revert(0, 0x24)
}
mstore(_2, _1)
+ _4 := 0
}
let memPos := mload(_2)
mstore(memPos, 0x01)
return(memPos, 32)
}
case 0x96dfcbea {
- if callvalue() { revert(0, 0) }
- if slt(add(calldatasize(), not(3)), 0) { revert(0, 0) }
- let value_1 := and(sload(0), sub(shl(160, 1), 1))
- let _5 := mload(_2)
- mstore(_5, shl(228, 0x0f963393))
- let _6 := call(gas(), value_1, 0, _5, _3, _5, 32)
- if iszero(_6)
+ if callvalue() { revert(_4, _4) }
+ if slt(add(calldatasize(), not(3)), _4) { revert(_4, _4) }
+ let value_1 := and(sload(_4), sub(shl(160, 1), 1))
+ let _6 := mload(_2)
+ mstore(_6, shl(228, 0x0f963393))
+ let _7 := call(gas(), value_1, _4, _6, _3, _6, 32)
+ if iszero(_7)
{
let pos_1 := mload(_2)
- returndatacopy(pos_1, 0, returndatasize())
+ returndatacopy(pos_1, _4, returndatasize())
revert(pos_1, returndatasize())
}
- let expr := 0
- if _6
+ let expr := _4
+ if _7
{
- let _7 := 32
- if gt(32, returndatasize()) { _7 := returndatasize() }
- finalize_allocation(_5, _7)
- if slt(sub(add(_5, _7), _5), 32) { revert(0, 0) }
- let value_2 := mload(_5)
- if iszero(eq(value_2, iszero(iszero(value_2)))) { revert(0, 0) }
+ let _8 := 32
+ if gt(32, returndatasize()) { _8 := returndatasize() }
+ finalize_allocation(_6, _8)
+ if slt(sub(add(_6, _8), _6), 32) { revert(_4, _4) }
+ let value_2 := mload(_6)
+ if iszero(eq(value_2, iszero(iszero(value_2)))) { revert(_4, _4) }
expr := value_2
}
let var_myBal := selfbalance()
I.e. it seems that a bunch of `0` literals used to be materialized and now remain as variables. Which is odd - I wanted to suggest maybe adding another `T` to the sequence, but it's right there in the cleanup: `fDnTOcmu`.
Force-pushed from 14699de to a2cabcb
// f()
// f()
// f_72()
// f_73()
// sstore(0, 1)
// }
// function f()
// {
// let a := 1
// let b := 10
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
// function f_72()
// {
// let a := 2
// let b := 10
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
// function f_73()
// {
// let a := 3
// let b := 10
// let a := calldataload(0)
// let _1 := iszero(a)
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// mstore(a, 0)
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
Looks like the changes show up not only as gas differences. Two tests also got updated output.
In this case it looks like a clear downgrade. The optimizer is definitely leaving some possible optimizations on the table. Still, not sure how common this is in practice. I'm going to play with this when tweaking the optimizer sequence.
I mean, the optimizer is mainly being rather stupid here - it specializes all three calls and then trusts that, since the arguments are unused, the result will collapse to identical functions that can be deduplicated... The simpler choice would have been to not specialize in the first place.
For cases like this it's not worth preserving optimizations that work out by chance, even if someone thought it funny to include one as a test case :-).
Force-pushed from 5a544bd to 5ea8fa2
Force-pushed from 5ea8fa2 to 55cb7a7
Force-pushed from 55cb7a7 to 481c3be
Gas diff stats

External tests:

project | bytecode_size | deployment_gas | method_gas
---|---|---|---
brink | 0% | |
colony | 0% | |
elementfi | +0.21% ❌ | |
ens | -0.5% ✅ | -0% | 0%
euler | +0.01% ❌ | |
gnosis | | |
gp2 | 0% | |
perpetual-pools | +0% | +0% | -0.02% ✅
pool-together | -0.14% ✅ | |
uniswap | -0.01% ✅ | |
yield_liquidator | +0.01% ❌ | +0.01% ❌ | +0%
zeppelin | -0.01% ✅ | -0.04% ✅ | -0.06% ✅
Compilation time
Project | Before | After | Diff |
---|---|---|---|
brink | 5 s | 5 s | 0 s |
colony | 105 s | 117 s | 12 s |
elementfi | 172 s | 157 s | -15 s |
ens | 40 s | 39 s | -1 s |
euler | 52 s | 60 s | 8 s |
gnosis | | |
gp2 | 42 s | 46 s | 4 s |
perpetual-pools | 81 s | 77 s | -4 s |
pool-together | 56 s | 46 s | -10 s |
uniswap | 82 s | 73 s | -9 s |
yield_liquidator | 26 s | 38 s | 12 s |
zeppelin | 261 s | 245 s | -16 s |
Quick and dirty non-CI benchmark

Since CI timing is almost useless here due to a huge variance, I tried to benchmark it manually using a few projects that can be compiled right away via foundry.

script:

function benchmark_foundry_project {
local subdir="$1"
local url="$2"
local solc_path="$3"
git clone "$url" "$subdir" --depth=1
pushd "$subdir"
forge install
/usr/bin/time --output ../timing.json --append --format "{\"$subdir\": {\"real\": %e, \"user\": %U, \"sys\": %S}}" \
forge build --use "$solc_path" --optimize --via-ir --evm-version cancun --offline --no-cache
popd
rm -rf "./${subdir}"
}
rm timing.json
solc_binary=/tmp/solc-develop
benchmark_foundry_project uniswap-before https://github.com/Uniswap/v4-core "$solc_binary"
benchmark_foundry_project seaport-before https://github.com/ProjectOpenSea/seaport-core "$solc_binary"
solc_binary=/tmp/solc-fix-superfluous-iterations-in-optimizer-sequence
benchmark_foundry_project uniswap-after https://github.com/Uniswap/v4-core "$solc_binary"
benchmark_foundry_project seaport-after https://github.com/ProjectOpenSea/seaport-core "$solc_binary"
cat timing.json | jq --slurp add

I wanted to include OpenZeppelin as well, but it sadly fails with "Stack Too Deep" for both binaries (despite the memory guard being present). Same with a 0.8.25 release binary.

Results

Looks like this has a lot of variance too, but there's still a clear and reproducible difference in favor of the PR, on the order of 10%.
Force-pushed from 481c3be to 6223915
Well, if you say this is a 10% improvement in compilation time, let's just merge this for now.
Force-pushed from 6223915 to a4b5fb0
Yeah. It's not a large sample, but it was reproducible when I ran it several times (see the table), so I think it's legit. BTW, I had to resolve a changelog conflict - please reapprove.
Force-pushed from a4b5fb0 to 2a40919
…topping after the first cycle
Force-pushed from 2a40919 to c9e2c20
The optimizer will repeat each bracketed segment of the optimizer step sequence at least twice, because the cost is only measured after the first iteration. This results in unnecessary repetitions in some cases.
Hard to tell how much it actually affects optimization time in practice. From my observations, a single iteration is often insufficient, so many contracts would not be affected. On the other hand, for the lucky ones where one iteration is enough, it could cut optimization time almost in half. I need to see CI timings to say more.
Looks like I unwittingly introduced this regression back in #12102. Originally the loop looked like this, and I moved the check to the end without updating the initialization:
solidity/libyul/optimiser/Suite.cpp, lines 84 to 92 in f35694f
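The shape of the fix can be sketched in a self-contained way (this is a hedged illustration, not the actual Suite.cpp code; `runSteps` and `codeSize` are hypothetical stand-ins for running the bracketed step segment and measuring the cost): measuring the cost only after the first pass forces at least two passes, while taking an initial measurement before the loop lets it stop as soon as a pass yields no improvement.

```cpp
#include <cstddef>
#include <functional>

// Sketch: repeat the step sequence until the measured code size stops
// shrinking. Initializing `sizeBefore` with a real measurement (rather
// than a "worse than anything" sentinel, which made the first comparison
// always look like an improvement) is what allows the loop to terminate
// after a single pass when that pass already reaches a fixed point.
size_t runRepeatedly(std::function<void()> runSteps, std::function<size_t()> codeSize)
{
    size_t iterations = 0;
    size_t sizeBefore = codeSize(); // measure cost before the first pass
    while (true)
    {
        runSteps();
        ++iterations;
        size_t sizeAfter = codeSize();
        if (sizeAfter >= sizeBefore)
            break; // no improvement -> stop repeating
        sizeBefore = sizeAfter;
    }
    return iterations;
}
```

With the sentinel-style initialization the loop body would have run at least twice even for code that one pass already optimizes fully, which is exactly the superfluous iteration this PR removes.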