Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove superfluous initial iteration of repeating parts of optimizer sequence #14854

Merged
merged 1 commit into from May 10, 2024

Conversation

cameel
Copy link
Member

@cameel cameel commented Feb 14, 2024

The optimizer will repeat each bracketed segment of the optimizer step sequence at least twice, because the cost is only measured after the first iteration. This results in unnecessary repetitions in some cases.

Hard to tell how much it actually affects the optimization time in practice. From my observations, a single iteration is often insufficient so many contracts would not be affected. On the other hand, for the lucky ones where one iteration is enough, it could cut optimization time almost in half. I need to see CI timings to say more.

Looks like I unwittingly introduced this regression back in #12102. Originally this loop looked like this and I moved the check to the end without updating the initialization:

size_t codeSize = 0;
for (size_t rounds = 0; rounds < 12; ++rounds)
{
{
size_t newSize = CodeSize::codeSizeIncludingFunctions(ast);
if (newSize == codeSize)
break;
codeSize = newSize;
}

@cameel cameel self-assigned this Feb 14, 2024
@cameel
Copy link
Member Author

cameel commented Feb 14, 2024

Looks like the actual differences are minimal, if any, and go in both directions. For most jobs they are within the margin of error.

Job develop this PR
b_bytecode_ubu-via-ir-optimize 9m 39s 10m 12s
b_bytecode_osx-via-ir-optimize 4m 23s 4m 18s
t_native_test_ext_zeppelin 12m 17s 10m 49s
t_native_test_ext_brink 1m 21s 1m 40s
t_native_test_ext_elementfi 4m 41s 4m 16s
t_native_test_ext_ens 2m 25s 2m 41s
t_native_test_ext_perpetual_pools 18m 52s 19m 16s
t_native_test_ext_prb_math 4m 10s 4m 9s
t_native_test_ext_uniswap 9m 35s 9m 22s
t_native_test_ext_yield_liquidator 2m 26s 2m 36s

The only one that stands out is the OpenZeppelin external test, where it's a 12% reduction, though, this could still be just CI variance.

@cameel cameel changed the title Fix superfluous initial iteration of repeating parts of optimizer sequence Remove superfluous initial iteration of repeating parts of optimizer sequence Feb 14, 2024
@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch 2 times, most recently from 0a00253 to 14699de Compare February 14, 2024 19:02
@@ -38,7 +38,7 @@ contract test {
}
// ----
// constructor(), 20 wei ->
// gas irOptimized: 252642
// gas irOptimized: 260928
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change does affect some gas costs but very minimally. The only test where difference is somewhat significant is this one (and the similar gas_and_value_brace_syntax.sol) where it's ~3%.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked at the diff of optimized yul? Not sure how well it'd compare, though...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also generally a problem that the code size metric used for this is merely a rather coarse estimate in itself...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. I'll check this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the very few gas changes in this PR - could it be that the first pass retains the same code size, whilst actually optimizing the code - i.e. altering it, and then only reduces it during the second iteration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the relevant part of the diff:

                                 revert(0, 0x24)
                             }
                             mstore(_2, _1)
+                            _4 := 0
                         }
                         let memPos := mload(_2)
                         mstore(memPos, 0x01)
                         return(memPos, 32)
                     }
                     case 0x96dfcbea {
-                        if callvalue() { revert(0, 0) }
-                        if slt(add(calldatasize(), not(3)), 0) { revert(0, 0) }
-                        let value_1 := and(sload(0), sub(shl(160, 1), 1))
-                        let _5 := mload(_2)
-                        mstore(_5, shl(228, 0x0f963393))
-                        let _6 := call(gas(), value_1, 0, _5, _3, _5, 32)
-                        if iszero(_6)
+                        if callvalue() { revert(_4, _4) }
+                        if slt(add(calldatasize(), not(3)), _4) { revert(_4, _4) }
+                        let value_1 := and(sload(_4), sub(shl(160, 1), 1))
+                        let _6 := mload(_2)
+                        mstore(_6, shl(228, 0x0f963393))
+                        let _7 := call(gas(), value_1, _4, _6, _3, _6, 32)
+                        if iszero(_7)
                         {
                             let pos_1 := mload(_2)
-                            returndatacopy(pos_1, 0, returndatasize())
+                            returndatacopy(pos_1, _4, returndatasize())
                             revert(pos_1, returndatasize())
                         }
-                        let expr := 0
-                        if _6
+                        let expr := _4
+                        if _7
                         {
-                            let _7 := 32
-                            if gt(32, returndatasize()) { _7 := returndatasize() }
-                            finalize_allocation(_5, _7)
-                            if slt(sub(add(_5, _7), _5), 32) { revert(0, 0) }
-                            let value_2 := mload(_5)
-                            if iszero(eq(value_2, iszero(iszero(value_2)))) { revert(0, 0) }
+                            let _8 := 32
+                            if gt(32, returndatasize()) { _8 := returndatasize() }
+                            finalize_allocation(_6, _8)
+                            if slt(sub(add(_6, _8), _6), 32) { revert(_4, _4) }
+                            let value_2 := mload(_6)
+                            if iszero(eq(value_2, iszero(iszero(value_2)))) { revert(_4, _4) }
                             expr := value_2
                         }
                         let var_myBal := selfbalance()

I.e. it seems that a bunch of 0 literals used to be materialized and now remain as variables.

Which is odd - I wanted to suggest maybe adding another T to the sequence but it's right there in the cleanup : fDnTOcmu.

@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch from 14699de to a2cabcb Compare February 14, 2024 21:08
Comment on lines -23 to 65
// f()
// f()
// f_72()
// f_73()
// sstore(0, 1)
// }
// function f()
// {
// let a := 1
// let b := 10
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
// function f_72()
// {
// let a := 2
// let b := 10
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
// function f_73()
// {
// let a := 3
// let b := 10
// let a := calldataload(0)
// let _1 := iszero(a)
// let a_1 := calldataload(0)
// let _1 := iszero(a_1)
// for { } iszero(b) { b := add(b, not(0)) }
// {
// mstore(a, 0)
// a := a_1
// mstore(a_1, 0)
// if _1 { leave }
// }
// }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the changes show up not only as gas differences. Two tests also got updated output.

In this case it looks like a clear downgrade. The optimizer is definitely leaving some possible optimizations on the table. Still, not sure how common this is in practice. I'm going to play with this when tweaking the optimizer sequence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, the optimizer is mainly being rather stupid here - it specializes all three calls and then trusts that since the arguments are unused, the result will collapse to identical functions that can be deduplicated... the simpler choice would have been to not specialize in the first place.
For cases like this it's not worth preserving optimizations that work out by chance. Even if someone thought it funny to include it as test case :-).

libyul/optimiser/Suite.cpp Show resolved Hide resolved
@@ -38,7 +38,7 @@ contract test {
}
// ----
// constructor(), 20 wei ->
// gas irOptimized: 252642
// gas irOptimized: 260928
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the very few gas changes in this PR - could it be that the first pass retains the same code size, whilst actually optimizing the code - i.e. altering it, and then only reduces it during the second iteration?

@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch 2 times, most recently from 5a544bd to 5ea8fa2 Compare March 5, 2024 17:10
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Mar 20, 2024
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label Mar 20, 2024
@ethereum ethereum deleted a comment from github-actions bot Mar 25, 2024
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Apr 9, 2024
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label Apr 9, 2024
@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch from 5ea8fa2 to 55cb7a7 Compare April 18, 2024 08:07
@ethereum ethereum deleted a comment from github-actions bot Apr 18, 2024
@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch from 55cb7a7 to 481c3be Compare April 22, 2024 14:17
@cameel
Copy link
Member Author

cameel commented Apr 22, 2024

Gas diff stats

File name IR optimized Legacy optimized Legacy
functionCall/gas_and_value_brace_syntax.sol 3%
functionCall/gas_and_value_basic.sol 3%
externalContracts/deposit_contract.sol +0%
libraries/internal_types_in_library.sol +0%
userDefinedValueType/calldata.sol +0%
types/mapping/copy_from_mapping_to_mapping.sol +0%
array/copying/array_to_mapping.sol +0%
array/copying/array_elements_to_mapping.sol +0%
array/copying/nested_array_of_structs_memory_to_storage.sol +0%
array/copying/elements_of_nested_array_of_structs_memory_to_storage.sol +0%
various/skip_dynamic_types_for_structs.sol +0%
externalContracts/snark.sol +0%
array/copying/nested_array_of_structs_calldata_to_storage.sol +0%
array/copying/elements_of_nested_array_of_structs_calldata_to_storage.sol +0%
array/copying/storage_memory_packed_dyn.sol -0%
structs/copy_to_mapping.sol -0%
structs/copy_substructures_from_mapping.sol -0%
structs/copy_substructures_to_mapping.sol -0%
array/copying/calldata_array_to_mapping.sol -0%

@cameel
Copy link
Member Author

cameel commented Apr 22, 2024

External tests (ir-optimize-evm+yul)

project bytecode_size deployment_gas method_gas
brink 0%
colony 0%
elementfi +0.21% ❌
ens -0.5% ✅ -0% 0%
euler +0.01% ❌
gnosis
gp2 0%
perpetual-pools +0% +0% -0.02% ✅
pool-together -0.14% ✅
uniswap -0.01% ✅
yield_liquidator +0.01% ❌ +0.01% ❌ +0%
zeppelin -0.01% ✅ -0.04% ✅ -0.06% ✅

Compilation time

Project Before After Diff
brink 5 s 5 s 0 s
colony 105 s 117 s 12 s
elementfi 172 s 157 s -15 s
ens 40 s 39 s -1 s
euler 52 s 60 s 8 s
gnosis
gp2 42 s 46 s 4 s
perpetual-pools 81 s 77 s -4 s
pool-together 56 s 46 s -10 s
uniswap 82 s 73 s -9 s
yield_liquidator 26 s 38 s 12 s
zeppelin 261 s 245 s -16 s

@cameel
Copy link
Member Author

cameel commented Apr 22, 2024

Quick and dirty non-CI benchmark

Since CI timing is almost useless here due to a huge variance, I tried to benchmark it manually using a few projects that can be compiled right away via foundry.

script
function benchmark_foundry_project {
    local subdir="$1"
    local url="$2"
    local solc_path="$3"

    git clone "$url" "$subdir" --depth=1
    pushd "$subdir"
    forge install
    /usr/bin/time --output ../timing.json --append --format "{\"$subdir\": {\"real\": %e, \"user\": %U, \"sys\": %S}}" \
        forge build --use "$solc_path" --optimize --via-ir --evm-version cancun --offline --no-cache
    popd
    rm -rf "./${subdir}"
}

rm timing.json

solc_binary=/tmp/solc-develop
benchmark_foundry_project uniswap-before https://github.com/Uniswap/v4-core "$solc_binary"
benchmark_foundry_project seaport-before https://github.com/ProjectOpenSea/seaport-core "$solc_binary"

solc_binary=/tmp/solc-fix-superfluous-iterations-in-optimizer-sequence
benchmark_foundry_project uniswap-after https://github.com/Uniswap/v4-core "$solc_binary"
benchmark_foundry_project seaport-after https://github.com/ProjectOpenSea/seaport-core "$solc_binary"

cat timing.json | jq --slurp add

I wanted to include OpenZeppelin as well but it sadly fails with "Stack Too Deep" for both binaries (despite the memory guard being present). Same with a 0.8.25 release binary.

Results

Project run develop this PR
uniswap 1 308 s 280 s
uniswap 2 296 s 279 s
uniswap 3 289 s 277 s
uniswap avg 298 s 278 s
seaport 1 25 s 17 s
seaport 2 19 s 17 s
seaport 3 19 s 23 s
seaport avg 21 s 19 s

Looks like this has a lot of variance too, but there's still a clear and reproducible difference in favor of the PR on the order of 10%.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label May 7, 2024
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label May 7, 2024
@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch from 481c3be to 6223915 Compare May 7, 2024 12:52
@ethereum ethereum deleted a comment from github-actions bot May 7, 2024
ekpyron
ekpyron previously approved these changes May 7, 2024
Copy link
Member

@ekpyron ekpyron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if you say this is a 10% improvement in compilation time, let's just merge this for now.

@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch from 6223915 to a4b5fb0 Compare May 8, 2024 15:27
@cameel
Copy link
Member Author

cameel commented May 8, 2024

Yeah. It's not a large sample, but it was reproducible when I ran it several times (see the table) so I think it's legit.

BTW, I had to resolve a changelog conflict - please reapprove.

@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch from a4b5fb0 to 2a40919 Compare May 9, 2024 06:35
@cameel cameel force-pushed the fix-superfluous-iterations-in-optimizer-sequence branch from 2a40919 to c9e2c20 Compare May 10, 2024 16:23
@cameel cameel enabled auto-merge May 10, 2024 16:23
@cameel cameel merged commit e202d30 into develop May 10, 2024
72 checks passed
@cameel cameel deleted the fix-superfluous-iterations-in-optimizer-sequence branch May 10, 2024 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants