Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[seqbench] the-good-parts-mk3 sequence #15030

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

cameel
Copy link
Member

@cameel cameel commented Apr 15, 2024

Related to #14406
Depends on #15023 (only for benchmarking). Merged

This is a variant I built by addressing regressions that were still present in the-good-parts-mk2 compared to the default sequence. It does not contain all the fixes - some of them make benchmarks worse, especially in external tests. I chose only ones that were positive (or at least neutral while removing some regressions):

Those are the modifications I ultimately rejected:

@cameel cameel added optimizer has dependencies The PR depends on other PRs that must be merged first labels Apr 15, 2024
@cameel cameel self-assigned this Apr 15, 2024
@cameel cameel changed the title [seqbench] the-good-parts-mk3 sequence [seqbench] the-good-parts-mk3 sequence Apr 15, 2024
@cameel
Copy link
Member Author

cameel commented Apr 15, 2024

gas_diff_stats

File name IR optimized Legacy optimized Legacy
array/create_memory_array.sol 9%
array/array_storage_push_empty_length_address.sol 8%
externalContracts/base64.sol 8% -0%
array/array_storage_index_boundary_test.sol 7%
array/delete/bytes_delete_element.sol 7% -0%
array/copying/cleanup_during_multi_element_per_slot_copy.sol 6%
array/copying/storage_memory_packed_dyn.sol 5%
array/copying/array_copy_cleanup_uint40.sol 5%
abiEncoderV2/calldata_array.sol 3%
inheritance/inherited_function_calldata_memory_interface.sol 3%
abiEncoderV2/abi_encode_calldata_slice.sol 3% -0%
abiEncoderV1/abi_encode_calldata_slice.sol 3% -0%
array/array_storage_push_empty.sol 2%
array/array_storage_push_pop.sol 2%
array/array_storage_index_zeroed_test.sol 2%
isoltestTesting/balance_other_contract.sol 2%
externalContracts/prbmath_unsigned.sol 2%
array/dynamic_arrays_in_storage.sol 2%
externalContracts/FixedFeeRegistrar.sol 1% +0%
various/erc20.sol 1% +0%
abiEncoderV2/abi_encode_v2_in_function_inherited_in_v1_contract.sol 1% -0%
events/event_indexed_string.sol 1% -0%
array/fixed_arrays_as_return_type.sol 1% -0%
various/selfdestruct_pre_cancun_multiple_beneficiaries.sol 1%
various/selfdestruct_post_cancun_multiple_beneficiaries.sol 1%
userDefinedValueType/erc20.sol 1% +0%
array/copying/array_of_struct_calldata_to_storage.sol 1%
array/copying/array_copy_clear_storage.sol 1%
array/copying/memory_dyn_2d_bytes_to_storage.sol 1% -0%
constructor/bytes_in_constructors_packer.sol +0% 1%
events/event_dynamic_array_storage_v2.sol 1% +0%
events/event_dynamic_array_storage.sol 1% +0%
array/array_storage_index_access.sol 1%
externalContracts/prbmath_signed.sol 1%
array/push/push_no_args_bytes.sol 1%
various/value_complex.sol 1%
various/value_insane.sol +0%
salted_create/salted_create_with_value.sol +0%
byte_array_to_storage_cleanup.sol 1% -0%
array/push/array_push.sol +0%
structs/memory_structs_nested_load.sol +0%
externalContracts/snark.sol +0%
array/copying/array_of_struct_memory_to_storage.sol +0%
array/copying/storage_memory_nested_from_pointer.sol +0%
array/copying/storage_memory_nested.sol +0%
various/selfdestruct_post_cancun_redeploy.sol +0%
various/selfdestruct_pre_cancun_redeploy.sol +0%
array/copying/array_copy_storage_storage_different_base.sol +0%
array/copying/copy_byte_array_to_storage.sol +0% -0%
array/copying/array_copy_different_packing.sol +0%
externalContracts/deposit_contract.sol +0% +0%
structs/struct_memory_to_storage_function_ptr.sol +0%
array/copying/array_copy_storage_storage_dynamic_dynamic.sol +0%
structs/struct_delete_storage_with_array.sol +0%
array/push/nested_bytes_push.sol +0% -0%
array/constant_var_as_array_length.sol +0% +0%
array/copying/array_of_structs_containing_arrays_calldata_to_storage.sol +0%
array/dynamic_multi_array_cleanup.sol +0%
structs/struct_copy_via_local.sol +0%
abiEncoderV1/struct/struct_storage_ptr.sol +0%
libraries/using_library_mappings_public.sol +0%
array/fixed_arrays_in_constructors.sol +0%
array/push/array_push_struct.sol +0%
array/copying/nested_array_element_storage_to_storage.sol +0%
viaYul/copy_struct_invalid_ir_bug.sol +0%
userDefinedValueType/calldata.sol +0% -0%
array/copying/array_copy_storage_to_memory_nested.sol +0% +0%
abiEncoderV2/abi_encode_v2.sol +0% +0%
storage/packed_storage_structs_bytes.sol +0%
array/copying/calldata_array_dynamic_to_storage.sol +0% -0%
array/copying/bytes_storage_to_storage.sol +0% -0%
array/dynamic_array_cleanup.sol +0%
immutable/use_scratch.sol +0%
array/copying/array_nested_calldata_to_storage.sol +0%
constructor/constructor_static_array_argument.sol +0% -0%
events/event_dynamic_nested_array_storage_v2.sol +0% +0%
various/destructuring_assignment.sol +0% -0%
array/copying/array_copy_target_simple.sol +0%
functionCall/creation_function_call_with_salt.sol +0%
array/copying/array_copy_storage_storage_different_base_nested.sol +0%
array/push/array_push_struct_from_calldata.sol +0% -0%
array/copying/array_copy_including_array.sol +0%
array/copying/storage_memory_nested_struct.sol +0%
structs/copy_struct_array_from_storage.sol +0%
functionCall/creation_function_call_with_args.sol +0%
array/copying/array_nested_memory_to_storage.sol +0%
array/copying/nested_dynamic_array_element_calldata_to_storage.sol +0%
array/fixed_array_cleanup.sol +0%
array/copying/array_copy_storage_storage_static_dynamic.sol +0%
array/copying/elements_of_nested_array_of_structs_calldata_to_storage.sol +0%
array/copying/nested_array_of_structs_calldata_to_storage.sol +0%
abiencodedecode/abi_decode_simple_storage.sol +0% -0%
structs/calldata/calldata_struct_with_nested_array_to_storage.sol +0% +0%
array/bytes_length_member.sol +0%
array/reusing_memory.sol +0%
array/pop/array_pop_array_transition.sol +0%
types/mapping/copy_from_mapping_to_mapping.sol +0% +0%
array/copying/array_of_structs_containing_arrays_memory_to_storage.sol +0%
array/push/array_push_nested_from_calldata.sol +0% -0%
abiEncoderV1/abi_decode_v2_storage.sol +0% +0%
array/copying/nested_array_of_structs_memory_to_storage.sol +0%
array/push/push_no_args_2d.sol +0%
array/copying/array_storage_multi_items_per_slot.sol +0%
structs/struct_copy.sol +0%
abiEncoderV2/calldata_overlapped_dynamic_arrays.sol +0% -0%
array/copying/copy_function_internal_storage_array.sol +0%
array/copying/array_copy_storage_storage_dyn_dyn.sol +0%
various/staticcall_for_view_and_pure.sol -0%
array/array_storage_length_access.sol -0%
calldata/copy_from_calldata_removes_bytes_data.sol -0%
array/copying/array_copy_storage_storage_static_static.sol -0%
array/copying/bytes_inside_mappings.sol -0%
array/copying/nested_array_of_structs_storage_to_storage.sol -0%
array/invalid_encoding_for_storage_byte_array.sol -0% -0%
array/copying/array_of_function_external_storage_to_storage_dynamic_different_mutability.sol -0% +0%
constructor/no_callvalue_check.sol -0%
array/copying/copy_byte_array_in_struct_to_storage.sol -0% -0%
array/copying/array_copy_nested_array.sol -0% -0%
array/array_memory_index_access.sol -0%
array/copying/copy_removes_bytes_data.sol -0%
various/many_subassemblies.sol -0%
array/copying/elements_of_nested_array_of_structs_memory_to_storage.sol -0%
immutable/multi_creation.sol -0%
constructor/bytes_in_constructors_unpacker.sol -0% +0%
array/copying/array_of_function_external_storage_to_storage_dynamic.sol -0% -0%
array/copying/storage_memory_nested_bytes.sol -0% +0%
structs/struct_containing_bytes_copy_and_delete.sol -0% -0%
array/copying/array_copy_storage_storage_struct.sol -0%
array/copying/function_type_array_to_storage.sol -0% -0%
abiEncoderV2/storage_array_encoding.sol -0% -0%
structs/copy_substructures_from_mapping.sol -0% +0%
various/skip_dynamic_types_for_structs.sol -0% -0%
array/copying/array_copy_target_simple_2.sol -0%
storage/empty_nonempty_empty.sol -0% -0%
array/copying/array_to_mapping.sol -0% +0%
array/copying/array_elements_to_mapping.sol -0% +0%
array/copying/calldata_array_to_mapping.sol -0%
array/push/byte_array_push_transition.sol -0%
structs/copy_to_mapping.sol -0% +0%
structs/copy_from_mapping.sol -0% +0%
array/pop/byte_array_pop_long_storage_empty_garbage_ref.sol -0%
array/copying/array_copy_calldata_storage.sol -0% -0%
array/pop/array_pop_uint16_transition.sol -0%
array/copying/arrays_from_and_to_storage.sol -0%
libraries/internal_types_in_library.sol -0%
array/pop/array_pop_uint24_transition.sol -0%
functionCall/gas_and_value_brace_syntax.sol -0%
functionCall/gas_and_value_basic.sol -0%
structs/copy_substructures_to_mapping.sol -0% +0%
functionCall/external_call_to_nonexisting_debugstrings.sol -1% +0%
constructor/arrays_in_constructors.sol -1% +0%
structs/structs.sol -0%
functionCall/mapping_array_internal_argument.sol -0%
array/pop/byte_array_pop_masking_long.sol -0%
structs/conversion/recursive_storage_memory.sol -0%
externalContracts/strings.sol -0% +0%
libraries/using_library_mappings_return.sol -1%
array/copying/array_copy_target_leftover.sol -0% -0%
various/address_code.sol -0% -0%
array/byte_array_transitional_2.sol -1%
array/arrays_complex_from_and_to_storage.sol -1% +0%
various/create_calldata.sol -1%
functionCall/external_call_to_nonexisting.sol -1% 1%
array/pop/byte_array_pop_long_storage_empty.sol -1%
inlineAssembly/transient_storage_selfdestruct.sol -1%
array/function_array_cross_calls.sol -3% 1%
externalContracts/ramanujan_pi.sol -5%

@cameel
Copy link
Member Author

cameel commented Apr 15, 2024

Time benchmark

the-good-parts-mk3

test/benchmarks/run.sh modified to use the new sequence with solc 0.8.25 release binary.

File Pipeline Bytecode size Time Exit code
verifier.sol legacy 4899 bytes 0.14 s 0
verifier.sol via-ir 4324 bytes 0.35 s 0
OptimizorClub.sol legacy 0 bytes 0.33 s 1
OptimizorClub.sol via-ir 22175 bytes 1.79 s 0
chains.sol legacy 5869 bytes 0.11 s 0
chains.sol via-ir 21401 bytes 10.72 s 0

Compared to the-good-parts-mk2, timing is pretty much identical. Bytecode size differences are in single digits (but in favor of mk2).

default sequence for reference

File Pipeline Bytecode size Time Exit code
verifier.sol legacy 4874 bytes 0.15 s 0
verifier.sol via-ir 4351 bytes 0.58 s 0
OptimizorClub.sol legacy 0 bytes 0.47 s 1
OptimizorClub.sol via-ir 22193 bytes 3.68 s 0
chains.sol legacy 5845 bytes 0.16 s 0
chains.sol via-ir 23043 bytes 21.72 s 0

@cameel
Copy link
Member Author

cameel commented Apr 15, 2024

External test benchmark diff

CI run on timing-info-in-external-test-report vs CI run on seqbench-sequence-the-good-parts-mk3

ir-no-optimize

project bytecode_size deployment_gas method_gas
brink 0%
colony 0%
elementfi 0%
ens 0%
euler
gnosis
gp2 0%
perpetual-pools 0% 0% -0.01% ✅
pool-together 0%
uniswap 0%
yield_liquidator 0% -0% 0%
zeppelin

ir-optimize-evm+yul

project bytecode_size deployment_gas method_gas
brink +0.61% ❌
colony +0.13% ❌
elementfi -1.87% ✅
ens -1.2% ✅ -2.96% ✅ -0.01% ✅
euler +1% ❌
gnosis
gp2 +0.5% ❌
perpetual-pools -0.23% ✅ -0.52% ✅ +0.02% ❌
pool-together -1.29% ✅
uniswap +1.67% ❌
yield_liquidator +0.84% ❌ -0.2% ✅ +0.14% ❌
zeppelin -0.48% ✅ -1.3% ✅ -0.01% ✅

ir-optimize-evm-only

project bytecode_size deployment_gas method_gas
brink 0%
colony 0%
elementfi 0%
ens 0% 0% 0%
euler
gnosis
gp2 0%
perpetual-pools 0% +0% -0%
pool-together 0%
uniswap 0%
yield_liquidator 0% +0% 0%
zeppelin 0%

legacy-no-optimize

project bytecode_size deployment_gas method_gas
brink 0%
colony 0%
elementfi 0%
ens 0%
euler 0%
gnosis 0%
gp2 0%
perpetual-pools 0% -0% +0.01% ❌
pool-together 0%
uniswap 0%
yield_liquidator 0% +0% 0%
zeppelin 0% +0% -0.06% ✅

legacy-optimize-evm+yul

project bytecode_size deployment_gas method_gas
brink -0.03% ✅
colony +0.13% ❌
elementfi +0.11% ❌
ens +0.21% ❌ -0.01% ✅ -0%
euler +0.16% ❌
gnosis +0.19% ❌
gp2 +0.33% ❌
perpetual-pools -0.01% ✅ +0.01% ❌ +0.01% ❌
pool-together +0.04% ❌
uniswap +0.04% ❌
yield_liquidator +0.06% ❌ +0.02% ❌ -0.01% ✅
zeppelin +0.17% ❌ +0.18% ❌ +0.02% ❌

legacy-optimize-evm-only

project bytecode_size deployment_gas method_gas
brink 0%
colony 0%
elementfi 0%
ens 0% 0% 0%
euler 0%
gnosis 0%
gp2 0%
perpetual-pools 0% +0% -0%
pool-together 0%
uniswap 0%
yield_liquidator 0% +0% 0%
zeppelin 0% -0% +0.13% ❌

!V = version mismatch
!B = no value in the "before" version
!A = no value in the "after" version
!T = one or both values were not numeric and could not be compared
-0 = very small negative value rounded to zero
+0 = very small positive value rounded to zero

@cameel
Copy link
Member Author

cameel commented Apr 15, 2024

External test compilation time

Test Time (default) Time (mk2) Time (mk3) Change (mk3)
brink 5 s 4 s 4 s -20%
colony 110 s 97 s 117 s +6%
elementfi 168 s 105 s 98 s -42%
ens 41 s 26 s 32 s -22%
euler 64 s 36 s 42 s -34%
gp2 40 s 32 s 35 s -12%
perpetual-pools 77 s 47 s 65 s -16%
pool-together 83 s 36 s 31 s -63%
uniswap 105 s 56 s 49 s -53%
yield_liquidator 30 s 18 s 22 s -27%
zeppelin 263 s 160 s 139 s -47%

Table includes also timing of the default sequence (timing-info-in-external-test-report branch) and the-good-parts-mk2 sequence (#15031) for comparison.

Base automatically changed from timing-info-in-external-test-report to develop April 15, 2024 14:14
@cameel cameel removed the has dependencies The PR depends on other PRs that must be merged first label Apr 18, 2024
@cameel cameel force-pushed the seqbench-sequence-the-good-parts-mk3 branch from 4f23db2 to 2703127 Compare April 22, 2024 12:05
@cameel cameel marked this pull request as ready for review April 22, 2024 12:05
@cameel
Copy link
Member Author

cameel commented Apr 22, 2024

This is now reviewable as the final candidate to replace the default sequence.

See #14406 (comment) for optimization quality and timing comparison between sequences. Specifically the default one and different variants of the-good-parts.

@cameel
Copy link
Member Author

cameel commented Apr 22, 2024

@bshastry Can you do some fuzzing here?

@cameel
Copy link
Member Author

cameel commented Apr 27, 2024

Here's my analysis of hard and soft prerequisites for each step in the new sequence. Also, the same thing for the current sequence for comparison.

the-good-parts-mk3 sequence

Steps recommended before and after certain steps in the sequence:

dhfoDgvulfnTUtnIf   xa[r]EscLM Vcul [j] Trpeul xa[r]cL gvifM CTUca[r]LSsTFOtfDnca[r]Iulc scCTUt gvifM x[scCTUt] TOntnfDIul gvifM  jmul[jul] VcTOcul jmul :fDnTOcmuO

After:
......u....................u...uu.........u.........u...u.......u..............u.......u..u......u.......u..................u...............uu..u..............u...
.....................r.....r....r...............r...r...........rr.............rr......r..r..............r...................................r..r..............r...
...............................c..........i.................................................................................................c......................

Before:
...............t.............................................D......................t......D..............D............t...........................................
  • u = UnusedPruner
  • r = UnusedAssignEliminator
  • c = CommonSubexpressionEliminator
  • i = FullInliner
  • t = StructuralSimplifier
  • D = DeadCodeEliminator

Properties satisfied at each point in the sequence

dhfoDgvulfnTUtnIf   xa[r]EscLM Vcul [j] Trpeul xa[r]cL gvifM CTUca[r]LSsTFOtfDnca[r]Iulc scCTUt gvifM x[scCTUt] TOntnfDIul gvifM  jmul[jul] VcTOcul jmul :fDnTOcmuO
.............t.......x...xxx.x..x.........t.....x...x....x.x....xx.....x.ttt...xx......x.xx...t...x.x...xx...t...t.t.........x.x.............x.tx.............tx..t
....................XXXXXX+++++++++............XXXXXXXXXXXXXXX++++++++++++++++++++++++++++++++++++++++XXXXX+++++++++++++++++++++...................................

.........................aa.aa.......................a.....a.a.......aaa.................a.a........a...a.a....................a...................................
.....................AAAAAAAAA..................AAAAAAAAAAAA.....AAAAAAAAAAAAAAAAAAAAAAAAAA........................................................................

.....................c...cc.....................c................c.....c........c........c..............c..........................................................
...........................CCCCCCCCCCCCCCCC.........C...........CCCCC..........CCCCCCCCC..CCCCCCCC.......CCCCC...............................CCCCCCCCCCCCCCCCCCCCCC

....................i........i.................i...........i..........................................i........................i...................................
...............IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......IIIIIIIIIIIIIIIIIIIIIIII....................

Meaning of symbols (not that they're not the step abbreviations here):

  • a - SSA form recommended for this step.
  • A - Step creates or preserves SSA form.
  • x - Expression-split form recommended for this step.
  • t - Materialized literals recommended for this step.
  • X - Step creates or preserves expression-split form.
  • c - CSE recommended before this step.
  • C - Step is CSE or does not create significant new opportunities for CSE to act on.
  • i - Loop condition inside body recommended for this step.
  • I - Steps moves condition into loop body or keeps it there.
  • ~ - For repeated seqments: property not satisfied in the first pass but will be satisfied from the second one on.
  • + - Property may still be satisfied by a large part of the code.

Regarding C, this property is mostly speculation on my part. Based on my understanding of the docs I assume that these steps may be creating some new opportunities for CSE:

  • FullInliner (i)
  • ExpressionInliner (e)
  • FunctionSpecializer (F)
  • ExpressionSimplifier (s)
  • LoadResolver (L)

Also, for easy reference, steps that destroy SSA:

  • ConditionalSimplifier (C)
  • SSAReverser (V)

Steps that destroy expresion-split form (based on #15054)

  • ExpressionJoiner (j)
  • ForLoopConditionIntoBody (I)
  • LiteralRematerialiser (T)
  • Rematerialiser (m)
  • ExpressionSimplifier (s)

default sequence

Steps recommended before and after certain steps in the sequence:

dhfoDgvulfnTUtnIf [ xa[r]EscLM cCTUtTOntnfDIul Lcul Vcul [j] Tpeul xa[rul] xa[r]cL gvif CTUca[r]LSsTFOtfDnca[r]Iulc ]             jmul[jul] VcTOcul jmul :fDnTOcmu

After:
......u....................u...u................u...uu........u.................u...u......u..............u.......u.........................uu..u..............u..
.....................r.....r...r................r....r..............r.......r...r..........rr.............rr......r..........................r..r..............r..
....................................................c.........i.............................................................................c.....................

Before:
...............t................D..........t............................................D......................t..................................................

Properties satisfied at each point in the sequence

dhfoDgvulfnTUtnIf [ xa[r]EscLM cCTUtTOntnfDIul Lcul Vcul [j] Tpeul xa[rul] xa[r]cL gvif CTUca[r]LSsTFOtfDnca[r]Iulc ]             jmul[jul] VcTOcul jmul :fDnTOcmu
.............t.......x...x.x.x.x...t.t.t........x....x........t.....x.......x...x....x.....xx.....x.ttt...xx......x..........................x.tx.............tx..
....................XXXXXX++++++++++++++++++++++++++++++...........XXXXXXXXXXXXXXXXXXXXXX++++++++++++++++++++++++++++.............................................

.........................aa.aa..a..............a.................................a......a.......aaa...............................................................
....................~AAAAAAAAAAA....................................AAAAAAAAAAAAAAAAAAA.....AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.......................

.....................c...cc.........................................c.......c...............c.....c........c......................................................
....................~~~~~~~C...CCCCCCCCCCCCCCC..CCCCCCCCCCCCCCC.................C..........CCCCC..........CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

....................i........i.....................................i.......i......................................................................................
...............IIIIIIIIIIIIIIIIIIIIII......IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII..........IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII...................

Comment on lines 26 to 34
// let p := mload(0x40)
// mstore(0x40, add(p, 96))
// let _1 := add(p, 128)
// mstore(_1, 2)
// mstore(0x40, 0x20)
// for { } iszero(1) { }
// { }
// sstore(0, _1)
// sstore(1, mload(0x40))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes as quite a bad regression - not that the case wasn't fully artificial to begin with, but the test case doesn't really make sense with this result anymore...

And it is rather suspicious that suddenly none of the things like removing empty for loops or resolving memory accesses happen anymore here :-). So the question is: can you easily explain why that is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The solution here may rather be changing the test case, rather than trying to tweak the sequence further, but it'd still be interesting to have an explanation for why this changes this drastically)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or well, maybe we don't even have to change the test case - we can still revisit this, if we're getting actual reports about real-world cases.

Copy link
Member Author

@cameel cameel May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I investigated this regression for the previous sequence candidate: #14928 (review). I see that now the regression is a bit worse - that for loop wasn't there back then - but the same tweak should still resolve it here.

I ultimately decided not to include that tweak though, because when I benchmarked it, it seemed to make both bytecode size and runtime gas slightly worse in external tests and gas diff: #14928 (comment). Same with a few other tweaks that solved regressions - see links in the PR description.

It wasn't that much worse, so I could bring it back if you think it's worth it but unfortunately the general pattern I see is that resolving these regressions on their own can often be detrimental, at least in the contracts we use for benchmarking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead change the test case, s.t. it optimizes well again :-D?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are artificial cases, so we shouldn't hand-tweak the optimizer for these over the external tests - but the test case is a bit weird - I mean, arguably the way it's now this is a failing test :-).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I assume that the artificial part is the loop? I see it and the if above it were added in #5584 to test StructuralSimplifier. I removed it from the test and now the new sequence can deal with resolving the memory loads.

Still, I'm not sure if removing this is the right thing to do just because the new sequence can't handle it. It is a bit artificial, but it worked with the old sequence and the fact that it regressed is also potentially useful information.

I mean, maybe the case no longer makes sense with it (assuming it was meant to test memory load resolution - the name does not indicate that though), but then we could have a test specifically for the simplification:

{
    if sub(2, 1) {
        mstore(0x40, 0x20)
        for {} sub(1, 1) {} {}
    }
    sstore(1, mload(0x40))
}

I noticed that even this is a problem for the new sequence, even though it's much simpler. And to me it seems much less artificial. I could see some production code actually reducing to something close to this after passing through other steps. For now I added it as fullSuite/structural_simplification.sol.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I noticed that StructuralSimplifier (specifically the one used in the TOntnfDIul segment) can deal with the loop if insert ExpressionSimplifier before it (TOnstnfDIul). And it makes sense because it looks like StructuralSimplifier on its own can't deal with even simple constant expressions, e.g. for {} iszero(1) {} {}. It needs the condition to be already resolved into false. Given that, shouldn't ExpressionSimplifier be recommended before StructuralSimplifier?

Also, ExpressionSimplifier only really helps it deal with the loop. The memory load is not resolved because simplification runs after the last occurrence of LoadResolver. I was curious if running the resolver once more, after the simplifier helps, and it does (when paired with CSE). So making it TOnstcLnfDIul lets the new sequence deal even with this artificial example. I did not add it to the sequence in the PR because it could again make things worse in other cases and would generally require rerunning all benchmarks. I can try it in a separate PR later though.

@cameel cameel force-pushed the seqbench-sequence-the-good-parts-mk3 branch from 2703127 to 136540b Compare May 11, 2024 17:37
@cameel
Copy link
Member Author

cameel commented May 11, 2024

As requested, I compared the new sequence with the old one, looking for steps that used to always have some properties satisfied due to how the sequence was structured and now no longer do. This is the case for the following steps:

  • UnusedAssignEliminator (r): will now run on non-expression-split form and non-SSA code
    • Note: even in the current sequence it sometimes ran on weak expression-split form.
  • FullInliner (i): will now run on weak expression-split form and non-SSA code
  • LoopInvariantCodeMotion (M): will now run run on non-SSA code
  • ExpressionSimplifier (s): will now run on non-SSA code
  • SSAReverser (V): will now run on non-CSE code
  • ExpressionJoiner (j): will now run on non-CSE code
  • Rematerialiser (m): will now run on non-CSE code
  • There are no steps that never ran with loop condition moved into loop body and now do.
  • There are no steps that never ran with loop condition moved out of loop body and now do.

Those are the properties I could easily check. Note that "non-CSE code" is defined as in my previous comment, which I'm not sure is correct. But this property does not strike me like something that could have impact on correctness, and I only really checked it only because I already had the data at hand.

By "weak expression-split form" I mean what's left of it after LiteralRematerialiser, ExpressionSimplifier and ForLoopConditionIntoBody run. Especially for the first one Daniel argues that the property is not destroyed in a meaningful way. I also updated the "diagrams" above to include this new information.

Overall, after looking at those steps, I would not expect any of the above to cause problems. At least not ones that can be easily predicted based on how they operate.

- The structural simplification here is pretty artificial while the main point of the test is resolution of memory loads.
- In exchange add a simpler case covering structural simplification for the whole sequence.
…didate

- `the-good-parts` variant found using seqbench.
- Third refinement:
    - adjusted to improve results for the erc20.sol contract.
    - adjustments to address regressions visible in test output of the previous version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants