[seqbench] `the-good-parts-mk3` sequence #15030

cameel · 2024-04-15T11:48:25Z

Related to #14406
~~Depends on #15023 (only for benchmarking).~~ Merged

This is a variant I built by addressing regressions that were still present in the-good-parts-mk2 compared to the default sequence. It does not contain all the fixes - some of them make benchmarks worse, especially in external tests. I chose only ones that were positive (or at least neutral while removing some regressions):

[seqbench] the-good-parts-mk2 sequence #14928 (comment) (r before p)
[seqbench] the-good-parts-mk2 sequence #14928 (comment) (M after gvif)
[seqbench] the-good-parts-mk2 sequence #14928 (comment) (x before [scCTUt])
[seqbench] the-good-parts-mk2 sequence #14928 (comment) (O in cleanup)

Those are the modifications I ultimately rejected:

[seqbench] the-good-parts-mk2 sequence #14928 (comment) (t and c before L)
[seqbench] the-good-parts-mk2 sequence #14928 (comment) (ciTFviu; Tr in cleanup)
[seqbench] the-good-parts-mk2 sequence #14928 (comment) (Tr in cleanup)
[seqbench] the-good-parts-mk2 sequence #14928 (comment) (U in cleanup)
[seqbench] the-good-parts-mk2 sequence #14928 (comment) (U + O in cleanup)

cameel · 2024-04-15T11:50:24Z

`gas_diff_stats`

File name	IR optimized	Legacy optimized
`array/create_memory_array.sol`	9%
`array/array_storage_push_empty_length_address.sol`	8%
`externalContracts/base64.sol`	8%	-0%
`array/array_storage_index_boundary_test.sol`	7%
`array/delete/bytes_delete_element.sol`	7%	-0%
`array/copying/cleanup_during_multi_element_per_slot_copy.sol`	6%
`array/copying/storage_memory_packed_dyn.sol`	5%
`array/copying/array_copy_cleanup_uint40.sol`	5%
`abiEncoderV2/calldata_array.sol`	3%
`inheritance/inherited_function_calldata_memory_interface.sol`		3%
`abiEncoderV2/abi_encode_calldata_slice.sol`	3%	-0%
`abiEncoderV1/abi_encode_calldata_slice.sol`	3%	-0%
`array/array_storage_push_empty.sol`	2%
`array/array_storage_push_pop.sol`	2%
`array/array_storage_index_zeroed_test.sol`	2%
`isoltestTesting/balance_other_contract.sol`	2%
`externalContracts/prbmath_unsigned.sol`	2%
`array/dynamic_arrays_in_storage.sol`	2%
`externalContracts/FixedFeeRegistrar.sol`	1%	+0%
`various/erc20.sol`	1%	+0%
`abiEncoderV2/abi_encode_v2_in_function_inherited_in_v1_contract.sol`	1%	-0%
`events/event_indexed_string.sol`	1%	-0%
`array/fixed_arrays_as_return_type.sol`	1%	-0%
`various/selfdestruct_pre_cancun_multiple_beneficiaries.sol`	1%
`various/selfdestruct_post_cancun_multiple_beneficiaries.sol`	1%
`userDefinedValueType/erc20.sol`	1%	+0%
`array/copying/array_of_struct_calldata_to_storage.sol`	1%
`array/copying/array_copy_clear_storage.sol`	1%
`array/copying/memory_dyn_2d_bytes_to_storage.sol`	1%	-0%
`constructor/bytes_in_constructors_packer.sol`	+0%	1%
`events/event_dynamic_array_storage_v2.sol`	1%	+0%
`events/event_dynamic_array_storage.sol`	1%	+0%
`array/array_storage_index_access.sol`	1%
`externalContracts/prbmath_signed.sol`	1%
`array/push/push_no_args_bytes.sol`	1%
`various/value_complex.sol`	1%
`various/value_insane.sol`	+0%
`salted_create/salted_create_with_value.sol`	+0%
`byte_array_to_storage_cleanup.sol`	1%	-0%
`array/push/array_push.sol`	+0%
`structs/memory_structs_nested_load.sol`	+0%
`externalContracts/snark.sol`	+0%
`array/copying/array_of_struct_memory_to_storage.sol`	+0%
`array/copying/storage_memory_nested_from_pointer.sol`	+0%
`array/copying/storage_memory_nested.sol`	+0%
`various/selfdestruct_post_cancun_redeploy.sol`	+0%
`various/selfdestruct_pre_cancun_redeploy.sol`	+0%
`array/copying/array_copy_storage_storage_different_base.sol`	+0%
`array/copying/copy_byte_array_to_storage.sol`	+0%	-0%
`array/copying/array_copy_different_packing.sol`	+0%
`externalContracts/deposit_contract.sol`	+0%	+0%
`structs/struct_memory_to_storage_function_ptr.sol`	+0%
`array/copying/array_copy_storage_storage_dynamic_dynamic.sol`	+0%
`structs/struct_delete_storage_with_array.sol`	+0%
`array/push/nested_bytes_push.sol`	+0%	-0%
`array/constant_var_as_array_length.sol`	+0%	+0%
`array/copying/array_of_structs_containing_arrays_calldata_to_storage.sol`	+0%
`array/dynamic_multi_array_cleanup.sol`	+0%
`structs/struct_copy_via_local.sol`	+0%
`abiEncoderV1/struct/struct_storage_ptr.sol`	+0%
`libraries/using_library_mappings_public.sol`	+0%
`array/fixed_arrays_in_constructors.sol`	+0%
`array/push/array_push_struct.sol`	+0%
`array/copying/nested_array_element_storage_to_storage.sol`	+0%
`viaYul/copy_struct_invalid_ir_bug.sol`	+0%
`userDefinedValueType/calldata.sol`	+0%	-0%
`array/copying/array_copy_storage_to_memory_nested.sol`	+0%	+0%
`abiEncoderV2/abi_encode_v2.sol`	+0%	+0%
`storage/packed_storage_structs_bytes.sol`	+0%
`array/copying/calldata_array_dynamic_to_storage.sol`	+0%	-0%
`array/copying/bytes_storage_to_storage.sol`	+0%	-0%
`array/dynamic_array_cleanup.sol`	+0%
`immutable/use_scratch.sol`	+0%
`array/copying/array_nested_calldata_to_storage.sol`	+0%
`constructor/constructor_static_array_argument.sol`	+0%	-0%
`events/event_dynamic_nested_array_storage_v2.sol`	+0%	+0%
`various/destructuring_assignment.sol`	+0%	-0%
`array/copying/array_copy_target_simple.sol`	+0%
`functionCall/creation_function_call_with_salt.sol`	+0%
`array/copying/array_copy_storage_storage_different_base_nested.sol`	+0%
`array/push/array_push_struct_from_calldata.sol`	+0%	-0%
`array/copying/array_copy_including_array.sol`	+0%
`array/copying/storage_memory_nested_struct.sol`	+0%
`structs/copy_struct_array_from_storage.sol`	+0%
`functionCall/creation_function_call_with_args.sol`	+0%
`array/copying/array_nested_memory_to_storage.sol`	+0%
`array/copying/nested_dynamic_array_element_calldata_to_storage.sol`	+0%
`array/fixed_array_cleanup.sol`	+0%
`array/copying/array_copy_storage_storage_static_dynamic.sol`	+0%
`array/copying/elements_of_nested_array_of_structs_calldata_to_storage.sol`	+0%
`array/copying/nested_array_of_structs_calldata_to_storage.sol`	+0%
`abiencodedecode/abi_decode_simple_storage.sol`	+0%	-0%
`structs/calldata/calldata_struct_with_nested_array_to_storage.sol`	+0%	+0%
`array/bytes_length_member.sol`	+0%
`array/reusing_memory.sol`	+0%
`array/pop/array_pop_array_transition.sol`	+0%
`types/mapping/copy_from_mapping_to_mapping.sol`	+0%	+0%
`array/copying/array_of_structs_containing_arrays_memory_to_storage.sol`	+0%
`array/push/array_push_nested_from_calldata.sol`	+0%	-0%
`abiEncoderV1/abi_decode_v2_storage.sol`	+0%	+0%
`array/copying/nested_array_of_structs_memory_to_storage.sol`	+0%
`array/push/push_no_args_2d.sol`	+0%
`array/copying/array_storage_multi_items_per_slot.sol`	+0%
`structs/struct_copy.sol`	+0%
`abiEncoderV2/calldata_overlapped_dynamic_arrays.sol`	+0%	-0%
`array/copying/copy_function_internal_storage_array.sol`	+0%
`array/copying/array_copy_storage_storage_dyn_dyn.sol`	+0%
`various/staticcall_for_view_and_pure.sol`	-0%
`array/array_storage_length_access.sol`	-0%
`calldata/copy_from_calldata_removes_bytes_data.sol`	-0%
`array/copying/array_copy_storage_storage_static_static.sol`	-0%
`array/copying/bytes_inside_mappings.sol`	-0%
`array/copying/nested_array_of_structs_storage_to_storage.sol`	-0%
`array/invalid_encoding_for_storage_byte_array.sol`	-0%	-0%
`array/copying/array_of_function_external_storage_to_storage_dynamic_different_mutability.sol`	-0%	+0%
`constructor/no_callvalue_check.sol`	-0%
`array/copying/copy_byte_array_in_struct_to_storage.sol`	-0%	-0%
`array/copying/array_copy_nested_array.sol`	-0%	-0%
`array/array_memory_index_access.sol`	-0%
`array/copying/copy_removes_bytes_data.sol`	-0%
`various/many_subassemblies.sol`	-0%
`array/copying/elements_of_nested_array_of_structs_memory_to_storage.sol`	-0%
`immutable/multi_creation.sol`	-0%
`constructor/bytes_in_constructors_unpacker.sol`	-0%	+0%
`array/copying/array_of_function_external_storage_to_storage_dynamic.sol`	-0%	-0%
`array/copying/storage_memory_nested_bytes.sol`	-0%	+0%
`structs/struct_containing_bytes_copy_and_delete.sol`	-0%	-0%
`array/copying/array_copy_storage_storage_struct.sol`	-0%
`array/copying/function_type_array_to_storage.sol`	-0%	-0%
`abiEncoderV2/storage_array_encoding.sol`	-0%	-0%
`structs/copy_substructures_from_mapping.sol`	-0%	+0%
`various/skip_dynamic_types_for_structs.sol`	-0%	-0%
`array/copying/array_copy_target_simple_2.sol`	-0%
`storage/empty_nonempty_empty.sol`	-0%	-0%
`array/copying/array_to_mapping.sol`	-0%	+0%
`array/copying/array_elements_to_mapping.sol`	-0%	+0%
`array/copying/calldata_array_to_mapping.sol`	-0%
`array/push/byte_array_push_transition.sol`	-0%
`structs/copy_to_mapping.sol`	-0%	+0%
`structs/copy_from_mapping.sol`	-0%	+0%
`array/pop/byte_array_pop_long_storage_empty_garbage_ref.sol`	-0%
`array/copying/array_copy_calldata_storage.sol`	-0%	-0%
`array/pop/array_pop_uint16_transition.sol`	-0%
`array/copying/arrays_from_and_to_storage.sol`		-0%
`libraries/internal_types_in_library.sol`	-0%
`array/pop/array_pop_uint24_transition.sol`	-0%
`functionCall/gas_and_value_brace_syntax.sol`	-0%
`functionCall/gas_and_value_basic.sol`	-0%
`structs/copy_substructures_to_mapping.sol`	-0%	+0%
`functionCall/external_call_to_nonexisting_debugstrings.sol`	-1%	+0%
`constructor/arrays_in_constructors.sol`	-1%	+0%
`structs/structs.sol`	-0%
`functionCall/mapping_array_internal_argument.sol`	-0%
`array/pop/byte_array_pop_masking_long.sol`	-0%
`structs/conversion/recursive_storage_memory.sol`	-0%
`externalContracts/strings.sol`	-0%	+0%
`libraries/using_library_mappings_return.sol`	-1%
`array/copying/array_copy_target_leftover.sol`	-0%	-0%
`various/address_code.sol`	-0%	-0%
`array/byte_array_transitional_2.sol`	-1%
`array/arrays_complex_from_and_to_storage.sol`	-1%	+0%
`various/create_calldata.sol`	-1%
`functionCall/external_call_to_nonexisting.sol`	-1%	1%
`array/pop/byte_array_pop_long_storage_empty.sol`	-1%
`inlineAssembly/transient_storage_selfdestruct.sol`	-1%
`array/function_array_cross_calls.sol`	-3%	1%
`externalContracts/ramanujan_pi.sol`	-5%

cameel · 2024-04-15T11:56:22Z

Time benchmark

`the-good-parts-mk3`

test/benchmarks/run.sh modified to use the new sequence with solc 0.8.25 release binary.

File	Pipeline	Bytecode size	Time	Exit code
`verifier.sol`	legacy	4899 bytes	0.14 s	0
`verifier.sol`	via-ir	4324 bytes	0.35 s	0
`OptimizorClub.sol`	legacy	0 bytes	0.33 s	1
`OptimizorClub.sol`	via-ir	22175 bytes	1.79 s	0
`chains.sol`	legacy	5869 bytes	0.11 s	0
`chains.sol`	via-ir	21401 bytes	10.72 s	0

Compared to the-good-parts-mk2, timing is pretty much identical. Bytecode size differences are in single digits (but in favor of mk2).

default sequence for reference

File	Pipeline	Bytecode size	Time	Exit code
`verifier.sol`	legacy	4874 bytes	0.15 s	0
`verifier.sol`	via-ir	4351 bytes	0.58 s	0
`OptimizorClub.sol`	legacy	0 bytes	0.47 s	1
`OptimizorClub.sol`	via-ir	22193 bytes	3.68 s	0
`chains.sol`	legacy	5845 bytes	0.16 s	0
`chains.sol`	via-ir	23043 bytes	21.72 s	0

cameel · 2024-04-15T12:41:31Z

External test benchmark diff

CI run on timing-info-in-external-test-report vs CI run on seqbench-sequence-the-good-parts-mk3

`ir-no-optimize`

project	bytecode_size	deployment_gas	method_gas
brink	`0%`
colony	`0%`
elementfi	`0%`
ens	`0%`
euler
gnosis
gp2	`0%`
perpetual-pools	`0%`	`0%`	`-0.01% ✅`
pool-together	`0%`
uniswap	`0%`
yield_liquidator	`0%`	`-0%`	`0%`
zeppelin

`ir-optimize-evm+yul`

project	bytecode_size	deployment_gas	method_gas
brink	`+0.61% ❌`
colony	`+0.13% ❌`
elementfi	`-1.87% ✅`
ens	`-1.2% ✅`	`-2.96% ✅`	`-0.01% ✅`
euler	`+1% ❌`
gnosis
gp2	`+0.5% ❌`
perpetual-pools	`-0.23% ✅`	`-0.52% ✅`	`+0.02% ❌`
pool-together	`-1.29% ✅`
uniswap	`+1.67% ❌`
yield_liquidator	`+0.84% ❌`	`-0.2% ✅`	`+0.14% ❌`
zeppelin	`-0.48% ✅`	`-1.3% ✅`	`-0.01% ✅`

`ir-optimize-evm-only`

project	bytecode_size	deployment_gas	method_gas
brink	`0%`
colony	`0%`
elementfi	`0%`
ens	`0%`	`0%`	`0%`
euler
gnosis
gp2	`0%`
perpetual-pools	`0%`	`+0%`	`-0%`
pool-together	`0%`
uniswap	`0%`
yield_liquidator	`0%`	`+0%`	`0%`
zeppelin	`0%`

`legacy-no-optimize`

project	bytecode_size	deployment_gas	method_gas
brink	`0%`
colony	`0%`
elementfi	`0%`
ens	`0%`
euler	`0%`
gnosis	`0%`
gp2	`0%`
perpetual-pools	`0%`	`-0%`	`+0.01% ❌`
pool-together	`0%`
uniswap	`0%`
yield_liquidator	`0%`	`+0%`	`0%`
zeppelin	`0%`	`+0%`	`-0.06% ✅`

`legacy-optimize-evm+yul`

project	bytecode_size	deployment_gas	method_gas
brink	`-0.03% ✅`
colony	`+0.13% ❌`
elementfi	`+0.11% ❌`
ens	`+0.21% ❌`	`-0.01% ✅`	`-0%`
euler	`+0.16% ❌`
gnosis	`+0.19% ❌`
gp2	`+0.33% ❌`
perpetual-pools	`-0.01% ✅`	`+0.01% ❌`	`+0.01% ❌`
pool-together	`+0.04% ❌`
uniswap	`+0.04% ❌`
yield_liquidator	`+0.06% ❌`	`+0.02% ❌`	`-0.01% ✅`
zeppelin	`+0.17% ❌`	`+0.18% ❌`	`+0.02% ❌`

`legacy-optimize-evm-only`

project	bytecode_size	deployment_gas	method_gas
brink	`0%`
colony	`0%`
elementfi	`0%`
ens	`0%`	`0%`	`0%`
euler	`0%`
gnosis	`0%`
gp2	`0%`
perpetual-pools	`0%`	`+0%`	`-0%`
pool-together	`0%`
uniswap	`0%`
yield_liquidator	`0%`	`+0%`	`0%`
zeppelin	`0%`	`-0%`	`+0.13% ❌`

!V = version mismatch
!B = no value in the "before" version
!A = no value in the "after" version
!T = one or both values were not numeric and could not be compared
-0 = very small negative value rounded to zero
+0 = very small positive value rounded to zero

cameel · 2024-04-15T12:49:38Z

External test compilation time

Test	Time (default)	Time (mk2)	Time (mk3)	Change (mk3)
brink	5 s	4 s	4 s	-20%
colony	110 s	97 s	117 s	+6%
elementfi	168 s	105 s	98 s	-42%
ens	41 s	26 s	32 s	-22%
euler	64 s	36 s	42 s	-34%
gp2	40 s	32 s	35 s	-12%
perpetual-pools	77 s	47 s	65 s	-16%
pool-together	83 s	36 s	31 s	-63%
uniswap	105 s	56 s	49 s	-53%
yield_liquidator	30 s	18 s	22 s	-27%
zeppelin	263 s	160 s	139 s	-47%

Table includes also timing of the default sequence (timing-info-in-external-test-report branch) and the-good-parts-mk2 sequence (#15031) for comparison.

cameel · 2024-04-22T12:08:56Z

This is now reviewable as the final candidate to replace the default sequence.

See #14406 (comment) for optimization quality and timing comparison between sequences. Specifically the default one and different variants of the-good-parts.

cameel · 2024-04-22T12:21:03Z

@bshastry Can you do some fuzzing here?

cameel · 2024-04-27T21:52:28Z

Here's my analysis of hard and soft prerequisites for each step in the new sequence. Also, the same thing for the current sequence for comparison.

`the-good-parts-mk3` sequence

Steps recommended before and after certain steps in the sequence:

dhfoDgvulfnTUtnIf   xa[r]EscLM Vcul [j] Trpeul xa[r]cL gvifM CTUca[r]LSsTFOtfDnca[r]Iulc scCTUt gvifM x[scCTUt] TOntnfDIul gvifM  jmul[jul] VcTOcul jmul :fDnTOcmuO

After:
......u....................u...uu.........u.........u...u.......u..............u.......u..u......u.......u..................u...............uu..u..............u...
.....................r.....r....r...............r...r...........rr.............rr......r..r..............r...................................r..r..............r...
...............................c..........i.................................................................................................c......................

Before:
...............t.............................................D......................t......D..............D............t...........................................

u = UnusedPruner
r = UnusedAssignEliminator
c = CommonSubexpressionEliminator
i = FullInliner
t = StructuralSimplifier
D = DeadCodeEliminator

Properties satisfied at each point in the sequence

dhfoDgvulfnTUtnIf   xa[r]EscLM Vcul [j] Trpeul xa[r]cL gvifM CTUca[r]LSsTFOtfDnca[r]Iulc scCTUt gvifM x[scCTUt] TOntnfDIul gvifM  jmul[jul] VcTOcul jmul :fDnTOcmuO
.............t.......x...xxx.x..x.........t.....x...x....x.x....xx.....x.ttt...xx......x.xx...t...x.x...xx...t...t.t.........x.x.............x.tx.............tx..t
....................XXXXXX+++++++++............XXXXXXXXXXXXXXX++++++++++++++++++++++++++++++++++++++++XXXXX+++++++++++++++++++++...................................

.........................aa.aa.......................a.....a.a.......aaa.................a.a........a...a.a....................a...................................
.....................AAAAAAAAA..................AAAAAAAAAAAA.....AAAAAAAAAAAAAAAAAAAAAAAAAA........................................................................

.....................c...cc.....................c................c.....c........c........c..............c..........................................................
...........................CCCCCCCCCCCCCCCC.........C...........CCCCC..........CCCCCCCCC..CCCCCCCC.......CCCCC...............................CCCCCCCCCCCCCCCCCCCCCC

....................i........i.................i...........i..........................................i........................i...................................
...............IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......IIIIIIIIIIIIIIIIIIIIIIII....................

Meaning of symbols (not that they're not the step abbreviations here):

a - SSA form recommended for this step.
A - Step creates or preserves SSA form.
x - Expression-split form recommended for this step.
t - Materialized literals recommended for this step.
X - Step creates or preserves expression-split form.
c - CSE recommended before this step.
C - Step is CSE or does not create significant new opportunities for CSE to act on.
i - Loop condition inside body recommended for this step.
I - Steps moves condition into loop body or keeps it there.
~ - For repeated seqments: property not satisfied in the first pass but will be satisfied from the second one on.
+ - Property may still be satisfied by a large part of the code.

Regarding C, this property is mostly speculation on my part. Based on my understanding of the docs I assume that these steps may be creating some new opportunities for CSE:

FullInliner (i)
ExpressionInliner (e)
FunctionSpecializer (F)
ExpressionSimplifier (s)
LoadResolver (L)

Also, for easy reference, steps that destroy SSA:

ConditionalSimplifier (C)
SSAReverser (V)

Steps that destroy expresion-split form (based on #15054)

ExpressionJoiner (j)
ForLoopConditionIntoBody (I)
LiteralRematerialiser (T)
Rematerialiser (m)
ExpressionSimplifier (s)

`default` sequence

Steps recommended before and after certain steps in the sequence:

dhfoDgvulfnTUtnIf [ xa[r]EscLM cCTUtTOntnfDIul Lcul Vcul [j] Tpeul xa[rul] xa[r]cL gvif CTUca[r]LSsTFOtfDnca[r]Iulc ]             jmul[jul] VcTOcul jmul :fDnTOcmu

After:
......u....................u...u................u...uu........u.................u...u......u..............u.......u.........................uu..u..............u..
.....................r.....r...r................r....r..............r.......r...r..........rr.............rr......r..........................r..r..............r..
....................................................c.........i.............................................................................c.....................

Before:
...............t................D..........t............................................D......................t..................................................

Properties satisfied at each point in the sequence

dhfoDgvulfnTUtnIf [ xa[r]EscLM cCTUtTOntnfDIul Lcul Vcul [j] Tpeul xa[rul] xa[r]cL gvif CTUca[r]LSsTFOtfDnca[r]Iulc ]             jmul[jul] VcTOcul jmul :fDnTOcmu
.............t.......x...x.x.x.x...t.t.t........x....x........t.....x.......x...x....x.....xx.....x.ttt...xx......x..........................x.tx.............tx..
....................XXXXXX++++++++++++++++++++++++++++++...........XXXXXXXXXXXXXXXXXXXXXX++++++++++++++++++++++++++++.............................................

.........................aa.aa..a..............a.................................a......a.......aaa...............................................................
....................~AAAAAAAAAAA....................................AAAAAAAAAAAAAAAAAAA.....AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.......................

.....................c...cc.........................................c.......c...............c.....c........c......................................................
....................~~~~~~~C...CCCCCCCCCCCCCCC..CCCCCCCCCCCCCCC.................C..........CCCCC..........CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

....................i........i.....................................i.......i......................................................................................
...............IIIIIIIIIIIIIIIIIIIIII......IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII..........IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII...................

ekpyron · 2024-05-06T12:04:01Z

test/libyul/yulOptimizerTests/fullSuite/medium.yul

+//         let p := mload(0x40)
+//         mstore(0x40, add(p, 96))
+//         let _1 := add(p, 128)
+//         mstore(_1, 2)
+//         mstore(0x40, 0x20)
+//         for { } iszero(1) { }
+//         { }
+//         sstore(0, _1)
+//         sstore(1, mload(0x40))


This strikes as quite a bad regression - not that the case wasn't fully artificial to begin with, but the test case doesn't really make sense with this result anymore...

And it is rather suspicious that suddenly none of the things like removing empty for loops or resolving memory accesses happen anymore here :-). So the question is: can you easily explain why that is?

(The solution here may rather be changing the test case, rather than trying to tweak the sequence further, but it'd still be interesting to have an explanation for why this changes this drastically)

Or well, maybe we don't even have to change the test case - we can still revisit this, if we're getting actual reports about real-world cases.

I investigated this regression for the previous sequence candidate: #14928 (review). I see that now the regression is a bit worse - that for loop wasn't there back then - but the same tweak should still resolve it here.

I ultimately decided not to include that tweak though, because when I benchmarked it, it seemed to make both bytecode size and runtime gas slightly worse in external tests and gas diff: #14928 (comment). Same with a few other tweaks that solved regressions - see links in the PR description.

It wasn't that much worse, so I could bring it back if you think it's worth it but unfortunately the general pattern I see is that resolving these regressions on their own can often be detrimental, at least in the contracts we use for benchmarking.

Can we instead change the test case, s.t. it optimizes well again :-D?

These are artificial cases, so we shouldn't hand-tweak the optimizer for these over the external tests - but the test case is a bit weird - I mean, arguably the way it's now this is a failing test :-).

So, I assume that the artificial part is the loop? I see it and the if above it were added in #5584 to test StructuralSimplifier. I removed it from the test and now the new sequence can deal with resolving the memory loads.

Still, I'm not sure if removing this is the right thing to do just because the new sequence can't handle it. It is a bit artificial, but it worked with the old sequence and the fact that it regressed is also potentially useful information.

I mean, maybe the case no longer makes sense with it (assuming it was meant to test memory load resolution - the name does not indicate that though), but then we could have a test specifically for the simplification:

{ if sub(2, 1) { mstore(0x40, 0x20) for {} sub(1, 1) {} {} } sstore(1, mload(0x40)) }

I noticed that even this is a problem for the new sequence, even though it's much simpler. And to me it seems much less artificial. I could see some production code actually reducing to something close to this after passing through other steps. For now I added it as fullSuite/structural_simplification.sol.

By the way, I noticed that StructuralSimplifier (specifically the one used in the TOntnfDIul segment) can deal with the loop if insert ExpressionSimplifier before it (TOnstnfDIul). And it makes sense because it looks like StructuralSimplifier on its own can't deal with even simple constant expressions, e.g. for {} iszero(1) {} {}. It needs the condition to be already resolved into false. Given that, shouldn't ExpressionSimplifier be recommended before StructuralSimplifier?

Also, ExpressionSimplifier only really helps it deal with the loop. The memory load is not resolved because simplification runs after the last occurrence of LoadResolver. I was curious if running the resolver once more, after the simplifier helps, and it does (when paired with CSE). So making it TOnstcLnfDIul lets the new sequence deal even with this artificial example. I did not add it to the sequence in the PR because it could again make things worse in other cases and would generally require rerunning all benchmarks. I can try it in a separate PR later though.

cameel · 2024-05-11T18:01:44Z

As requested, I compared the new sequence with the old one, looking for steps that used to always have some properties satisfied due to how the sequence was structured and now no longer do. This is the case for the following steps:

UnusedAssignEliminator (r): will now run on non-expression-split form and non-SSA code
- Note: even in the current sequence it sometimes ran on weak expression-split form.
FullInliner (i): will now run on weak expression-split form and non-SSA code
LoopInvariantCodeMotion (M): will now run run on non-SSA code
ExpressionSimplifier (s): will now run on non-SSA code
SSAReverser (V): will now run on non-CSE code
ExpressionJoiner (j): will now run on non-CSE code
Rematerialiser (m): will now run on non-CSE code
There are no steps that never ran with loop condition moved into loop body and now do.
There are no steps that never ran with loop condition moved out of loop body and now do.

Those are the properties I could easily check. Note that "non-CSE code" is defined as in my previous comment, which I'm not sure is correct. But this property does not strike me like something that could have impact on correctness, and I only really checked it only because I already had the data at hand.

By "weak expression-split form" I mean what's left of it after LiteralRematerialiser, ExpressionSimplifier and ForLoopConditionIntoBody run. Especially for the first one Daniel argues that the property is not destroyed in a meaningful way. I also updated the "diagrams" above to include this new information.

Overall, after looking at those steps, I would not expect any of the above to cause problems. At least not ones that can be easily predicted based on how they operate.

- The structural simplification here is pretty artificial while the main point of the test is resolution of memory loads. - In exchange add a simpler case covering structural simplification for the whole sequence.

…didate - `the-good-parts` variant found using seqbench. - Third refinement: - adjusted to improve results for the erc20.sol contract. - adjustments to address regressions visible in test output of the previous version.

cameel added optimizer has dependencies The PR depends on other PRs that must be merged first labels Apr 15, 2024

cameel self-assigned this Apr 15, 2024

cameel changed the title ~~[seqbench] the-good-parts-mk3 sequence~~ [seqbench] the-good-parts-mk3 sequence Apr 15, 2024

cameel mentioned this pull request Apr 15, 2024

[seqbench] the-good-parts-mk2 sequence (on Cancun) #15031

Closed

Base automatically changed from timing-info-in-external-test-report to develop April 15, 2024 14:14

cameel removed the has dependencies The PR depends on other PRs that must be merged first label Apr 18, 2024

cameel mentioned this pull request Apr 22, 2024

Improve default optimizer steps sequence in via-ir #14406

Open

cameel force-pushed the seqbench-sequence-the-good-parts-mk3 branch from 4f23db2 to 2703127 Compare April 22, 2024 12:05

cameel marked this pull request as ready for review April 22, 2024 12:05

cameel mentioned this pull request Apr 25, 2024

Add missing dependencies and extra info to optimizer step docs #15054

Open

ekpyron reviewed May 6, 2024

View reviewed changes

cameel force-pushed the seqbench-sequence-the-good-parts-mk3 branch from 2703127 to 136540b Compare May 11, 2024 17:37

cameel added 4 commits May 11, 2024 20:02

Make fullSuite medium test case less artificial

10afe0d

- The structural simplification here is pretty artificial while the main point of the test is resolution of memory loads. - In exchange add a simpler case covering structural simplification for the whole sequence.

Update gas expectations

c12fe8a

Update other test expectations

e19abe4

cameel force-pushed the seqbench-sequence-the-good-parts-mk3 branch from 136540b to e19abe4 Compare May 11, 2024 18:10

cameel mentioned this pull request May 11, 2024

Remove redundant prerequisite steps from the default optimizer sequence #15092

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[seqbench] `the-good-parts-mk3` sequence #15030

[seqbench] `the-good-parts-mk3` sequence #15030

cameel commented Apr 15, 2024 •

edited

cameel commented Apr 15, 2024

cameel commented Apr 15, 2024 •

edited

cameel commented Apr 15, 2024

cameel commented Apr 15, 2024 •

edited

cameel commented Apr 22, 2024

cameel commented Apr 22, 2024

cameel commented Apr 27, 2024 •

edited

ekpyron May 6, 2024

ekpyron May 6, 2024

ekpyron May 6, 2024

cameel May 6, 2024 •

edited

ekpyron May 6, 2024

ekpyron May 6, 2024

cameel May 11, 2024

cameel May 11, 2024

cameel commented May 11, 2024 •

edited

[seqbench] the-good-parts-mk3 sequence #15030

Are you sure you want to change the base?

[seqbench] the-good-parts-mk3 sequence #15030

Conversation

cameel commented Apr 15, 2024 • edited

cameel commented Apr 15, 2024

gas_diff_stats

cameel commented Apr 15, 2024 • edited

Time benchmark

the-good-parts-mk3

default sequence for reference

cameel commented Apr 15, 2024

External test benchmark diff

ir-no-optimize

ir-optimize-evm+yul

ir-optimize-evm-only

legacy-no-optimize

legacy-optimize-evm+yul

legacy-optimize-evm-only

cameel commented Apr 15, 2024 • edited

External test compilation time

cameel commented Apr 22, 2024

cameel commented Apr 22, 2024

cameel commented Apr 27, 2024 • edited

the-good-parts-mk3 sequence

Steps recommended before and after certain steps in the sequence:

Properties satisfied at each point in the sequence

default sequence

Steps recommended before and after certain steps in the sequence:

Properties satisfied at each point in the sequence

ekpyron May 6, 2024

Choose a reason for hiding this comment

ekpyron May 6, 2024

Choose a reason for hiding this comment

ekpyron May 6, 2024

Choose a reason for hiding this comment

cameel May 6, 2024 • edited

Choose a reason for hiding this comment

ekpyron May 6, 2024

Choose a reason for hiding this comment

ekpyron May 6, 2024

Choose a reason for hiding this comment

cameel May 11, 2024

Choose a reason for hiding this comment

cameel May 11, 2024

Choose a reason for hiding this comment

cameel commented May 11, 2024 • edited

[seqbench] `the-good-parts-mk3` sequence #15030

[seqbench] `the-good-parts-mk3` sequence #15030

cameel commented Apr 15, 2024 •

edited

`gas_diff_stats`

cameel commented Apr 15, 2024 •

edited

`the-good-parts-mk3`

`ir-no-optimize`

`ir-optimize-evm+yul`

`ir-optimize-evm-only`

`legacy-no-optimize`

`legacy-optimize-evm+yul`

`legacy-optimize-evm-only`

cameel commented Apr 15, 2024 •

edited

cameel commented Apr 27, 2024 •

edited

`the-good-parts-mk3` sequence

`default` sequence

cameel May 6, 2024 •

edited

cameel commented May 11, 2024 •

edited