Attempt to recognize "seeding-only" chunks early #537
Conversation
```erlang
%% @doc Return true if the given range is covered by the configured storage modules.
has_range(Start, End) ->
    {ok, Config} = application:get_env(arweave, config),
    Intervals = get_unique_sorted_intervals(Config#config.storage_modules),
```
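For context, here is a minimal sketch of the kind of linear scan a sorted interval list enables. The `{IntervalEnd, IntervalStart}` tuple layout and the `is_covered` helper are assumptions for illustration, not the actual arweave code:

```erlang
%% Hypothetical helper: scan sorted, non-overlapping {IntervalEnd, IntervalStart}
%% pairs for one that covers the range [Start, End). Illustrative only.
is_covered(_Start, _End, []) ->
    false;
is_covered(Start, End, [{IntervalEnd, IntervalStart} | Rest]) ->
    case IntervalStart =< Start andalso End =< IntervalEnd of
        true -> true;
        false -> is_covered(Start, End, Rest)
    end.
```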
I'm definitely not suggesting we worry about optimizing anything without testing - but subjectively/qualitatively: why do we think that sorting all the intervals on each POST /chunk request won't materially impact performance?
Is it because we believe that, almost all of the time, the number of intervals to be sorted, even for a large-ish miner, is small (i.e. on the order of thousands rather than millions)?
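For reference, a rough way to sanity-check the per-request sort cost in isolation. This is a hedged sketch with synthetic intervals; `bench_sort` is a made-up helper, and `lists:usort/1` merely stands in for whatever "unique sorted" reduces to:

```erlang
%% Time a single sort of N synthetic {IntervalEnd, IntervalStart} tuples.
bench_sort(N) ->
    Intervals = [begin S = rand:uniform(1 bsl 50), {S + 100, S} end
                 || _ <- lists:seq(1, N)],
    {MicroSecs, _Sorted} = timer:tc(lists, usort, [Intervals]),
    io:format("sorting ~p intervals took ~p us~n", [N, MicroSecs]).
```

e.g. `bench_sort(10000)` from a shell-loaded module gives a ballpark figure for the 10k-storage-module case mentioned below.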
Hm, that's about 10 extra ms. Would you prefer to cache it?
It looks like `has_range` is called from `is_offset_vicinity_covered`, which is in turn called once or twice for every call to `add_chunk`. So we're estimating that this would add 10-20 ms every time we write a chunk? Is that correct? Or is it only called under certain scenarios when writing a chunk? (I couldn't quite trace through all the branches.)
If it's every time we write a chunk, it sounds like it could impact the syncing phase runtime. Maybe?
e.g. if we assume 14,400,000 chunks per partition and 200 concurrent sync jobs, then we're looking at an increase of (14,400,000 / 200) * 10-20 ms = 720-1440 seconds, or 12-24 minutes.
Okay, even in that case it actually doesn't sound super problematic. Syncing and packing a full partition on a Ryzen 9 takes about 11 hours, so we're adding maybe 2-3%. It's not nothing, but I'm not sure anyone will notice.
Assuming all the math above is correct: what do you think? How hard would it be to cache, and is it worth the engineering time and code complexity to save 2-3% of the sync/pack time per partition?
There was an extra call (now fixed), so we invoke `is_offset_vicinity_covered` once per `add_chunk`. Also, I forgot to mention: I assumed ~10k storage modules.
I've made a simple patch that caches the list. It's a bit of a compromise due to some memory-movement overhead; ideally, we'd create an ordered ets set and use it inside `has_range`.
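For what it's worth, a minimal sketch of that ordered-ets variant. The table name, the `{IntervalEnd, IntervalStart}` key layout, and the end-exclusive range convention are all assumptions for illustration:

```erlang
%% Hypothetical cache: an ordered_set keyed on IntervalEnd, so a range lookup
%% is O(log N) instead of a per-request sort. Names are illustrative.
init_interval_cache(Intervals) ->
    ets:new(storage_module_intervals, [ordered_set, named_table, protected]),
    ets:insert(storage_module_intervals, Intervals).

%% Covered iff the first interval ending at or after End also starts at or
%% before Start (assumes non-overlapping intervals).
has_range_ets(Start, End) ->
    case ets:next(storage_module_intervals, End - 1) of
        '$end_of_table' ->
            false;
        IntervalEnd ->
            [{IntervalEnd, IntervalStart}] =
                ets:lookup(storage_module_intervals, IntervalEnd),
            IntervalStart =< Start
    end.
```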
Oh wait, it only adds 10 ms for 10k storage modules? In that case I don't think we need to cache!
So at the current size it's a fraction of a millisecond, right? I don't think that's something we need to worry about then! By the time the search takes a long time, I'm sure we'll have other, bigger bottlenecks :)
What do you think about keeping the cached version anyway?
Does the cache ever need to be invalidated? Like if the weave grows?
The configuration (including storage modules) is static at the moment.
Force-pushed from 8ef211e to c844a19
LGTM!
Send a distinct reply to POST /chunk.
Force-pushed from c844a19 to 80fb925