satellite/overlay decrease NodeCheckInWaitPeriod #6795

littleskunk · 2024-02-21T17:27:00Z

What: Decrease NodeCheckInWaitPeriod from 2h down to 1h10m

Why: NodeCheckInWaitPeriod is designed to suppress unnecessary DB commits. At the same time, we want to allow the storage node to update its last_contact_success once every 2 hours. It turns out that as long as the value is set to 2 hours, the storage node has a high chance of getting last_contact_success updates only once every 3 hours, because the second check-in lands almost exactly on the 2-hour mark and can still fall inside the wait period. This is a bit unfair: with just 1 hour of downtime, the storage node might end up with a last_contact_success that is more than 4 hours old. At that point repair kicks in and moves pieces away from the node.

Reducing NodeCheckInWaitPeriod to 1h10m makes sure the node is allowed to check in every 2 hours, and it even accounts for a possible restart after 1h30m or so: that check-in would still be committed.

Please describe the tests:

Test 1:
Test 2:

Please describe the performance impact:

Code Review Checklist (to be filled out by reviewer)

onionjake · 2024-02-21T23:36:23Z

Does it already have jitter as well? Should probably have some random jitter to prevent thundering herds?

littleskunk · 2024-03-11T11:02:27Z

> Does it already have jitter as well? Should probably have some random jitter to prevent thundering herds?

Partially yes and partially no, but in any case that is not connected to my pull request. I don't change any of that code. If too many nodes check in at the same time, that might cause problems no matter what the config value is set to.

satellite/overlay decrease NodeCheckInWaitPeriod

59d4221

cla-bot bot added the cla-signed label Feb 21, 2024

littleskunk mentioned this pull request Feb 26, 2024

Diagnose why we have so many slow nodes on the network #6759

Open

Merge branch 'main' into jh/checkin

5b141d0
