Take into account network latency when syncing #55

Merged
merged 6 commits into from Mar 16, 2022

@heifner heifner commented Mar 13, 2022

Take network latency into account when syncing from a node, to avoid getting stuck permanently in the lib catchup state.

Co-authored-by: Farhad Shahabi <farhad.shahabi@block.one>

plugins/net_plugin/include/eosio/net_plugin/protocol.hpp (outdated, resolved)
@@ -1642,15 +1644,25 @@ namespace eosio {

sync_reset_lib_num(c);

auto current_time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
auto network_latency_ns = current_time_ns - msg.time; // net latency in nanoseconds
Contributor

What if this is negative?

Member Author

Negative would mean clock skew between the nodes; we should just make it 0 if it is < 0, I guess.

Contributor

That makes sense if the clock skew is known to be small... but you removed the check for skew. If the clock skew is close to the latency, then one side will see double latency and the other will see 0 latency.
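
To make the skew concern concrete, here is a minimal sketch (not code from this PR) of how a clamped one-way latency estimate behaves when the two clocks disagree. The helper names `observed_latency_ns` and `measured_with_skew_ns` and the arbitrary reference instant are hypothetical; the clamp to zero follows the suggestion made earlier in this thread.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: one-way latency as the receiver computes it,
// clamped at zero so that clock skew can never yield a negative value.
inline int64_t observed_latency_ns(int64_t send_stamp_ns, int64_t recv_time_ns) {
    return std::max<int64_t>(0, recv_time_ns - send_stamp_ns);
}

// Hypothetical model of one message. The receiver's clock is taken as the
// reference; the sender's clock runs ahead of it by sender_ahead_ns
// (negative means the sender's clock is behind).
inline int64_t measured_with_skew_ns(int64_t true_latency_ns, int64_t sender_ahead_ns) {
    const int64_t true_send_ns = 1'000'000'000;                 // arbitrary instant
    const int64_t send_stamp_ns = true_send_ns + sender_ahead_ns; // what the sender writes into the message
    const int64_t recv_time_ns = true_send_ns + true_latency_ns;  // what the receiver reads on arrival
    return observed_latency_ns(send_stamp_ns, recv_time_ns);
}
```

With a true latency of 50 ms and a skew of 50 ms, `measured_with_skew_ns(50'000'000, 50'000'000)` gives 0 and `measured_with_skew_ns(50'000'000, -50'000'000)` gives 100'000'000: one side sees no latency and the other sees double.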

Member Author

The check for skew that was removed never worked (see the comment I just added to the PR for that section of code). I'm open to suggestions on alternatives, but I don't think there is any way to improve that, right?

Contributor

It would have to be based on RTT (which can be measured independent of clock skew) rather than one-way latency.
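
One way to see why RTT is independent of skew is the usual four-timestamp calculation, sketched below. This is an illustration only and is not part of this PR or the existing protocol; the struct `ping_timestamps`, its field names, and `round_trip_ns` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical timestamps for one request/reply exchange.
// t1 and t4 are read from the local clock, t2 and t3 from the peer's clock.
struct ping_timestamps {
    int64_t t1_local_send_ns;  // local clock: request sent
    int64_t t2_peer_recv_ns;   // peer clock: request received
    int64_t t3_peer_send_ns;   // peer clock: reply sent
    int64_t t4_local_recv_ns;  // local clock: reply received
};

// Round-trip time on the wire. Each difference is taken between two readings
// of the same clock, so a constant offset between the clocks cancels out.
inline int64_t round_trip_ns(const ping_timestamps& ts) {
    const int64_t elapsed_local = ts.t4_local_recv_ns - ts.t1_local_send_ns;
    const int64_t peer_processing = ts.t3_peer_send_ns - ts.t2_peer_recv_ns;
    return elapsed_local - peer_processing;
}
```

Shifting both peer timestamps by the same amount leaves the result unchanged, which is the property a one-way measurement lacks.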

Member Author

Yes, that would work, but it would require a new protocol version and a round-trip (RT) message.

Contributor

I think it's probably fine to assume low clock skew for now, since we've survived so long without a working check for clock skew. This PR doesn't make things worse in that regard.

@heifner heifner requested a review from swatanabe March 14, 2022 12:50
("peer", msg.p2p_address)("time", "1 second")); // TODO Add to_variant for std::chrono::system_clock::duration
return false;
}

Member Author

Adding a note to this PR here for future documentation of why this was removed. This code was removed because it could never have worked: time is in microseconds, whereas msg.time is in nanoseconds, so time - msg_time is always negative.

Also there is no way to do what this was trying to do. You don't know how much network latency is involved so you have no idea what clock skew is involved.
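
The unit mismatch described above can be reproduced in isolation. The sketch below takes the comment at face value (local time in microseconds, message time in nanoseconds); the function names `raw_difference` and `typed_difference` are hypothetical and this is not the removed code.

```cpp
#include <chrono>
#include <cstdint>

// Raw integer counts carry no unit, so subtracting a nanosecond count from a
// microsecond count compiles fine and yields a large negative number for any
// realistic pair of epoch timestamps.
inline int64_t raw_difference(int64_t now_us, int64_t msg_time_ns) {
    return now_us - msg_time_ns;  // mixes microseconds and nanoseconds
}

// With typed durations, std::chrono converts both operands to their common
// type (nanoseconds here) before subtracting, so the result is meaningful.
inline std::chrono::nanoseconds typed_difference(std::chrono::microseconds now,
                                                 std::chrono::nanoseconds msg_time) {
    return now - msg_time;
}
```

For a message stamped 1 ms before the local reading, `raw_difference` is negative while `typed_difference` returns 1'000'000 ns.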

}
// number of blocks syncing node is behind from a peer node
uint32_t nblk_behind_by_net_latency = static_cast<uint32_t>(network_latency_ns / block_interval_ns);
// Multiplied by 2 to compensate the time it takes for message to reach peer node, and plus 1 to compensate for integer division truncation
Contributor

If we change it to "to reach back to that peer node", I think the multiplication by 2 will be clearer.

Member Author

Done
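
For reference, the arithmetic described in that code comment can be sketched as a standalone function. This reads the comment literally (twice the one-way block count, plus one); how the PR uses the resulting value is not shown in this excerpt. The function name `blocks_behind_by_latency` is hypothetical, the 500 ms block interval is an assumption, and the clamp of negative latency to zero follows the suggestion in the earlier thread.

```cpp
#include <cstdint>

// Assumption: a 500 ms block interval, expressed in nanoseconds.
constexpr int64_t block_interval_ns = 500'000'000;

// Hypothetical helper: number of blocks the peer may have advanced by the time
// a reply reaches it, given the measured one-way latency of its message.
inline uint32_t blocks_behind_by_latency(int64_t network_latency_ns) {
    if (network_latency_ns < 0) {
        network_latency_ns = 0;  // negative latency means clock skew; treat as zero
    }
    // Blocks produced during one one-way trip (integer division truncates).
    const auto one_way = static_cast<uint32_t>(network_latency_ns / block_interval_ns);
    // x2 for the time a message takes to reach back to the peer,
    // +1 to compensate for the truncation above.
    return 2 * one_way + 1;
}
```

For example, a latency of 1.2 s spans 2 whole block intervals, so the function returns 5; any latency under 500 ms, or a negative one, returns 1.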

@heifner heifner merged commit 54d4286 into main Mar 16, 2022
@heifner heifner deleted the fsh-sync-to-chain branch March 16, 2022 12:47