Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disconnect on a malformed version message, and reject duplicate verack messages #6781

Merged
merged 5 commits into from
May 22, 2024

Conversation

daira
Copy link
Contributor

@daira daira commented Oct 31, 2023

This is loosely based on and supercedes #6780 by @AndreaLanfranchi, but takes a different approach. It makes the following changes:

  • Deserialization of fields of a "version" message now uses types of platform-independent size.
  • If deserialization of a "version" message fails due to a std::ios_base::failure, the "version" message will be rejected and the peer will be marked to disconnect (pfrom->fDisconnect = true).
  • A peer's nVersion and nServices fields are no longer set (i.e. they will be left at 0) until after the mandatory fields of a "version" message have been successfully parsed and validated. Therefore, the nVersion field will never be set to a non-zero value below MIN_PEER_PROTO_VERSION (or MIN_TESTNET_PEER_PROTO_VERSION for Testnet). This makes the peer version handling simpler to reason about.
  • pfrom->fSuccessfullyConnected is not set until after pfrom->nTimeOffset has been initialized.
  • If a duplicate "verack" message is received, it will be rejected in the same way as a duplicate "version" message.

@daira
Copy link
Contributor Author

daira commented Oct 31, 2023

@AndreaLanfranchi I used "Co-authored-by: Andrea Lanfranchi <andrea.lanfranchi@gmail.com>" to attribute you as a co-author for the last three commits. However I can't sign those with your gpg key, and you have "vigilant mode" set (which warns if any commit attributed to you is not signed by your key). Would you prefer me to remove those per-commit attributions and only acknowledge you in the PR description, or is that okay as-is despite the "Unverified" warning?

Copy link
Contributor

@nuttycom nuttycom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 54bdbf5

src/main.cpp Outdated
pfrom->nVersion = nVersion;
pfrom->nServices = nServices;

if (!vRecv.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these check the length remaining to ensure sufficient bytes remain for successful parsing? Or is the approach here to rely upon exception handling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of the message is allowed to be either the minimum length (just the mandatory fields), or exactly one of the lengths obtained by adding each optional set of fields in turn, or anything longer than the longest such length (to allow for future expansion). That's exactly what the current code allows, and yes, we're relying on exception handling to reject anything else.

@daira
Copy link
Contributor Author

daira commented Oct 31, 2023

Since this potentially affects peer-to-peer protocol compatibility (and involves parsing code and exceptions in C++), I'd like two reviews please :-)

I suggest commit-by-commit review, with whitespace changes ignored for the third commit.

@daira
Copy link
Contributor Author

daira commented Oct 31, 2023

Incidentally, I also checked that nVersion and nStartingHeight were the only places in ProcessMessage where we deserialized into a platform-dependent type (int in both cases). We don't support platforms with int less than 32 bits, but int technically could be longer.

Deserialization of bool is well-defined (any non-zero byte is treated as true):

template<typename Stream> inline void Unserialize(Stream& s, bool& a) { char f=ser_readdata8(s); a=f; }

@daira daira added the safe-to-build Used to send PR to prod CI environment label Nov 1, 2023
@ECC-CI ECC-CI removed the safe-to-build Used to send PR to prod CI environment label Nov 1, 2023
@AndreaLanfranchi
Copy link
Contributor

@AndreaLanfranchi I used "Co-authored-by: Andrea Lanfranchi <andrea.lanfranchi@gmail.com>" to attribute you as a co-author for the last three commits. However I can't sign those with your gpg key, and you have "vigilant mode" set (which warns if any commit attributed to you is not signed by your key). Would you prefer me to remove those per-commit attributions and only acknowledge you in the PR description, or is that okay as-is despite the "Unverified" warning?

Hi @daira,
I have no problems with either choice. It mostly depends on repo constraints whether it allow the application of unverfieid commits. And thank you for asking.

@daira
Copy link
Contributor Author

daira commented Nov 2, 2023

@AndreaLanfranchi wrote:

I have no problems with either choice.

In that case it can stay as-is.

Copy link
Contributor

@str4d str4d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed as of 54bdbf5.

src/main.cpp Outdated
Comment on lines 6992 to 6999
LOCK(cs_main);
CNodeState* state = State(pfrom->GetId());
if (state->fCurrentlyConnected) {
pfrom->PushMessage("reject", strCommand, REJECT_DUPLICATE, std::string("Duplicate verack message"));
Misbehaving(pfrom->GetId(), 1);
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that State() is used elsewhere without being guarded by pfrom->fNetworkNode (which makes sense as it is intended for decoupling node-relevant network state from the network locks).

Currently the new reject will never trigger when a duplicate verack message is received from a non-fNetworkNode, which AFAICT includes all inbound connections (fNetworkNode is only set to true in OpenNetworkConnection for outbound connections). This implies that if we want to reject duplicate verack messages from all non-local CNodes (both outbound and inbound), then we need a new boolean to track that state (because fCurrentlyConnected is from upstream, and so we shouldn't go changing its semantics inside the node).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the presence, actually, of two separate booleans (fCurrentlyConnected from State and fSuccessfullyConnected from CNode member) are a bit confusing and, IMHO, misleading. Wrt specifically to the latter its meaning is ambiguous as it may appear as a redundant statement of what already is in State but at the same time it doesn't mean the handshake process has been fully completed (exchange amongst the two ends of the socket of both version and verack).
I believe it would be beneficial to move the valuation of fSuccessfullyConnected to true only after verack message has been successfully parsed and processed and at the same time move there as well :

  1. the getaddr request (if applicable)
  2. the advertising of local address (if applicable)
  3. the tagging as "good" (in addrman) of the sender's address (if applicable): actually a "good" address, IMHO, should be tagged as such only after a succesfully completed protocol handshake.

In addition to that any ping sendout should be prohibited if the handshake has not been completed in full.

All of this should be helpful to solve issues #5168 and #6038 (protocol handshake should NOT be interleaved by any message other than version and verack).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth mentioning on the matter that latest bitcoin core implementation totally disregards the 26 bytes of the "from" address in version message while acknowledging the remote peer address from its addr messages.

Copy link
Contributor Author

@daira daira Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As #6038 demonstrates, we cannot drop messages that are received before we have sent "verack", because previous versions of zcashd send them at that point and expect a response. We can stop sending them before we have received "verack", but we must accept them when received.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, #5168 and #6038 are a separate issue or issues that we should not attempt to fix in this PR, which is already complicated enough to analyse. I'm with @str4d that we should not be changing the semantics of variables that are present in Bitcoin Core.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As #6038 demonstrates, we cannot drop messages that are received before we have sent "verack", because previous versions of zcashd send them at that point and expect a response. We can stop sending them before we have received "verack", but we must accept them when received.

I think abstain from consuming them before the handshake has been completed in full could be a valuable intermediate solution. Anyhow, just for my knowledge and education, latest bitcoin core's code base have moved the valuation of fSuccessfullyConnected flag from CNode's into the processing of verack message.

In any case I'm not pushing to have more changes in this PR specifically.

@daira daira added this to the Release 5.10.0 milestone Apr 18, 2024
@daira daira force-pushed the disconnect-on-invalid-version-message branch from 54bdbf5 to 09637cd Compare May 16, 2024 19:14
nuttycom
nuttycom previously approved these changes May 16, 2024
Copy link
Contributor

@nuttycom nuttycom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 09637cd with nonblocking question.

src/main.cpp Show resolved Hide resolved
daira and others added 5 commits May 17, 2024 13:30
>= MIN_PEER_PROTO_VERSION (170100) at this point.

Signed-off-by: Daira Emma Hopwood <daira@jacaranda.org>
* fields were deserialized into variables of platform-dependent length;
* pfrom->nVersion was set even if it does not meet the minimum version
  requirements;
* make sure that pfrom->nTimeOffset is set before
  pfrom->fSuccessfullyConnected.

At this point it is still possible for parsing to fail due to a
`std::ios_base::failure`, which will not cause a disconnect.

Co-authored-by: Andrea Lanfranchi <andrea.lanfranchi@gmail.com>
Signed-off-by: Daira Emma Hopwood <daira@jacaranda.org>
Co-authored-by: Andrea Lanfranchi <andrea.lanfranchi@gmail.com>
Signed-off-by: Daira Emma Hopwood <daira@jacaranda.org>
Co-authored-by: Andrea Lanfranchi <andrea.lanfranchi@gmail.com>
Signed-off-by: Daira Emma Hopwood <daira@jacaranda.org>
This should make occurrences of NodeId wrapping essentially impossible for
real-world usage.

Backport of bitcoin/bitcoin#10176

[zcashd] I have checked that zcashd has no current uses of NodeId that
depend on it being an `int`. All accesses to the global `nLastNodeId`
are under lock. `NodeId` *is* required to be a signed integral type,
because `-1` is used as a sentinel value. It is also formatted using
the `%d` tinyformat specifier, but unlike the C format specifier it is
inspired by, this correctly handles integral types of arbitrary width.

There are `NodeId` fields in `CNodeStats`, `NodeEvictionCandidate`, and
(test-only) `COrphanTx`, but those types are not serializable, and there
is no other ad-hoc serialization of `NodeId` values apart from its use
in the "id" field of the output from the `getpeerinfo` RPC. `UniValue` has
an override of `pushKV` for `int64_t`, and so that use will correctly
handle values up to the [maximum safe JSON/JavaScript integer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER),
i.e. $2^{53} - 1$. As upstream did, we argue that it is not feasible to
cause that value to be exceeded.

Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
Signed-off-by: Daira-Emma Hopwood <daira@jacaranda.org>
Copy link
Contributor

@str4d str4d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK abe934c.

I re-confirmed that, per my above comment, this PR only rejects duplicate verack messages for outbound connections (because it reuses the existing state->fCurrentlyConnected boolean which is used for tracking outbound connection attempts to addresses in addrman). If duplicate verack rejection is desired for inbound connections, that will need to be implemented in a separate PR with a new boolean that isn't coupled to outbound connection attempts.

@str4d str4d merged commit e78b319 into zcash:master May 22, 2024
139 of 148 checks passed
@str4d str4d modified the milestones: Release 5.10.0, Release 5.9.1 May 22, 2024
@daira daira deleted the disconnect-on-invalid-version-message branch May 27, 2024 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants