-
Notifications
You must be signed in to change notification settings - Fork 35.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test/BIP324: disconnection scenarios during v2 handshake #29431
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
@sr-gi, i tried this manually sending idea but i still think intermittent failures are possible there.
here's a branch where i tweaked the code you shared a bit with a sleep statement for making the test crash and reintroducing |
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
added two comments I will continue to test these changes as I am still trying to understand fully what the tests do exactly
ec9005c
to
22cba89
Compare
I've been playing around sending anything different than the first 4 bytes as the network magic and test fail on my end (which is expected given Do you have any working example I can try to reproduce? |
yes! that's expected behaviour and sending anything other than the first 4 bytes as network magic will make the test fail on master. what i was trying to say was:
so i was just trying to say that we can't remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
The one thing I'm a bit unsure about is that this duplicates quite a lot of code, especially in complete_handshake
- the main implementation in v2_p2p.py
and the overwritten version from this test could run out of sync in the future. But I don't have a good idea how to avoid this yet...
22cba89
to
9ca0439
Compare
Early key response test is a special kind of test which requires modified v2 handshake functions. More such tests can be added where v2 handshake functions send incorrect garbage terminator, excess garbage bytes etc.. Hence, rename p2p_v2_earlykey.py to a general test file name - p2p_v2_misbehaving.py. random_bitflip function (used in signature tests prior to this commit) can be used in p2p_v2_misbehaving test to generate wrong garbage terminator, wrong garbage bytes etc.. So, move the function to util.
Adds a new boolean parameter `expect_success` to the `add_p2p_connection` method. If set, the node under test doesn't wait for connection to be established and is useful for testing scenarios when disconnection is expected. Without this parameter, intermittent test failures can happen if the disconnection happens before wait_until for is_connected is hit inside `add_p2p_connection`. Co-Authored-By: Sebastian Falbesoner <sebastian.falbesoner@gmail.com>
9ca0439
to
67708b6
Compare
hmm good point. will need to think about it more. simplest solution would be keeping it all in main implementation in |
One idea to avoid at least reimplementing diff --git a/test/functional/test_framework/v2_p2p.py b/test/functional/test_framework/v2_p2p.py
index 8f79623bd8..c27cd5e2fe 100644
--- a/test/functional/test_framework/v2_p2p.py
+++ b/test/functional/test_framework/v2_p2p.py
@@ -111,10 +111,11 @@ class EncryptedP2PState:
# Responding, place their public key encoding first.
return TaggedHash("bip324_ellswift_xonly_ecdh", ellswift_theirs + ellswift_ours + ecdh_point_x32)
- def generate_keypair_and_garbage(self):
+ def generate_keypair_and_garbage(self, garbage_len=None):
"""Generates ellswift keypair and 4095 bytes garbage at max"""
self.privkey_ours, self.ellswift_ours = ellswift_create()
- garbage_len = random.randrange(MAX_GARBAGE_LEN + 1)
+ if garbage_len is None:
+ garbage_len = random.randrange(MAX_GARBAGE_LEN + 1)
self.sent_garbage = random.randbytes(garbage_len)
logger.debug(f"sending {garbage_len} bytes of garbage data")
return self.ellswift_ours + self.sent_garbage The overruled method of
(For the |
I find the approach fairly hard to follow here. Having all the logic in the constructor with if/elses based on the connection type instead of having different constructors/a test for each type of failure feels really error-prone and difficult to make sure if a test is doing what we expect. I think it'd be better to have multiple classes that build from Just an example: In the |
67708b6
to
a256640
Compare
that's a great suggestion @sr-gi! it's possible to design those classes in such a way that it avoids code duplication and the file is almost the same size!
because we don't send a version message to ensure that the disconnection happens in the v2 handshake phase. in your case (different from what we are testing here), we send the correct garbage terminator but because we don't send a version message - disconnection is still expected. |
Currently, we log the number of bytes of garbage when it is generated. The log is a better fit for when the garbage actually gets sent to the transport layer.
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
a256640
to
b062556
Compare
I've updated the PR to have different child class implementations for Also introduced 2 more commits for cleaner code:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach much better, it is way easier to follow and reason about. I left some comments inline.
because we don't send a version message to ensure that the disconnection happens in the v2 handshake phase. in your case (different from what we are testing here), we send the correct garbage terminator but because we don't send a version message - disconnection is still expected.
What is weird is that the test is expecting a specific log message, which seems to also match in this case. This is still the case by the way. Is there an alternative way of making sure this does not happen?
"""Send ellswift and garbage bytes in 2 parts when TestType = (EARLY_KEY_RESPONSE)""" | ||
# Here, the 64 bytes ellswift is assumed to have it's first 4 bytes match network magic bytes. | ||
# It is sent in 2 phases: | ||
# 1. when `magic_sent` = False, send first 4 bytes of ellswift (matches network magic bytes) | ||
# 2. when `magic_sent` = True, send remaining 60 bytes of ellswift | ||
if not self.magic_sent: | ||
self.generate_keypair_and_garbage() | ||
self.magic_sent = True | ||
return MAGIC_BYTES[self.net] | ||
else: | ||
# `can_data_be_received` is a variable used to assert if data is received on recvbuf. | ||
# 1. v2 TestNode shouldn't respond back if we send V1_PREFIX and data shouldn't be received on recvbuf. | ||
# This state is represented using `can_data_be_received` = False. | ||
# 2. v2 TestNode responds back when mismatch from V1_PREFIX happens and data can be received on recvbuf. | ||
# This state is represented using `can_data_be_received` = True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 6d9df28
I still think this can be simplified. initiate_v2_handshake
is going to be called, asynchronously, when creating this object. You can reduce a lot of the boilerplate, and get rid of magic_sent
, as long as you do the last bit manually on the test (or just create another method, like continue_v2_handshake
).
It just feels weird to me that this initialization method is called twice (one async, once manually), it doesn't feel too intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this can be simplified. initiate_v2_handshake is going to be called, asynchronously, when creating this object. You can reduce a lot of the boilerplate, and get rid of magic_sent, as long as you do the last bit manually on the test (or just create another method, like continue_v2_handshake).
yes, that can be done since the last bit is being sent from MainThread
anyways.
before:
- send both 4 bytes network magic (first 4 bytes of ellswift) -
NetworkThread
- remaining
ellswift_garbage
bytes -MainThread
now:
- send both 4 bytes network magic (first 4 bytes of ellswift) -
MainThread
- remaining
ellswift_garbage
bytes -MainThread
realised my earlier comment is the situation in master too and a sleep in MainThread
where there's no waiting for some other tasks is unlikely. 😬
I've used your original suggestion and updated the PR to send everything manually!
Currently, transport version is a global variable declared as TRANSPORT_VERSION in v2_p2p.py. Making it an instance variable would help in sending non empty transport version packets for testing purposes. It might also help EncryptedP2PState be more extensible in far future protocol upgrades.
Prior to this commit, TestEncryptedP2PState would always send initial_v2_handshake bytes in 2 parts (as required by early key response test). For generalising this test and having different v2 handshake behaviour based on the test type, special behaviours like sending initial_v2_handshake bytes in 2 parts are executed only if test_type is set to EARLY_KEY_RESPONSE.
This test type is represented using EXCESS_GARBAGE.
…is sent This test type is represented using WRONG_GARBAGE_TERMINATOR. since the wrong garbage terminator is sent to TestNode, TestNode will interpret all of the gabage bytes, wrong garbage terminator, decoy messages and version packet it receives as garbage bytes. If the length of all these is more than 4095 + 16, it will result in a missing garbage terminator error. otherwise, it will result in a V2 handshake timeout error. Send only MAX_GARBAGE_LEN//2 bytes of garbage data to TestNode so that the total length received by the TestNode is at max = (MAX_GARBAGE_LEN//2) + 16 + 10*120 + 20 = 3283 bytes (which is less than 4095 + 16 bytes) and we get a consistent V2 handshake timeout error message. If we do not limit the garbage length sent, we will intermittently get both missing garbage terminator error and V2 handshake timeout error based on the garbage length and decoy packets length which are chosen at random.
… different This test type is represented using WRONG_GARBAGE. Here, garbage bytes sent to TestNode are assumed to be tampered with and do not correspond to the garbage bytes which P2PInterface calculated and uses.
This test type is represented using SEND_NO_AAD. If AAD of the first encrypted packet sent after the garbage terminator (optional decoy packet/version packet) hasn't been filled, disconnection happens.
…tion happens This test type is represented using SEND_NON_EMPTY_VERSION_PACKET.
b062556
to
ae528cb
Compare
thank you for the helpful reviews @sr-gi! I've updated the PR:
hmm fair point, i've added a log message in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it now 🚀
ACK ae528cb
Add tests for the following v2 handshake scenarios:
MAX_GARBAGE_LEN
bytes garbage is sentAll these tests require a modified v2 P2P class (different from
EncryptedP2PState
used inv2_p2p.py
) to implement our custom handshake behaviour based on different scenarios and have been kept in a single test file (test/functional/p2p_v2_misbehaving.py
). Shifted the test intest/functional/p2p_v2_earlykeyresponse.py
which is of the same pattern to this file too.