Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: network connection tracepoints #25832

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

0xB10C
Copy link
Contributor

@0xB10C 0xB10C commented Aug 12, 2022

This adds five new tracepoints with documentation and tests for network connections:

  • established connections with net:inbound_connection and net:outbound_connection
  • closed connections (both closed by us or by the peer) with net:closed_connnection
  • inbound connections that we choose to evict with net:evicted_inbound_connection
  • connections that are misbehaving and punished with net:misbehaving_connection

I've been using these tracepoints for a few months now to monitor connection lifetimes, re-connection frequency by IP and netgroup, misbehavior, peer discouragement, and eviction and more. Together with the two existing P2P message tracepoints they allow for a good overview of local P2P network activity. Also sort-of addresses #22006 (comment).

I've been back and forth on which arguments to include. For example, net:evicted_connection could also include some of the eviction metrics like e.g. last_block_time, min_ping_time, ... but I've left them out for now. If wanted, this can be added here or in a follow-up. I've tried to minimize a potential performance impact by measuring executed instructions with gdb where possible (method described here). I don't think a few hundred extra instructions are too crucial, as connection opens/closes aren't too frequent (compared to e.g. P2P messages). Note: e.g. CreateNodeFromAcceptedSocket() usually executes between 80k and 90k instructions for each new inbound connection.

tracepoint instructions
net:inbound_connection 390 ins
net:outbound_connection between 700 and 1000 ins
net:closed_connnection 473 ins
net:evicted_inbound_connection not measured; likely similar to net:closed_connnection
net:misbehaving_connection not measured

Also added a bpftrace (tested with v0.14.1) log_p2p_connections.bt example script that produces output similar to:

Attaching 6 probes...
Logging opened, closed, misbehaving, and evicted P2P connections
OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, total_out=1
INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, total_in=1
MISBEHAVING conn id=1, score_before=0, score_increase=20, message='getdata message size = 50001', threshold_exceeded=false
CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, established=1231006505
EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, established=1612312312

@jb55
Copy link
Contributor

jb55 commented Aug 12, 2022 via email

kouloumos added a commit to kouloumos/bitcoin that referenced this pull request Aug 17, 2022
@russeree
Copy link
Contributor

russeree commented Aug 18, 2022

Just reporting in that everything is functional. Console log looks correct.

Tests Passed
image

0xB10C added a commit to 0xB10C/peer-observer that referenced this pull request Sep 12, 2022
0xB10C added a commit to 0xB10C/peer-observer that referenced this pull request Sep 12, 2022
0xB10C added a commit to 0xB10C/peer-observer that referenced this pull request Sep 12, 2022
0xB10C added a commit to 0xB10C/peer-observer that referenced this pull request Sep 12, 2022
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 13, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK virtu
Concept ACK dergoegge
Stale ACK jb55

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@0xB10C
Copy link
Contributor Author

0xB10C commented Nov 16, 2022

Note to self: might be worth checking if we can also pass AS information and not only the net_group.

@0xB10C
Copy link
Contributor Author

0xB10C commented Jan 20, 2023

Note to self: might be worth checking if we can also pass AS information and not only the net_group.

This is possible by accessing the CNodeStats, but don't think we should do that in the tracepoint. We have the IP and can lookup the AS in a tracing scirpt, if necessary.


Fixed up the linter complains and rebased. This is ready for further review.

@dergoegge
Copy link
Member

Concept ACK

@0xB10C
Copy link
Contributor Author

0xB10C commented Mar 21, 2023

The linter started complaining about "E275 missing whitespace after keyword". Fixed this up in the individual commits.

ac438b1 -> 6de4fc7 (2022-05-connection-tracepoints.pr.1 -> 2022-05-connection-tracepoints.pr.2; compare)

src/net.cpp Outdated Show resolved Hide resolved
src/net.cpp Outdated Show resolved Hide resolved
@virtu
Copy link
Contributor

virtu commented Mar 24, 2023

ACK 6de4fc7

Code reviewed tracepoints, tests, and demo script. Tracepoint placement looks good, so do tests. Successfully ran functional tests and the demo script.

Some points:

  • Any reason why only inbound_connection and outbound_connection explicitly cast all arguments in the demo script?
  • I think some of the format specifiers in the demo script are lightly off, leading to outputs with negative netgroup values on my system. I believe it should be %lld for int64 and %llu for uint64.
  • As brought up previously, it would be nice to switch from nKeyedNetGroup to netgroup ids that are consistent over time

@DrahtBot DrahtBot requested a review from jb55 March 24, 2023 13:44
@jb55
Copy link
Contributor

jb55 commented Oct 19, 2023

crACK af20707

(although I noticed the mispelling is still there)

0xB10C and others added 5 commits January 18, 2024 09:19
refactor: The result of the check if the discuragement threshold is
exceeded with this call to Misbehaving() is stored in a boolean
called `discourage_threshold_exceeded`. This allows using it in the
tracepoint.
@0xB10C 0xB10C force-pushed the 2022-05-connection-tracepoints branch from af20707 to 996029b Compare January 18, 2024 09:59
@0xB10C
Copy link
Contributor Author

0xB10C commented Jan 18, 2024

Hm the CI failure of previous releases, qt5 dev package and depends packages, DEBUG (CI job used to find silent-merge conflicts (?)) four days ago seems weird. Didn't see any conflicts when rebasing.

Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed the discurage -> discourage misspelling. I thought I did it in an earlier push.

@jb55
Copy link
Contributor

jb55 commented Jan 22, 2024 via email

@0xB10C
Copy link
Contributor Author

0xB10C commented Mar 7, 2024

Can we re-trigger the Cirrus CI / ASan + LSan + UBSan + integer, no depends, USDT job? The failing wallet_send.py --descriptors test looks unrelated.

@DrahtBot DrahtBot removed the CI failed label Mar 7, 2024
1. Peer ID as `int64`
2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters)
3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters)
4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`.
Copy link
Member

@laanwj laanwj Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's slightly inconsistent to pass this parameter as int, while the other enum (Connection Type) gets passed as string. Might similarly pass GetNetworkName(...).c_str(). Especially after #26593 which would remove the call overhead in the happy path.
No strong opinion though, I think the integer value is easier to use if you're going to make a histogram or such. My personal preference is also string-less APIs. But it's less consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully agree what you've said. I'm leaving as is for now. If #26593 is merged and I retouch this, I'll reconsider changing this. I'll leave this as "unresolved" for now.

Copy link
Contributor

@virtu virtu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 996029b (modulo comment and this)

Successfully ran log_p2p_connections.bt over the weekend on both x86_64/clang and aarch64/gcc.

Extended functional test (interface_usdt_net.py) equally passed on both machines.

contrib/tracing/README.md Outdated Show resolved Hide resolved
A bpftrace (v0.14.1) script that logs information from the
net:*_connection tracepoints.
@0xB10C 0xB10C force-pushed the 2022-05-connection-tracepoints branch from 996029b to 8109319 Compare May 10, 2024 08:25
@virtu
Copy link
Contributor

virtu commented May 21, 2024

LGTM! ACK 8109319

@0xB10C
Copy link
Contributor Author

0xB10C commented May 21, 2024

note that #29575 removes the misbehavior score (which is exposed via the net:misbehaving_connection tracepoint). If/when #29575 is merged, the misbehaving_connection tracepoint will change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants