New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracing: network connection tracepoints #25832
base: master
Are you sure you want to change the base?
Conversation
nice, utACK 7252c60
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Note to self: might be worth checking if we can also pass AS information and not only the net_group. |
7252c60
to
8528781
Compare
This is possible by accessing the CNodeStats, but don't think we should do that in the tracepoint. We have the IP and can lookup the AS in a tracing scirpt, if necessary. Fixed up the linter complains and rebased. This is ready for further review. |
Concept ACK |
8528781
to
ac438b1
Compare
ac438b1
to
6de4fc7
Compare
The linter started complaining about "E275 missing whitespace after keyword". Fixed this up in the individual commits. ac438b1 -> 6de4fc7 (2022-05-connection-tracepoints.pr.1 -> 2022-05-connection-tracepoints.pr.2; compare) |
ACK 6de4fc7 Code reviewed tracepoints, tests, and demo script. Tracepoint placement looks good, so do tests. Successfully ran functional tests and the demo script. Some points:
|
crACK af20707 (although I noticed the mispelling is still there) |
refactor: The result of the check if the discuragement threshold is exceeded with this call to Misbehaving() is stored in a boolean called `discourage_threshold_exceeded`. This allows using it in the tracepoint.
af20707
to
996029b
Compare
Hm the CI failure of Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed the discurage -> discourage misspelling. I thought I did it in an earlier push. |
On Thu, Jan 18, 2024 at 02:03:35AM -0800, 0xB10C wrote:
Hm the CI failure of `previous releases, qt5 dev package and depends
packages, DEBUG` (CI job used to find silent-merge conflicts (?)) four
days ago seems weird. Didn't see any conflicts when rebasing.
Anyway, rebased. Sorry for invalidating your ACK again, @jb55. Fixed
the discurage -> discourage misspelling. I thought I did it in an
earlier push.
sorry I haven't been able to sit down and actually test this yet. The
changes look good though!
|
Can we re-trigger the |
1. Peer ID as `int64` | ||
2. Peer address and port (IPv4, IPv6, Tor v3, I2P, ...) as `pointer to C-style String` (max. length 68 characters) | ||
3. Connection Type (inbound, feeler, outbound-full-relay, ...) as `pointer to C-style String` (max. length 20 characters) | ||
4. Network the peer connects from as `uint32` (1 = IPv4, 2 = IPv6, 3 = Onion, 4 = I2P, 5 = CJDNS). See `Network` enum in `netaddress.h`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's slightly inconsistent to pass this parameter as int, while the other enum (Connection Type) gets passed as string. Might similarly pass GetNetworkName(...).c_str()
. Especially after #26593 which would remove the call overhead in the happy path.
No strong opinion though, I think the integer value is easier to use if you're going to make a histogram or such. My personal preference is also string-less APIs. But it's less consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree what you've said. I'm leaving as is for now. If #26593 is merged and I retouch this, I'll reconsider changing this. I'll leave this as "unresolved" for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bpftrace (v0.14.1) script that logs information from the net:*_connection tracepoints.
996029b
to
8109319
Compare
LGTM! ACK 8109319 |
This adds five new tracepoints with documentation and tests for network connections:
net:inbound_connection
andnet:outbound_connection
net:closed_connnection
net:evicted_inbound_connection
net:misbehaving_connection
I've been using these tracepoints for a few months now to monitor connection lifetimes, re-connection frequency by IP and netgroup, misbehavior, peer discouragement, and eviction and more. Together with the two existing P2P message tracepoints they allow for a good overview of local P2P network activity. Also sort-of addresses #22006 (comment).
I've been back and forth on which arguments to include. For example,
net:evicted_connection
could also include some of the eviction metrics like e.g.last_block_time
,min_ping_time
, ... but I've left them out for now. If wanted, this can be added here or in a follow-up. I've tried to minimize a potential performance impact by measuring executed instructions withgdb
where possible (method described here). I don't think a few hundred extra instructions are too crucial, as connection opens/closes aren't too frequent (compared to e.g. P2P messages). Note: e.g.CreateNodeFromAcceptedSocket()
usually executes between 80k and 90k instructions for each new inbound connection.Also added a bpftrace (tested with v0.14.1)
log_p2p_connections.bt
example script that produces output similar to: