Populate local IP address from BPF space #1829

Open
oazizi000 opened this issue Jan 24, 2024 · 1 comment

Comments

@oazizi000
Contributor

Is your feature request related to a problem? Please describe.
With #1808, we collect local IP addresses from eBPF space when doing server-side tracing. For consistency, we should investigate collecting the local IP address during client-side tracing as well, so that all users can interpret the local address columns in our tables the same way regardless of trace role.

Describe the solution you'd like
Collect the local IP address and populate it in every table that has a local IP address column.

Describe alternatives you've considered
An alternative would be to populate it from user space, but BPF-based collection is preferred for performance and reliability reasons.
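For comparison, a user-space lookup could read the local address from the kernel's socket table. The sketch below parses the IPv4 /proc/net/tcp format; it is an illustrative assumption, not something this issue proposes, and parse_hex_endpoint and local_addr_by_remote are hypothetical helpers. It assumes a little-endian host, because the 32-bit address is printed as a raw hex integer and its bytes therefore appear reversed.

```python
import socket
import struct


def parse_hex_endpoint(field):
    """Decode an IPv4 'ADDR:PORT' field from /proc/net/tcp, e.g. '0100007F:0050'.

    The address is the raw 32-bit value printed as hex; on a little-endian host
    (assumed here, e.g. x86-64) its bytes appear reversed. The port is plain hex.
    """
    addr_hex, port_hex = field.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def local_addr_by_remote(proc_net_tcp_text):
    """Map (remote_addr, remote_port) -> set of local IPv4 addresses.

    Hypothetical helper: takes the text of /proc/net/tcp and groups each
    socket's local address under its remote endpoint. Listening sockets show
    up under the remote endpoint ('0.0.0.0', 0).
    """
    result = {}
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row ("sl local_address rem_address ...").
        if len(fields) < 3 or fields[0] == "sl":
            continue
        local_addr, _local_port = parse_hex_endpoint(fields[1])
        remote = parse_hex_endpoint(fields[2])
        result.setdefault(remote, set()).add(local_addr)
    return result
```

Usage would be along the lines of local_addr_by_remote(open("/proc/net/tcp").read()). Because this is a point-in-time snapshot of a single network namespace, it can miss short-lived connections, which illustrates the kind of reliability gap that may favor collecting the address in BPF.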

Additional context
For background, see #1807, #1808, and #1809.

@benkilimnik
Member

benkilimnik commented Jan 25, 2024

I believe we can recover the local address by merging http_events with tcp_stats_events, the table populated by the TCP stats connector.

I ran a quick test in PxL, and this seems to do the trick:

```python
import px

# Load HTTP events from the last 5 minutes.
df = px.DataFrame(table='http_events', start_time='-300s')
df = df['time_', 'local_port', 'local_addr', 'remote_port', 'remote_addr', 'trace_role']
# Keep only client-side traces (trace_role == 1)...
df = df[df.trace_role == 1]
# ...whose local port is not populated (-1).
df = df[df.local_port == -1]

# Load TCP stats data for the same window.
tcp_stats_df = px.DataFrame(table='tcp_stats_events', start_time='-300s',
                            select=['time_', 'local_addr', 'local_port', 'remote_addr', 'remote_port'])

# Group by (local_addr, remote_addr, remote_port) so that each remote addr:port tuple keeps
# one row per distinct local_addr. local_port is dropped, since it presumably holds several
# ephemeral ports for the same tuple.
tcp_stats_df = tcp_stats_df.groupby(['local_addr', 'remote_addr', 'remote_port']).agg()

# Merge on the remote side of the connection tuple.
# Note: there may be duplicates, because multiple Pods configured with hostNetwork: true
# use their host node's IP address and therefore share the same local IP as their node.
merged_df = df.merge(tcp_stats_df, how='inner',
                     left_on=['remote_addr', 'remote_port'],
                     right_on=['remote_addr', 'remote_port'],
                     suffixes=['_http', '_tcp'])

px.display(df, 'http events')
px.display(tcp_stats_df, 'tcp')
px.display(merged_df, 'http merged with tcp')
```
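To make the join semantics concrete outside of Pixie, here is a plain-Python sketch of the same logic on in-memory rows. The HttpEvent/TcpStat record types and the distinct_local_addrs/merge_local_addr helpers are hypothetical stand-ins for the PxL tables, not Pixie APIs; column names follow the script, and CLIENT_ROLE = 1 mirrors its trace_role filter. It shows how the inner join can fan out: a client-side event is matched once per distinct local_addr that has talked to the same remote addr:port.

```python
from collections import namedtuple

# Hypothetical row types standing in for the PxL tables (column names follow the script).
HttpEvent = namedtuple("HttpEvent", ["time_", "remote_addr", "remote_port", "trace_role"])
TcpStat = namedtuple("TcpStat", ["local_addr", "local_port", "remote_addr", "remote_port"])

CLIENT_ROLE = 1  # mirrors the script's trace_role == 1 filter


def distinct_local_addrs(tcp_stats):
    """Mimic groupby(['local_addr', 'remote_addr', 'remote_port']).agg():
    one row per distinct (local_addr, remote_addr, remote_port), local_port dropped."""
    return sorted({(s.local_addr, s.remote_addr, s.remote_port) for s in tcp_stats})


def merge_local_addr(http_events, tcp_stats):
    """Inner-join client-side HTTP events with the grouped TCP stats on
    (remote_addr, remote_port). Returns (event, local_addr) pairs."""
    # Index the grouped TCP stats by their remote endpoint.
    by_remote = {}
    for local_addr, remote_addr, remote_port in distinct_local_addrs(tcp_stats):
        by_remote.setdefault((remote_addr, remote_port), []).append(local_addr)

    merged = []
    for ev in http_events:
        if ev.trace_role != CLIENT_ROLE:
            continue  # the script only looks at client-side traces
        # Inner join: events with no matching remote endpoint are dropped,
        # and an event matches once per distinct local_addr.
        for local_addr in by_remote.get((ev.remote_addr, ev.remote_port), []):
            merged.append((ev, local_addr))
    return merged
```

For example, if the TCP stats show both 10.0.1.2 and 192.168.1.7 connecting to 10.0.0.5:8080, a single client-side event to 10.0.0.5:8080 yields two merged rows, and the join on the remote tuple alone cannot tell which local address is the right one.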

For this to work, the TCP stats source connector needs to be enabled in Stirling. Perhaps it would be worth adding it to kProd?
