Improve CLI usability around ringbuffer limits #460

Open
joestringer opened this issue Dec 16, 2020 · 1 comment
joestringer commented Dec 16, 2020

Some example usage of hubble:

I want to find out if any apps are reaching out to 8.8.8.8:

# hubble observe --to-ip=8.8.8.8
requested data has been overwritten and is no longer available

I'm not sure how many flows are kept in the ringbuffer or what time span that represents, so I tried listing the last 30m:

# hubble observe --namespace default -o json --since=30m
requested data has been overwritten and is no longer available

These seem to be both derived from the error in the hubble server side around attempting to list more flows than the current ringbuffer contents. But as a user, I don't know or necessarily care about the ringbuffer size, I just want to query these flows and get whatever information is available.

Furthermore, the error itself is pretty generic: I know I am doing something wrong, but it's unclear what I should try next. I was informed that there is also an --all flag in the latest version of the CLI (not yet available in Cilium containers), and that I can analyze the output of hubble status to estimate how many flows are likely to be present. But this will not catch all cases, and these are very complicated mitigations if I just want to find as much information as is available in Hubble.

If the response from the Hubble server clearly said "Here are the N flows out of M" or "From the last N minutes (since timestamp X), I found these relevant flows", this would provide context on whether the flows are likely to include the information I'm looking for.
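To make the suggestion concrete, here is a minimal sketch of what such a summary line could look like. The `summarize` function, its parameters, and the message wording are all hypothetical; nothing like this exists in Hubble's API as far as this thread shows.

```go
package main

import (
	"fmt"
	"time"
)

// exampleNow is a fixed reference time so the example is reproducible.
var exampleNow = time.Date(2020, 12, 16, 12, 0, 0, 0, time.UTC)

// summarize is a hypothetical helper (not part of Hubble). It renders the
// kind of context line proposed above: how many flows matched, how many
// were in the buffer, and how far back the buffer reaches.
func summarize(returned, inBuffer int, oldest, now time.Time) string {
	window := now.Sub(oldest).Round(time.Minute)
	return fmt.Sprintf(
		"%d matching flows out of %d in the buffer, covering the last %s (since %s)",
		returned, inBuffer, window, oldest.UTC().Format(time.RFC3339),
	)
}

func main() {
	// Assumed numbers: 3 matches among 4096 buffered flows, oldest flow 7 minutes old.
	oldest := exampleNow.Add(-7 * time.Minute)
	fmt.Println(summarize(3, 4096, oldest, exampleNow))
}
```

With a line like this, a query for --since=30m that only reaches back 7 minutes would tell the user so directly instead of failing.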

@joestringer joestringer changed the title Improve usability around ringbuffer limits Improve CLI usability around ringbuffer limits Dec 16, 2020

glibsm commented Dec 16, 2020

> These seem to be both derived from the error in the hubble server side around attempting to list more flows than the current ringbuffer contents

You are allowed to ask for more flows than the buffer contains. The problem is that in a chatty cluster (and a lockless buffer) there is not enough time for us to rewind the buffer and read the flows before the writer writes over them (thus producing this error).
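The race can be illustrated with a toy ring buffer. This is a single-goroutine sketch, not Hubble's implementation: the `ring` type, its methods, and the sequence-number scheme are assumptions made for illustration, and only the error text is taken from the CLI output above.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errOverwritten = errors.New("requested data has been overwritten and is no longer available")
	errNotWritten  = errors.New("requested data has not been written yet")
)

// ring is a toy fixed-size buffer (hypothetical, for illustration only).
type ring struct {
	data  []int
	write uint64 // total number of items ever written
}

func newRing(size int) *ring { return &ring{data: make([]int, size)} }

// put stores v in the next slot, overwriting the oldest item once full.
func (r *ring) put(v int) {
	r.data[r.write%uint64(len(r.data))] = v
	r.write++
}

// get returns the item with sequence number seq, or errOverwritten if the
// writer has already reused that slot.
func (r *ring) get(seq uint64) (int, error) {
	if seq >= r.write {
		return 0, errNotWritten
	}
	if r.write-seq > uint64(len(r.data)) {
		return 0, errOverwritten
	}
	return r.data[seq%uint64(len(r.data))], nil
}

func main() {
	r := newRing(4)
	for i := 0; i < 4; i++ {
		r.put(i)
	}
	oldest := uint64(0) // the reader rewinds to the oldest entry...
	r.put(4)            // ...but the writer advances before the read happens
	_, err := r.get(oldest)
	fmt.Println("read of oldest entry:", err)
	v, err := r.get(oldest + 1)
	fmt.Println("read of next entry:", v, err)
}
```

In the sketch the request itself is valid; it fails only because the writer moved between the rewind and the read, which is the timing problem described above.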

There are fundamentally no problems with asking for --last 100000 or --since 100y, apart from the timing issue described above.

--all was added to the hubble CLI, but if you read the PR (https://github.com/cilium/hubble/pull/411/files), it just requests --last MAX_INT, so it doesn't solve the timing issue.

@rolinh was the last person to work on this and is very familiar with it. Last I heard, there were some attempts to create more than one read pointer, but I don't know the current status of that offhand.

We currently only reserve one flow between the reader and the writer, so the amount of time we have to respond to the request is 1 / (#flows/s). We may want to reserve more flows between the reader and writer pointers to allow us more time to respond to queries, but that's not guaranteed to work in all cases either.
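As a rough worked example of that time budget: with k flows reserved and a write rate of r flows/s, the reader has about k / r seconds. The `responseBudget` helper and the 1000 flows/s rate below are assumptions for illustration, not measurements from a real cluster.

```go
package main

import (
	"fmt"
	"time"
)

// responseBudget estimates how long a reader has before the writer catches
// up, given the number of flows reserved between the two pointers and the
// write rate. Hypothetical helper; not part of Hubble.
func responseBudget(reserved int, flowsPerSecond float64) time.Duration {
	if reserved <= 0 || flowsPerSecond <= 0 {
		return 0
	}
	return time.Duration(float64(reserved) * float64(time.Second) / flowsPerSecond)
}

func main() {
	const rate = 1000.0 // assumed write rate in flows/s
	for _, reserved := range []int{1, 100} {
		fmt.Printf("reserved=%d budget=%s\n", reserved, responseBudget(reserved, rate))
	}
}
```

At the assumed rate, one reserved flow leaves about 1ms to respond and 100 reserved flows leave about 100ms, so a larger gap buys time linearly but a fast enough writer can still win.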

Other solutions are welcome, but it's difficult since there is no read/write locking.
