OSDOCS-9959: NetObserv Health dashboard updates #75319
base: no-1.6
@skrthomas: This pull request references OSDOCS-9959 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
🤖 Thu May 16 17:20:54 - Prow CI generated the docs preview: https://75319--ocpdocs-pr.netlify.app/
f0f4394 to f9744a0
ad763fe to a92c924
@memodi @msherif1234 can you PTAL at this PR for docs needed for the eBPF enhancements? Mehul, I was unsure whether to tag you or Nathan. Mostly you verified the alert but I see Nathan as the QE contact for the bigger epic. Let me know if I should reassign.
LGTM overall, just couple of questions
1a91995 to 89699eb
/label qe-approved
95daee6 to 8b9fd98
@skrthomas: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Had some minor suggestions - this looks really great! Nice job!
[id="network-observability-netobserv-dashboard-ebpf-agent-alerts_{context}"]
= Using the eBPF agent alert

When the Network Observability eBPF agent hashmap table is full, the eBPF agent processes flows at a degraded performance. When this is the case, an alert shows `NetObservAgentFlowsDropped`. If you see this alert, consider increasing the `cacheMaxFlows` in the `FlowCollector`, as shown in the following example.
Suggested change:
- When the Network Observability eBPF agent hashmap table is full, the eBPF agent processes flows at a degraded performance. When this is the case, an alert shows `NetObservAgentFlowsDropped`. If you see this alert, consider increasing the `cacheMaxFlows` in the `FlowCollector`, as shown in the following example.
+ When the Network Observability eBPF agent hashmap table is full, the eBPF agent processes flows with degraded performance. When this is the case, an alert shows `NetObservAgentFlowsDropped`. If you see this alert, consider increasing the `cacheMaxFlows` in the `FlowCollector`, as shown in the following example.

[NOTE]
====
Increasing the `cacheMaxFlows` may increase the memory usage of the eBPF agent.
Suggested change:
- Increasing the `cacheMaxFlows` may increase the memory usage of the eBPF agent.
+ Increasing the `cacheMaxFlows` might increase the memory usage of the eBPF agent.
per IBM Style

.Procedure

. In the web console, navigate to *Operators* → *Installed Operators*.
I've been told not to use the → symbol, and instead use `->` because it can cause issues. I know `->` does get translated into the symbol, but...this is just what I was told.

. Under the *Provided APIs* heading for the *Network Observability Operator*, select *Flow Collector*.

. Select *cluster* then select the *YAML* tab.
Suggested change:
- . Select *cluster* then select the *YAML* tab.
+ . Select *cluster*, and then select the *YAML* tab.

. Select *cluster* then select the *YAML* tab.

. Increase the `spec.agent.ebpf.cacheMaxFlows` value, as in the following YAML sample:
Suggested change:
- . Increase the `spec.agent.ebpf.cacheMaxFlows` value, as in the following YAML sample:
+ . Increase the `spec.agent.ebpf.cacheMaxFlows` value, as shown in the following YAML sample:
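The YAML sample that this step refers to is not part of the excerpt. For reference, a minimal sketch of what such a `FlowCollector` resource might look like is below. The `apiVersion`, `namespace`, and `deploymentModel` values are assumptions based on the `flows.netobserv.io/v1beta2` API, and the `cacheMaxFlows` number is purely illustrative, not a recommended setting.

```yaml
# Sketch only: values other than the cacheMaxFlows field path are assumptions.
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster                          # the resource selected in the procedure
spec:
  namespace: netobserv                   # assumed namespace
  deploymentModel: Direct                # assumed deployment model
  agent:
    type: eBPF
    ebpf:
      cacheMaxFlows: 200000              # illustrative: raise from the value in use when the alert fired
```

As the note in the diff points out, a larger `cacheMaxFlows` can raise the eBPF agent's memory usage, so it may be safer to increase the value in steps.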
To see eBPF metrics on the *NetObserv/Health* dashboard, you must first enable them.

.Procedure
. In the web console, navigate to *Operators* → *Installed Operators*.
see other comment about ->

. Under the *Provided APIs* heading for the *Network Observability Operator*, select *Flow Collector*.

. Select *cluster* then select the *YAML* tab.
Suggested change:
- . Select *cluster* then select the *YAML* tab.
+ . Select *cluster*, and then select the *YAML* tab.
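The excerpt of the metrics procedure ends at the *YAML* tab step. As a hedged illustration of where that procedure likely leads, the sketch below enables eBPF agent metrics in the `FlowCollector`. The `spec.agent.ebpf.metrics.enable` field is an assumption based on the `flows.netobserv.io/v1beta2` API and does not appear in this excerpt, so verify it against the API reference for the version in use.

```yaml
# Sketch only: the metrics.enable field path is an assumption, not taken from this PR excerpt.
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      metrics:
        enable: true                     # assumed field; turns on eBPF agent metrics
```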
[id="network-observability-health-alert-overview_{context}"]
= Health alerts

A health alert banner that directs you to the dashboard can appear on the *Network Traffic* and *Home* pages in the event that an alert is triggered. Alerts are generated in the following cases:
Suggested change:
- A health alert banner that directs you to the dashboard can appear on the *Network Traffic* and *Home* pages in the event that an alert is triggered. Alerts are generated in the following cases:
+ A health alert banner that directs you to the dashboard can appear on the *Network Traffic* and *Home* pages if an alert is triggered. Alerts are generated in the following cases:

* The `NetObservLokiError` alert occurs if the `flowlogs-pipeline` workload is dropping flows because of Loki errors, such as if the Loki ingestion rate limit has been reached.
* The `NetObservNoFlows` alert occurs if no flows are ingested for a certain amount of time.
* The `NetObservFlowsDropped` alert occurs if the Network Observability eBPF agent hashmap table is full, and the eBPF agent processes flows at a degraded performance, or when the capacity limiter is triggered.
Suggested change:
- * The `NetObservFlowsDropped` alert occurs if the Network Observability eBPF agent hashmap table is full, and the eBPF agent processes flows at a degraded performance, or when the capacity limiter is triggered.
+ * The `NetObservFlowsDropped` alert occurs if the Network Observability eBPF agent hashmap table is full, and the eBPF agent processes flows with degraded performance, or when the capacity limiter is triggered.
* *Dropped flows per second*
* *Flowlogs-pipeline statistics*
* *Flowlogs-pipeline statistics views*
** Flows per second
If these are labels also, I think they should all be bold? (Wasn't sure since I haven't seen this GUI)
Version(s):
Issue:
https://issues.redhat.com/browse/OSDOCS-9959
Link to docs preview:
Health dashboards: https://75319--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/network-observability-operator-monitoring.html#network-observability-health-dashboard-overview_network_observability
Health alerts: https://75319--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/network-observability-operator-monitoring.html#network-observability-health-alert-overview_network_observability
Enabling eBPF metrics: https://75319--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/network-observability-operator-monitoring#network-observability-netobserv-dashboard-ebpf-metrics-enable_network_observability
Using the eBPF agent alert: https://75319--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/network-observability-operator-monitoring.html#network-observability-netobserv-dashboard-ebpf-agent-alerts_network_observability
QE review:
Additional information: