OSDOCS-9959: NetObserv Health dashboard updates #75319

Open: wants to merge 1 commit into no-1.6 from OSDOCS-9959

@openshift-ci-robot commented Apr 29, 2024

@skrthomas: This pull request references OSDOCS-9959 which is a valid jira issue.

In response to this:

Version(s):

Issue:

https://issues.redhat.com/browse/OSDOCS-9959
Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 29, 2024
@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 29, 2024
@ocpdocs-previewbot commented Apr 29, 2024

@skrthomas skrthomas force-pushed the OSDOCS-9959 branch 2 times, most recently from f0f4394 to f9744a0 Compare May 2, 2024 20:45
@openshift-ci-robot commented May 2, 2024

@skrthomas: This pull request references OSDOCS-9959 which is a valid jira issue.

In response to this:

Version(s):

Issue:

https://issues.redhat.com/browse/OSDOCS-9959
Link to docs preview:

Using the eBPF agent alert: https://75319--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/network-observability-operator-monitoring.html#network-observability-netobserv-dashboard-ebpf-agent-alerts_network_observability

QE review:

  • QE has approved this change.

Additional information:

@skrthomas skrthomas force-pushed the OSDOCS-9959 branch 5 times, most recently from ad763fe to a92c924 Compare May 3, 2024 18:27
@openshift-ci-robot commented May 3, 2024

@skrthomas: This pull request references OSDOCS-9959 which is a valid jira issue.

In response to this:

Version(s):

Issue:

https://issues.redhat.com/browse/OSDOCS-9959
Link to docs preview:

Viewing health information: https://75319--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/network-observability-operator-monitoring.html#network-observability-alert-dashboard_network_observability

Using the eBPF agent alert: https://75319--ocpdocs-pr.netlify.app/openshift-enterprise/latest/observability/network_observability/network-observability-operator-monitoring.html#network-observability-netobserv-dashboard-ebpf-agent-alerts_network_observability

QE review:

  • QE has approved this change.

Additional information:

@skrthomas (Contributor, Author)

@memodi @msherif1234 can you PTAL at this PR for docs needed for the eBPF enhancements? Mehul, I was unsure whether to tag you or Nathan. You did most of the verification for the alert, but I see Nathan as the QE contact for the bigger epic. Let me know if I should reassign.

@memodi left a comment

LGTM overall, just a couple of questions

modules/network-observability-ebpf-agent-alert.adoc (outdated, resolved)
@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 7, 2024
@skrthomas skrthomas force-pushed the OSDOCS-9959 branch 4 times, most recently from 1a91995 to 89699eb Compare May 7, 2024 18:00
@memodi left a comment

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label May 13, 2024
@skrthomas skrthomas force-pushed the OSDOCS-9959 branch 2 times, most recently from 95daee6 to 8b9fd98 Compare May 15, 2024 22:10
@openshift-ci-robot commented May 15, 2024

@openshift-ci bot commented May 16, 2024

@skrthomas: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@skrthomas skrthomas added the peer-review-needed Signifies that the peer review team needs to review this PR label May 16, 2024
@kcarmichael08 kcarmichael08 added peer-review-in-progress Signifies that the peer review team is reviewing this PR and removed peer-review-needed Signifies that the peer review team needs to review this PR labels May 17, 2024
@kcarmichael08 (Contributor) left a comment

Had some minor suggestions - this looks really great! Nice job!

[id="network-observability-netobserv-dashboard-ebpf-agent-alerts_{context}"]
= Using the eBPF agent alert

When the Network Observability eBPF agent hashmap table is full, the eBPF agent processes flows at a degraded performance. When this is the case, an alert shows `NetObservAgentFlowsDropped`. If you see this alert, consider increasing the `cacheMaxFlows` in the `FlowCollector`, as shown in the following example.

Suggested change
When the Network Observability eBPF agent hashmap table is full, the eBPF agent processes flows at a degraded performance. When this is the case, an alert shows `NetObservAgentFlowsDropped`. If you see this alert, consider increasing the `cacheMaxFlows` in the `FlowCollector`, as shown in the following example.
When the Network Observability eBPF agent hashmap table is full, the eBPF agent processes flows with degraded performance. When this is the case, an alert shows `NetObservAgentFlowsDropped`. If you see this alert, consider increasing the `cacheMaxFlows` in the `FlowCollector`, as shown in the following example.


[NOTE]
====
Increasing the `cacheMaxFlows` may increase the memory usage of the eBPF agent.

Suggested change
Increasing the `cacheMaxFlows` may increase the memory usage of the eBPF agent.
Increasing the `cacheMaxFlows` might increase the memory usage of the eBPF agent.

per IBM Style


.Procedure

. In the web console, navigate to *Operators* → *Installed Operators*.

I've been told not to use the → symbol, and instead use -> because it can cause issues. I know -> does get translated into the symbol, but...this is just what I was told.


. Under the *Provided APIs* heading for the *Network Observability Operator*, select *Flow Collector*.

. Select *cluster* then select the *YAML* tab.

Suggested change
. Select *cluster* then select the *YAML* tab.
. Select *cluster*, and then select the *YAML* tab.


. Increase the `spec.agent.ebpf.cacheMaxFlows` value, as in the following YAML sample:

Suggested change
. Increase the `spec.agent.ebpf.cacheMaxFlows` value, as in the following YAML sample:
. Increase the `spec.agent.ebpf.cacheMaxFlows` value, as shown in the following YAML sample:
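The YAML sample this step refers to is not visible in the review context. The following is a minimal sketch of the edit, assuming the `flows.netobserv.io/v1beta2` `FlowCollector` API and the collector name `cluster`; the `cacheMaxFlows` value is a hypothetical example, not a recommended setting.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      # Hypothetical value: raise cacheMaxFlows above its current setting.
      # A larger flow hashmap might increase eBPF agent memory usage.
      cacheMaxFlows: 200000
```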

To see eBPF metrics on the *NetObserv/Health* dashboard, you must first enable them.

.Procedure
. In the web console, navigate to *Operators* → *Installed Operators*.

see other comment about ->


. Under the *Provided APIs* heading for the *Network Observability Operator*, select *Flow Collector*.

. Select *cluster* then select the *YAML* tab.

Suggested change
. Select *cluster* then select the *YAML* tab.
. Select *cluster*, and then select the *YAML* tab.
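This procedure excerpt ends at the *YAML* tab. As a hedged sketch of the step that might follow, the fragment below assumes that eBPF agent metrics are turned on through a `spec.agent.ebpf.metrics.enable` field in the `flows.netobserv.io/v1beta2` `FlowCollector`; treat that field path as an assumption and confirm it against the API reference for your Network Observability Operator version.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      metrics:
        # Assumed field: enables eBPF agent metrics so that they can
        # appear on the NetObserv/Health dashboard.
        enable: true
```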

[id="network-observability-health-alert-overview_{context}"]
= Health alerts

A health alert banner that directs you to the dashboard can appear on the *Network Traffic* and *Home* pages in the event that an alert is triggered. Alerts are generated in the following cases:

Suggested change
A health alert banner that directs you to the dashboard can appear on the *Network Traffic* and *Home* pages in the event that an alert is triggered. Alerts are generated in the following cases:
A health alert banner that directs you to the dashboard can appear on the *Network Traffic* and *Home* pages if an alert is triggered. Alerts are generated in the following cases:


* The `NetObservLokiError` alert occurs if the `flowlogs-pipeline` workload is dropping flows because of Loki errors, such as if the Loki ingestion rate limit has been reached.
* The `NetObservNoFlows` alert occurs if no flows are ingested for a certain amount of time.
* The `NetObservFlowsDropped` alert occurs if the Network Observability eBPF agent hashmap table is full, and the eBPF agent processes flows at a degraded performance, or when the capacity limiter is triggered.

Suggested change
* The `NetObservFlowsDropped` alert occurs if the Network Observability eBPF agent hashmap table is full, and the eBPF agent processes flows at a degraded performance, or when the capacity limiter is triggered.
* The `NetObservFlowsDropped` alert occurs if the Network Observability eBPF agent hashmap table is full, and the eBPF agent processes flows with degraded performance, or when the capacity limiter is triggered.

* *Dropped flows per second*
* *Flowlogs-pipeline statistics*
* *Flowlogs-pipeline statistics views*
** Flows per second

If these are labels also, I think they should all be bold? (Wasn't sure since I haven't seen this GUI)

@kcarmichael08 kcarmichael08 added peer-review-done Signifies that the peer review team has reviewed this PR and removed peer-review-in-progress Signifies that the peer review team is reviewing this PR labels May 17, 2024
Labels: jira/valid-reference, peer-review-done, qe-approved, size/L
7 participants