
[Obs AI Assistant] Connector documentation #181282

Closed
klacabane opened this issue Apr 22, 2024 · 17 comments
@klacabane (Contributor)

Summary

While the connector is in tech preview and has limited capabilities we should create public documentation

@klacabane klacabane added documentation Team:Obs AI Assistant Team:obs-knowledge Observability Experience Knowledge team labels Apr 22, 2024
@elasticmachine (Contributor)

Pinging @elastic/obs-knowledge-team (Team:obs-knowledge)

@emma-raffenne emma-raffenne added this to the 8.14 milestone Apr 23, 2024
@dedemorton (Contributor)

Adding this to the observability docs project because it sounds like someone on our team should work on these docs.

@klacabane We will need more information, including links to related issues/PRs and a list of contacts, to help us get started. Thanks!

@klacabane (Contributor, Author)

Hi @dedemorton,

As an overview, the connector can be attached to an alert and configured with a message that is passed to the AI Assistant. When an alert fires, the assistant is called with an initial prompt providing contextual information about the alert (e.g. when it fired, the service impacted, the threshold breached), plus the message the user provided when configuring the connector.
The user message can be thought of as a task, or set of tasks, for the assistant to execute at that point, for example: "I'm an SRE, create a report of the alert including other active alerts relevant to the impacted service." The assistant executes the provided tasks and creates a conversation out of them. Users can open that conversation and continue chatting with the assistant, e.g. to help them troubleshoot the issue.
Regarding the tasks that can be asked, the assistant is able to call available connectors (limited to Slack/email/webhook/Jira/PagerDuty), so one can also ask "Create a report of the alert and send it to the slack connector" if a Slack connector is already configured.
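The flow described above can be sketched roughly as follows. This is purely illustrative; the function and field names are hypothetical, not the connector's actual implementation:

```python
# Illustrative sketch of how the connector's initial prompt might be
# assembled from alert context plus the user-configured message.
# All names here are hypothetical, not Kibana's actual code.

def build_initial_prompt(alert: dict, user_message: str) -> str:
    context = (
        f"Alert '{alert['rule_name']}' fired at {alert['fired_at']} "
        f"for service '{alert['service']}'; "
        f"threshold breached: {alert['threshold']}."
    )
    # The user message reads as a task list for the assistant to execute.
    return f"{context}\n\n{user_message}"

prompt = build_initial_prompt(
    {
        "rule_name": "High disk usage",
        "fired_at": "2024-04-22T10:00:00Z",
        "service": "payments",
        "threshold": "system.filesystem.used.pct > 0.8",
    },
    "I'm an SRE, create a report of the alert including other "
    "active alerts relevant to the impacted service.",
)
```

The assistant then runs with this prompt and stores the resulting exchange as a conversation.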

Some technical details:

  • the connector is called when an alert fires and again when it recovers
  • users need the api:observabilityAIAssistant and app:observabilityAIAssistant privileges to use the connector
  • the conversation created by the assistant is public and accessible to every user with permissions to the assistant
  • the connector is in tech preview

Links:

You can reach out to me or @dgieselaar for more information!

@dedemorton (Contributor)

dedemorton commented Apr 25, 2024

cc'ing @lcawl for awareness. She is working on other system action feature docs and may want to contribute to these docs.

@emma-raffenne (Contributor)

@dedemorton Do you have an update on this documentation? Is there anything needed from us?

@dedemorton (Contributor)

> Is there anything needed from us?

@emma-raffenne Not right now, but I'll let you know. This issue came in too late for our docs sprint 20, but it's toward the top of my list for sprint 21, which starts today.

@dedemorton (Contributor)

Here's my preliminary plan for the documentation after playing around with the Obs AI Connector today:

  • In the Kibana Guide, create a new topic about the Observability AI Assistant connector and add it to the list of connectors.
  • In the Obs Guide, add the Observability AI Assistant connector to the list of valid connectors for all the rules documented under the container topic.
  • In the Obs Guide under Interact with the AI Assistant, add a section about using the Observability AI Assistant connector and explain why/when you might want to do that.
  • We should also list any limitations or requirements.

I ran into some flaky behavior when I was playing around with the connector. I received the Slack messages and links to the conversation, but the visualizations didn't work. Eventually the messages stopped arriving, but I was also editing/deleting rules and might have broken something. Perhaps I generated too many alerts and ended up exceeding the token limit. I kept track of some questions that came up while I was testing:

  • Are there limitations on how many alerts can be analyzed by the AI Assistant?
  • Is it normal for there to be a significant delay between the time the action executes and the message appears in Slack?
  • What happens if I edit the rule after I've started running it?
  • Is it good enough for the message to say “send it to slack connector” or should you give the name of the Slack connector in case there is more than one?
  • Is there any way to diagnose whether (and why) actions are failing (for example, sending messages to Slack failed)? Could I have exceeded the token limit and caused the Observability AI Assistant connector to fail to send a message to Slack? Is there any way to see where things failed?
  • How do I avoid exceeding the token limit? After playing around with the Observability AI Assistant connector, I tried using the “Help me understand this alert” option and got the message: “The conversation has exceeded the token limit. The maximum token limit is 32768, but the current conversation has 118700 tokens. Please start a new conversation to continue.”
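Since OpenAI-style models count tokens rather than characters, a quick way to sanity-check a prompt against the 32768-token limit is a characters-per-token heuristic (roughly 4 characters per token for English text). This is only an approximation; the exact count comes from the model's tokenizer. The 1334-token reservation for function definitions below is taken from the error messages quoted later in this thread:

```python
# Rough token budgeting for prompts sent to an OpenAI-style model.
# ~4 characters per token is a common heuristic for English text;
# treat it as an approximation only.
MAX_CONTEXT_TOKENS = 32768  # limit reported in the error message

def estimate_tokens(text: str) -> int:
    """Crude estimate: about one token per 4 characters."""
    return max(1, len(text) // 4)

def fits_in_context(messages, reserved_for_functions=1334):
    """Return True if the messages plus function definitions likely fit."""
    total = sum(estimate_tokens(m) for m in messages) + reserved_for_functions
    return total <= MAX_CONTEXT_TOKENS
```

A conversation reporting 118700 tokens against a 32768-token limit fails this check by a wide margin, which is why the only remedy offered is to start a new conversation.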

@emma-raffenne (Contributor)

Thank you @dedemorton
cc @jasonrhodes for awareness about the Alerting documentation.

@jasonrhodes (Member)

> cc @jasonrhodes for awareness about the Alerting documentation.

Thanks, @emma-raffenne - I've had a brief scan of this comment thread and I'm not seeing the reference to Alerting documentation. Can you point me to it?

@klacabane (Contributor, Author)

Hi @dedemorton!

> Are there limitations on how many alerts can be analyzed by the AI Assistant?

No, but the prompt we generate grows with the number of alerts passed to the connector, and processing several alerts in the same connector execution may lead to many function calls analyzing the alerts and hit the function-call limit. If that happens, we are not able to call the connector. This behavior should be surfaced in the generated conversations; any chance you still have them stored?
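The mitigation hinted at later in this thread (limiting the number of alerts summarized in the prompt) could look something like the sketch below. This is a hypothetical fix, not current connector behavior:

```python
# Hypothetical mitigation: cap how many alerts get individually
# summarized in the prompt, noting how many were left out so the
# assistant knows the list is truncated.
def summarize_alerts(alerts, max_summarized=10):
    shown = alerts[:max_summarized]
    lines = [f"- {a}" for a in shown]
    omitted = len(alerts) - len(shown)
    if omitted > 0:
        lines.append(
            f"... and {omitted} more alerts omitted to stay within the token limit"
        )
    return "\n".join(lines)
```

With a cap like this, the prompt size stays bounded regardless of how many alerts a rule generates.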

> Is it normal for there to be a significant delay between the time the action executes and the message appears in Slack?

What counts as significant, 5 minutes? It should take around 60 seconds if everything goes as expected, but multiple function calls and errors may lead to additional processing time or a failure. In any case, a conversation is created, and looking at that conversation is the best way to troubleshoot any underlying issues.

> What happens if I edit the rule after I've started running it?

I don't have the specifics of the rule's inner workings, but I expect any new alert to pick up the new settings/prompt. Did you run into unexpected behavior when doing so?

> Is it good enough for the message to say “send it to slack connector” or should you give the name of the Slack connector in case there is more than one?

The more accurate, the better. The assistant is given the list of connectors with their configurations (configured name, ID, and any other configured properties). Given a long list of, say, Slack connectors, one should ideally provide an identifier unique enough for the assistant to make a good decision; in this case the connector name would be appropriate.
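The disambiguation described here can be pictured as a name match over the configured connectors. This is purely illustrative: the assistant makes the choice via the LLM, not deterministic code, and the connector entries below are made up:

```python
# Illustrative: picking one connector from many by its configured name.
# With two Slack connectors, "send it to slack" is ambiguous, while a
# name fragment like "payments" identifies exactly one.
connectors = [
    {"id": "a1", "type": "slack", "name": "sre-alerts"},
    {"id": "b2", "type": "slack", "name": "payments-oncall"},
]

def find_connector(name_hint, connectors):
    matches = [c for c in connectors if name_hint.lower() in c["name"].lower()]
    # Only a unique match is safe; ambiguous or missing hints return None.
    return matches[0] if len(matches) == 1 else None
```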

> Is there any way to diagnose whether (and why) actions are failing (for example, sending messages to Slack failed)? Could I have exceeded the token limit and caused the Observability AI Assistant connector to fail to send a message to Slack? Is there any way to see where things failed?

Looking at the generated conversation is the best way to track any errors that happened during the connector execution. Each function call (e.g. calling the connector) appears in the conversation timeline and has debugging information attached to it.

> How do I avoid exceeding the token limit? After playing around with the Observability AI Assistant connector, I tried using the “Help me understand this alert” option and got the message: “The conversation has exceeded the token limit. The maximum token limit is 32768, but the current conversation has 118700 tokens. Please start a new conversation to continue.”

Could you provide details on your setup, how you triggered the alert, and what prompt was configured in the connector?

@dedemorton (Contributor)

> Could you provide details on your setup, how you triggered the alert, and what prompt was configured in the connector?

@klacabane Unfortunately my data got blown away when the cluster was updated. I will go through the process again after I've finished the docs and want to test them.

I triggered the alert by creating a custom threshold rule that I knew would fire. The rule looked for max(system.filesystem.used.pct) over 22. (I know, pretty low...but there were only a couple of hosts at the time that were over that threshold.) It generated quite a few alerts (I think about 40) when I was playing around with things. I played around with a few different prompts, but one of them was something like:

```
High disk usage alert has triggered. Execute the following steps:
  - create a graph of the disk usage for the service impacted by the alert for the last 24h
  - to help troubleshoot, recall past occurrences of this alarm and any other active alerts. Generate a report with all the found information and send it to the slack connector as a single message. Also include the link to this conversation in the report
```

I don't think I expanded all the function calls so I might have missed something.
I'll pay more attention when I go through my final testing and take better notes.

I think we should definitely add more guidance to help users construct rules and prompts that avoid running into limits, and also tell them what to do when they do run into them.

@dedemorton (Contributor)

dedemorton commented May 23, 2024

@klacabane I played around a bit with this today, and I am definitely exceeding limits. Maybe the rules I'm creating are too contrived (meant to generate alerts quickly, but perhaps generating too many alerts)? Today I tried using the Custom Threshold rule to test for max(system.filesystem.used.pct) > 80, and I am seeing messages like this in the Azure OpenAI GPT-4 connector logs:

action execution failure: .gen-ai:azure-open-ai: Azure OpenAI GPT-4 - an error occurred while running the action: Status code: 400. Message: API Error: model_error - This model's maximum context length is 32768 tokens. However, your messages resulted in 133449 tokens (132115 in the messages, 1334 in the functions). Please reduce the length of the messages or functions.; retry: true

The weird thing is that it worked beautifully the very first time I tried it out. :-/ Now that I want to take screen captures, nothing is working.

So I have a couple of asks:

  1. Can you suggest a different rule (type, threshold, and AI connector configuration/message) that would trigger a reasonable number of alerts so I don't keep exceeding limits? I've been using the edge-lite-oblt test cluster, but let me know if I should use a different environment.

  2. I think newbie users will probably play around with this and may end up in the same situation. What can we tell users to help them avoid this situation?

Thanks in advance for your help.

@klacabane (Contributor, Author)

klacabane commented May 23, 2024

Hi @dedemorton,

I'm not able to reproduce this issue at the moment and am still working on it.

> meant to generate alerts quickly, but perhaps generating too many alerts

The latter could be the culprit. We generate a summary and gather context for every alert passed to the connector. I suspect a high number of alerts is being passed in your case, producing a large prompt that reaches the token limit early in the conversation. If so, we should limit the number of alerts we summarize in the prompt, but I'll need confirmation that this is the root cause.

Since you're able to generate this error consistently, could you either send me the steps you're taking and/or provide a copy of the generated conversation that leads to the token limit being reached?

[Screenshot: 2024-05-23 at 11:12:20]

I'm also working against edge-lite-oblt and have no issues triggering the connector successfully. Could you try an Error count threshold rule instead of Custom threshold?

@emma-raffenne (Contributor)

@jasonrhodes

> I've had a brief scan of this comment thread and I'm not seeing the reference to Alerting documentation. Can you point me to it?

Here is the quote from Dede's comment:

> In the Obs Guide, add the Observability AI Assistant connector to the list of valid connectors for all the rules documented under the container topic.

@dedemorton (Contributor)

dedemorton commented May 23, 2024

I've created a rule that does not generate a lot of alerts, and I am seeing the same problem. This rule has created a single alert in the past 30 min. There are currently only 3 active alerts total, but there are a bunch of untracked alerts.

Here's the API call for the rule:

```
PUT kbn:/api/alerting/rule/03820ed4-fd57-487d-894a-39e7301524fa
{
  "name": "Log threshold rule",
  "tags": [],
  "schedule": {
    "interval": "1m"
  },
  "params": {
    "criteria": [
      {
        "comparator": ">",
        "metrics": [
          {
            "name": "A",
            "filter": "log.level: (\"error\")",
            "aggType": "count"
          }
        ],
        "threshold": [
          30
        ],
        "timeSize": 1,
        "timeUnit": "m"
      }
    ],
    "alertOnNoData": true,
    "alertOnGroupDisappear": true,
    "searchConfiguration": {
      "query": {
        "query": "",
        "language": "kuery"
      },
      "index": "logs-*"
    },
    "groupBy": ""
  },
  "actions": [
    {
      "id": "system-connector-.observability-ai-assistant",
      "params": {
        "connector": "e88c7248-da89-481e-af3b-566ed06728a1",
        "message": "High error count alert has triggered. Execute the following steps:\n  - create a graph of the error count for the service impacted by the alert for the last 24h\n  - to help troubleshoot recall past occurrences of this alarm, also any other active alerts. Generate a report with all the found informations and send it to slack connector as a single message. Also include the link to this conversation in the report\n"
      },
      "uuid": "91c485d5-52ea-41c4-92ef-09f1c4d37b4d"
    }
  ]
}
```
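For reference, the same PUT can be issued from a script. This is a minimal sketch using only the Python standard library; the Kibana URL is a placeholder, the payload is abbreviated, and authentication is omitted. Writes to Kibana's REST API do require the kbn-xsrf header:

```python
# Sketch: issuing the rule update above from Python. The base URL is a
# placeholder and the payload is abbreviated; in practice you would also
# supply authentication (an API key or basic-auth header).
import json
import urllib.request

KIBANA = "http://localhost:5601"  # placeholder
RULE_ID = "03820ed4-fd57-487d-894a-39e7301524fa"

payload = {
    "name": "Log threshold rule",
    "schedule": {"interval": "1m"},
    # ... "params" and "actions" as in the console request above ...
}

req = urllib.request.Request(
    f"{KIBANA}/api/alerting/rule/{RULE_ID}",
    data=json.dumps(payload).encode(),
    method="PUT",
    headers={"kbn-xsrf": "true", "Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```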

Here’s the message I am seeing under Stack Management > Connectors > Logs:

action execution failure: .gen-ai:e88c7248-da89-481e-af3b-566ed06728a1: My AI Connector - an error occurred while running the action: Status code: 400. Message: API Error: model_error - This model's maximum context length is 32768 tokens. However, your messages resulted in 126567 tokens (125233 in the messages, 1334 in the functions). Please reduce the length of the messages or functions.; retry: true

Also note that no conversation was created.

@dedemorton (Contributor)

OK, so I've tried a second round of testing using the latest 8.14.0 snapshot at staging.found.no (I wanted to create a very simple environment with limited data ingested using the System integration and Elastic Agent).

It works fine! I think the takeaway here is that we need to provide users with some guidance on how to avoid exceeding the token limit when they create their rules + messages for the AI Assistant connector...and also some steps to diagnose problems.

dedemorton added a commit that referenced this issue May 31, 2024
## Summary

Adds reference documentation about the Obs AI Assistant connector
(requested in #181282)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue May 31, 2024

(cherry picked from commit 310f4ff)
kibanamachine added a commit that referenced this issue May 31, 2024
# Backport

This will backport the following commits from `main` to `8.14`:
- [DOCS] Obs AI Assistant connector (#183792)


Co-authored-by: DeDe Morton <dede.morton@elastic.co>
@dedemorton (Contributor)

Closed by #183792 and elastic/observability-docs#3906
