[Security Solution] [Attack discovery] Overrides default Attack discovery timeouts #183575

@andrew-goldstein commented May 15, 2024

[Security Solution] [Attack discovery] Overrides default Attack discovery timeouts

Summary

This PR fixes an issue where Attack discovery requests may be retried when responses from the LLM take longer than two minutes.

In LangSmith, the retry looks like the following before screenshot:

Before

langsmith_before

Above: Before the fix, a retry, shown in LangSmith, for an LLM call > 2 minutes

After the fix, a single pair of runs is observed in LangSmith for LLM calls that take longer than 2 minutes:

After

langsmith_after

Above: After the fix, a single pair in LangSmith, for an LLM call > 2 minutes

Details

This PR overrides the following default timeouts:

  1. The attack discovery route's idleSocket timeout in x-pack/plugins/elastic_assistant/server/routes/attack_discovery/post_attack_discovery.ts

  2. The connector timeout (also in x-pack/plugins/elastic_assistant/server/routes/attack_discovery/post_attack_discovery.ts)

  3. The chain timeout in x-pack/plugins/security_solution/server/assistant/tools/attack_discovery/attack_discovery_tool.ts

with the following values:

const ROUTE_HANDLER_TIMEOUT = 10 * 60 * 1000; // 10 * 60 seconds = 10 minutes
const LANG_CHAIN_TIMEOUT = ROUTE_HANDLER_TIMEOUT - 10_000; // 9 minutes 50 seconds
const CONNECTOR_TIMEOUT = LANG_CHAIN_TIMEOUT - 10_000; // 9 minutes 40 seconds
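
The 10-second steps between the constants suggest that the timeouts are meant to nest, so that the connector gives up first, then the chain, and the route's socket stays open longest. That reading is an inference from the definitions above, not something the PR states. A minimal sketch of the arithmetic follows; `formatDuration` and `isStrictlyNested` are hypothetical helpers introduced only for illustration and are not part of the Kibana code:

```typescript
// Values copied from the PR description.
const ROUTE_HANDLER_TIMEOUT = 10 * 60 * 1000; // 600_000 ms
const LANG_CHAIN_TIMEOUT = ROUTE_HANDLER_TIMEOUT - 10_000; // 590_000 ms
const CONNECTOR_TIMEOUT = LANG_CHAIN_TIMEOUT - 10_000; // 580_000 ms

// Hypothetical helper: renders a millisecond count as "Xm Ys".
function formatDuration(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s`;
}

// Hypothetical helper: true when every timeout is strictly shorter than
// the one before it (ordered from outermost layer to innermost layer).
function isStrictlyNested(timeoutsOuterToInner: number[]): boolean {
  let previous = Number.POSITIVE_INFINITY;
  for (const current of timeoutsOuterToInner) {
    if (current >= previous) return false;
    previous = current;
  }
  return true;
}

console.log(formatDuration(CONNECTOR_TIMEOUT)); // prints "9m 40s"
console.log(
  isStrictlyNested([ROUTE_HANDLER_TIMEOUT, LANG_CHAIN_TIMEOUT, CONNECTOR_TIMEOUT])
); // prints "true"
```

Deriving each value from the previous one, as the PR does, keeps the ordering intact if the outermost timeout is changed later.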

Desk testing

  1. Verify there are approximately 100 open alerts from the last 24 hours in your testing environment

  2. Navigate to Security > Attack discovery

  3. Select an Azure / OpenAI connector

  4. Click Generate

Expected results

  • LangSmith displays a single pair of LLMChain and AttackDiscovery runs when the LLM responds (with the final answer) in less than 2 minutes
  • LangSmith displays a single pair of LLMChain and AttackDiscovery runs when the LLM takes longer than two minutes to respond (with the final answer), as illustrated by the before / after screenshots in the description above
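
The expected results above are checked by eye in the LangSmith UI. If the run names for a trace were copied out into a list, the same "single pair" condition could be expressed as the sketch below. The `TracedRun` shape is a simplified assumption for illustration and is not LangSmith's actual schema; `countRunsByName` and `hasSinglePair` are likewise hypothetical.

```typescript
// Hypothetical, simplified record for one traced run.
interface TracedRun {
  name: string;
}

// Counts how many runs carry each name.
function countRunsByName(runs: TracedRun[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const run of runs) {
    counts.set(run.name, (counts.get(run.name) ?? 0) + 1);
  }
  return counts;
}

// True when the list holds exactly one LLMChain run and exactly one
// AttackDiscovery run, which is the condition the expected results describe.
function hasSinglePair(runs: TracedRun[]): boolean {
  const counts = countRunsByName(runs);
  return counts.get('LLMChain') === 1 && counts.get('AttackDiscovery') === 1;
}

// Illustrative data only: one pair, and the same pair duplicated by a retry.
const singlePair: TracedRun[] = [{ name: 'AttackDiscovery' }, { name: 'LLMChain' }];
const retriedPair: TracedRun[] = [...singlePair, ...singlePair];

console.log(hasSinglePair(singlePair)); // prints "true"
console.log(hasSinglePair(retriedPair)); // prints "false"
```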

@andrew-goldstein added the bug, release_note:skip, Team: SecuritySolution, Team:Security Generative AI, v8.14.0, v8.15.0, and Feature:Attack Discovery labels May 15, 2024
@andrew-goldstein self-assigned this May 15, 2024
@andrew-goldstein requested review from a team as code owners May 15, 2024 23:08
@elasticmachine
Pinging @elastic/security-solution (Team: SecuritySolution)

@YulNaumenko left a comment

LGTM!

@kibana-ci
💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| elasticAssistant | 31 | 32 | +1 |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| elasticAssistant | 45 | 46 | +1 |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @andrew-goldstein

@andrew-goldstein merged commit 1c96c31 into elastic:main May 16, 2024
52 checks passed
@andrew-goldstein deleted the increase_attack_discovery_timeout branch May 16, 2024 00:26
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request May 16, 2024
…very timeouts (elastic#183575)

(cherry picked from commit 1c96c31)
@kibanamachine
💚 All backports created successfully

Status Branch Result
8.14

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request May 16, 2024
…k discovery timeouts (#183575) (#183581)

# Backport

This will backport the following commits from `main` to `8.14`:
- [[Security Solution] [Attack discovery] Overrides default Attack
discovery timeouts
(#183575)](#183575)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Andrew Macri <andrew.macri@elastic.co>