[libbeat/management]: support filebeat inputs to report their status to elastic-agent #39209

Open
wants to merge 3 commits into
base: main
from


pkoutsovasilis
Contributor

@pkoutsovasilis pkoutsovasilis commented Apr 25, 2024

Proposed commit message

This PR introduces the following:

  • A hierarchical StatusUnit that wraps elastic-agent-client Units and always calculates the appropriate status from both the state of the input and the states of its individual streams. The hierarchy is as follows:
    • When the client Unit state is anything other than Healthy, that state immediately becomes the status of the StatusUnit. This lets the existing runner allocation/deallocation logic propagate any given status and signal that something is changing; e.g. when a unit is modified, the propagated status is Configuring, which is what we want.
    • When the client Unit state is Healthy, the statuses of all streams are taken into account to calculate the final one. Here we care about Degraded and Failed stream statuses, which we count in order to emit the appropriate status.
    • All active stream states are emitted in the payload of the check-in message to the agent, where they can hopefully be used to further improve the user experience.
  • Inject a StatusReporter into v2.Context (used by v2.Plugins). This reporter affects only a specific stream's status, so the respective input can emit per-stream statuses.
  • Add status reporting support for the CEL input. With this implementation I propose the following semantics for statuses:
    • Running: everything is fine; no error or warning was produced during the operation of the input
    • Failed: the input encountered an error that it can't recover from
    • Degraded: the input encountered something abnormal but, for lack of a better expression, it hasn't given up yet 😄 CEL does that a lot; it refuses to say bye-bye.
    • Configuring, Stopping, Stopped, Starting: these statuses are best used by the input allocation/deallocation code rather than directly from inside the input, as the former can and should override the status of the whole input.
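
To make the precedence rules above concrete, here is a minimal, self-contained sketch in Go. It is not the code from this PR: `statusUnit`, `reporterFor`, and `effective` are hypothetical names, a single `Status` enum is used for both unit and stream states (with `Running` standing in for the client's Healthy), and the choice that any Failed stream makes the whole unit Failed is an assumption. Only the "Some/All streams are Degraded" messages are taken from the logs in this description.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is an illustrative enum; names and ordering are assumptions.
type Status int

const (
	Unknown Status = iota
	Starting
	Configuring
	Running
	Degraded
	Failed
)

var statusNames = [...]string{"Unknown", "Starting", "Configuring", "Running", "Degraded", "Failed"}

func (s Status) String() string {
	if s < 0 || int(s) >= len(statusNames) {
		return "Invalid"
	}
	return statusNames[s]
}

// statusUnit is a hypothetical stand-in for the StatusUnit described above.
type statusUnit struct {
	mu        sync.Mutex
	unitState Status            // set by the runner allocation/deallocation code
	streams   map[string]Status // per-stream states reported by inputs
}

func newStatusUnit(streamIDs ...string) *statusUnit {
	u := &statusUnit{unitState: Starting, streams: make(map[string]Status)}
	for _, id := range streamIDs {
		u.streams[id] = Unknown
	}
	return u
}

// setUnitState records the unit-level state, which overrides stream states.
func (u *statusUnit) setUnitState(s Status) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.unitState = s
}

// reporterFor returns a function bound to one stream, loosely analogous to
// the per-stream reporter an input would receive through its context.
func (u *statusUnit) reporterFor(streamID string) func(Status) {
	return func(s Status) {
		u.mu.Lock()
		defer u.mu.Unlock()
		u.streams[streamID] = s
	}
}

// effective computes the status to emit: the unit state wins unless it is
// Running, in which case Degraded and Failed streams are counted.
func (u *statusUnit) effective() (Status, string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if u.unitState != Running {
		return u.unitState, u.unitState.String()
	}
	var degraded, failed int
	for _, s := range u.streams {
		switch s {
		case Degraded:
			degraded++
		case Failed:
			failed++
		}
	}
	total := len(u.streams)
	switch {
	case failed > 0 && failed == total: // assumption: Failed outranks Degraded
		return Failed, "All streams are Failed"
	case failed > 0:
		return Failed, "Some streams are Failed"
	case degraded > 0 && degraded == total:
		return Degraded, "All streams are Degraded"
	case degraded > 0:
		return Degraded, "Some streams are Degraded"
	default:
		return Running, "Healthy"
	}
}

func main() {
	u := newStatusUnit("stream-a", "stream-b")
	u.setUnitState(Running)
	reportA := u.reporterFor("stream-a")
	reportA(Degraded)
	s, msg := u.effective()
	fmt.Println(s, msg)
}
```

Keeping the reporter scoped to a single stream means an input can never overwrite the unit-level state, which matches the intent that Configuring, Starting, Stopping, and Stopped belong to the allocation/deallocation code.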

Noteworthy code changes:

  • When we get a unitRemoved change from the elastic-agent-client, we don't remove the unit from the manager's map directly; we mark it as soft-deleted instead. It is then removed from the map just before we reload the runners. This is necessary because, if another unit change happens before the input runners are reloaded and the same input with the same stream ID is reintroduced, the corresponding runner won't reload, yet with the original code it would no longer hold an association to the StatusUnit, since that had already been removed.
  • While writing an integration test to check all of the above, I noticed that the elastic-agent-client didn't pick up the change of input config, and thus no Unit Modified change was received by the manager. Thus I did this; I will speak with the agent team to validate whether this is an actual issue.
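
The soft-delete idea from the first bullet can be sketched as follows. This is an illustration under assumptions, not the manager code from the PR: `manager`, `unitEntry`, `addUnit`, `removeUnit`, and `purge` are hypothetical names, and reviving a soft-deleted entry when the same ID is re-added is one possible reading of the behaviour described.

```go
package main

import "fmt"

// unitEntry is a hypothetical record the manager keeps per unit.
type unitEntry struct {
	id          string
	softDeleted bool
}

// manager is a simplified stand-in for the real manager.
type manager struct {
	units map[string]*unitEntry
}

func newManager() *manager {
	return &manager{units: make(map[string]*unitEntry)}
}

// addUnit registers a unit. If an entry with the same ID is pending
// deletion, it is revived so existing associations to it stay valid.
func (m *manager) addUnit(id string) *unitEntry {
	if e, ok := m.units[id]; ok {
		e.softDeleted = false
		return e
	}
	e := &unitEntry{id: id}
	m.units[id] = e
	return e
}

// removeUnit only marks the entry; the map is left untouched.
func (m *manager) removeUnit(id string) {
	if e, ok := m.units[id]; ok {
		e.softDeleted = true
	}
}

// purge drops soft-deleted entries. In this sketch it would be called
// right before the runners are reloaded.
func (m *manager) purge() []string {
	var removed []string
	for id, e := range m.units {
		if e.softDeleted {
			delete(m.units, id) // deleting during range is safe in Go
			removed = append(removed, id)
		}
	}
	return removed
}

func main() {
	m := newManager()
	first := m.addUnit("input-unit-1")
	m.removeUnit("input-unit-1")        // unitRemoved arrives
	second := m.addUnit("input-unit-1") // same unit comes back before reload
	fmt.Println("same entry reused:", first == second)
	fmt.Println("purged:", m.purge())
}
```

Because the entry survives until `purge`, a runner that is not reloaded keeps pointing at the same object when its unit reappears between two reloads.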

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Remove custom elastic-agent-client replace in go.mod
  • Try to run this Beat through an actual agent

How to test this PR locally

elastic-package stack up -vd
go test -v x-pack/filebeat/input/cel/integration/integration_test.go

Related issues

Use cases

N/A

Screenshots

N/A

Logs

=== RUN   TestCELInput
    integration_test.go:431: observed: version_info:{name:"beat-v2-client" meta:{key:"build_time" value:"0001-01-01 00:00:00 +0000 UTC"} meta:{key:"commit" value:"unknown"} build_hash:"unknown"}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 message:"Starting"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:1}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:1}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:DEGRADED message:"Some streams are Degraded" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:DEGRADED message:"All streams are Degraded" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:DEGRADED message:"Some streams are Degraded" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:DEGRADED message:"Some streams are Degraded" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:"failed eval: ERROR: <input>:1:30: failed to unmarshal JSON message: invalid character 'i' looking for beginning of value\n | bytes(get(state.url).Body).as(body,{\"events\":[body.decode_json()]})\n | .............................^"}} fields:{key:"state" value:{number_value:4}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:CONFIGURING message:"Configuring" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" message:"Starting" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:0}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:1}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"} units:{id:"input-unit-1" state:HEALTHY message:"Healthy" payload:{fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-23ed71c482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}} fields:{key:"cel-cel.cel-1e8b33de-d54a-45cd-90da-ffffffc482e2" value:{struct_value:{fields:{key:"msg" value:{string_value:""}} fields:{key:"state" value:{number_value:3}}}}}}}
    integration_test.go:431: observed: units:{id:"output-unit" type:OUTPUT config_state_idx:1 state:HEALTHY message:"Healthy"}
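
The per-stream `state` fields in the payloads above are bare numbers (protobuf `number_value`, i.e. a float64). A small helper like the one below can make them easier to read. The name table is an assumption: it supposes the values follow an enum ordered Unknown, Starting, Configuring, Running, Degraded, Failed, Stopping, Stopped. The logs are consistent with that (state 4 appears together with an eval error while the unit reports DEGRADED, and state 3 with an empty message while it reports HEALTHY), but the mapping should be checked against the actual status type before relying on it.

```go
package main

import "fmt"

// streamStateNames is an ASSUMED mapping from the numeric stream state in
// the check-in payload to a readable name.
var streamStateNames = []string{
	"Unknown", "Starting", "Configuring", "Running",
	"Degraded", "Failed", "Stopping", "Stopped",
}

// streamStateName converts a payload number_value (float64) to a name,
// rejecting negative, fractional, or out-of-range values.
func streamStateName(n float64) string {
	i := int(n)
	if n < 0 || float64(i) != n || i >= len(streamStateNames) {
		return fmt.Sprintf("invalid(%v)", n)
	}
	return streamStateNames[i]
}

func main() {
	// The values that occur in the log excerpt above.
	for _, n := range []float64{0, 1, 3, 4} {
		fmt.Printf("%v -> %s\n", n, streamStateName(n))
	}
}
```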

@pkoutsovasilis pkoutsovasilis added the discuss Issue needs further discussion. label Apr 25, 2024
@pkoutsovasilis pkoutsovasilis self-assigned this Apr 25, 2024
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 25, 2024
Contributor

mergify bot commented Apr 25, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @pkoutsovasilis? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Collaborator

💔 Build Failed

Build stats

  • Duration: 118 min 16 sec

Pipeline error 1

This error is likely related to the pipeline itself. Click here
and then you will see the error (either incorrect syntax or an invalid configuration).

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@pkoutsovasilis pkoutsovasilis force-pushed the pkoutsovasilis/input_agent_health branch from 149a0c5 to c3ef49b Compare May 9, 2024 11:36
Contributor

mergify bot commented May 9, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b pkoutsovasilis/input_agent_health upstream/pkoutsovasilis/input_agent_health
git merge upstream/main
git push upstream pkoutsovasilis/input_agent_health

@pkoutsovasilis pkoutsovasilis force-pushed the pkoutsovasilis/input_agent_health branch from c3ef49b to c6294b7 Compare May 9, 2024 11:46
@pkoutsovasilis pkoutsovasilis added the Team:Security-Deployment and Devices Deployment and Devices Team in Security Solution label May 9, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 9, 2024
@pkoutsovasilis pkoutsovasilis changed the title [Do not merge]: WIP allow inputs to report their state to the agent [libbeat/management]: allow inputs to report their state to the agent May 9, 2024
@pkoutsovasilis pkoutsovasilis changed the title [libbeat/management]: allow inputs to report their state to the agent [libbeat/management]: support filebeat inputs to report their status to elastic-agent May 9, 2024
@pkoutsovasilis pkoutsovasilis marked this pull request as ready for review May 9, 2024 13:22
@pkoutsovasilis pkoutsovasilis requested review from a team as code owners May 9, 2024 13:22
@elasticmachine
Collaborator

Pinging @elastic/sec-deployment-and-devices (Team:Security-Deployment and Devices)

@pkoutsovasilis
Contributor Author

@andrewkroh @cmacknz @belimawr there are some rough edges (checking if the issue I spotted is actually an issue with elastic-agent-client, making the integration test use the integration stack and not the one I spawned from elastic-package) but the logic is pretty much there

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label May 10, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label May 10, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator ycombinator removed the request for review from andrzej-stencel May 10, 2024 18:28
Labels
discuss Issue needs further discussion. Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Security-Deployment and Devices Deployment and Devices Team in Security Solution
3 participants