[Metricbeat] Improve the `elasticsearch` module when used for Stack Monitoring #39058

consulthys · 2024-04-18T15:58:38Z

While investigating the root cause of indexing failures (also reported here in the past), we discovered that when Metricbeat feeds Stack Monitoring, its elasticsearch module ships elasticsearch.shard documents with concrete IDs built from the current cluster state UUID (i.e., state_uuid) and some other constant data. Since the cluster state changes far less often than Metricbeat runs its collection rounds (every 10s by default), the same IDs are resent round after round and version conflicts happen all the time.

Those version conflicts are probably a side effect of switching to data streams in 8.0.0 (i.e., put-if-absent semantics with a concrete ID) and weren't apparent earlier when the data was stored in plain indices. Since each elasticsearch.shard document describes a shard placement in the cluster, the logic makes sense: there's no point re-indexing a document whose content hasn't changed since the last collection round.
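To make the put-if-absent behaviour concrete, here is a minimal in-memory sketch. It does not talk to Elasticsearch and is not Metricbeat code: `createOnlyStore`, `shardDocID`, and the ID layout are hypothetical stand-ins I made up to illustrate why an unchanged state_uuid yields a conflict on every round after the first.

```go
package main

import (
	"errors"
	"fmt"
)

// errVersionConflict stands in for the version conflict a create-only
// write gets back when the document ID already exists.
var errVersionConflict = errors.New("version conflict: document already exists")

// createOnlyStore is a hypothetical in-memory stand-in for a data stream:
// a document can be created once per ID and never overwritten.
type createOnlyStore struct {
	docs     map[string]string
	failures int // counts rejected writes, like an indexing failure counter
}

func newCreateOnlyStore() *createOnlyStore {
	return &createOnlyStore{docs: make(map[string]string)}
}

// create stores body under id only if id is absent (put-if-absent).
func (s *createOnlyStore) create(id, body string) error {
	if _, exists := s.docs[id]; exists {
		s.failures++
		return errVersionConflict
	}
	s.docs[id] = body
	return nil
}

// shardDocID is an assumed ID layout: the cluster state UUID plus data
// that stays constant for a given shard. The real layout may differ.
func shardDocID(stateUUID, index string, shard int) string {
	return fmt.Sprintf("%s:%s:%d", stateUUID, index, shard)
}

func main() {
	store := newCreateOnlyStore()
	// Three collection rounds during which the cluster state does not change.
	for round := 1; round <= 3; round++ {
		id := shardDocID("uuid-1", "logs", 0)
		err := store.create(id, "STARTED")
		fmt.Printf("round %d: err=%v\n", round, err)
	}
	fmt.Println("failures:", store.failures)
}
```

Only the first round stores a document; every later round with the same state_uuid is rejected and bumps the failure counter, which matches the symptom described above.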

However, we could/should go one step further and detect whether the cluster state has changed between two collection rounds. I'm naively thinking about "simply" comparing the old and new state_uuid, but it might be more involved than that. Anyway, if there's no change, there's no point in even rebuilding those documents and sending them again, since we know they'll bounce, generate a version conflict, and increase the indexing failure counter for no reason. On top of that, it wastes network bandwidth and CPU/RAM resources on the ES side. For big clusters with many thousands of shards, that can make a big difference.
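The naive state_uuid comparison could look roughly like the sketch below. It is only an illustration of the idea, not a patch against the module: `stateGate` and `changed` are hypothetical names, and the list of UUIDs in `main` simulates what successive collection rounds might observe.

```go
package main

import "fmt"

// stateGate remembers the state_uuid seen in the previous collection round.
// Hypothetical helper, not part of Metricbeat.
type stateGate struct {
	lastUUID string
	seen     bool // false until the first round has been recorded
}

// changed reports whether stateUUID differs from the one recorded in the
// previous round, and records it for the next call.
func (g *stateGate) changed(stateUUID string) bool {
	if g.seen && stateUUID == g.lastUUID {
		return false
	}
	g.lastUUID = stateUUID
	g.seen = true
	return true
}

func main() {
	gate := &stateGate{}
	// Simulated state_uuid values observed over five collection rounds.
	rounds := []string{"uuid-1", "uuid-1", "uuid-1", "uuid-2", "uuid-2"}
	shipped := 0
	for i, uuid := range rounds {
		if !gate.changed(uuid) {
			fmt.Printf("round %d: state unchanged, skipping shard documents\n", i+1)
			continue
		}
		shipped++
		fmt.Printf("round %d: state changed, building and shipping shard documents\n", i+1)
	}
	fmt.Println("shipped", shipped, "of", len(rounds), "rounds")
}
```

One reason it may be more involved than this: the sketch records the new UUID before anything is shipped, so a failed publish would not be retried on the next round. A real implementation would probably need to update the remembered UUID only after the documents were accepted.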


botelastic bot added the needs_team label Apr 18, 2024

consulthys mentioned this issue Apr 20, 2024

Improve information about _stats index_failures elastic/elasticsearch#80802

Open

cmacknz added the Team:Monitoring and Team:Infra Monitoring UI labels Apr 23, 2024

botelastic bot removed the needs_team label Apr 23, 2024
