Feature: Station behavior anomaly-detection policy #1314

Open
1 of 5 tasks
yanivbh1 opened this issue Sep 13, 2023 · 3 comments
Labels: 💟 Community involvement, Feature Request, good first issue

Comments


yanivbh1 commented Sep 13, 2023

Description

Hey,
In several cases, data stopped being produced to or consumed from a Memphis station.
Sometimes the cause was a bug, and sometimes it was a client coding issue. In both cases nothing crashed, so the clients wrote no logs. They still appeared connected to Memphis, and Memphis itself did not run into any problem. As a result, nothing was reported.

To handle such scenarios and provide better observability and protection, I suggest adding the ability to define a per-station policy. The policy would state the expected range of messages per second produced to or consumed from the station, plus a deviation threshold in percent. For example, "if the number of produced messages per second is 50% lower", we treat it as a problem and send a notification.

The policy should be defined entirely by the user, per station. No built-in assumptions should be made.
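To make the proposal concrete, here is a minimal sketch of what such a policy could look like. Everything in it is hypothetical: `StationRatePolicy`, `deviation_pct`, and `should_notify` are illustrative names, not part of Memphis or its SDKs. It also assumes one reading of the threshold, namely that the deviation is measured relative to the nearest bound of the expected range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationRatePolicy:
    """Hypothetical user-defined policy for one station (not a Memphis API)."""
    station: str
    direction: str                  # "produce" or "consume"
    min_msgs_per_sec: float         # lower bound of the expected range
    max_msgs_per_sec: float         # upper bound of the expected range
    deviation_threshold_pct: float  # notify at or above this deviation

    def __post_init__(self):
        if self.direction not in ("produce", "consume"):
            raise ValueError("direction must be 'produce' or 'consume'")
        if not 0 < self.min_msgs_per_sec <= self.max_msgs_per_sec:
            raise ValueError("expected 0 < min_msgs_per_sec <= max_msgs_per_sec")


def deviation_pct(policy: StationRatePolicy, observed: float) -> float:
    """Percent by which the observed rate falls outside the expected range.

    Assumption: deviation is relative to the nearest bound of the range.
    Returns 0.0 when the rate is inside the range.
    """
    if observed < policy.min_msgs_per_sec:
        return 100.0 * (policy.min_msgs_per_sec - observed) / policy.min_msgs_per_sec
    if observed > policy.max_msgs_per_sec:
        return 100.0 * (observed - policy.max_msgs_per_sec) / policy.max_msgs_per_sec
    return 0.0


def should_notify(policy: StationRatePolicy, observed: float) -> bool:
    """True when the rate is out of range by at least the threshold."""
    deviation = deviation_pct(policy, observed)
    return deviation > 0 and deviation >= policy.deviation_threshold_pct


# Example: 1000-2000 msgs/sec expected, notify on a 50% deviation.
policy = StationRatePolicy("orders", "produce", 1000.0, 2000.0, 50.0)
notify = should_notify(policy, observed=400.0)  # 60% below the lower bound
```

Keeping the policy as plain data means the same definition could be edited in the GUI, sent through an SDK, and evaluated in the broker, but where the check actually runs is left open here.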

Involved components

  • GUI
  • SDKs
  • Broker
  • Notification channels/integrations

Additional context

No response

@yanivbh1 added the 💟 Community involvement, Feature Request, and good first issue labels Sep 13, 2023
@itajenglish

@yanivbh1 I think this is a great idea! There may even be potential to use machine learning on a station's historical throughput to trigger alerts, in conjunction with the manually set policy. Maybe automatic anomaly detection could be a cloud feature 👀
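As a rough illustration of alerting from historical throughput, the sketch below uses a plain z-score against past measurements. This is a simple statistical stand-in, not machine learning, and `is_throughput_outlier` and its default threshold are hypothetical choices rather than anything Memphis provides.

```python
import statistics


def is_throughput_outlier(history, current, z_threshold=3.0):
    """Flag `current` when it is far from the historical mean.

    `history` is a list of past throughput samples for one station
    (for example messages per second in earlier windows). The z-score
    threshold of 3.0 is an arbitrary default, chosen for illustration.
    """
    if len(history) < 2:
        # Not enough data to estimate spread; do not alert.
        return False
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Perfectly flat history: any change at all is a deviation.
        return current != mean
    return abs(current - mean) / stdev > z_threshold


# Example: a station that normally handles about 100 msgs/sec.
past_rates = [100, 102, 98, 101, 99]
sudden_drop = is_throughput_outlier(past_rates, 20)
normal_rate = is_throughput_outlier(past_rates, 101)
```

A learned baseline like this could run alongside a manually set policy, so a station alerts either when the user's explicit limits are broken or when the rate departs sharply from its own history.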


g41797 commented Nov 7, 2023

A simple "ping/pong", meaning a periodic exchange with an adapter, would be good enough.
The adapter should run as a regular external client (not as part of the multi-container deployment).


yanivbh1 commented Nov 8, 2023

@g41797, that doesn't address the challenge.
The scenario I want to tackle here is, for example: a certain station should see at least 100GB of produced data and 300GB of consumed data every 24 hours, and all of a sudden there is only 20GB in and 50GB out.
It might be nothing, but it can also be a sign that something is not working. Btw, this came from one of our customers.

ping/pong won't help in such a scenario.
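The 24-hour scenario above can be sketched as a sliding-window volume check. This is only an illustration under stated assumptions: `VolumeWindow` and `below_minimum` are hypothetical names, timestamps are plain seconds, and 1GB is taken as 10^9 bytes. A client that stays connected and answers pings would still trip this check once its volume drops.

```python
from collections import deque

GB = 10**9  # assumption: decimal gigabytes
DAY = 24 * 3600


class VolumeWindow:
    """Running total of bytes seen within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, nbytes), oldest first
        self._total = 0

    def record(self, timestamp, nbytes):
        """Add one observation; timestamps must be non-decreasing."""
        self._events.append((timestamp, nbytes))
        self._total += nbytes

    def total(self, now):
        """Drop observations older than the window, then return the sum."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_bytes = self._events.popleft()
            self._total -= old_bytes
        return self._total


def below_minimum(window, now, min_bytes):
    """True when the volume in the window is under the required minimum."""
    return window.total(now) < min_bytes


# The numbers from the comment: 20GB in and 50GB out within 24 hours,
# against minimums of 100GB produced and 300GB consumed.
produced = VolumeWindow(DAY)
consumed = VolumeWindow(DAY)
produced.record(3600, 20 * GB)
consumed.record(3600, 50 * GB)
produce_alert = below_minimum(produced, DAY, 100 * GB)
consume_alert = below_minimum(consumed, DAY, 300 * GB)
```

Tracking bytes rather than connection liveness is the point of the design: the check depends only on how much data actually moved through the station.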
