Better monitoring of Fatman's condition #77

Open
iszulcdeepsense opened this issue Nov 4, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@iszulcdeepsense
Collaborator

Let's improve how the current Fatman status is reported on the Dashboard by extending it with information such as:

  • memory usage status, e.g. a warning when the Fatman is running out of memory;
  • how many times the container has crashed, been restarted, or been OOM-killed.

This will give better insight into what's happening with the workload.

Deployers other than Kubernetes may not have access to this extra data, so it should be implemented in a general way, taking advantage of the plugin system.
The plugin should simply report a status (Green, Yellow or Red) with an explanation field describing the reason for a malfunction.
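A rough sketch of what such a report could look like, and how a Kubernetes-backed plugin might derive it from the metrics listed above. All names (`FatmanStatus`, `FatmanHealthReport`, `evaluate_container`) and the 90% memory threshold are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass
from enum import Enum


class FatmanStatus(Enum):
    """Traffic-light status a plugin could report (hypothetical names)."""
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


@dataclass
class FatmanHealthReport:
    status: FatmanStatus
    explanation: str = ""  # reason for a malfunction; empty when healthy


# Hypothetical threshold: warn once usage reaches 90% of the memory limit.
MEMORY_WARNING_RATIO = 0.9


def evaluate_container(memory_used_bytes: int, memory_limit_bytes: int,
                       restart_count: int, oom_killed_count: int) -> FatmanHealthReport:
    """Map container metrics a k8s plugin might collect onto Green/Yellow/Red."""
    red_reasons = []
    yellow_reasons = []
    if oom_killed_count > 0:
        red_reasons.append(f"container was OOM-killed {oom_killed_count} time(s)")
    if restart_count > 0:
        yellow_reasons.append(f"container restarted {restart_count} time(s)")
    if memory_limit_bytes > 0:
        ratio = memory_used_bytes / memory_limit_bytes
        if ratio >= MEMORY_WARNING_RATIO:
            yellow_reasons.append(f"memory usage at {ratio:.0%} of the limit")
    # The worst finding decides the colour; all findings go into the explanation.
    if red_reasons:
        return FatmanHealthReport(FatmanStatus.RED, "; ".join(red_reasons + yellow_reasons))
    if yellow_reasons:
        return FatmanHealthReport(FatmanStatus.YELLOW, "; ".join(yellow_reasons))
    return FatmanHealthReport(FatmanStatus.GREEN)
```

A deployer without access to these metrics would simply not provide such a function, which keeps the Dashboard logic independent of Kubernetes.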

@JosefAssadERST
Member

Good scope. I suspect it'll be a bit finicky to design right, because we can't predict what kind of deployment targets will be implemented. So maybe a common framework is the way to do it?

Agreed on red/yellow/green. Maybe begin with only red and green, since yellow is a matter of definition and can be subjective?

Maybe it's something like this:

  • RT supports displaying the fatman status
  • The job type is responsible for giving RT the means to know whether a given fatman is red or green. For example, if we deployed a k8s fatman, the k8s job type plugin would have a method, e.g. check_fatman_status(fatman_id), which RT can call and which might return red/green along with a hint?

This way RT holds the responsibility and the job type plugin holds the implementation.
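A minimal sketch of that split, assuming plugins are plain Python objects and RT only checks whether the optional `check_fatman_status(fatman_id)` method exists. Apart from that method name, everything here (`Status`, `StatusResult`, `get_fatman_status`, the two plugin classes) is hypothetical:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Protocol, runtime_checkable


class Status(Enum):
    # Starting with red/green only, as suggested above.
    GREEN = "green"
    RED = "red"


@dataclass
class StatusResult:
    status: Status
    hint: str = ""


@runtime_checkable
class SupportsFatmanStatus(Protocol):
    def check_fatman_status(self, fatman_id: str) -> StatusResult: ...


def get_fatman_status(plugin: object, fatman_id: str) -> Optional[StatusResult]:
    """RT side: delegate to the job type plugin if it implements the check."""
    if not isinstance(plugin, SupportsFatmanStatus):
        return None  # plugin can't report status; Dashboard shows nothing extra
    return plugin.check_fatman_status(fatman_id)


class K8sJobTypePlugin:
    """Hypothetical plugin; a real one would query Kubernetes instead."""
    def __init__(self, crashing_fatmen: Iterable[str]):
        self.crashing_fatmen = set(crashing_fatmen)

    def check_fatman_status(self, fatman_id: str) -> StatusResult:
        if fatman_id in self.crashing_fatmen:
            return StatusResult(Status.RED, "container keeps restarting")
        return StatusResult(Status.GREEN)


class DockerJobTypePlugin:
    """Hypothetical plugin that doesn't support status checks."""
```

Making the method optional means new deployment targets don't have to implement it up front; RT just shows the status when it is available.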

2 participants