
feat: add monitoring plugin interface #2830

Open

wants to merge 7 commits into base: main

Conversation

@cademirch (Contributor) commented on Apr 20, 2024

Description

@johanneskoester, just wanted to start the discussion on adding the monitoring plugin interface.

I've implemented a monitoring plugin interface in this repo. In this PR, I've replaced the --log-service and --wms CLI options with --monitoring-provider and wired up the use of the monitoring provider.

I designed the MonitoringProvider base class around using the log_handler, as this is what logging.WMSLogger and logging.SlackLogger used. New monitoring plugins will just need to define a log_handler and will be good to go.
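For illustration, here is a rough sketch of what a plugin under this design might look like. This is not the actual interface code: the message keys are based on the dicts Snakemake's logger currently passes to its log handlers, and notify is a made-up helper.

class MyMonitoringProvider:  # would subclass the interface's MonitoringProvider base class
    def log_handler(self, msg: dict):
        # Snakemake's logger calls registered handlers with a message dict.
        level = msg.get("level")
        if level == "error":
            self.notify(f"Workflow error: {msg.get('msg')}")
        elif level == "progress":
            self.notify(f"{msg['done']}/{msg['total']} jobs done")

    def notify(self, text: str):
        # A real plugin would send this to its monitoring service.
        print("[monitor]", text)


MyMonitoringProvider().log_handler({"level": "progress", "done": 3, "total": 10})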

In order to test this, I reimplemented the WMSLogger into a monitoring plugin in this repo.

Off the top of my head, things left to do are:

  • Write tests and docs
  • Publish the interface to PyPI
  • Implement SlackLogger as a plugin and publish
  • Publish WMSLogger plugin
  • Remove WMSLogger and SlackLogger from logging.py

Shouldn't take long to do all of the above; I just want to make sure this is the way to go first. I'll keep this as a draft until we decide on the way forward.

Extra:
It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

# workflow.py
def _build_dag(self):
    logger.info("Building DAG of jobs...")
    async_run(self.dag.init())
    async_run(self.dag.update_checkpoint_dependencies())
+   logger.debug_dag(self.dag.rule_dot())  # Could also define a new func for this instead of debug_dag

Doing this in _build_dag would allow the monitor to update its DAG visualization if the workflow has checkpoints. Of course this assumes we use the log_handler for the monitoring plugin interface. I can't think of any issues logging the rulegraph in _build_dag could cause, but feedback here would be helpful.
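Purely as an illustration of the idea, a plugin's log_handler could cache the rulegraph whenever such a message arrives and re-render its DAG view. The level name below is just the proposed debug_dag (which may well become a separate method instead), and it reuses the MyMonitoringProvider sketch from above.

class DagAwareMonitor(MyMonitoringProvider):
    def __init__(self):
        self.rulegraph_dot = None

    def log_handler(self, msg: dict):
        if msg.get("level") == "debug_dag":
            # DOT source from self.dag.rule_dot(); refresh the DAG view, e.g.
            # after checkpoint re-evaluation triggers another _build_dag call.
            self.rulegraph_dot = msg.get("msg")
        else:
            super().log_handler(msg)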

@cademirch changed the title from "Feat: add monitoring plugin interface" to "feat: add monitoring plugin interface" on Apr 20, 2024
@cademirch (Contributor, Author)

Hi @johanneskoester, I'm making this a full PR so you can review.

@cademirch marked this pull request as ready for review on April 25, 2024, 21:14
@johanneskoester (Contributor)

It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

Yes, good idea! Should be a separate method then, instead of debug_dag I think.

@cademirch (Contributor, Author)

It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

Yes, good idea! Should be a separate method then, instead of debug_dag I think.

Great, I agree. What do you think about the monitoring interface? If this is the approach we take, I'm happy to start publishing the interface and plugins to pypi and write some tests for snakemake.

@johanneskoester (Contributor) left a comment

Awesome, only a few things.

"Snakemake will notify the service on errors and completed execution."
"Currently slack and workflow management system (wms) are supported.",
"--monitoring-provider",
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",
Contributor

Suggested change
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake-monitoring-plugin-<name>",

Contributor

Thinking about it, why not just snakemake-monitor-plugin-... (without the ing)?

Contributor Author

I'm not sure, to be honest - I think it's because I had "monitoring-provider" stuck in my head. I like snakemake-monitor-plugin better; I'll make that change.

@jjjermiah (Contributor)

Nice work @cademirch, glad to see this going. If there's a new logging call, it might be useful to consider specific logs that provide more/structured info regarding the mapping to be ingested by the plugin.

@cademirch (Contributor, Author)

Nice work @cademirch, glad to see this going. If there's a new logging call, it might be useful to consider specific logs that provide more/structured info regarding the mapping to be ingested by the plugin.

Thanks. Yes, I agree. I think working out the suggestion in #2827 to break show_failed_logs out into its own method would be a good place to start, as it would be nice to offer failed logs to the monitor plugin.

@johanneskoester (Contributor)

One further thought: couldn't we instead call the interface snakemake-interface-logger-plugins and unify monitoring and logging? Isn't it, in the end, just a question of where the log messages are directed? For the CLI, I would then imagine that people can specify any number of loggers based on what plugins are installed. For example, snakemake --logger json slack would run Snakemake with snakemake-logger-plugin-json and snakemake-logger-plugin-slack, whereas snakemake --logger text wms-monitor would use the text logger (which will stay built into Snakemake, analogously to the local executor) and snakemake-logger-plugin-wms-monitor. Importantly, loggers that write to the terminal should be mutually exclusive, i.e. Snakemake should give an error if e.g. snakemake --logger text json is executed.
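For concreteness, a purely illustrative sketch of how the proposed --logger values could be resolved and validated; the naming scheme comes from the comment above, while the set of terminal-writing loggers and the helper itself are assumptions, not an agreed design.

TERMINAL_LOGGERS = {"text", "json"}  # hypothetical: loggers that write to the terminal


def resolve_loggers(requested):
    terminal = [name for name in requested if name in TERMINAL_LOGGERS]
    if len(terminal) > 1:
        raise ValueError(f"Loggers {terminal} all write to the terminal and are mutually exclusive.")
    # "text" stays built into Snakemake; everything else maps to a plugin package.
    return [name if name == "text" else f"snakemake-logger-plugin-{name}" for name in requested]


# `snakemake --logger json slack` would then resolve to:
print(resolve_loggers(["json", "slack"]))
# ['snakemake-logger-plugin-json', 'snakemake-logger-plugin-slack']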

@jjjermiah (Contributor)

That sounds about right; this is directly related to the logging itself. It might help to think about how the logs might be used.

I, for one, was first interested in structuring logs to be posted directly to the Google Cloud Logging API and using that as a "database", so I can build a monitor that queries logs using filters on the fields that the logging plugin defines.

  • I've tested this out using the log-handler-script argument to work around the google-batch plugin's logging issues (a sketch of that approach follows below).

Taking a step back, it becomes clear that this resembles more of a pub/sub approach, which opens the door to using an actual pub/sub architecture.

This might be complicating things more than it needs to be, but it would also allow remote jobs being executed to post their own logs (instead of having to e.g. do some kubectl logs snakejob-...... as the only other way to get a job's logs).
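A rough sketch of the kind of log-handler-script setup described above. It assumes Snakemake's documented contract that the script defines a log_handler(msg) function receiving a message dict, and it uses the google-cloud-logging client library; the logger name and the level filter are arbitrary examples.

# my_log_handler.py, passed via --log-handler-script
from google.cloud import logging as gcloud_logging

client = gcloud_logging.Client()
cloud_logger = client.logger("snakemake-workflow")


def log_handler(msg: dict):
    # Forward selected Snakemake log messages as structured entries so a
    # monitor can later query them by field. Non-JSON-serializable values
    # would need to be converted first.
    if msg.get("level") in ("job_info", "job_finished", "job_error"):
        cloud_logger.log_struct(msg)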

@cademirch (Contributor, Author)

@johanneskoester I think unifying monitoring and logging is a great idea. Just to confirm though: would the interface design stay the same and just the name change? If not, now would be a good time to change the design.

If we go this direction, perhaps we should compile any changes to logging.py we would want to see and implement them here.

I'm currently traveling so I won't be able to make any code progress, but am happy to keep discussing!

@cademirch (Contributor, Author) commented May 3, 2024

@johanneskoester would you be open to changes to the logger class in this PR? To be honest, if we want to do logging plugins, I think we should consider a pretty big refactor of how logging is handled. Currently, the use of the log_handlers list and the class wrapping the logger is pretty non-standard and would limit potential logging plugins, in my opinion. If we were to use the logging addHandler method, we could design logging plugins as subclasses of logging.Handler. It would also be nice to use the built-in logging.Filter to handle filtering of logs at the handler level, so that plugins could have access to all logs. Overall, the Logger class is doing too many things and could benefit from a refactor to make it more modular, especially if we want to move forward with log/monitor plugins. Here is an example of how we could use logging.config.dictConfig to specify different handlers, filters, formatters, etc.
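For illustration, a minimal sketch (not actual Snakemake code; the handler and filter classes are made up) of how plugins-as-logging.Handler subclasses plus logging.config.dictConfig could fit together:

import logging
import logging.config


class MonitorHandler(logging.Handler):
    # A plugin would forward each record to its monitoring backend; here we just print.
    def emit(self, record):
        print("monitor received:", self.format(record))


class JobEventsOnly(logging.Filter):
    # Example filter applied at the handler level: only pass records carrying a jobid.
    def filter(self, record):
        return hasattr(record, "jobid")


logging.config.dictConfig({
    "version": 1,
    "filters": {"job_events": {"()": JobEventsOnly}},
    "handlers": {
        "terminal": {"class": "logging.StreamHandler", "level": "INFO"},
        "monitor": {"()": MonitorHandler, "filters": ["job_events"]},
    },
    "loggers": {"snakemake": {"handlers": ["terminal", "monitor"], "level": "DEBUG"}},
})

logging.getLogger("snakemake").info("job started", extra={"jobid": 3})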

As an example, currently passing a failed job's logs to a logging plugin would be difficult because show_logs is too deeply nested, as #2827 pointed out. Ideally it should be easy for future plugins to have access to all logs.

Finally, we could consider using a different logging package like structlog. While this would take a bit of work and add a dependency, I think it is worth thinking about, as it is a bit more modern than Python's built-in logging and would possibly make refactoring a bit easier.
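For a feel of the structlog style (purely illustrative; the event and field names are arbitrary):

import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()
log.info("job_finished", jobid=3, rule="align_reads")
# prints something like:
# {"jobid": 3, "rule": "align_reads", "event": "job_finished", "level": "info", "timestamp": "2024-05-03T..."}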

I'm happy to flesh out this refactoring some more, but want to make sure that you're open to it @johanneskoester.


sonarcloud bot commented May 3, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@johanneskoester self-assigned this on May 4, 2024
@johanneskoester (Contributor) commented May 4, 2024

@cademirch yes, absolutely. The logging code of Snakemake has never seen a proper design and has only been extended ad hoc when needed. Structlog looks great and I am completely open to a rewrite of all these parts. Looking forward to your changes!

Once we have logging plugins, people can also experiment with different ways of presenting the info on the terminal (e.g. in a more dynamic way, including progress bars and so on).

@cademirch (Contributor, Author) commented May 4, 2024

Sounds good. I'll leave this PR where it's at for now to revisit down the line, and open a new one for the logging refactoring. One question I do have: how much would you like to retain the current stdout logging formatting? Would you be opposed to pretty-printed JSON logs on stdout? This can be decided later, and I can work up some demos eventually; just curious what you might have in mind.

And I agree re presenting info on the terminal... I have some ideas already :)
