
feat: add monitoring plugin interface #2830

Open

wants to merge 7 commits into base: main

Conversation

@cademirch (Contributor) commented on Apr 20, 2024

Description

@johanneskoester, just wanted to start the discussion on adding the monitoring plugin interface.

I've implemented a monitoring plugin interface in this repo. In this PR, I've replaced the --log-service and --wms CLI options with --monitoring-provider and wired up the use of the monitoring provider.

I designed the MonitoringProvider base class around using the log_handler, as this is what logging.WMSLogger and logging.SlackLogger used. New monitoring plugins will just need to define a log_handler and will be good to go.
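For illustration, here is a rough sketch of what a plugin under this design might look like. This is not the actual interface code: the message keys are based on the dicts Snakemake's logger currently passes to its log handlers, and notify is a made-up helper.

class MyMonitoringProvider:  # would subclass the interface's MonitoringProvider base class
    def log_handler(self, msg: dict):
        # Snakemake's logger calls registered handlers with a message dict.
        level = msg.get("level")
        if level == "error":
            self.notify(f"Workflow error: {msg.get('msg')}")
        elif level == "progress":
            self.notify(f"{msg['done']}/{msg['total']} jobs done")

    def notify(self, text: str):
        # A real plugin would send this to its monitoring service.
        print("[monitor]", text)


MyMonitoringProvider().log_handler({"level": "progress", "done": 3, "total": 10})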

In order to test this, I reimplemented the WMSLogger into a monitoring plugin in this repo.

Off the top of my head, things left to do are:

  • Write tests and docs
  • Publish the interface to PyPI
  • Implement SlackLogger as a plugin and publish
  • Publish WMSLogger plugin
  • Remove WMSLogger and SlackLogger from logging.py

Shouldn't take long to do all of the above; I just want to make sure this is the way to go first. I'll keep this as a draft until we decide on the way forward.

Extra:
It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

# workflow.py
def _build_dag(self):
    logger.info("Building DAG of jobs...")
    async_run(self.dag.init())
    async_run(self.dag.update_checkpoint_dependencies())
+   logger.debug_dag(self.dag.rule_dot())  # Could also define a new func for this instead of debug_dag

Doing this in _build_dag would allow the monitor to update its DAG visualization if the workflow has checkpoints. Of course this assumes we use the log_handler for the monitoring plugin interface. I can't think of any issues logging the rulegraph in _build_dag could cause, but feedback here would be helpful.
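Purely as an illustration of the idea, a plugin's log_handler could cache the rulegraph whenever such a message arrives and re-render its DAG view. The level name below is just the proposed debug_dag (which may well become a separate method instead), and it reuses the MyMonitoringProvider sketch from above.

class DagAwareMonitor(MyMonitoringProvider):
    def __init__(self):
        self.rulegraph_dot = None

    def log_handler(self, msg: dict):
        if msg.get("level") == "debug_dag":
            # DOT source from self.dag.rule_dot(); refresh the DAG view, e.g.
            # after checkpoint re-evaluation triggers another _build_dag call.
            self.rulegraph_dot = msg.get("msg")
        else:
            super().log_handler(msg)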

@cademirch changed the title from "Feat: add monitoring plugin interface" to "feat: add monitoring plugin interface" on Apr 20, 2024
@cademirch (Contributor, Author)

Hi @johanneskoester, I'm making this a full PR so you can review.

@cademirch marked this pull request as ready for review on April 25, 2024, 21:14
@johanneskoester (Contributor)

It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

Yes, good idea! Should be a separate method then, instead of debug_dag I think.

@cademirch (Contributor, Author)

It would be nice to pass the rulegraph to the monitor, so that it can visualize the workflow's DAG. Perhaps something like this:

Yes, good idea! Should be a separate method then, instead of debug_dag I think.

Great, I agree. What do you think about the monitoring interface? If this is the approach we take, I'm happy to start publishing the interface and plugins to pypi and write some tests for snakemake.

@johanneskoester (Contributor) left a comment

Awesome, only a few things.

"Snakemake will notify the service on errors and completed execution."
"Currently slack and workflow management system (wms) are supported.",
"--monitoring-provider",
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",
Contributor

Suggested change
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake_monitoring_<name>",
help="Specify a custom monitoring provider, available via an monitoring plugin: snakemake-monitoring-plugin-<name>",

Contributor

Thinking about it, why not just snakemake-monitor-plugin-... (without the ing)?

Contributor Author

I'm not sure, to be honest - I think it's because I had "monitoring-provider" stuck in my head. I like snakemake-monitor-plugin better; I'll make that change.

@jjjermiah (Contributor)

Nice work @cademirch, glad to see this going. If there's a new logging call, it might be useful to consider specific logs that provide more/structured info regarding the mapping to be ingested by the plugin.

@cademirch (Contributor, Author)

Nice work @cademirch, glad to see this going. If there's a new logging call, it might be useful to consider specific logs that provide more/structured info regarding the mapping to be ingested by the plugin.

Thanks. Yes, I agree. I think working out the suggestion in #2827 to break show_failed_logs out into its own method would be a good place to start, as it would be nice to offer failed logs to the monitor plugin.

@johanneskoester (Contributor)

One further thought: couldn't we instead call the interface snakemake-interface-logger-plugins and unify monitoring and logging? Isn't it, in the end, just a question of where the log messages are directed? For the CLI, I would then imagine that people can specify any number of loggers based on what plugins are installed. For example, snakemake --logger json slack would run Snakemake with snakemake-logger-plugin-json and snakemake-logger-plugin-slack, whereas snakemake --logger text wms-monitor would use the text logger (which will stay built into Snakemake, analogously to the local executor) and snakemake-logger-plugin-wms-monitor. Importantly, loggers that write to the terminal should be mutually exclusive, i.e. Snakemake should give an error if e.g. snakemake --logger text json is executed.
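For concreteness, a purely illustrative sketch of how the proposed --logger values could be resolved and validated; the naming scheme comes from the comment above, while the set of terminal-writing loggers and the helper itself are assumptions, not an agreed design.

TERMINAL_LOGGERS = {"text", "json"}  # hypothetical: loggers that write to the terminal


def resolve_loggers(requested):
    terminal = [name for name in requested if name in TERMINAL_LOGGERS]
    if len(terminal) > 1:
        raise ValueError(f"Loggers {terminal} all write to the terminal and are mutually exclusive.")
    # "text" stays built into Snakemake; everything else maps to a plugin package.
    return [name if name == "text" else f"snakemake-logger-plugin-{name}" for name in requested]


# `snakemake --logger json slack` would then resolve to:
print(resolve_loggers(["json", "slack"]))
# ['snakemake-logger-plugin-json', 'snakemake-logger-plugin-slack']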

@jjjermiah (Contributor)

That sounds about right; this is directly related to the logging itself. It might help to think about how the logs might be used.

I, for one, was first interested in structuring logs to be posted directly to the Google Cloud Logging API and using that as a "database", so I can build a monitor that queries logs using filters on the fields that the logging plugin defines.

  • I've tested this out using the log-handler-script argument to work around the google-batch plugin's logging issues (a sketch of that approach follows below).

Taking a step back, it becomes clear that this resembles more of a pub/sub approach, which opens the door to using an actual pub/sub architecture.

This might be complicating things more than it needs to be, but it would also allow remote jobs being executed to post their own logs (instead of having to e.g. do some kubectl logs snakejob-...... as the only other way to get a job's logs).
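A rough sketch of the kind of log-handler-script setup described above. It assumes Snakemake's documented contract that the script defines a log_handler(msg) function receiving a message dict, and it uses the google-cloud-logging client library; the logger name and the level filter are arbitrary examples.

# my_log_handler.py, passed via --log-handler-script
from google.cloud import logging as gcloud_logging

client = gcloud_logging.Client()
cloud_logger = client.logger("snakemake-workflow")


def log_handler(msg: dict):
    # Forward selected Snakemake log messages as structured entries so a
    # monitor can later query them by field. Non-JSON-serializable values
    # would need to be converted first.
    if msg.get("level") in ("job_info", "job_finished", "job_error"):
        cloud_logger.log_struct(msg)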

@cademirch (Contributor, Author)

@johanneskoester I think unifying monitoring and logging is a great idea. Just to confirm though: would the interface design stay the same and just the name change? If not, now would be a good time to change the design.

If we go this direction, perhaps we should compile any changes to logging.py we would want to see and implement them here.

I'm currently traveling so I won't be able to make any code progress, but am happy to keep discussing!

@cademirch (Contributor, Author) commented May 3, 2024

@johanneskoester would you be open to changes to the logger class in this PR? To be honest, if we want to do logging plugins, I think we should consider a pretty big refactor of how logging is handled. Currently, the use of the log_handlers list and the class wrapping the logger is pretty non-standard and would limit potential logging plugins, in my opinion. If we were to use the logging addHandler method, we could design logging plugins as subclasses of logging.Handler. It would also be nice to use the built-in logging.Filter to handle filtering of logs at the handler level, so that plugins could have access to all logs. Overall, the Logger class is doing too many things and could benefit from a refactor to make it more modular, especially if we want to move forward with log/monitor plugins. Here is an example of how we could use logging.config.dictConfig to specify different handlers, filters, formatters, etc.
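For illustration, a minimal sketch (not actual Snakemake code; the handler and filter classes are made up) of how plugins-as-logging.Handler subclasses plus logging.config.dictConfig could fit together:

import logging
import logging.config


class MonitorHandler(logging.Handler):
    # A plugin would forward each record to its monitoring backend; here we just print.
    def emit(self, record):
        print("monitor received:", self.format(record))


class JobEventsOnly(logging.Filter):
    # Example filter applied at the handler level: only pass records carrying a jobid.
    def filter(self, record):
        return hasattr(record, "jobid")


logging.config.dictConfig({
    "version": 1,
    "filters": {"job_events": {"()": JobEventsOnly}},
    "handlers": {
        "terminal": {"class": "logging.StreamHandler", "level": "INFO"},
        "monitor": {"()": MonitorHandler, "filters": ["job_events"]},
    },
    "loggers": {"snakemake": {"handlers": ["terminal", "monitor"], "level": "DEBUG"}},
})

logging.getLogger("snakemake").info("job started", extra={"jobid": 3})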

As an example, currently passing a failed job's logs to a logging plugin would be difficult because show_logs is too deeply nested, as #2827 pointed out. Ideally it should be easy for future plugins to have access to all logs.

Finally, we could consider using a different logging package like structlog. While this would take a bit of work and add a dependency, I think it is worth thinking about, as it is a bit more modern than Python's built-in logging and would possibly make refactoring a bit easier.
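For a feel of the structlog style (purely illustrative; the event and field names are arbitrary):

import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()
log.info("job_finished", jobid=3, rule="align_reads")
# prints something like:
# {"jobid": 3, "rule": "align_reads", "event": "job_finished", "level": "info", "timestamp": "2024-05-03T..."}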

I'm happy to flesh out this refactoring some more, but want to make sure that you're open to it @johanneskoester.


sonarcloud bot commented May 3, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@johanneskoester self-assigned this on May 4, 2024
@johanneskoester (Contributor) commented May 4, 2024

@cademirch yes, absolutely. The logging code of Snakemake has never seen a proper design and has only been extended ad hoc when needed. Structlog looks great and I am completely open to a rewrite of all these parts. Looking forward to your changes!

Once we have logging plugins, people can also experiment with different ways of presenting the info on the terminal (e.g. in a more dynamic way, including progress bars and so on).

@cademirch (Contributor, Author) commented May 4, 2024

Sounds good. I'll leave this PR where it's at for now to revisit down the line, and open a new one for the logging refactoring. One question I do have: how much would you like to retain the current stdout logging formatting? Would you be opposed to pretty-printed JSON logs on stdout? This can be decided later, and I can work up some demos eventually; just curious what you might have in mind.

And I agree re presenting info on the terminal... I have some ideas already :)
