API Proposal for Logging #17439

thomasjpfan · 2020-06-03T19:09:23Z

By default writing to a logger would not go to STDOUT without a handler. One solution to this is to attach a handler when verbose is an integer and use the logger if it is passed in.

# Do not log
func(verbose=0)

# will create a stream handler to stdout
func(verbose=1)

# will write to the logger passed in
func(verbose=logger)

Edit: See also #78.

rth · 2020-06-03T19:36:56Z

The issue with this is say you have a pipeline and you want to log operations. Passing this verbose handler to each of its components will be tiresome.

Usually one desn't want logging at the estimator level but rather at the application level. i.e. enable logging handler globally with optionally a way to tune how much verbosity we want.

thomasjpfan · 2020-06-03T19:49:10Z

Good point. In this case we do not need to accept verbose=logger. A advanced user can still attach a handler to the global logger. (or the logger at the module level)

I still think it is useful to have an integer for verbosity.

adrinjalali · 2020-06-04T09:31:45Z

Alternatively we could get the default handler in each module eg. sklearn.svm , which by default inherits from the parent 'sklearn' handler, and have convenience methods such as

sklearn.setLoggingLevel(level)

and

sklearn.svm.setLoggingLevel(level) or sklearn.setLoggingLevel(module, level)

and allow users to further configure those handlers if they wish.

The verbosity messages would go to logger.debug, and we can warn the user if they set verbose>0 while not setting the log level appropriately.

thomasjpfan · 2020-06-04T13:12:02Z

and allow users to further configure those handlers if they wish.

Having to configure the handler manually will always be possible by grabbing the logger with a string:

logger = logging.getLogger('sklearn.svm')
logger.setLoggingLevel(...)

My concern is how this complicates how to start logging. Right now we start logging like this:

hist = HistGradientBoostingRegressor(verbose=1)

If we only have a logging api based option:

logger = logging.getLogger('sklearn.ensemble')
logger.setLoggingLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
logger.addHandler(handler)

hist = HistGradientBoostingRegressor()

What I propose is this:

logger = logging.getLogger(__name__)

class HistGradientBoostingRegressor:
	def fit(self, X, y):
		if self.verbose > 0:
			handler = logging.StreamHandler(sys.stdout)
			logger.addHandler(handler)
			# maybe even set logging level based on verbose?

so HistGradientBoostingRegressor(verbose=1) still works.

rth · 2020-06-04T13:46:43Z

What I propose is this:

yes, I think keeping a verbose parameter working would still be useful.

I think it would be useful to think of logging in terms of callbacks (#16925). This is for instance done that way in keras as far as I can tell e.g. here. This would mean that the logging logic is completely separated from the estimator.

adrinjalali · 2020-06-04T14:34:53Z

What I'm proposing is more like:

# this adds a stdout handler as wel
sklearn.setLogLevel(logging.DEBUG) # or logging.INFO
# or
sklearn.setLogLevel(sklearn.DEBUG) # not a fan of this one

HistGradientBoostingRegressor(verbose=1)

I would rather not force having a stdout handler in the estimator, since the user may want to have logging, but not in stdout, and they should be able to do that. I'd be okay if we add the stdout handler in the estimator if there are no handlers attached to the logger, but I would still warn or info on the fact that setting verbose>0 also sets the log level of the logger in the module.

rth · 2020-06-04T16:37:32Z

We already have a config mechanism, so it could be,

# globally
from sklearn._callbacks import LoggingCallback
sklearn.set_config(logging=LoggingCallback())
# or
with sklearn.config_context(logging=LoggingCallback()):
   ...

# or locally
HistGradientBoostingRegressor(verbose=1)  # registers the callback interally

where the logging callback can accept a number of logging options, including,

logging level
logging formatter
output streams

I mean there is a bunch of things to configure for logging. So either we hardcode them and hide it from the user (although I'm not sure what should they be hardcoded to) or we need to allow some mechanism to be able to change them. How much of it we should expose I'm not sure.

The important part however is now this logging callback is registered on fit in each estimator, and depending on both the verbose option and its global parameters it does the logging. This way the logic is completely isolated from the estimator, and it avoids having too much things in the root namespace. People who want to customize this behavior can also write their own LoggingCallback following the documented API (which would allow us to only do the bare minimum).

adrinjalali · 2020-06-05T08:44:31Z

My question is why would we need a LogingCallback if logging provides what we need?

rth · 2020-06-05T09:13:29Z

Well you need 1 object to call in estimators for printing / logging. Logging exposes several such objects including formatter, stream handler, logger. So we would need some container around it, unless we want to have them in the root namespace, and expect users to monkeypatch/update that for customization.

It also happens that do something like progress bars or convergence plots, would need callbacks in the same parts of the code where we do logging. So it makes sense to at least try re-using some of it.

Also TBH, whatever we do with logging I'm sure there will be users who will say that it doesn't work with their use case (e.g. matplotlib/matplotlib#13264 discussion). For instance registering a logging handler in scikit-learn is apparently not OK #15122, but we have to do some of it for verbose=1 option. I find it nice that that users could just subclass the logging handler or write their own if they need to.

Also I think that just printing things to stdout in a loggingcallback and not using the logging module, might also be OK for most interactive usage e.g. https://keras.io/api/callbacks/progbar_logger/. Having one container that is not the logging module makes it easier to swap. For instance, I think unless one sets the output stream to stdout, in a jupyter notebooks logging will be actually shown in the server logs in some cases, and not in the notebook. I don't remember where I saw it, there are some models that currently do this for convergence in scikit-learn.

rth · 2020-06-05T09:19:40Z

For instance, I think unless one sets the output stream to stdout, in a jupyter notebooks logging will be actually shown in the server logs

Since none of the models currently use logging, maybe this is rather a stdout vs stderr issue. I'll try to investigate.

adrinjalali · 2024-04-17T12:37:39Z

@jeremiedbb with callback API, are we going to / can we deprecate verbose and use callbacks instead for logging? Not sure what to do with cython / C logs though.

jeremiedbb · 2024-04-17T13:22:01Z

Callbacks will be usable for logging but I don't think they'll replace verbose because verbose can be triggered at a lot more places than callback hooks are called. For instance #27526. verbose allows more fine grained verbosity. In addition it's sometimes propagated to enable joblib's verbosity where callbacks won't apply.

It doesn't change much for cython since we already need to acquire the gil for existing print statements.

rth mentioned this issue Jun 4, 2020

Callbacks API #16925

Closed

5 tasks

rth mentioned this issue Jun 24, 2020

Use python logging to report on convergence progress it level info for long running tasks #78

Open

rubywerman mentioned this issue Jul 1, 2020

[WIP] Change print statements to logging #17808

Closed

adrinjalali mentioned this issue Jul 20, 2020

Display progress bar for grid search and exponentiated gradient fairlearn/fairlearn#517

Closed

thomasjpfan mentioned this issue Jul 21, 2020

Verbose debugging for spectral clustering #17969

Closed

jnothman mentioned this issue Feb 7, 2021

Elapsed step times for inference stage in Pipeline and FeatureUnion #19293

Closed

glemaitre mentioned this issue Nov 3, 2021

Add print statements to RandomSearchCV and GridSearchCV #21537

Closed

thomasjpfan added API Hard Hard level of difficulty labels Dec 10, 2021

NickleDave mentioned this issue May 22, 2023

BioCypher submission pyOpenSci/software-submission#110

Closed

30 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API Proposal for Logging #17439

API Proposal for Logging #17439

thomasjpfan commented Jun 3, 2020 •

edited by lorentzenchr

rth commented Jun 3, 2020 •

edited

thomasjpfan commented Jun 3, 2020 •

edited

adrinjalali commented Jun 4, 2020

thomasjpfan commented Jun 4, 2020 •

edited

rth commented Jun 4, 2020 •

edited

adrinjalali commented Jun 4, 2020

rth commented Jun 4, 2020 •

edited

adrinjalali commented Jun 5, 2020

rth commented Jun 5, 2020 •

edited

rth commented Jun 5, 2020

adrinjalali commented Apr 17, 2024

jeremiedbb commented Apr 17, 2024 •

edited

API Proposal for Logging #17439

API Proposal for Logging #17439

Comments

thomasjpfan commented Jun 3, 2020 • edited by lorentzenchr

rth commented Jun 3, 2020 • edited

thomasjpfan commented Jun 3, 2020 • edited

adrinjalali commented Jun 4, 2020

thomasjpfan commented Jun 4, 2020 • edited

rth commented Jun 4, 2020 • edited

adrinjalali commented Jun 4, 2020

rth commented Jun 4, 2020 • edited

adrinjalali commented Jun 5, 2020

rth commented Jun 5, 2020 • edited

rth commented Jun 5, 2020

adrinjalali commented Apr 17, 2024

jeremiedbb commented Apr 17, 2024 • edited

thomasjpfan commented Jun 3, 2020 •

edited by lorentzenchr

rth commented Jun 3, 2020 •

edited

thomasjpfan commented Jun 3, 2020 •

edited

thomasjpfan commented Jun 4, 2020 •

edited

rth commented Jun 4, 2020 •

edited

rth commented Jun 4, 2020 •

edited

rth commented Jun 5, 2020 •

edited

jeremiedbb commented Apr 17, 2024 •

edited