Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal for Logging #17439

Open
thomasjpfan opened this issue Jun 3, 2020 · 12 comments
Open

API Proposal for Logging #17439

thomasjpfan opened this issue Jun 3, 2020 · 12 comments
Labels
API Hard Hard level of difficulty

Comments

@thomasjpfan
Copy link
Member

thomasjpfan commented Jun 3, 2020

By default writing to a logger would not go to STDOUT without a handler. One solution to this is to attach a handler when verbose is an integer and use the logger if it is passed in.

# Do not log
func(verbose=0)

# will create a stream handler to stdout
func(verbose=1)

# will write to the logger passed in
func(verbose=logger)

Edit: See also #78.

@rth
Copy link
Member

rth commented Jun 3, 2020

The issue with this is say you have a pipeline and you want to log operations. Passing this verbose handler to each of its components will be tiresome.

Usually one desn't want logging at the estimator level but rather at the application level. i.e. enable logging handler globally with optionally a way to tune how much verbosity we want.

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Jun 3, 2020

Good point. In this case we do not need to accept verbose=logger. A advanced user can still attach a handler to the global logger. (or the logger at the module level)

I still think it is useful to have an integer for verbosity.

@adrinjalali
Copy link
Member

Alternatively we could get the default handler in each module eg. sklearn.svm , which by default inherits from the parent 'sklearn' handler, and have convenience methods such as

sklearn.setLoggingLevel(level)

and

sklearn.svm.setLoggingLevel(level) or sklearn.setLoggingLevel(module, level)

and allow users to further configure those handlers if they wish.

The verbosity messages would go to logger.debug, and we can warn the user if they set verbose>0 while not setting the log level appropriately.

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Jun 4, 2020

and allow users to further configure those handlers if they wish.

Having to configure the handler manually will always be possible by grabbing the logger with a string:

logger = logging.getLogger('sklearn.svm')
logger.setLoggingLevel(...)

My concern is how this complicates how to start logging. Right now we start logging like this:

hist = HistGradientBoostingRegressor(verbose=1)

If we only have a logging api based option:

logger = logging.getLogger('sklearn.ensemble')
logger.setLoggingLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
logger.addHandler(handler)

hist = HistGradientBoostingRegressor()

What I propose is this:

logger = logging.getLogger(__name__)

class HistGradientBoostingRegressor:
	def fit(self, X, y):
		if self.verbose > 0:
			handler = logging.StreamHandler(sys.stdout)
			logger.addHandler(handler)
			# maybe even set logging level based on verbose?

so HistGradientBoostingRegressor(verbose=1) still works.

@rth
Copy link
Member

rth commented Jun 4, 2020

What I propose is this:

yes, I think keeping a verbose parameter working would still be useful.

I think it would be useful to think of logging in terms of callbacks (#16925). This is for instance done that way in keras as far as I can tell e.g. here. This would mean that the logging logic is completely separated from the estimator.

@adrinjalali
Copy link
Member

What I'm proposing is more like:

# this adds a stdout handler as wel
sklearn.setLogLevel(logging.DEBUG) # or logging.INFO
# or
sklearn.setLogLevel(sklearn.DEBUG) # not a fan of this one

HistGradientBoostingRegressor(verbose=1)

I would rather not force having a stdout handler in the estimator, since the user may want to have logging, but not in stdout, and they should be able to do that. I'd be okay if we add the stdout handler in the estimator if there are no handlers attached to the logger, but I would still warn or info on the fact that setting verbose>0 also sets the log level of the logger in the module.

@rth rth mentioned this issue Jun 4, 2020
5 tasks
@rth
Copy link
Member

rth commented Jun 4, 2020

We already have a config mechanism, so it could be,

# globally
from sklearn._callbacks import LoggingCallback
sklearn.set_config(logging=LoggingCallback())
# or
with sklearn.config_context(logging=LoggingCallback()):
   ...

# or locally
HistGradientBoostingRegressor(verbose=1)  # registers the callback interally

where the logging callback can accept a number of logging options, including,

  • logging level
  • logging formatter
  • output streams

I mean there is a bunch of things to configure for logging. So either we hardcode them and hide it from the user (although I'm not sure what should they be hardcoded to) or we need to allow some mechanism to be able to change them. How much of it we should expose I'm not sure.

The important part however is now this logging callback is registered on fit in each estimator, and depending on both the verbose option and its global parameters it does the logging. This way the logic is completely isolated from the estimator, and it avoids having too much things in the root namespace. People who want to customize this behavior can also write their own LoggingCallback following the documented API (which would allow us to only do the bare minimum).

@adrinjalali
Copy link
Member

My question is why would we need a LogingCallback if logging provides what we need?

@rth
Copy link
Member

rth commented Jun 5, 2020

Well you need 1 object to call in estimators for printing / logging. Logging exposes several such objects including formatter, stream handler, logger. So we would need some container around it, unless we want to have them in the root namespace, and expect users to monkeypatch/update that for customization.

It also happens that do something like progress bars or convergence plots, would need callbacks in the same parts of the code where we do logging. So it makes sense to at least try re-using some of it.

Also TBH, whatever we do with logging I'm sure there will be users who will say that it doesn't work with their use case (e.g. matplotlib/matplotlib#13264 discussion). For instance registering a logging handler in scikit-learn is apparently not OK #15122, but we have to do some of it for verbose=1 option. I find it nice that that users could just subclass the logging handler or write their own if they need to.

Also I think that just printing things to stdout in a loggingcallback and not using the logging module, might also be OK for most interactive usage e.g. https://keras.io/api/callbacks/progbar_logger/. Having one container that is not the logging module makes it easier to swap. For instance, I think unless one sets the output stream to stdout, in a jupyter notebooks logging will be actually shown in the server logs in some cases, and not in the notebook. I don't remember where I saw it, there are some models that currently do this for convergence in scikit-learn.

@rth
Copy link
Member

rth commented Jun 5, 2020

For instance, I think unless one sets the output stream to stdout, in a jupyter notebooks logging will be actually shown in the server logs

Since none of the models currently use logging, maybe this is rather a stdout vs stderr issue. I'll try to investigate.

@adrinjalali
Copy link
Member

@jeremiedbb with callback API, are we going to / can we deprecate verbose and use callbacks instead for logging? Not sure what to do with cython / C logs though.

@jeremiedbb
Copy link
Member

jeremiedbb commented Apr 17, 2024

Callbacks will be usable for logging but I don't think they'll replace verbose because verbose can be triggered at a lot more places than callback hooks are called. For instance #27526. verbose allows more fine grained verbosity. In addition it's sometimes propagated to enable joblib's verbosity where callbacks won't apply.

It doesn't change much for cython since we already need to acquire the gil for existing print statements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Hard Hard level of difficulty
Projects
None yet
Development

No branches or pull requests

4 participants