Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package should not configure logging without being asked to do so #885

Closed
mitar opened this issue Nov 14, 2019 · 5 comments · Fixed by #889
Closed

Package should not configure logging without being asked to do so #885

mitar opened this issue Nov 14, 2019 · 5 comments · Fixed by #889
Assignees

Comments

@mitar
Copy link
Member

mitar commented Nov 14, 2019

I think this is a followup to #736. I think the fact that OpenML is configuring logger at all is problematic, especially because it is doing at the import openml time without any way for anyone to prevent that. Currently, there is such code segment;

logger = logging.getLogger(__name__)


def configure_logging(console_output_level: int, file_output_level: int):
    """ Sets the OpenML logger to DEBUG, with attached Stream- and FileHandler. """
    # Verbosity levels as defined (https://github.com/openml/OpenML/wiki/Client-API-Standards)
    # don't match Python values directly:
    verbosity_map = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}

    openml_logger = logging.getLogger('openml')
    openml_logger.setLevel(logging.DEBUG)
    message_format = '[%(levelname)s] [%(asctime)s:%(name)s] %(message)s'
    output_formatter = logging.Formatter(message_format, datefmt='%H:%M:%S')

    console_stream = logging.StreamHandler()
    console_stream.setFormatter(output_formatter)
    console_stream.setLevel(verbosity_map[console_output_level])

    one_mb = 2**20
    log_path = os.path.join(cache_directory, 'openml_python.log')
    file_stream = logging.handlers.RotatingFileHandler(log_path, maxBytes=one_mb, backupCount=1)
    file_stream.setLevel(verbosity_map[file_output_level])
    file_stream.setFormatter(output_formatter)

    openml_logger.addHandler(console_stream)
    openml_logger.addHandler(file_stream)
    return console_stream, file_stream

And that method is always called inside __init__.

Libraries should not do that. Libraries should log things to logger, but consumers of the libraries should then configure how logging happens. For example, inside __main__ section or something like that.

The problem with the code above is that it writes to the file system just by importing a package. That can make problems on various computing platforms, especially grid computing, where one cannot just write around the shared systems.

openml_logger.setLevel(logging.DEBUG) is unnecessary. Every logger in fact starts with NOTSET logging level which is even lower (numerical 0) than DEBUG (numerical 10). But setting it might prevent somebody else to configure it before importing openml. So I suggest this is removed.

Adding handlers to the logger is generally OK, but might be unnecessary. For example, the user of the openml package might not be a person sitting in front of the interpreter, but the system. Which would configure that all logging is send to a central logging service over the network. Having to know that you also have to disable/remove some handlers added to the logger seems tricky.

I suggest this is made as a function one can call to configure, but not something which is done by default at import time. Or maybe one can enable it in the config file.

Alternatively, if you really want this behavior, I suggest this is done at the first call to OpenML API functions, and not at the import.

@mfeurer
Copy link
Collaborator

mfeurer commented Nov 18, 2019

We should indeed follow the advice given in the Python logging manual.

@PGijsbers
Copy link
Collaborator

Thanks for raising this issue. It's a valid point that we should consider not attaching any handlers (by default). I do want to enable easy logging to file/console for users. I do think it's reasonable not to do this by default. I propose we only attach handlers if the log levels are explicitly set in the configuration or if the users initializes this through a helper function. I would propose something like this in the config module:

def configure_logging(console_level: Optional[int], file_level: Optional[int], save_as_default: bool)
    """ Add logging handlers to process openml python logging, save to configuration if requested. 
    
    console_level: If not None, set a StreamHandler to print all messages of `console_level` or higher to console.
    file_level: If not None, set a RotatingFileHandler to write all messages of `console_level` or higher to file.
    save_as_default: If True, update the openml configuration file with these preferences so the handlers are configured automatically on import.
"""

As an aside:

openml_logger.setLevel(logging.DEBUG) is unnecessary

is not true with our setup. If we left the openml logger at logging.NOTSET, it would use the root loggers logging.WARNING and logging.DEBUG messages would not be handled by the file handler. E.g.:

>>> import logging
>>> logger = logging.getLogger('openml')
>>> console = logging.StreamHandler()
>>> console.setLevel(logging.DEBUG)
>>> logger.addHandler(console)
>>> logger.debug('hi')
>>> # no output.

@PGijsbers PGijsbers self-assigned this Nov 18, 2019
@mitar
Copy link
Member Author

mitar commented Nov 18, 2019

I do want to enable easy logging to file/console for users.

So by default in Python (if no logging configuration is set), anything of WARNING and above is printed out to the console. Would that satisfy your needs?

I propose we only attach handlers if the log levels are explicitly set in the configuration or if the users initializes this through a helper function. I would propose something like this in the config module:

I think this is great.

Also compare this utility function with logging.basicConfig, it might be that is good enough?

One option is also to provide a default logging configuration dict, one can pass to here. It might be then easier to modify that dict before passing it yourself, if you want. (The helper function can just apply the default one.)

@mitar
Copy link
Member Author

mitar commented Nov 18, 2019

One interesting related discussion.

@PGijsbers
Copy link
Collaborator

Would that satisfy your needs?

Generally to inform users, yes. To debug what went wrong because someone did something weird, no. This is why our default console handler was set to logging.WARNING but we're keeping logging.DEBUG info in a file.

Also compare this utility function with logging.basicConfig, it might be that is good enough?

I have three problems with this, 1. it attaches to the root logger instead of the openml logger, 2. it gives you a streamhandler or file handler, and 3. the user might have to end up reading the python logging documentation to figure out how to configure the logger in a way they want. Advanced Python users that are familiar with how Python logging works can set up their handlers to their liking anyway, the utility function would be there in particular to help less experienced users. We want anyone to be able to produce a log for us with the relevant debug information.

Thanks for the pointers on the config dict and the other related discussion. Unfortunately I don't have time this week, but I'll get back to this issue next week.

PGijsbers added a commit that referenced this issue Jan 13, 2020
* Changes proposed in #885. Don't register handlers by default.

* Delay file creation until log emit. Correctly read from config.

* Remove loading/storing log level references.

* _create_log_handlers now returns early if called a second time

* Fix type errors.

* Update changelog.

* Test remove register file log handler to see if CI works.

* Undo last change. test server ssl works agian.

* Bump scikit-learn version to 0.22

* Scikit-learn 0.22 does not install properly.

* Install scikit-learn through pip instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants