
[MRG] Use logging.info instead of print (#6929) #6930

Closed

Conversation

SahilKang

Reference Issue

Fixes #6929

What does this implement/fix? Explain your changes.

Replaces 'print' with 'logging.info' so the library will no longer print to stdout.
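
A minimal sketch of the substitution (the message here is hypothetical; the actual commit touches the dataset fetchers):

import logging

# before:
#     print('Downloading dataset...')
# after:
logging.info('Downloading dataset...')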

Any other comments?

@jnothman
Member

jnothman commented Jun 23, 2016

Are there instances that need to be fixed elsewhere in sklearn, or only in datasets?

print('Cache loading failed')
print(80 * '_')
print(e)
logging.info(80 * '_')

I think this would be better composed as a single multiline string.
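
One possible reading of that suggestion, sketched with lazy %-formatting (assuming it runs inside the existing except block where e is bound, and that logging is already imported in the module):

bar = 80 * '_'
logging.info('%s\nCache loading failed\n%s\n%s', bar, bar, e)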

@nelson-liu
Contributor

From a cursory glance at a GitHub search for print, it seems like there are a large number of print statements in the code, not only in datasets.

@SahilKang
Author

There are a lot of instances outside of datasets, so I'll start hunting those print statements as well.

@jnothman
Member

jnothman commented Jun 23, 2016

I think you're not wrong, @nelson-liu, but a cursory glance doesn't tell you a lot without eliminating verbose cases (please excuse the hacky use of GNU utils):

$ git grep -wnB10 print sklearn | tr "\n" "\t" | gsed 's/\x09--\x09/\x0a/g' | grep -v -e '>>>' -e '\.\.\.' -e verbos -e '/tests/' -e '/externals/' -e 'estimator_checks' | tr "\t" "\n" | grep '^[^ ]*:[0-9]*:'
sklearn/__init__.py:84:    print("I: Seeding RNGs with %r" % _random_seed)
sklearn/base.py:128:    """Pretty print the dictionary 'params'
sklearn/base.py:133:        The dictionary to pretty print
sklearn/cluster/spectral.py:142:                print("SVD did not converge, randomizing and trying again")
sklearn/datasets/california_housing.py:93:        print('downloading Cal. housing from %s to %s' % (DATA_URL, data_home))
sklearn/datasets/kddcup99.py:332:        print('extraction done')
sklearn/datasets/olivetti_faces.py:114:        print('downloading Olivetti faces from %s to %s'
sklearn/datasets/species_distributions.py:225:        print('Downloading species data from %s to %s' % (SAMPLES_URL,
sklearn/datasets/species_distributions.py:236:        print('Downloading coverage data from %s to %s' % (COVERAGES_URL,
sklearn/datasets/species_distributions.py:244:            print(' - converting', f)
sklearn/datasets/twenty_newsgroups.py:215:            print(80 * '_')
sklearn/datasets/twenty_newsgroups.py:216:            print('Cache loading failed')
sklearn/datasets/twenty_newsgroups.py:217:            print(80 * '_')
sklearn/datasets/twenty_newsgroups.py:218:            print(e)
sklearn/datasets/twenty_newsgroups.py:222:            print('Downloading 20news dataset. This may take a few minutes.')
sklearn/gaussian_process/gaussian_process.py:739:                    print("Optimization failed. Try increasing the ``nugget``")
sklearn/utils/_scipy_sparse_lsqr_backport.py:271:        print(' ')
sklearn/utils/_scipy_sparse_lsqr_backport.py:272:        print('LSQR            Least-squares solution of  Ax = b')
sklearn/utils/_scipy_sparse_lsqr_backport.py:277:        print(str1)
sklearn/utils/_scipy_sparse_lsqr_backport.py:278:        print(str2)
sklearn/utils/_scipy_sparse_lsqr_backport.py:279:        print(str3)
sklearn/utils/_scipy_sparse_lsqr_backport.py:280:        print(str4)
sklearn/utils/_scipy_sparse_lsqr_backport.py:332:        print(msg[0])
sklearn/utils/_scipy_sparse_lsqr_backport.py:339:        print(' ')
sklearn/utils/_scipy_sparse_lsqr_backport.py:340:        print(head1, head2)
sklearn/utils/_scipy_sparse_lsqr_backport.py:346:        print(str1, str2, str3)
sklearn/utils/_scipy_sparse_lsqr_backport.py:464:        # See if it is time to print something.
sklearn/utils/_scipy_sparse_lsqr_backport.py:488:                print(str1, str2, str3, str4)
sklearn/utils/_scipy_sparse_lsqr_backport.py:496:        print(' ')
sklearn/utils/_scipy_sparse_lsqr_backport.py:497:        print('LSQR finished')
sklearn/utils/_scipy_sparse_lsqr_backport.py:498:        print(msg[istop])
sklearn/utils/_scipy_sparse_lsqr_backport.py:499:        print(' ')
sklearn/utils/_scipy_sparse_lsqr_backport.py:504:        print(str1 + '   ' + str2)
sklearn/utils/_scipy_sparse_lsqr_backport.py:505:        print(str3 + '   ' + str4)
sklearn/utils/_scipy_sparse_lsqr_backport.py:506:        print(' ')
sklearn/utils/graph_shortest_path.pyx:456:#    print '%s(%i,%i) %i' % (level*'   ', node.index, node.val, node.rank)
sklearn/utils/graph_shortest_path.pyx:464:#    print "---------------------------------"
sklearn/utils/graph_shortest_path.pyx:468:#        print "[empty heap]"

@nelson-liu
Contributor

Very true @jnothman, good catch / sorry for any confusion.

@GaelVaroquaux
Member

I am really not positive about using the logging module. It has a default behavior that is not at all what users may want, and changing that requires expertise.
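
For illustration of that default behavior: with no handler configured, the stdlib's root logger sits at WARNING, so a bare swap to logging.info silences the messages entirely.

import logging

# Nothing is printed here: the root logger has no handlers and its
# default level is WARNING, so this INFO record is simply dropped.
logging.getLogger('sklearn.datasets').info('Downloading...')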

The way we usually do it is with a pattern like:

if verbose > 0:
    print(msg)
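
Sketched in context (fetch_example is a hypothetical function, just illustrating the convention):

def fetch_example(verbose=0):
    # only emit progress messages when the caller opts in
    if verbose > 0:
        print('Downloading example dataset...')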

@jnothman
Member

In these sorts of cases I'd be okay with print(..., file=sys.stderr). They are exceptional output, e.g. the first time a dataset is fetched. print to stdout is a bad idea; a verbose flag makes little sense.
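
That is, something along these lines (message taken from the grep output above):

import sys

print('Downloading 20news dataset. This may take a few minutes.',
      file=sys.stderr)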

@amueller
Member

How about we add a logger that by default prints to STDOUT? That should be fairly simple, right?

@jnothman
Member

Ordinarily you leave the logger config to the user. Do you suggest we configure it in sklearn.__init__?


@amueller
Member

Some minimal config, yeah.
It would not change anything for interactive usage, and library users would be able to configure it themselves. They have to do manual configuration anyway to make use of logging, right?

@code-of-kpp

code-of-kpp commented Oct 15, 2016

This will complicate life for people using proper logging.
It could be done for some special logger (not just the module name), or enabled only if some environment variable is set.

@amueller
Member

@podshumok can you elaborate? How would it interfere with any existing logging you're doing?
Maybe I don't understand well how logging is usually used.
I thought there would be one singleton per unit that you are logging.
So you are saying that if we call our singleton sklearn, it would interfere with your code because you called your singleton sklearn?
You could easily reconfigure our logger to do whatever you like, and it sounds like you are doing that anyhow. We could also check if that logger already exists and is configured (I'd hope that's possible), and if so, not touch it.

@amueller
Member

Basically if there is no handler, we could register a handler that prints to stdout, I guess?
Also, arguably you shouldn't have called your logger sklearn ;)
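
A sketch of that idea (an assumption of what the minimal config could look like, not an agreed design):

import logging
import sys

logger = logging.getLogger('sklearn')
# only looks at handlers attached directly to the 'sklearn' logger;
# if the user already configured it, leave it alone
if not logger.handlers:
    logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.setLevel(logging.INFO)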

@code-of-kpp

Python standard logging is pretty configurable. One can set different handlers for different levels for different loggers.

Usually one doesn't want to log to a logger (logging.getLogger(name)) with a name other than __name__ or __package__, but it is OK to use custom names. When implementing a visitor pattern (the closest thing in sklearn to this is probably pyfunc params), one may want to use a third-party module name for it. But I don't think many people go this way (as it is ugly), so we shouldn't worry.

But configuring other loggers is fine: one may want to suppress errors from one module/package, redirect all messages from another one with maximum verbosity to a file, and pipe all critical messages from everything else to a Sentry instance.
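
For example, all with stdlib calls (the package names are hypothetical):

import logging

# suppress everything below CRITICAL from one package
logging.getLogger('some.noisy.package').setLevel(logging.CRITICAL)

# redirect another package to a file at maximum verbosity
verbose_logger = logging.getLogger('another.package')
verbose_logger.setLevel(logging.DEBUG)
verbose_logger.addHandler(logging.FileHandler('another_package.log'))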

By looking at logging.root.handlers and logging.root.filters you can find out whether the root logger has been changed (these lists are empty by default). That should work in pure Python and IPython/Jupyter, but I can't say for other environments/IDEs. It still doesn't protect against the situation where someone has configured some other child logger (like sklearn or sklearn.examples): any child can be configured while its parent is not, and vice versa, and that can be done before anything else is imported at all (except the logging module).
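
Roughly:

import logging

# True only if nobody has touched the root logger's configuration yet
root_untouched = not logging.root.handlers and not logging.root.filters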

Anyway, configuring logging at import time is not a common or welcome solution. If such a thing is going to happen, users should be warned about it, and proper instructions for disabling it should be provided too.

So, if any logging is going to be integrated into sklearn (which is a good thing), it is just a question of what users should be warned about:
a) "Log messages are not displayed by default anymore; if you need them, do THAT."
b) "Log messages are now handled by Python logging, and we configure it automatically to print our log messages to stdout to preserve the old behaviour. To avoid logging auto-configuration, do THAT. To manually configure logging, do THAT. You will have to configure logging manually before version XXX, as logging auto-configuration is not standard and will be removed sooner or later (in version XXX)."

@amueller
Member

I would vote for b), but I still haven't seen how this would complicate life for people already using logging.
What would be the reason for us not to auto-configure the logger that belongs to scikit-learn?

@jakirkham
Contributor

Bumping this PR. Having things logged by default is very useful in distributed settings.

@GaelVaroquaux
Member

GaelVaroquaux commented Jun 7, 2018 via email

@jnothman
Member

jnothman commented Jun 9, 2018 via email

@GaelVaroquaux
Member

GaelVaroquaux commented Jun 9, 2018 via email

@amueller
Member

amueller commented Jun 9, 2018

We can register our own logger and make it print to stdout by default, right? What's the issue with that? The main problem is creating a unique enough string for our logger name. But arguably no one should have used sklearn as a logger name, because it's the module name.

@jnothman
Member

jnothman commented Jun 9, 2018 via email

@rth
Member

rth commented Jun 10, 2018

Do you know if there is a reason e.g. numpy, scipy or pandas never implemented logging? I would have thought it could have been useful, e.g. for scipy solvers. I couldn't find relevant discussions about it on their issue trackers, but maybe I missed something.

So is matplotlib then the only large scientific Python library that uses logging at present? Does anyone know what their feedback about it has been since matplotlib/matplotlib#9313 was merged? dask.distributed also uses it, but distributed computation with a scheduler/workers is also a context where it's probably more natural.

@amueller
Member

amueller commented Aug 5, 2019

Also see #78.

Base automatically changed from master to main January 22, 2021 10:49
@adrinjalali
Member

#78 still doesn't have a consensus, and we'd need a new start for that. Closing this PR.


Successfully merging this pull request may close these issues.

Library should not be printing to stdout without verbose option