
[MRG] Use logging.info instead of print (#6929) #6930

Closed

Conversation

SahilKang

Reference Issue

Fixes #6929

What does this implement/fix? Explain your changes.

Replaces 'print' with 'logging.info' so the library will no longer print to stdout.
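
A minimal sketch of the substitution (the message here is hypothetical; the actual commit touches the dataset fetchers):

import logging

# before:
#     print('Downloading dataset...')
# after:
logging.info('Downloading dataset...')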

Any other comments?

@jnothman
Member

jnothman commented Jun 23, 2016

Are there instances that need to be fixed elsewhere in sklearn, or only in datasets?

print('Cache loading failed')
print(80 * '_')
print(e)
logging.info(80 * '_')

I think this would be better composed as a single multiline string.
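
One possible reading of that suggestion, sketched with lazy %-formatting (assuming it runs inside the existing except block where e is bound, and that logging is already imported in the module):

bar = 80 * '_'
logging.info('%s\nCache loading failed\n%s\n%s', bar, bar, e)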

@nelson-liu
Contributor

From a cursory glance at a GitHub search for print, it seems like there are a large number of print statements in the code, not only in datasets.

@SahilKang
Author

There are a lot of instances outside of datasets, so I'll start hunting those print statements as well.

@jnothman
Member

jnothman commented Jun 23, 2016

I think you're not wrong, @nelson-liu, but a cursory glance doesn't tell you a lot without eliminating verbose cases (please excuse the hacky use of GNU utils):

$ git grep -wnB10 print sklearn | tr "\n" "\t" | gsed 's/\x09--\x09/\x0a/g' | grep -v -e '>>>' -e '\.\.\.' -e verbos -e '/tests/' -e '/externals/' -e 'estimator_checks' | tr "\t" "\n" | grep '^[^ ]*:[0-9]*:'
sklearn/__init__.py:84:    print("I: Seeding RNGs with %r" % _random_seed)
sklearn/base.py:128:    """Pretty print the dictionary 'params'
sklearn/base.py:133:        The dictionary to pretty print
sklearn/cluster/spectral.py:142:                print("SVD did not converge, randomizing and trying again")
sklearn/datasets/california_housing.py:93:        print('downloading Cal. housing from %s to %s' % (DATA_URL, data_home))
sklearn/datasets/kddcup99.py:332:        print('extraction done')
sklearn/datasets/olivetti_faces.py:114:        print('downloading Olivetti faces from %s to %s'
sklearn/datasets/species_distributions.py:225:        print('Downloading species data from %s to %s' % (SAMPLES_URL,
sklearn/datasets/species_distributions.py:236:        print('Downloading coverage data from %s to %s' % (COVERAGES_URL,
sklearn/datasets/species_distributions.py:244:            print(' - converting', f)
sklearn/datasets/twenty_newsgroups.py:215:            print(80 * '_')
sklearn/datasets/twenty_newsgroups.py:216:            print('Cache loading failed')
sklearn/datasets/twenty_newsgroups.py:217:            print(80 * '_')
sklearn/datasets/twenty_newsgroups.py:218:            print(e)
sklearn/datasets/twenty_newsgroups.py:222:            print('Downloading 20news dataset. This may take a few minutes.')
sklearn/gaussian_process/gaussian_process.py:739:                    print("Optimization failed. Try increasing the ``nugget``")
sklearn/utils/_scipy_sparse_lsqr_backport.py:271:        print(' ')
sklearn/utils/_scipy_sparse_lsqr_backport.py:272:        print('LSQR            Least-squares solution of  Ax = b')
sklearn/utils/_scipy_sparse_lsqr_backport.py:277:        print(str1)
sklearn/utils/_scipy_sparse_lsqr_backport.py:278:        print(str2)
sklearn/utils/_scipy_sparse_lsqr_backport.py:279:        print(str3)
sklearn/utils/_scipy_sparse_lsqr_backport.py:280:        print(str4)
sklearn/utils/_scipy_sparse_lsqr_backport.py:332:        print(msg[0])
sklearn/utils/_scipy_sparse_lsqr_backport.py:339:        print(' ')
sklearn/utils/_scipy_sparse_lsqr_backport.py:340:        print(head1, head2)
sklearn/utils/_scipy_sparse_lsqr_backport.py:346:        print(str1, str2, str3)
sklearn/utils/_scipy_sparse_lsqr_backport.py:464:        # See if it is time to print something.
sklearn/utils/_scipy_sparse_lsqr_backport.py:488:                print(str1, str2, str3, str4)
sklearn/utils/_scipy_sparse_lsqr_backport.py:496:        print(' ')
sklearn/utils/_scipy_sparse_lsqr_backport.py:497:        print('LSQR finished')
sklearn/utils/_scipy_sparse_lsqr_backport.py:498:        print(msg[istop])
sklearn/utils/_scipy_sparse_lsqr_backport.py:499:        print(' ')
sklearn/utils/_scipy_sparse_lsqr_backport.py:504:        print(str1 + '   ' + str2)
sklearn/utils/_scipy_sparse_lsqr_backport.py:505:        print(str3 + '   ' + str4)
sklearn/utils/_scipy_sparse_lsqr_backport.py:506:        print(' ')
sklearn/utils/graph_shortest_path.pyx:456:#    print '%s(%i,%i) %i' % (level*'   ', node.index, node.val, node.rank)
sklearn/utils/graph_shortest_path.pyx:464:#    print "---------------------------------"
sklearn/utils/graph_shortest_path.pyx:468:#        print "[empty heap]"

@nelson-liu
Contributor

Very true @jnothman, good catch / sorry for any confusion.

@GaelVaroquaux
Member

I am really not positive about using the logging module. It has a default behavior that is not at all what users may want, and changing that requires expertise.
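
For illustration of that default behavior: with no handler configured, the stdlib's root logger sits at WARNING, so a bare swap to logging.info silences the messages entirely.

import logging

# Nothing is printed here: the root logger has no handlers and its
# default level is WARNING, so this INFO record is simply dropped.
logging.getLogger('sklearn.datasets').info('Downloading...')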

The way we usually do it is with a pattern like:

if verbose > 0:
    print(msg)
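
Sketched in context (fetch_example is a hypothetical function, just illustrating the convention):

def fetch_example(verbose=0):
    # only emit progress messages when the caller opts in
    if verbose > 0:
        print('Downloading example dataset...')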

@jnothman
Member

In these sorts of cases I'd be okay with print(..., file=sys.stderr). They are exceptional output, e.g. the first time a dataset is fetched. print to stdout is a bad idea; a verbose flag makes little sense.
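
That is, something along these lines (message taken from the grep output above):

import sys

print('Downloading 20news dataset. This may take a few minutes.',
      file=sys.stderr)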

@amueller
Member

How about we add a logger that by default prints to STDOUT? That should be fairly simple, right?

@jnothman
Member

Ordinarily you leave the logger config to the user. Do you suggest we configure it in sklearn.__init__?


@amueller
Member

Some minimal config, yeah.
It would not change anything for interactive usage, and library users would be able to configure it themselves. They have to do manual configuration anyway to make use of logging, right?

@code-of-kpp

code-of-kpp commented Oct 15, 2016

This will complicate life for people using proper logging.
It could be done for some special logger (not just the module name), or enabled only if some environment variable is set.

@amueller
Member

@podshumok can you elaborate? How would it interfere with any existing logging you're doing?
Maybe I don't understand well how logging is usually used.
I thought there would be one singleton per unit that you are logging.
So you are saying that if we call our singleton sklearn, it would interfere with your code because you called your singleton sklearn?
You could easily reconfigure our logger to do whatever you like, and it sounds like you are doing that anyhow. We could also check if that logger already exists and is configured (I'd hope that's possible), and if so, not touch it.

@amueller
Member

Basically if there is no handler, we could register a handler that prints to stdout, I guess?
Also, arguably you shouldn't have called your logger sklearn ;)
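
A sketch of that idea (an assumption of what the minimal config could look like, not an agreed design):

import logging
import sys

logger = logging.getLogger('sklearn')
# only looks at handlers attached directly to the 'sklearn' logger;
# if the user already configured it, leave it alone
if not logger.handlers:
    logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.setLevel(logging.INFO)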

@code-of-kpp

Python standard logging is pretty configurable. One can set different handlers for different levels for different loggers.

Usually one doesn't want to log to a logger (logging.getLogger(name)) with a name other than __name__ or __package__, but it is OK to use custom names. When implementing a visitor pattern (the closest thing in sklearn to this is probably pyfunc params), one may want to use a third-party module name for it. But I don't think many people go this way (as it is ugly), so we shouldn't worry.

But configuring other loggers is fine: one may want to suppress errors from one module/package, redirect all messages from another one with maximum verbosity to a file, and pipe all critical messages from everything else to a Sentry instance.
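
For example, all with stdlib calls (the package names are hypothetical):

import logging

# suppress everything below CRITICAL from one package
logging.getLogger('some.noisy.package').setLevel(logging.CRITICAL)

# redirect another package to a file at maximum verbosity
verbose_logger = logging.getLogger('another.package')
verbose_logger.setLevel(logging.DEBUG)
verbose_logger.addHandler(logging.FileHandler('another_package.log'))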

By looking at logging.root.handlers and logging.root.filters you can find out whether the root logger has been changed (these lists are empty by default). That should work in pure Python and IPython/Jupyter, but I can't say for other environments/IDEs. It still doesn't protect against the situation where someone has configured some other child logger (like sklearn or sklearn.examples): any child can be configured while its parent is not, and vice versa, and that can be done before anything else is imported at all (except the logging module).
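
Roughly:

import logging

# True only if nobody has touched the root logger's configuration yet
root_untouched = not logging.root.handlers and not logging.root.filters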

Anyway, configuring logging at import time is not a common or welcome solution. If such a thing is going to happen, users should be warned about it, and proper instructions for disabling it should be provided too.

So, if any logging is going to be integrated into sklearn (which is a good thing), it is just a question of what users should be warned about:
a) "Log messages are not displayed by default anymore; if you need them, do THAT."
b) "Log messages are now handled by Python logging, and we configure it automatically to print our log messages to stdout to preserve the old behaviour. To avoid logging auto-configuration, do THAT. To manually configure logging, do THAT. You will have to configure logging manually before version XXX, as logging auto-configuration is not standard and will be removed sooner or later (in version XXX)."

@amueller
Member

I would vote for b), but I still haven't seen how this would complicate life for people already using logging.
What would be the reason for us not to auto-configure the logger that belongs to scikit-learn?

@jakirkham
Contributor

Bumping this PR. Having things logged by default is very useful in distributed settings.

@GaelVaroquaux
Member

GaelVaroquaux commented Jun 7, 2018 via email

@jnothman
Member

jnothman commented Jun 9, 2018 via email

@GaelVaroquaux
Member

GaelVaroquaux commented Jun 9, 2018 via email

@amueller
Member

amueller commented Jun 9, 2018

We can register our own logger and make it print to stdout by default, right? What's the issue with that? The main problem is creating a unique enough string for our logger name. But arguably no one should have used sklearn as a logger name, because it's the module name.

@jnothman
Member

jnothman commented Jun 9, 2018 via email

@rth
Member

rth commented Jun 10, 2018

Do you know if there is a reason e.g. numpy, scipy or pandas never implemented logging? I would have thought it could have been useful, e.g. for scipy solvers. I couldn't find relevant discussions about it on their issue trackers, but maybe I missed something.

So is matplotlib then the only large scientific Python library that uses logging at present? Does anyone know what their feedback about it has been since matplotlib/matplotlib#9313 was merged? dask.distributed also uses it, but distributed computation with a scheduler/workers is also a context where it's probably more natural.

@amueller
Member

amueller commented Aug 5, 2019

Also see #78.

Base automatically changed from master to main January 22, 2021 10:49
@adrinjalali
Member

#78 still doesn't have a consensus, and we'd need a new start for that. Closing this PR.


Successfully merging this pull request may close these issues.

Library should not be printing to stdout without verbose option