Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stdlib logging references sys.error on import #73

Open
jayvdb opened this issue Sep 5, 2019 · 6 comments
Open

Stdlib logging references sys.error on import #73

jayvdb opened this issue Sep 5, 2019 · 6 comments
Labels
investigate Need to figure this out todo Definitely needs doing for a near-term release
Milestone

Comments

@jayvdb
Copy link
Contributor

jayvdb commented Sep 5, 2019

As special case of #71 , and likely to be one of several cases, stdlib logging creates a reference to sys.error on import. As a result the following fairly common scenario of using a main() to test a cli without subprocesses fails if logging is first imported within the context:

from stdio_mgr import StdioManager

def invoke_cli_main():
    import logging
    logging.basicConfig()
    ...

def test_cli():
    with StdioManager():
        invoke_cli_main()
    import logging
    logging.warning('BOOM')
>>> test_cli()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.7/logging/__init__.py", line 1037, in emit
    stream.write(msg + self.terminator)
  File "/home/jayvdb/projects/python/stdio-mgr/src/stdio_mgr/stdio_mgr.py", line 90, in write
    super().write(*args, **kwargs)
ValueError: I/O operation on closed file.
Call stack:
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in test_cli
Message: 'BOOM'
Arguments: ()

After the context handler exits, the wrapper streams are closed, and the logging is still using them.

Easy solution: stdio_mgr imports logging, like we would need to for #43

Even the following doesnt dislodge the problem:

def invoke_cli_main():
    import logging
    logging.basicConfig()
    ...
    from logging.config import _clearExistingHandlers
    _clearExistingHandlers()

To clear it properly we need to do:

from stdio_mgr import StdioManager

def invoke_cli_main():
    import logging
    logging.basicConfig()
    ...
    from logging.config import _clearExistingHandlers
    _clearExistingHandlers()
    logging.Logger.root = logging.root = logging.RootLogger(logging.WARNING)
    logging.Logger.manager = logging.Manager(logging.Logger.root)

def test_cli():
    with StdioManager():
        invoke_cli_main()
    import logging
    logging.info('OK')

test_cli()

or the following also seems to work

def invoke_cli_main():
    import logging
    logging.basicConfig()
    ...
    from logging.config import _clearExistingHandlers
    _clearExistingHandlers()
    logging.Logger.manager._clear_cache()

We can hide that logging cleanup inside of StdioManager.
It also works when import colorama is added to that mixture, but I do not feel comfortable that it is solved.

Thankfully there doesnt seem to be too many references to sys.std* in the stdlib, so we might be able to check and workaround them all, whenever feasible and appropriate.

@bskinn bskinn added investigate Need to figure this out todo Definitely needs doing for a near-term release labels Sep 5, 2019
@bskinn bskinn added this to the Sooner milestone Sep 5, 2019
@bskinn
Copy link
Owner

bskinn commented Sep 5, 2019

Scattershot of questions:

  • What about applications that use other than the .basicConfig() logging setup? Does it still misbehave? Will this case need to be handled separately?

  • I anticipate using non-.basicConfig() logging for Debug level logging #43 -- does this change how this needs to be handled?

  • Hopefully a grep -rE 'sys[.]std(in|out|err)' *.py in the CPython repo would suffice to identify all such stdlib references to the streams?

    $ grep -rE 'sys[.]std(in|out|err)' lib/* | grep -v '/test/' | wc -l
    400
    

    To what extent do we have to worry about interference from the C portions of CPython ?

@jayvdb
Copy link
Contributor Author

jayvdb commented Sep 5, 2019

shutdown snippet for py34-37

            try:
                logging.config._clearExistingHandlers()
            except AttributeError:  # pragma Python 3.6,3.7: no cover
                logging._handlers.clear()
                logging.shutdown(logging._handlerList[:])
                del logging._handlerList[:]

            try:
                logging.Logger.manager._clear_cache()
            except AttributeError:  # pragma Python 3.7: no cover
                logging.Logger.root = logging.root = logging.RootLogger(
                    logging.WARNING)
                logging.Logger.manager = logging.Manager(logging.Logger.root)

@jayvdb
Copy link
Contributor Author

jayvdb commented Sep 6, 2019

Any import logging causes the problem, as the references to our fake streams is done during module loading.

If we import logging first those references will be to the 'real' streams, or at least the streams as they existed when stdio_mgr was imported. That is the easy fix, but only for the one specific problem.

The "400" isnt so scary as many of them are only using the streams. The problem is stored references to the streams, especially references created a module import (class creation) phase. This includes any time sys.std* are used as an argument default as those defaults are in the class object.

I havent scanned the C portion, so I dont know the size of the problem there.

Rather than finding them all, which would still only encompass the stdlib, I think we need to first focus on detecting the problem (and issuing warnings, or raising exceptions, etc).

One way that might work is for the streams pushed into sys to be reference counted, e.g. using weakref's or sys.getrefcount , and the __exit__ complain bitterly if any of those streams still have living objects referring to them.

@jayvdb
Copy link
Contributor Author

jayvdb commented Sep 11, 2019

Technically this is post-existing stream states in the case of pytest + logging because pytest doesnt import logging. But a different test runner would import logging, and a whole new slew of problems would be caused ...

@bskinn
Copy link
Owner

bskinn commented Sep 11, 2019

One way that might work is for the streams pushed into sys to be reference counted, e.g. using weakref 's or sys.getrefcount , and the __exit__ complain bitterly if any of those streams still have living objects referring to them.

If my mental model of Python name assignment is correct, I suspect there's not any way to invisibly "intercept" assignments to our mocked objects, such that we could track such assignments and then silently "re-target" the assignments once we're ready to __exit__, or whatever?

The GC has to know both ends of assignments in order to detect cycles...could we hijack that machinery somehow? [...looks...] Looks like GC.get_referrers might help, though it specifically says not to use it for production code. :-P

@bskinn bskinn mentioned this issue Sep 16, 2019
16 tasks
@bskinn
Copy link
Owner

bskinn commented Sep 28, 2019

If my mental model of Python name assignment is correct, I suspect there's not any way to invisibly "intercept" assignments to our mocked objects, such that we could track such assignments and then silently "re-target" the assignments once we're ready to __exit__, or whatever?

But -- what if we added a toggle flag to our object, default to False, that when switched to True turns it into a silent, complete pass-through to the .prior_stdfoo? I don't think we can reach out to other code and rebind their references, but if we can just switch our objects, which those references remain bound to, to a no-op pass-through to the old objects... might work ok?

Could foul some is tests in that external code... but having is be False is probably good... we could try to set it up so that == is True but is is False? Gut-feel, that seems like a sensible way for it to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
investigate Need to figure this out todo Definitely needs doing for a near-term release
Projects
Development

No branches or pull requests

2 participants