
Consecutive headless pyglet application runs as part of a pytest testing flow #953

Open
matanox opened this issue Sep 12, 2023 · 19 comments
Labels
bug Something isn't working

Comments

@matanox
Contributor

matanox commented Sep 12, 2023

The following code simulates a pytest suite run in which a first test runs a pyglet app and finalizes it, and a later test then runs another pyglet app, yielding an error. The minimal reproduction below triggers the same issue ― within the boundaries of a single plain script ― by starting and finalizing a pyglet app and then naively taking the first step into starting the next test of the same nature:

import pyglet; pyglet.options["headless"] = True

class FirstAppTest:
    def __init__(self):

        self.update_count = 0

        self.window = pyglet.window.Window(width=600, height=600, resizable=True, visible=True)

        pyglet.clock.schedule_interval(self.update, 1 / 30)
        pyglet.app.event_loop.run()

    def update(self, dt):
        self.update_count += 1
        if self.update_count == 10:
            self.window.close()

if __name__ == '__main__':

    FirstAppTest()

    # next pytest will fail at this line, if the previous test code had preceded it within the same pytest suite run:
    window = pyglet.window.Window(width=600, height=600, resizable=True, visible=False)

The exception text provided by pytest is:

============================== 1 failed in 0.52s ===============================
FAILED                  [100%]
ReproduciblePytestPygletIssue.py:17 (test)
def test():
    
        FirstAppTest()
>       window = pyglet.window.Window(width=600, height=600, resizable=True, visible=False)

ReproduciblePytestPygletIssue.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv3.10/lib/python3.10/site-packages/pyglet/window/__init__.py:578: in __init__
    self._create()
.venv3.10/lib/python3.10/site-packages/pyglet/window/headless/__init__.py:104: in _create
    self.dispatch_event('on_resize', self._width, self._height)
.venv3.10/lib/python3.10/site-packages/pyglet/window/__init__.py:668: in dispatch_event
    super().dispatch_event(*args)
.venv3.10/lib/python3.10/site-packages/pyglet/event.py:392: in dispatch_event
    raise e
.venv3.10/lib/python3.10/site-packages/pyglet/event.py:387: in dispatch_event
    if getattr(self, event_type)(*args):
.venv3.10/lib/python3.10/site-packages/pyglet/window/__init__.py:818: in on_resize
    self.projection = Mat4.orthogonal_projection(0, width, 0, height, -255, 255)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = HeadlessWindow=(width=600, height=600)
matrix = Mat4(0.0033333333333333335, 0.0, 0.0, 0.0)
    (0.0, 0.0033333333333333335, 0.0, 0.0)
    (0.0, 0.0, -0.00392156862745098, 0.0)
    (-1.0, -1.0, 0.0, 1.0)

    @projection.setter
    def projection(self, matrix: Mat4):
    
>       with self.ubo as window_block:
E       AttributeError: 'HeadlessWindow' object has no attribute 'ubo'

.venv3.10/lib/python3.10/site-packages/pyglet/window/__init__.py:1305: AttributeError

Process finished with exit code 1

I might imagine that there is a different way to finalize and restart a pyglet application, or that the API was never designed to run more than one application within the same flow. However, when using pytest ― to the best of my understanding ― the library is loaded only once and then reused across all test cases in a suite run, which bumps into this issue when instructing pyglet to run headless.

Running multiple tests that each start and stop the application seems well motivated when you really want to test your application with multiple test cases. Is there any way to work around this?

System Information

Ubuntu 20.04
python 3.10.13
pyglet 2.0.5

@matanox matanox added the bug Something isn't working label Sep 12, 2023
@pushfoo
Contributor

pushfoo commented Sep 12, 2023

to the best of my understanding the library is only loaded once and then reused across all test cases being run as part of a test suite run.

This is my understanding as well. The arcade library goes further by re-using the same window throughout unit tests in addition to just the same pyglet app context.

I haven't looked at many other projects for comparison, but they seem to either assume there will be one app run, or provide some capacity for nesting with the assumption there will be one root parent. I think FastAPI may fall in the latter category, but I may be wrong on the details.

Regardless of what other applications do, the idea of having separable app contexts might imply each having their own clock.

@matanox
Contributor Author

matanox commented Sep 12, 2023

I don't think that Python itself (or at least CPython) provides any stable mechanism for re-importing a library to get past this.
The nearest pytest plugin (pytest-xdist) has dropped support for running each test on its own.

@matanox
Contributor Author

matanox commented Sep 12, 2023

Oh, thanks for the quick reply, I'll try looking at how arcade makes that happen.

@pushfoo
Contributor

pushfoo commented Sep 12, 2023

I forgot to mention this, but it's important: arcade also assumes there will only be one application window.

@matanox
Contributor Author

matanox commented Sep 12, 2023

I wonder if enabling this would imply great changes to pyglet; it might make more sense for pytest (or an alternative testing framework) to be smart enough to allow running marked tests on separate Python interpreters within the same test suite run it manages.

@pushfoo
Contributor

pushfoo commented Sep 12, 2023

The nearest pytest plugin (pytest-xdist) has pytest-dev/pytest-xdist#468 for running each test on its own.

That thread mentions pytest-forked as the replacement. Have you tried it? My understanding is that web application testing is a primary use case for this sort of parallelization, which makes sense given the same thread's history mentions selenium.

Running multiple tests that each start and stop the application seems well motivated when you really want to test your application with multiple test cases.

Can you be more specific about the use cases for this with pyglet? Is your goal testing the application abstraction itself, or something else?

If it's the latter, then it might be fine as long as you don't try to share a single GL context between the application contexts. Someone with more GL experience will have to weigh in once we have specifics.

@matanox
Contributor Author

matanox commented Sep 12, 2023

Thanks so much @pushfoo.

I just have multiple headless tests for my single application, each running the application with different arguments and/or differently simulating user events while the application is headless. I wouldn't call these unit-tests, as they simulate user interaction timelines against the application.

After switching my application code to take the window objects as an input argument (a.k.a. dependency injection) and providing the very same list of pre-instantiated window objects to all tests via a pytest fixture at test-suite run time (still using pytest), everything seems to work as expected.

Here's the pytest fixture code ―

# conftest.py
import pytest

print('pytest conftest making a fixture shared at the pytest session level, providing the same headless window objects to all tests during a test suite run')

@pytest.fixture(scope='session')
def windows():
    return [MyPygletWindow(resizable=True, visible=False, headless=True),
            MyPygletWindow(resizable=True, visible=False, headless=True)]

(I omit my class subclassing pyglet.window.Window from the code listing; basically, it introduces a config and some branching so that, when instructed to run headless, it doesn't try to use antialiasing through sample buffers.)

This is a bit like #911, though only circumstantially: there it emerged, in a platform-specific manner, that closing and then reopening windows brings in some challenges which are best avoided.

This change has entailed switching the application to never close its windows, but rather to exit by using pyglet.app.event_loop.exit(), since the application must be exited both at runtime and during tests. I hope this is as valid a shutdown method as closing all windows.

I have tested this solution diligently but will keep monitoring for a bit, as I add headless tests to my suite.
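For concreteness, here is a hedged sketch of what that exit-based shutdown can look like inside one such test. The delay value, test name, and the lambda are illustrative assumptions; pyglet.clock.schedule_once() and pyglet.app.event_loop.exit() are the real pyglet calls discussed above, but this is a sketch, not a prescribed pattern:

```python
import pyglet

def test_simulated_session(windows):
    # `windows` is the session-scoped fixture of pre-instantiated,
    # never-closed headless windows described above; the app under test
    # receives them via dependency injection rather than creating its own.
    run_application_under_test(windows)  # hypothetical app entry point

    # Stop the loop without closing any window, so the next test can reuse them:
    pyglet.clock.schedule_once(lambda dt: pyglet.app.event_loop.exit(), 0.5)
    pyglet.app.event_loop.run()
```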

@pushfoo
Contributor

pushfoo commented Sep 12, 2023

I just have multiple headless tests for my single application,

Are you testing only non-drawing functionality, or are you doing pixel comparison as well? There was a non-drawing branch I was working on to replace shaders and the GL context with mocks. If you are interested in this as well, I would appreciate your help and/or feedback.

I wouldn't call these unit-tests, as they simulate user interaction timelines against the application.

These are probably better referred to as integration or end-to-end tests, but the wording isn't as relevant as what they tell us. Could you share more about them?

closing and reopening windows brings some challenges which are best avoided.

I agree for the time being, even though I'd like to better understand or solve these one day. If it's indeed related to X windowing as #911 suggests, it may be why Factorio's devs share sub-regions of a window across multiple tests threads.

Splitting the window may only make sense if you've already implemented or plan to split rendering, client input, and server logic from each other for networked gameplay. It also makes window management much easier.

I have tested this solution diligently but will keep monitoring for a bit, as I add headless tests to my suite.

Thank you for the update. I would appreciate it if you could share either code or more about what you learn. Maybe it will help with being able to parallelize arcade's tests one day, even if current design choices make it difficult.

My understanding is that your test setup may be like one using Selenium Grid, except you can only add new runners instead of destroying existing ones.

@matanox
Contributor Author

matanox commented Sep 13, 2023

Hi @pushfoo,

I am not doing pixel-level verification in any of my tests; at least, I haven't figured that going to that level would be a worthwhile strategy for my application so far. My application is for taking video and reviewing it with various props, features, and modes, but if I can be of any utility for anything, please let me know what it might be!

As to my "solution", it really boils down to the pytest fixture I copied earlier above. No parallelization is involved; I do not expect to reach a mass of headless tests that would merit going concurrent, so I have no contribution in that realm.

If anything fancy or interesting comes up in my journey I'll post an update. I can share my private repo with your user if you'd like to take a look.

You are absolutely right, I should have called them integration or end-to-end tests, although the title of this issue was meant to hint at that in a sense.

I hope this isn't too disappointing to read and thank you again for your support (!)

@matanox
Contributor Author

matanox commented Sep 13, 2023

That said, why would you replace shaders and the GL context with mocks, if that could be accomplished by using headless mode, which as far as I understand leverages EGL to accomplish the same?

@pushfoo
Contributor

pushfoo commented Sep 14, 2023

Mutually Beneficial Items Which Need Help

if I can be of any utility for anything please let me know what it might be!

Solving one or both of these benefits everyone:

  1. Getting the Mac-specific select all hotkey to work in Fix issues with TextEntry hotkey branch CORRUPTOR2037/pyglet#2
  2. Bug: L/R Arrow Keys do not move Caret to ends of selection correctly #932

I've been blocked by not having enough:

  1. Spare time
  2. Familiarity with Objective-C
  3. Low-latency access to a Mac

My application is an application for taking video and reviewing it with various props, features and modes

Prioritizing Mac support might be a good idea if you're doing any of the following:

  • Working on media / art production tools
  • Planning to add automatic labeling suggestions via AI
  • Building an AI data set annotation system

Macs are already commonly used for media production, but their recent models may have additional benefits for you. These include:

  • Good reported AI performance / cost ratio
  • First-party support for multiple AI backends (PyTorch, JAX, etc)

Figuring out cmd-A for TextEntry also opens the door to other directly relevant UI improvements:

  1. Better keyboard shortcut handling, such as:
    1. Mode support and other keyboard shortcut handling
    2. Select all in file selection or annotation UI since NSCollectionView.selectAll should work the same way as for NSTextView
  2. Expand drag and drop support as outlined in this comment

Feedback or further research on the keyboard shortcut or drag and drop systems would also be helpful.

Why Mock Shaders?

It's for a very specific use case: isolating unit tests of Python code for init and properties in high level abstractions. We use headless tests and interactive tests elsewhere.

To be more specific, abstractions like shapes and sprites break a lot, which keeps wasting everyone's time. We need to solve this.

Here's why it happens and how shader mocks / dummy classes help fix it:

  1. Function calls are expensive in Python, especially before 3.10
  2. pyglet will support 3.9 until its EOL in 2025
  3. High level abstractions maximize performance by inlining code
  4. As a result, their implementations of init, property, and respective tests get out of sync frequently (color, rotation, etc)
  5. Protocol types and centralizing unit tests will help prevent this and other problems such as incorrect passes / fails common with the current implementation
  6. Automating accurate shader replacements is a first step towards centralization
  7. I mostly have it down to a single fixture invocation, so it's very convenient

There are also further benefits:

  1. As long as Python & pip-based dev dependencies install, these unit tests will return useful results
  2. They save time elsewhere: we instantly know when (E)GL or its bindings broke if these tests pass but headless pixel comparison tests fail
  3. It's teaching me a lot about OpenGL and pyglet's structure

@matanox
Contributor Author

matanox commented Sep 14, 2023

I can see the benefit of the mocking now, but the rest of the items are indeed a handful. Specifically in my project, Mac support is not even at the bottom of the list of priorities for the foreseeable future. To be honest, my familiarity with pyglet's codebase and general scenarios ― and, to be very open about it, my coding level ― are definitely still far from contributor level as of yet; I can possibly try smaller and safer contributions if I live long enough.

@pushfoo
Contributor

pushfoo commented Sep 14, 2023

Mac support is not even at the bottom of the list of priorities in the foreseeable future,

That's understandable. Some other contributors have even stronger opinions and would prefer we didn't support Mac at all. I understand their sentiment, because it requires a totally different approach to some tasks.

The second text entry ticket I linked is actually cross-platform, but it may also be worth thinking about further. I'm not sure whether it should also cover the up/down arrow keys.

I have to be honest my familiarity with pyglet's codebase and general scenarios, and to be very open about it my coding level, are definitely very far from being at a contributor level as of yet,

  1. Even wrong or broken code can get a useful discussion started
  2. The best way to learn is sometimes to try, get it wrong, and learn from the mistakes

For example, I had to throw out two precursors to my current shader mock branch.

I can possibly try smaller and safer contributions if I live long enough.

Good bug reports and the patience to follow up on them are also important forms of contribution. I've been reading through the backlog of issues in search of easier ones not associated with glaring inconveniences (shape / etc API instability), and I appreciate the ones you've submitted so far.

I need to step away for now, but I'll think over the original issue. If the pytest plugin I linked above doesn't cover it already, another ugly workaround might be to start other interpreters and use an RPC library to pass data between the tests and the wrapped interpreter. Your current approach may be less work, however.

@matanox
Contributor Author

matanox commented Sep 14, 2023

You are too kind, but I think that reusing windows in testing is a simple, working approach, and it overlaps with the philosophy and practice of dependency injection, which is in any case a good practical approach to many testability needs. Pytest's fork plugin is more of a dead end than something to keep using for long ― if you read the wording at that URL ― and I wouldn't recommend a new development project pick it up right now.

I imagine that executing interpreters and collecting their return codes and exceptions is a natural approach ― yet IMHO this belongs outside a library like pyglet. I'm not sure why RPC should be involved, and I assume many people write their own subprocess wrappers to this end, at varying degrees of robustness and generality and providing varying sets of guarantees (just as test frameworks have different feature sets).

If my current approach breaks at any point, I'm sure to report back on it, but it seems like the "golden path" for headless testing suites until some long-running window-reuse breakage emerges in pyglet or in the underlying OpenGL implementation. At that point, launching tests as sub-processes is always doable ― even though it means going down to writing infrastructure code, the building blocks are quite simple.
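A minimal, standard-library-only sketch of that sub-process building block (the inline script is a placeholder for a real headless test file; nothing here is pyglet-specific):

```python
import subprocess
import sys
import textwrap

def run_in_fresh_interpreter(source: str) -> int:
    """Run test code in its own interpreter process, so that module-level
    singletons (such as pyglet's default clock and event loop) start from
    a clean slate every time. Returns the child's exit code."""
    return subprocess.run([sys.executable, "-c", source]).returncode

# Placeholder for one headless test; a real runner would pass a test file path.
script = textwrap.dedent("""
    ok = (2 + 2 == 4)
    raise SystemExit(0 if ok else 1)
""")

exit_code = run_in_fresh_interpreter(script)
print("passed" if exit_code == 0 else "failed")
```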

Thanks again for pyglet ― hope to become helpful as things make progress.

@matanox
Contributor Author

matanox commented Dec 29, 2023

Just a note to myself that a consequence of app.event_loop being a module singleton is that a pending callback scheduled through pyglet.clock.schedule_once() will persist through the sequence of stopping and restarting the pyglet.app.event_loop singleton, as a pytest run traverses multiple tests under a single initialization of all imports.

Such a scheduled event will persist across test cases, as pytest only loads pyglet once. So when a given test case calls pyglet.app.event_loop.run(), any outstanding queued callback from the previous test case (if any) will (ghostly) execute shortly after.

Of course, one can make sure test cases don't exit with pending callbacks queued, by tightening the conditional logic around when callbacks are registered, give or take using pyglet.clock.unschedule().
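To make the mechanism concrete, here is a toy stand-in ― not pyglet's API ― for a module-level clock singleton, showing how a callback queued by one test fires during the next test's run:

```python
class TinyClock:
    """Toy stand-in for a module-level clock singleton (not pyglet's API)."""
    def __init__(self):
        self._pending = []

    def schedule_once(self, callback):
        self._pending.append(callback)

    def run_pending(self):
        # Drain and invoke everything queued so far, like one loop iteration.
        pending, self._pending = self._pending, []
        for callback in pending:
            callback()

clock = TinyClock()  # module-level singleton, shared by all "tests"

fired = []
# "Test 1" schedules a callback but exits before its loop ever runs it:
clock.schedule_once(lambda: fired.append("ghost from test 1"))

# "Test 2" starts its own loop on the same singleton; the stale callback fires:
clock.run_pending()
print(fired)  # ['ghost from test 1']
```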

I could instantiate a new pyglet EventLoop() rather than use the module's singleton event_loop object, but then it would still reuse the same instance of the default clock, which is also a singleton (pyglet 2.0.5, and also on the master branch), and I'm not sure how deeply rooted the motivation for using singletons is.

Perhaps in future versions, the pyglet.app.event_loop object could reset its clock or queue of callbacks on pyglet.app.event_loop.exit(), or otherwise enable a workflow not involving any singleton event loop object, if anything like that makes sense.

@matanox matanox reopened this Dec 30, 2023
@matanox
Contributor Author

matanox commented Dec 30, 2023

I worked to prevent my test cases from exiting while any callbacks are left queued, but I think it could be a nice enhancement to either:

(i) provide a method to clear all queued callbacks, which a test-runner fixture or app code can call at test boundaries; or
(ii) have pyglet.app.event_loop.exit() clear all queued callbacks for the same purpose; or
(iii) make it possible to instantiate an entirely compartmentalized EventLoop, so that test runners which load pyglet (and its singleton objects) only once never meet queued events beyond the test that queued them ― each test would instantiate a fresh, fully compartmentalized event loop rather than restarting the same singleton object.
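As a hedged sketch of what a test-runner-level approximation of option (i) could look like today: pyglet.clock.unschedule() is real pyglet API, but the idea of tests registering every callback they schedule in a shared `scheduled` list is purely an illustrative assumption ―

```python
# conftest.py fragment (sketch, not a pyglet-provided mechanism)
import pytest
import pyglet

# Hypothetical registry: tests append every callback they schedule.
scheduled = []

@pytest.fixture(autouse=True)
def clear_pending_callbacks():
    yield  # run the test body first
    # At the test boundary, unschedule anything the test left queued on the
    # singleton default clock, so it cannot fire "ghostly" in the next test.
    for callback in scheduled:
        pyglet.clock.unschedule(callback)
    scheduled.clear()
```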

Obviously one may rightfully contend that leaving behind queued callbacks is bad application implementation, but since they are bound to creep in in a way that is very hard to debug in the mentioned test-running scenario, perhaps it is easy to provide some affordance that avoids it via the framework/library itself.

* Disclaiming that I'm not aware at this time whether other test runners (that is, other than pytest) exhibit the same library-loading behavior and possible consequence.

@matanox
Contributor Author

matanox commented Dec 30, 2023

All that said, I'd like to share that running headless tests has proven immensely useful and successful for my application; I would not be able to develop and evolve my project without the extensive headless testing suite I am running.

I guess that something careful enough, in the style suggested, may save others from very hard-to-debug issues in the headless testing architecture of their pyglet applications, due to the spill-over effect you get in the current state.

@pushfoo
Contributor

pushfoo commented Apr 14, 2024

TL;DR of the state of tests since the earlier comments:

  1. Python 3.13 has exciting parallelism changes coming (the no-GIL / free-threading work).
  2. It may be worth waiting to see how testing libraries respond to the no-GIL changes.
  3. Once there's news from pytest etc., the purity concepts of the shader mock branch may be worth revisiting, if not the whole implementation:
    • The branch was put on the back burner since Ben made improvements which sidestepped it for the time being
    • OpenGL still only allows a single thread to ever touch the GL state
    • Maybe this would allow multiple pyglet instances without making custom, compartmentalized event loops (the "bad implementation" mentioned above)?
      • CPU-side-only stubs of GL abstractions?
      • Multiple GL contexts, since the per-application state may be isolated (if each sub-interpreter is seen by GL as a separate application?)

@matanox
Contributor Author

matanox commented May 16, 2024

Thanks for those comments.

Would PlatformEventLoop.notify actually be an idiomatic way of synchronously clearing all events between pytest tests?
