Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations to experimental geoshader Sprite #905

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

cspotcode
Copy link

@cspotcode cspotcode commented Jul 28, 2023

After learning about pyglet's Sprite, and learning that pyglet's drawables are generic such that shapes and sprites can be combined into the same batch, I wondered how it compares against arcade's Sprite. My benchmarks showed pyglet is slower, so I dug into the code and made some optimizations. These optimizations make pyglet's sprite faster than arcade's in my specific benchmark, creating thousands of sprites and moving each one every frame.

I made a variety of changes in separate commits, then ran the same benchmark against each commit. This table was generated by hyperfine and shows the relative performance of each. The most recent (fastest) commit is at the top, and the bottom commit is the vanilla geoshader Sprite currently on master.

Some of these changes are bad and should not be merged! I was curious if they would help, but the benchmark shows that they did not improve performance, and may have made it worst. I've included them here to share my research.

By looking at the relative timings, you can see which code changes had a positive/negative/negligible impact on performance.

This PR probably should not be merged as-is. Rather, the most impactful performance improvements can be extracted to another PR.

To run the benchmark yourself, open bench.sh and put in the correct commit hash where it tells you to. Then run bench.sh and see the results in results.md. On my windows box in PowerShell, I run it like this:

& 'C:/Program Files/Git/bin/bash.exe' ./benchmarks/bench.sh sprite
Command Mean [s] Min [s] Max [s] Relative
9bdf59dc1 Sprite uses VertexList optimized setter for position updates 58.694 ± 0.850 57.771 59.443 1.00
6da2b35dc Add to VertexDomain and VertexList attribute.buffer_array and optimized setters to write values into the buffers 103.905 ± 0.867 103.351 104.903 1.77 ± 0.03
751d79e2a Refactor sprite from _x, _y, _z attributes to _position tuple attribute 105.724 ± 1.881 104.177 107.818 1.80 ± 0.04
dab20d922 Make Sprite slotted 111.933 ± 1.746 110.405 113.837 1.91 ± 0.04
7b9d3a019 Optimize VertexList region cache 108.820 ± 0.793 107.953 109.507 1.85 ± 0.03
fcdc49d1c optimize BufferObjectRegion.invalidate 109.228 ± 1.010 108.533 110.386 1.86 ± 0.03
42841c287 Move geoshader_sprite so it's accessible from editable installs of pyglet 127.479 ± 4.137 123.112 131.339 2.17 ± 0.08

Copy link
Contributor

@pushfoo pushfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've suggested changes which will save contributors and users time in the future, as well as some style nitpicks.

@@ -309,5 +309,7 @@ def invalidate(self):
until the next time the buffer is used, for efficiency).
"""
buffer = self.buffer
buffer._dirty_min = min(buffer._dirty_min, self.start)
buffer._dirty_max = max(buffer._dirty_max, self.end)
if self.start < buffer._dirty_min:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explaining why we aren't using the more concise syntax will save everyone time by preventing reversion attempts.

Suggested change
if self.start < buffer._dirty_min:
# Intentionally avoid min and max functions to increase performance as of 3.8 - 3.12
if self.start < buffer._dirty_min:

Comment on lines +111 to +139
"""
For each attribute, create a ctypes array pointing to the entire buffer
which can be used to write values into the buffer using slice assignment
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor style nitpick: pyglet's docstring conventions place the summary on the same line as the opening quotes.

Suggested change
"""
For each attribute, create a ctypes array pointing to the entire buffer
which can be used to write values into the buffer using slice assignment
"""
"""For each attribute, create a ctypes array pointing to to its entire buffer.
This allows using slice assignment syntax to write values into the buffer.
"""

Comment on lines 312 to 348
"""
Returns a function that, when called, sets the values for a given attribute.

The setter becomes invalid when the VertexList migrates to a different
domain, and it's the caller's responsibility to stop using the old setter
and create a new one.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Same docstring style nitpick as before
  2. Phrasing tweaks
  3. We can make debugging easier by explaining what happens when you fail to migrate to the new setter so the issue can be identified
Suggested change
"""
Returns a function that, when called, sets the values for a given attribute.
The setter becomes invalid when the VertexList migrates to a different
domain, and it's the caller's responsibility to stop using the old setter
and create a new one.
"""
"""Return a function which sets the value(s) of the named attribute.
The setter is invalidated whenever the VertexList migrates to a different
domain. It's the caller's responsibility to stop using the old setter and
create a new one when this happens.
"""

@caffeinepills
Copy link
Collaborator

Slots may have to be omitted as this will break existing code or pushed further into a release where we can break compatibility.

Personally I am not a fan of restricting sprites to this degree in general. Although it's probably fine for a new class such as GeoSprite that is not already being used.

@cspotcode
Copy link
Author

Yeah, I talked a little bit about the slots change in the PR description, though didn't call it out by name. It's the bit saying "Some of these changes are bad and should not be merged! ..."

@cspotcode
Copy link
Author

cspotcode commented Aug 3, 2023

@benmoran56 I cherry-picked your code that uses a custom VertexList subclass per VertexDomain.

Benchmark results are below. The fastest is to combine our two optimizations, which is weird considering the closure/partial-bound _position_setter does not use the generated VertexList property.
EDIT: that's not true, actually, I'm tired. I'll let the numbers do the talking. Looks like the closure/functools.partial is fastest.

Command Mean [s] Min [s] Max [s] Relative
631c70bff Sprite sets position with Sprite._position_setter(position) which is bound closure. Custom VertexList subclass instead of VertexList.__getattr__/__setattr__ 69.797 ± 7.609 65.220 78.581 1.05 ± 0.12
6411c53f2 Sprite sets position with Sprite._vertext_list.position[:] = position Custom VertexList subclass instead of VertexList.__getattr__/__setattr__ 80.551 ± 1.871 78.423 81.941 1.21 ± 0.03
944be0026 Sprite sets position with Sprite._position_setter(position) which is bound closure 66.445 ± 0.779 65.574 67.076 1.00
3111f9267 Sprite sets position with Sprite._position_setter(position) which is bound functools.partial() 68.395 ± 2.051 66.372 70.472 1.03 ± 0.03
23aa26e4a Refactor sprite from _x, _y, _z attributes to _position tuple attribute 112.209 ± 0.927 111.213 113.047 1.69 ± 0.02
4029cba7e Make Sprite slotted 115.144 ± 1.829 113.902 117.244 1.73 ± 0.03
7d721bf14 Optimize VertexList region cache 115.094 ± 5.124 110.604 120.676 1.73 ± 0.08
c8ee652d4 optimize BufferObjectRegion.invalidate 112.646 ± 0.818 111.924 113.535 1.70 ± 0.02
cd7e06225 Move geoshader_sprite so it's accessible from editable installs of pyglet 130.280 ± 2.801 127.517 133.117 1.96 ± 0.05

@cspotcode
Copy link
Author

I made one final optimization, this one focused on memory. Ran a final benchmark.

In short, it's now 2.3 times as fast.

For fairness, this table shows the same 4 commits ran twice to make sure the first ran again on a hotter CPU.

Command Mean [s] Min [s] Max [s] Relative
28b2790d1 further optimize position setter to avoid closures and functools.partial, reduce memory usage 61.748 ± 0.290 61.568 62.083 1.00
631c70bff Sprite sets position with Sprite._position_setter(position) which is bound closure. Custom VertexList subclass instead of VertexList.__getattr__/__setattr__ 68.715 ± 1.132 67.955 70.016 1.11 ± 0.02
6411c53f2 Sprite sets position with Sprite._vertext_list.position[:] = position Custom VertexList subclass instead of VertexList.__getattr__/__setattr__ 84.336 ± 3.822 81.029 88.520 1.37 ± 0.06
cd7e06225 Move geoshader_sprite so it's accessible from editable installs of pyglet 136.987 ± 3.303 133.767 140.368 2.22 ± 0.05
28b2790d1 further optimize position setter to avoid closures and functools.partial, reduce memory usage 63.626 ± 0.626 62.986 64.238 1.03 ± 0.01
631c70bff Sprite sets position with Sprite._position_setter(position) which is bound closure. Custom VertexList subclass instead of VertexList.__getattr__/__setattr__ 70.843 ± 2.088 69.538 73.251 1.15 ± 0.03
6411c53f2 Sprite sets position with Sprite._vertext_list.position[:] = position Custom VertexList subclass instead of VertexList.__getattr__/__setattr__ 85.504 ± 3.634 81.669 88.897 1.38 ± 0.06
cd7e06225 Move geoshader_sprite so it's accessible from editable installs of pyglet 140.134 ± 4.354 135.209 143.470 2.27 ± 0.07

`vertex_list_set_attribute_values` along with a data tuple to set values
for the attribute.

These params become invalid when the VertexList migrates to a different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a blocker, unfortunately. For the default Batch, users should be able to add/remove/migrate things between Batches transparently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is safe for Sprite to use internally, right?

Sprite._batch and Sprite._vertex_list are private. For a user to migrate a Sprite between batches, they have to assign to Sprite.batch, Sprite.program, etc. And all those properties call VertexList.get_setter_params.

If a user is using VertexDomain / VertexList directly, they can safely avoid get_setter_params and instead use the same API that sprites do today: get a BufferObjectRegion via VertexList.__getattr__ or the generated properties on _vertexlist_class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants