Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Pinwheel Expand 1D Effect #3961

Merged
merged 18 commits into from May 15, 2024
Merged

Added Pinwheel Expand 1D Effect #3961

merged 18 commits into from May 15, 2024

Conversation

Brandon502
Copy link

Requested on discord to bring this effect from MM to main branch. Reworked and optimized by @softhack007.

Copy link
Collaborator

@blazoncek blazoncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, except the use of sinf/cosf which should be replaced by sin_t/cos_t.

BTW I found out that there is no need to use int_fast32_t unless you strictly need 32 bits and the code is ported to 16 bit environments (or int_fast16_t). int would be cleaner (or unsigned where appropriate) and the compiler treats it as such, but that's just my preference.

@softhack007
Copy link
Collaborator

softhack007 commented May 9, 2024

Hi, let me answer as the one who did the optimization part of the new mapping mode.

Looks fine to me, except the use of sinf/cosf which should be replaced by sin_t/cos_t.

sin_t/cos_t are slow and less accurate.

I guess you intend to reduce 8266 memory footprint, by not pulling in the dependency to std math lib?

We might be able to implement your suggestion at least in the getpixelcolor() part, which is inaccurate any way.

BTW I found out that there is no need to use int_fast32_t unless you strictly need 32 bits and the code is ported to 16 bit environments (or int_fast16_t). int would be cleaner (or unsigned where appropriate) and the compiler treats it as such, but that's just my preference.

Not sure. In fact we do fixed-point math in setPixelColor(). As the x and y position is incremented until we hit the border, 9 bit of "fraction" just leaves about 1-2bit of slack when one of the matrix dimensions is close to the maximum of 256 (imagine a matrix 196x4). Unsigned is not possible, because we have negative increments when sin() gets negative.

So considering that x and y can be at most 256, we'd need 18bit of resolution for good results (one bit for sign, 8bit integer part, 9bit fraction). Maybe we can squeeze out 2bits of fraction so everything fits into 16bit. I'd need to do some more testing to see if reducing the resolution creates holes in the mapping.

I'm not sure if we can expect a benefit from going to int16 for x and y. What do you think?

Edit: just re-read your question - maybe I misunderstood part of it 😄 yes in principle we can change int_fast32_t to just int and add a comment that at least 18bits (signed) are needed for the algorithm to work properly.

@softhack007
Copy link
Collaborator

softhack007 commented May 9, 2024

(Changing hat - now speaking as a maintainer 🎩 )

the pinwheel mapping works with many effects, however it has a few open ends:

  • not explictly supporting "virtual 2D strip" effects like fire or popcorn (effects do work, just no "virtual 2D" along the rays)
  • getpixelColor is sometimes inaccurate, as it returns color of a pixels with correct angle, however on the outmost circle that was possibly overwritten by a neighbouring "ray" already. This could be improved by using a pixel from the segment edge, but that's still not perfect. To really solve this issue, the mapping mode would need to allocate it's own buffer on segment level.

However I think these two limitations are still acceptable for a first shot, just there is room for improvement in the future.

@softhack007 softhack007 added enhancement documentation documentation should be updated to explain behaviour labels May 9, 2024
@blazoncek
Copy link
Collaborator

blazoncek commented May 9, 2024

sin_t/cos_t implementation can be overridden using -D WLED_USE_REAL_MATH, sinf/cosf cannot.
They were introduced by @Aircoookie and I insist they are used across core WLED until he changes his mind. And yes, they were introduced to reduce RAM usage on ESP8266 despite being slightly inaccurate at extreme edges.

int_fast16_t is just a typedef of int on ESP platform (as it is 32 bit). So is int_fast32_t.
At least that's what compiler and VSC tell me.
It does matter with non fast versions. What compiler does is it adds & 0xFFFF to every instruction when using int16_t (non fast).

@softhack007
Copy link
Collaborator

softhack007 commented May 9, 2024

despit being slightly inaccurate at extreme edges.

Ok - just let me add that sin_t and cos_t are not just inaccurate - on esp32 platforms they are noticeably slower than sinf/cosf.

@Brandon502 please make the changes as request by @blazoncek.

  • For the int_fast16_t and int_fast32_t types in serPixelColor(), please change them to int and add a comment about min required number of bits ( I.e. 18bit for former int_fast32_t, 10bit for former int_fast16_t).
  • Also let's replace sinf/cosf by sin_t/cos_t. It's enough to simply change the function name.

Copy link
Collaborator

@softhack007 softhack007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved from my side, assuming that changes already requested by @blazoncek get implemented.

Adding the "documentation" tag, because I think that limitations of the pinwheel mapping - it may not work well with some effects - should be documented in the KB. But that's a different task.

@blazoncek
Copy link
Collaborator

Ok - just let me add that sin_t and cos_t are not just inaccurate - on esp32 platforms they are noticeably slower than sinf/cosf.

Nobody said they were faster. 😁 sinf/cosf use look-up tables and can be fast, sin_t/cos_t can use polynomial approximation or can be substituted by libc's sinf/cosf implementation.
I am just trying to explain why there was a decision for wled_math.cpp. FYI I do not necessarily agree with that.

cosf/sinf changed to cos_t/sin_t. int_fast32_t and int_fast_16_t changed to int.
@Brandon502
Copy link
Author

Made the requested changes. Everything still seems to work the same on my matrix. Have not had the chance to test different size matrices yet.

@softhack007
Copy link
Collaborator

@blazoncek do you think this can go into the upcoming 0.15.0-b3, or should we better wait and merge this PR after -b3 is out?

@blazoncek
Copy link
Collaborator

I think this will be included in b3 release this weekend.

@Brandon502
Copy link
Author

Brandon502 commented May 10, 2024

@softhack007 I never tested large matrices after your rework just looked at the code. I loaded up latest MM version on a virtual 64x64 matrix. Up to 30x30 works fine. 31x31 and 32x32 produces holes. 33x33 up to 50x50 also works correctly. 51x51 and above also produces holes on my end. Not sure if I'm setting up something wrong or if some things need to be tweaked a bit. This applies to v15 version as well for me.

@softhack007
Copy link
Collaborator

softhack007 commented May 10, 2024

@softhack007 I never tested large matrices after your rework just looked at the code. I loaded up latest MM version on a virtual 64x64 matrix. Up to 30x30 works fine. 31x31 and 32x32 produces holes. 33x33 up to 50x50 also works correctly. 51x51 and above also produces holes on my end. Not sure if I'm setting up something wrong or if some things need to be tweaked a bit. This applies to v15 version as well for me.

@Brandon502 thanks for testing. Actually, my main test environments are 24x32 and 52x52 where I didn't see holes.
Holes are always possible when the number of pixels gets very hight. Its mainly about finding the best "tuning" that avoids holes without blowing up runtime too much.

I can take a look on Sunday. If you have time, maybe test out these ideas:

  • For the 32x32 case, please change the 32 in constexpr int Pinwheel_Size_Medium = 32; to 30.
  • for the >50x50 case, I need to see if rounding helps with avoiding holes. I was hoping to get away without rounding for speed reasons. Maybe lines 758 + 759 need to include rounding, i.e. int x = (posx + Fixed_Scale/2) / Fixed_Scale; and int y = (posy + Fixed_Scale/2) / Fixed_Scale;.

Copy link
Collaborator

@blazoncek blazoncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit perplexed by complexity of what should be a simple expansion.
I did not try to compile so can't judge the efficiency of the code but for testing I'd recommend setting up 4 segments (16x16) with pinwheel expansion and see what FPS you get.

Otherwise the code looks ok.

EDIT: Do you have a video of the expansion in action?

@Brandon502
Copy link
Author

I am a bit perplexed by complexity of what should be a simple expansion. I did not try to compile so can't judge the efficiency of the code but for testing I'd recommend setting up 4 segments (16x16) with pinwheel expansion and see what FPS you get.

Otherwise the code looks ok.

EDIT: Do you have a video of the expansion in action?

32x32.mp4

virtual matrix so liveview is a bit laggy. Shows a stable 20fps running 4 16x16 segments (Rainbow, Two Dots, Rainbow Runner, Washing Machine). A single 32x32 effect runs around ~28fps.

Removed holes on 31x31 and 32x32 grids.
@blazoncek
Copy link
Collaborator

Shows a stable 20fps running 4 16x16 segments (Rainbow, Two Dots, Rainbow Runner, Washing Machine). A single 32x32 effect runs around ~28fps.

I thought it would be slow. Unfortunate as that is.

@Brandon502 & @softhack007 if you can spot a place for speed optimisation this would need it.

A word about missing pixels: Unless you increase the "resolution" of your expansion (i.e. slow it down), you'll always get them on certain dimensions. Larger dimensions more chance for them as you are dealing with sin/cos.

@softhack007
Copy link
Collaborator

softhack007 commented May 11, 2024

@Brandon502 & @softhack007 if you can spot a place for speed optimisation this would need it.

Yes - the limiting factor is the number of setPixelColor() calls needed for the full pinwheel, so I see two options:

  • add rounding (see my previous comment) and reduce the number of "rays"
  • add two more "steps" by finding the optimal number of rays for common sizes: size <= 16 (~160 rays maybe less); 30 <= size <= 40 (~256 rays may work)

I'm always from my esp32 today - most likely can test that tomorrow.

@Brandon502
Copy link
Author

Brandon502 commented May 11, 2024

@Brandon502 & @softhack007 if you can spot a place for speed optimisation this would need it.

Yes - the limiting factor is the number of setPixelColor() calls needed for the full pinwheel, so I see two options:

* add rounding (see my previous comment) and reduce the number of "rays"

* add two more "steps" by finding the optimal number of rays for common sizes: size <= 16 (~160 rays maybe less); 30 <= size <= 40 (~256 rays may work)

I'm always from my esp32 today - most likely can test that tomorrow.

Created a custom effect so I can control rays and turn rounding on/off. Rounding seems to be much worse, at least the method you mentioned above. Here's the amount of rays needed for various sizes without rounding.

8x8 30 rays
16x16 72 rays
24x24 128 rays
32x32 200 rays
40x40 272 rays
48x48 296 rays
56x56 296 rays
64x64 448 rays

This may not be 100% accurate, but it's a decent baseline.

Added small pinwheel size. Adjusts medium and large values.
@blazoncek
Copy link
Collaborator

Dear Lord! 😁 For each pixel you draw 200 lines on 32x32? No wonder it is slow.

You need to optimize loop to only call setPixelColor() when either x or y actually changes from previous iteration.
Next optimisation would be to have dynamic "rays" that start at certain distance from center instead of all of them starting at center and overwriting same pixel over and over again.

@Brandon502
Copy link
Author

I added small pinwheel option for matrices 16x16 and below. The 4 16x16 effects on 32x32 grid went from ~20 fps to ~33 fps. Pinwheel seems to be higher fps then arc on effects that I've tested.

@blazoncek
Copy link
Collaborator

Pinwheel seems to be higher fps then arc on effects that I've tested.

Yes arc is slow as it iterates in polar space (increasing angle) instead of using Bresenham's circle drawing algorithm as that produces empty pixels. That is also in need of optimisation.

@Brandon502
Copy link
Author

Dear Lord! 😁 For each pixel you draw 200 lines on 32x32? No wonder it is slow.

You need to optimize loop to only call setPixelColor() when either x or y actually changes from previous iteration. Next optimisation would be to have dynamic "rays" that start at certain distance from center instead of all of them starting at center and overwriting same pixel over and over again.

200 lines total on 32x32. setPixelColor seems to be called 3604 times on 32x32. So that can definitely be reduced by avoiding overwriting near the center like you said. I'll play around with trying to reduce it.

Changed method for drawing odd numbered rays.
@Brandon502
Copy link
Author

@Brandon502 i think our best chance - in addition to my optimizations to avoid double painting of pixels - is to find optimized numbers of rays for common sizeses -> 16x16, 32x32 (we already have that)

I'm not sure I fully understand your "odd rays" idea - basically you draw even numbered ones from centre (outwards) while drawing odd-numbered ones from the edge (inwards) - is this correct?

later today I want to try a similar idea - by just moving the start of "odd" rays left by 1/2 pixel, we possibly get a slightly better coverage and less holes.

Please test my latest commit with your "test setup" because i'm curious if you see a speedup.

Here's the fps on my setup.

38 FPS on 4 16x16 effects
31 FPS on 1 32x32 Rainbow

Experimental method:
34 FPS on 4 16x16 effects
26 FPS on 1 32x32 Rainbow

Combining your lastX/Y and my odd ray method
40 FPS on 4 16x16 effects
35 FPS on 1 32x32 Rainbow

I pushed my odd ray code. You can likely change the math and make it much quicker. It still goes outward but I just increment it a few times before entering the loop so it starts about midway to the edge instead of the center. When you keep track of the prevRay number you can use this method without breaking effects that don't cover full screen.

posx = (posx + inc_x * (vW/4));

This is how I'm jumping away from the center. On square matrices you can go even further from the center, but rectangles will start producing holes. Can likely find a better spot to jump and make it quicker.

@softhack007
Copy link
Collaborator

Combining your lastX/Y and my odd ray method
40 FPS on 4 16x16 effects
35 FPS on 1 32x32 Rainbow

Thanks, I think that's a great result - we are almost twice as fast now, compared to the original code. And still no holes 😃

I'll look at your "odd ray" code later tonight - and maybe add some tuning for the math.

@blazoncek
Copy link
Collaborator

I'll wait with b3.

* minor cleanup, moved prevRay into setPixelColor
* removed experimental code (too slow)
* comments cleanup
fixing holes that appeared during testing
* at 52x52 (big 296 -> 304)
* at 24x32 (medium 192 -> 208)
* at 12x16 (small 72 -> 82)

... there is still one hole at 14x16 ... well wtf
@Brandon502
Copy link
Author

Brandon502 commented May 12, 2024

@softhack007 can't test the 52x52 hole yet, but the holes on 24x32, 12x16, and 14x16 are from the prevRay method it seems. Need to tweak the jump distance a bit more.

Edit: Fixed on my end for the smaller matrices. Just messing around with the highest possible jump and I'll push.

Jump distance for odd rays fixed. Removed holes on rectangular matrices.
@Brandon502
Copy link
Author

Brandon502 commented May 12, 2024

May have messed up while committing. Forgot to pull before committing. 9e0b91a contains the fix. Still need to test on 52x52 though. The jump needed to be the same for each axis. This improves fps slightly for a single 32x32 effect.

Pinwheel_Steps_Small = 72
Pinwheel_Steps_Medium = 192
Are not producing holes anymore at least on on the rectangles I've checked so far.

@Brandon502
Copy link
Author

Brandon502 commented May 12, 2024

OddRays192 EvenRays192 OddRays72 EvenRays72

Quick visualization showing only the odd rays / even rays. May help show areas for further improvement.

softhack007 and others added 2 commits May 13, 2024 19:27
* setPixelColor: ensure that 0/0 is used
* getPixelColor: accuracy improvements

unfortunately, "scrolling" audioreactive effects are still not working properly - they end after 1/4 of the circle. Could be due to limited resolution of getPixelColor.
Fixed getPixelColor.
@Brandon502
Copy link
Author

Brandon502 commented May 13, 2024

@softhack007 fixed music mode effects. Got tired of trying ways to fix the weird rounding issues causing wrong pixel to be used in getPixelColor so I just used the exact same method as setting. Seems like it works. Changed small and medium pinwheel sizes back to the smaller values. I haven't spotted any missing holes.

Waterfall and DJ Lights (scrolling effects) are running at 55+ fps on 32x32.

@softhack007
Copy link
Collaborator

softhack007 commented May 14, 2024

Got tired of trying ways to fix the weird rounding issues causing wrong pixel to be used in getPixelColor so I just used the exact same method as setting

Hi @Brandon502, maybe that's the better solution 👍 .
A part of me keeps saying "DRY - don't repeat yourself" but most important is "it works", and the ray algo is fast enough to re-run it (without setting pixels) in getpixelcolor. It's also very compact in code size so having a reduced version in getPC should be ok.

The other benefit is that we get higher resolution in getpPixelColor now, because we always use an edge pixel not a circle inside the segment frame - that's an improvement especially on non-square layouts.

@blazoncek I think we are good now for beta3. Maybe just wait if something comes up until the evening, then merge the PR.

@softhack007 softhack007 removed the documentation documentation should be updated to explain behaviour label May 14, 2024
reducing code duplication between sPC and gPC
there were still two holes in my 52x52 setup --> added "XL" size for bigger than 50x50 - achieves 18fps on 52x52
@softhack007
Copy link
Collaborator

softhack007 commented May 14, 2024

Hi guys, for me it's good now 😃
@Brandon502 maybe you can do a final cross-check before we merge?

An intersting goodie (and we should keep that) is the behaviour of "Wavesins" with pinwheel mapping.
Its producing a moirè pattern that alternates (black/orange) with every "full scan" round. Looks a bit like a radar screen 😉

Wavesins_Pinwheel

@Brandon502
Copy link
Author

Hi guys, for me it's good now 😃 @Brandon502 maybe you can do a final cross-check before we merge?

An intersting goodie (and we should keep that) is the behaviour of "Wavesins" with pinwheel mapping. Its producing a moirè pattern that alternates (black/orange) with every "full scan" round. Looks a bit like a radar screen 😉

Hey @softhack007, I did a run through of all the effects last night. The only "broken" effects are now ones that use a "stripe" pattern. Like wavesins (above) and traffic light along with a few others. With these you can see the jump on odd rays. Even if you remove the jump they still are somewhat broken since close rays still overwrite each other. Playing with the sliders on wavesins does produce so nice patterns with the jump visible.

I'm not sure why you're getting holes on your large setup. If you want to reduce the count of rays even further can you try these values....

constexpr int Pinwheel_Steps_Small = 64;       
constexpr int Pinwheel_Steps_Medium = 160;
constexpr int Pinwheel_Steps_Big = 296; 
//    8x8  ->  24 steps    36x36  ->  192 steps
//  12x12  ->  48 steps    40x40  ->  216 steps 
// *16x16  ->  64 steps    44x44  ->  256 steps
//  20x20  ->  88 steps    48x48  ->  256 steps
//  24x24  -> 112 steps    52x52  ->  296 steps
//  28x28  -> 136 steps   *56x56  ->  296 steps
// *32x32  -> 160 steps    60x60  ->  408 steps

I've doubled check all of these values, so if they aren't working for you on >32x32 I can't trust my setup anymore lol.

@Brandon502
Copy link
Author

If you like data here's the optimal rays for every grid size and the amount of setPixelColor calls with and without jump needed for each. This data wasn't verified like the numbers above but they seem to all line up. If you ever wanted to make this even more efficient for the very large grids I think I have a way that uses this same method, just can't think of how to code it quite yet. But I'm sure there are even better ways.

Size Rays Total SetCalls JumpSetCalls
4 14 16 32 25
5 22 25 60 49
6 24 36 77 53
7 24 49 92 68
8 24 64 112 88
9 34 81 168 159
10 48 100 264 192
11 48 121 291 219
12 48 144 320 224
13 63 169 456 332
14 64 196 493 365
15 64 225 535 375
16 64 256 572 412
17 64 289 608 448
18 88 324 885 677
19 88 361 935 735
20 88 400 984 792
21 104 441 1220 924
22 112 484 1373 981
23 112 529 1447 1055
24 112 576 1508 1060
25 120 625 1679 1283
26 136 676 1977 1521
27 136 729 2060 1528
28 136 784 2136 1616
29 144 841 2343 1695
30 160 900 2681 1881
31 160 961 2780 1980
32 160 1024 2868 2068
33 184 1089 3407 2491
34 192 1156 3654 2598
35 192 1225 3763 2707
36 192 1296 3876 2724
37 192 1369 3984 2832
38 216 1444 4603 3307
39 216 1521 4727 3447
40 216 1600 4856 3584
41 216 1681 4972 4220
42 256 1764 6025 4233
43 256 1849 6179 4387
44 256 1936 6324 4532
45 256 2025 6460 4540
46 256 2116 6597 4677
47 256 2209 6739 4819
48 256 2304 6880 5760
49 296 2401 8135 5923
50 296 2500 8297 6097
51 296 2601 8468 6752
52 296 2704 8636 6932
53 296 2809 8803 7129
54 296 2916 8965 7843
55 296 3025 9136 8048
56 296 3136 9300 8264
57 368 3249 11775 8279
58 408 3364 13278 9402
59 408 3481 13515 9639
60 408 3600 13752 9672
61 408 3721 13968 9888
62 408 3844 14203 10123
63 408 3969 14427 10143

@softhack007
Copy link
Collaborator

softhack007 commented May 14, 2024

I'm not sure why you're getting holes on your large setup.

One difference is I'm using an -S3 in my setup, which requires a newer framework version (esp-idf 4.4.4 +arduino-esp 2.0.9). Could be that my additional holes comes from slightly different accuracies of sinf/cosf in the new framework. I had two holes in 51x52 and 52x52.

If you like data here's the optimal rays for every grid size and the amount of setPixelColor calls with and without jump needed for each

Thanks 👍 will put them into excel later tonight - maybe there's a pattern. I'd say let's do a second optimization round after -beta3 is out.

@Brandon502
Copy link
Author

If you like data here's the optimal rays for every grid size and the amount of setPixelColor calls with and without jump needed for each

Thanks 👍 will put them into excel later tonight - maybe there's a pattern. I'd say let's do a second optimization round after -beta3 is out.

Sounds good. Just pushed a small change to format getPinwheelAngle the same as getPinwheelLength removing the long single line method.

Final FPS numbers:
41 FPS on 4 16x16
37 FPS on 32x32 Rainbow

wled00/FX_fcn.cpp Outdated Show resolved Hide resolved
@softhack007 softhack007 merged commit e33299b into Aircoookie:0_15 May 15, 2024
18 checks passed
softhack007 added a commit to MoonModules/WLED that referenced this pull request May 15, 2024
Added Pinwheel Expand 1D ->2D effect mapping mode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants