perf: optimize indexing in inner loops #4550

ryuukk · 2024-01-03T11:10:05Z

According to note [1], clang is able to optimize them, shouldn't LDC be able to as well?

[1] - https://github.com/attractivechaos/plb2?tab=readme-ov-file#optimizing-inner-loops

JohanEngelen · 2024-01-03T12:35:32Z

Please add more details here. I have to search different pages and source code to figure out exactly what you mean and what difference you observe between Clang and LDC.

First thing that comes to mind is that for the D source, the optimizer is not able to prove that writing to c[i][j] does not change contents a or b, i.e. that c does not alias a or b; without that proof, it is invalid to hoist a[i][k] or b[k] from the loop. In the C source, the optimizer can see calls to malloc and calloc which guarantee that c can never alias a or b. In D, the call to _d_newarraymiTX (auto c = new double[][](m, p);) is IIRC not recognized (nor marked) as a memory allocation function, thus the optimizer does not get the information to prove that things don't alias.
My first attempt to improve the situation would be to replace the allocation with something that the optimizer does recognize.

ryuukk · 2024-01-03T14:50:30Z

Please add more details here. I have to search different pages and source code to figure out exactly what you mean and what difference you observe between Clang and LDC.

Sorry, i thought using the diff would be enough, because my lack of expertise would have made me choose incorrect wording

https://github.com/attractivechaos/plb2/pull/6/files

Particularly the 2nd commit makes things even faster:

attractivechaos/plb2@7dc32f6

Result in the comment in the 2nd paragraph: attractivechaos/plb2#6 (comment)

First thing that comes to mind is that for the D source, the optimizer is not able to prove that writing to c[i][j] does not change contents a or b, i.e. that c does not alias a or b; without that proof, it is invalid to hoist a[i][k] or b[k] from the loop. In the C source, the optimizer can see calls to malloc and calloc which guarantee that c can never alias a or b. In D, the call to _d_newarraymiTX (auto c = new double[][](m, p);) is IIRC not recognized (nor marked) as a memory allocation function, thus the optimizer does not get the information to prove that things don't alias. My first attempt to improve the situation would be to replace the allocation with something that the optimizer does recognize.

Thanks for the clear explanation

PR got merged, result now is equal to what C with clang produce https://github.com/attractivechaos/plb2?tab=readme-ov-file#appendix-timing-on-apple-m1-macbook-pro

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: optimize indexing in inner loops #4550

perf: optimize indexing in inner loops #4550

ryuukk commented Jan 3, 2024

JohanEngelen commented Jan 3, 2024

ryuukk commented Jan 3, 2024 •

edited

perf: optimize indexing in inner loops #4550

perf: optimize indexing in inner loops #4550

Comments

ryuukk commented Jan 3, 2024

JohanEngelen commented Jan 3, 2024

ryuukk commented Jan 3, 2024 • edited

ryuukk commented Jan 3, 2024 •

edited