Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimize indexing in inner loops #4550

Open
ryuukk opened this issue Jan 3, 2024 · 2 comments
Open

perf: optimize indexing in inner loops #4550

ryuukk opened this issue Jan 3, 2024 · 2 comments

Comments

@ryuukk
Copy link
Contributor

ryuukk commented Jan 3, 2024

Benchmark: attractivechaos/plb2#6

According to note [1], clang is able to optimize them, shouldn't LDC be able to as well?

[1] - https://github.com/attractivechaos/plb2?tab=readme-ov-file#optimizing-inner-loops

@JohanEngelen
Copy link
Member

Please add more details here. I have to search different pages and source code to figure out exactly what you mean and what difference you observe between Clang and LDC.

First thing that comes to mind is that for the D source, the optimizer is not able to prove that writing to c[i][j] does not change contents a or b, i.e. that c does not alias a or b; without that proof, it is invalid to hoist a[i][k] or b[k] from the loop. In the C source, the optimizer can see calls to malloc and calloc which guarantee that c can never alias a or b. In D, the call to _d_newarraymiTX (auto c = new double[][](m, p);) is IIRC not recognized (nor marked) as a memory allocation function, thus the optimizer does not get the information to prove that things don't alias.
My first attempt to improve the situation would be to replace the allocation with something that the optimizer does recognize.

@ryuukk
Copy link
Contributor Author

ryuukk commented Jan 3, 2024

Please add more details here. I have to search different pages and source code to figure out exactly what you mean and what difference you observe between Clang and LDC.

Sorry, i thought using the diff would be enough, because my lack of expertise would have made me choose incorrect wording

https://github.com/attractivechaos/plb2/pull/6/files

Particularly the 2nd commit makes things even faster:

attractivechaos/plb2@7dc32f6

Result in the comment in the 2nd paragraph: attractivechaos/plb2#6 (comment)

First thing that comes to mind is that for the D source, the optimizer is not able to prove that writing to c[i][j] does not change contents a or b, i.e. that c does not alias a or b; without that proof, it is invalid to hoist a[i][k] or b[k] from the loop. In the C source, the optimizer can see calls to malloc and calloc which guarantee that c can never alias a or b. In D, the call to _d_newarraymiTX (auto c = new double[][](m, p);) is IIRC not recognized (nor marked) as a memory allocation function, thus the optimizer does not get the information to prove that things don't alias. My first attempt to improve the situation would be to replace the allocation with something that the optimizer does recognize.

Thanks for the clear explanation

PR got merged, result now is equal to what C with clang produce https://github.com/attractivechaos/plb2?tab=readme-ov-file#appendix-timing-on-apple-m1-macbook-pro

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants