-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kj::ArrayPtr<T>::copyFrom method #2035
Conversation
In your godbolt, if you mark In real-world applications, it will probably often be the case that the compiler does not know if the two ArrayPtrs point at overlapping memory, so it won't be able to optimize to memcpy. There is probably a significant performance difference between memcpy and a sequential loop, since memcpy can use vector instructions and such. |
Compiler happily generates vector instructions too when you mark function as noinline. I do not think memcpy has any leg here.
But I do have method as inline.
Would you rather I used manual memcpy or used restrict pointers to help compiler figure it out? |
Yes but the point is not the inlining itself, but that in your particular test case, the compiler is able to see at the inline site that the arrays are not overlapping. If it couldn't see that, then even with inlining it would still have the same problem.
Normally, a loop like this without restrict pointers cannot use vector instructions because the loop cannot be unrolled because the behavior would differ in the case of overlap. However, you are right that in the godbolt example, marking the copy() function noinline does produce an implementation with vector ops. However, this appears to be because it produces a 100-line long output that actually dynamically tests for overlap and chooses an optimized or unoptimized approach depending on what it finds. We really don't want to be generating this quantity of code whenever we do a copy. We really need to let the compiler know that it's safe to assume no overlap.
Since it looks like the compiler will optimized a restrict pointer copy into memcpy(), I guess either way is equivalent. I don't have a strong opinion. |
LGTM from an API perspective. I'll leave it to someone else to hit the approve button (I'm assuming after the compiler optimization discussion is resolved) |
438b062
to
02db659
Compare
updated the loop to be more optimization-friendly and added intersect check. ptal. |
Still LGTM but will let someone else hit the approve button |
Type-safe alternative to memcpy. As with the fill case compilers are pretty good at optimizing it too: https://godbolt.org/z/dfMxeT1M5
02db659
to
61a54b5
Compare
Type-safe alternative to memcpy. As with the fill case compilers are pretty good at optimizing it too:
https://godbolt.org/z/dfMxeT1M5