Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kj::ArrayPtr<T>::copyFrom method #2035

Merged
merged 1 commit into from
May 13, 2024
Merged

kj::ArrayPtr<T>::copyFrom method #2035

merged 1 commit into from
May 13, 2024

Conversation

mikea
Copy link
Collaborator

@mikea mikea commented May 10, 2024

Type-safe alternative to memcpy. As with the fill case compilers are pretty good at optimizing it too:

https://godbolt.org/z/dfMxeT1M5

@mikea mikea requested review from jasnell and kentonv May 10, 2024 17:37
@kentonv
Copy link
Member

kentonv commented May 10, 2024

In your godbolt, if you mark copy() as noinline, then it no longer optimizes to memcpy. If you add __restrict to one of the input pointers, then it is able to use memcpy again. The problem is that the compiler can only optimized to memcpy if it knows the arrays do not overlap.

In real-world applications, it will probably often be the case that the compiler does not know if the two ArrayPtrs point at overlapping memory, so it won't be able to optimize to memcpy.

There is probably a significant performance difference between memcpy and a sequential loop, since memcpy can use vector instructions and such.

@mikea
Copy link
Collaborator Author

mikea commented May 10, 2024

There is probably a significant performance difference between memcpy and a sequential loop, since memcpy can use vector instructions and such.

Compiler happily generates vector instructions too when you mark function as noinline. I do not think memcpy has any leg here.

In your godbolt, if you mark copy() as noinline, then it no longer optimizes to memcpy. If you add __restrict to one of the input pointers, then it is able to use memcpy again. The problem is that the compiler can only optimized to memcpy if it knows the arrays do not overlap.

But I do have method as inline.

In real-world applications, it will probably often be the case that the compiler does not know if the two ArrayPtrs point at overlapping memory, so it won't be able to optimize to memcpy.

Would you rather I used manual memcpy or used restrict pointers to help compiler figure it out?

@kentonv
Copy link
Member

kentonv commented May 10, 2024

But I do have method as inline.

Yes but the point is not the inlining itself, but that in your particular test case, the compiler is able to see at the inline site that the arrays are not overlapping. If it couldn't see that, then even with inlining it would still have the same problem.

Compiler happily generates vector instructions too when you mark function as noinline.

Normally, a loop like this without restrict pointers cannot use vector instructions because the loop cannot be unrolled because the behavior would differ in the case of overlap.

However, you are right that in the godbolt example, marking the copy() function noinline does produce an implementation with vector ops.

However, this appears to be because it produces a 100-line long output that actually dynamically tests for overlap and chooses an optimized or unoptimized approach depending on what it finds. We really don't want to be generating this quantity of code whenever we do a copy. We really need to let the compiler know that it's safe to assume no overlap.

Would you rather I used manual memcpy or used restrict pointers to help compiler figure it out?

Since it looks like the compiler will optimized a restrict pointer copy into memcpy(), I guess either way is equivalent. I don't have a strong opinion.

@jasnell
Copy link
Collaborator

jasnell commented May 10, 2024

LGTM from an API perspective. I'll leave it to someone else to hit the approve button (I'm assuming after the compiler optimization discussion is resolved)

@mikea mikea force-pushed the maizatskyi/2024-05-10-copy-from branch from 438b062 to 02db659 Compare May 13, 2024 15:32
@mikea
Copy link
Collaborator Author

mikea commented May 13, 2024

updated the loop to be more optimization-friendly and added intersect check. ptal.

@mikea mikea requested a review from jasnell May 13, 2024 15:35
@jasnell
Copy link
Collaborator

jasnell commented May 13, 2024

Still LGTM but will let someone else hit the approve button

c++/src/kj/common.h Outdated Show resolved Hide resolved
Type-safe alternative to memcpy. As with the fill case
compilers are pretty good at optimizing it too:

https://godbolt.org/z/dfMxeT1M5
@mikea mikea force-pushed the maizatskyi/2024-05-10-copy-from branch from 02db659 to 61a54b5 Compare May 13, 2024 16:46
@mikea mikea merged commit 8dd8823 into v2 May 13, 2024
14 checks passed
@mikea mikea deleted the maizatskyi/2024-05-10-copy-from branch May 13, 2024 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants