kj::ArrayPtr<T>::copyFrom method #2035

mikea · 2024-05-10T17:37:57Z

Type-safe alternative to memcpy. As with the fill case, compilers are pretty good at optimizing it too: https://godbolt.org/z/dfMxeT1M5

c++/src/kj/common.h

kentonv · 2024-05-10T19:18:35Z

In your godbolt, if you mark copy() as noinline, then it no longer optimizes to memcpy. If you add __restrict to one of the input pointers, then it is able to use memcpy again. The problem is that the compiler can only optimize to memcpy if it knows the arrays do not overlap.

In real-world applications, it will probably often be the case that the compiler does not know if the two ArrayPtrs point at overlapping memory, so it won't be able to optimize to memcpy.

There is probably a significant performance difference between memcpy and a sequential loop, since memcpy can use vector instructions and such.
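
To make the aliasing point concrete, here is a minimal sketch. It is not the code from the PR or from the godbolt link; the names `copyMayAlias`, `copyNoAlias` and `disjointCopiesAgree` are made up for illustration. Note that `__restrict` is a compiler extension (GCC, Clang, MSVC), not standard C++. Whether a given compiler actually emits a memcpy call for the second function depends on the compiler and flags, so check the generated assembly rather than assuming it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: as far as the compiler knows, dst and src may overlap,
// so it has to preserve the semantics of a forward element-by-element copy.
void copyMayAlias(int* dst, const int* src, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Hypothetical helper: __restrict promises the two ranges do not overlap,
// which is the extra information the optimizer needs to treat the loop
// like memcpy.
void copyNoAlias(int* __restrict dst, const int* __restrict src, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// For disjoint ranges both versions must produce the same result.
bool disjointCopiesAgree() {
  int src[4] = {1, 2, 3, 4};
  int a[4] = {};
  int b[4] = {};
  copyMayAlias(a, src, 4);
  copyNoAlias(b, src, 4);
  for (int i = 0; i < 4; ++i) {
    if (a[i] != b[i] || a[i] != src[i]) return false;
  }
  assert(a[0] == 1 && b[3] == 4);
  return true;
}
```

Calling `copyNoAlias` with ranges that do overlap would break the promise made by `__restrict` and is undefined behavior, which is why the caller has to be sure about it.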

mikea · 2024-05-10T20:10:44Z

> There is probably a significant performance difference between memcpy and a sequential loop, since memcpy can use vector instructions and such.

The compiler happily generates vector instructions too when you mark the function as noinline. I do not think memcpy has an edge here.

> In your godbolt, if you mark copy() as noinline, then it no longer optimizes to memcpy. If you add __restrict to one of the input pointers, then it is able to use memcpy again. The problem is that the compiler can only optimize to memcpy if it knows the arrays do not overlap.

But I do have the method as inline.

> In real-world applications, it will probably often be the case that the compiler does not know if the two ArrayPtrs point at overlapping memory, so it won't be able to optimize to memcpy.

Would you rather I used a manual memcpy or used restrict pointers to help the compiler figure it out?

kentonv · 2024-05-10T21:21:00Z

> But I do have the method as inline.

Yes, but the point is not the inlining itself. In your particular test case, the compiler can see at the inlined call site that the arrays do not overlap. If it couldn't see that, then even with inlining it would still have the same problem.

> The compiler happily generates vector instructions too when you mark the function as noinline.

Normally, a loop like this without restrict pointers cannot use vector instructions: the loop cannot be unrolled, because the behavior would differ in the case of overlap.

However, you are right that in the godbolt example, marking the copy() function noinline does produce an implementation with vector ops.

That appears to be because it produces a 100-line-long output that dynamically tests for overlap and chooses an optimized or unoptimized approach depending on what it finds. We really don't want to be generating this quantity of code whenever we do a copy. We really need to let the compiler know that it's safe to assume no overlap.

> Would you rather I used a manual memcpy or used restrict pointers to help the compiler figure it out?

Since it looks like the compiler will optimize a restrict pointer copy into memcpy(), I guess either way is equivalent. I don't have a strong opinion.
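
The remark that "the behavior would differ in the case of overlap" can be illustrated with a small sketch. The helper names below (`forwardCopy`, `shiftWithLoop`, `shiftWithMemmove`, `demoOverlap`) are hypothetical and not part of KJ. A plain forward loop over overlapping ranges re-reads elements it has already overwritten, so it gives a different result than an overlap-aware copy such as `std::memmove`. That well-defined difference is what the compiler must preserve unless it is told the ranges are disjoint.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: forward element-by-element copy, no aliasing promises.
void forwardCopy(int* dst, const int* src, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Shift the first four elements right by one using the forward loop.
// Each write clobbers the element the next iteration reads.
std::array<int, 5> shiftWithLoop() {
  std::array<int, 5> data = {1, 2, 3, 4, 5};
  forwardCopy(data.data() + 1, data.data(), 4);
  return data;  // {1, 1, 1, 1, 1}
}

// Same shift with memmove, which is specified to handle overlap.
std::array<int, 5> shiftWithMemmove() {
  std::array<int, 5> data = {1, 2, 3, 4, 5};
  std::memmove(data.data() + 1, data.data(), 4 * sizeof(int));
  return data;  // {1, 1, 2, 3, 4}
}

// The two approaches disagree on overlapping input.
void demoOverlap() {
  assert(shiftWithLoop() != shiftWithMemmove());
}
```

Because the loop's result on overlapping input is well defined, an optimizer that cannot rule out overlap has to either keep the scalar loop or add a runtime overlap test, which matches the large output described above.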

jasnell · 2024-05-10T21:34:16Z

LGTM from an API perspective. I'll leave it to someone else to hit the approve button (I'm assuming after the compiler optimization discussion is resolved)

mikea · 2024-05-13T15:35:22Z

Updated the loop to be more optimization-friendly and added an intersect check. PTAL.
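
For readers without the diff at hand, the following is a rough sketch of what a copy method with an intersection check could look like. It is an assumption-laden illustration, not the merged KJ code: `Span`, `overlaps` and `demoCopyFrom` are hypothetical names, the class is a stand-in rather than `kj::ArrayPtr`, and the use of `assert` plus `__restrict` (a compiler extension) is one possible way to combine the two ideas discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a pointer+size view. This is NOT kj::ArrayPtr.
template <typename T>
class Span {
public:
  Span(T* ptr, std::size_t size) : ptr_(ptr), size_(size) {}

  T* begin() const { return ptr_; }
  T* end() const { return ptr_ + size_; }
  std::size_t size() const { return size_; }

  // True if the half-open ranges [begin, end) intersect.
  // std::less gives a total order even for pointers into different arrays.
  bool overlaps(Span<const T> other) const {
    std::less<const T*> before;
    return before(ptr_, other.end()) && before(other.begin(), end());
  }

  // Copy element by element; sizes must match and ranges must be disjoint.
  void copyFrom(Span<const T> other) const {
    assert(size_ == other.size());
    assert(!overlaps(other));
    // The checks above justify the no-overlap promise made here.
    T* __restrict dst = ptr_;
    const T* __restrict src = other.begin();
    for (std::size_t i = 0; i < size_; ++i) dst[i] = src[i];
  }

private:
  T* ptr_;
  std::size_t size_;
};

// Usage: copy four ints between two disjoint arrays.
bool demoCopyFrom() {
  int src[4] = {1, 2, 3, 4};
  int dst[4] = {};
  Span<int>(dst, 4).copyFrom(Span<const int>(src, 4));
  return dst[0] == 1 && dst[1] == 2 && dst[2] == 3 && dst[3] == 4;
}
```

Unlike a raw memcpy call, the element type is checked at compile time, and the element-wise assignment would also work for types that are not trivially copyable.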

jasnell · 2024-05-13T16:07:00Z

Still LGTM but will let someone else hit the approve button

mikea requested review from jasnell and kentonv May 10, 2024 17:37

jasnell reviewed May 10, 2024

mikea force-pushed the maizatskyi/2024-05-10-copy-from branch from 438b062 to 02db659 Compare May 13, 2024 15:32

mikea requested a review from jasnell May 13, 2024 15:35

kentonv approved these changes May 13, 2024

kj::ArrayPtr<T>::copyFrom method

61a54b5

Type-safe alternative to memcpy. As with the fill case compilers are pretty good at optimizing it too: https://godbolt.org/z/dfMxeT1M5

mikea force-pushed the maizatskyi/2024-05-10-copy-from branch from 02db659 to 61a54b5 Compare May 13, 2024 16:46

mikea merged commit 8dd8823 into v2 May 13, 2024
14 checks passed

mikea deleted the maizatskyi/2024-05-10-copy-from branch May 13, 2024 16:55
