Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86-64: Implement an enhancement for byte array and char array System.arraycopy #7332

Merged
merged 3 commits into from
May 27, 2024

Conversation

a7ehuo
Copy link
Contributor

@a7ehuo a7ehuo commented May 10, 2024

Implement an enhancement for byte array and char array System.arraycopy on x86-64.

Also add JIT options that disable array copy enhancement:

  • disableArrayCopyEnhancementByteArray: Disable array copy enhancement for 8 bit primitive array
  • disableArrayCopyEnhancementCharArray: Disable array copy enhancement for 16 bit primitive array

@a7ehuo
Copy link
Contributor Author

a7ehuo commented May 10, 2024

@0xdaryl May I ask you to review this change? Thank you!

@vijaysun-omr @hzongaro fyi

@a7ehuo
Copy link
Contributor Author

a7ehuo commented May 10, 2024

x86-64 macOS failure is a known issue #7181

The following tests FAILED:
	  2 - TestJitBuilderAPIGenerator (Failed)

@0xdaryl
Copy link
Contributor

0xdaryl commented May 15, 2024

Can you attach a JIT log for a method that shows the code generated for these functions? Just a tracecg of a single method should be fine. Thanks.

@a7ehuo
Copy link
Contributor Author

a7ehuo commented May 15, 2024

Can you attach a JIT log for a method that shows the code generated for these functions?

Search arrayCopy8BitPrimitiveEnhancementImplRoot8 for byte array or arrayCopy16BitPrimitiveEnhancementImplRoot16 for char array to jump to the code gen section on arraycopy

jitlog.arrayCopy8BitPrimitiveEnhancementImplRoot8.txt

jitlog.arrayCopy16BitPrimitiveEnhancementImplRoot16.txt

@0xdaryl
Copy link
Contributor

0xdaryl commented May 17, 2024

Generally, I think the instruction sequences are reasonable. Please consider Brad’s suggestions where appropriate for using wider instructions on hardware that supports it.

My main feedback concerns the naming of these optimizations. Calling them “enhancements”, while technically correct, is too generic for someone who wasn’t involved in its development up front. There aren’t even any code comments to describe what the enhancements are, leaving others to read and understand the code to learn about them. I suggest incorporating something like “inlineSmallArraycopiesWithoutSTOS” into the naming would make it better. For example, “inlineSmall16BitPrimitiveArraycopiesWithoutSTOS”

Longer term, I’m not convinced that inlining this much code at each arraycopy call (over 50 instructions) is worth the simplicity and savings from calling out to a helper to do the same thing. I’m willing to let the current inline design go through now, but I would like a helper approach evaluated for the future. This might be fine for a method with one or two arraycopies, but if there are several arraycopies in a method (and we do see methods like that) the code footprint will add up quickly.

@a7ehuo
Copy link
Contributor Author

a7ehuo commented May 21, 2024

I'm currently running sanity test on the changes that address the above comments. I'll push another update once the sanity test looks good

@a7ehuo a7ehuo force-pushed the system-arraycopy-perf-10-byte-char branch from 6cc0763 to 9065569 Compare May 22, 2024 17:36
@a7ehuo
Copy link
Contributor Author

a7ehuo commented May 22, 2024

@0xdaryl Comments are addressed. Ready for another review. Thank you!

  • Fixed temp vector register type to TR_VRF
  • Updated the function name and their corresponding option names to be more specific
    • arrayCopy8BitPrimitiveInlineSmallSizeWithoutREPMOVS
    • arrayCopy16BitPrimitiveInlineSmallSizeWithoutREPMOVS
  • Added comments in the beginning of the implementation to give a detailed description of the inlined code sequence

@0xdaryl 0xdaryl self-assigned this May 23, 2024
Copy link
Contributor

@0xdaryl 0xdaryl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the comments describing the operation. They look good!

I missed publishing a couple of other comments in my earlier review for some reason. They are minor changes only.

I realize the test farm isn't in great shape right now. Can you confirm you tested this code internally on both 64-bit and 32-bit?

compiler/x/codegen/OMRTreeEvaluator.cpp Outdated Show resolved Hide resolved
compiler/x/codegen/OMRTreeEvaluator.cpp Outdated Show resolved Hide resolved
@0xdaryl
Copy link
Contributor

0xdaryl commented May 23, 2024

@BradleyWood : please approve if your comments are addressed

@a7ehuo
Copy link
Contributor Author

a7ehuo commented May 23, 2024

Can you confirm you tested this code internally on both 64-bit and 32-bit?

I tested only on 64-bit machines since I thought the enhancement was guarded with checking if target is 64-bit. I'll test on 32-bit as well after addressing the latest comments.

bool enable8BitPrimitiveArrayCopyInlineSmallSizeWithoutREPMOVS = ((elementSize == 1) &&
                                                                     !disableEnhancement &&
                                                                     cg->comp()->target().cpu.supportsAVX() &&
                                                                     cg->comp()->target().is64Bit()) ? true : false;
bool enable16BitPrimitiveArrayCopyInlineSmallSizeWithoutREPMOVS = (!disableEnhancement &&
                                                                 cg->comp()->target().cpu.supportsAVX() &&
                                                                 cg->comp()->target().is64Bit()) ? true : false;

a7ehuo and others added 3 commits May 23, 2024 14:15
The setup to run `rep movsd` is not efficient on copying smaller sizes.
The enhancement inlines copy size equal or less than 64 bytes without
using `rep movsd`.

Signed-off-by: Annabelle Huo <Annabelle.Huo@ibm.com>
The setup to run `rep movsb` is not efficient on copying smaller sizes.
The enhancement inlines copy size equal or less than 64 bytes without
using `rep movsb`.

Co-Authored-By: Henry Zongaro <zongaro@ca.ibm.com>
Signed-off-by: Annabelle Huo <Annabelle.Huo@ibm.com>
`disableArrayCopyByteArrayInlineSmallSizeWithoutREPMOVS`
- Disable array copy enhancement for 8 bit primitive array

`disableArrayCopyCharArrayInlineSmallSizeWithoutREPMOVS`
- Disable array copy enhancement for 16 bit primitive array

Signed-off-by: Annabelle Huo <Annabelle.Huo@ibm.com>
@a7ehuo a7ehuo force-pushed the system-arraycopy-perf-10-byte-char branch from 9065569 to a81d5c0 Compare May 27, 2024 12:48
@a7ehuo
Copy link
Contributor Author

a7ehuo commented May 27, 2024

@0xdaryl Addressed all review comments. Ready for another review. Thank you!

  • Internal Jenkins test (ID 22283): JDK 8 sanity.functional,sanity.system,sanity.openjdk,extended.system,extended.functional with x86-32_windows, all passed.
  • Internal Jenkins test (ID 22275): JDK11 and JDK 17 sanity.functional,sanity.system,sanity.openjdk,extended.system,extended.functional with x86-64_linux : Seen some failures but the failures are not related to this change

@0xdaryl 0xdaryl merged commit 5c74758 into eclipse:master May 27, 2024
4 of 6 checks passed
@0xdaryl
Copy link
Contributor

0xdaryl commented May 27, 2024

@vijaysun-omr : FYI

@vijaysun-omr
Copy link
Contributor

Since we want to keep all codegens aware of key changes in one, I thought I'd mention @zl-wang @r30shah and @knn-k here so that they can consider whatever feels relevant to their platform

@knn-k
Copy link
Contributor

knn-k commented May 31, 2024

AArch64 codegen inlines arraycopy only when the copy length is a constant at the compile time. Otherwise it generates instructions to call arraycopy helpers in https://github.com/eclipse/omr/blob/master/compiler/aarch64/runtime/ARM64arrayCopy.spp
Maybe I can add special helpers for arrays of short/int/long types that skip the checks for lower bits of the copy length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants