Use Bmi2 instruction to optimize compact protocol int64 code and deco… #2780

Open
wants to merge 1 commit into master

Conversation

zzachimed

…de, in our unit test. The benchmark shows a 30% performance boost for encode and a 5% performance boost for decode.
origin bench:
$ ./GbenchMarkForCompactProtocol
2023-04-10T00:56:33+08:00
Running ./GbenchMarkForCompactProtocol
Run on (128 X 3300 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x64)
  L1 Instruction 32 KiB (x64)
  L2 Unified 1280 KiB (x64)
  L3 Unified 49152 KiB (x2)
Load Average: 0.71, 0.57, 0.24
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
BM_Int64_Encode_1        22.1 ns         22.1 ns     31677633
BM_Int64_Encode_2        23.7 ns         23.7 ns     29457528
BM_Int64_Encode_3        26.1 ns         26.1 ns     26844781
BM_Int64_Encode_4        27.1 ns         27.1 ns     25705377
BM_Int64_Encode_5        30.0 ns         29.9 ns     23345095
BM_Int64_Encode_6        32.3 ns         32.3 ns     21658363
BM_Int64_Encode_7        34.5 ns         34.5 ns     20256118
BM_Int64_Encode_8        36.3 ns         36.3 ns     19234451
BM_Int64_Encode_9        37.8 ns         37.8 ns     18510508
BM_Int64_Encode_10       39.8 ns         39.8 ns     17531197
BM_Int64_Decode_1         485 ns          486 ns      1439281
BM_Int64_Decode_2         489 ns          491 ns      1424556
BM_Int64_Decode_3         493 ns          495 ns      1413917
BM_Int64_Decode_4         491 ns          493 ns      1421200
BM_Int64_Decode_5         495 ns          497 ns      1408950
BM_Int64_Decode_6         496 ns          498 ns      1406069
BM_Int64_Decode_7         504 ns          506 ns      1383636
BM_Int64_Decode_8         507 ns          509 ns      1374477
BM_Int64_Decode_9         505 ns          507 ns      1380481
BM_Int64_Decode_10        503 ns          505 ns      1386843

optimized bench:
$ ./GbenchMarkForCompactProtocol
2023-04-10T01:09:25+08:00
Running ./GbenchMarkForCompactProtocol
Run on (128 X 3300 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x64)
  L1 Instruction 32 KiB (x64)
  L2 Unified 1280 KiB (x64)
  L3 Unified 49152 KiB (x2)
Load Average: 0.00, 0.24, 0.31
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
BM_Int64_Encode_1        21.0 ns         21.0 ns     30414973
BM_Int64_Encode_2        22.3 ns         22.2 ns     28778678
BM_Int64_Encode_3        25.6 ns         25.6 ns     27278662
BM_Int64_Encode_4        29.1 ns         29.1 ns     23984143
BM_Int64_Encode_5        29.3 ns         29.2 ns     23856317
BM_Int64_Encode_6        29.6 ns         29.6 ns     23585080
BM_Int64_Encode_7        29.9 ns         29.8 ns     23423250
BM_Int64_Encode_8        29.7 ns         29.7 ns     23496002
BM_Int64_Encode_9        28.9 ns         28.9 ns     24128142
BM_Int64_Encode_10       28.9 ns         28.8 ns     24132967
BM_Int64_Decode_1         480 ns          480 ns      1430995
BM_Int64_Decode_2         481 ns          481 ns      1390746
BM_Int64_Decode_3         490 ns          492 ns      1395035
BM_Int64_Decode_4         495 ns          496 ns      1409327
BM_Int64_Decode_5         500 ns          501 ns      1398344
BM_Int64_Decode_6         495 ns          497 ns      1408096
BM_Int64_Decode_7         499 ns          501 ns      1398316
BM_Int64_Decode_8         496 ns          497 ns      1409067
BM_Int64_Decode_9         501 ns          503 ns      1392703
BM_Int64_Decode_10        500 ns          502 ns      1395121
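
For context, here is a minimal sketch of the kind of BMI2 varint fast path this change describes. It is not the patch itself: the function names, the 8-byte (56-bit) fast-path cutoff, and the buffer-size requirements are assumptions for illustration. In the compact protocol the int64 is zigzag-encoded before the varint step shown here, and values needing more than 8 varint bytes would fall back to the existing byte-by-byte loop.

#include <immintrin.h>  // _pdep_u64, _pext_u64, _bzhi_u64, _lzcnt_u64, _tzcnt_u64
#include <cstddef>
#include <cstdint>
#include <cstring>
// Build with e.g. -march=haswell (needs BMI1, BMI2 and LZCNT).

// Encode a value that fits in 56 bits (at most 8 varint bytes).
// Assumes `out` has at least 8 writable bytes and a little-endian CPU,
// which is always the case where BMI2 exists.
inline std::size_t encodeVarint56Bmi2(std::uint64_t val, std::uint8_t* out) {
  // Deposit each 7-bit group of `val` into the low 7 bits of one output byte.
  std::uint64_t spread = _pdep_u64(val, 0x7f7f7f7f7f7f7f7fULL);
  // Varint length = ceil(bit_length / 7), at least 1 byte.
  int bits = 64 - static_cast<int>(_lzcnt_u64(val | 1));
  std::size_t nbytes = static_cast<std::size_t>((bits + 6) / 7);
  // Set the continuation bit (0x80) on every byte except the last one.
  std::uint64_t cont =
      _bzhi_u64(0x8080808080808080ULL, static_cast<unsigned>(8 * (nbytes - 1)));
  std::uint64_t encoded = spread | cont;
  std::memcpy(out, &encoded, sizeof(encoded));  // writes 8 bytes; only nbytes are meaningful
  return nbytes;
}

// Decode a varint spanning at most 8 bytes; `in` must have 8 readable bytes.
// Returns false when the varint is longer, so the caller can take the scalar path.
inline bool decodeVarint56Bmi2(const std::uint8_t* in, std::uint64_t* value,
                               std::size_t* len) {
  std::uint64_t chunk;
  std::memcpy(&chunk, in, sizeof(chunk));
  // A clear top bit marks the final byte of the varint.
  std::uint64_t stop = ~chunk & 0x8080808080808080ULL;
  if (stop == 0) return false;  // longer than 8 bytes
  std::size_t nbytes = static_cast<std::size_t>(_tzcnt_u64(stop) >> 3) + 1;
  // Keep only the consumed bytes, then gather their 7-bit payloads.
  std::uint64_t masked = _bzhi_u64(chunk, static_cast<unsigned>(8 * nbytes));
  *value = _pext_u64(masked, 0x7f7f7f7f7f7f7f7fULL);
  *len = nbytes;
  return true;
}

The point of the pdep/pext approach is to replace the data-dependent loop with a few straight-line instructions, which is consistent with the encode gains above growing as the varints get longer.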

  • Did you create an Apache Jira ticket? (Request account here, not required for trivial changes)
  • If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • Did you squash your changes to a single commit? (not required, but preferred)
  • Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • If your change does not involve any code, include [skip ci] anywhere in the commit message to free up build resources.

@Jens-G added the c++ label Apr 9, 2023
@emmenlau
Member

Wow, nice work! But it's not easy to validate. One option would be to keep this optional and disabled by default, so people can experiment with the new code?
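
For illustration, one way such an opt-in could look is a build switch combined with a runtime CPU check, defaulting to the current portable loop. The macro and function names below are placeholders, not taken from this PR:

#include <cstddef>
#include <cstdint>

// Existing byte-by-byte implementation and the proposed BMI2 fast path
// (placeholder declarations for this sketch).
std::size_t writeVarint64Portable(std::uint64_t value, std::uint8_t* out);
std::size_t writeVarint64Bmi2(std::uint64_t value, std::uint8_t* out);

std::size_t writeVarint64(std::uint64_t value, std::uint8_t* out) {
#if defined(THRIFT_ENABLE_BMI2) && defined(__BMI2__)
  // Runtime check (GCC/Clang builtin) keeps the binary safe on CPUs without BMI2.
  if (__builtin_cpu_supports("bmi2")) {
    return writeVarint64Bmi2(value, out);
  }
#endif
  return writeVarint64Portable(value, out);
}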
