Use BMI2 instructions to optimize compact protocol int64 encode and deco… #2780
…de, in our unit
test. It shows a 30% performance boost for encode and a 5% performance boost for decode.
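To illustrate the kind of technique the title refers to, here is a minimal sketch of varint encoding built on the BMI2 `PDEP` operation. It is not the code from this pull request: the function names (`pdep64`, `pdep_soft`, `zigzag64`, `encode_varint64_loop`, `encode_varint64_pdep`) are hypothetical, and the portable fallback exists only so the sketch compiles without `-mbmi2`. It assumes the usual compact protocol layout, where an i64 is zigzag-mapped and then written as a little-endian base-128 varint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>  // _pdep_u64
#endif

// Portable stand-in for PDEP: scatter the low bits of src into the set bits of mask.
inline uint64_t pdep_soft(uint64_t src, uint64_t mask) {
  uint64_t result = 0;
  for (uint64_t bit = 1; mask != 0; bit <<= 1) {
    if (src & bit) result |= mask & (~mask + 1);  // lowest set bit of mask
    mask &= mask - 1;                              // clear that bit
  }
  return result;
}

// Uses the hardware instruction when the compiler targets BMI2, else the fallback.
inline uint64_t pdep64(uint64_t src, uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
  return _pdep_u64(src, mask);
#else
  return pdep_soft(src, mask);
#endif
}

// Zigzag mapping: small negative and positive values both become small unsigned values.
inline uint64_t zigzag64(int64_t n) {
  return (static_cast<uint64_t>(n) << 1) ^ static_cast<uint64_t>(n >> 63);
}

// Reference encoder: one 7-bit group per loop iteration.
inline size_t encode_varint64_loop(uint64_t v, uint8_t* out) {
  size_t n = 0;
  while (v >= 0x80) {
    out[n++] = static_cast<uint8_t>((v & 0x7F) | 0x80);
    v >>= 7;
  }
  out[n++] = static_cast<uint8_t>(v);
  return n;
}

// PDEP-based encoder: all 7-bit groups are spread into bytes by two deposits.
// out must have room for 10 bytes.
inline size_t encode_varint64_pdep(uint64_t v, uint8_t* out) {
  size_t n = 1;  // encoded length in bytes
  for (uint64_t t = v >> 7; t != 0; t >>= 7) ++n;
  assert(n <= 10);

  const uint64_t lo = pdep64(v, 0x7F7F7F7F7F7F7F7FULL);  // bits 0..55 -> bytes 0..7
  const uint64_t hi = pdep64(v >> 56, 0x017FULL);        // bits 56..63 -> bytes 8..9

  for (size_t i = 0; i < n; ++i) {
    uint64_t byte = (i < 8) ? (lo >> (8 * i)) : (hi >> (8 * (i - 8)));
    byte &= 0xFF;
    if (i + 1 < n) byte |= 0x80;  // continuation bit on every byte but the last
    out[i] = static_cast<uint8_t>(byte);
  }
  return n;
}
```

A signed value would be written as `encode_varint64_pdep(zigzag64(x), buf)`. The sketch stores bytes one at a time for clarity; a tuned version would more likely OR a continuation mask into `lo` and write it with a single 8-byte store, which is where the gain for longer encodings would be expected to come from.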
Original bench:
$ ./GbenchMarkForCompactProtocol
2023-04-10T00:56:33+08:00
Running ./GbenchMarkForCompactProtocol
Run on (128 X 3300 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x64)
L1 Instruction 32 KiB (x64)
L2 Unified 1280 KiB (x64)
L3 Unified 49152 KiB (x2)
Load Average: 0.71, 0.57, 0.24
Benchmark Time CPU Iterations
BM_Int64_Encode_1 22.1 ns 22.1 ns 31677633
BM_Int64_Encode_2 23.7 ns 23.7 ns 29457528
BM_Int64_Encode_3 26.1 ns 26.1 ns 26844781
BM_Int64_Encode_4 27.1 ns 27.1 ns 25705377
BM_Int64_Encode_5 30.0 ns 29.9 ns 23345095
BM_Int64_Encode_6 32.3 ns 32.3 ns 21658363
BM_Int64_Encode_7 34.5 ns 34.5 ns 20256118
BM_Int64_Encode_8 36.3 ns 36.3 ns 19234451
BM_Int64_Encode_9 37.8 ns 37.8 ns 18510508
BM_Int64_Encode_10 39.8 ns 39.8 ns 17531197
BM_Int64_Decode_1 485 ns 486 ns 1439281
BM_Int64_Decode_2 489 ns 491 ns 1424556
BM_Int64_Decode_3 493 ns 495 ns 1413917
BM_Int64_Decode_4 491 ns 493 ns 1421200
BM_Int64_Decode_5 495 ns 497 ns 1408950
BM_Int64_Decode_6 496 ns 498 ns 1406069
BM_Int64_Decode_7 504 ns 506 ns 1383636
BM_Int64_Decode_8 507 ns 509 ns 1374477
BM_Int64_Decode_9 505 ns 507 ns 1380481
BM_Int64_Decode_10 503 ns 505 ns 1386843
Optimized bench:
$ ./GbenchMarkForCompactProtocol
2023-04-10T01:09:25+08:00
Running ./GbenchMarkForCompactProtocol
Run on (128 X 3300 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x64)
L1 Instruction 32 KiB (x64)
L2 Unified 1280 KiB (x64)
L3 Unified 49152 KiB (x2)
Load Average: 0.00, 0.24, 0.31
Benchmark Time CPU Iterations
BM_Int64_Encode_1 21.0 ns 21.0 ns 30414973
BM_Int64_Encode_2 22.3 ns 22.2 ns 28778678
BM_Int64_Encode_3 25.6 ns 25.6 ns 27278662
BM_Int64_Encode_4 29.1 ns 29.1 ns 23984143
BM_Int64_Encode_5 29.3 ns 29.2 ns 23856317
BM_Int64_Encode_6 29.6 ns 29.6 ns 23585080
BM_Int64_Encode_7 29.9 ns 29.8 ns 23423250
BM_Int64_Encode_8 29.7 ns 29.7 ns 23496002
BM_Int64_Encode_9 28.9 ns 28.9 ns 24128142
BM_Int64_Encode_10 28.9 ns 28.8 ns 24132967
BM_Int64_Decode_1 480 ns 480 ns 1430995
BM_Int64_Decode_2 481 ns 481 ns 1390746
BM_Int64_Decode_3 490 ns 492 ns 1395035
BM_Int64_Decode_4 495 ns 496 ns 1409327
BM_Int64_Decode_5 500 ns 501 ns 1398344
BM_Int64_Decode_6 495 ns 497 ns 1408096
BM_Int64_Decode_7 499 ns 501 ns 1398316
BM_Int64_Decode_8 496 ns 497 ns 1409067
BM_Int64_Decode_9 501 ns 503 ns 1392703
BM_Int64_Decode_10 500 ns 502 ns 1395121