lockfree fixed size sublist cache #4451

Open
wants to merge 2 commits into
base: main

Conversation

HeavyHorst

  • Link to issue, e.g. Resolves #NNN
  • Documentation added (if applicable)
  • Tests added
  • Branch rebased on top of current main (git pull --rebase origin main)
  • Changes squashed to a single commit (described here)
  • Build is green in Travis CI
  • You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license

Resolves #4341

Changes proposed in this pull request:

The sublist cache currently uses a map[string]*SublistResult to store the values.
A Go map has no built-in size limit, so the current implementation runs reduceCacheCount() to remove random elements from the cache once it holds slCacheMax = 1024 elements.
It removes elements until only slCacheSweep = 256 are left. This becomes really expensive when there are many subscriptions and the hit rate is low, because the cache is then constantly deleting elements and holding the lock for a relatively long time.
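
To make the cost described above concrete, here is a minimal sketch of that sweep pattern. It is a simplified model, not the server's code: the mapCache type, its add method, and the synchronous sweep are assumptions made for illustration; only the map type, reduceCacheCount, slCacheMax and slCacheSweep come from the description.

```go
package main

import (
	"fmt"
	"sync"
)

// Constants named in the PR description.
const (
	slCacheMax   = 1024
	slCacheSweep = 256
)

// SublistResult is a stand-in for the server's result type.
type SublistResult struct{}

// mapCache is a hypothetical, simplified model of the map-backed cache.
type mapCache struct {
	mu    sync.Mutex
	cache map[string]*SublistResult
}

func newMapCache() *mapCache {
	return &mapCache{cache: make(map[string]*SublistResult)}
}

// add stores a result and sweeps once the map grows past slCacheMax.
// The sweep runs inline here to keep the sketch short (an assumption).
func (c *mapCache) add(subject string, r *SublistResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cache[subject] = r
	if len(c.cache) > slCacheMax {
		c.reduceCacheCount()
	}
}

// reduceCacheCount deletes entries in map iteration order, which is
// effectively random, until only slCacheSweep entries remain.
// The caller must hold c.mu, so every other cache user waits meanwhile.
func (c *mapCache) reduceCacheCount() {
	for key := range c.cache {
		delete(c.cache, key)
		if len(c.cache) <= slCacheSweep {
			break
		}
	}
}

func main() {
	c := newMapCache()
	// A low hit rate means almost every publish adds a new key,
	// so the sweep is triggered over and over again.
	for i := 0; i < 5000; i++ {
		c.add(fmt.Sprintf("foo.%d", i), &SublistResult{})
	}
	fmt.Println("entries after 5000 unique subjects:", len(c.cache))
}
```

With mostly unique subjects, every sweep throws away roughly three quarters of the map under the lock, and the next few hundred inserts trigger the same work again.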

I experimented with a simple fixed-size cache implementation that is backed by a fixed-size array.
It's also lock-free, using atomic.Pointer. I haven't implemented a collision strategy yet (not sure it's needed for this cache).

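// Cache is a fixed-size cache backed by an array of atomic pointers.
type Cache struct {
	data     [slCacheMax]atomic.Pointer[cacheEntry]
	hashFunc func(string) int // hashes a key
}

// cacheEntry pairs a key with its cached result.
type cacheEntry struct {
	key   string
	value *SublistResult
}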

This implementation ensures that there are never more than slCacheMax elements in the cache with zero overhead.
It's also faster in all of my tests. I still need to add unit tests but already wanted to drop it here for discussion.
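
For discussion, this is roughly how lookups and stores can work on top of the two structs above. It is only a sketch under assumptions: NewCache, Get, Set and the inline FNV-1a hash are hypothetical names and choices made for illustration, and a colliding key simply overwrites the slot because no collision strategy exists.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const slCacheMax = 1024

// SublistResult is a stand-in for the server's result type.
type SublistResult struct{ note string }

type cacheEntry struct {
	key   string
	value *SublistResult
}

type Cache struct {
	data     [slCacheMax]atomic.Pointer[cacheEntry]
	hashFunc func(string) int
}

// fnv1aSlot is an illustrative hash (FNV-1a) reduced to a slot index.
// It is an assumption, not the hash used in the PR.
func fnv1aSlot(s string) int {
	h := uint32(2166136261)
	for i := 0; i < len(s); i++ {
		h ^= uint32(s[i])
		h *= 16777619
	}
	return int(h % slCacheMax)
}

// NewCache is a hypothetical constructor.
func NewCache() *Cache {
	return &Cache{hashFunc: fnv1aSlot}
}

// Get loads the slot atomically and compares the stored key, so a slot
// that was overwritten by a colliding key is reported as a miss.
func (c *Cache) Get(key string) (*SublistResult, bool) {
	e := c.data[c.hashFunc(key)].Load()
	if e == nil || e.key != key {
		return nil, false
	}
	return e.value, true
}

// Set publishes a new immutable entry with a single atomic store.
// Whatever was in the slot before is replaced.
func (c *Cache) Set(key string, v *SublistResult) {
	c.data[c.hashFunc(key)].Store(&cacheEntry{key: key, value: v})
}

func main() {
	c := NewCache()
	c.Set("foo.bar", &SublistResult{note: "cached"})

	_, hit := c.Get("foo.bar")
	_, miss := c.Get("foo.baz")
	fmt.Println(hit, miss) // true false
}
```

Because entries are never mutated after they are stored, readers either see the old entry or the new one, which is why no lock is needed.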

nats bench results on my Ryzen 9 5900HX:

Old Cache

Old cache slCacheMax 1024:
nats bench foo --msgs 1000000 --pub 100 --no-progress --multisubject --multisubjectmax 100000
Pub stats: 1,465,732 msgs/sec ~ 178.92 MB/sec
[1] 27,027 msgs/sec ~ 3.30 MB/sec (10000 msgs)
[2] 25,414 msgs/sec ~ 3.10 MB/sec (10000 msgs)
[3] 25,258 msgs/sec ~ 3.08 MB/sec (10000 msgs)
[4] 24,885 msgs/sec ~ 3.04 MB/sec (10000 msgs)
[5] 24,929 msgs/sec ~ 3.04 MB/sec (10000 msgs)
[6] 23,350 msgs/sec ~ 2.85 MB/sec (10000 msgs)
[7] 22,923 msgs/sec ~ 2.80 MB/sec (10000 msgs)
[8] 22,887 msgs/sec ~ 2.79 MB/sec (10000 msgs)
[9] 22,801 msgs/sec ~ 2.78 MB/sec (10000 msgs)
[10] 21,709 msgs/sec ~ 2.65 MB/sec (10000 msgs)
[11] 21,549 msgs/sec ~ 2.63 MB/sec (10000 msgs)
[12] 21,456 msgs/sec ~ 2.62 MB/sec (10000 msgs)
[13] 21,184 msgs/sec ~ 2.59 MB/sec (10000 msgs)
[14] 20,086 msgs/sec ~ 2.45 MB/sec (10000 msgs)
[15] 19,926 msgs/sec ~ 2.43 MB/sec (10000 msgs)
[16] 19,687 msgs/sec ~ 2.40 MB/sec (10000 msgs)
[17] 19,500 msgs/sec ~ 2.38 MB/sec (10000 msgs)
[18] 19,379 msgs/sec ~ 2.37 MB/sec (10000 msgs)
[19] 19,370 msgs/sec ~ 2.36 MB/sec (10000 msgs)
[20] 19,321 msgs/sec ~ 2.36 MB/sec (10000 msgs)
[21] 19,313 msgs/sec ~ 2.36 MB/sec (10000 msgs)
[22] 19,315 msgs/sec ~ 2.36 MB/sec (10000 msgs)
[23] 19,257 msgs/sec ~ 2.35 MB/sec (10000 msgs)
[24] 19,192 msgs/sec ~ 2.34 MB/sec (10000 msgs)
[25] 19,190 msgs/sec ~ 2.34 MB/sec (10000 msgs)
[26] 19,175 msgs/sec ~ 2.34 MB/sec (10000 msgs)
[27] 19,082 msgs/sec ~ 2.33 MB/sec (10000 msgs)
[28] 18,537 msgs/sec ~ 2.26 MB/sec (10000 msgs)
[29] 18,517 msgs/sec ~ 2.26 MB/sec (10000 msgs)
[30] 18,484 msgs/sec ~ 2.26 MB/sec (10000 msgs)
[31] 18,395 msgs/sec ~ 2.25 MB/sec (10000 msgs)
[32] 18,305 msgs/sec ~ 2.23 MB/sec (10000 msgs)
[33] 18,196 msgs/sec ~ 2.22 MB/sec (10000 msgs)
[34] 18,029 msgs/sec ~ 2.20 MB/sec (10000 msgs)
[35] 18,028 msgs/sec ~ 2.20 MB/sec (10000 msgs)
[36] 18,027 msgs/sec ~ 2.20 MB/sec (10000 msgs)
[37] 17,820 msgs/sec ~ 2.18 MB/sec (10000 msgs)
[38] 17,660 msgs/sec ~ 2.16 MB/sec (10000 msgs)
[39] 17,599 msgs/sec ~ 2.15 MB/sec (10000 msgs)
[40] 17,387 msgs/sec ~ 2.12 MB/sec (10000 msgs)
[41] 17,165 msgs/sec ~ 2.10 MB/sec (10000 msgs)
[42] 17,104 msgs/sec ~ 2.09 MB/sec (10000 msgs)
[43] 16,919 msgs/sec ~ 2.07 MB/sec (10000 msgs)
[44] 16,943 msgs/sec ~ 2.07 MB/sec (10000 msgs)
[45] 16,916 msgs/sec ~ 2.06 MB/sec (10000 msgs)
[46] 16,942 msgs/sec ~ 2.07 MB/sec (10000 msgs)
[47] 16,934 msgs/sec ~ 2.07 MB/sec (10000 msgs)
[48] 16,890 msgs/sec ~ 2.06 MB/sec (10000 msgs)
[49] 16,856 msgs/sec ~ 2.06 MB/sec (10000 msgs)
[50] 16,855 msgs/sec ~ 2.06 MB/sec (10000 msgs)
[51] 16,831 msgs/sec ~ 2.05 MB/sec (10000 msgs)
[52] 16,816 msgs/sec ~ 2.05 MB/sec (10000 msgs)
[53] 16,811 msgs/sec ~ 2.05 MB/sec (10000 msgs)
[54] 16,484 msgs/sec ~ 2.01 MB/sec (10000 msgs)
[55] 16,353 msgs/sec ~ 2.00 MB/sec (10000 msgs)
[56] 16,307 msgs/sec ~ 1.99 MB/sec (10000 msgs)
[57] 16,294 msgs/sec ~ 1.99 MB/sec (10000 msgs)
[58] 16,248 msgs/sec ~ 1.98 MB/sec (10000 msgs)
[59] 16,225 msgs/sec ~ 1.98 MB/sec (10000 msgs)
[60] 16,134 msgs/sec ~ 1.97 MB/sec (10000 msgs)
[61] 16,038 msgs/sec ~ 1.96 MB/sec (10000 msgs)
[62] 15,964 msgs/sec ~ 1.95 MB/sec (10000 msgs)
[63] 15,757 msgs/sec ~ 1.92 MB/sec (10000 msgs)
[64] 15,728 msgs/sec ~ 1.92 MB/sec (10000 msgs)
[65] 15,691 msgs/sec ~ 1.92 MB/sec (10000 msgs)
[66] 15,676 msgs/sec ~ 1.91 MB/sec (10000 msgs)
[67] 15,663 msgs/sec ~ 1.91 MB/sec (10000 msgs)
[68] 15,640 msgs/sec ~ 1.91 MB/sec (10000 msgs)
[69] 15,651 msgs/sec ~ 1.91 MB/sec (10000 msgs)
[70] 15,606 msgs/sec ~ 1.91 MB/sec (10000 msgs)
[71] 15,586 msgs/sec ~ 1.90 MB/sec (10000 msgs)
[72] 15,569 msgs/sec ~ 1.90 MB/sec (10000 msgs)
[73] 15,566 msgs/sec ~ 1.90 MB/sec (10000 msgs)
[74] 15,538 msgs/sec ~ 1.90 MB/sec (10000 msgs)
[75] 15,472 msgs/sec ~ 1.89 MB/sec (10000 msgs)
[76] 15,466 msgs/sec ~ 1.89 MB/sec (10000 msgs)
[77] 15,339 msgs/sec ~ 1.87 MB/sec (10000 msgs)
[78] 15,245 msgs/sec ~ 1.86 MB/sec (10000 msgs)
[79] 15,238 msgs/sec ~ 1.86 MB/sec (10000 msgs)
[80] 15,227 msgs/sec ~ 1.86 MB/sec (10000 msgs)
[81] 15,143 msgs/sec ~ 1.85 MB/sec (10000 msgs)
[82] 15,127 msgs/sec ~ 1.85 MB/sec (10000 msgs)
[83] 15,091 msgs/sec ~ 1.84 MB/sec (10000 msgs)
[84] 15,035 msgs/sec ~ 1.84 MB/sec (10000 msgs)
[85] 15,022 msgs/sec ~ 1.83 MB/sec (10000 msgs)
[86] 14,985 msgs/sec ~ 1.83 MB/sec (10000 msgs)
[87] 14,978 msgs/sec ~ 1.83 MB/sec (10000 msgs)
[88] 14,966 msgs/sec ~ 1.83 MB/sec (10000 msgs)
[89] 14,920 msgs/sec ~ 1.82 MB/sec (10000 msgs)
[90] 14,874 msgs/sec ~ 1.82 MB/sec (10000 msgs)
[91] 14,822 msgs/sec ~ 1.81 MB/sec (10000 msgs)
[92] 14,817 msgs/sec ~ 1.81 MB/sec (10000 msgs)
[93] 14,717 msgs/sec ~ 1.80 MB/sec (10000 msgs)
[94] 14,715 msgs/sec ~ 1.80 MB/sec (10000 msgs)
[95] 14,687 msgs/sec ~ 1.79 MB/sec (10000 msgs)
[96] 14,706 msgs/sec ~ 1.80 MB/sec (10000 msgs)
[97] 14,697 msgs/sec ~ 1.79 MB/sec (10000 msgs)
[98] 14,692 msgs/sec ~ 1.79 MB/sec (10000 msgs)
[99] 14,683 msgs/sec ~ 1.79 MB/sec (10000 msgs)
[100] 14,668 msgs/sec ~ 1.79 MB/sec (10000 msgs)
min 14,668 | avg 17,551 | max 27,027 | stddev 2,817 msgs

Old cache, cache disabled:
nats bench foo --msgs 1000000 --pub 100 --no-progress --multisubject --multisubjectmax 100000
Pub stats: 3,794,630 msgs/sec ~ 463.21 MB/sec
[1] 68,086 msgs/sec ~ 8.31 MB/sec (10000 msgs)
[2] 62,718 msgs/sec ~ 7.66 MB/sec (10000 msgs)
[3] 57,589 msgs/sec ~ 7.03 MB/sec (10000 msgs)
[4] 56,457 msgs/sec ~ 6.89 MB/sec (10000 msgs)
[5] 54,014 msgs/sec ~ 6.59 MB/sec (10000 msgs)
[6] 54,073 msgs/sec ~ 6.60 MB/sec (10000 msgs)
[7] 53,102 msgs/sec ~ 6.48 MB/sec (10000 msgs)
[8] 52,543 msgs/sec ~ 6.41 MB/sec (10000 msgs)
[9] 52,269 msgs/sec ~ 6.38 MB/sec (10000 msgs)
[10] 50,392 msgs/sec ~ 6.15 MB/sec (10000 msgs)
[11] 50,989 msgs/sec ~ 6.22 MB/sec (10000 msgs)
[12] 50,920 msgs/sec ~ 6.22 MB/sec (10000 msgs)
[13] 50,902 msgs/sec ~ 6.21 MB/sec (10000 msgs)
[14] 49,691 msgs/sec ~ 6.07 MB/sec (10000 msgs)
[15] 50,189 msgs/sec ~ 6.13 MB/sec (10000 msgs)
[16] 49,588 msgs/sec ~ 6.05 MB/sec (10000 msgs)
[17] 49,199 msgs/sec ~ 6.01 MB/sec (10000 msgs)
[18] 49,178 msgs/sec ~ 6.00 MB/sec (10000 msgs)
[19] 49,098 msgs/sec ~ 5.99 MB/sec (10000 msgs)
[20] 48,299 msgs/sec ~ 5.90 MB/sec (10000 msgs)
[21] 46,890 msgs/sec ~ 5.72 MB/sec (10000 msgs)
[22] 46,690 msgs/sec ~ 5.70 MB/sec (10000 msgs)
[23] 46,648 msgs/sec ~ 5.69 MB/sec (10000 msgs)
[24] 45,495 msgs/sec ~ 5.55 MB/sec (10000 msgs)
[25] 44,908 msgs/sec ~ 5.48 MB/sec (10000 msgs)
[26] 45,436 msgs/sec ~ 5.55 MB/sec (10000 msgs)
[27] 45,133 msgs/sec ~ 5.51 MB/sec (10000 msgs)
[28] 45,084 msgs/sec ~ 5.50 MB/sec (10000 msgs)
[29] 44,911 msgs/sec ~ 5.48 MB/sec (10000 msgs)
[30] 44,080 msgs/sec ~ 5.38 MB/sec (10000 msgs)
[31] 44,444 msgs/sec ~ 5.43 MB/sec (10000 msgs)
[32] 43,872 msgs/sec ~ 5.36 MB/sec (10000 msgs)
[33] 44,027 msgs/sec ~ 5.37 MB/sec (10000 msgs)
[34] 43,986 msgs/sec ~ 5.37 MB/sec (10000 msgs)
[35] 43,939 msgs/sec ~ 5.36 MB/sec (10000 msgs)
[36] 43,107 msgs/sec ~ 5.26 MB/sec (10000 msgs)
[37] 43,097 msgs/sec ~ 5.26 MB/sec (10000 msgs)
[38] 43,652 msgs/sec ~ 5.33 MB/sec (10000 msgs)
[39] 43,579 msgs/sec ~ 5.32 MB/sec (10000 msgs)
[40] 43,537 msgs/sec ~ 5.31 MB/sec (10000 msgs)
[41] 43,520 msgs/sec ~ 5.31 MB/sec (10000 msgs)
[42] 43,498 msgs/sec ~ 5.31 MB/sec (10000 msgs)
[43] 43,428 msgs/sec ~ 5.30 MB/sec (10000 msgs)
[44] 43,170 msgs/sec ~ 5.27 MB/sec (10000 msgs)
[45] 42,352 msgs/sec ~ 5.17 MB/sec (10000 msgs)
[46] 42,222 msgs/sec ~ 5.15 MB/sec (10000 msgs)
[47] 42,044 msgs/sec ~ 5.13 MB/sec (10000 msgs)
[48] 41,943 msgs/sec ~ 5.12 MB/sec (10000 msgs)
[49] 41,278 msgs/sec ~ 5.04 MB/sec (10000 msgs)
[50] 41,719 msgs/sec ~ 5.09 MB/sec (10000 msgs)
[51] 41,168 msgs/sec ~ 5.03 MB/sec (10000 msgs)
[52] 41,686 msgs/sec ~ 5.09 MB/sec (10000 msgs)
[53] 41,683 msgs/sec ~ 5.09 MB/sec (10000 msgs)
[54] 41,622 msgs/sec ~ 5.08 MB/sec (10000 msgs)
[55] 41,506 msgs/sec ~ 5.07 MB/sec (10000 msgs)
[56] 41,498 msgs/sec ~ 5.07 MB/sec (10000 msgs)
[57] 41,402 msgs/sec ~ 5.05 MB/sec (10000 msgs)
[58] 41,333 msgs/sec ~ 5.05 MB/sec (10000 msgs)
[59] 41,285 msgs/sec ~ 5.04 MB/sec (10000 msgs)
[60] 40,581 msgs/sec ~ 4.95 MB/sec (10000 msgs)
[61] 41,008 msgs/sec ~ 5.01 MB/sec (10000 msgs)
[62] 41,004 msgs/sec ~ 5.01 MB/sec (10000 msgs)
[63] 40,806 msgs/sec ~ 4.98 MB/sec (10000 msgs)
[64] 40,693 msgs/sec ~ 4.97 MB/sec (10000 msgs)
[65] 40,248 msgs/sec ~ 4.91 MB/sec (10000 msgs)
[66] 40,738 msgs/sec ~ 4.97 MB/sec (10000 msgs)
[67] 40,234 msgs/sec ~ 4.91 MB/sec (10000 msgs)
[68] 40,198 msgs/sec ~ 4.91 MB/sec (10000 msgs)
[69] 40,099 msgs/sec ~ 4.89 MB/sec (10000 msgs)
[70] 39,722 msgs/sec ~ 4.85 MB/sec (10000 msgs)
[71] 39,638 msgs/sec ~ 4.84 MB/sec (10000 msgs)
[72] 39,613 msgs/sec ~ 4.84 MB/sec (10000 msgs)
[73] 39,605 msgs/sec ~ 4.83 MB/sec (10000 msgs)
[74] 39,598 msgs/sec ~ 4.83 MB/sec (10000 msgs)
[75] 39,591 msgs/sec ~ 4.83 MB/sec (10000 msgs)
[76] 39,474 msgs/sec ~ 4.82 MB/sec (10000 msgs)
[77] 39,471 msgs/sec ~ 4.82 MB/sec (10000 msgs)
[78] 38,903 msgs/sec ~ 4.75 MB/sec (10000 msgs)
[79] 39,322 msgs/sec ~ 4.80 MB/sec (10000 msgs)
[80] 39,230 msgs/sec ~ 4.79 MB/sec (10000 msgs)
[81] 39,033 msgs/sec ~ 4.76 MB/sec (10000 msgs)
[82] 38,992 msgs/sec ~ 4.76 MB/sec (10000 msgs)
[83] 38,986 msgs/sec ~ 4.76 MB/sec (10000 msgs)
[84] 38,966 msgs/sec ~ 4.76 MB/sec (10000 msgs)
[85] 38,776 msgs/sec ~ 4.73 MB/sec (10000 msgs)
[86] 38,741 msgs/sec ~ 4.73 MB/sec (10000 msgs)
[87] 38,717 msgs/sec ~ 4.73 MB/sec (10000 msgs)
[88] 38,706 msgs/sec ~ 4.72 MB/sec (10000 msgs)
[89] 38,683 msgs/sec ~ 4.72 MB/sec (10000 msgs)
[90] 38,562 msgs/sec ~ 4.71 MB/sec (10000 msgs)
[91] 38,619 msgs/sec ~ 4.71 MB/sec (10000 msgs)
[92] 38,584 msgs/sec ~ 4.71 MB/sec (10000 msgs)
[93] 38,124 msgs/sec ~ 4.65 MB/sec (10000 msgs)
[94] 38,574 msgs/sec ~ 4.71 MB/sec (10000 msgs)
[95] 38,080 msgs/sec ~ 4.65 MB/sec (10000 msgs)
[96] 38,031 msgs/sec ~ 4.64 MB/sec (10000 msgs)
[97] 38,478 msgs/sec ~ 4.70 MB/sec (10000 msgs)
[98] 38,471 msgs/sec ~ 4.70 MB/sec (10000 msgs)
[99] 38,438 msgs/sec ~ 4.69 MB/sec (10000 msgs)
[100] 38,411 msgs/sec ~ 4.69 MB/sec (10000 msgs)
min 38,031 | avg 43,619 | max 68,086 | stddev 5,583 msgs

Old cache slCacheMax 1024:
nats bench foo --msgs 1000000 --pub 100 --no-progress --multisubject --multisubjectmax 10
Pub stats: 13,183,781 msgs/sec ~ 1.57 GB/sec
[1] 792,935 msgs/sec ~ 96.79 MB/sec (10000 msgs)
[2] 416,793 msgs/sec ~ 50.88 MB/sec (10000 msgs)
[3] 425,855 msgs/sec ~ 51.98 MB/sec (10000 msgs)
[4] 412,288 msgs/sec ~ 50.33 MB/sec (10000 msgs)
[5] 412,135 msgs/sec ~ 50.31 MB/sec (10000 msgs)
[6] 425,931 msgs/sec ~ 51.99 MB/sec (10000 msgs)
[7] 423,951 msgs/sec ~ 51.75 MB/sec (10000 msgs)
[8] 412,812 msgs/sec ~ 50.39 MB/sec (10000 msgs)
[9] 426,090 msgs/sec ~ 52.01 MB/sec (10000 msgs)
[10] 412,863 msgs/sec ~ 50.40 MB/sec (10000 msgs)
[11] 416,049 msgs/sec ~ 50.79 MB/sec (10000 msgs)
[12] 412,533 msgs/sec ~ 50.36 MB/sec (10000 msgs)
[13] 411,879 msgs/sec ~ 50.28 MB/sec (10000 msgs)
[14] 329,393 msgs/sec ~ 40.21 MB/sec (10000 msgs)
[15] 330,119 msgs/sec ~ 40.30 MB/sec (10000 msgs)
[16] 292,080 msgs/sec ~ 35.65 MB/sec (10000 msgs)
[17] 291,857 msgs/sec ~ 35.63 MB/sec (10000 msgs)
[18] 285,308 msgs/sec ~ 34.83 MB/sec (10000 msgs)
[19] 291,834 msgs/sec ~ 35.62 MB/sec (10000 msgs)
[20] 285,844 msgs/sec ~ 34.89 MB/sec (10000 msgs)
[21] 278,464 msgs/sec ~ 33.99 MB/sec (10000 msgs)
[22] 278,536 msgs/sec ~ 34.00 MB/sec (10000 msgs)
[23] 284,219 msgs/sec ~ 34.69 MB/sec (10000 msgs)
[24] 277,294 msgs/sec ~ 33.85 MB/sec (10000 msgs)
[25] 275,357 msgs/sec ~ 33.61 MB/sec (10000 msgs)
[26] 275,334 msgs/sec ~ 33.61 MB/sec (10000 msgs)
[27] 275,245 msgs/sec ~ 33.60 MB/sec (10000 msgs)
[28] 268,392 msgs/sec ~ 32.76 MB/sec (10000 msgs)
[29] 264,920 msgs/sec ~ 32.34 MB/sec (10000 msgs)
[30] 259,936 msgs/sec ~ 31.73 MB/sec (10000 msgs)
[31] 252,694 msgs/sec ~ 30.85 MB/sec (10000 msgs)
[32] 245,938 msgs/sec ~ 30.02 MB/sec (10000 msgs)
[33] 231,012 msgs/sec ~ 28.20 MB/sec (10000 msgs)
[34] 246,502 msgs/sec ~ 30.09 MB/sec (10000 msgs)
[35] 242,007 msgs/sec ~ 29.54 MB/sec (10000 msgs)
[36] 224,047 msgs/sec ~ 27.35 MB/sec (10000 msgs)
[37] 793,469 msgs/sec ~ 96.86 MB/sec (10000 msgs)
[38] 767,256 msgs/sec ~ 93.66 MB/sec (10000 msgs)
[39] 766,996 msgs/sec ~ 93.63 MB/sec (10000 msgs)
[40] 232,576 msgs/sec ~ 28.39 MB/sec (10000 msgs)
[41] 237,156 msgs/sec ~ 28.95 MB/sec (10000 msgs)
[42] 214,475 msgs/sec ~ 26.18 MB/sec (10000 msgs)
[43] 210,776 msgs/sec ~ 25.73 MB/sec (10000 msgs)
[44] 187,124 msgs/sec ~ 22.84 MB/sec (10000 msgs)
[45] 193,815 msgs/sec ~ 23.66 MB/sec (10000 msgs)
[46] 193,851 msgs/sec ~ 23.66 MB/sec (10000 msgs)
[47] 193,652 msgs/sec ~ 23.64 MB/sec (10000 msgs)
[48] 196,220 msgs/sec ~ 23.95 MB/sec (10000 msgs)
[49] 193,808 msgs/sec ~ 23.66 MB/sec (10000 msgs)
[50] 186,851 msgs/sec ~ 22.81 MB/sec (10000 msgs)
[51] 193,518 msgs/sec ~ 23.62 MB/sec (10000 msgs)
[52] 196,275 msgs/sec ~ 23.96 MB/sec (10000 msgs)
[53] 196,566 msgs/sec ~ 23.99 MB/sec (10000 msgs)
[54] 193,770 msgs/sec ~ 23.65 MB/sec (10000 msgs)
[55] 196,147 msgs/sec ~ 23.94 MB/sec (10000 msgs)
[56] 193,764 msgs/sec ~ 23.65 MB/sec (10000 msgs)
[57] 185,355 msgs/sec ~ 22.63 MB/sec (10000 msgs)
[58] 183,785 msgs/sec ~ 22.43 MB/sec (10000 msgs)
[59] 189,493 msgs/sec ~ 23.13 MB/sec (10000 msgs)
[60] 186,976 msgs/sec ~ 22.82 MB/sec (10000 msgs)
[61] 177,384 msgs/sec ~ 21.65 MB/sec (10000 msgs)
[62] 184,658 msgs/sec ~ 22.54 MB/sec (10000 msgs)
[63] 166,887 msgs/sec ~ 20.37 MB/sec (10000 msgs)
[64] 174,228 msgs/sec ~ 21.27 MB/sec (10000 msgs)
[65] 164,030 msgs/sec ~ 20.02 MB/sec (10000 msgs)
[66] 169,250 msgs/sec ~ 20.66 MB/sec (10000 msgs)
[67] 168,647 msgs/sec ~ 20.59 MB/sec (10000 msgs)
[68] 166,539 msgs/sec ~ 20.33 MB/sec (10000 msgs)
[69] 167,785 msgs/sec ~ 20.48 MB/sec (10000 msgs)
[70] 165,580 msgs/sec ~ 20.21 MB/sec (10000 msgs)
[71] 164,850 msgs/sec ~ 20.12 MB/sec (10000 msgs)
[72] 163,736 msgs/sec ~ 19.99 MB/sec (10000 msgs)
[73] 162,737 msgs/sec ~ 19.87 MB/sec (10000 msgs)
[74] 156,067 msgs/sec ~ 19.05 MB/sec (10000 msgs)
[75] 161,883 msgs/sec ~ 19.76 MB/sec (10000 msgs)
[76] 159,926 msgs/sec ~ 19.52 MB/sec (10000 msgs)
[77] 160,657 msgs/sec ~ 19.61 MB/sec (10000 msgs)
[78] 160,201 msgs/sec ~ 19.56 MB/sec (10000 msgs)
[79] 160,050 msgs/sec ~ 19.54 MB/sec (10000 msgs)
[80] 159,953 msgs/sec ~ 19.53 MB/sec (10000 msgs)
[81] 151,032 msgs/sec ~ 18.44 MB/sec (10000 msgs)
[82] 146,681 msgs/sec ~ 17.91 MB/sec (10000 msgs)
[83] 145,443 msgs/sec ~ 17.75 MB/sec (10000 msgs)
[84] 139,811 msgs/sec ~ 17.07 MB/sec (10000 msgs)
[85] 144,737 msgs/sec ~ 17.67 MB/sec (10000 msgs)
[86] 137,836 msgs/sec ~ 16.83 MB/sec (10000 msgs)
[87] 142,750 msgs/sec ~ 17.43 MB/sec (10000 msgs)
[88] 142,889 msgs/sec ~ 17.44 MB/sec (10000 msgs)
[89] 140,320 msgs/sec ~ 17.13 MB/sec (10000 msgs)
[90] 136,593 msgs/sec ~ 16.67 MB/sec (10000 msgs)
[91] 138,869 msgs/sec ~ 16.95 MB/sec (10000 msgs)
[92] 140,343 msgs/sec ~ 17.13 MB/sec (10000 msgs)
[93] 134,802 msgs/sec ~ 16.46 MB/sec (10000 msgs)
[94] 134,397 msgs/sec ~ 16.41 MB/sec (10000 msgs)
[95] 134,347 msgs/sec ~ 16.40 MB/sec (10000 msgs)
[96] 137,316 msgs/sec ~ 16.76 MB/sec (10000 msgs)
[97] 137,230 msgs/sec ~ 16.75 MB/sec (10000 msgs)
[98] 136,996 msgs/sec ~ 16.72 MB/sec (10000 msgs)
[99] 132,765 msgs/sec ~ 16.21 MB/sec (10000 msgs)
[100] 132,286 msgs/sec ~ 16.15 MB/sec (10000 msgs)
min 132,286 | avg 246,746 | max 793,469 | stddev 139,200 msgs

New Cache

New cache slCacheMax 1024:
nats bench foo --msgs 1000000 --pub 100 --no-progress --multisubject --multisubjectmax 100000

Pub stats: 4,390,717 msgs/sec ~ 535.98 MB/sec
[1] 53,083 msgs/sec ~ 6.48 MB/sec (10000 msgs)
[2] 52,676 msgs/sec ~ 6.43 MB/sec (10000 msgs)
[3] 50,037 msgs/sec ~ 6.11 MB/sec (10000 msgs)
[4] 49,132 msgs/sec ~ 6.00 MB/sec (10000 msgs)
[5] 49,100 msgs/sec ~ 5.99 MB/sec (10000 msgs)
[6] 48,791 msgs/sec ~ 5.96 MB/sec (10000 msgs)
[7] 48,110 msgs/sec ~ 5.87 MB/sec (10000 msgs)
[8] 48,403 msgs/sec ~ 5.91 MB/sec (10000 msgs)
[9] 48,333 msgs/sec ~ 5.90 MB/sec (10000 msgs)
[10] 48,214 msgs/sec ~ 5.89 MB/sec (10000 msgs)
[11] 47,297 msgs/sec ~ 5.77 MB/sec (10000 msgs)
[12] 47,717 msgs/sec ~ 5.82 MB/sec (10000 msgs)
[13] 47,713 msgs/sec ~ 5.82 MB/sec (10000 msgs)
[14] 47,503 msgs/sec ~ 5.80 MB/sec (10000 msgs)
[15] 47,423 msgs/sec ~ 5.79 MB/sec (10000 msgs)
[16] 47,391 msgs/sec ~ 5.79 MB/sec (10000 msgs)
[17] 47,277 msgs/sec ~ 5.77 MB/sec (10000 msgs)
[18] 47,270 msgs/sec ~ 5.77 MB/sec (10000 msgs)
[19] 47,056 msgs/sec ~ 5.74 MB/sec (10000 msgs)
[20] 46,520 msgs/sec ~ 5.68 MB/sec (10000 msgs)
[21] 47,018 msgs/sec ~ 5.74 MB/sec (10000 msgs)
[22] 46,437 msgs/sec ~ 5.67 MB/sec (10000 msgs)
[23] 46,187 msgs/sec ~ 5.64 MB/sec (10000 msgs)
[24] 46,663 msgs/sec ~ 5.70 MB/sec (10000 msgs)
[25] 46,136 msgs/sec ~ 5.63 MB/sec (10000 msgs)
[26] 46,642 msgs/sec ~ 5.69 MB/sec (10000 msgs)
[27] 46,567 msgs/sec ~ 5.68 MB/sec (10000 msgs)
[28] 46,530 msgs/sec ~ 5.68 MB/sec (10000 msgs)
[29] 46,499 msgs/sec ~ 5.68 MB/sec (10000 msgs)
[30] 45,926 msgs/sec ~ 5.61 MB/sec (10000 msgs)
[31] 46,474 msgs/sec ~ 5.67 MB/sec (10000 msgs)
[32] 46,465 msgs/sec ~ 5.67 MB/sec (10000 msgs)
[33] 46,445 msgs/sec ~ 5.67 MB/sec (10000 msgs)
[34] 46,439 msgs/sec ~ 5.67 MB/sec (10000 msgs)
[35] 45,902 msgs/sec ~ 5.60 MB/sec (10000 msgs)
[36] 46,444 msgs/sec ~ 5.67 MB/sec (10000 msgs)
[37] 46,374 msgs/sec ~ 5.66 MB/sec (10000 msgs)
[38] 46,351 msgs/sec ~ 5.66 MB/sec (10000 msgs)
[39] 46,269 msgs/sec ~ 5.65 MB/sec (10000 msgs)
[40] 46,181 msgs/sec ~ 5.64 MB/sec (10000 msgs)
[41] 46,171 msgs/sec ~ 5.64 MB/sec (10000 msgs)
[42] 46,136 msgs/sec ~ 5.63 MB/sec (10000 msgs)
[43] 45,594 msgs/sec ~ 5.57 MB/sec (10000 msgs)
[44] 46,099 msgs/sec ~ 5.63 MB/sec (10000 msgs)
[45] 45,558 msgs/sec ~ 5.56 MB/sec (10000 msgs)
[46] 46,022 msgs/sec ~ 5.62 MB/sec (10000 msgs)
[47] 45,990 msgs/sec ~ 5.61 MB/sec (10000 msgs)
[48] 45,916 msgs/sec ~ 5.61 MB/sec (10000 msgs)
[49] 45,368 msgs/sec ~ 5.54 MB/sec (10000 msgs)
[50] 45,892 msgs/sec ~ 5.60 MB/sec (10000 msgs)
[51] 45,853 msgs/sec ~ 5.60 MB/sec (10000 msgs)
[52] 45,873 msgs/sec ~ 5.60 MB/sec (10000 msgs)
[53] 45,852 msgs/sec ~ 5.60 MB/sec (10000 msgs)
[54] 45,814 msgs/sec ~ 5.59 MB/sec (10000 msgs)
[55] 45,614 msgs/sec ~ 5.57 MB/sec (10000 msgs)
[56] 45,100 msgs/sec ~ 5.51 MB/sec (10000 msgs)
[57] 45,587 msgs/sec ~ 5.56 MB/sec (10000 msgs)
[58] 45,495 msgs/sec ~ 5.55 MB/sec (10000 msgs)
[59] 45,421 msgs/sec ~ 5.54 MB/sec (10000 msgs)
[60] 44,876 msgs/sec ~ 5.48 MB/sec (10000 msgs)
[61] 44,854 msgs/sec ~ 5.48 MB/sec (10000 msgs)
[62] 45,381 msgs/sec ~ 5.54 MB/sec (10000 msgs)
[63] 45,335 msgs/sec ~ 5.53 MB/sec (10000 msgs)
[64] 45,332 msgs/sec ~ 5.53 MB/sec (10000 msgs)
[65] 45,272 msgs/sec ~ 5.53 MB/sec (10000 msgs)
[66] 45,211 msgs/sec ~ 5.52 MB/sec (10000 msgs)
[67] 45,133 msgs/sec ~ 5.51 MB/sec (10000 msgs)
[68] 45,112 msgs/sec ~ 5.51 MB/sec (10000 msgs)
[69] 45,119 msgs/sec ~ 5.51 MB/sec (10000 msgs)
[70] 45,056 msgs/sec ~ 5.50 MB/sec (10000 msgs)
[71] 45,036 msgs/sec ~ 5.50 MB/sec (10000 msgs)
[72] 45,018 msgs/sec ~ 5.50 MB/sec (10000 msgs)
[73] 44,511 msgs/sec ~ 5.43 MB/sec (10000 msgs)
[74] 45,009 msgs/sec ~ 5.49 MB/sec (10000 msgs)
[75] 44,938 msgs/sec ~ 5.49 MB/sec (10000 msgs)
[76] 44,944 msgs/sec ~ 5.49 MB/sec (10000 msgs)
[77] 44,927 msgs/sec ~ 5.48 MB/sec (10000 msgs)
[78] 44,909 msgs/sec ~ 5.48 MB/sec (10000 msgs)
[79] 44,853 msgs/sec ~ 5.48 MB/sec (10000 msgs)
[80] 44,815 msgs/sec ~ 5.47 MB/sec (10000 msgs)
[81] 44,785 msgs/sec ~ 5.47 MB/sec (10000 msgs)
[82] 44,793 msgs/sec ~ 5.47 MB/sec (10000 msgs)
[83] 44,760 msgs/sec ~ 5.46 MB/sec (10000 msgs)
[84] 44,722 msgs/sec ~ 5.46 MB/sec (10000 msgs)
[85] 44,671 msgs/sec ~ 5.45 MB/sec (10000 msgs)
[86] 44,634 msgs/sec ~ 5.45 MB/sec (10000 msgs)
[87] 44,600 msgs/sec ~ 5.44 MB/sec (10000 msgs)
[88] 44,568 msgs/sec ~ 5.44 MB/sec (10000 msgs)
[89] 44,585 msgs/sec ~ 5.44 MB/sec (10000 msgs)
[90] 44,584 msgs/sec ~ 5.44 MB/sec (10000 msgs)
[91] 44,558 msgs/sec ~ 5.44 MB/sec (10000 msgs)
[92] 44,548 msgs/sec ~ 5.44 MB/sec (10000 msgs)
[93] 44,550 msgs/sec ~ 5.44 MB/sec (10000 msgs)
[94] 44,522 msgs/sec ~ 5.43 MB/sec (10000 msgs)
[95] 44,505 msgs/sec ~ 5.43 MB/sec (10000 msgs)
[96] 44,504 msgs/sec ~ 5.43 MB/sec (10000 msgs)
[97] 44,510 msgs/sec ~ 5.43 MB/sec (10000 msgs)
[98] 44,451 msgs/sec ~ 5.43 MB/sec (10000 msgs)
[99] 43,935 msgs/sec ~ 5.36 MB/sec (10000 msgs)
[100] 44,424 msgs/sec ~ 5.42 MB/sec (10000 msgs)
min 43,935 | avg 46,032 | max 53,083 | stddev 1,571 msgs

New cache slCacheMax 1024:
nats bench foo --msgs 1000000 --pub 100 --no-progress --multisubject --multisubjectmax 10

Pub stats: 16,458,413 msgs/sec ~ 1.96 GB/sec
[1] 2,657,763 msgs/sec ~ 324.43 MB/sec (10000 msgs)
[2] 2,580,069 msgs/sec ~ 314.95 MB/sec (10000 msgs)
[3] 2,275,689 msgs/sec ~ 277.79 MB/sec (10000 msgs)
[4] 1,193,337 msgs/sec ~ 145.67 MB/sec (10000 msgs)
[5] 1,188,740 msgs/sec ~ 145.11 MB/sec (10000 msgs)
[6] 1,176,078 msgs/sec ~ 143.56 MB/sec (10000 msgs)
[7] 1,165,005 msgs/sec ~ 142.21 MB/sec (10000 msgs)
[8] 1,107,938 msgs/sec ~ 135.25 MB/sec (10000 msgs)
[9] 937,584 msgs/sec ~ 114.45 MB/sec (10000 msgs)
[10] 1,189,850 msgs/sec ~ 145.25 MB/sec (10000 msgs)
[11] 658,129 msgs/sec ~ 80.34 MB/sec (10000 msgs)
[12] 781,367 msgs/sec ~ 95.38 MB/sec (10000 msgs)
[13] 927,796 msgs/sec ~ 113.26 MB/sec (10000 msgs)
[14] 618,187 msgs/sec ~ 75.46 MB/sec (10000 msgs)
[15] 644,622 msgs/sec ~ 78.69 MB/sec (10000 msgs)
[16] 522,279 msgs/sec ~ 63.75 MB/sec (10000 msgs)
[17] 431,688 msgs/sec ~ 52.70 MB/sec (10000 msgs)
[18] 430,320 msgs/sec ~ 52.53 MB/sec (10000 msgs)
[19] 433,716 msgs/sec ~ 52.94 MB/sec (10000 msgs)
[20] 431,982 msgs/sec ~ 52.73 MB/sec (10000 msgs)
[21] 430,435 msgs/sec ~ 52.54 MB/sec (10000 msgs)
[22] 431,569 msgs/sec ~ 52.68 MB/sec (10000 msgs)
[23] 420,054 msgs/sec ~ 51.28 MB/sec (10000 msgs)
[24] 390,646 msgs/sec ~ 47.69 MB/sec (10000 msgs)
[25] 349,142 msgs/sec ~ 42.62 MB/sec (10000 msgs)
[26] 394,816 msgs/sec ~ 48.20 MB/sec (10000 msgs)
[27] 393,622 msgs/sec ~ 48.05 MB/sec (10000 msgs)
[28] 284,472 msgs/sec ~ 34.73 MB/sec (10000 msgs)
[29] 283,543 msgs/sec ~ 34.61 MB/sec (10000 msgs)
[30] 283,618 msgs/sec ~ 34.62 MB/sec (10000 msgs)
[31] 268,305 msgs/sec ~ 32.75 MB/sec (10000 msgs)
[32] 282,912 msgs/sec ~ 34.54 MB/sec (10000 msgs)
[33] 256,735 msgs/sec ~ 31.34 MB/sec (10000 msgs)
[34] 256,923 msgs/sec ~ 31.36 MB/sec (10000 msgs)
[35] 256,762 msgs/sec ~ 31.34 MB/sec (10000 msgs)
[36] 282,496 msgs/sec ~ 34.48 MB/sec (10000 msgs)
[37] 284,392 msgs/sec ~ 34.72 MB/sec (10000 msgs)
[38] 256,869 msgs/sec ~ 31.36 MB/sec (10000 msgs)
[39] 283,372 msgs/sec ~ 34.59 MB/sec (10000 msgs)
[40] 283,464 msgs/sec ~ 34.60 MB/sec (10000 msgs)
[41] 278,247 msgs/sec ~ 33.97 MB/sec (10000 msgs)
[42] 283,565 msgs/sec ~ 34.61 MB/sec (10000 msgs)
[43] 284,228 msgs/sec ~ 34.70 MB/sec (10000 msgs)
[44] 283,452 msgs/sec ~ 34.60 MB/sec (10000 msgs)
[45] 283,441 msgs/sec ~ 34.60 MB/sec (10000 msgs)
[46] 281,503 msgs/sec ~ 34.36 MB/sec (10000 msgs)
[47] 280,016 msgs/sec ~ 34.18 MB/sec (10000 msgs)
[48] 270,185 msgs/sec ~ 32.98 MB/sec (10000 msgs)
[49] 246,305 msgs/sec ~ 30.07 MB/sec (10000 msgs)
[50] 246,381 msgs/sec ~ 30.08 MB/sec (10000 msgs)
[51] 246,595 msgs/sec ~ 30.10 MB/sec (10000 msgs)
[52] 270,857 msgs/sec ~ 33.06 MB/sec (10000 msgs)
[53] 258,925 msgs/sec ~ 31.61 MB/sec (10000 msgs)
[54] 257,159 msgs/sec ~ 31.39 MB/sec (10000 msgs)
[55] 255,666 msgs/sec ~ 31.21 MB/sec (10000 msgs)
[56] 255,756 msgs/sec ~ 31.22 MB/sec (10000 msgs)
[57] 255,211 msgs/sec ~ 31.15 MB/sec (10000 msgs)
[58] 226,324 msgs/sec ~ 27.63 MB/sec (10000 msgs)
[59] 217,988 msgs/sec ~ 26.61 MB/sec (10000 msgs)
[60] 208,789 msgs/sec ~ 25.49 MB/sec (10000 msgs)
[61] 216,971 msgs/sec ~ 26.49 MB/sec (10000 msgs)
[62] 214,455 msgs/sec ~ 26.18 MB/sec (10000 msgs)
[63] 213,976 msgs/sec ~ 26.12 MB/sec (10000 msgs)
[64] 212,760 msgs/sec ~ 25.97 MB/sec (10000 msgs)
[65] 195,842 msgs/sec ~ 23.91 MB/sec (10000 msgs)
[66] 207,080 msgs/sec ~ 25.28 MB/sec (10000 msgs)
[67] 206,069 msgs/sec ~ 25.16 MB/sec (10000 msgs)
[68] 187,103 msgs/sec ~ 22.84 MB/sec (10000 msgs)
[69] 198,172 msgs/sec ~ 24.19 MB/sec (10000 msgs)
[70] 184,627 msgs/sec ~ 22.54 MB/sec (10000 msgs)
[71] 184,666 msgs/sec ~ 22.54 MB/sec (10000 msgs)
[72] 197,167 msgs/sec ~ 24.07 MB/sec (10000 msgs)
[73] 196,847 msgs/sec ~ 24.03 MB/sec (10000 msgs)
[74] 195,048 msgs/sec ~ 23.81 MB/sec (10000 msgs)
[75] 195,123 msgs/sec ~ 23.82 MB/sec (10000 msgs)
[76] 192,799 msgs/sec ~ 23.54 MB/sec (10000 msgs)
[77] 192,506 msgs/sec ~ 23.50 MB/sec (10000 msgs)
[78] 194,721 msgs/sec ~ 23.77 MB/sec (10000 msgs)
[79] 192,895 msgs/sec ~ 23.55 MB/sec (10000 msgs)
[80] 178,332 msgs/sec ~ 21.77 MB/sec (10000 msgs)
[81] 189,251 msgs/sec ~ 23.10 MB/sec (10000 msgs)
[82] 189,157 msgs/sec ~ 23.09 MB/sec (10000 msgs)
[83] 189,086 msgs/sec ~ 23.08 MB/sec (10000 msgs)
[84] 187,399 msgs/sec ~ 22.88 MB/sec (10000 msgs)
[85] 184,510 msgs/sec ~ 22.52 MB/sec (10000 msgs)
[86] 183,878 msgs/sec ~ 22.45 MB/sec (10000 msgs)
[87] 183,843 msgs/sec ~ 22.44 MB/sec (10000 msgs)
[88] 183,412 msgs/sec ~ 22.39 MB/sec (10000 msgs)
[89] 182,453 msgs/sec ~ 22.27 MB/sec (10000 msgs)
[90] 179,352 msgs/sec ~ 21.89 MB/sec (10000 msgs)
[91] 178,657 msgs/sec ~ 21.81 MB/sec (10000 msgs)
[92] 178,335 msgs/sec ~ 21.77 MB/sec (10000 msgs)
[93] 177,008 msgs/sec ~ 21.61 MB/sec (10000 msgs)
[94] 176,572 msgs/sec ~ 21.55 MB/sec (10000 msgs)
[95] 176,461 msgs/sec ~ 21.54 MB/sec (10000 msgs)
[96] 174,288 msgs/sec ~ 21.28 MB/sec (10000 msgs)
[97] 165,141 msgs/sec ~ 20.16 MB/sec (10000 msgs)
[98] 165,192 msgs/sec ~ 20.17 MB/sec (10000 msgs)
[99] 175,104 msgs/sec ~ 21.38 MB/sec (10000 msgs)
[100] 175,497 msgs/sec ~ 21.42 MB/sec (10000 msgs)
min 165,141 | avg 404,686 | max 2,657,763 | stddev 449,939 msgs
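
To put the summary lines side by side, the small program below divides the reported avg values. The numbers are copied from the runs above; the speedup helper is just an illustrative name.

```go
package main

import "fmt"

// speedup returns how many times larger newAvg is than oldAvg.
func speedup(newAvg, oldAvg float64) float64 {
	return newAvg / oldAvg
}

func main() {
	// avg msgs/sec per publisher, taken from the "min | avg | max" lines.
	const (
		oldMany      = 17551.0  // old cache, --multisubjectmax 100000
		disabledMany = 43619.0  // cache disabled, --multisubjectmax 100000
		newMany      = 46032.0  // new cache, --multisubjectmax 100000
		oldFew       = 246746.0 // old cache, --multisubjectmax 10
		newFew       = 404686.0 // new cache, --multisubjectmax 10
	)

	fmt.Printf("100000 subjects, new vs old cache:      %.2fx\n", speedup(newMany, oldMany))
	fmt.Printf("100000 subjects, new vs cache disabled: %.2fx\n", speedup(newMany, disabledMany))
	fmt.Printf("10 subjects, new vs old cache:          %.2fx\n", speedup(newFew, oldFew))
}
```

These are single runs on one machine, so the ratios are only a rough indication.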

@HeavyHorst HeavyHorst requested a review from a team as a code owner August 30, 2023 18:58
@wallyqs wallyqs changed the base branch from main to dev August 30, 2023 19:27
@wallyqs wallyqs changed the base branch from dev to main August 30, 2023 19:28
}

//go:noescape
//go:linkname memhash runtime.memhash
Member

While the idea of the PR is interesting and the results promising, relying on go:linkname in this way is dangerous as it ties us into the behaviour of a specific Go compiler implementation. This probably means that NATS Server would no longer compile using gccgo for example.

Author

I changed it to use the maphash package (which is already used in the server).
The performance seems to be even a little bit better.

Side note: the maphash package does the same thing here: https://cs.opensource.google/go/go/+/refs/tags/go1.21.0:src/hash/maphash/maphash_runtime.go

Member

Thanks for switching to the maphash package. It does indeed call out with go:linkname in the happy case, but more importantly, it has a purego fallback.
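
For readers following the thread, a hash function built on hash/maphash could look roughly like the sketch below. The newHashFunc name and the reduction modulo slCacheMax are assumptions for illustration; the PR's actual code is not shown here. maphash.String requires Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

const slCacheMax = 1024

// newHashFunc returns a function with the func(string) int shape used by
// the cache's hashFunc field. The seed is created once per process, so
// slot indexes are stable within a run but differ between runs.
func newHashFunc() func(string) int {
	seed := maphash.MakeSeed()
	return func(s string) int {
		// Reduce the 64-bit hash to a valid array index.
		return int(maphash.String(seed, s) % slCacheMax)
	}
}

func main() {
	hash := newHashFunc()
	idx := hash("foo.bar")
	fmt.Println(idx >= 0 && idx < slCacheMax) // true
	fmt.Println(hash("foo.bar") == idx)       // true: same seed, same slot
}
```

This keeps the choice of runtime-backed or pure Go hashing inside the standard library instead of in the server code.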

@wallyqs wallyqs added the post-freeze We'll come back to this after the freeze period label Sep 6, 2023
Successfully merging this pull request may close these issues.

1M topics
3 participants