
Batch applying events to kqueue #449

Open

panjf2000 wants to merge 7 commits into valkey-io:unstable from panjf2000:kqueue-batch

Conversation

@panjf2000 (Contributor)

kqueue has the capability of batch applying events:

The kevent(), kevent64() and kevent_qos() system calls are used to
register events with the queue, and return any pending events to the
user. The changelist argument is a pointer to an array of kevent,
kevent64_s or kevent_qos_s structures, as defined in <sys/event.h>. All
changes contained in the changelist are applied before any pending events
are read from the queue. The nchanges argument gives the size of
changelist.

This PR implements this functionality for kqueue, which lets us eliminate a large number of redundant kevent(2) system calls.
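To illustrate the idea, here is a minimal sketch, not the PR's actual code: `MAX_CHANGES`, `queue_change()`, and `poll_events()` are illustrative names. Changes are buffered with EV_SET() so that a single kevent(2) call both applies the whole changelist and fetches ready events.

```c
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_CHANGES 1024

static struct kevent changes[MAX_CHANGES];
static int nchanges = 0;

/* Buffer a change instead of issuing an immediate kevent() call.
 * (A real implementation would flush the batch when it fills up.) */
static void queue_change(int fd, int16_t filter, uint16_t flags) {
    if (nchanges < MAX_CHANGES)
        EV_SET(&changes[nchanges++], fd, filter, flags, 0, 0, NULL);
}

/* One kevent() call applies every buffered change and then returns
 * pending events, per the man page excerpt above. */
static int poll_events(int kq, struct kevent *events, int maxevents,
                       const struct timespec *timeout) {
    int n = kevent(kq, changes, nchanges, events, maxevents, timeout);
    nchanges = 0; /* the whole changelist has been consumed */
    return n;
}
```

For example, a queue_change(fd, EVFILT_WRITE, EV_ADD) followed later by one poll_events() call replaces two kevent(2) calls with one.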


codecov bot commented May 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.21%. Comparing base (a0aebb6) to head (31427eb).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #449      +/-   ##
============================================
+ Coverage     70.17%   70.21%   +0.03%     
============================================
  Files           109      109              
  Lines         59904    59904              
============================================
+ Hits          42039    42059      +20     
+ Misses        17865    17845      -20     

see 10 files with indirect coverage changes

@madolson (Member) commented May 8, 2024

Since we don't automatically test on FreeBSD: https://github.com/valkey-io/valkey/actions/runs/8994018097

@panjf2000 (Contributor, Author)

> Since we don't automatically test on FreeBSD: https://github.com/valkey-io/valkey/actions/runs/8994018097

Thanks, I see it succeeded. Just out of curiosity, why didn't we set up a macOS runner that runs by default?

@madolson (Member) commented May 8, 2024

> Thanks, I see it succeeded. Just out of curiosity, why didn't we set up a macOS runner that runs by default?

Runner execution time was the historical reason: we didn't want to burn resources running all the tests, since we have a daily set of tests that is slightly more comprehensive.

kqueue has the capability of batch applying events.

This PR implements this functionality for kqueue, which lets us
eliminate a large number of redundant `kevent(2)` system calls.

---------

Signed-off-by: Andy Pan <i@andypan.me>
@panjf2000 (Contributor, Author)

Hi @madolson, any new thoughts on this?

@madolson (Member)

@panjf2000 It makes sense to me, I just haven't had a chance to walk through it and convince myself the code is correct. Hopefully tomorrow?

@panjf2000 (Contributor, Author)

> @panjf2000 It makes sense to me, I just haven't had a chance to walk through it and convince myself the code is correct. Hopefully tomorrow?

Sure, take your time. I was asking only to make sure you'd like to continue reviewing this.

@madolson (Member) left a comment

Do we expect this to make the code more performant?

---------

Signed-off-by: Andy Pan <i@andypan.me>
@panjf2000 (Contributor, Author)

> Do we expect this to make the code more performant?

This PR is expected to eliminate a large number of extra kevent(2) system calls, which should benefit performance significantly.

@panjf2000 panjf2000 requested a review from madolson May 13, 2024 01:38
@madolson (Member)

> which should have a significant benefit to the performance.

Would you mind running valkey-benchmark -r 10000000 -d 100 -n 10000000 with and without the change to compare the performance? I'm not very happy with the odd failure modes, so I'm trying to understand how much performance we're really going to gain.

---------

Signed-off-by: Andy Pan <i@andypan.me>
@panjf2000 (Contributor, Author)

> which should have a significant benefit to the performance.
>
> Would you mind running valkey-benchmark -r 10000000 -d 100 -n 10000000 with and without the change to compare the performance? I'm not very happy with the odd failure modes, so I'm trying to understand how much performance we're really going to gain.

Environment

 Model : Mac Studio (2022)
    OS : macOS 14.1.2
   CPU : Apple M1 Max, 10-core CPU with 8 performance cores and 2 efficiency cores
Memory : 32GB unified memory

Benchmark command

# With RDB and AOF disabled on the server.
valkey-benchmark -r 10000000 -d 100 -n 10000000 -q

Benchmark results

valkey-io:unstable

PING_INLINE: 141031.80 requests per second, p50=0.175 msec
PING_MBULK: 164306.14 requests per second, p50=0.159 msec
SET: 160627.09 requests per second, p50=0.191 msec
GET: 156184.11 requests per second, p50=0.175 msec
INCR: 161022.81 requests per second, p50=0.167 msec
LPUSH: 164573.83 requests per second, p50=0.167 msec
RPUSH: 164098.53 requests per second, p50=0.167 msec
LPOP: 163655.41 requests per second, p50=0.167 msec
RPOP: 162211.27 requests per second, p50=0.167 msec
SADD: 160130.66 requests per second, p50=0.167 msec
HSET: 151906.42 requests per second, p50=0.191 msec
SPOP: 158737.72 requests per second, p50=0.175 msec
ZADD: 127665.01 requests per second, p50=0.319 msec
ZPOPMIN: 158866.33 requests per second, p50=0.175 msec
LPUSH (needed to benchmark LRANGE): 164546.77 requests per second, p50=0.167 msec
LRANGE_100 (first 100 elements): 86594.33 requests per second, p50=0.311 msec
LRANGE_300 (first 300 elements): 36351.34 requests per second, p50=0.663 msec
LRANGE_500 (first 500 elements): 22212.60 requests per second, p50=0.751 msec
LRANGE_600 (first 600 elements): 19722.66 requests per second, p50=0.839 msec
MSET (10 keys): 54174.70 requests per second, p50=0.839 msec
XADD: 157696.38 requests per second, p50=0.175 msec

panjf2000:kqueue-batch

PING_INLINE: 159212.86 requests per second, p50=0.159 msec
PING_MBULK: 184396.38 requests per second, p50=0.143 msec
SET: 173722.70 requests per second, p50=0.207 msec
GET: 179742.97 requests per second, p50=0.159 msec
INCR: 182832.06 requests per second, p50=0.159 msec
LPUSH: 174146.25 requests per second, p50=0.151 msec
RPUSH: 174794.62 requests per second, p50=0.151 msec
LPOP: 173250.17 requests per second, p50=0.151 msec
RPOP: 175260.27 requests per second, p50=0.151 msec
SADD: 181957.12 requests per second, p50=0.159 msec
HSET: 174380.08 requests per second, p50=0.175 msec
SPOP: 186985.80 requests per second, p50=0.159 msec
ZADD: 139279.66 requests per second, p50=0.303 msec
ZPOPMIN: 182648.41 requests per second, p50=0.151 msec
LPUSH (needed to benchmark LRANGE): 180554.31 requests per second, p50=0.151 msec
LRANGE_100 (first 100 elements): 93919.64 requests per second, p50=0.287 msec
LRANGE_300 (first 300 elements): 37977.62 requests per second, p50=0.639 msec
LRANGE_500 (first 500 elements): 23096.98 requests per second, p50=0.719 msec
LRANGE_600 (first 600 elements): 20357.48 requests per second, p50=0.815 msec
MSET (10 keys): 54659.74 requests per second, p50=0.823 msec
XADD: 170558.23 requests per second, p50=0.159 msec

@madolson

@madolson (Member)

OK, those performance numbers are compelling, although I'm a little surprised by the result. During the typical command flow we shouldn't call any of these functions, so I didn't expect this much improvement.

@panjf2000 (Contributor, Author)

Benchmarking data-retrieval commands with pipelining can trigger aeApiAddEvent calls for AE_WRITABLE. Posting these numbers as another reference.

./valkey-benchmark -r 10000000 -d 1024 -n 10000000 -P 100 -t get,lpop,rpop,spop,zpopmin -q

valkey-io:unstable:

GET: 3898635.50 requests per second, p50=1.151 msec
LPOP: 3837298.50 requests per second, p50=1.183 msec
RPOP: 3752345.25 requests per second, p50=1.215 msec
SPOP: 3979307.50 requests per second, p50=1.135 msec
ZPOPMIN: 3790750.50 requests per second, p50=1.199 msec

panjf2000:kqueue-batch:

GET: 4093327.75 requests per second, p50=1.095 msec
LPOP: 3971406.00 requests per second, p50=1.143 msec
RPOP: 4127115.00 requests per second, p50=1.095 msec
SPOP: 4152824.00 requests per second, p50=1.095 msec
ZPOPMIN: 3977724.75 requests per second, p50=1.143 msec

@madolson

@madolson (Member)

OK, so I'm still worried about the panic and how it might impact production systems in edge cases involving kqueue failures. I would like to get the performance improvement, but I don't want it to come at the cost of potential failures. Would you be happy with this being some type of build flag, so that folks have to opt in to it at build time?

@panjf2000 (Contributor, Author)

> OK, so I'm still worried about the panic and how it might impact production systems in edge cases involving kqueue failures. I would like to get the performance improvement, but I don't want it to come at the cost of potential failures. Would you be happy with this being some type of build flag, so that folks have to opt in to it at build time?

I'm OK with that, but I'm not sure how you plan on doing it. The first approach I can think of is adding a new configuration option in valkey.conf. Is that what you had in mind? @madolson

@panjf2000 (Contributor, Author)

Another approach could be a Makefile flag, something like BUILD_TLS or USE_SYSTEMD. That looks more feasible; a sketch follows below.
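To make the Makefile-flag idea concrete, a minimal sketch assuming a compile-time macro (USE_KQUEUE_BATCH, the name that appears later in the thread); `queueKqueueChange()` and `applyKqueueChangeNow()` are hypothetical helpers, not functions in the tree:

```c
/* Sketch only: gate the batched path behind an opt-in build macro,
 * enabled e.g. via `make CFLAGS="-DUSE_KQUEUE_BATCH"`, in the spirit
 * of BUILD_TLS and USE_SYSTEMD. The two helpers are hypothetical. */
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
#ifdef USE_KQUEUE_BATCH
    /* Batched path: record the change; the next kevent() poll
     * applies the entire changelist in one system call. */
    queueKqueueChange(eventLoop, fd, mask);
    return 0;
#else
    /* Default path: apply the change immediately with a dedicated
     * kevent() call, as the existing code does. */
    return applyKqueueChangeNow(eventLoop, fd, mask);
#endif
}
```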

---------

Signed-off-by: Andy Pan <i@andypan.me>
@panjf2000 (Contributor, Author)

@madolson Please check out the latest commit.

@PingXie (Member) commented May 20, 2024

@panjf2000, do you mind counting the number of kevent calls before and after this change for a given workload (./valkey-benchmark -r 10000000 -d 1024 -n 10000000 -P 100 -t get,lpop,rpop,spop,zpopmin -q would be fine too)? My understanding is that we only register/de-register events on connection establishment and tear-down. It would be really helpful if you could show direct evidence that the number of kevent calls was indeed significant and is greatly reduced.

panic() on failure makes sense to me. Not being able to register/de-register events is a critical error IMO. The existing error handling of either ignoring the failure or trying to revert the change doesn't make much sense to me. That said, this is indeed a behavior/contract change for the event loop, so introducing it behind a build macro for now is safer. We should also update the README.md to make the behavior change explicit. I think it is a bit too early to have it controlled by a server config.
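On the panic contract: when changes are applied through the changelist, kqueue reports a rejected change as a returned event with EV_ERROR set in flags and the error code in data (provided the eventlist has room), rather than as an immediate -1 from a dedicated call. A minimal sketch of fail-fast handling, assuming Valkey's existing serverPanic() macro; `check_batch_errors()` is an illustrative name:

```c
#include <sys/event.h>
#include <string.h>
#include "server.h" /* for serverPanic() */

/* Sketch only: scan the events returned by kevent() for per-change
 * failures and fail fast instead of ignoring or reverting them. */
static void check_batch_errors(struct kevent *events, int nevents) {
    for (int i = 0; i < nevents; i++) {
        if ((events[i].flags & EV_ERROR) && events[i].data != 0) {
            serverPanic("kqueue change for fd %d failed: %s",
                        (int)events[i].ident, strerror((int)events[i].data));
        }
    }
}
```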

@panjf2000 (Contributor, Author) commented May 20, 2024

> @panjf2000, do you mind counting the number of kevent calls before and after this change for a given workload (./valkey-benchmark -r 10000000 -d 1024 -n 10000000 -P 100 -t get,lpop,rpop,spop,zpopmin -q would be fine too)? My understanding is that we only register/de-register events on connection establishment and tear-down. It would be really helpful if you could show direct evidence that the number of kevent calls was indeed significant and is greatly reduced.
>
> panic() on failure makes sense to me. Not being able to register/de-register events is a critical error IMO. The existing error handling of either ignoring the failure or trying to revert the change doesn't make much sense to me. That said, this is indeed a behavior/contract change for the event loop, so introducing it behind a build macro for now is safer. We should also update the README.md to make the behavior change explicit. I think it is a bit too early to have it controlled by a server config.

We don't only register and unregister events on connection establishment and tear-down; we also add and remove AE_WRITABLE whenever pending write data appears and drains (see the sketch below).
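For example (a sketch against the ae API in src/ae.h; `writeHandler` is a hypothetical callback standing in for the real reply writer): AE_WRITABLE is armed only while replies are buffered, so every reply burst costs an add plus a delete, i.e. two extra kevent(2) calls per burst without batching.

```c
#include "ae.h" /* aeCreateFileEvent()/aeDeleteFileEvent() from the tree */

/* Sketch only: the write event is installed when a reply is buffered
 * and removed once the buffer drains, so short write bursts churn
 * kqueue registrations constantly. */
static void onReplyBuffered(aeEventLoop *el, int fd, aeFileProc *writeHandler) {
    aeCreateFileEvent(el, fd, AE_WRITABLE, writeHandler, NULL);
}

static void onReplyDrained(aeEventLoop *el, int fd) {
    aeDeleteFileEvent(el, fd, AE_WRITABLE);
}
```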

As for counting the kevent system calls: I tried that before, but dtruss, the macOS equivalent of strace, is hard to use with SIP enabled, and the alternative, ktrace, seems broken and deprecated. I'll try again later...

---------

Signed-off-by: Andy Pan <i@andypan.me>
@panjf2000 (Contributor, Author)

Benchmark command

./valkey-benchmark -r 10000000 -d 1024 -n 10000000 -P 100 -t get,lpop,rpop,spop,zpopmin -q

Branch: valkey-io:unstable

CALL                                        COUNT
…
accept                                         64
fcntl                                         174
setsockopt                                    263
kevent                                       1227
read                                        22255
write                                       22257
GET: 3496503.50 requests per second, p50=1.223 msec
LPOP: 3617945.00 requests per second, p50=1.191 msec
RPOP: 3617945.00 requests per second, p50=1.191 msec
SPOP: 3553660.50 requests per second, p50=1.207 msec
ZPOPMIN: 3673769.50 requests per second, p50=1.175 msec

Branch: panjf2000:kqueue-batch

CALL                                        COUNT
…
accept                                         67
fcntl                                         174
setsockopt                                    263
kevent                                        903
read                                        21085
write                                       21114
GET: 3834355.75 requests per second, p50=1.135 msec
LPOP: 3815337.50 requests per second, p50=1.151 msec
RPOP: 3675119.50 requests per second, p50=1.207 msec
SPOP: 3707823.50 requests per second, p50=1.191 msec
ZPOPMIN: 3732736.25 requests per second, p50=1.159 msec

@panjf2000 panjf2000 requested a review from PingXie May 20, 2024 13:54
---------

Signed-off-by: Andy Pan <i@andypan.me>
@panjf2000 (Contributor, Author)

Ping @madolson @PingXie

@madolson (Member)

@PingXie Related to your comment about event handling, I think we should look more seriously into making sure that connSet(Read|Write)Handler always succeed. Once that is done, we can update the kqueue batch handler. I documented my thoughts here: #528.

@panjf2000 (Contributor, Author) commented May 22, 2024

> Once that is done, we can update the kqueue batch handler.

To make it clearer: by "update the kqueue batch handler", you mean removing the USE_KQUEUE_BATCH macro in this PR and making it the only code path, right? If so, is this PR good to go? @madolson

@panjf2000 (Contributor, Author) commented May 22, 2024

Or did you mean that we need to fix #528 by changing the return type of connSet(Read|Write)Handler from int to void (option 2 in #528; it seems more pragmatic) before we move forward with this PR? If so, I can take that on and open a new PR for #528. Or we could do it in this PR directly, which would avoid some conflicts between PRs.
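For reference, option 2 as I understand it would be a signature change along these lines (a sketch; the exact types follow src/connection.h and should be treated as an assumption):

```c
/* Option 2 sketch: the setters can no longer fail, so they return
 * void and callers (including the kqueue batch path) have no error
 * path to handle. Types per src/connection.h, assumed here. */
void connSetReadHandler(connection *conn, ConnectionCallbackFunc func);
void connSetWriteHandler(connection *conn, ConnectionCallbackFunc func);
```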

@madolson (Member)

> To make it clearer: by "update the kqueue batch handler", you mean removing the USE_KQUEUE_BATCH macro in this PR and making it the only code path, right?

Yeah, this one. I think we can start with the USE_KQUEUE_BATCH macro and remove it once we solve the issues with aeCreateFileEvent sometimes returning an error. If it can never fail, it should be safe to always batch the updates.

@panjf2000 (Contributor, Author)

> > To make it clearer: by "update the kqueue batch handler", you mean removing the USE_KQUEUE_BATCH macro in this PR and making it the only code path, right?
>
> Yeah, this one. I think we can start with the USE_KQUEUE_BATCH macro and remove it once we solve the issues with aeCreateFileEvent sometimes returning an error. If it can never fail, it should be safe to always batch the updates.

Great! Then I guess we can continue the code review here? @madolson
