[CRASH] Redis 7.2.3 crashed in slotToKeyReplaceEntry #13205
Crash in assembly:
0x0000000000144b13 <+147>: call 0x134c60 <keyHashSlot>
0x0000000000144b18 <+152>: mov 0x38(%rbp),%rdx
0x0000000000144b1c <+156>: mov %eax,%eax
0x0000000000144b1e <+158>: shl $0x4,%rax
0x0000000000144b22 <+162>: add 0x50(%rdx),%rax   <- crash here

Corresponding source:
unsigned int hashslot = keyHashSlot(key, sdslen(key));
clusterDictMetadata *dictmeta = dictMetadata(d);
redisDb *db = dictmeta->db;
slotToKeys *slot_to_keys = &(*db->slots_to_keys).by_slot[hashslot];
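For context, here is a minimal sketch of how a key is mapped to a cluster slot, following the public Redis Cluster specification: CRC16/XMODEM of the key (or of a non-empty `{...}` hash tag), masked to 14 bits. The names `crc16_xmodem` and `key_hash_slot` are illustrative and this is not the exact Redis source. It shows that `hashslot` always lies in 0..16383, so the index into the 16384-entry `by_slot` array is bounded by construction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection.
 * This is the CRC variant the Redis Cluster spec uses for key slots. */
static uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Illustrative key -> slot mapping: if the key contains a non-empty
 * "{...}" hash tag, only the tag is hashed; otherwise the whole key is.
 * Masking with 0x3FFF keeps the result in [0, 16383]. */
static unsigned int key_hash_slot(const char *key, size_t keylen) {
    size_t s, e;
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    if (s == keylen) /* no '{': hash the whole key */
        return crc16_xmodem(key, keylen) & 0x3FFF;
    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;
    if (e == keylen || e == s + 1) /* no '}' or empty "{}": whole key */
        return crc16_xmodem(key, keylen) & 0x3FFF;
    return crc16_xmodem(key + s + 1, e - s - 1) & 0x3FFF;
}
```

With this sketch, `{user1000}.following` and `{user1000}.followers` land in the same slot because only `user1000` is hashed.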
Thank you @sundb.
@SarthakSahu Yes, but I don't know why yet; it still needs some time.
Hi @sundb, any breakthrough here? Our cluster has crashed again with the same stack trace.
@SarthakSahu Sorry, I haven't had enough time to dig into it yet. Could you provide more information so I can pinpoint it quickly, or can you reproduce it reliably?
Same as #12677?
@stevelipinski Yes, they are the same.
Yes:

------ CONFIG DEBUG OUTPUT ------

I believe one minor difference is that in this case activedefrag is enabled from the start, whereas in #12677 they mention enabling it on a replica.
@stevelipinski Thanks a lot, this is useful. I will take the time to figure out why as soon as possible.
Hi @sundb, is there any interim update you can share?
@SarthakSahu Sorry, I've tried and failed to reproduce it. Can you give me more clues about your system, any special configuration, etc.?
@SarthakSahu Disable activedefrag until the bug is fixed.
AFAIK, short-lived data is injected frequently, i.e. data is constantly being written and then deleted.
@SarthakSahu Did you perform any special operations on the cluster, such as slot migration?
I've been trying to reproduce it as well. The closest I can get is a script that creates a bunch of strings, some of them with a TTL; disabling and then re-enabling activedefrag then makes it crash. However, it crashes at a different point in the code:
The common thread I see between the crashes in #13205 and #12677 is the backtrace:

This makes me wonder whether there is some race condition between threads while active defrag is running.
@stevelipinski Thanks, it will be fixed in the next release.
@sundb Did you find the root cause? We are investigating, and if you already know what needs to be fixed, we won't spend more time on it. Thanks!
@stevelipinski Sorry for the late reply. The crash happens because we forgot to call
@sundb Thanks for sharing the patch. I also think the crash I mentioned above was caused by disabling activedefrag while it was running, which left a stale cursor that allowed access to an out-of-range db: stevelipinski/redis@ecb7cd8
@stevelipinski Defragmentation is just one way to trigger the crash; any code touching
No, I did not use

It looked like a different crash/backtrace from the one discussed in the original issue; see above.
@stevelipinski I saw
Nope, even with your fix:

It still needs my change to reset expires_cursor in defrag.c to avoid this crash.
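The stale-cursor problem described above can be illustrated with a toy model. This is a hypothetical, heavily simplified sketch, not the actual defrag.c code: `DefragState`, `FakeDb`, `defrag_step` and `defrag_abort` are invented for illustration, and apart from `expires_cursor` the field names (`current_db`, `cursor`, `db`) are assumptions about the kind of state an incremental defrag cycle keeps between calls. It shows how clearing the db selection on disable, without also clearing `expires_cursor`, lets the next cycle skip db selection and operate on no valid db.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; NOT the real defrag.c. */
#define DBNUM 16

typedef struct { int id; } FakeDb;
static FakeDb dbs[DBNUM];

/* State an incremental scan keeps between cron invocations (assumed). */
typedef struct {
    int current_db;               /* index of db being scanned, -1 = none yet */
    unsigned long cursor;         /* scan cursor for the main keyspace dict */
    unsigned long expires_cursor; /* scan cursor for the expires dict */
    FakeDb *db;                   /* db currently being scanned */
} DefragState;

static void defrag_state_init(DefragState *s) {
    s->current_db = -1;
    s->cursor = 0;
    s->expires_cursor = 0;
    s->db = NULL;
}

/* Called when activedefrag is switched off mid-run: start fresh next time.
 * reset_expires_cursor == false models the buggy path. */
static void defrag_abort(DefragState *s, bool reset_expires_cursor) {
    s->current_db = -1;
    s->cursor = 0;
    s->db = NULL;
    if (reset_expires_cursor)
        s->expires_cursor = 0;
}

/* One step of the cycle. Returns the db this step operates on;
 * NULL stands in for the invalid-db access that crashes the server. */
static FakeDb *defrag_step(DefragState *s) {
    if (s->cursor == 0 && s->expires_cursor == 0) {
        /* Both scans of the previous db are done: pick the next db. */
        s->current_db = (s->current_db + 1) % DBNUM;
        s->db = &dbs[s->current_db];
    }
    FakeDb *touched = s->db;
    /* Model a time-limited scan of the expires dict that stops part-way. */
    s->expires_cursor = 42;
    return touched;
}
```

In the buggy path, the leftover non-zero `expires_cursor` makes the next step skip db selection even though `db` was cleared; resetting it together with the other state restores a clean start.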
@sundb Given your change to add
@stevelipinski Thanks, these do seem to be two separate issues; you're welcome to open a PR to fix it.
You can reproduce it by using
@stevelipinski I've reproduced it locally by hand, and your solution is correct.