New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NEW] Handle spamming of custom LUA error messages in the INFO ERRORSTATS section with continued tracking of non LUA errors stats #487
Comments
does this redis/redis#13141 can somehow solve your problem? |
I really don't understand why Oran marked that as not a breaking change, it's possible someone had more than 128 errors. I feel like he was just trying to merge stuff before the license change. I think this should solve the issue @KarthikSubbarao brought up though. |
yes, that is indeed a potential breaking change. I have thought about add a new configuration item and defaulting to infinity to avoid the breaking change, but Oran does't seem to mention it and probable doesn't want to add new a configuration item for this.
Looking at it now, it may indeed be. |
@enjoy-binbin - (Just read the top comment) Yes, this will solve the issue of spamming For completeness - with redis/redis#13141, Whereas, the idea proposed in this issue was to have a config to disable custom LUA error messages while continuing to allow regular Error stats to function normally. But the main issue we wanted to address is misuse (spamming), so this should work. Thank you
If we think this is useful, we can consider making this improvement - by adding onto the existing logic |
Resetting all the error count will also lose visibility into other errors like MOVED etc that can help with debugging issues in the cluster. Is it possible to record all custom errors from lua in a separate RAX (based on |
Yes, looks like with the current implementation (from redis/redis#13141):
These issues aside, the PR addresses the concerns we have perfectly.
The proposal in this issue is to prevent tracking new error messages from LUA if the number of LUA error messages is greater than 128. Instead, we will track any additional LUA error type in a new counter: This will address the issue of spammed error messages / memory usage of the |
I like the idea of tracking lua errors in a separate RAX, however it can't just be based off of |
@KarthikSubbarao Thank you for driving this! I am reading through the PR and will provide technical comments there (if I have any) but wanted to ask here: |
@ranshid Thank you. Modules are similar in that they can result in custom error messages - but I would place Modules closer to the Valkey server in that it is the responsibility of those running the server to ensure that the modules they load do not lead to spamming the But (if required) we can indeed add support for preventing misuse by Modules by passing a flag in the |
The problem/use-case that the feature addresses
When using LUA to reply with custom error messages, the prefix of the messages be included in the
INFO ERRORSTATS
section as a unique entry. This can spam theINFO ALL
command or anyINFO
command variant that includes theERRORSTATS
section.redis/redis#13141 address this spamming of the
INFO
command'sERRORSTATS
section and the memory usage of theerrors
RAX. However, it has the following drawbacks:CONFIG RESETSTAT
which will clear command stat counters and error stat counters. So, we will lose visibility in both of these.Description of the feature
The proposal in this issue is to prevent tracking new error messages from LUA if the number of LUA error messages is greater than 128. Instead, we will track any additional LUA error type in a new counter:
errorstat_LUA
and if a non-LUA error (e.g. MOVED / CLUSTERDOWN) occurs, they will continue to be tracked as usual.This will address the issue of spammed error messages / memory usage of the errors RAX. Additionally, we will not have to execute
CONFIG RESETSTAT
to restore functionality - as normal error messages continue to be tracked.Based on feedback and thoughts from this discussion, I can propose a PR to support this feature.
Alternatives you've considered
NA
The text was updated successfully, but these errors were encountered: