Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

redis-cli --keystats and --keystats-samples with --top and --cursor #12826

Merged
merged 36 commits into from May 25, 2024

Conversation

yveslb
Copy link
Contributor

@yveslb yveslb commented Dec 3, 2023

Added --keystats and --keystats-samples <n> commands.

The new commands combines memkeys and bigkeys with additional distribution data.
We often run memkeys and bigkeys one after the other. It will be convenient to just have one command.
Distribution and top 10 key sizes are useful when we have multiple keys taking a lot of memory.

Like for memkeys and bigkeys, we can use -i <n> and Ctrl-C to interrupt the scan loop. It will still show the statistics on the keys sampled so far.
We can use two new optional parameters:

  • --cursor <n> to continue from the last cursor after an interruption of the scan.
  • --top <n> to change the number of top key sizes (10 by default).

Implemented a fix for the --memkeys-samples 0 issue in order to use --keystats-samples 0.

Used hdr_histogram for the key size distribution.

For key length, hdr_histogram seemed overkill and preset ranges were used.

The memory used by keystats with hdr_histogram is around 7MB (compared to 3MB for memkeys or bigkeys).

Execution time is somewhat equivalent to adding memkeys and bigkeys time. Each scan loop is having more commands (key type, key size, key length/cardinality).

We can redirect the output to a file. In that case, no color or text refresh will happen.

Limitation:

  • As the information printed during the loop is refreshed (moving cursor up), stderr and information not fitting in the terminal window (width or height) might create some refresh issues.

Comments:

  • config.top_sizes_limit could be used globally like config.count, but it is passed as parameter to be consistent with config.memkeys_samples.
  • Not sure if we should move some utility functions to cli-common.c.

Got some tips and help from @ofirluzon.

src/redis-cli.c Fixed Show fixed Hide fixed
src/redis-cli.c Fixed Show fixed Hide fixed
@oranagra
Copy link
Member

@redis/redis-committers maybe one of you has time to review this one?

Copy link
Collaborator

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did some cleanup on last commit (hope you don't mind), mainly bool (we don't use bool in the codebase) and format.
then i also tried playing it, the feature look good, just a brief review

src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
@zuiderkwast
Copy link
Contributor

zuiderkwast commented Feb 26, 2024

I tried this feature briefely on some small dummy data. I think the feature can be useful, but probably the output and other user experience can be improved. (I haven't thought about it much yet.) Here is a text-based "screenshot" for the record:

$ ./redis-cli --keystats

# Scanning the entire keyspace to find the biggest keys and distribution information.
# Use -i 0.1 to sleep 0.1 sec per 100 SCAN commands (not usually needed).
# Use --cursor <n> to start the scan at the cursor <n> (usually after a Ctrl-C).
# Use --top <n> to display <n> top key sizes (default is 10).
# Ctrl-C to stop the scan.

100.00% ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Keys sampled: 9
Keys size:    520B

--- TOP 10 KEY SIZES
  1      88B hash       "h1"
  2      72B list       "l1"
  3      56B string     "x"
  4      56B string     "\xc3\xa4"
  5      56B string     "z"
  6      48B string     "5"
  7      48B string     "1"
  8      48B string     "7"
  9      48B string     "3"

--- TOP SIZE PER TYPE
list       "l1" is 72B
hash       "h1" is 88B
string     "x" is 56B

--- TOP LENGTH AND CARDINALITY PER TYPE
list       "l1" has 5 items
hash       "h1" has 4 fields
string     "\xc3\xa4" has 3B

--- Key Size  Percentile   Total Keys
         48B     0.0000%            4
         56B    50.0000%            7
         72B    87.5000%            8
         88B   100.0000%            9
Note: 0.01% size precision, Mean: 57B, StdDeviation: 12B

--- Key length  Percentile  Total keys
            2B   100.0000%           9
Total key length is 12B (1B avg)

--- Type |  Total keys| Keys %|Tot size|Avg size| Total length/card|Avg ln/card
list                 1  11.11%      72B      72B            5 items        5.00
hash                 1  11.11%      88B      88B           4 fields        4.00
string               7  77.78%     360B      51B                10B          1B

Then, I tried it again after populating 1M keys (using DEBUG POPULATE 1000000). It does a scan over the entire keyspace. It suppose it can be quite costly. The comment "Use -i 0.1 to sleep 0.1 sec per 100 SCAN commands (not usually needed)" worries me a little, [Edit] but --bigkeys does this too, so I suppose it's fine.

I've seen an oddity when aborting the scan with Ctrl+C. It prints the results of the keys scanned, along with a message like

Scan interrupted:
Use 'redis-cli --keystats --cursor 784131' to restart from the last cursor.

During the scan, there is a nice progress bar displayed during the scan. If I abort the scan after 75% and then resume using redis-cli --keystats --cursor 784131, the progress bar then goes from 0 to 25% and then it stops at 25%, which looks odd. I would expect it to continue from 75% and scan to reach 100%, but obviously it doesn't know that the cursor is at 75% of the scan.

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Feb 27, 2024

The interactive progress bar and the possibility to resume if interrupted is a nice experience. Can we add those to --bigkeys and --hotkeys too? Similar features should have a similar user experience IMO.

Maybe even --scan can benefit from interrupt and resume using --cursor, but I guess it makes sense only if stdin is a TTY but stdout is not a TTY, i.e. if it's invoked like redis-cli --scan > file. Possibly we can display the progress bar and message on the TTY in that case, where it isn't mixed with the returned keys. Does it make sense?

The design of the output (headings and tables) is not similar to the output of the other features like --bigkeys, --memkeys and --stat. I think we should use a similar design as the existing features.

@yveslb
Copy link
Contributor Author

yveslb commented Feb 28, 2024

@zuiderkwast

You are correct about restarting at a cursor other than 0. It will be nicer to restart at the last percentage, but as you mentioned, I am not sure we can find a way to do so. I guess the user should be ok knowing that using --cursor will restart from 0% and will probably not reach 100%. In addition, between the scans, we might have added or remove some keys.

--keystats should have all the information from --memkeys and --bigkeys. To keep the existing code behavior and output, I did not want to change it too much (except to fix the issue with --memkeys-samples 0). The worry being to break any automations that rely on the output of existing commands. I am not sure the output of --memkeys, --bigkeys, and --hotkeys is designed to be parsed, but some users are creative.
If we think we should change the output of existing commands to be more consistent, we could do it in another pull request.

For the Redis Data Type summary difference, I did the change for readeability purpose.

--- Type |  Total keys| Keys %|Tot size|Avg size| Total length/card|Avg ln/card
list             10001  20.00%    2.25G  235.39K       188538 items       18.85
string           10003  20.00%   12.73M    1.30K             11.05M       1.13K
MBbloomCF            1   0.00%    1.11K    1.11K                 -           -
hash             10000  19.99%  127.31M   13.04K       47160 fields        4.72
TopK-TYPE            1   0.00%  112.62K  112.62K                 -           -
TDIS-TYPE            1   0.00%    9.58K    9.58K                 -           -
TSDB-TYPE            1   0.00%    4.14K    4.14K                 -           -
set              10000  19.99%   70.63M    7.23K      47390 members        4.74
CMSk-TYPE            1   0.00%  140.67K  140.67K                 -           -
zset             10000  19.99%   65.48M    6.70K      47098 members        4.71
MBbloom--            1   0.00%     280B     280B                 -           -
ReJSON-RL            4   0.01%    1.45K     370B                 -           -

vs

10001 lists with 2410602722 bytes (20.00% of keys, avg size 241036.17)
10003 strings with 13345720 bytes (20.00% of keys, avg size 1334.17)
1 MBbloomCFs with 1136 bytes (00.00% of keys, avg size 1136.00)
10000 hashs with 133490960 bytes (19.99% of keys, avg size 13349.10)
1 TopK-TYPEs with 115328 bytes (00.00% of keys, avg size 115328.00)
0 streams with 0 bytes (00.00% of keys, avg size 0.00)
1 TDIS-TYPEs with 9808 bytes (00.00% of keys, avg size 9808.00)
1 TSDB-TYPEs with 4240 bytes (00.00% of keys, avg size 4240.00)
10000 sets with 74061054 bytes (19.99% of keys, avg size 7406.11)
1 CMSk-TYPEs with 144048 bytes (00.00% of keys, avg size 144048.00)
10000 zsets with 68656760 bytes (19.99% of keys, avg size 6865.68)
1 MBbloom--s with 280 bytes (00.00% of keys, avg size 280.00)
4 ReJSON-RLs with 1483 bytes (00.01% of keys, avg size 370.75)

--keystats has more data to present as we combine --memkeys and --bigkeys.
I think it is more readeable using a table, but the inconsistency comment is valid, if we think consistency is more important than readeability.
I would say that most data presentation in --keystats breaks from --memkeys and --bigkeys.

@zuiderkwast
Copy link
Contributor

You are correct about restarting at a cursor other than 0. It will be nicer to restart at the last percentage, but as you mentioned, I am not sure we can find a way to do so. I guess the user should be ok knowing that using --cursor will restart from 0% and will probably not reach 100%. In addition, between the scans, we might have added or remove some keys.

I guess you're right. I don't have a better idea, so I think this is fine.

The worry being to break any automations that rely on the output of existing commands. I am not sure the output of --memkeys, --bigkeys, and --hotkeys is designed to be parsed, but some users are creative.

I think we cannot break the non-interactive output which the creative user can use in scripts, but in interactive mode it's OK to change things like progress bar.

If we think we should change the output of existing commands to be more consistent, we could do it in another pull request.

Yes, but my experience says if we postpone it, it will not be done at all. Bigkeys is faster than keystats when I tried, so some users may still want to continue to use it.

For the Redis Data Type summary difference, I did the change for readeability purpose.

Of course. I like the table, but it just the design of the heading and table header row (start with three dashes, pipe between columns) which is not a style we used elsewhere. I think it looks like some kind of markup/down syntax.

How about this style? (more graphic, less like a markup syntax)

Type        Total keys  Keys % Tot size Avg size  Total length/card Avg ln/card
--------- ------------ ------- -------- -------- ------------------ -----------
list             10001  20.00%    2.25G  235.39K       188538 items       18.85
string           10003  20.00%   12.73M    1.30K             11.05M       1.13K
MBbloomCF            1   0.00%    1.11K    1.11K                 -           -
...

For headings, instead of three dashes in the beginning of the line (like --- TOP 10 KEY SIZES), how about dashes in the beginning and in the end of the line (like --- TOP 10 KEY SIZES ---)? It is more similar to the heading in bigkeys (------- summary -------).

@yveslb
Copy link
Contributor Author

yveslb commented Feb 29, 2024

Your table presentation and trailing dashes are indeed looking better. Let me know if the output below is ok, and I can push a commit for that.

--- Top 10 key sizes ---
  1    2.04G list       "K2G"
...

--- Top size per type ---
list       "K2G" is 2.04G
...

--- Top length and cardinality per type ---
list       "list7659" has 50 items
...

Key size Percentile Total keys
-------- ---------- -----------
   2.04G  100.0000%       50014
...
Note: 0.01% size precision, Mean: 51.98K, StdDeviation: 9.32M

Key length     Percentile Total keys
-------------- ---------- -----------
           22B  100.0000%       50014
...
Total key length is 395.09K (8B avg)

Type        Total keys  Keys % Tot size Avg size  Total length/card Avg ln/card
--------- ------------ ------- -------- -------- ------------------ -----------
list             10001  20.00%    2.25G  235.39K       188538 items       18.85
...

To generalize the usage of the progress bar and allowing to use --cursor when interrupted, I can start implementing it for --keystats, --memkeys--bigkeys, --scan, and --hotkeys.
Not to break the non-interactive output, we need to only display the progress bar and cursor information in TTY, and make sure we keep the current output when redirecting to a file.

 if it's invoked like redis-cli --scan > file. Possibly we can display the progress bar and message on the TTY in that case, where it isn't mixed with the returned keys.

When redirecting to a file, you would like to still keep some information on the TTY.
If we take redis-cli --memkeys > file for instance, we will see in the terminal:

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

31.05% |||||||||||||||||||-----------------------------------------

Scan interrupted:
Use 'redis-cli --memkeys --cursor 11274' to restart from the last cursor.

And in the file:

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest zset   found so far "zadd5838" with 3240 bytes
[00.00%] Biggest zset   found so far "zadd7204" with 11419 bytes
[00.00%] Biggest list   found so far "list7659" with 67732 bytes
[30.50%] Biggest string found so far "string7372" with 72 bytes
^C
-------- summary -------

[51.77%] Sampled 25905 keys in the keyspace!
Total key length in bytes is 209634 (avg len 8.09)

Biggest   list found "K2G" has 2185363872 bytes
Biggest string found "string5150" has 6712 bytes
Biggest   hash found "hash6292" has 43404 bytes
...

If this is what you meant, I need to find out how to do that.

@zuiderkwast
Copy link
Contributor

Your table presentation and trailing dashes are indeed looking better. Let me know if the output below is ok, and I can push a commit for that.

Yes, that looks good. I'm glad you agree. :)

When redirecting to a file, you would like to still keep some information on the TTY.
If we take redis-cli --memkeys > file for instance, we will see in the terminal

Well, probably only for --scan. Here the output can be a very large list of keys, which the user probably wants to parse or use for something, while it can be nice to have some progress indicator if we are still on a TTY. It's different to the bigkeys/memkeys/hotkeys feature, and I'm not even sure it's a good idea, so I guess you're right that we should to leave this part for another PR. (An idea is if (isatty(STDERR_FILENO) && !isatty(STDOUT_FILENO)) { /* Show progress bar on stderr and print data on stdout. */ }.)

For bigkeys/memkeys/hotkeys, I think we can simply use similar logic as you did for keystats, i.e. show the progressbar if STDOUT is a TTY; otherwise hide it.

I see you're checking config.output == OUTPUT_STANDARD to determine if the progress bar animation should be displayed or not. This can be messed up by redis-cli --keystats --no-raw > file (progressbar updates stored inside the file, which becomes huge). So I think we should instead check isatty(STDOUT_FILENO) || getenv("FAKETTY").

@yveslb
Copy link
Contributor Author

yveslb commented Mar 1, 2024

I have pushed the new format for the tables and titles.

You are right, when redirecting to a file we only need the percentage and not the progress bar. I actually show the progress bar in the file. I will remove it.

Good catch on redis-cli --keystats --no-raw > file, I did not think about --no-raw. Refreshing the output requires to be more careful. Thank you for the help, isatty(STDOUT_FILENO) || getenv("FAKETTY") works like a charm. I will test some more and check where to add the progress bar to --memkeys--bigkeys, and --hotkeys, only if STDOUT is a TTY.

@yveslb
Copy link
Contributor Author

yveslb commented Mar 11, 2024

@zuiderkwast
I updated findBigKeys() and findHotKeys() to have the progress bar in TTY and the previous output when redirecting to a file.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine now.

It's a lot of lines added and I didn't read all of them very carefully, but the code is isolated and doesn't affect the basic operation of redis-cli, so I think it's safe to merge.

@enjoy-binbin Any comments? Have you tested it?

src/redis-cli.c Outdated Show resolved Hide resolved
@zuiderkwast zuiderkwast added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Mar 11, 2024
Typo

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@yveslb
Copy link
Contributor Author

yveslb commented Mar 12, 2024

@enjoy-binbin raised two points:

Should we check that config.memkeys_samples is not negative?

We should. I added some checks after using strtoll().

Do we need to check hdr_record_value() return value?

The function returns false if the value we are trying to insert is is negative or larger than the highest_trackable_value.
We are inserting key sizes. We use unsigned long long for a key size and therefore cannot be negative.
In Keystats, the highest_trackable_value is set to 1 TB. It is unlikely we will have a key that large but we rather know if we cannot insert a key size.

Speaking of not checking a return value, I realized that I do not need to check the return value of zmalloc() and I did it... let me know if I should remove the check.

Copy link
Collaborator

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don’t have a very in-depth review, but looking at the output and the modified diff, the code is LGTM.

Speaking of not checking a return value, I realized that I do not need to check the return value of zmalloc() and I did it... let me know if I should remove the check.

yes, i think we can drop the check, it adds some code (which is good in generally) but is a bit overkill i think.

@yveslb
Copy link
Contributor Author

yveslb commented Mar 12, 2024

@enjoy-binbin @zuiderkwast
I have pushed a new commit to remove the unnecessary check of zmalloc().
Let me know if there is anything else I should do.
Thank you for your help, I appreciate it.

@CLAassistant
Copy link

CLAassistant commented Mar 24, 2024

CLA assistant check
All committers have signed the CLA.

@yveslb
Copy link
Contributor Author

yveslb commented Apr 9, 2024

@yveslb I submit a code style commit, please have a look.

Thanks for all the fix up!

@yveslb
Copy link
Contributor Author

yveslb commented Apr 11, 2024

@sundb let me know if you have more suggestions.
One comment. As suggested previously, the progress bar from keyStats() was added to findBigKeys() and findHotKeys(). As we did not want to break some potential automations created by the users, we kept the original output when redirecting to a file. However, this makes findBigKeys() and findHotKeys() a bit messy. I guess that's ok as we should see it from the user point of view.

@sundb
Copy link
Collaborator

sundb commented Apr 11, 2024

@yveslb thanks, I'm now busy in other place, come back later.

Copy link
Collaborator

@sundb sundb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
Co-authored-by: debing.sun <debing.sun@redis.com>
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated
Comment on lines 10021 to 10031
typedef struct size_dist_entry {
unsigned long long size;
unsigned long long count;
} size_dist_entry;

typedef struct size_dist {
unsigned long long total_count;
unsigned long long total_size;
unsigned long long max_size;
size_dist_entry *size_dist;
} size_dist;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about adding some comments for these parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the latest commit.

src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
src/redis-cli.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@sundb sundb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sundb sundb changed the title redis-cli --keystats and --keystats-samples. With --top and --cursor redis-cli --keystats and --keystats-samples with --top and --cursor May 25, 2024
@sundb sundb merged commit 6801a3c into redis:unstable May 25, 2024
13 checks passed
@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes indication that this issue needs to be mentioned in the release notes state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants