sar -B produces incorrect vmeff% in sysstat-11.7.3-7.el8.x86_64 #343

Open
gleventhal opened this issue Oct 19, 2022 · 6 comments

@gleventhal

sysstat-11.7.3-7.el8.x86_64
It should be pgsteal / pgscan but it seems it's now: pgsteal / pgscan * 100

06:01:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
06:02:11 PM      0.00     95.00   6422.00      0.00   9790.00      0.00    141.00    274.00    194.33
06:02:21 PM      0.00      0.00   2608.00      0.00   2889.00      0.00     65.00    130.00    200.00
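
As a quick arithmetic check on the two rows above, here is a minimal sketch that assumes %vmeff is computed as pgsteal / (pgscank + pgscand) * 100. The helper name `vmeff` is hypothetical and is not taken from sysstat. Under that assumption the printed values match the %vmeff column, which suggests the percentage scaling itself is expected and the odd part is that pgsteal/s exceeds the scanned total.

```python
def vmeff(pgsteal, pgscank, pgscand):
    """Return pgsteal as a percentage of all scanned pages (0.0 if nothing was scanned)."""
    scanned = pgscank + pgscand
    return pgsteal / scanned * 100 if scanned else 0.0


# (pgscank/s, pgscand/s, pgsteal/s) copied from the two sample rows above
rows = [(0.00, 141.00, 274.00), (0.00, 65.00, 130.00)]

for pgscank, pgscand, pgsteal in rows:
    print(f"{vmeff(pgsteal, pgscank, pgscand):.2f}")
# prints 194.33 and 200.00
```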
@lzaoral
Contributor

lzaoral commented Aug 7, 2023

@sysstat I've done some analysis regarding this issue and it may affect any system with a reasonably recent kernel.

The method to compute %vmeff is correct because the corresponding source code did not change between the versions of sysstat in RHEL 7, 8 or even 9.

What has changed, however, are the contents of /proc/vmstat, which are used to compute the values of the pgscan columns. All recent versions of sysstat parse the /proc/vmstat file and sum all values with the pgscan_kswapd and pgscan_direct prefixes, which then correspond to the pgscank and pgscand columns produced by sar, respectively.

See the following experiment for a list of pgscan fields offered by given kernel versions in /proc/vmstat:

  • RHEL 7 (kernel-3.10.0-1160.95.1.el7.x86_64):
$ grep pgscan /proc/vmstat | cut -d' ' -f1
pgscan_kswapd_dma
pgscan_kswapd_dma32
pgscan_kswapd_normal
pgscan_kswapd_movable
pgscan_direct_dma
pgscan_direct_dma32
pgscan_direct_normal
pgscan_direct_movable
pgscan_direct_throttle
  • RHEL 8 (kernel-4.18.0-506.el8.x86_64):
$ grep pgscan /proc/vmstat | cut -d' ' -f1
pgscan_kswapd
pgscan_direct
pgscan_direct_throttle
pgscan_anon
pgscan_file
  • RHEL 9 (kernel-5.14.0-347.el9.x86_64):
$ grep pgscan /proc/vmstat | cut -d' ' -f1
pgscan_kswapd
pgscan_direct
pgscan_direct_throttle
pgscan_anon
pgscan_file
  • Fedora 38 (kernel-6.4.6-200.fc38.x86_64):
$ grep pgscan /proc/vmstat | cut -d' ' -f1
pgscan_kswapd
pgscan_direct
pgscan_khugepaged
pgscan_direct_throttle
pgscan_anon
pgscan_file

The pgscan_anon, pgscan_file and pgscan_khugepaged fields on newer kernels are ignored by sysstat, which is why the number of stolen pages may be higher than the number of scanned pages. As a result, sar may produce incorrect %vmeff values.
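
To illustrate the prefix matching described above, here is a rough sketch, not sysstat's actual C code. It sorts the pgscan fields into those matching the two prefixes and those left out. The function name `split_pgscan_fields` is hypothetical, the field names come from the Fedora 38 listing, and the counter values are made up.

```python
def split_pgscan_fields(vmstat_text):
    """Sum pgscan* counters by the two known prefixes; collect the names that match neither."""
    summed = {"pgscan_kswapd": 0, "pgscan_direct": 0}
    ignored = []
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].startswith("pgscan"):
            continue
        name, value = parts[0], int(parts[1])
        for prefix in summed:
            if name.startswith(prefix):
                # A plain prefix match also catches pgscan_direct_throttle.
                summed[prefix] += value
                break
        else:
            ignored.append(name)
    return summed, ignored


# Field names from the Fedora 38 listing; the values are invented for illustration.
sample = """\
pgscan_kswapd 100
pgscan_direct 40
pgscan_khugepaged 5
pgscan_direct_throttle 0
pgscan_anon 60
pgscan_file 85
"""

summed, ignored = split_pgscan_fields(sample)
print(summed)   # {'pgscan_kswapd': 100, 'pgscan_direct': 40}
print(ignored)  # ['pgscan_khugepaged', 'pgscan_anon', 'pgscan_file']
```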

edit: typo

@sysstat
Owner

sysstat commented Aug 17, 2023

@lzaoral Thanks for your analysis.
The solution is then probably to sum all fields from /proc/vmstat starting with pgscan_.
If new fields using this prefix are added in the future, they will be taken into account.
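
The proposal could look roughly like the sketch below. It is only an illustration under assumptions: the function name `total_pgscan` is hypothetical, the field names come from the RHEL 8 listing earlier in the thread, and the counter values are made up. Whether every pgscan_ field should count toward the total is a separate question that this sketch does not answer.

```python
def total_pgscan(vmstat_text, prefix="pgscan_"):
    """Sum every counter in /proc/vmstat-style text whose name starts with the prefix."""
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefix):
            total += int(parts[1])
    return total


# Field names from the RHEL 8 listing plus one unrelated counter; values are invented.
sample = """\
pgscan_kswapd 100
pgscan_direct 40
pgscan_direct_throttle 0
pgscan_anon 60
pgscan_file 80
pgsteal_kswapd 90
"""

print(total_pgscan(sample))  # 280: pgsteal_kswapd is skipped, all pgscan_ fields are added
```

On a live system the same function could be fed the contents of /proc/vmstat, and any future field carrying the prefix would be picked up without a code change.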

@sysstat
Owner

sysstat commented Aug 17, 2023

I'm also wondering whether %vmeff should still be displayed by sar or not. This is more a kernel metric than a system one and as such, it should probably be discarded...?

sysstat added a commit that referenced this issue Aug 19, 2023
Remove %vmeff metric displayed by sar -B (paging statistics).
With recent kernels, this metric was wrongly calculated. Decision was
made to remove it as it was more a kernel metric than a system one.

Signed-off-by: Sebastien GODARD <sysstat@users.noreply.github.com>
@petervanhooft

I find %vmeff a handy proxy to see if there is memory pressure.

@gleventhal
Author

I also find it a useful metric when the math makes sense. I don't see a valuable distinction between kernel and system metrics; that's a fungible thing from where I stand.

@gleventhal
Author

Can't we just test for the behavior and conditionally do the correct thing to get the expected value? I know it's a bit of a shim, but I'd much prefer an if statement or 2 over losing a stat that I value for troubleshooting vm issues.
