missing battery metrics on debian - power_supply class charge_control_end_threshold ": no such device" #3019

Closed
marvin-sinister opened this issue May 16, 2024 · 2 comments · Fixed by prometheus/procfs#641

@marvin-sinister

Host operating system: output of uname -a

Linux node1 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.8.0 (branch: debian/sid, revision: 1.8.0-1)
build user: team+pkg-go@tracker.debian.org
build date: 20240424-20:05:35
go version: go1.22.2
platform: linux/amd64
tags: unknown

node_exporter command line flags

It's started with systemd using:

EnvironmentFile=/etc/default/prometheus-node-exporter
ExecStart=/usr/bin/prometheus-node-exporter $ARGS

and $ARGS is empty:

# cat /etc/default/prometheus-node-exporter 
# Set the command-line arguments to pass to the server.
# Due to shell escaping, to pass backslashes for regexes, you need to double
# them (\\d for \d). If running under systemd, you need to double them again
# (\\\\d to mean \d), and escape newlines too.
ARGS=""

node_exporter log output

This line repeats:

May 16 10:19:56 node1 prometheus-node-exporter[1064]: ts=2024-05-16T08:19:56.181Z caller=collector.go:169 level=error msg="collector failed" name=powersupplyclass duration_seconds=0.006216025 err="could not get power_supply class info: error obtaining power_supply class info: failed to read file \"/sys/class/power_supply/BAT0/charge_control_end_threshold\": no such device"

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

Started node_exporter.

What did you expect to see?

No error, and battery stats in the metrics.

What did you see instead?

The error above, and no battery stats in the metrics.

Some potentially useful info:

# ls -l /sys/class/power_supply/BAT0/
total 0
-rw-r--r-- 1 root root 4096 May 16 10:18 alarm
-r--r--r-- 1 root root 4096 May 16 10:18 capacity
-r--r--r-- 1 root root 4096 May 16 10:18 capacity_level
-rw-r--r-- 1 root root 4096 May 16 10:18 charge_behaviour
-rw-r--r-- 1 root root 4096 May 16 10:18 charge_control_end_threshold
-rw-r--r-- 1 root root 4096 May 16 10:18 charge_control_start_threshold
-rw-r--r-- 1 root root 4096 May 16 10:35 charge_start_threshold
-rw-r--r-- 1 root root 4096 May 16 10:35 charge_stop_threshold
-r--r--r-- 1 root root 4096 May 16 10:35 cycle_count
lrwxrwxrwx 1 root root    0 May 16 10:17 device -> ../../../PNP0C0A:00
-r--r--r-- 1 root root 4096 May 16 10:35 energy_full
-r--r--r-- 1 root root 4096 May 16 10:35 energy_full_design
-r--r--r-- 1 root root 4096 May 16 10:35 energy_now
drwxr-xr-x 3 root root    0 May 16 10:17 hwmon1
-r--r--r-- 1 root root 4096 May 16 10:35 manufacturer
-r--r--r-- 1 root root 4096 May 16 10:35 model_name
drwxr-xr-x 2 root root    0 May 16 10:18 power
-r--r--r-- 1 root root 4096 May 16 10:35 power_now
-r--r--r-- 1 root root 4096 May 16 10:17 present
-r--r--r-- 1 root root 4096 May 16 10:35 serial_number
-r--r--r-- 1 root root 4096 May 16 10:17 status
lrwxrwxrwx 1 root root    0 May 16 10:17 subsystem -> ../../../../../../../../../class/power_supply
-r--r--r-- 1 root root 4096 May 16 10:35 technology
-r--r--r-- 1 root root 4096 May 16 10:17 type
-rw-r--r-- 1 root root 4096 May 16 10:17 uevent
-r--r--r-- 1 root root 4096 May 16 10:35 voltage_min_design
-r--r--r-- 1 root root 4096 May 16 10:35 voltage_now
# cat /sys/class/power_supply/BAT0/charge_control_end_threshold 
cat: /sys/class/power_supply/BAT0/charge_control_end_threshold: No such device
# tlp-stat -b
--- TLP 1.5.0 --------------------------------------------

+++ Battery Care
Plugin: thinkpad-legacy
Supported features: charge thresholds, recalibration
Driver usage:
* tp-smapi (tp_smapi) = active (status, charge thresholds, recalibration)
Parameter value ranges:
* START_CHARGE_THRESH_BAT0/1:  2..96(default)
* STOP_CHARGE_THRESH_BAT0/1:   6..100(default)

+++ ThinkPad Battery Status: BAT0 (Main / Internal)
/sys/devices/platform/smapi/BAT0/manufacturer               = SANYO
/sys/devices/platform/smapi/BAT0/model                      = COMPATIBLE
/sys/devices/platform/smapi/BAT0/manufacture_date           = 2021-10-07
/sys/devices/platform/smapi/BAT0/first_use_date             = 2023-07-21
/sys/devices/platform/smapi/BAT0/cycle_count                =      4
/sys/devices/platform/smapi/BAT0/temperature                =     25 [°C]
/sys/devices/platform/smapi/BAT0/design_capacity            =  47520 [mWh]
/sys/devices/platform/smapi/BAT0/last_full_capacity         =  44610 [mWh]
/sys/devices/platform/smapi/BAT0/remaining_capacity         =  44610 [mWh]
/sys/devices/platform/smapi/BAT0/remaining_percent          =    100 [%]
/sys/devices/platform/smapi/BAT0/remaining_running_time_now = not_discharging [min]
/sys/devices/platform/smapi/BAT0/remaining_charging_time    = not_charging [min]
/sys/devices/platform/smapi/BAT0/power_now                  =      0 [mW]
/sys/devices/platform/smapi/BAT0/power_avg                  =      0 [mW]
/sys/devices/platform/smapi/BAT0/state                      = idle

/sys/devices/platform/smapi/BAT0/start_charge_thresh        =     96 [%]
/sys/devices/platform/smapi/BAT0/stop_charge_thresh         =    100 [%]
/sys/devices/platform/smapi/BAT0/force_discharge            =      0

Charge                                                      =  100.0 [%]
Capacity                                                    =   93.9 [%]
@dswarbrick
Contributor

This error most likely originates from the parsePowerSupply function in https://github.com/prometheus/procfs/blob/master/sysfs/class_power_supply.go, which attempts to read every file in each /sys/class/power_supply/* directory.

That function already has some error handling for read failures:

```go
if err != nil {
	if os.IsNotExist(err) || err.Error() == "operation not supported" || errors.Is(err, os.ErrInvalid) {
		continue
	}
	return nil, fmt.Errorf("failed to read file %q: %w", name, err)
}
```

This check would need to be expanded to also handle the ENODEV ("no such device") error returned when reading the charge_control_end_threshold file.

I have previously advised against this approach of "read every file and see what sticks", since it is inherently fragile. New entries regularly appear in sysfs, and such a generic approach often leads to failures like this one. Explicitly specifying which files to read, rather than globbing over a very liberal wildcard pattern, is a little more work, but it results in more robust code that won't freak out when something new appears.

@dswarbrick
Contributor

The ENODEV likely originates in the tpacpi_battery_show function (https://github.com/torvalds/linux/blob/master/drivers/platform/x86/thinkpad_acpi.c#L9628), which is called when reading the charge_control_{end,start}_threshold entries on ThinkPad systems.
