smart plugin causes "num_err_log_entries" to increase on a Seagate FireCuda 530 NVMe drive #4127

Open
viulian opened this issue Jul 15, 2023 · 0 comments · May be fixed by #4128 or #3974
Labels
Bug A genuine bug

viulian commented Jul 15, 2023

  • Version of collectd: 5.12.0
  • Operating system / distribution: Fedora 37
  • Kernel version (if applicable): 6.3.8-100.fc37.x86_64

Expected behavior

I expect that, when enabled, the smart plugin does not cause num_err_log_entries to increase on a Seagate FireCuda 530 NVMe drive.

Actual behavior

With the smart plugin enabled, num_err_log_entries increments by one every minute.

Steps to reproduce

I have a few drives on my server, including a Samsung SSD 980 PRO 2TB. collectd is configured with the smart plugin and has worked perfectly fine with the Samsung NVMe drive.
But today I added a new Seagate FireCuda 530. After a reboot, I was curious whether the smart plugin would pick it up. It did; however, I noticed that num_err_log_entries was increasing.

I found https://www.osso.nl/blog/kioxia-nvme-num-err-log-entries-0xc004-smartctl/, which describes a similar problem. There, however, smartctl was used directly, and that smartctl bug was fixed a long time ago. This pointed me toward collectd and the smart plugin, so I started testing.

Without a <Plugin "smart"> block (which I assume means auto-detection), or with the drive explicitly enabled, num_err_log_entries keeps increasing. The only way to stop it is to exclude the drive from monitoring:

<Plugin "smart">
        Disk "sda"
        Disk "sdb"
        Disk "sdc"
        Disk "sdd"
        Disk "sde"
        # Disk "nvme0n1"
        Disk "nvme1n1"
        IgnoreSelected false
</Plugin>

The error reported is:

# nvme error-log /dev/nvme0n1
Error Log Entries for device:nvme0n1 entries:63
.................
 Entry[ 0]
.................
error_count     : 52
sqid            : 0
cmdid           : 0x9010
status_field    : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag       : 0
parm_err_loc    : 0x4
lba             : 0
nsid            : 0xffffffff
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
 Entry[ 1]
.................
error_count     : 51
sqid            : 0
cmdid           : 0xa014
status_field    : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag       : 0
parm_err_loc    : 0x4
lba             : 0
nsid            : 0xffffffff
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
 Entry[ 2]
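The status_field and parm_err_loc values in these entries can be decoded by hand. Below is a minimal Python sketch that does this, assuming (as the printed "Invalid Field in Command" description suggests) that nvme-cli shows status_field with the phase tag already shifted out. It uses the NVMe Status Field layout (SC bits 7:0, SCT bits 10:8, CRD bits 12:11, More bit 13, DNR bit 14) and the Parameter Error Location layout (byte offset in bits 7:0, bit in bits 10:8). The helpers parse_entry, to_int, decode_status and decode_param_location are hypothetical and are not part of collectd or nvme-cli.

```python
# Hypothetical helpers for reading `nvme error-log` text output;
# not part of collectd or nvme-cli.

# Entry[ 0] from the log above (trailing fields omitted).
ENTRY_0 = """\
error_count     : 52
sqid            : 0
cmdid           : 0x9010
status_field    : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag       : 0
parm_err_loc    : 0x4
lba             : 0
nsid            : 0xffffffff
"""


def parse_entry(text: str) -> dict[str, str]:
    """Split 'key : value' lines into a dict, splitting on the first ':' only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_int(value: str) -> int:
    """Parse '0x2002(...)' or '52', dropping any trailing description."""
    return int(value.split("(", 1)[0].strip(), 0)


def decode_status(status_field: int) -> dict:
    """Split the 15-bit NVMe Status Field (phase tag already removed)."""
    return {
        "sc": status_field & 0xFF,            # Status Code
        "sct": (status_field >> 8) & 0x7,     # Status Code Type, 0 = generic
        "crd": (status_field >> 11) & 0x3,    # Command Retry Delay
        "more": bool(status_field & 0x2000),  # More information available
        "dnr": bool(status_field & 0x4000),   # Do Not Retry
    }


def decode_param_location(parm_err_loc: int) -> tuple[int, int]:
    """Return (byte offset in the command, bit within that byte)."""
    return parm_err_loc & 0xFF, (parm_err_loc >> 8) & 0x7


if __name__ == "__main__":
    entry = parse_entry(ENTRY_0)
    status = decode_status(to_int(entry["status_field"]))
    byte, bit = decode_param_location(to_int(entry["parm_err_loc"]))
    print(status)
    print(f"byte {byte}, bit {bit}, nsid {entry['nsid']}")
```

Applied to Entry[ 0], this gives SC 0x02 (Invalid Field in Command, generic status type) with the More bit set and DNR clear, and a parameter error location of byte 4, bit 0. In an NVMe submission queue entry, byte 4 is where the namespace ID (command Dword 1) starts. That lines up with the nsid of 0xffffffff recorded in each entry and suggests, though the log alone does not prove it, that the drive rejects a command issued with the broadcast namespace ID.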