[BUG] NVME Drive reported as failed but smartctl and Scrutiny analysis say otherwise #635

Closed
George-RG opened this issue May 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@George-RG

Describe the bug
Scrutiny reports that the drive has failed the SMART test, even though every attribute listed under S.M.A.R.T NVME ATTRIBUTES on the details page shows the drive as passed. In addition, the command smartctl --xall --json --device nvme /dev/nvme0 reports:

"smart_status": { "passed": true, "nvme": { "value": 0 } },
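
For anyone who wants to cross-check the same field outside of Scrutiny, here is a minimal Python sketch. It only reads the `smart_status.passed` key quoted above; the helper names `smart_passed` and `read_report` are made up for illustration, and this is not how Scrutiny itself decides a drive's status.

```python
import json
import subprocess


def smart_passed(report: dict) -> bool:
    """Return True only if the smartctl JSON report says the health check passed."""
    status = report.get("smart_status", {})
    return status.get("passed") is True


def read_report(device: str = "/dev/nvme0") -> dict:
    """Run the command from this issue and parse its JSON output (hypothetical helper).

    Needs smartctl installed and enough privileges to query the device.
    smartctl's exit status is a bitmask, so a non-zero code is not treated
    as a hard failure here.
    """
    proc = subprocess.run(
        ["smartctl", "--xall", "--json", "--device", "nvme", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)


# The fragment quoted in this issue, wrapped in braces to make it valid JSON.
sample = json.loads('{"smart_status": {"passed": true, "nvme": {"value": 0}}}')
print(smart_passed(sample))
```

On the host, `smart_passed(read_report())` should give the same answer for the live drive, assuming smartctl is available there and not only inside the container.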

Expected behavior
I would expect Scrutiny to report a pass for the SMART test, matching what smartctl reports.

Screenshots
image

Log Files

docker exec scrutiny smartctl --xall --device nvme /dev/nvme0 result:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.0-31-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SK hynix BC711 HFM256GD3JX013N
Serial Number:                      FYB1N055410401Q0W
Firmware Version:                   HPS1
PCI Vendor/Subsystem ID:            0x1c5c
IEEE OUI Identifier:                0xace42e
Total NVM Capacity:                 256,060,514,304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            ace42e 002a128340
Local Time is:                      Thu May  9 15:33:04 2024 UTC
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x001f):   Security Format Frmw_DL NS_Mngmt Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +   6.3000W       -        -    0  0  0  0        5       5
 1 +   2.4000W       -        -    1  1  1  1       30      30
 2 +   1.9000W       -        -    2  2  2  2      100     100
 3 -   0.0500W       -        -    3  3  3  3     1000    1000
 4 -   0.0040W       -        -    3  3  3  3     1000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        39 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    1,120,044 [573 GB]
Data Units Written:                 1,284,587 [657 GB]
Host Read Commands:                 17,954,557
Host Write Commands:                18,957,089
Controller Busy Time:               30
Power Cycles:                       170
Power On Hours:                     161
Unsafe Shutdowns:                   107
Media and Data Integrity Errors:    0
Error Information Log Entries:      5
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               39 Celsius
Temperature Sensor 2:               47 Celsius

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

docker info output:

Client: Docker Engine - Community
 Version:    26.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 25
  Running: 22
  Paused: 0
  Stopped: 3
 Images: 26
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-31-generic
 Operating System: Ubuntu 24.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 15.53GiB
 Name: homeserver
 ID: 455bdd3e-f7ba-4978-a605-762c6cc2e75c
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
@George-RG added the bug label May 9, 2024
@George-RG
Author

It turns out that all I needed to do was clear the database and let the collector repopulate it, as described in the docs.
