[lsh@ceph2401 ~]$ sudo cephadm shell
[sudo] password for lsh:
Inferring fsid f2d0cd6e-8e43-11f0-aa90-a036bcc87e3b
Inferring config /var/lib/ceph/f2d0cd6e-8e43-11f0-aa90-a036bcc87e3b/mon.ceph2401/config
Using ceph image with id 'aade1b12b8e6' and tag 'v19' created on 2025-07-17 19:53:27 +0000 UTC
quay.io/ceph/ceph@sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee
[ceph: root@ceph2401 /]# ceph -s
  cluster:
    id:     f2d0cd6e-8e43-11f0-aa90-a036bcc87e3b
    health: HEALTH_WARN
            1 failed cephadm daemon(s)

  services:
    mon: 5 daemons, quorum ceph2401,ceph2402,ceph2405,ceph2403,ceph2404 (age 3M)
    mgr: ceph2402.rktinf(active, since 4M), standbys: ceph2401.vvyykk
    mds: 3/3 daemons up, 2 standby
    osd: 120 osds: 119 up (since 9h), 119 in (since 9h)

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 2820 pgs
    objects: 268.74M objects, 111 TiB
    usage:   133 TiB used, 1.5 PiB / 1.6 PiB avail
    pgs:     2820 active+clean

  io:
    client: 367 MiB/s rd, 227 MiB/s wr, 5.73k op/s rd, 315 op/s wr
[ceph: root@ceph2401 /]#
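(Aside: with 119 of 120 OSDs up, the down OSD can also be named directly by filtering the OSD tree, though health detail below gets there too.)

[ceph: root@ceph2401 /]# ceph osd tree down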
OK, so an OSD is probably down (119 of 120 up and in). To get the specific error:
[ceph: root@ceph2401 /]# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
    daemon osd.76 on ceph2403 is in error state
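Health detail names both the daemon and the host, so the actual error usually lives in the systemd unit on ceph2403. Cephadm names its units ceph-<fsid>@<daemon>.service, using the fsid inferred at shell startup above:

[lsh@ceph2403 ~]$ sudo systemctl status ceph-f2d0cd6e-8e43-11f0-aa90-a036bcc87e3b@osd.76.service

(The cephadm logs --name osd.76 command used below is a wrapper around journalctl for this same unit.)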
For some reason osd.76 is so dead this time that I can't find a command to query which NVMe it's supposed to be managing (the usual lookups are sketched after the log snippet), but here's an indication:
[lsh@ceph2403 ~]$ sudo cephadm logs --name osd.76 | grep nvme
Inferring fsid f2d0cd6e-8e43-11f0-aa90-a036bcc87e3b
Apr 02 20:09:47 ceph2403 sudo[364183]: ceph : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/nvme6n2
Apr 02 20:09:48 ceph2403 sudo[364187]: ceph : PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme micron_7450_mtfdkcc15t3tfr smart-log-add --json /dev/nvme6n2
Apr 03 20:03:18 ceph2403 sudo[1648196]: ceph : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/nvme6n2
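So the daemon had been health-polling /dev/nvme6n2 (a Micron 7450, per the smart-log-add invocation): that's the drive it was managing. For reference, when an OSD is alive enough to have reported metadata, the usual lookups would be along these lines (a sketch, not verified against this dead daemon):

[ceph: root@ceph2401 /]# ceph osd metadata 76 | grep -E '"devices"|"device_paths"'
[ceph: root@ceph2401 /]# ceph device ls-by-daemon osd.76
[lsh@ceph2403 ~]$ sudo cephadm ceph-volume lvm list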
GOTO the OSD replacement KBA.
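(For reference, and assuming that KBA wraps the standard cephadm flow: replacement typically starts by draining the OSD and marking it for replacement, e.g. ceph orch osd rm 76 --replace --zap, after which swapping the physical drive lets the orchestrator redeploy onto it under the existing OSD spec.)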