Unable to list Namespaces on initiator due to incorrect ANA state from preferred gateway #506

Open
sunilkumarn417 opened this issue Mar 15, 2024 · 8 comments


@sunilkumarn417

Unable to list namespaces hosted by one of the subsystems that is directly associated with a gateway (say GW1) via load-balancing group ID (say 1).

**Gateways and their details**
---------------------------

[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 gw info
CLI's version: 1.0.0
Gateway's version: 1.0.0
Gateway's name: client.nvmeof.rbd.ceph-1sunilkumar-z1afhw-node6.qzgyja
Gateway's load balancing group: 3
Gateway's address: 10.0.208.213
Gateway's port: 5500
SPDK version: 23.01.1

[root@ceph-1sunilkumar-z1afhw-node7 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.209.68 --server-port 5500 gw info
CLI's version: 1.0.0
Gateway's version: 1.0.0
Gateway's name: client.nvmeof.rbd.ceph-1sunilkumar-z1afhw-node7.vnbduo
Gateway's load balancing group: 1
Gateway's address: 10.0.209.68
Gateway's port: 5500
SPDK version: 23.01.1

**Namespaces associated with LB group IDs**
-------------------------------------------
[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 namespace list -n nqn.2016-06.io.spdk:sub1
Namespaces in subsystem nqn.2016-06.io.spdk:sub1:
╒════════╤════════════════════════╤════════╤══════════════╤═════════╤═════════╤═════════════════════╤═════════════╤═══════════╤═══════════╤════════════╤═════════════╕
│   NSID │ Bdev                   │ RBD    │ RBD          │ Image   │ Block   │ UUID                │        Load │ R/W IOs   │ R/W MBs   │ Read MBs   │ Write MBs   │
│        │ Name                   │ Pool   │ Image        │ Size    │ Size    │                     │   Balancing │ per       │ per       │ per        │ per         │
│        │                        │        │              │         │         │                     │       Group │ second    │ second    │ second     │ second      │
╞════════╪════════════════════════╪════════╪══════════════╪═════════╪═════════╪═════════════════════╪═════════════╪═══════════╪═══════════╪════════════╪═════════════╡
│      1 │ bdev_836f98c2-a90d-    │ rbd    │ sub1_image_1 │ 10 GiB  │ 512 B   │ 836f98c2-a90d-4bf7- │           1 │ unlimited │ unlimited │ unlimited  │ unlimited   │
│        │ 4bf7-9942-7160e04eb81f │        │              │         │         │ 9942-7160e04eb81f   │             │           │           │            │             │
├────────┼────────────────────────┼────────┼──────────────┼─────────┼─────────┼─────────────────────┼─────────────┼───────────┼───────────┼────────────┼─────────────┤
│      2 │ bdev_bff93e4f-099a-    │ rbd    │ sub1_image_2 │ 10 GiB  │ 512 B   │ bff93e4f-099a-40dc- │           1 │ unlimited │ unlimited │ unlimited  │ unlimited   │
│        │ 40dc-8dba-c31b7bcd14e9 │        │              │         │         │ 8dba-c31b7bcd14e9   │             │           │           │            │             │
╘════════╧════════════════════════╧════════╧══════════════╧═════════╧═════════╧═════════════════════╧═════════════╧═══════════╧═══════════╧════════════╧═════════════╛


[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 namespace list -n nqn.2016-06.io.spdk:sub2
Namespaces in subsystem nqn.2016-06.io.spdk:sub2:
╒════════╤════════════════════════╤════════╤══════════════╤═════════╤═════════╤═════════════════════╤═════════════╤═══════════╤═══════════╤════════════╤═════════════╕
│   NSID │ Bdev                   │ RBD    │ RBD          │ Image   │ Block   │ UUID                │        Load │ R/W IOs   │ R/W MBs   │ Read MBs   │ Write MBs   │
│        │ Name                   │ Pool   │ Image        │ Size    │ Size    │                     │   Balancing │ per       │ per       │ per        │ per         │
│        │                        │        │              │         │         │                     │       Group │ second    │ second    │ second     │ second      │
╞════════╪════════════════════════╪════════╪══════════════╪═════════╪═════════╪═════════════════════╪═════════════╪═══════════╪═══════════╪════════════╪═════════════╡
│      1 │ bdev_3d18c920-f869-    │ rbd    │ sub2_image_2 │ 10 GiB  │ 512 B   │ 3d18c920-f869-4f26- │           3 │ unlimited │ unlimited │ unlimited  │ unlimited   │
│        │ 4f26-a2c5-e83624e919d6 │        │              │         │         │ a2c5-e83624e919d6   │             │           │           │            │             │
├────────┼────────────────────────┼────────┼──────────────┼─────────┼─────────┼─────────────────────┼─────────────┼───────────┼───────────┼────────────┼─────────────┤
│      2 │ bdev_ee654a71-0c2b-    │ rbd    │ sub2_image_1 │ 10 GiB  │ 512 B   │ ee654a71-0c2b-4094- │           3 │ unlimited │ unlimited │ unlimited  │ unlimited   │
│        │ 4094-9caf-a62427890f6d │        │              │         │         │ 9caf-a62427890f6d   │             │           │           │            │             │
╘════════╧════════════════════════╧════════╧══════════════╧═════════╧═════════╧═════════════════════╧═════════════╧═══════════╧═══════════╧════════════╧═════════════╛

RBD images at Ceph-RBD
---------------------------
[ceph: root@ceph-1sunilkumar-z1afhw-node1-installer /]# rbd du
NAME          PROVISIONED  USED
sub1_image_1       10 GiB   0 B
sub1_image_2       10 GiB   0 B
sub2_image_1       10 GiB   0 B
sub2_image_2       10 GiB   0 B
<TOTAL>            40 GiB   0 B


**Listeners enabled on all subsystems for all Gateways**
------------------------------------------------------
[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 listener list -n nqn.2016-06.io.spdk:sub1
Listeners for nqn.2016-06.io.spdk:sub1:
╒════════════════════════════════════════════════════════╤═════════════╤══════════════════╤═══════════════════╕
│ Gateway                                                │ Transport   │ Address Family   │ Address           │
╞════════════════════════════════════════════════════════╪═════════════╪══════════════════╪═══════════════════╡
│ client.nvmeof.rbd.ceph-1sunilkumar-z1afhw-node6.qzgyja │ TCP         │ IPv4             │ 10.0.208.213:4420 │
├────────────────────────────────────────────────────────┼─────────────┼──────────────────┼───────────────────┤
│ client.nvmeof.rbd.ceph-1sunilkumar-z1afhw-node7.vnbduo │ TCP         │ IPv4             │ 10.0.209.68:4420  │
╘════════════════════════════════════════════════════════╧═════════════╧══════════════════╧═══════════════════╛


[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 listener list -n nqn.2016-06.io.spdk:sub2
Listeners for nqn.2016-06.io.spdk:sub2:
╒════════════════════════════════════════════════════════╤═════════════╤══════════════════╤═══════════════════╕
│ Gateway                                                │ Transport   │ Address Family   │ Address           │
╞════════════════════════════════════════════════════════╪═════════════╪══════════════════╪═══════════════════╡
│ client.nvmeof.rbd.ceph-1sunilkumar-z1afhw-node6.qzgyja │ TCP         │ IPv4             │ 10.0.208.213:4420 │
├────────────────────────────────────────────────────────┼─────────────┼──────────────────┼───────────────────┤
│ client.nvmeof.rbd.ceph-1sunilkumar-z1afhw-node7.vnbduo │ TCP         │ IPv4             │ 10.0.209.68:4420  │
╘════════════════════════════════════════════════════════╧═════════════╧══════════════════╧═══════════════════╛

At Client Side

As shown below, the namespaces from subsystem sub1 are not listed, even though the subsystem itself is connected.

[root@ceph-1sunilkumar-z1afhw-node8 cephuser]# nvme connect-all --traddr 10.0.209.68 --transport=tcp

[root@ceph-1sunilkumar-z1afhw-node8 cephuser]# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme3n2          /dev/ng3n2            2                    Ceph bdev Controller                     0x2         10.74  GB /  10.74  GB    512   B +  0 B   23.01.1
/dev/nvme3n1          /dev/ng3n1            2                    Ceph bdev Controller                     0x1         10.74  GB /  10.74  GB    512   B +  0 B   23.01.1

[root@ceph-1sunilkumar-z1afhw-node8 cephuser]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0      11:0    1  514K  0 rom
vda     252:0    0   80G  0 disk
├─vda1  252:1    0    1M  0 part
├─vda2  252:2    0  200M  0 part /boot/efi
├─vda3  252:3    0  600M  0 part /boot
└─vda4  252:4    0 79.2G  0 part /
nvme3n1 259:5    0   10G  0 disk        ---> nvme namespace from nqn.2016-06.io.spdk:sub2
nvme3n2 259:7    0   10G  0 disk        ---> nvme namespace from nqn.2016-06.io.spdk:sub2

[root@ceph-1sunilkumar-z1afhw-node8 cephuser]# nvme list-subsys
nvme-subsys3 - NQN=nqn.2016-06.io.spdk:sub2
\
 +- nvme3 tcp traddr=10.0.208.213,trsvcid=4420,src_addr=10.0.210.4 live
 +- nvme4 tcp traddr=10.0.209.68,trsvcid=4420,src_addr=10.0.210.4 live
nvme-subsys1 - NQN=nqn.2016-06.io.spdk:sub1
\
 +- nvme2 tcp traddr=10.0.209.68,trsvcid=4420,src_addr=10.0.210.4 live
 +- nvme1 tcp traddr=10.0.208.213,trsvcid=4420,src_addr=10.0.210.4 live


[root@ceph-1sunilkumar-z1afhw-node8 cephuser]# nvme list-subsys /dev/nvme3n1
nvme-subsys3 - NQN=nqn.2016-06.io.spdk:sub2
\
 +- nvme3 tcp traddr=10.0.208.213,trsvcid=4420,src_addr=10.0.210.4 live optimized
 +- nvme4 tcp traddr=10.0.209.68,trsvcid=4420,src_addr=10.0.210.4 live inaccessible
[root@ceph-1sunilkumar-z1afhw-node8 cephuser]#


[root@ceph-1sunilkumar-z1afhw-node8 cephuser]# nvme list-subsys /dev/nvme3n2
nvme-subsys3 - NQN=nqn.2016-06.io.spdk:sub2
\
 +- nvme3 tcp traddr=10.0.208.213,trsvcid=4420,src_addr=10.0.210.4 live optimized
 +- nvme4 tcp traddr=10.0.209.68,trsvcid=4420,src_addr=10.0.210.4 live inaccessible

ANA States from both Gateways

GW: node6, LB group ID: 3
-----------------------------
[root@ceph-1sunilkumar-z1afhw-node6 src]# /usr/libexec/spdk/scripts/rpc.py  nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:sub1 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.208.213",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "optimized"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"

[root@ceph-1sunilkumar-z1afhw-node6 src]# /usr/libexec/spdk/scripts/rpc.py  nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:sub2 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.208.213",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "optimized"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"

GW: node7, LB group ID: 1
---------------------------
[root@ceph-1sunilkumar-z1afhw-node7 src]# /usr/libexec/spdk/scripts/rpc.py  nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:sub1 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.209.68",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"
[root@ceph-1sunilkumar-z1afhw-node7 src]# /usr/libexec/spdk/scripts/rpc.py  nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:sub2 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.209.68",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"



@sunilkumarn417
Author

Ceph Builds

http://quay.io/barakda1/ceph:47ea673ae9ebf51b2ebc505093bd7272422045e4
quay.io/barakda1/nvmeof:8677ba3
quay.io/barakda1/nvmeof-cli:8677ba3

@caroav
Collaborator

caroav commented Mar 16, 2024

@sunilkumarn417 can you please add the output of: host list -n <subsystem NQN>, for both subsystems.

@sunilkumarn417
Author

@sunilkumarn417 can you please add the output of: host list -n <subsystem NQN>, for both subsystems.

[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 host list -n nqn.2016-06.io.spdk:sub2
Hosts allowed to access nqn.2016-06.io.spdk:sub2:
╒════════════╕
│ Host NQN   │
╞════════════╡
│ Any host   │
╘════════════╛
[root@ceph-1sunilkumar-z1afhw-node6 cephuser]# podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 host list -n nqn.2016-06.io.spdk:sub1
Hosts allowed to access nqn.2016-06.io.spdk:sub1:
╒════════════╕
│ Host NQN   │
╞════════════╡
│ Any host   │
╘════════════╛

@caroav
Collaborator

caroav commented Mar 18, 2024

This happens because, for some reason, 10.0.208.213 is optimized only on group 3, and 10.0.209.68 is inaccessible on all groups. At least that's what I see when I log into the systems now. The two namespaces that belong to sub1 are on group 1, so they are currently not optimized on any listener, which is why we cannot see them.
We need to understand how it got into a situation where group 1 is not optimized on any gateway. I suspect it might be related to the issue that we have, that we don't reassign the same group ID to the same gateway after removing a gateway. Not sure.
@sunilkumarn417 can you describe the steps you took to cause the failover? Did you also run any cephadm or other command to remove/add a gateway?
Also, can you enable the mon logs to file on this setup?
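A minimal sketch for confirming that diagnosis on each gateway node, and for turning on mon file logging, assuming jq is available there; it reuses the rpc.py call already shown in this issue:

# Print the ANA groups this gateway advertises as "optimized" for sub1.
# If no gateway prints group 1, the sub1 namespaces have no usable path.
/usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:sub1 \
  | jq -r '.[].ana_states[] | select(.ana_state == "optimized") | .ana_group'

# Send the mon logs to a file (standard Ceph config option; adjust scope as needed).
ceph config set mon log_to_file true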

@sunilkumarn417
Author

sunilkumarn417 commented Mar 18, 2024

@caroav These are the steps I followed,

Ceph Nodes Inventory
10.0.210.141   ceph-1sunilkumar-z1afhw-node1-installer   - MON, MGR
10.0.211.144   ceph-1sunilkumar-z1afhw-node2             - MON, MGR
10.0.208.216   ceph-1sunilkumar-z1afhw-node3             - MON, OSD node
10.0.211.89    ceph-1sunilkumar-z1afhw-node4             - OSD node
10.0.211.212   ceph-1sunilkumar-z1afhw-node5             - OSD node
10.0.208.213   ceph-1sunilkumar-z1afhw-node6             - NVMe-oF GW
10.0.209.68    ceph-1sunilkumar-z1afhw-node7             - NVMe-oF GW
10.0.210.4     ceph-1sunilkumar-z1afhw-node8             - Client
10.0.208.67    ceph-1sunilkumar-z1afhw-node9
  1. Configured the Ceph cluster - MON, MGR, OSDs.
  2. Deployed the NVMe-oF GW service on node6 and node7.
  3. (node6 - load balancing group ID: 2), (node7 - load balancing group ID: 1)
  4. Removed the node6 daemon with ceph orch daemon rm nvmeofgw.node6 (see the sketch after this list).
  5. The daemon got added back with a different client ID and load balancing group ID (3).
  6. (node6 - load balancing group ID: 3), (node7 - load balancing group ID: 1)
  7. Created 2 subsystems - nqn.2016-06.io.spdk:sub1 and nqn.2016-06.io.spdk:sub2.
  8. Allowed any host (host *) on both subsystems.
  9. Added listeners for all GWs on both subsystems.
  10. Created 2 images/namespaces per subsystem, as below.
  11. sub1_image1 and sub1_image2 attached with load balancing group ID 1 under subsystem 1.
  12. sub2_image1 and sub2_image2 attached with load balancing group ID 3 under subsystem 2.
  13. At the client, ran nvme connect-all and noticed that only namespaces from sub2 are visible.
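
A condensed sketch of the trigger in steps 4 to 6, using the daemon name quoted in step 4 (on a real cluster the nvmeof daemon name will differ, so check ceph orch ps first):

ceph orch ps | grep nvmeof             # note the current nvmeof daemon name and its host
ceph orch daemon rm nvmeofgw.node6     # remove one gateway daemon (name as quoted in step 4)
# cephadm redeploys the daemon; per step 5 it comes back with a new name and a new load balancing group:
podman run --rm -it quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.213 --server-port 5500 gw info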

@sunilkumarn417
Author

Able to hit the issue again.

@manasagowri

manasagowri commented Mar 19, 2024

Able to reproduce this issue on another cluster that I created as well.

GW1

[root@ceph-rbd1-mytest-rxmvqg-node4 ~]# podman run quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.210.179 --server-port 5500 gw info
CLI's version: 1.0.0
Gateway's version: 1.0.0
Gateway's name: client.nvmeof.nvmeof.ceph-rbd1-mytest-rxmvqg-node4.bxkxze
Gateway's load balancing group: 2
Gateway's address: 10.0.210.179
Gateway's port: 5500
SPDK version: 23.01.1

[root@ceph-rbd1-mytest-rxmvqg-node4 src]# /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode1 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.210.179",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"

[root@ceph-rbd1-mytest-rxmvqg-node4 src]# /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode2 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.210.179",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"

GW2

[root@ceph-rbd1-mytest-rxmvqg-node5 ~]# podman run quay.io/barakda1/nvmeof-cli:8677ba3 --server-address 10.0.208.28 --server-port 5500 gw info
CLI's version: 1.0.0
Gateway's version: 1.0.0
Gateway's name: client.nvmeof.nvmeof.ceph-rbd1-mytest-rxmvqg-node5.yovvcu
Gateway's load balancing group: 1
Gateway's address: 10.0.208.28
Gateway's port: 5500
SPDK version: 23.01.1

[root@ceph-rbd1-mytest-rxmvqg-node5 src]# /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode2 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.208.28",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "optimized"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"

[root@ceph-rbd1-mytest-rxmvqg-node5 src]# /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners nqn.2016-06.io.spdk:cnode1 | head -n 24
[
  {
    "address": {
      "trtype": "TCP",
      "adrfam": "IPv4",
      "traddr": "10.0.208.28",
      "trsvcid": "4420"
    },
    "ana_states": [
      {
        "ana_group": 1,
        "ana_state": "optimized"
      },
      {
        "ana_group": 2,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 3,
        "ana_state": "inaccessible"
      },
      {
        "ana_group": 4,
        "ana_state": "inaccessible"

On client:

[root@ceph-rbd1-mytest-rxmvqg-node6 ~]# nvme list-subsys
nvme-subsys3 - NQN=nqn.2016-06.io.spdk:cnode2
\
 +- nvme3 tcp traddr=10.0.210.179,trsvcid=4420,src_addr=10.0.208.169 live
 +- nvme4 tcp traddr=10.0.208.28,trsvcid=4420,src_addr=10.0.208.169 live
nvme-subsys1 - NQN=nqn.2016-06.io.spdk:cnode1
\
 +- nvme2 tcp traddr=10.0.208.28,trsvcid=4420,src_addr=10.0.208.169 live
 +- nvme1 tcp traddr=10.0.210.179,trsvcid=4420,src_addr=10.0.208.169 live

[root@ceph-rbd1-mytest-rxmvqg-node6 ~]# nvme list-subsys /dev/nvme3n1
nvme-subsys3 - NQN=nqn.2016-06.io.spdk:cnode2
\
 +- nvme3 tcp traddr=10.0.210.179,trsvcid=4420,src_addr=10.0.208.169 live inaccessible
 +- nvme4 tcp traddr=10.0.208.28,trsvcid=4420,src_addr=10.0.208.169 live optimized

[root@ceph-rbd1-mytest-rxmvqg-node6 ~]# nvme list-subsys /dev/nvme1n1
nvme-subsys1 - NQN=nqn.2016-06.io.spdk:cnode1
\
 +- nvme2 tcp traddr=10.0.208.28,trsvcid=4420,src_addr=10.0.208.169 live
 +- nvme1 tcp traddr=10.0.210.179,trsvcid=4420,src_addr=10.0.208.169 live

[root@ceph-rbd1-mytest-rxmvqg-node6 ~]# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme3n5          /dev/ng3n5            2                    Ceph bdev Controller                     0x5        536.87 GB / 536.87 GB      512   B +  0 B   23.01.1
/dev/nvme3n4          /dev/ng3n4            2                    Ceph bdev Controller                     0x4        536.87 GB / 536.87 GB      512   B +  0 B   23.01.1
/dev/nvme3n3          /dev/ng3n3            2                    Ceph bdev Controller                     0x3        536.87 GB / 536.87 GB      512   B +  0 B   23.01.1
/dev/nvme3n2          /dev/ng3n2            2                    Ceph bdev Controller                     0x2        536.87 GB / 536.87 GB      512   B +  0 B   23.01.1
/dev/nvme3n1          /dev/ng3n1            2                    Ceph bdev Controller                     0x1        536.87 GB / 536.87 GB      512   B +  0 B   23.01.1

@caroav
Collaborator

caroav commented Mar 19, 2024

The issue is that the nvmeof monitor DB has zombie GWs. It is known and is being taken care of. For now, the only way to avoid this issue is:

  1. Don't use any rm commands for the GW with cephadm.
  2. In case you have a cluster that has already hit it, you need to reinstall the Ceph cluster from scratch.
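
To check whether an existing cluster is already affected, one rough gateway-side check (a sketch that assumes jq is installed and reuses the rpc.py call shown earlier in this thread) is to list the ANA groups each gateway reports as optimized and compare them against the Load Balancing Group column of namespace list:

# Run on every gateway node. Any load balancing group that is used by a
# namespace but is not printed as "optimized" by any gateway has no usable
# path, which is the symptom described in this issue.
for nqn in nqn.2016-06.io.spdk:sub1 nqn.2016-06.io.spdk:sub2; do
  echo "== optimized ANA groups for $nqn on this gateway =="
  /usr/libexec/spdk/scripts/rpc.py nvmf_subsystem_get_listeners "$nqn" \
    | jq -r '.[].ana_states[] | select(.ana_state == "optimized") | .ana_group'
done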
