Dockerhub ceph/daemon still calls deprecated ceph-disk command #1395

Closed
sychan opened this issue Jun 8, 2019 · 25 comments
@sychan

sychan commented Jun 8, 2019

Is this a bug report or feature request?

  • Bug Report


What happened:
As of 6/7/2019 the ceph/daemon:latest image still exhibits the bug that was supposed to be fixed in PR1325 labelled "daemon/osd: Migrate ceph-disk to ceph-volume"

When trying to start an OSD using the osd or osd_ceph_disk entrypoints, the container crashed out with the message:

/opt/ceph-container/bin/osd_disk_prepare.sh: line 46: ceph-disk: command not found

What you expected to happen:

The PR was merged 3/25/2019, and the metadata on the docker image shows a build date of 6/7/2019:

docker inspect ceph/daemon
[
    {
        "Id": "sha256:4ee75b1843fb05ac68a62b011802e046ceb3334354d44b2e37bc63e98c054dfd",
        "RepoTags": [
            "ceph/daemon:latest"
        ],
        "RepoDigests": [
            "ceph/daemon@sha256:fdd2bde52e17f671343c7c68f05699e03ba5b0d6eb076e5a5bb7950255b54242"
        ],
        "Parent": "",
        "Comment": "",
        "Created": "2019-06-07T15:01:34.913369611Z",

So I would expect that this problem would no longer occur. Is there a different image or a different entrypoint I should be using?

How to reproduce it (minimal and precise):

Follow instructions in https://geek-cookbook.funkypenguin.co.nz/ha-docker-swarm/shared-storage-ceph/

Environment:

  • OS (e.g. from /etc/os-release): CentOS 7.6.1810
  • Kernel (e.g. uname -a): 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Docker version (e.g. docker version): 18.09.5
  • Ceph version (e.g. ceph -v): ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
@wiryonolau

How should I create an OSD for a new disk? I tried using the ceph/ceph container but it gets stuck...

@dsavineau
Contributor

The osd_ceph_disk entrypoint is just a leftover and will be removed in #1349 (WIP).
For now you need to use the osd_volume_activate entrypoint [1], but this requires creating the OSD with ceph-volume before starting the container.

[1] https://github.com/ceph/ceph-container/blob/master/src/daemon/osd_scenarios/osd_volume_activate.sh
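
A minimal sketch of that flow, assuming the logical volume for the OSD already exists and the bootstrap-osd keyring is available under /var/lib/ceph (the device path and OSD id are placeholders):

# 1. create the OSD outside the entrypoint, e.g. with ceph-volume
ceph-volume lvm prepare --bluestore --data <vg>/<lv>

# 2. then start the containerized OSD with the activate-only entrypoint
docker run -d --net=host --privileged=true \
    -v /var/lib/ceph:/var/lib/ceph/ \
    -v /etc/ceph:/etc/ceph \
    -v /dev/:/dev/ \
    -e OSD_ID=<id> \
    ceph/daemon osd_volume_activate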

@sychan
Author

sychan commented Jun 10, 2019

From the man page for ceph-volume http://docs.ceph.com/docs/mimic/man/8/ceph-volume/, it looks like the create action is just composed of prepare and activate, both of which have entrypoints in the docker image.

If I'm operating purely with the containers, do I simply run the image with the osd_ceph_disk_prepare and then the osd_ceph_disk_activate actions with the appropriate environment variables for each partition? After that, the volumes are ready to be used on an ongoing basis by the osd_volume_activate action, right?
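
(From that man page, the equivalence is roughly the following; this is a sketch of plain ceph-volume usage, not of the container entrypoints, and the OSD id/fsid come out of the prepare step.)

# "create" is approximately:
ceph-volume lvm prepare --bluestore --data <device>
ceph-volume lvm activate <osd-id> <osd-fsid>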

@sychan
Author

sychan commented Jun 10, 2019

I just tried the osd_ceph_disk_prepare entrypoint, and it seems to have a reference to the ceph-disk executable as well.

Looking more closely at the scripts, it looks like they need to be updated as well. The PR for removing ceph-disk support hasn't updated osd_disk_prepare.sh for ceph-volume prepare.
Based on the volume_activate script, it looks like I should use ceph-volume lvm prepare, but can you give me a hint about where to find the value for the journal device?

docker run --name cephdiskprep --net=host --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph/ -v /etc/ceph:/etc/ceph -v /dev/:/dev/ -e OSD_FORCE_ZAP=1 -e OSD_DEVICE=/dev/mapper/data-data -e OSD_TYPE=disk ceph/daemon osd_ceph_disk_prepare
2019-06-10 22:27:41  /opt/ceph-container/bin/entrypoint.sh: static: does not generate config
HEALTH_OK
/opt/ceph-container/bin/osd_disk_prepare.sh: line 46: ceph-disk: command not found

@wiryonolau

wiryonolau commented Jun 10, 2019

The osd_ceph_disk entrypoint is just a leftover and will be removed in #1349 (WIP).
For now you need to use the osd_volume_activate entrypoint [1], but this requires creating the OSD with ceph-volume before starting the container.

[1] https://github.com/ceph/ceph-container/blob/master/src/daemon/osd_scenarios/osd_volume_activate.sh

Can I run this using the ceph/ceph docker image (daemon-base)?
Or do I have to install the ceph libraries on the host machine?

Is ceph/daemon:v4.0.0-stable-4.0-nautilus-centos-7-x86_64 equivalent to the ceph/ceph:v14.2 image?
I succeeded in deploying mon and mgr, but not osd, due to the problem above.

@wiryonolau

wiryonolau commented Jun 11, 2019

After successfully creating ceph_mon using ceph/daemon, I tried preparing one of the disks with the ceph/ceph image, since ceph/daemon doesn't have lvm installed.

I'm thinking that perhaps I can prepare the disk using ceph/ceph and run it later using ceph/daemon osd_volume_activate; I prefer not to install any dependencies/libraries on the host.

docker run \
   --rm \
   -it \
   --privileged \
   -v /etc/ceph:/etc/ceph \
   -v /var/lib/ceph:/var/lib/ceph \
   -v /dev/:/dev/ \
   --network ceph_public \
ceph/ceph:v14.2.1-20190430 \
ceph-volume lvm prepare --data /dev/sdb --bluestore --no-systemd

But it gets stuck here, with no error log or any further output (> 30 minutes). Is it a networking or environment problem?

Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 671c32f4-7382-4814-8b5d-f98258ba99e7

Oh, and since this is a testing environment the disk is only 1GB.
When I once ran without the --network option, this is what I got after more than 30 minutes:

stderr: [errno 110] error connecting to the cluster
-->  RuntimeError: Unable to create a new OSD id

So I added --network and made sure the container can connect to the ceph_mon container.

@dsavineau
Contributor

I just tried the osd_ceph_disk_prepare entrypoint, and it seems to have a reference to the ceph-disk executable as well.

@sychan All osd_ceph_disk_* entrypoints use the ceph-disk command, as their names indicate. You can't use them with nautilus and need to use ceph-volume instead.
As I mentioned earlier, the only entrypoint with ceph-volume is for the OSD activation, so you need to create the OSD before running the container with osd_volume_activate.

But it stuck here, no error log or any further output (> 30 minute). Is it networking or env problem ?

@wiryonolau There's NO entrypoint for the ceph-volume prepare, but this can still be done by running the prepare command with the ceph/daemon container image. Because ceph-volume depends on lvm, there are some bind mounts required to be able to prepare the OSD.
You can take a look at the ceph-ansible project for that [1]

[1] https://github.com/ceph/ceph-ansible/blob/master/library/ceph_volume.py#L190-L199
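
Modeled on those bind mounts, a containerized prepare could look something like this (a sketch; the data device is a placeholder):

docker run --rm --net=host --privileged=true \
    -v /var/lib/ceph:/var/lib/ceph/ \
    -v /var/log/ceph:/var/log/ceph/ \
    -v /etc/ceph:/etc/ceph \
    -v /dev/:/dev/ \
    -v /run/lock/lvm:/run/lock/lvm:z \
    -v /run/lvm/:/run/lvm/ \
    -v /var/run/udev/:/var/run/udev/:z \
    --entrypoint ceph-volume \
    ceph/daemon \
    lvm prepare --bluestore --data <vg>/<lv>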

@sychan
Author

sychan commented Jun 11, 2019

More progress, seems to be working, but I need to test further. I created a logical volume at /dev/vg00/osd00 and successfully prepared it using ceph-volume lvm prepare, then verified it existed using ceph-volume lvm list:

[root@minio03 ceph]# ./ceph-volume-prepare.sh 
+ docker run --rm --name cephvolprep --net=host --ipc=host --privileged=true -v /var/lib/ceph:/var/lib/ceph/ -v /var/log/ceph:/var/log/ceph/ -v /etc/ceph:/etc/ceph -v /dev/:/dev/ -v /run/lock/lvm:/run/lock/lvm:z -v /run/lvm/:/run/lvm/ -v /var/run/udev/:/var/run/udev/:z --entrypoint ceph-volume ceph/daemon lvm prepare --bluestore --data /dev/vg00/osd00
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new edfed7a0-384b-46ad-89dd-17d75e664998
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-1
Running command: /bin/chown -h ceph:ceph /dev/vg00/osd00
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/vg00/osd00 /var/lib/ceph/osd/ceph-1/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-1/activate.monmap
 stderr: got monmap epoch 3
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-1/keyring --create-keyring --name osd.1 --add-key AQDKMABdePNvChAABvvr6n3YVDs0PPOZltF7Xw==
 stdout: creating /var/lib/ceph/osd/ceph-1/keyring
added entity osd.1 auth(key=AQDKMABdePNvChAABvvr6n3YVDs0PPOZltF7Xw==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 1 --monmap /var/lib/ceph/osd/ceph-1/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-1/ --osd-uuid edfed7a0-384b-46ad-89dd-17d75e664998 --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: vg00/osd00
[root@minio03 ceph]# ./ceph-volume-list.sh
+ docker run --rm --name cephvolprep --net=host --ipc=host --privileged=true -v /var/lib/ceph:/var/lib/ceph/ -v /var/log/ceph:/var/log/ceph/ -v /etc/ceph:/etc/ceph -v /dev/:/dev/ -v /run/lock/lvm:/run/lock/lvm:z -v /run/lvm/:/run/lvm/ -v /var/run/udev/:/var/run/udev/:z --entrypoint ceph-volume ceph/daemon lvm list --format json
{
    "1": [
        {
            "devices": [
                "/dev/sdb1", 
                "/dev/sdc1"
            ], 
            "lv_name": "osd00", 
            "lv_path": "/dev/vg00/osd00", 
            "lv_size": "9.00t", 
            "lv_tags": "ceph.block_device=/dev/vg00/osd00,ceph.block_uuid=1D9G0O-Kuce-AGgw-xEKr-rh4d-AKNh-uX6EYc,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=f94e735a-cc71-4011-9450-1d573e41bb99,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=edfed7a0-384b-46ad-89dd-17d75e664998,ceph.osd_id=1,ceph.type=block,ceph.vdo=0", 
            "lv_uuid": "1D9G0O-Kuce-AGgw-xEKr-rh4d-AKNh-uX6EYc", 
            "name": "osd00", 
            "path": "/dev/vg00/osd00", 
            "tags": {
                "ceph.block_device": "/dev/vg00/osd00", 
                "ceph.block_uuid": "1D9G0O-Kuce-AGgw-xEKr-rh4d-AKNh-uX6EYc", 
                "ceph.cephx_lockbox_secret": "", 
                "ceph.cluster_fsid": "f94e735a-cc71-4011-9450-1d573e41bb99", 
                "ceph.cluster_name": "ceph", 
                "ceph.crush_device_class": "None", 
                "ceph.encrypted": "0", 
                "ceph.osd_fsid": "edfed7a0-384b-46ad-89dd-17d75e664998", 
                "ceph.osd_id": "1", 
                "ceph.type": "block", 
                "ceph.vdo": "0"
            }, 
            "type": "block", 
            "vg_name": "vg00"
        }
    ]
}

There is a slight inconsistency in the entrypoint script - the underlying function is named osd_volume_activate, but the entrypoint scenario passed to the container is osd_ceph_volume_activate. Once I fixed that, the osd seems to have started properly:

[root@minio03 ceph]# ./ceph-osd.sh
+ docker run -d --name cephosd --net=host --restart always --pid=host --privileged=true -v /var/lib/ceph:/var/lib/ceph/ -v /etc/ceph:/etc/ceph -v /dev/:/dev/ -e OSD_ID=1 ceph/daemon osd_ceph_volume_activate
e5fa3cc8b9c8412ec44baf3d22dc2887d5ebcfb8687b436a54c88c37dc225e3a
[root@minio03 ceph]# docker logs -f cephosd
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-1
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/vg00/osd00 --path /var/lib/ceph/osd/ceph-1 --no-mon-config
Running command: /bin/ln -snf /dev/vg00/osd00 /var/lib/ceph/osd/ceph-1/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
--> ceph-volume lvm activate successful for osd ID: 1
2019-06-11 23:19:11  /opt/ceph-container/bin/entrypoint.sh: SUCCESS

   blah blah blah

2019-06-11 23:19:14.344 7ff28d5ed700 -1 osd.1 9 set_numa_affinity unable to identify public interface 'br0' numa node: (2) No such file or directory
2019-06-11 23:19:14.344 7ff28d5ed700  1 osd.1 9 set_numa_affinity not setting numa affinity
2019-06-11 23:19:15.531 7ff2835d9700  1 osd.1 10 state: booting -> active

@wiryonolau

wiryonolau commented Jun 12, 2019

I tried running it using

https://github.com/ceph/ceph-ansible/blob/master/library/ceph_volume.py#L190-L199

I use two different networks for ceph, ceph_public and ceph_private. My monitor is connected to both networks and has both the CEPH_PUBLIC_NETWORK and CEPH_CLUSTER_NETWORK environment variables set to the ceph_public and ceph_private subnets.

docker run \
    --rm \
    --network=ceph_private \
    --privileged=true \
    -v /var/lib/ceph:/var/lib/ceph/:z \
    -v /var/log/ceph:/var/log/ceph/:z \
    -v /etc/ceph:/etc/ceph:z \
    -v /dev/:/dev/ \
    -v /run/lock/lvm:/run/lock/lvm:z \
    -v /run/lvm/:/run/lvm/ \
    -v /var/run/udev/:/var/run/udev/:z \
    --entrypoint ceph-volume \
ceph/daemon:v4.0.0-stable-4.0-nautilus-centos-7-x86_64 \
lvm prepare --bluestore --data /dev/sdb --no-systemd

But it still gets stuck here, with no further output and the process still running:

Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 16832e57-3442-4cc1-a22f-7da20b677378

@wiryonolau

More progress, seems to be working, but I need to test further. I created a logical volume at /dev/vg00/osd00 and successfully prepared it using ceph-volume lvm prepare, then verified it existed using ceph-volume lvm list: [...]

So ceph-volume doesn't support a raw block device?

@dsavineau
Contributor

So ceph-volume doesn't support a raw block device?

You can use either a raw device/partition or an lvm device.

http://docs.ceph.com/docs/nautilus/ceph-volume/lvm/prepare/#bluestore
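
Per that page, prepare accepts either an existing logical volume or a raw device/partition, e.g. (a sketch; device names are placeholders):

# existing logical volume
ceph-volume lvm prepare --bluestore --data vg00/osd00

# whole raw device (ceph-volume creates the VG/LV itself)
ceph-volume lvm prepare --bluestore --data /dev/sdb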

@sychan
Author

sychan commented Jun 12, 2019

The ceph-osd process seems to run, and there aren't any errors in the logs that I can see. When I bring up the ceph-mds daemon, it seems to come up without any errors.
But when I bring up the rgw daemon, it seems to start cleanly, but spends 5 minutes pinging the ceph-osd service and then bails out with an initialization timeout failure.

[root@minio03 ceph]# docker logs -f cephrgw
2019-06-12 17:04:16  /opt/ceph-container/bin/entrypoint.sh: STAYALIVE: container will not die if a command fails.
2019-06-12 17:04:25  /opt/ceph-container/bin/entrypoint.sh: static: does not generate config
2019-06-12 17:04:25  /opt/ceph-container/bin/entrypoint.sh: SUCCESS
exec: PID 4230: spawning /usr/bin/radosgw --cluster ceph --setuser ceph --setgroup ceph -d -n client.rgw.minio03 -k /var/lib/ceph/radosgw/ceph-rgw.minio03/keyring
exec: Waiting 4230 to quit
2019-06-12 17:04:25.431 7f44c6507780  1  Processor -- start
2019-06-12 17:04:25.431 7f44c6507780  1 --  start start
2019-06-12 17:04:25.431 7f44c6507780  1 --2-  >> v2:140.221.43.121:3300/0 conn(0x55ff45d83c00 0x55ff46980b00 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).connect
2019-06-12 17:04:25.431 7f44c6507780  1 --  --> v1:140.221.43.121:6789/0 -- auth(proto 0 40 bytes epoch 0) v1 -- 0x55ff45cb9680 con 0x55ff46988400
2019-06-12 17:04:25.431 7f44c6507780  1 --  --> v2:140.221.43.121:3300/0 -- mon_getmap magic: 0 v1 -- 0x55ff45cd4540 con 0x55ff45d83c00
2019-06-12 17:04:25.432 7f44b33c4700  1 --2- 140.221.43.121:0/3739250201 >> v2:140.221.43.121:3300/0 conn(0x55ff45d83c00 0x55ff46980b00 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2019-06-12 17:04:25.432 7f44b3bc5700  1 -- 140.221.43.121:0/3739250201 learned_addr learned my addr 140.221.43.121:0/3739250201 (peer_addr_for_me v1:140.221.43.121:33080/0)
2019-06-12 17:04:25.432 7f44b2bc3700  1 -- 140.221.43.121:0/3739250201 <== mon.0 v1:140.221.43.121:6789/0 1 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (unknown 3578549811 0 0) 0x55ff45cb9680 con 0x55ff46988400
2019-06-12 17:04:25.433 7f44b33c4700  1 -- 140.221.43.121:0/3739250201 >> v1:140.221.43.121:6789/0 conn(0x55ff46988400 legacy=0x55ff469c1000 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2019-06-12 17:04:25.433 7f44b33c4700  1 -- 140.221.43.121:0/3739250201 --> v2:140.221.43.121:3300/0 -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x55ff45cd6780 con 0x55ff45d83c00
2019-06-12 17:04:25.433 7f44b33c4700  1 --2- 140.221.43.121:0/3739250201 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff45d83c00 0x55ff46980b00 crc :-1 s=READY pgs=1288 cs=0 l=1 rx=0 tx=0).ready entity=mon.0 client_cookie=c4849ea92b89b1eb server_cookie=0 in_seq=0 out_seq=0
2019-06-12 17:04:25.433 7f44b2bc3700  1 -- 140.221.43.121:0/3739250201 <== mon.0 v2:140.221.43.121:3300/0 1 ==== mon_map magic: 0 v1 ==== 431+0+0 (crc 0 0 0) 0x55ff46a58400 con 0x55ff45d83c00
2019-06-12 17:04:25.433 7f44b2bc3700  1 -- 140.221.43.121:0/3739250201 <== mon.0 v2:140.221.43.121:3300/0 2 ==== config(0 keys) v1 ==== 4+0+0 (crc 0 0 0) 0x55ff45cd6780 con 0x55ff45d83c00
2019-06-12 17:04:25.434 7f44b2bc3700  1 -- 140.221.43.121:0/3739250201 <== mon.0 v2:140.221.43.121:3300/0 3 ==== mon_map magic: 0 v1 ==== 431+0+0 (crc 0 0 0) 0x55ff46a58a00 con 0x55ff45d83c00
2019-06-12 17:04:25.434 7f44c6507780  1 -- 140.221.43.121:0/3739250201 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff45d83c00 msgr2=0x55ff46980b00 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2019-06-12 17:04:25.434 7f44c6507780  1 --2- 140.221.43.121:0/3739250201 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff45d83c00 0x55ff46980b00 crc :-1 s=READY pgs=1288 cs=0 l=1 rx=0 tx=0).stop
2019-06-12 17:04:25.436 7f44c6507780  1 -- 140.221.43.121:0/3739250201 shutdown_connections 
2019-06-12 17:04:25.436 7f44c6507780  1 --2- 140.221.43.121:0/3739250201 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff45d83c00 0x55ff46980b00 unknown :-1 s=CLOSED pgs=1288 cs=0 l=1 rx=0 tx=0).stop
2019-06-12 17:04:25.436 7f44c6507780  1 -- 140.221.43.121:0/3739250201 shutdown_connections 
2019-06-12 17:04:25.436 7f44c6507780  1 -- 140.221.43.121:0/3739250201 wait complete.
2019-06-12 17:04:25.436 7f44c6507780  1 -- 140.221.43.121:0/3739250201 >> 140.221.43.121:0/3739250201 conn(0x55ff45d83400 msgr2=0x55ff469c0800 unknown :-1 s=STATE_NONE l=0).mark_down
2019-06-12 17:04:25.437 7f44c6507780  0 framework: beast
2019-06-12 17:04:25.437 7f44c6507780  0 framework conf key: port, val: 7480
2019-06-12 17:04:25.438 7f44c6507780  0 deferred set uid:gid to 167:167 (ceph:ceph)
2019-06-12 17:04:25.438 7f44c6507780  0 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process radosgw, pid 4230
2019-06-12 17:04:25.474 7f44af934700 20 reqs_thread_entry: start
2019-06-12 17:04:25.475 7f44c6507780  1  Processor -- start
2019-06-12 17:04:25.475 7f44c6507780  1 --  start start
2019-06-12 17:04:25.475 7f44c6507780  1 --2-  >> v2:140.221.43.121:3300/0 conn(0x55ff46988400 0x55ff46980580 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).connect
2019-06-12 17:04:25.475 7f44c6507780  1 --  --> v1:140.221.43.121:6789/0 -- auth(proto 0 40 bytes epoch 0) v1 -- 0x55ff45cb98c0 con 0x55ff45d83c00
2019-06-12 17:04:25.475 7f44c6507780  1 --  --> v2:140.221.43.121:3300/0 -- mon_getmap magic: 0 v1 -- 0x55ff45cd4700 con 0x55ff46988400
2019-06-12 17:04:25.475 7f44b3bc5700  1 --2-  >> v2:140.221.43.121:3300/0 conn(0x55ff46988400 0x55ff46980580 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2019-06-12 17:04:25.476 7f44b33c4700  1 -- 140.221.43.121:0/1824513238 learned_addr learned my addr 140.221.43.121:0/1824513238 (peer_addr_for_me v1:140.221.43.121:33086/0)
2019-06-12 17:04:25.476 7f44b3bc5700  1 -- 140.221.43.121:0/1824513238 >> v1:140.221.43.121:6789/0 conn(0x55ff45d83c00 legacy=0x55ff469c1800 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2019-06-12 17:04:25.476 7f44b3bc5700  1 -- 140.221.43.121:0/1824513238 --> v2:140.221.43.121:3300/0 -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x55ff45cd6960 con 0x55ff46988400
2019-06-12 17:04:25.476 7f44b3bc5700  1 --2- 140.221.43.121:0/1824513238 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff46988400 0x55ff46980580 crc :-1 s=READY pgs=1290 cs=0 l=1 rx=0 tx=0).ready entity=mon.0 client_cookie=28fd169e5bbc1103 server_cookie=0 in_seq=0 out_seq=0
2019-06-12 17:04:25.477 7f44ae932700  1 -- 140.221.43.121:0/1824513238 <== mon.0 v2:140.221.43.121:3300/0 1 ==== mon_map magic: 0 v1 ==== 431+0+0 (crc 0 0 0) 0x55ff46a58200 con 0x55ff46988400
2019-06-12 17:04:25.477 7f44ae932700  1 -- 140.221.43.121:0/1824513238 <== mon.0 v2:140.221.43.121:3300/0 2 ==== config(0 keys) v1 ==== 4+0+0 (crc 0 0 0) 0x55ff45cd6960 con 0x55ff46988400
2019-06-12 17:04:25.477 7f44ae932700  1 -- 140.221.43.121:0/1824513238 <== mon.0 v2:140.221.43.121:3300/0 3 ==== mon_map magic: 0 v1 ==== 431+0+0 (crc 0 0 0) 0x55ff46af4400 con 0x55ff46988400
2019-06-12 17:04:25.477 7f44c6507780  1 -- 140.221.43.121:0/1824513238 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff46988400 msgr2=0x55ff46980580 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2019-06-12 17:04:25.477 7f44c6507780  1 --2- 140.221.43.121:0/1824513238 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff46988400 0x55ff46980580 crc :-1 s=READY pgs=1290 cs=0 l=1 rx=0 tx=0).stop
2019-06-12 17:04:25.479 7f44c6507780  1 -- 140.221.43.121:0/1824513238 shutdown_connections 
2019-06-12 17:04:25.479 7f44c6507780  1 --2- 140.221.43.121:0/1824513238 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff46988400 0x55ff46980580 unknown :-1 s=CLOSED pgs=1290 cs=0 l=1 rx=0 tx=0).stop
2019-06-12 17:04:25.479 7f44c6507780  1 -- 140.221.43.121:0/1824513238 shutdown_connections 
2019-06-12 17:04:25.479 7f44c6507780  1 -- 140.221.43.121:0/1824513238 wait complete.
2019-06-12 17:04:25.479 7f44c6507780  1 -- 140.221.43.121:0/1824513238 >> 140.221.43.121:0/1824513238 conn(0x55ff46988800 msgr2=0x55ff45d0f000 unknown :-1 s=STATE_NONE l=0).mark_down
2019-06-12 17:04:25.479 7f44c6507780  1  Processor -- start
2019-06-12 17:04:25.479 7f44c6507780  1 --  start start
2019-06-12 17:04:25.480 7f44c6507780  1 --2-  >> v2:140.221.43.121:3300/0 conn(0x55ff45d83c00 0x55ff46980580 unknown :-1 s=NONE pgs=0 cs=0 l=1 rx=0 tx=0).connect
2019-06-12 17:04:25.480 7f44c6507780  1 --  --> v1:140.221.43.121:6789/0 -- auth(proto 0 40 bytes epoch 0) v1 -- 0x55ff45cb9d40 con 0x55ff46989800
2019-06-12 17:04:25.480 7f44c6507780  1 --  --> v2:140.221.43.121:3300/0 -- mon_getmap magic: 0 v1 -- 0x55ff45cd48c0 con 0x55ff45d83c00
2019-06-12 17:04:25.480 7f44b3bc5700  1 --2- 140.221.43.121:0/3373164235 >> v2:140.221.43.121:3300/0 conn(0x55ff45d83c00 0x55ff46980580 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2019-06-12 17:04:25.480 7f44b33c4700  1 -- 140.221.43.121:0/3373164235 learned_addr learned my addr 140.221.43.121:0/3373164235 (peer_addr_for_me v1:140.221.43.121:33088/0)
2019-06-12 17:04:25.481 7f44b3bc5700  1 -- 140.221.43.121:0/3373164235 >> v1:140.221.43.121:6789/0 conn(0x55ff46989800 legacy=0x55ff469c1000 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2019-06-12 17:04:25.481 7f44b3bc5700  1 -- 140.221.43.121:0/3373164235 --> v2:140.221.43.121:3300/0 -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x55ff45cd6780 con 0x55ff45d83c00
2019-06-12 17:04:25.481 7f44b3bc5700  1 --2- 140.221.43.121:0/3373164235 >> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] conn(0x55ff45d83c00 0x55ff46980580 crc :-1 s=READY pgs=1292 cs=0 l=1 rx=0 tx=0).ready entity=mon.0 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2019-06-12 17:04:25.481 7f44ae131700  1 -- 140.221.43.121:0/3373164235 <== mon.0 v2:140.221.43.121:3300/0 1 ==== mon_map magic: 0 v1 ==== 431+0+0 (crc 0 0 0) 0x55ff46af4200 con 0x55ff45d83c00
2019-06-12 17:04:25.482 7f44ae131700  1 -- 140.221.43.121:0/3373164235 <== mon.0 v2:140.221.43.121:3300/0 2 ==== config(0 keys) v1 ==== 4+0+0 (crc 0 0 0) 0x55ff45cd6780 con 0x55ff45d83c00
2019-06-12 17:04:25.482 7f44ae131700  1 -- 140.221.43.121:0/3373164235 <== mon.0 v2:140.221.43.121:3300/0 3 ==== mon_map magic: 0 v1 ==== 431+0+0 (crc 0 0 0) 0x55ff46af4e00 con 0x55ff45d83c00
2019-06-12 17:04:25.482 7f44c6507780  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] -- mon_subscribe({mgrmap=0+}) v3 -- 0x55ff45cd6b40 con 0x55ff45d83c00
2019-06-12 17:04:25.482 7f44c6507780  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] -- mon_subscribe({osdmap=0}) v3 -- 0x55ff45cd6960 con 0x55ff45d83c00
2019-06-12 17:04:25.482 7f44c6507780 20 rados->read ofs=0 len=0
2019-06-12 17:04:25.484 7f44ae131700  1 -- 140.221.43.121:0/3373164235 <== mon.0 v2:140.221.43.121:3300/0 4 ==== mgrmap(e 7) v1 ==== 37735+0+0 (crc 0 0 0) 0x55ff45d342c0 con 0x55ff45d83c00
2019-06-12 17:04:25.484 7f44ae131700  1 --2- 140.221.43.121:0/3373164235 >> [v2:140.221.43.124:6800/95600,v1:140.221.43.124:6801/95600] conn(0x55ff45d83400 0x55ff46981b80 unknown :-1 s=NONE pgs=0 cs=0 l=1 rx=0 tx=0).connect
2019-06-12 17:04:25.484 7f44ae131700  1 -- 140.221.43.121:0/3373164235 <== mon.0 v2:140.221.43.121:3300/0 5 ==== osd_map(18..18 src has 1..18) v4 ==== 2520+0+0 (crc 0 0 0) 0x55ff45cfbb80 con 0x55ff45d83c00
2019-06-12 17:04:25.484 7f44c6507780  1 --2- 140.221.43.121:0/3373164235 >> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] conn(0x55ff46989400 0x55ff46981080 unknown :-1 s=NONE pgs=0 cs=0 l=1 rx=0 tx=0).connect
2019-06-12 17:04:25.485 7f44c6507780  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] -- osd_op(unknown.0.0:1 3.2 3:49953fa1:::default.realm:head [read 0~0] snapc 0=[] ondisk+read+known_if_redirected e18) v8 -- 0x55ff45d34dc0 con 0x55ff46989400
2019-06-12 17:04:25.485 7f44b33c4700  1 --2- 140.221.43.121:0/3373164235 >> [v2:140.221.43.124:6800/95600,v1:140.221.43.124:6801/95600] conn(0x55ff45d83400 0x55ff46981b80 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2019-06-12 17:04:25.485 7f44b43c6700  1 --2- 140.221.43.121:0/3373164235 >> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] conn(0x55ff46989400 0x55ff46981080 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2019-06-12 17:04:25.486 7f44b43c6700  1 --2- 140.221.43.121:0/3373164235 >> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] conn(0x55ff46989400 0x55ff46981080 crc :-1 s=READY pgs=331 cs=0 l=1 rx=0 tx=0).ready entity=osd.1 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2019-06-12 17:04:25.486 7f44b33c4700  1 --2- 140.221.43.121:0/3373164235 >> [v2:140.221.43.124:6800/95600,v1:140.221.43.124:6801/95600] conn(0x55ff45d83400 0x55ff46981b80 crc :-1 s=READY pgs=6692 cs=0 l=1 rx=0 tx=0).ready entity=mgr.4149 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2019-06-12 17:04:25.486 7f44ae131700  1 -- 140.221.43.121:0/3373164235 <== osd.1 v2:140.221.43.121:6800/165386 1 ==== osd_backoff(3.2 block id 1 [3:40000000::::0,3:60000000::::head) e18) v1 ==== 115+0+0 (crc 0 0 0) 0x55ff46af26c0 con 0x55ff46989400
2019-06-12 17:04:25.486 7f44ae131700  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] -- osd_backoff(3.2 ack-block id 1 [3:40000000::::0,3:60000000::::head) e18) v1 -- 0x55ff46af2900 con 0x55ff46989400
2019-06-12 17:04:40.483 7f44ae932700  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:3300/0,v1:140.221.43.121:6789/0] -- mon_subscribe({osdmap=19}) v3 -- 0x55ff45cd6b40 con 0x55ff45d83c00
2019-06-12 17:04:40.483 7f44ae932700  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] -- ping magic: 0 v1 -- 0x55ff45cd4700 con 0x55ff46989400
2019-06-12 17:04:45.483 7f44ae932700  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] -- ping magic: 0 v1 -- 0x55ff45cd48c0 con 0x55ff46989400
2019-06-12 17:04:50.484 7f44ae932700  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] -- ping magic: 0 v1 -- 0x55ff45cd4a80 con 0x55ff46989400

    [ 5 minutes worth of ping logs deleted ]

2019-06-12 17:09:20.497 7f44ae932700  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] -- ping magic: 0 v1 -- 0x55ff45cd5500 con 0x55ff46989400
2019-06-12 17:09:25.438 7f44b23c2700 -1 Initialization timeout, failed to initialize
2019-06-12 17:09:25.497 7f44ae932700  1 -- 140.221.43.121:0/3373164235 --> [v2:140.221.43.121:6800/165386,v1:140.221.43.121:6801/165386] -- ping magic: 0 v1 -- 0x55ff45cd5340 con 0x55ff46989400
teardown: managing teardown after SIGCHLD
teardown: Waiting PID 4230 to terminate 
teardown: Process 4230 is terminated
An issue occured and you asked me to stay alive.
You can connect to me with: sudo docker exec -i -t  /bin/bash
The current environment variables will be reloaded by this bash to be in a similar context.
When debugging is over stop me with: pkill sleep
I'll sleep endlessly waiting for you darling, bye bye
^C
[root@minio03 ceph]# ss -ap | grep 680 | grep LISTEN
u_str  LISTEN     0      100    public/pickup 87250                 * 0                     users:(("pickup",pid=156801,fd=6),("master",pid=18217,fd=18))
tcp    LISTEN     0      128    140.221.43.121:6800                  *:*                     users:(("ceph-osd",pid=165386,fd=16))
tcp    LISTEN     0      128    140.221.43.121:6802                  *:*                     users:(("ceph-osd",pid=165386,fd=18))
tcp    LISTEN     0      128    140.221.43.121:6803                  *:*                     users:(("ceph-osd",pid=165386,fd=19))
tcp    LISTEN     0      128    140.221.43.121:6804                  *:*                     users:(("ceph-osd",pid=165386,fd=20))
tcp    LISTEN     0      128    140.221.43.121:6805                  *:*                     users:(("ceph-osd",pid=165386,fd=21))
tcp    LISTEN     0      128    140.221.43.121:6806                  *:*                     users:(("ceph-osd",pid=165386,fd=22))
tcp    LISTEN     0      128    140.221.43.121:6807                  *:*                     users:(("ceph-osd",pid=165386,fd=23))
tcp    LISTEN     0      128    140.221.43.121:6808                  *:*                     users:(("ceph-mds",pid=172139,fd=16))
tcp    LISTEN     0      128    140.221.43.121:6809                  *:*                     users:(("ceph-mds",pid=172139,fd=17))
[root@minio03 ceph]# 

Is there some configuration of the osd or rgw that needs to occur which I've missed?

@wiryonolau

Since I'm testing, I've been using VirtualBox; my host is Ubuntu 18.04. Does this cause the stuck issue? The disk is a vmdk; I increased it to 8GB.

Does ceph-volume have a debug mode?

[2019-06-12 22:15:43,370][ceph_volume.process][INFO  ] stdout ID_SERIAL_SHORT=VBc3c9780e-066c380f
[2019-06-12 22:15:43,370][ceph_volume.process][INFO  ] stdout ID_TYPE=disk
[2019-06-12 22:15:43,370][ceph_volume.process][INFO  ] stdout MAJOR=8
[2019-06-12 22:15:43,370][ceph_volume.process][INFO  ] stdout MINOR=16
[2019-06-12 22:15:43,370][ceph_volume.process][INFO  ] stdout SUBSYSTEM=block
[2019-06-12 22:15:43,370][ceph_volume.process][INFO  ] stdout TAGS=:systemd:
[2019-06-12 22:15:43,370][ceph_volume.process][INFO  ] stdout USEC_INITIALIZED=79509
[2019-06-12 22:15:43,371][ceph_volume.process][INFO  ] Running command: /bin/ceph-authtool --gen-print-key
[2019-06-12 22:15:48,724][ceph_volume.process][INFO  ] stdout AQCPeQFdHT7XKhAAC31kttToTro1qMXNd6TsbA==
[2019-06-12 22:15:48,727][ceph_volume.process][INFO  ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 1194f9a0-da6b-42ee-b6a3-353f385b592c

# No more output after this

@sychan
Author

sychan commented Jun 13, 2019

I resolved the timeout issue - it seems to be related to the fact that the mds cephfs was stuck in the up:creating state. Once I resolved that, the rgw came up. At this point, I think I'm good, but wiryonolau seems to still need resolution of his issues.

For what it's worth, here's my docker command for ceph-volume prepare:

docker run --rm --name cephvolprep --net=host --ipc=host --privileged=true \
-v /var/lib/ceph:/var/lib/ceph/ \
-v /var/log/ceph:/var/log/ceph/ \
-v /etc/ceph:/etc/ceph \
-v /dev/:/dev/ \
-v /run/lock/lvm:/run/lock/lvm:z \
-v /run/lvm/:/run/lvm/ \
-v /var/run/udev/:/var/run/udev/:z \
--entrypoint ceph-volume \
ceph/daemon \
lvm prepare --bluestore --data /dev/vg00/osd00

I used vgcreate on 2 partitions to create vg00 and then used lvcreate to create the logical volume passed to ceph-volume.
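
Roughly, that LVM setup was something like the following (a sketch; the two partitions are the ones reported by ceph-volume lvm list above):

pvcreate /dev/sdb1 /dev/sdc1
vgcreate vg00 /dev/sdb1 /dev/sdc1
lvcreate -n osd00 -l 100%FREE vg00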

@wiryonolau

I resolved the timeout issue - it seems to be related to the fact that the mds cephfs was stuck in the up:creating state. [...] For what it's worth, here's my docker command for ceph-volume prepare: [...]

Hi sychan, have you ever tried preparing using a raw device directly?

@wiryonolau

wiryonolau commented Jun 13, 2019

This is my configuration

ceph.conf

[global]
fsid = 21934058-a989-498e-840e-d4ce729b7cf8
mon initial members = mon1
mon host = 192.168.20.32
public network = 192.168.20.0/24
cluster network = 192.168.21.0/24
osd journal size = 100
log file = /dev/null
osd_memory_target = 642170880
osd_memory_base = 347299840
osd_memory_cache_min = 494735360

ceph.mon.keyring

[mon.]
	key = AQCWwgFdh7p/NhAAkL+9rwA+g6bHnCaY8CQ5gw==
	caps mon = "allow *"
[client.admin]
	key = AQCRwgFduBGuMhAADG7tBbI1KE9IkPbq9GzNNA==
	caps mds = "allow *"
	caps mgr = "allow *"
	caps mon = "allow *"
	caps osd = "allow *"

ceph.client.admin.keyring

[client.admin]
	key = AQCRwgFduBGuMhAADG7tBbI1KE9IkPbq9GzNNA==
	caps mds = "allow *"
	caps mgr = "allow *"
	caps mon = "allow *"
	caps osd = "allow *"

I tested by creating a logical volume first instead of using a raw device:

pvcreate /dev/sdb
vgcreate vg00 /dev/sdb
lvcreate -n osd_sdb -l 100%FREE vg00

#lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0    8G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0    7G  0 part 
  ├─centos-root 253:0    0  6.2G  0 lvm  /
  └─centos-swap 253:1    0  820M  0 lvm  [SWAP]
sdb               8:16   0    8G  0 disk 
└─vg00-osd_sdb  253:2    0    8G  0 lvm  
sr0              11:0    1 1024M  0 rom  

Then I ran the prepare through the container:

docker run \
    --rm \
    --privileged=true \
    --network=ceph_private \
    -v /var/lib/ceph:/var/lib/ceph/ \
    -v /var/log/ceph:/var/log/ceph/ \
    -v /etc/ceph:/etc/ceph \
    -v /dev/:/dev/ \
    -v /run/lock/lvm:/run/lock/lvm:z \
    -v /run/lvm/:/run/lvm/ \
    -v /var/run/udev/:/var/run/udev/:z \
    --entrypoint ceph-volume \
ceph/daemon:v4.0.0-stable-4.0-nautilus-centos-7-x86_64 \
lvm prepare --bluestore --data /dev/vg00/osd_sdb --no-systemd

This is the latest log in /var/log/ceph/ceph-volume.log

[2019-06-13 03:17:10,420][ceph_volume.main][INFO  ] Running command: ceph-volume  lvm prepare --bluestore --data /dev/vg00/osd_sdb
[2019-06-13 03:17:10,466][ceph_volume.process][INFO  ] Running command: /usr/sbin/dmsetup splitname --noheadings --separator=';' --nameprefixes /dev/mapper/vg00-osd_sdb
[2019-06-13 03:17:10,749][ceph_volume.process][INFO  ] stdout DM_VG_NAME='/dev/mapper/vg00'';'DM_LV_NAME='osd_sdb'';'DM_LV_LAYER=''
[2019-06-13 03:17:10,750][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-06-13 03:17:11,049][ceph_volume.process][INFO  ] stdout ";"/dev/centos/root";"root";"centos";"8r0Ajm-Luiz-i2oh-zJVK-ieqD-DM5o-jqPoiY";"<6.20g
[2019-06-13 03:17:11,049][ceph_volume.process][INFO  ] stdout ";"/dev/centos/swap";"swap";"centos";"teX3SE-DziB-aUyh-J0Pc-NVRO-zu2p-r7ibIQ";"820.00m
[2019-06-13 03:17:11,049][ceph_volume.process][INFO  ] stdout ";"/dev/vg00/osd_sdb";"osd_sdb";"vg00";"Ao06Q3-DvE8-VqgK-9vsm-u7kd-zsXu-os0qmD";"<8.00g
[2019-06-13 03:17:11,049][ceph_volume.process][INFO  ] Running command: /usr/sbin/dmsetup splitname --noheadings --separator=';' --nameprefixes /dev/mapper/centos-root
[2019-06-13 03:17:11,332][ceph_volume.process][INFO  ] stdout DM_VG_NAME='/dev/mapper/centos'';'DM_LV_NAME='root'';'DM_LV_LAYER=''
[2019-06-13 03:17:11,332][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-06-13 03:17:11,631][ceph_volume.process][INFO  ] stdout ";"/dev/centos/root";"root";"centos";"8r0Ajm-Luiz-i2oh-zJVK-ieqD-DM5o-jqPoiY";"<6.20g
[2019-06-13 03:17:11,631][ceph_volume.process][INFO  ] stdout ";"/dev/centos/swap";"swap";"centos";"teX3SE-DziB-aUyh-J0Pc-NVRO-zu2p-r7ibIQ";"820.00m
[2019-06-13 03:17:11,631][ceph_volume.process][INFO  ] stdout ";"/dev/vg00/osd_sdb";"osd_sdb";"vg00";"Ao06Q3-DvE8-VqgK-9vsm-u7kd-zsXu-os0qmD";"<8.00g
[2019-06-13 03:17:11,632][ceph_volume.process][INFO  ] Running command: /usr/sbin/dmsetup splitname --noheadings --separator=';' --nameprefixes /dev/mapper/centos-swap
[2019-06-13 03:17:11,917][ceph_volume.process][INFO  ] stdout DM_VG_NAME='/dev/mapper/centos'';'DM_LV_NAME='swap'';'DM_LV_LAYER=''
[2019-06-13 03:17:11,917][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-06-13 03:17:12,217][ceph_volume.process][INFO  ] stdout ";"/dev/centos/root";"root";"centos";"8r0Ajm-Luiz-i2oh-zJVK-ieqD-DM5o-jqPoiY";"<6.20g
[2019-06-13 03:17:12,217][ceph_volume.process][INFO  ] stdout ";"/dev/centos/swap";"swap";"centos";"teX3SE-DziB-aUyh-J0Pc-NVRO-zu2p-r7ibIQ";"820.00m
[2019-06-13 03:17:12,217][ceph_volume.process][INFO  ] stdout ";"/dev/vg00/osd_sdb";"osd_sdb";"vg00";"Ao06Q3-DvE8-VqgK-9vsm-u7kd-zsXu-os0qmD";"<8.00g
[2019-06-13 03:17:12,218][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-06-13 03:17:12,513][ceph_volume.process][INFO  ] stdout ";"/dev/centos/root";"root";"centos";"8r0Ajm-Luiz-i2oh-zJVK-ieqD-DM5o-jqPoiY";"<6.20g
[2019-06-13 03:17:12,513][ceph_volume.process][INFO  ] stdout ";"/dev/centos/swap";"swap";"centos";"teX3SE-DziB-aUyh-J0Pc-NVRO-zu2p-r7ibIQ";"820.00m
[2019-06-13 03:17:12,513][ceph_volume.process][INFO  ] stdout ";"/dev/vg00/osd_sdb";"osd_sdb";"vg00";"Ao06Q3-DvE8-VqgK-9vsm-u7kd-zsXu-os0qmD";"<8.00g
[2019-06-13 03:17:12,514][ceph_volume.process][INFO  ] Running command: /bin/udevadm info --query=property /dev/vg00/osd_sdb
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DEVLINKS=/dev/disk/by-id/dm-name-vg00-osd_sdb /dev/disk/by-id/dm-uuid-LVM-kArlh4zXPPWV93IoDED81jrTTlXhzetHAo06Q3DvE8VqgK9vsmu7kdzsXuos0qmD /dev/mapper/vg00-osd_sdb /dev/vg00/osd_sdb
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DEVNAME=/dev/dm-2
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DEVPATH=/devices/virtual/block/dm-2
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DEVTYPE=disk
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DM_LV_NAME=osd_sdb
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DM_NAME=vg00-osd_sdb
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DM_SUSPENDED=0
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DM_UDEV_PRIMARY_SOURCE_FLAG=1
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DM_UDEV_RULES_VSN=2
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DM_UUID=LVM-kArlh4zXPPWV93IoDED81jrTTlXhzetHAo06Q3DvE8VqgK9vsmu7kdzsXuos0qmD
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout DM_VG_NAME=vg00
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout MAJOR=253
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout MINOR=2
[2019-06-13 03:17:12,797][ceph_volume.process][INFO  ] stdout SUBSYSTEM=block
[2019-06-13 03:17:12,798][ceph_volume.process][INFO  ] stdout TAGS=:systemd:
[2019-06-13 03:17:12,798][ceph_volume.process][INFO  ] stdout USEC_INITIALIZED=5998142
[2019-06-13 03:17:12,798][ceph_volume.process][INFO  ] Running command: /bin/ceph-authtool --gen-print-key
[2019-06-13 03:17:18,115][ceph_volume.process][INFO  ] stdout AQA5wAFdueSEBhAAFAU4QFofUZT9qxWaDVn5xA==
[2019-06-13 03:17:18,116][ceph_volume.process][INFO  ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new d7f2af93-182c-47d9-8535-62cd1b028033
###NO FURTHER OUTPUT AND OSD NOT PREPARED###

Should I open a new issue?

@dsavineau
Contributor

The OSD creation in ceph-volume is fine, but it seems to get stuck when adding the OSD to the cluster, so it's probably network related.
Can you contact the mon nodes from the osd nodes?

Maybe you could try changing --network=ceph_private to --net=host
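
As a quick sanity check (just a guess at a way to test reachability, reusing the same image and the mounted config/keyring), something like this should show whether the mon answers from that network:

docker run --rm --network=ceph_private \
    -v /etc/ceph:/etc/ceph \
    --entrypoint ceph \
    ceph/daemon:v4.0.0-stable-4.0-nautilus-centos-7-x86_64 \
    -s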

@wiryonolau

The OSD creation in ceph-volume is fine, but it seems to get stuck when adding the OSD to the cluster, so it's probably network related.
Can you contact the mon nodes from the osd nodes?

Maybe you could try changing --network=ceph_private to --net=host

Yep, apparently it's a network problem; I need to use the ceph_public network though. The ceph_private network is not usable even though it can connect to ceph_mon. I get a better error log now that mentions the keyring, so it should be easy to solve.

@jdobbs55

jdobbs55 commented Dec 3, 2019

I've been trying the workaround above

docker run --rm --name cephvolprep --net=host --ipc=host --privileged=true \
    -v /var/lib/ceph:/var/lib/ceph/ \
    -v /var/log/ceph:/var/log/ceph/ \
    -v /etc/ceph:/etc/ceph \
    -v /dev/:/dev/ \
    -v /run/lock/lvm:/run/lock/lvm:z \
    -v /run/lvm/:/run/lvm/ \
    -v /var/run/udev/:/var/run/udev/:z \
    --entrypoint ceph-volume \
    ceph/daemon \
    lvm create --data /dev/sdb

The prepare is successful but when it goes into activate I see the following error

Running command: /bin/systemctl enable ceph-volume@lvm-1-0a157137-3391-4ea6-b364-db24732ca72c
 stderr: Operation failed: No such file or directory
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.1 --yes-i-really-mean-it
 stderr: purged osd.1
--> RuntimeError: command returned non-zero exit status: 1

It's not clear to me which file it can't find or what is causing the error. I'm on a CentOS 7 host machine.

@bloodstars

Solved - use this in the nautilus docker image, ceph/daemon:latest-nautilus-devel:

ceph-volume lvm create --data ${OSD_DEVICE} --no-systemd;
ceph-volume lvm list;
mv /var/lib/ceph/osd/ceph-${OSD_ID}/* /var/lib/ceph/osd/;
umount -l /var/lib/ceph/osd/ceph-${OSD_ID};
mv /var/lib/ceph/osd/* /var/lib/ceph/osd/ceph-${OSD_ID}/;
/usr/bin/ceph-osd -f -i ${OSD_ID};

@ghost

ghost commented Mar 9, 2020

Solved - use this in the nautilus docker image, ceph/daemon:latest-nautilus-devel:

ceph-volume lvm create --data ${OSD_DEVICE} --no-systemd;
ceph-volume lvm list;
mv /var/lib/ceph/osd/ceph-${OSD_ID}/* /var/lib/ceph/osd/;
umount -l /var/lib/ceph/osd/ceph-${OSD_ID};
mv /var/lib/ceph/osd/* /var/lib/ceph/osd/ceph-${OSD_ID}/;
/usr/bin/ceph-osd -f -i ${OSD_ID};

Hi!
The problem is still there in the latest build.
Where do you put these commands?

Thanks!

@stale

stale bot commented Apr 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 8, 2020
@stale stale bot closed this as completed Apr 15, 2020
@poelzi

poelzi commented May 24, 2020

The nautilus docker containers are next to unusable. The help interface suggests osd-disk commands that do not exist. The README suggests commands that do not exist, and the existing documentation is for outdated releases. So far, a frustrating experience.

@willzhang

yep, la la la, boo hoo

@Conmi-WhiteJoker

yep, la la la, boo hoo

Buddy, did you mistype that?
