
Synology iSCSI docker plugin installation: invalid argument #368

Open
sethicis opened this issue Feb 20, 2024 · 14 comments

@sethicis

sethicis commented Feb 20, 2024

The issue

I wanted to try out the Synology iSCSI driver with a Docker Swarm setup, but when I try to build and install it, the logs say:

Handler for POST /v1.44/plugins/sethicis/swarm-csi-synology-iscsi:v1.8.4/enable returned error: dial unix /run/docker/plugins/483d665bf410ce519ceb7f5efa3b52ad8bd4afa07fbc25d553ea2c0e6fb60800/csi-synology-iscsi.sock: connect: invalid argument

I'm kind of stumped as to what is happening, but it seems like something is going wrong during the grpc startup.

How I'm building

I'm using a modified version of the script written by olljanat.

My config.json

{
    "description": "democratic-csi storage driver for synology iscsi",
    "interface": {
      "types": ["docker.csinode/1.0", "docker.csicontroller/1.0"],
      "socket": "csi-synology-iscsi.sock"
    },
    "network": {
      "type": "host"
    },
    "mounts": [
      {
        "description": "Used to access the dynamically attached block devices",
        "name": "dev",
        "source": "/dev/",
        "destination": "/dev",
        "type": "bind",
        "options": [
          "rbind",
          "rshared"
        ]  
      }
    ],
    "env": [
      {
        "name": "CSI_ENDPOINT",
        "description": "the CSI endpoint to listen to internally",
        "value": "unix:///run/docker/plugins/csi-synology-iscsi.sock"
      }
    ],
    "entrypoint": [
      "/home/csi/app/entrypoint.sh"
    ],
    "workdir": "/home/csi/app",
    "linux": {
      "capabilities": [
        "CAP_SYS_ADMIN",
        "CAP_CHOWN"
      ],
      "AllowAllDevices": true,
      "devices": null
    },
    "PropagatedMount": "/data/published"
  }

My build.sh

#!/bin/bash

USAGE="Usage: ./build.sh <Docker Hub Organization> <Democratic CSI version>"

if [ "$1" == "--help" ] || [ "$#" -lt "2" ]; then
	echo "$USAGE"
	exit 0
fi

ORG=$1
VERSION=$2

set -x

rm -rf rootfs
docker plugin disable csi-synology-iscsi:latest
docker plugin rm csi-synology-iscsi:latest
docker plugin disable $ORG/swarm-csi-synology-iscsi:v$VERSION
docker plugin rm $ORG/swarm-csi-synology-iscsi:v$VERSION
docker rm -vf iscsifsimage

docker create --name iscsifsimage docker.io/democraticcsi/democratic-csi:v$VERSION
mkdir -p rootfs
docker export iscsifsimage | tar -x -C rootfs
docker rm -vf iscsifsimage
mkdir -p rootfs/home/csi/app/config
cp entrypoint.sh rootfs/home/csi/app/
cp synology-iscsi.yaml rootfs/home/csi/app/config/

docker plugin create $ORG/swarm-csi-synology-iscsi:v$VERSION .
docker plugin enable $ORG/swarm-csi-synology-iscsi:v$VERSION
docker plugin push $ORG/swarm-csi-synology-iscsi:v$VERSION
docker plugin disable $ORG/swarm-csi-synology-iscsi:v$VERSION
docker plugin rm $ORG/swarm-csi-synology-iscsi:v$VERSION
docker plugin install --alias csi-synology-iscsi --grant-all-permissions $ORG/swarm-csi-synology-iscsi:v$VERSION

How I'm invoking the build.sh script:

./build.sh sethicis 1.8.4

My entrypoint.sh

#!/bin/sh

bin/democratic-csi \
  --driver-config-file=config/synology-iscsi.yaml \
  --log-level=debug \
  --server-socket=/run/docker/plugins/csi-synology-iscsi.sock \
  --csi-version=1.5.0 \
  --csi-name=csi-synology-iscsi \
  --server-socket-permissions-mode=0755

My driver config yaml

driver: synology-iscsi
httpConnection:
  protocol: http
  host: 192.168.1.3 # changed for security
  port: 5000
  username: sethicis
  password: '<redacted>'
  allowInsecure: true
  # should be unique across all installs to the same nas
  session: "democratic-csi"
  serialize: true

# Choose the DSM volume this driver operates on. The default value is /volume1.
synology:
  volume: /volume2

iscsi:
  targetPortal: "192.168.1.3" # changed for security
  # for multipath
  targetPortals: [] # [ "server[:port]", "server[:port]", ... ]
  # leave empty to omit usage of -I with iscsiadm
  interface: ""
  # can be whatever you would like
  baseiqn: "iqn.2000-01.com.synology:csi."

  # MUST ensure uniqueness
  # full iqn limit is 223 bytes, plan accordingly
  namePrefix: "swarm-cluster"
  nameSuffix: ""

  # documented below are several blocks
  # pick the option appropriate for you based on what your backing fs is and desired features
  # you do not need to alter dev_attribs under normal circumstances but they may be altered in advanced use-cases
  # These options can also be configured per storage-class:
  # See https://github.com/democratic-csi/democratic-csi/blob/master/docs/storage-class-parameters.md
  lunTemplate:
    # can be static value or handlebars template
    #description: "{{ parameters.[csi.storage.k8s.io/pvc/namespace] }}-{{ parameters.[csi.storage.k8s.io/pvc/name] }}"
    
    # btrfs thin provisioning
    type: "THIN"
    # tpws = Hardware-assisted zeroing
    # caw = Hardware-assisted locking
    # 3pc = Hardware-assisted data transfer
    # tpu = Space reclamation
    # can_snapshot = Snapshot
    #dev_attribs:
    #- dev_attrib: emulate_tpws
    #  enable: 1
    #- dev_attrib: emulate_caw
    #  enable: 1
    #- dev_attrib: emulate_3pc
    #  enable: 1
    #- dev_attrib: emulate_tpu
    #  enable: 0
    #- dev_attrib: can_snapshot
    #  enable: 1

    # btrfs thick provisioning
    # only zeroing and locking supported
    #type: "BLUN_THICK"
    # tpws = Hardware-assisted zeroing
    # caw = Hardware-assisted locking
    #dev_attribs:
    #- dev_attrib: emulate_tpws
    #  enable: 1
    #- dev_attrib: emulate_caw
    #  enable: 1

    # ext4 thin provisioning; UI sends everything with enabled=0
    #type: "THIN"

    # ext4 thin with advanced legacy features set
    # can only alter tpu (all others are set as enabled=1)
    #type: "ADV"
    #dev_attribs:
    #- dev_attrib: emulate_tpu
    #  enable: 1

    # ext4 thick
    # can only alter caw
    #type: "FILE"
    #dev_attribs:
    #- dev_attrib: emulate_caw
    #  enable: 1

  lunSnapshotTemplate:
    is_locked: true
    # https://kb.synology.com/en-me/DSM/tutorial/What_is_file_system_consistent_snapshot
    is_app_consistent: true

  targetTemplate:
    auth_type: 0
    max_sessions: 0

Full log output when attempting to enable the plugin

Feb 19 23:06:30 plex-server dockerd[789]: time="2024-02-19T23:06:30.395109313-05:00" level=warning msg="reference for unknown type: application/vnd.docker.plugin.v1+json" spanID=41e78604a50927ed traceID=d4dad63f5c49ae63c05817fdece80f50
Feb 19 23:06:30 plex-server dockerd[789]: time="2024-02-19T23:06:30.395253748-05:00" level=warning msg="reference for unknown type: application/vnd.docker.plugin.v1+json" digest="sha256:c79dfaa3f368026f39b51e673e9f9ee265ba64f53565f0b821c32eb1ccd2bd69" mediatype=application/vnd.docker.plugin.v1+json size=1025 spanID=41e78604a50927ed traceID=d4dad63f5c49ae63c05817fdece80f50
Feb 19 23:06:56 plex-server dockerd[789]: time="2024-02-19T23:06:56-05:00" level=info msg="grpc implementation: @grpc/grpc-js" plugin=cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11
Feb 19 23:06:56 plex-server dockerd[789]: time="2024-02-19T23:06:56-05:00" level=info msg="\x1b[32minfo\x1b[39m: initializing csi driver: synology-iscsi {\"timestamp\":\"2024-02-20T04:06:56.991Z\"}" plugin=cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11
Feb 19 23:06:57 plex-server dockerd[789]: time="2024-02-19T23:06:57-05:00" level=info msg="\x1b[34mdebug\x1b[39m: setting default identity service caps {\"timestamp\":\"2024-02-20T04:06:57.388Z\"}" plugin=cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11
Feb 19 23:06:57 plex-server dockerd[789]: time="2024-02-19T23:06:57-05:00" level=info msg="\x1b[34mdebug\x1b[39m: setting default identity volume_expansion caps {\"timestamp\":\"2024-02-20T04:06:57.388Z\"}" plugin=cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11
Feb 19 23:06:57 plex-server dockerd[789]: time="2024-02-19T23:06:57-05:00" level=info msg="\x1b[34mdebug\x1b[39m: setting default controller caps {\"timestamp\":\"2024-02-20T04:06:57.388Z\"}" plugin=cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11
Feb 19 23:06:57 plex-server dockerd[789]: time="2024-02-19T23:06:57-05:00" level=info msg="\x1b[34mdebug\x1b[39m: setting default node caps {\"timestamp\":\"2024-02-20T04:06:57.391Z\"}" plugin=cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11
Feb 19 23:06:57 plex-server dockerd[789]: time="2024-02-19T23:06:57-05:00" level=info msg="\x1b[32minfo\x1b[39m: starting csi server - node version: v16.18.0, package version: 1.8.4, config file: /home/csi/app/config/synology-iscsi.yaml, csi-name: csi-synology-iscsi, csi-driver: synology-iscsi, csi-mode: controller,node, csi-version: 1.5.0, address: , socket: unix:///run/docker/plugins/csi-synology-iscsi.sock {\"timestamp\":\"2024-02-20T04:06:57.393Z\"}" plugin=cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11
Feb 19 23:07:19 plex-server dockerd[789]: time="2024-02-19T23:07:19.247374729-05:00" level=info msg="ignoring event" container=cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11 module=libcontainerd namespace=plugins.moby topic=/tasks/delete type="*events.TaskDelete"
Feb 19 23:07:19 plex-server dockerd[789]: time="2024-02-19T23:07:19.320348193-05:00" level=error msg="Handler for POST /v1.44/plugins/csi-synology-iscsi:latest/enable returned error: dial unix /run/docker/plugins/cc204250ea3d6a2a657429dff6ad5119a47996cb5b84851ab75ea3dc07051a11/csi-synology-iscsi.sock: connect: invalid argument"
Feb 19 23:07:23 plex-server dockerd[789]: time="2024-02-19T23:07:23.985488063-05:00" level=info msg="NetworkDB stats plex-server(923980edf7a5) - netID:gdoojbu21k3x2okfont9oekc3 leaving:false netPeers:3 entries:10 Queue qLen:0 netMsg/s:0"
Feb 19 23:07:24 plex-server dockerd[789]: time="2024-02-19T23:07:23.985663986-05:00" level=info msg="NetworkDB stats plex-server(923980edf7a5) - netID:x5zvh5l98120gyxwfz7g0kbt6 leaving:false netPeers:3 entries:9 Queue qLen:0 netMsg/s:0"

Platform Info

I'm running this on Ubuntu 22.04.3 with Docker v25.0.1.

@sethicis
Author

I've never attempted to build a docker plugin myself, so it's entirely possible that I'm just misconfiguring something.

@travisghansen
Member

You are brave! I am not much help here but maybe @olljanat can provide some tips.

@olljanat

The iSCSI implementation in Linux is a bit tricky. I have only used this legacy type of Docker plugin with iSCSI. It is not open source, but here is what I know based on it.

Check these prerequisites first (a quick check sequence is sketched right after this list):

  • The open-iscsi package is installed from your distro package manager.
  • The iscsid process is running.
  • You have properly configured the server end.
    • Run the discovery command iscsiadm --mode discovery -t sendtargets --portal 192.168.1.3 to verify.
    • Depending on whether your tests already managed to create a LUN, you may need to create one manually first.
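
For example, a quick host-side check of those prerequisites might look like this (a sketch only; Debian/Ubuntu package names assumed, portal address as in your config):

# install the initiator tools and confirm the daemon is up (Debian/Ubuntu assumed)
sudo apt-get install -y open-iscsi
systemctl status iscsid

# verify the Synology target answers discovery
sudo iscsiadm --mode discovery -t sendtargets --portal 192.168.1.3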

Then update config.json to this so you have all the needed capabilities and mounts in place:

{
    "description": "democratic-csi storage driver for synology iscsi",
    "entrypoint": [
        "/home/csi/app/entrypoint.sh"
    ],
    "env": [
        {
            "description": "the CSI endpoint to listen to internally",
            "name": "CSI_ENDPOINT",
            "value": "unix:///run/docker/plugins/csi-synology-iscsi.sock"
        }
    ],
    "interface": {
        "socket": "csi-synology-iscsi.sock",
        "types": [
            "docker.csinode/1.0",
            "docker.csicontroller/1.0"
        ]
    },
    "linux": {
        "AllowAllDevices": true,
        "capabilities": [
            "CAP_SYS_ADMIN",
            "CAP_CHOWN",
            "CAP_SYS_PTRACE",
            "CAP_IPC_LOCK",
            "CAP_IPC_OWNER",
            "CAP_NET_ADMIN",
            "CAP_MKNOD",
            "CAP_SYS_MODULE"
        ],
        "devices": null
    },
    "mounts": [
        {
            "description": "Used to access the dynamically attached block devices",
            "destination": "/dev",
            "name": "dev",
            "options": [
                "rbind",
                "rshared"
            ],
            "source": "/dev/",
            "type": "bind"
        },
        {
            "destination": "/etc/iscsi",
            "name": "/etc/iscsi",
            "options": [
                "bind"
            ],
            "source": "/etc/iscsi",
            "type": "bind"
        },
        {
            "destination": "/lib/modules",
            "name": "/lib/modules",
            "options": [
                "bind"
            ],
            "source": "/lib/modules",
            "type": "bind"
        },
        {
            "destination": "/sbin/iscsiadm",
            "name": "/sbin/iscsiadm",
            "options": [
                "bind"
            ],
            "source": "/sbin/iscsiadm",
            "type": "bind"
        },
        {
            "destination": "/host/proc",
            "name": "/proc",
            "options": [
                "bind"
            ],
            "source": "/proc",
            "type": "bind"
        }
    ],
    "network": {
        "type": "host"
    },
    "PropagatedMount": "/data/published",
    "workdir": "/home/csi/app"
}

@travisghansen
Member

Ah, for iscsi the assumption is that you bind mount / to /host, and I handle the iscsiadm command using chroot via a wrapper. So you do need the daemon running, but you do not need to bind mount individual binaries or run the discovery command manually.
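
A minimal sketch of what such a wrapper can look like (illustrative only; the actual wrapper is the docker/iscsiadm script shipped in the democratic-csi image):

#!/bin/sh
# Forward iscsiadm calls to the host via chroot, assuming the host root
# filesystem is bind-mounted at /host inside the plugin container.
exec chroot /host iscsiadm "$@"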

@olljanat

Manual discovery is of course just a test, and it is a bit tricky to troubleshoot these plugins.
But sure, you can handle iscsiadm inside the container too, depending on how it is implemented. Nutanix most probably just used that method to avoid needing to update their plugin in case there are ever breaking changes in open-iscsi that require updating both the daemon and the client.

@sethicis
Author

@olljanat, I appreciate the reply -- I don't want you to think I'm ghosting you. I haven't had a chance to try your suggestion yet; I should have time tomorrow evening to try it out and report back.

@sethicis
Author

sethicis commented Feb 22, 2024

@olljanat @travisghansen , I have some unfortunate news. The above suggestions didn't seem to help.

At first I added all the individual mounts that @olljanat listed above and tried to build that, which resulted in the same dial unix: invalid argument error, but then I realized he was just recounting how he had gotten iSCSI to work before with the legacy docker plugin. Derp

I then followed what @travisghansen said about configuring it to perform a chroot against the entire host file system mounted at /host (per the suggestion in this article).

That too resulted in the same dial unix: invalid argument error.

Here's my current config.json:

{
  "description": "democratic-csi storage driver for synology iscsi",
  "entrypoint": [
    "/home/csi/app/entrypoint.sh"
  ],
  "env": [
    {
      "name": "CSI_ENDPOINT",
      "description": "the CSI endpoint to listen to internally",
      "value": "unix:///run/docker/plugins/csi-synology-iscsi.sock"
    }
  ],
  "interface": {
    "types": ["docker.csinode/1.0", "docker.csicontroller/1.0"],
    "socket": "csi-synology-iscsi.sock"
  },
  "network": {
    "type": "host"
  },
  "linux": {
    "capabilities": [
      "CAP_SYS_ADMIN",
      "CAP_CHOWN",
      "CAP_SYS_PTRACE",
      "CAP_IPC_LOCK",
      "CAP_IPC_OWNER",
      "CAP_NET_ADMIN",
      "CAP_MKNOD",
      "CAP_SYS_MODULE"
    ],
    "AllowAllDevices": true,
    "devices": null
  },
  "mounts": [
    {
      "description": "entire filesystem mounted for chroot access",
      "name": "/host",
      "source": "/",
      "destination": "/host",
      "options": [
          "bind"
        ],
      "type": "bind"
    }
  ],
  "workdir": "/home/csi/app",
  "PropagatedMount": "/data/published"
}

I used a modified version of the debug.sh script that @olljanat wrote originally for the host localpath example to verify the entrypoint, but it doesn't shed much additional light.

I do not get the dial unix error when running the debug.sh, but that's probably because there's no actual plugin sock in the debug environment.

My suspicion is that the error is happening when grpc tries to connect to the plugin sock file, but I'm not exactly sure what is going wrong.

Since my previous configuration, the config change I made based on my misunderstanding of what @olljanat meant, and my configuration matching how @travisghansen set up the Dockerfile all result in the same error, I wonder if the current issue is happening here.

Do either of you have any suggestions on how I can approach debugging the sock file?

@olljanat

dial unix is just a generic error which means that the CSI plugin either crashed or returned an error because the prerequisites are not in place.

So did you test that iSCSI discovery works?
First from the host, and it would be good to test inside the CSI container too: https://docs.docker.com/engine/extend/#debugging-plugins (for that you need to use, for example, my host path config together with those extra capabilities, because the plugin must be running).
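
One way to repeat that test inside the plugin container is via runc, per the linked docs (a sketch; the runtime root path and plugin id are assumptions that vary by host and Docker version):

# list the managed plugin containers (runtime root path may differ on your host)
sudo runc --root /run/docker/runtime-runc/plugins.moby list

# open a shell inside the plugin and repeat the discovery test from there
sudo runc --root /run/docker/runtime-runc/plugins.moby exec -t <plugin-id> sh
iscsiadm --mode discovery -t sendtargets --portal 192.168.1.3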

Then enable debug logging on the Docker daemon and the CSI plugin. Also check whether anything shows up in dmesg.
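
For the daemon side, something like this (a sketch; assumes systemd, and merge the setting into any existing /etc/docker/daemon.json rather than replacing the file):

# /etc/docker/daemon.json
# { "debug": true }

sudo systemctl restart docker
journalctl -u docker.service -f    # follow daemon logs while re-enabling the plugin
sudo dmesg --follow                # watch for kernel-side iSCSI / device errors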

And if there is no hint in those, then the only option is to add more debug logging to the code.

@sethicis
Author

sethicis commented Feb 22, 2024

@olljanat, yes, sorry -- I should have listed that in my post. I did verify that iscsid and iscsiadm are working as expected. I was able to see my Synology's iSCSI targets.

I tried enabling GRPC_VERBOSITY=debug and GRPC_TRACE=call_stream, but I saw no change in my tail of the dockerd logs when attempting to build the plugin.

I did some additional debugging after that post using the same article on plugin debugging that you posted. What I discovered is that the democratic-csi sock is not responding to curl calls, while weave, another plugin I have installed, responds to NetworkDriver.GetCapabilities and returns a 404 for driver endpoints it doesn't implement (like VolumeDriver.List).
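
Roughly how I'm poking at the two sockets (paths are illustrative; weave is a legacy v1 plugin that speaks plain HTTP over its socket, while the CSI plugin speaks gRPC, so plain curl isn't really expected to get a clean answer out of it):

# legacy v1 plugin (weave): plain HTTP over the unix socket answers
sudo curl --unix-socket /run/docker/plugins/<plugin-id>/weave.sock \
  -X POST http://localhost/NetworkDriver.GetCapabilities

# CSI plugin socket: the same approach gets no usable HTTP response
sudo curl --unix-socket /run/docker/plugins/<plugin-id>/csi-synology-iscsi.sock \
  -X POST http://localhost/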

My current hypothesis is that the grpc-js library is not actually setting up the listener properly (maybe a missing dependency?). When I have some more time I'm going to add my own explicit debug endpoint to the server definition and see if I can hit that when I'm running the plugin as a debug docker container.

dial unix is just a generic error which means that the CSI plugin either crashed or returned an error because the prerequisites are not in place.

Yeah, I kind of suspected that, because I commented out the bin/democratic-csi invocation in the entrypoint.sh just to see if I could get a different error, but got the exact same dial unix: connect: invalid argument message.

@derekpovah

This is interesting, since I am working on a TrueNAS version of this and I can enable it just fine. The only real difference I see is that I'm pushing to a private registry running on my laptop, but I can't imagine that's significant. Just for fun, I tried to build and run the Synology version, and I'm seeing the same errors. Does the plugin need to make a connection to the actual TrueNAS/Synology hardware to initialize properly?

I haven't had the time to do a deep dive on the inner workings of democratic-csi yet, but I may have time in the near future.

@travisghansen
Member

Hmm, that’s really strange. They both should be exactly the same (in the sense of building a plugin). The active config for the app is the only thing that should alter the behavior at all.

@olljanat

@sethicis Btw, now you have the capabilities from my example but the whole root fs mount, so most probably you need to add CAP_SYS_CHROOT.

I handle the iscsiadm command using chroot via a wrapper.

@travisghansen how does this work? Does your CSI plugin do the chroot automatically when it finds the /host mount, or should it be part of the entrypoint script in the plugin container?

@travisghansen
Member

It does this via a wrapper script built into the container: https://github.com/democratic-csi/democratic-csi/blob/master/docker/iscsiadm

@sethicis
Author

Ok, something is definitely different between the two grpc servers (my currently installed weave plugin and my build of democratic-csi).

I was able to get the democratic-csi grpc server to respond to my connection calls, but I had to use grpcurl to get a peep out of it -- anything else just resulted in the following error: curl: (1) Received HTTP/0.9 when not allowed.

Using grpcurl I was able to confirm that the services (Identity, Controller, and Node) were all present.
I was also able to get GRPC_TRACE to work. After digging through the source code for grpc-js v1.9.9 (the version used by democratic-csi), I was able to deduce that the proper GRPC_TRACE value was server. This gives me a readout as the server is set up and shows whether any connection attempts come through.
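
Roughly what that looked like (a sketch; the plugin id and socket path are placeholders, and list relies on gRPC reflection being available, which it was here):

# probe the CSI socket with grpcurl
SOCK=/run/docker/plugins/<plugin-id>/csi-synology-iscsi.sock
sudo grpcurl -plaintext -unix "$SOCK" list   # Identity, Controller and Node services show up

# in entrypoint.sh, enable grpc-js server-side tracing before starting the server
export GRPC_VERBOSITY=DEBUG
export GRPC_TRACE=server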

However, now for the bad news. Even with the trace working properly, there is no additional information about what is happening during the plugin install / enable phase. I see the server get created, but no connection attempts ever seem to arrive, and Docker, I guess, just times out and kills the plugin.

Comparing behavior between the weave plugin and mine, I noticed that the weave plugin does NOT work with grpcurl; it terminates with the following error:

Failed to dial target host "/host/run/docker/plugins/30ae619716b8a46b5275dbf2567ab9a916d6f157bfae0a528f1884d066d054c3/weave.sock": context deadline exceeded

But again, weave does work as expected with curl, while my build of democratic-csi does not.

I'm going to think on this mystery some more and start on it again fresh tomorrow. I'll probably dig into the source code of weave and see if there are any differences between how the two grpc servers are setup.
