
No longer able to make volume ids with '[]' characters for Nomad per_alloc mounting #360

Open
CarbonCollins opened this issue Jan 6, 2024 · 12 comments

Comments

@CarbonCollins

Hi,

I recently updated to 1.8.4 of democratic-csi (it's been about a year since I last updated...) and now I am having issues, as I can no longer create volumes with an id containing the [ or ] characters.

It looks like about 9 months ago the following check was added: https://github.com/democratic-csi/democratic-csi/blame/79f16a0cf636f376f394f9b10084d949004b1e62/src/driver/index.js#L461 which uses a regex to exclude these characters from the volume ID. I can see the corresponding error in the controller logs when my Terraform deploy attempts to create the volume:

{"host":"4815a1687890","level":"error","message":"handler error - driver: ControllerSmbClientDriver method: CreateVolume error: {\"name\":\"GrpcError\",\"code\":3,\"message\":\"generated volume_id 'traefik-data[0]' contains invalid characters: '[]'\"}","service":"democratic-csi","timestamp":"2024-01-06T20:56:41.924Z"}
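For illustration, the kind of check that error message implies might look roughly like this. This is a hypothetical Python sketch, not the actual JavaScript in src/driver/index.js, and the allowed character set is an assumption:

```python
import re

# Assumed allowed set for generated volume ids; the real regex in
# democratic-csi's src/driver/index.js is not reproduced here.
ALLOWED = re.compile(r"^[A-Za-z0-9_\-:.@/ ]+$")

def validate_volume_id(volume_id: str) -> None:
    """Raise if the generated volume_id contains characters outside the allowed set."""
    if not ALLOWED.match(volume_id):
        # Collect the offending characters for the error message.
        invalid = "".join(sorted(set(re.sub(r"[A-Za-z0-9_\-:.@/ ]", "", volume_id))))
        raise ValueError(
            f"generated volume_id '{volume_id}' contains invalid characters: '{invalid}'"
        )

validate_volume_id("traefik-data")      # passes
# validate_volume_id("traefik-data[0]") would raise: invalid characters: '[]'
```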

I currently have a couple of Nomad jobs that make use of the per_alloc functionality, which allows me to run multiple instances of the same task (for HA). Each allocation will attempt to use the volume with an incrementing suffix that includes []... for example, my job could request to have traefik-data mounted, and the first allocation will attach to the volume traefik-data[0] while the second instance will connect to traefik-data[1]. This is configured by adding the flag like so:

    volume "traefik-data" {
      type            = "csi"
      source          = "traefik-data"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
      per_alloc       = true
    }

The nomad docs for per_alloc are found here: https://developer.hashicorp.com/nomad/docs/job-specification/volume#per_alloc
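The per_alloc naming described above can be sketched as follows. This is a purely illustrative helper, not Nomad's actual implementation:

```python
def per_alloc_volume_id(source: str, alloc_index: int) -> str:
    """Mimic Nomad's per_alloc convention: append a bracketed allocation index
    to the volume source name (illustrative only)."""
    return f"{source}[{alloc_index}]"

# Allocation 0 and 1 of the same job request distinct volumes:
print(per_alloc_volume_id("traefik-data", 0))  # traefik-data[0]
print(per_alloc_volume_id("traefik-data", 1))  # traefik-data[1]
```

The bracketed suffix is exactly what trips the volume_id validation above, since [ and ] are outside the allowed character set.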

I am not sure if this is a democratic-csi issue or a Nomad issue, as I don't personally know whether the CSI spec places requirements on volume ids...

@CarbonCollins
Author

Just a bit more info: I rolled back to 1.8.3, where the check is not present, and my deployments now work \o/ but I couldn't see any mention of these validations being added in the changelog for 1.8.4.

@travisghansen
Member

Welcome! I see you are using the smb-client driver; can you send over the full config (minus any sensitive data, of course)?

@CarbonCollins
Author

Sure,

this is my templated version of it at least:

driver: smb-client
instance_id: {{ env "NOMAD_ALLOC_ID" }}
smb:
  shareHost: {{ index .Data.data "host" }}
  shareBasePath: "{{ index .Data.data "share" }}"
  controllerBasePath: "/storage"
  dirPermissionsMode: "0777"
  dirPermissionsUser: 3205
  dirPermissionsGroup: 3205

and the Terraform which I use to create and register the CSI volume in Nomad is:

data "nomad_plugin" "storage" {
  plugin_id        = var.plugin_id
  wait_for_healthy = true
}

resource "nomad_csi_volume" "traefik_data" {
  count      = 2
  depends_on = [data.nomad_plugin.storage]

  lifecycle {
    prevent_destroy = true
  }

  namespace = "c3-networking"

  plugin_id    = data.nomad_plugin.storage.plugin_id
  volume_id    = format("traefik-data[%d]", count.index)
  name         = format("traefik-data[%d]", count.index)
  capacity_min = "1GiB"
  capacity_max = "2GiB"

  capability {
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }

  mount_options {
    fs_type = "cifs"
    mount_flags = [
      "vers=3.0",
      format("uid=%d", var.cifs_user_id),
      format("gid=%d", var.cifs_group_id),
      "file_mode=0600",
      "dir_mode=0700",
      "noperm",
      "nobrl",
      format("username=%s", data.vault_kv_secret_v2.volume_credentials.data["user"]),
      format("password=%s", data.vault_kv_secret_v2.volume_credentials.data["pass"])
    ]
  }
}

@travisghansen
Member

I see. smb is already tricky because of an 80-char limit on share names as well. Would it be problematic in your case if the folder name becomes a hash vs a straight pass-through of the supplied name? That's actually what I do in CI.

Given this is standard Nomad behavior, though, it does make me pause. To my knowledge that would fail in a zfs world, which begs the question of how to handle it for non-smb scenarios...

@CarbonCollins
Author

I don't see any issue with the folder itself becoming a hash personally. I know my current share path is only 3 characters long, so I don't think I am anywhere near the 80-char limit right now :P

I also don't see any reason why my folders in the share can't be in a format like traefik-data-%d, if that's a configurable thing? That way Nomad registers it as traefik-data[%d] while in the share it's traefik-data-%d, if this is what you are worried about?

@CarbonCollins
Author

To add on to that last point, though, I am not sure if Nomad allows the request for a new volume to specify a different downstream name vs the id of the volume...

@travisghansen
Member

Yeah, the name vs the id should not be an issue for Nomad (it's part of the CSI spec, and the two should not necessarily be considered equal by any means).

You can set something like this:

Available options for algorithms are here:

Understood about the 80. In the case of smb-client I guess the share is pre-determined and not connected to the newly created directory; the other drivers generally create a new share per PV, so it's more of an issue there.

I don't think setting the hash in the config will bother any of the existing volumes you have already created; it should only impact new ones going forward.

@CarbonCollins
Author

CarbonCollins commented Jan 9, 2024

Ok, so I tried out the private config that you pointed out, like so:

driver: smb-client
instance_id: {{ env "NOMAD_ALLOC_ID" }}
smb:
  shareHost: {{ index .Data.data "host" }}
  shareBasePath: "{{ index .Data.data "share" }}"
  controllerBasePath: "/storage"
  dirPermissionsMode: "0777"
  dirPermissionsUser: 3205
  dirPermissionsGroup: 3205

_private:
  csi:
    volume:
      idHash:
        strategy: crc16

The deployment from terraform operated successfully, no error was returned and the job started and mounted to the share.

I'm not sure where this hash is applied though, as it's still looking for a folder at //{shareHost}/{shareBasePath}/v/traefik-data[0]

On a complete side note, it did not create the folder in the share if it did not previously exist. I remember when I first started using democratic-csi the smb driver did not create the volume folders for you; is this still the case now?
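For reference, an id-hash strategy like the crc16 one in the config above would presumably map the raw volume id to a short fixed-length name. The exact CRC variant democratic-csi uses, and where it applies the hash, are not confirmed in this thread; this Python sketch assumes CRC-16/CCITT-FALSE purely as an illustration:

```python
def crc16_ccitt(data: bytes) -> int:
    """CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no xor-out.
    Assumed variant for illustration; may differ from democratic-csi's crc16."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hashed_volume_id(volume_id: str) -> str:
    # Map an arbitrary volume id (brackets and all) to a 4-hex-digit,
    # filesystem-safe directory name.
    return format(crc16_ccitt(volume_id.encode("utf-8")), "04x")

print(hashed_volume_id("traefik-data[0]"))  # stable 4-hex-digit name
```

The point of such a scheme is that characters like [ and ] never reach the backing filesystem, while the mapping stays deterministic per volume id.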

@CarbonCollins
Author

CarbonCollins commented Jan 9, 2024

Another observation I found: if I set the name in the Terraform config to Traefik Data 1, then the controller looks for a folder in the share called //{shareHost}/{shareBasePath}/v/Traefik Data 1, but I assume this is correct?

@travisghansen
Member

Did it create a directory with a hashed name? I am a little foggy on how Terraform ties into this since I don't use it. It seems Terraform is being used to manually create the volume definition, when the definition should be created by Nomad after the request/RPC has responded with specific details.

@CarbonCollins
Author

It did not create the directory in the share no…

As for the provider, as far as I know it provides two different resources for CSI volumes. The one I am using, nomad_csi_volume, instructs Nomad to create and register the volume, so it should be telling democratic-csi to create the volume and then registering it. The other resource, nomad_csi_volume_registration, is just the registration, so you would create the share out of band and then register it into Nomad. This is at least how I understand it to work.

@gjrtimmer

I have the same problem, with Synology.

Error

$ nomad volume create jobs/db/mysql0-iscsi.nomad
Error creating volume: Unexpected response code: 500 (rpc error: 1 error occurred:
        * controller create volume: CSI.ControllerCreateVolume: volume "mysql[0]" snapshot source &{"" ""} is not compatible with these parameters: rpc error: code = InvalidArgument desc = generated volume_id 'mysql[0]' contains invalid characters: '[]')

Volume Config

type         = "csi"
id           = "mysql[0]"
name         = "mysql[0]"
namespace    = "blog"
plugin_id    = "org.democratic-csi.iscsi"
capacity_min = "10GiB"
capacity_max = "10GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}
