
The provider doesn't seem to understand that volumes inherit from volume.XYZ keys on the pool #43

Open
stgraber opened this issue Mar 11, 2024 · 17 comments · May be fixed by #68

@stgraber
Member

I've noticed OpenTofu getting pretty confused about a volume.zfs.remove_snapshots=true config key appearing on a volume that's defined in terraform. That's because the config key was set on the parent storage pool and so propagated to the new volume.

That's normal and we definitely don't want terraform to try to rectify this by deleting and re-creating the volume as it attempted here :)

@maveonair
Member

@stgraber
Member Author

stgraber commented Mar 19, 2024

Isn't that the list of computed pool keys as opposed to inherited keys on the volumes and buckets?

@maveonair
Member

maveonair commented Mar 19, 2024

> Isn't that the list of computed pool keys as opposed to inherited keys on the volumes and buckets?

In other words, you first created the pool without Terraform/OpenTofu and then imported it with Terraform/OpenTofu?

How can I reproduce your setup to get the same issue?

@stgraber
Member Author

incus storage create foo zfs volume.zfs.remove_snapshots=true

And then have Terraform create a new storage volume using that foo storage pool.

@adamcstephens
Contributor

Does this same behavior happen if the pool is created with a tf resource?

@stgraber
Member Author

Good question, though not really relevant here: the case I was dealing with was one where Incus is acting as a cloud, so the TF user doesn't have any control over the storage pools; they get an empty project and can only create project-tied resources.

@stgraber
Member Author

stgraber@castiana:~/demo/test$ tofu apply

OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

OpenTofu will perform the following actions:

  # incus_storage_pool.test will be created
  + resource "incus_storage_pool" "test" {
      + config = {
          + "volume.zfs.remove_snapshots" = "true"
        }
      + driver = "zfs"
      + name   = "test"
    }

  # incus_volume.test will be created
  + resource "incus_volume" "test" {
      + config       = {
          + "size" = "20GiB"
        }
      + content_type = "block"
      + location     = (known after apply)
      + name         = "test"
      + pool         = "test"
      + target       = (known after apply)
      + type         = "custom"
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  OpenTofu will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

incus_storage_pool.test: Creating...
incus_storage_pool.test: Creation complete after 1s [name=test]
incus_volume.test: Creating...
╷
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to incus_volume.test, provider "provider[\"registry.opentofu.org/lxc/incus\"]" produced an unexpected new value: .config: new element "zfs.remove_snapshots" has appeared.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
stgraber@castiana:~/demo/test$ cat main.tf 
terraform {
  required_providers {
    incus = {
      source  = "lxc/incus"
      version = "0.1.1"
    }
  }
}

resource "incus_storage_pool" "test" {
  name   = "test"
  driver = "zfs"
  config = {
    "volume.zfs.remove_snapshots" = "true"
  }
}

resource "incus_volume" "test" {
  name         = "test"
  pool         = incus_storage_pool.test.name
  content_type = "block"
  config = {
    "size" = "20GiB"
  }
}
stgraber@castiana:~/demo/test$ 

So yeah, even if created by TF, it still happens.

@adamcstephens
Contributor

adamcstephens commented Mar 20, 2024

Yeah, I suspected it was the case either way. Inherited keys should be ignored completely since they technically come from another resource. Can they be excluded or identified in the API?

> where Incus is acting as a cloud, so the TF user doesn't have any control over the storage pools; they get an empty project and can only create project-tied resources.

If we ever need to represent unmanaged resources, I think the idiomatic solution is to model Data Sources that can still be represented on the TF side.

@stgraber
Member Author

The provider can do GetStoragePool, then look at the Config map for anything that's volume.XYZ. For any such key, XYZ should be treated as inherited on the volume or bucket if its value matches that of the pool's volume.XYZ.
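
A minimal sketch of that lookup, assuming a hypothetical helper name and taking the pool's Config map as returned by GetStoragePool (this is not the provider's actual code):

package storage

import "strings"

// inheritedVolumeDefaults returns the pool's volume.* keys with the
// "volume." prefix stripped, i.e. the defaults Incus copies onto new
// volumes and buckets created in that pool. poolConfig is the Config
// map returned by GetStoragePool; the helper name is illustrative.
func inheritedVolumeDefaults(poolConfig map[string]string) map[string]string {
    defaults := make(map[string]string, len(poolConfig))
    for key, value := range poolConfig {
        if strings.HasPrefix(key, "volume.") {
            defaults[strings.TrimPrefix(key, "volume.")] = value
        }
    }
    return defaults
}

A config key reported on a volume or bucket could then be treated as inherited whenever its value matches the corresponding entry in this map.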

@maveonair
Member

maveonair commented May 14, 2024

@stgraber Regarding the storage pool volume, this PR fixes the problem by making the volume take the inherited configuration value into account: #66

Regarding the creation of storage pools, however, I am not sure whether a correction is really necessary here:

Test Case

  1. Create the storage pool without Terraform, e.g.:
$ incus storage create test zfs volume.zfs.remove_snapshots=true
  2. Define the resource in Terraform and import it:
resource "incus_storage_pool" "test" {
  name    = "test"
  project = "default"
  driver  = "zfs"
}
$ terraform import incus_storage_pool.test default/test
incus_storage_pool.test: Importing from ID "default/test"...
incus_storage_pool.test: Import prepared!
  Prepared incus_storage_pool for import
incus_storage_pool.test: Refreshing state... [name=test]

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.
  3. Run terraform plan again:
$ terraform plan
incus_storage_pool.test: Refreshing state... [name=test]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # incus_storage_pool.test will be updated in-place
  ~ resource "incus_storage_pool" "test" {
      ~ config      = {
          - "volume.zfs.remove_snapshots" = "true" -> null
        }
        name        = "test"
        # (3 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

So when we imported the pool, the configuration was captured correctly, as we can see in the state file:

$ cat terraform.tfstate
{
  "version": 4,
  "terraform_version": "1.8.3",
  "serial": 62,
  "lineage": "863f8d8e-b4f8-b4cc-3bbe-5a247ae7e02e",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "incus_storage_pool",
      "name": "test",
      "provider": "provider[\"registry.terraform.io/lxc/incus\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "config": {
              "volume.zfs.remove_snapshots": "true"
            },
            "description": "",
            "driver": "zfs",
            "name": "test",
            "project": "default",
            "remote": null,
            "target": null
          },
          "sensitive_attributes": []
        }
      ]
    }
  ],
  "check_results": null
}

But when we run terraform plan again, the local configuration differs from the remote state: the resource definition is missing the corresponding configuration entry, so Terraform wants to force the change. To rectify this, you must add the corresponding configuration entry to the resource definition:

resource "incus_storage_pool" "test" {
  name    = "test"
  project = "default"
  driver  = "zfs"
  config = {
    "volume.zfs.remove_snapshots" = "true"
  }
}
$ terraform plan
incus_storage_pool.test: Refreshing state... [name=test]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are
needed.

So we could argue that this behavior is correct if you want to use the incus_storage_pool resource. If you want to ignore this completely, then we should rather introduce a data source for incus_storage_pool, like we did for incus_profile, that allows you to read the state but not change anything. How do you see this?

@stgraber
Member Author

The behavior for the storage pool is fine; the problem is really with storage pool keys propagating to volumes. I don't think just piling up "computed" keys is correct here.

One can create a volume and specifically set, say, block.filesystem=ext4 at creation time.
But someone can similarly create a volume without specifying any block.filesystem; because the pool the volume is created in has volume.block.filesystem set to xfs, the resulting volume will have Incus set block.filesystem=xfs unless another value is specified.

This is true of ALL volume keys, possibly including arbitrary user ones (not tested), so a storage pool having volume.user.foo=abc should result in a volume created on that pool getting user.foo=abc.

So if going with marking keys computed, every single storage volume config key would need to be treated as computed, which doesn't seem right.

@maveonair
Member

Your example with block.filesystem is currently handled as a computed key: https://github.com/lxc/terraform-provider-incus/blob/main/internal/storage/resource_storage_volume.go#L382, which we inherited when we forked the provider. Or am I wrong?

@stgraber
Member Author

Bad example, but we have 20+ other keys which aren't covered by the hardcoded list; as mentioned, it's effectively every single volume config key, which is why treating them all as computed kinda feels wrong.

@stgraber
Member Author

Just looking at one storage driver (ZFS):

  • block.filesystem
  • block.mount_options
  • security.shared
  • security.shifted
  • security.unmapped
  • snapshots.expiry
  • snapshots.pattern
  • snapshots.schedule
  • zfs.blocksize
  • zfs.block_mode
  • zfs.delegate
  • zfs.remove_snapshots
  • zfs.use_refquota
  • zfs.reserve_space

@maveonair
Member

maveonair commented May 15, 2024

I see, and funnily enough we have already implemented many of those example keys in the computed keys for the storage pool, based on the driver:

func (_ StoragePoolModel) ComputedKeys(driver string) []string {
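
For reference, a driver-keyed hardcoded list behind a function like that might look roughly like the sketch below, reusing the ZFS keys listed earlier in this thread; it is not the provider's actual ComputedKeys implementation:

// Sketch only: map a storage driver to the volume config keys the
// provider would mark as computed.
func computedVolumeKeys(driver string) []string {
    common := []string{
        "block.filesystem", "block.mount_options",
        "security.shared", "security.shifted", "security.unmapped",
        "snapshots.expiry", "snapshots.pattern", "snapshots.schedule",
    }

    switch driver {
    case "zfs":
        return append(common,
            "zfs.blocksize", "zfs.block_mode", "zfs.delegate",
            "zfs.remove_snapshots", "zfs.use_refquota", "zfs.reserve_space",
        )
    default:
        return common
    }
}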

So if I understand you correctly, you want to avoid hard-coding these keys and instead first do a lookup against Incus to determine which keys are already set on the pool, and ignore those? Don't you see a risk that we end up in a situation where the state of the storage volume resource can no longer be trusted, because we accept whatever we get from Incus instead of "forcing" the user to be very explicit about their intentions?

@stgraber
Member Author

I don't know how flexible the terraform plugin logic is for that, but ideally we'd make it automatically ignore zfs.use_refquota being set to true if volume.zfs.use_refquota is set to true on the storage pool, as that happens automatically inside Incus and so isn't a change that should get reverted by Terraform.

An example would be:

  1. Pool has volume.zfs.use_refquota=true set, user asks to create a new volume and only specifies size=50GiB. That should create the volume which will then have Incus automatically set zfs.use_refquota=true. Running Terraform again should have it consider everything to be clean.

  2. Pool doesn't have volume.zfs.use_refquota set at all, user asks to create a new volume and only specifies size=50GiB. That should still all be clean. Then the user manually sets zfs.use_refquota=true on the volume. Terraform should pick that up as a manual change that needs to be reverted as the pool doesn't have volume.zfs.use_refquota=true set on it and the user didn't ask for the volume to have zfs.use_refquota set.
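
A sketch of that decision rule, with hypothetical helper and parameter names rather than the actual change in #68: a key reported by Incus is dropped from the volume's state only when the user never declared it and its value matches the pool's corresponding volume.* default; anything else is kept so Terraform still reports drift.

// filterInheritedVolumeConfig drops keys from the config reported by
// Incus for a volume when the user never declared them and their value
// matches the pool's corresponding volume.* default, i.e. they were
// inherited. Keys the user declared, or values that differ from the
// pool default (a manual change on the volume), are kept and show up
// as drift. Hypothetical helper, not the provider's actual code.
func filterInheritedVolumeConfig(observed, declared, poolConfig map[string]string) map[string]string {
    result := make(map[string]string, len(observed))
    for key, value := range observed {
        if _, userSet := declared[key]; userSet {
            result[key] = value // explicitly set by the user, always kept
            continue
        }
        if poolDefault, ok := poolConfig["volume."+key]; ok && poolDefault == value {
            continue // case 1: inherited from the pool, ignore
        }
        result[key] = value // case 2: manual change not backed by the pool, report as drift
    }
    return result
}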

@maveonair
Member

@stgraber Please have a look at #68 and I would be grateful if you could test whether it works as you have described.

@maveonair maveonair self-assigned this May 27, 2024
@maveonair maveonair added the Bug Confirmed to be a bug label May 27, 2024