Expanding PVC Volume Template Results in Data Loss #1385

Open
tman5 opened this issue Apr 2, 2024 · 10 comments
tman5 commented Apr 2, 2024

When trying to expand the PVC volume template, the operator deletes and re-creates the PVC volumes instead of just resizing them. We are using Rook-Ceph as the storage provider and have successfully resized PVCs with it without any delete/re-create. We can also manually edit the PVC itself and it will expand. We are using version 0.22.2 of the operator, and I've reproduced this in multiple clusters.
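For reference, the manual resize that does work looks roughly like this (a minimal sketch; the namespace is a placeholder and the PVC name is one of the operator-created claims shown later in this thread):

# patch the claim directly and watch it expand in place
kubectl -n <namespace> patch pvc default-chi-clickhouse-replicated-0-0-0 \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"60Gi"}}}}'
kubectl -n <namespace> get pvc default-chi-clickhouse-replicated-0-0-0 -w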

We have tried it without the storageManagement options as well, and that just results in a loop where the operator continually tries to delete and re-create the PVCs.

    storageManagement:
      provisioner: Operator
      reclaimPolicy: Retain
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"

metadata:
  name: "clickhouse"

spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: default
      podTemplate: clickhouse:23.7.1.2470-alpine
    storageManagement:
      provisioner: Operator
      reclaimPolicy: Retain

  configuration:
    settings:
         # to allow scrape metrics via embedded prometheus protocol
         prometheus/endpoint: /metrics
         prometheus/port: 8888
         prometheus/metrics: true
         prometheus/events: true
         prometheus/asynchronous_metrics: true
    zookeeper:
      nodes:
      - host: clickhouse-keeper.clickhouse.svc.cluster.local
    users:
      default/networks/ip: "::/0"
      default/password: password
      default/profile: default
      # use cluster Pod CIDR for more security
      backup/networks/ip: 0.0.0.0/0
      # PASSWORD=backup_password; echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
      backup/password_sha256_hex: eb94c11d77f46a0290ba8c4fca1a7fd315b72e1e6c83146e42117c568cc3ea4d
    clusters:
      - name: replicated
        layout:
          shardsCount: 1
          replicasCount: 3
    files:
      config.xml: |
          <?xml version="1.0"?>
          <yandex>
            <remote_servers>
                <!-- Test only shard config for testing distributed storage -->
                <ch_cluster>
                    <shard>
                        <internal_replication>True</internal_replication>
                          <replica>
                              <host>chi-clickhouse-replicated-0-0</host>
                              <port>9000</port>
                              <secure>0</secure>
                          </replica>
                          <replica>
                              <host>chi-clickhouse-replicated-0-1</host>
                              <port>9000</port>
                              <secure>0</secure>
                          </replica>
                          <replica>
                              <host>chi-clickhouse-replicated-0-2</host>
                              <port>9000</port>
                              <secure>0</secure>
                          </replica>
                    </shard>
                </ch_cluster>
            </remote_servers>


            <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
                By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
                Values for substitutions are specified in /clickhouse/name_of_substitution elements in that file.
              -->

            <!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
                Optional. If you don't use replicated tables, you could omit that.

                See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/
              -->

            <zookeeper>
                <node>
                    <host>clickhouse-keeper.clickhouse.svc.cluster.local</host>
                    <port>2181</port>
                    <secure>0</secure>
                </node>
            </zookeeper>
            <!--
              OpenTelemetry log contains OpenTelemetry trace spans.
            -->
            <opentelemetry_span_log>
              <!--
                  The default table creation code is insufficient, this <engine> spec
                  is a workaround. There is no 'event_time' for this log, but two times,
                  start and finish. It is sorted by finish time, to avoid inserting
                  data too far away in the past (probably we can sometimes insert a span
                  that is seconds earlier than the last span in the table, due to a race
                  between several spans inserted in parallel). This gives the spans a
                  global order that we can use to e.g. retry insertion into some external
                  system.
              -->
              <engine>
                  engine MergeTree
                  partition by toYYYYMM(finish_date)
                  order by (finish_date, finish_time_us, trace_id)
              </engine>
              <database>system</database>
              <table>opentelemetry_span_log</table>
              <flush_interval_milliseconds>7500</flush_interval_milliseconds>
            </opentelemetry_span_log>
          </yandex>
  templates:
    volumeClaimTemplates:
      - name: default
        spec:
          accessModes:
            - ReadWriteOnce
          reclaimPolicy: Retain
          resources:
            requests:
              storage: 55Gi
    podTemplates:
      - name: clickhouse:23.7.1.2470-alpine
        metadata:
          annotations:
              prometheus.io/scrape: 'true'
              prometheus.io/port: '8888'
              prometheus.io/path: '/metrics'
              # need separate prometheus scrape config, look to https://github.com/prometheus/prometheus/issues/3756
              clickhouse.backup/scrape: 'true'
              clickhouse.backup/port: '7171'
              clickhouse.backup/path: '/metrics'
        spec:
          containers:
            - name: clickhouse-pod
              image: clickhouse-server:23.7.1.2470-alpine
            - name: clickhouse-backup
              image: clickhouse-backup:latest
              imagePullPolicy: Always
              command:
                - bash
                - -xc
                - "/bin/clickhouse-backup server"
              env:
                - name: CLICKHOUSE_PASSWORD
                  value: password
                - name: LOG_LEVEL
                  value: "debug"
                - name: ALLOW_EMPTY_BACKUPS
                  value: "true"
                - name: API_LISTEN
                  value: "0.0.0.0:7171"
                # INSERT INTO system.backup_actions to execute backup
                - name: API_CREATE_INTEGRATION_TABLES
                  value: "true"
                - name: BACKUPS_TO_KEEP_REMOTE
                  value: "3"
                # change it for production S3
                - name: REMOTE_STORAGE
                  value: "s3"
                - name: S3_ACL
                  value: "private"
                - name: S3_ENDPOINT
                  value: https://minio
                - name: S3_BUCKET
                  value: clickhouse-backups
                # {shard} macro defined by clickhouse-operator
                - name: S3_PATH
                  value: backup/shard-{shard}
                - name: S3_ACCESS_KEY
                  value: clickhouse_backups_rw
                - name: S3_DISABLE_CERT_VERIFICATION
                  value: "true"
                - name: S3_SECRET_KEY
                  value: password
                - name: S3_FORCE_PATH_STYLE
                  value: "true"
              ports:
                - name: backup-rest
                  containerPort: 7171
hodgesrm (Member) commented Apr 2, 2024

Thanks. Would it be possible to attach the operator log as a file to this case? I would like to see if there is an issue with operator reconciliation. If you can access rook logs, please attach those as well.
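If it helps, one way to capture that log is something like the following, assuming the operator runs as a Deployment installed by the Helm chart (the namespace and deployment name are placeholders):

kubectl -n <operator-namespace> logs deployment/<clickhouse-operator-deployment> --since=1h > operator.log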

tman5 (Author) commented Apr 2, 2024

alex-zaitsev (Member) commented:

@tman5, could you show your storage classes?

kubectl get storageclasses -o wide

It would also be useful to see one of the PVCs created by the operator.
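For example, something along these lines should dump the operator-created claims, assuming the labels the operator applies (they are visible on the PVC posted below) and a placeholder namespace:

kubectl -n <namespace> get pvc -l clickhouse.altinity.com/chi=clickhouse -o yaml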

tman5 (Author) commented Apr 2, 2024

NAME                          PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-bucket                   rook-ceph.ceph.rook.io/bucket   Delete          Immediate           false                  208d
ceph-filesystem               rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   208d
rook-ceph-block (default)     rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   208d
sc-smb-mssql-database-repos   smb.csi.k8s.io                  Retain          Immediate           false                  182d
sc-smb-mssql-deploy-scripts   smb.csi.k8s.io                  Retain          Immediate           false                  182d
sc-smb-mssql-wss              smb.csi.k8s.io                  Retain          Immediate           false                  182d

This is one of the PVCs that ends up perpetually stuck in a Terminating state:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
  creationTimestamp: "2024-04-02T12:05:28Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-04-02T12:05:34Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    argocd.argoproj.io/instance: featbit-clickhouse-dev2
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: clickhouse
    clickhouse.altinity.com/cluster: replicated
    clickhouse.altinity.com/namespace: clark-developer-featbit
    clickhouse.altinity.com/object-version: 241ccf05924775f258c440aecb86eecc549bb3ce
    clickhouse.altinity.com/reclaimPolicy: Retain
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
  name: default-chi-clickhouse-replicated-0-0-0
  namespace: clark-developer-featbit
  resourceVersion: "298826497"
  uid: f9ea50da-82a6-47b9-9231-8a53022d5d03
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 60Gi
  storageClassName: rook-ceph-block
  volumeMode: Filesystem
  volumeName: pvc-f9ea50da-82a6-47b9-9231-8a53022d5d03
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 60Gi
  phase: Bound

Slach (Collaborator) commented Apr 2, 2024

E0402 12:08:00.875175       1 creator.go:175] updatePersistentVolumeClaim():clark-developer-featbit/default-chi-clickhouse-replicated-0-1-0:unable to Update PVC err: Operation cannot be fulfilled on persistentvolumeclaims "default-chi-clickhouse-replicated-0-1-0": the object has been modified; please apply your changes to the latest version and try again
E0402 12:08:00.875219       1 worker-chi-reconciler.go:1000] reconcilePVCFromVolumeMount():ERROR unable to reconcile PVC(clark-developer-featbit/default-chi-clickhouse-replicated-0-1-0) err: Operation cannot be fulfilled on persistentvolumeclaims "default-chi-clickhouse-replicated-0-1-0": the object has been modified; please apply your changes to the latest version and try again

This error means that something else, such as ArgoCD, modified the PVC while the operator was updating it.

Could you try deploying the CHI without ArgoCD and then try the resize again?
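A minimal way to test that outside of Argo CD could look like this (the file name and namespace are placeholders; the manifest is the CHI posted above):

# apply the CHI directly, bypassing Argo CD
kubectl -n <namespace> apply -f clickhouse-chi.yaml
# bump spec.templates.volumeClaimTemplates[0].spec.resources.requests.storage (e.g. 55Gi -> 60Gi), then re-apply
kubectl -n <namespace> apply -f clickhouse-chi.yaml
# watch whether the PVCs resize in place or get deleted/re-created
kubectl -n <namespace> get pvc -w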

tman5 (Author) commented Apr 2, 2024

Is there a way to make it work with Argo CD?

alex-zaitsev (Member) commented:

Those errors by themselves cannot lead to PVC deletion. I wonder if it was actually ArgoCD that deleted it?

hodgesrm (Member) commented Apr 4, 2024

@tman5 Assuming you are using Argo CD, can you describe how you have configured CI/CD and exactly what steps you follow to change the volume size? It seems possible that multiple actors are trying to manage the CHI resources, or at least the underlying volumes.

P.S. Argo CD is normally fine with changes to storage size; I've done it many times on AWS EBS volumes.

tman5 (Author) commented Apr 5, 2024

This is my Argo CD config:

---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: clickhouse
  namespace: argo-cd
spec:
  destination:
    namespace: clickhouse
    server: https://kube-server
  project: dev
  source:
    path: ./overlays/dev1/clickhouse
    repoURL: https://repo.local
    targetRevision: master
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m0s
      limit: 2
    syncOptions:
    - CreateNamespace=true
    - PruneLast=true
    - PrunePropagationPolicy=foreground
    - ServerSideApply=true
    - --sync-hook-timeout=60s
    - --sync-wait=60s

It points to a repo that has a kustomize file:

---
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
  - ../../../base/clickhouse-keeper/
  - ../clickhouse-operator/
  - manifest.yml
  - clickhouse-backup-rw-password.yml

namespace: clickhouse
...

The manifest file is what I posted above. I edit the PVC size in that manifest, commit it to the repo, and then let Argo do its thing.
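Roughly, a resize change looks like this (the sed command and commit message below are illustrative, not the exact commands used):

# in a checkout of the overlay repo
sed -i 's/storage: 55Gi/storage: 60Gi/' overlays/dev1/clickhouse/manifest.yml
git commit -am "Expand ClickHouse data PVCs to 60Gi"
git push
# Argo CD automated sync (ServerSideApply=true) then applies the updated CHI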

tman5 (Author) commented Apr 5, 2024

In the clickhouse-operator directory, this is the kustomize file:

---
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

helmCharts:
  - name: altinity-clickhouse-operator
    releaseName: clickhouse-operator
    namespace: clickhouse
    repo: https://docs.altinity.com/clickhouse-operator/
    version: 0.22.2
    valuesInline:
      configs:
        configdFiles:
          01-clickhouse-02-logger.xml: |
            <!-- IMPORTANT -->
            <!-- This file is auto-generated -->
            <!-- Do not edit this file - all changes would be lost -->
            <!-- Edit appropriate template in the following folder: -->
            <!-- deploy/builder/templates-config -->
            <!-- IMPORTANT -->
            <yandex>
                <logger>
                    <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
                    <level>warning</level>
                    <log>/var/log/clickhouse-server/clickhouse-server.log</log>
                    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
                    <size>1000M</size>
                    <count>10</count>
                    <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
                    <console>1</console>
                </logger>
            </yandex>


...
