Expanding PVC Volume Template Results in Data Loss #1385

Open
tman5 opened this issue Apr 2, 2024 · 10 comments
tman5 commented Apr 2, 2024

When trying to expand the PVC volume template, the operator deletes and re-creates the PVC volumes instead of just resizing them. We are using Rook-Ceph as the storage provider and have successfully resized PVCs with it without any delete/re-create. We can also manually edit the PVC itself and it will expand. We are using version 0.22.2 of the operator, and I've reproduced this in multiple clusters.
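For reference, the manual resize that does work looks roughly like this (a minimal sketch; the namespace is a placeholder and the PVC name is one of the operator-created claims shown later in this thread):

# patch the claim directly and watch it expand in place
kubectl -n <namespace> patch pvc default-chi-clickhouse-replicated-0-0-0 \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"60Gi"}}}}'
kubectl -n <namespace> get pvc default-chi-clickhouse-replicated-0-0-0 -w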

We have tried it without the storageManagement options as well, and that just results in a loop where the operator continually tries to delete and re-create the PVCs.

    storageManagement:
      provisioner: Operator
      reclaimPolicy: Retain
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"

metadata:
  name: "clickhouse"

spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: default
      podTemplate: clickhouse:23.7.1.2470-alpine
    storageManagement:
      provisioner: Operator
      reclaimPolicy: Retain

  configuration:
    settings:
         # to allow scrape metrics via embedded prometheus protocol
         prometheus/endpoint: /metrics
         prometheus/port: 8888
         prometheus/metrics: true
         prometheus/events: true
         prometheus/asynchronous_metrics: true
    zookeeper:
      nodes:
      - host: clickhouse-keeper.clickhouse.svc.cluster.local
    users:
      default/networks/ip: "::/0"
      default/password: password
      default/profile: default
      # use cluster Pod CIDR for more security
      backup/networks/ip: 0.0.0.0/0
      # PASSWORD=backup_password; echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
      backup/password_sha256_hex: eb94c11d77f46a0290ba8c4fca1a7fd315b72e1e6c83146e42117c568cc3ea4d
    clusters:
      - name: replicated
        layout:
          shardsCount: 1
          replicasCount: 3
    files:
      config.xml: |
          <?xml version="1.0"?>
          <yandex>
            <remote_servers>
                <!-- Test only shard config for testing distributed storage -->
                <ch_cluster>
                    <shard>
                        <internal_replication>True</internal_replication>
                          <replica>
                              <host>chi-clickhouse-replicated-0-0</host>
                              <port>9000</port>
                              <secure>0</secure>
                          </replica>
                          <replica>
                              <host>chi-clickhouse-replicated-0-1</host>
                              <port>9000</port>
                              <secure>0</secure>
                          </replica>
                          <replica>
                              <host>chi-clickhouse-replicated-0-2</host>
                              <port>9000</port>
                              <secure>0</secure>
                          </replica>
                    </shard>
                </ch_cluster>
            </remote_servers>


            <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
                By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
                Values for substitutions are specified in /clickhouse/name_of_substitution elements in that file.
              -->

            <!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
                Optional. If you don't use replicated tables, you could omit that.

                See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/
              -->

            <zookeeper>
                <node>
                    <host>clickhouse-keeper.clickhouse.svc.cluster.local</host>
                    <port>2181</port>
                    <secure>0</secure>
                </node>
            </zookeeper>
            <!--
              OpenTelemetry log contains OpenTelemetry trace spans.
            -->
            <opentelemetry_span_log>
              <!--
                  The default table creation code is insufficient, this <engine> spec
                  is a workaround. There is no 'event_time' for this log, but two times,
                  start and finish. It is sorted by finish time, to avoid inserting
                  data too far away in the past (probably we can sometimes insert a span
                  that is seconds earlier than the last span in the table, due to a race
                  between several spans inserted in parallel). This gives the spans a
                  global order that we can use to e.g. retry insertion into some external
                  system.
              -->
              <engine>
                  engine MergeTree
                  partition by toYYYYMM(finish_date)
                  order by (finish_date, finish_time_us, trace_id)
              </engine>
              <database>system</database>
              <table>opentelemetry_span_log</table>
              <flush_interval_milliseconds>7500</flush_interval_milliseconds>
            </opentelemetry_span_log>
          </yandex>
  templates:
    volumeClaimTemplates:
      - name: default
        spec:
          accessModes:
            - ReadWriteOnce
          reclaimPolicy: Retain
          resources:
            requests:
              storage: 55Gi
    podTemplates:
      - name: clickhouse:23.7.1.2470-alpine
        metadata:
          annotations:
              prometheus.io/scrape: 'true'
              prometheus.io/port: '8888'
              prometheus.io/path: '/metrics'
              # need separate prometheus scrape config, look to https://github.com/prometheus/prometheus/issues/3756
              clickhouse.backup/scrape: 'true'
              clickhouse.backup/port: '7171'
              clickhouse.backup/path: '/metrics'
        spec:
          containers:
            - name: clickhouse-pod
              image: clickhouse-server:23.7.1.2470-alpine
            - name: clickhouse-backup
              image: clickhouse-backup:latest
              imagePullPolicy: Always
              command:
                - bash
                - -xc
                - "/bin/clickhouse-backup server"
              env:
                - name: CLICKHOUSE_PASSWORD
                  value: password
                - name: LOG_LEVEL
                  value: "debug"
                - name: ALLOW_EMPTY_BACKUPS
                  value: "true"
                - name: API_LISTEN
                  value: "0.0.0.0:7171"
                # INSERT INTO system.backup_actions to execute backup
                - name: API_CREATE_INTEGRATION_TABLES
                  value: "true"
                - name: BACKUPS_TO_KEEP_REMOTE
                  value: "3"
                # change it for production S3
                - name: REMOTE_STORAGE
                  value: "s3"
                - name: S3_ACL
                  value: "private"
                - name: S3_ENDPOINT
                  value: https://minio
                - name: S3_BUCKET
                  value: clickhouse-backups
                # {shard} macro defined by clickhouse-operator
                - name: S3_PATH
                  value: backup/shard-{shard}
                - name: S3_ACCESS_KEY
                  value: clickhouse_backups_rw
                - name: S3_DISABLE_CERT_VERIFICATION
                  value: "true"
                - name: S3_SECRET_KEY
                  value: password
                - name: S3_FORCE_PATH_STYLE
                  value: "true"
              ports:
                - name: backup-rest
                  containerPort: 7171
hodgesrm (Member) commented Apr 2, 2024

Thanks. Would it be possible to attach the operator log as a file to this case? I would like to see if there is an issue with operator reconciliation. If you can access rook logs, please attach those as well.
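If it helps, one way to capture that log is something like the following, assuming the operator runs as a Deployment installed by the Helm chart (the namespace and deployment name are placeholders):

kubectl -n <operator-namespace> logs deployment/<clickhouse-operator-deployment> --since=1h > operator.log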

tman5 (Author) commented Apr 2, 2024

alex-zaitsev (Member) commented:

@tman5, could you show your storage classes?

kubectl get storageclasses -o wide

It would also be useful to see one of the PVCs created by the operator.
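For example, something along these lines should dump the operator-created claims, assuming the labels the operator applies (they are visible on the PVC posted below) and a placeholder namespace:

kubectl -n <namespace> get pvc -l clickhouse.altinity.com/chi=clickhouse -o yaml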

tman5 (Author) commented Apr 2, 2024

NAME                          PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-bucket                   rook-ceph.ceph.rook.io/bucket   Delete          Immediate           false                  208d
ceph-filesystem               rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   208d
rook-ceph-block (default)     rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   208d
sc-smb-mssql-database-repos   smb.csi.k8s.io                  Retain          Immediate           false                  182d
sc-smb-mssql-deploy-scripts   smb.csi.k8s.io                  Retain          Immediate           false                  182d
sc-smb-mssql-wss              smb.csi.k8s.io                  Retain          Immediate           false                  182d

This is one of the PVCs that ends up perpetually stuck in a Terminating state:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
  creationTimestamp: "2024-04-02T12:05:28Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-04-02T12:05:34Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    argocd.argoproj.io/instance: featbit-clickhouse-dev2
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: clickhouse
    clickhouse.altinity.com/cluster: replicated
    clickhouse.altinity.com/namespace: clark-developer-featbit
    clickhouse.altinity.com/object-version: 241ccf05924775f258c440aecb86eecc549bb3ce
    clickhouse.altinity.com/reclaimPolicy: Retain
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
  name: default-chi-clickhouse-replicated-0-0-0
  namespace: clark-developer-featbit
  resourceVersion: "298826497"
  uid: f9ea50da-82a6-47b9-9231-8a53022d5d03
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 60Gi
  storageClassName: rook-ceph-block
  volumeMode: Filesystem
  volumeName: pvc-f9ea50da-82a6-47b9-9231-8a53022d5d03
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 60Gi
  phase: Bound

Slach (Collaborator) commented Apr 2, 2024

E0402 12:08:00.875175       1 creator.go:175] updatePersistentVolumeClaim():clark-developer-featbit/default-chi-clickhouse-replicated-0-1-0:unable to Update PVC err: Operation cannot be fulfilled on persistentvolumeclaims "default-chi-clickhouse-replicated-0-1-0": the object has been modified; please apply your changes to the latest version and try again
E0402 12:08:00.875219       1 worker-chi-reconciler.go:1000] reconcilePVCFromVolumeMount():ERROR unable to reconcile PVC(clark-developer-featbit/default-chi-clickhouse-replicated-0-1-0) err: Operation cannot be fulfilled on persistentvolumeclaims "default-chi-clickhouse-replicated-0-1-0": the object has been modified; please apply your changes to the latest version and try again

This error means that something else, such as ArgoCD, modified the PVC while the operator was updating it.

Could you try deploying the CHI without ArgoCD and then try the resize again?
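A minimal way to test that outside of Argo CD could look like this (the file name and namespace are placeholders; the manifest is the CHI posted above):

# apply the CHI directly, bypassing Argo CD
kubectl -n <namespace> apply -f clickhouse-chi.yaml
# bump spec.templates.volumeClaimTemplates[0].spec.resources.requests.storage (e.g. 55Gi -> 60Gi), then re-apply
kubectl -n <namespace> apply -f clickhouse-chi.yaml
# watch whether the PVCs resize in place or get deleted/re-created
kubectl -n <namespace> get pvc -w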

tman5 (Author) commented Apr 2, 2024

Is there a way to make it work with Argo CD?

alex-zaitsev (Member) commented:

Those errors by themselves cannot lead to PVC deletion. I wonder if it was actually ArgoCD that deleted it?

hodgesrm (Member) commented Apr 4, 2024

@tman5 Assuming you are using Argo CD, can you describe how you have configured CI/CD and exactly what steps you follow to change the volume size? It seems possible that multiple actors are trying to manage the CHI resources, or at least the underlying volumes.

P.S. Argo CD is normally fine with changes to storage size; I've done it many times on AWS EBS volumes.

tman5 (Author) commented Apr 5, 2024

This is my Argo CD config:

---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: clickhouse
  namespace: argo-cd
spec:
  destination:
    namespace: clickhouse
    server: https://kube-server
  project: dev
  source:
    path: ./overlays/dev1/clickhouse
    repoURL: https://repo.local
    targetRevision: master
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m0s
      limit: 2
    syncOptions:
    - CreateNamespace=true
    - PruneLast=true
    - PrunePropagationPolicy=foreground
    - ServerSideApply=true
    - --sync-hook-timeout=60s
    - --sync-wait=60s

It points to a repo that has a kustomize file:

---
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
  - ../../../base/clickhouse-keeper/
  - ../clickhouse-operator/
  - manifest.yml
  - clickhouse-backup-rw-password.yml

namespace: clickhouse
...

The manifest file is what I posted above. I edit the PVC size in that manifest, commit it to the repo, and then let Argo do its thing.
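Roughly, a resize change looks like this (the sed command and commit message below are illustrative, not the exact commands used):

# in a checkout of the overlay repo
sed -i 's/storage: 55Gi/storage: 60Gi/' overlays/dev1/clickhouse/manifest.yml
git commit -am "Expand ClickHouse data PVCs to 60Gi"
git push
# Argo CD automated sync (ServerSideApply=true) then applies the updated CHI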

tman5 (Author) commented Apr 5, 2024

In the clickhouse-operator directory, this is the kustomize file:

---
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

helmCharts:
  - name: altinity-clickhouse-operator
    releaseName: clickhouse-operator
    namespace: clickhouse
    repo: https://docs.altinity.com/clickhouse-operator/
    version: 0.22.2
    valuesInline:
      configs:
        configdFiles:
          01-clickhouse-02-logger.xml: |
            <!-- IMPORTANT -->
            <!-- This file is auto-generated -->
            <!-- Do not edit this file - all changes would be lost -->
            <!-- Edit appropriate template in the following folder: -->
            <!-- deploy/builder/templates-config -->
            <!-- IMPORTANT -->
            <yandex>
                <logger>
                    <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
                    <level>warning</level>
                    <log>/var/log/clickhouse-server/clickhouse-server.log</log>
                    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
                    <size>1000M</size>
                    <count>10</count>
                    <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
                    <console>1</console>
                </logger>
            </yandex>


...
