
Is taking data backups of PVCs with Velero possible? #1355

Open
manishtradebyte opened this issue Feb 29, 2024 · 16 comments

@manishtradebyte

I have a question:
Will a data backup taken through Velero be enough to restore from?
Do I need to take special care to maintain consistency of the data and avoid corruption?

@Slach
Collaborator

Slach commented Feb 29, 2024

We don't have enough experience with Velero.

Try applying the following configs in the CHI; note that they could affect performance:

spec:
  configuration:
    files:
      users.d/fsync_metadata.xml: |-
        <clickhouse>
          <profiles><default><fsync_metadata>1</fsync_metadata></default></profiles>
        </clickhouse>
      config.d/merge_tree_fsync.xml: |-
        <clickhouse>
          <merge_tree>
            <fsync_after_insert>1</fsync_after_insert>
            <fsync_part_directory>1</fsync_part_directory>
            <min_compressed_bytes_to_fsync_after_fetch>1</min_compressed_bytes_to_fsync_after_fetch>
            <min_compressed_bytes_to_fsync_after_merge>1</min_compressed_bytes_to_fsync_after_merge>
            <min_rows_to_fsync_after_merge>1</min_rows_to_fsync_after_merge>
          </merge_tree>
        </clickhouse>
      users.d/distributed_fsync.xml: |-
        <clickhouse>
          <profiles><default>
            <fsync_after_insert>1</fsync_after_insert>
            <fsync_directories>1</fsync_directories>
          </default></profiles>
        </clickhouse>

If you succeed, could you notify us and share your Velero manifest?

@manishtradebyte
Author

I was able to back up and restore using a basic Velero configuration.
But I am not able to find a way to quiesce the database during backup.
I need this in order to take consistent backups that are not affected by write operations happening while the backup runs.
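
For reference, a minimal sketch of what such a basic Velero backup could look like; the names, namespace, and TTL below are placeholders, and volume snapshots are assumed to be configured for the cluster:

# Hypothetical example: namespace-scoped Velero Backup that snapshots the PVCs.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: clickhouse-pvc-backup        # placeholder backup name
  namespace: velero                  # namespace where Velero is installed
spec:
  includedNamespaces:
    - clickhouse-backup              # namespace with the CHI and ZooKeeper
  snapshotVolumes: true              # take volume snapshots of the PVCs
  ttl: 720h0m0s                      # keep the backup for 30 days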

@Slach
Collaborator

Slach commented Mar 3, 2024

@manishtradebyte

But I am not able to find a way to quiesce the database during backup.

You could try

SYSTEM STOP MERGES
SYSTEM STOP REPLICATION FETCHES

and detach all engine=Kafka, engine=Nats, and engine=RabbitMQ tables before the backup,
then attach them back and run

SYSTEM START MERGES
SYSTEM START REPLICATION FETCHES

when the backup completes,
but this would have some side effects, such as replication lag.
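
If you go the Velero route, one way to run such statements around the backup could be Velero's pod backup hook annotations. A rough sketch, assuming the ClickHouse container is named clickhouse-pod, that clickhouse-client can connect as the default user without a password, and writing the fetch statement as SYSTEM STOP/START FETCHES (check the exact form against your ClickHouse version):

# Annotations to set on the ClickHouse pod (e.g. via the CHI podTemplate metadata).
metadata:
  annotations:
    pre.hook.backup.velero.io/container: clickhouse-pod
    pre.hook.backup.velero.io/command: '["/bin/sh", "-c", "clickhouse-client -q \"SYSTEM STOP MERGES\" && clickhouse-client -q \"SYSTEM STOP FETCHES\""]'
    post.hook.backup.velero.io/container: clickhouse-pod
    post.hook.backup.velero.io/command: '["/bin/sh", "-c", "clickhouse-client -q \"SYSTEM START MERGES\" && clickhouse-client -q \"SYSTEM START FETCHES\""]'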

@manishtradebyte
Author

manishtradebyte commented Mar 4, 2024

What do you mean by detaching the engine?

I don't use any of these table engines (engine=Kafka, engine=Nats, or engine=RabbitMQ).
Do I need to detach ReplicatedMergeTree and Distributed tables?

Also, is it

SYSTEM START REPLICATION FETCHES
or
SYSTEM START FETCHES

@Slach
Collaborator

Slach commented Mar 5, 2024

I mean executing DETACH TABLE / ATTACH TABLE db.kafka_table to stop background consumption from Kafka, NATS, and RabbitMQ.

I don't know of anything like SYSTEM STOP MESSAGING BROKER.

@manishtradebyte
Author

I tried to take a backup of the PVCs using Velero for

  1. clickhouse
  2. zookeeper

Then I deleted everything, restored the PVCs from the backup, and deployed ZooKeeper and ClickHouse (CHI) again.

Everything seems to work fine, but when I drop a database from the restored cluster, the replica paths of its tables do not seem to get deleted from ZooKeeper. This leads to an error when I try to recreate the same table again.

@Slach
Collaborator

Slach commented Mar 22, 2024

@manishtradebyte did you use DROP DATABASE db SYNC?

@manishtradebyte
Author

manishtradebyte commented Mar 22, 2024

No. When should I run this? After restoring the backup?

@Slach
Collaborator

Slach commented Mar 22, 2024

How exactly did you "delete everything"?

@manishtradebyte
Author

Basically, I deleted the cluster and its PVCs, and removed the ZooKeeper deployment and its PVCs.

@Slach
Collaborator

Slach commented Mar 22, 2024

@manishtradebyte
Thereafter, did you restore the PVCs + ZooKeeper manifests + ClickHouse manifests with Velero, or just restore the PVCs with Velero and re-deploy the manifests manually?

@manishtradebyte
Author

I just restored the PVCs using Velero, for both ZK and CHI.

Then I deployed the manifests manually.
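
For reference, a rough sketch of what such a PVC-only restore could look like (the backup and restore names below are placeholders):

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: clickhouse-pvc-restore       # placeholder restore name
  namespace: velero
spec:
  backupName: clickhouse-pvc-backup  # the Velero backup to restore from
  includedNamespaces:
    - clickhouse-backup
  includedResources:                 # restore only the volumes, not the workloads
    - persistentvolumeclaims
    - persistentvolumes
  restorePVs: true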

@Slach
Collaborator

Slach commented Mar 22, 2024

In this case, clickhouse-operator will try to restore the schema during restoration, but it is weird that you receive "replica path already exists", because /var/lib/clickhouse/metadata should be mounted from the PVC.

Could you share your ClickHouse pod's generated manifest in YAML format?

kubectl get pod -n <your-ns> pod-name-0-0-0 -o yaml

@manishtradebyte
Author

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-03-22T10:11:30Z"
  generateName: chi-clickhouse-cluster_name-0-0-
  labels:
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: clickhouse
    clickhouse.altinity.com/cluster: cluster_name
    clickhouse.altinity.com/namespace: clickhouse-backup
    clickhouse.altinity.com/ready: "yes"
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
    controller-revision-hash: chi-clickhouse-cluster_name-0-0-55dfd6875
    statefulset.kubernetes.io/pod-name: chi-clickhouse-cluster_name-0-0-0
  name: chi-clickhouse-cluster_name-0-0-0
  namespace: clickhouse-backup
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: chi-clickhouse-cluster_name-0-0
    uid: 3c1d2074-6241-44c7-b3f2-db7b8e5e5bd1
  resourceVersion: "170085224"
  uid: 23d437b1-cc7a-4aa8-8cd5-5e6b6984a1fa
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - eu-central-1a
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            clickhouse.altinity.com/app: chop
            clickhouse.altinity.com/chi: clickhouse
            clickhouse.altinity.com/namespace: clickhouse-backup
        topologyKey: kubernetes.io/hostname
  containers:
  - image: clickhouse/clickhouse-server:24.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 10
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: clickhouse-pod
    ports:
    - containerPort: 9000
      name: tcp
      protocol: TCP
    - containerPort: 8123
      name: http
      protocol: TCP
    - containerPort: 9009
      name: interserver
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "2"
        memory: 6Gi
      requests:
        cpu: "1"
        memory: 4Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/clickhouse-server/config.d/
      name: chi-clickhouse-common-configd
    - mountPath: /etc/clickhouse-server/users.d/
      name: chi-clickhouse-common-usersd
    - mountPath: /etc/clickhouse-server/conf.d/
      name: chi-clickhouse-deploy-confd-cluster_name-0-0
    - mountPath: /var/lib/clickhouse
      name: default
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-c6bp4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostAliases:
  - hostnames:
    - chi-clickhouse-cluster_name-0-0
    ip: 127.0.0.1
  hostname: chi-clickhouse-cluster_name-0-0-0
  nodeName: ip-10-64-195-208.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: chi-clickhouse-cluster_name-0-0
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default
    persistentVolumeClaim:
      claimName: default-chi-clickhouse-cluster_name-0-0-0
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-common-configd
    name: chi-clickhouse-common-configd
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-common-usersd
    name: chi-clickhouse-common-usersd
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-deploy-confd-cluster_name-0-0
    name: chi-clickhouse-deploy-confd-cluster_name-0-0
  - name: kube-api-access-c6bp4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:12:33Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:13:05Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:13:05Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:12:33Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://a992fe1658c93fb0972d3577b613bc1a3cc324008d5fd291726ce0383f25fb0f
    image: docker.io/clickhouse/clickhouse-server:24.1
    imageID: docker.io/clickhouse/clickhouse-server@sha256:7029f00d469e0d5d32f6c2dd3c5fd9110344b5902b4401c05da705a321e3fc86
    lastState: {}
    name: clickhouse-pod
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-03-22T10:12:53Z"
  hostIP: 10.64.195.208
  phase: Running
  podIP: 10.64.195.158
  podIPs:
  - ip: 10.64.195.158
  qosClass: Burstable
  startTime: "2024-03-22T10:12:33Z"

@Slach
Collaborator

Slach commented Mar 22, 2024

- mountPath: /var/lib/clickhouse
  name: default

Since /var/lib/clickhouse is mounted from the PVC, the existing tables under /var/lib/clickhouse/metadata/ should be attached when the manifests are applied.

Could you share the output of

kubectl describe chi -n clickhouse-backup clickhouse

Where is your operator installed?

kubectl get deployment --all-namespaces | grep clickhouse-operator

@manishtradebyte
Author

The operator is installed in the same namespace.

I tried to apply the restore again and it seems to work. The tables are created, and when I drop them I can recreate them as well.

You can close this issue if you want.

Also, it would be great if Altinity/clickhouse-backup#860 gets resolved. I am only using Velero because clickhouse-backup doesn't work.

Thanks a lot
