
Is taking data backups of PVCs with Velero possible? #1355

Open
manishtradebyte opened this issue Feb 29, 2024 · 16 comments

@manishtradebyte

I have a question:
Will a data backup taken through Velero be enough to restore from?
Do I need to take special care to maintain consistency of the data and avoid corruption?

@Slach
Collaborator

Slach commented Feb 29, 2024

We don't have enough experience with Velero.

Try applying the following configs in the CHI; note that they could affect performance:

spec:
  configuration:
    files:
      users.d/fsync_metadata.xml: |-
        <clickhouse>
          <profiles><default><fsync_metadata>1</fsync_metadata></default></profiles>
        </clickhouse>
      config.d/merge_tree_fsync.xml: |-
        <clickhouse>
          <merge_tree>
            <fsync_after_insert>1</fsync_after_insert>
            <fsync_part_directory>1</fsync_part_directory>
            <min_compressed_bytes_to_fsync_after_fetch>1</min_compressed_bytes_to_fsync_after_fetch>
            <min_compressed_bytes_to_fsync_after_merge>1</min_compressed_bytes_to_fsync_after_merge>
            <min_rows_to_fsync_after_merge>1</min_rows_to_fsync_after_merge>
          </merge_tree>
        </clickhouse>
      users.d/distributed_fsync.xml: |-
        <clickhouse>
          <profiles><default>
            <fsync_after_insert>1</fsync_after_insert>
            <fsync_directories>1</fsync_directories>
          </default></profiles>
        </clickhouse>

If you succeed, could you notify us and share your Velero manifest?

@manishtradebyte
Author

I was able to back up and restore using a basic Velero configuration.
But I am not able to find a way to quiesce the database during backup.
I need this in order to take consistent backups that are not affected by write operations happening while the backup runs.
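
For reference, a minimal sketch of what such a basic Velero backup could look like; the names, namespace, and TTL below are placeholders, and volume snapshots are assumed to be configured for the cluster:

# Hypothetical example: namespace-scoped Velero Backup that snapshots the PVCs.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: clickhouse-pvc-backup        # placeholder backup name
  namespace: velero                  # namespace where Velero is installed
spec:
  includedNamespaces:
    - clickhouse-backup              # namespace with the CHI and ZooKeeper
  snapshotVolumes: true              # take volume snapshots of the PVCs
  ttl: 720h0m0s                      # keep the backup for 30 days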

@Slach
Collaborator

Slach commented Mar 3, 2024

@manishtradebyte

But I am not able to find a way to quiesce the database during backup.

You could try

SYSTEM STOP MERGES
SYSTEM STOP REPLICATION FETCHES

and detach all engine=Kafka, engine=Nats, and engine=RabbitMQ tables before the backup,
then attach them back and run

SYSTEM START MERGES
SYSTEM START REPLICATION FETCHES

when the backup completes,
but this would have some side effects, such as replication lag.
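
If you go the Velero route, one way to run such statements around the backup could be Velero's pod backup hook annotations. A rough sketch, assuming the ClickHouse container is named clickhouse-pod, that clickhouse-client can connect as the default user without a password, and writing the fetch statement as SYSTEM STOP/START FETCHES (check the exact form against your ClickHouse version):

# Annotations to set on the ClickHouse pod (e.g. via the CHI podTemplate metadata).
metadata:
  annotations:
    pre.hook.backup.velero.io/container: clickhouse-pod
    pre.hook.backup.velero.io/command: '["/bin/sh", "-c", "clickhouse-client -q \"SYSTEM STOP MERGES\" && clickhouse-client -q \"SYSTEM STOP FETCHES\""]'
    post.hook.backup.velero.io/container: clickhouse-pod
    post.hook.backup.velero.io/command: '["/bin/sh", "-c", "clickhouse-client -q \"SYSTEM START MERGES\" && clickhouse-client -q \"SYSTEM START FETCHES\""]'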

@manishtradebyte
Author

manishtradebyte commented Mar 4, 2024

What do you mean by detaching the engine?

I don't use any of these table engines (engine=Kafka, engine=Nats, or engine=RabbitMQ).
Do I need to detach ReplicatedMergeTree and Distributed tables?

Also, is it

SYSTEM START REPLICATION FETCHES
or
SYSTEM START FETCHES

@Slach
Collaborator

Slach commented Mar 5, 2024

I mean executing DETACH TABLE / ATTACH TABLE db.kafka_table to stop background consumption from Kafka, NATS, and RabbitMQ.

I don't know of anything like SYSTEM STOP MESSAGING BROKER.

@manishtradebyte
Author

I tried to take a backup of the PVCs using Velero for

  1. clickhouse
  2. zookeeper

Then I deleted everything, restored the PVCs from the backup, and deployed ZooKeeper and ClickHouse (CHI) again.

Everything seems to work fine, but when I drop a database from the restored cluster, the replica paths of its tables do not seem to get deleted from ZooKeeper. This leads to an error when I try to recreate the same table again.

@Slach
Collaborator

Slach commented Mar 22, 2024

@manishtradebyte did you use DROP DATABASE db SYNC?

@manishtradebyte
Author

manishtradebyte commented Mar 22, 2024

No. When should I run this? After restoring the backup?

@Slach
Collaborator

Slach commented Mar 22, 2024

How exactly did you "delete everything"?

@manishtradebyte
Author

Basically, I deleted the cluster and its PVCs, and removed the ZooKeeper deployment and its PVCs.

@Slach
Collaborator

Slach commented Mar 22, 2024

@manishtradebyte
Thereafter, did you restore the PVCs + ZooKeeper manifests + ClickHouse manifests with Velero, or just restore the PVCs with Velero and re-deploy the manifests manually?

@manishtradebyte
Author

I just restored the PVCs using Velero, for both ZK and CHI.

Then I deployed the manifests manually.
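
For reference, a rough sketch of what such a PVC-only restore could look like (the backup and restore names below are placeholders):

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: clickhouse-pvc-restore       # placeholder restore name
  namespace: velero
spec:
  backupName: clickhouse-pvc-backup  # the Velero backup to restore from
  includedNamespaces:
    - clickhouse-backup
  includedResources:                 # restore only the volumes, not the workloads
    - persistentvolumeclaims
    - persistentvolumes
  restorePVs: true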

@Slach
Collaborator

Slach commented Mar 22, 2024

In this case, clickhouse-operator will try to restore the schema during restoration, but it is weird that you receive "replica path already exists", because /var/lib/clickhouse/metadata should be mounted from the PVC.

Could you share your ClickHouse pod's generated manifest in YAML format?

kubectl get pod -n <your-ns> pod-name-0-0-0 -o yaml

@manishtradebyte
Author

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-03-22T10:11:30Z"
  generateName: chi-clickhouse-cluster_name-0-0-
  labels:
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: clickhouse
    clickhouse.altinity.com/cluster: cluster_name
    clickhouse.altinity.com/namespace: clickhouse-backup
    clickhouse.altinity.com/ready: "yes"
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
    controller-revision-hash: chi-clickhouse-cluster_name-0-0-55dfd6875
    statefulset.kubernetes.io/pod-name: chi-clickhouse-cluster_name-0-0-0
  name: chi-clickhouse-cluster_name-0-0-0
  namespace: clickhouse-backup
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: chi-clickhouse-cluster_name-0-0
    uid: 3c1d2074-6241-44c7-b3f2-db7b8e5e5bd1
  resourceVersion: "170085224"
  uid: 23d437b1-cc7a-4aa8-8cd5-5e6b6984a1fa
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - eu-central-1a
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            clickhouse.altinity.com/app: chop
            clickhouse.altinity.com/chi: clickhouse
            clickhouse.altinity.com/namespace: clickhouse-backup
        topologyKey: kubernetes.io/hostname
  containers:
  - image: clickhouse/clickhouse-server:24.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 10
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: clickhouse-pod
    ports:
    - containerPort: 9000
      name: tcp
      protocol: TCP
    - containerPort: 8123
      name: http
      protocol: TCP
    - containerPort: 9009
      name: interserver
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "2"
        memory: 6Gi
      requests:
        cpu: "1"
        memory: 4Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/clickhouse-server/config.d/
      name: chi-clickhouse-common-configd
    - mountPath: /etc/clickhouse-server/users.d/
      name: chi-clickhouse-common-usersd
    - mountPath: /etc/clickhouse-server/conf.d/
      name: chi-clickhouse-deploy-confd-cluster_name-0-0
    - mountPath: /var/lib/clickhouse
      name: default
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-c6bp4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostAliases:
  - hostnames:
    - chi-clickhouse-cluster_name-0-0
    ip: 127.0.0.1
  hostname: chi-clickhouse-cluster_name-0-0-0
  nodeName: ip-10-64-195-208.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: chi-clickhouse-cluster_name-0-0
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default
    persistentVolumeClaim:
      claimName: default-chi-clickhouse-cluster_name-0-0-0
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-common-configd
    name: chi-clickhouse-common-configd
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-common-usersd
    name: chi-clickhouse-common-usersd
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-deploy-confd-cluster_name-0-0
    name: chi-clickhouse-deploy-confd-cluster_name-0-0
  - name: kube-api-access-c6bp4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:12:33Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:13:05Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:13:05Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:12:33Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://a992fe1658c93fb0972d3577b613bc1a3cc324008d5fd291726ce0383f25fb0f
    image: docker.io/clickhouse/clickhouse-server:24.1
    imageID: docker.io/clickhouse/clickhouse-server@sha256:7029f00d469e0d5d32f6c2dd3c5fd9110344b5902b4401c05da705a321e3fc86
    lastState: {}
    name: clickhouse-pod
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-03-22T10:12:53Z"
  hostIP: 10.64.195.208
  phase: Running
  podIP: 10.64.195.158
  podIPs:
  - ip: 10.64.195.158
  qosClass: Burstable
  startTime: "2024-03-22T10:12:33Z"

@Slach
Collaborator

Slach commented Mar 22, 2024

- mountPath: /var/lib/clickhouse
  name: default

Since /var/lib/clickhouse is mounted from the PVC, the existing tables under /var/lib/clickhouse/metadata/ should be attached when the manifests are applied.

Could you share the output of

kubectl describe chi -n clickhouse-backup clickhouse

Where is your operator installed?

kubectl get deployment --all-namespaces | grep clickhouse-operator

@manishtradebyte
Author

The operator is installed in the same namespace.

I tried to apply the restore again and it seems to work. The tables are created, and when I drop them I can recreate them as well.

You can close this issue if you want.

Also, it would be great if Altinity/clickhouse-backup#860 gets resolved. I am only using Velero because clickhouse-backup doesn't work.

Thanks a lot
