
/backup/status should return latest command status if no one in progress commands executed #887

frankwg opened this issue Apr 8, 2024 · 9 comments

frankwg commented Apr 8, 2024

I used a locally built 2.5.0 for testing and found that the /backup/status endpoint returns empty after /backup/actions with {"command":"create_remote <backup_name>"} or /backup/upload/<local_backup_name> is issued. However, it returns correctly when the previous request is /backup/list or /backup/clean.

Note: use_embedded_backup_restore: true was used. Also, the upload to S3 was not successful.
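
A minimal sketch of the request sequence (assuming the default API port 7171; <backup_name> is a placeholder):

curl -s localhost:7171/backup/actions -X POST -d '{"command":"create_remote <backup_name>"}'
# wait for the asynchronous command to finish, then:
curl -s localhost:7171/backup/status | jq .
# expected: the status of the last command; actual: empty output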

Slach commented Apr 8, 2024

According to
https://github.com/Altinity/clickhouse-backup/tree/master/ReadMe.md

GET /backup/status
Display list of currently running asynchronous operations: curl -s localhost:7171/backup/status | jq .

When it returns an empty list, it means no operation is currently running.

Check GET /backup/actions for the status and list of all commands that have run via the API since the clickhouse-backup server started.
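
For example (output shape is illustrative; the field names match the system.backup_actions columns that appear later in this thread):

curl -s localhost:7171/backup/status | jq .
# empty when no operation is in progress

curl -s localhost:7171/backup/actions | jq .
# one JSON object per command executed since server start, e.g.:
# {"command":"create_remote <backup_name>","status":"success","start":"...","finish":"...","error":""}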

Slach closed this as completed on Apr 8, 2024

Slach commented Apr 8, 2024

Also, the upload to S3 was not successful.

Do you have logs?
How did you start the clickhouse-backup server? Is it standalone, Docker, or Kubernetes?

Slach reopened this on Apr 8, 2024

frankwg commented Apr 8, 2024

I was using clickhouse-operator and a MinIO deployment from the README.md:

apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-backup-config
stringData:
  config.yml: |
    general:
      remote_storage: s3
      log_level: debug
      restore_schema_on_cluster: "{cluster}"
      allow_empty_backups: true
      backups_to_keep_remote: 3
    clickhouse:
      use_embedded_backup_restore: true
      embedded_backup_disk: backups
      timeout: 4h
      skip_table_engines:
        - GenerateRandom
    api:
      listen: "0.0.0.0:7171"
      create_integration_tables: true
    s3:
      acl: private
      endpoint: http://s3-backup-minio:9000
      bucket: clickhouse
      path: backup/shard-{shard}
      access_key: backup-access-key
      secret_key: backup-secret-key
      force_path_style: true
      disable_ssl: true
      debug: true

---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: one-sidecar-embedded
spec:
  defaults:
    templates:
      podTemplate: clickhouse-backup
      dataVolumeClaimTemplate: data-volume
  configuration:
    profiles:
      default/distributed_ddl_task_timeout: 14400
    files:
      config.d/backup_disk.xml: |
        <clickhouse>
          <storage_configuration>
            <disks>
              <backups>
                <type>local</type>
                <path>/var/lib/clickhouse/backups/</path>
              </backups>
            </disks>
          </storage_configuration>
          <backups>
            <allowed_disk>backups</allowed_disk>
            <allowed_path>backups/</allowed_path>
          </backups>
        </clickhouse>     
    settings:
      # to allow scrape metrics via embedded prometheus protocol
      prometheus/endpoint: /metrics
      prometheus/port: 8888
      prometheus/metrics: true
      prometheus/events: true
      prometheus/asynchronous_metrics: true
      # need to install zookeeper separately; see https://github.com/Altinity/clickhouse-operator/tree/master/deploy/zookeeper/ for details
    zookeeper:
      nodes:
        - host: zookeeper
          port: 2181
      session_timeout_ms: 5000
      operation_timeout_ms: 5000
    clusters:
      - name: default
        layout:
          # 2 shards one replica in each
          shardsCount: 2
          replicas:
            - templates:
                podTemplate: pod-with-backup
            - templates:
                podTemplate: pod-clickhouse-only
  templates:
    volumeClaimTemplates:
      - name: data-volume
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
    podTemplates:
      - name: pod-with-backup
        metadata:
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '8888'
            prometheus.io/path: '/metrics'
            # need separate prometheus scrape config, look to https://github.com/prometheus/prometheus/issues/3756
            clickhouse.backup/scrape: 'true'
            clickhouse.backup/port: '7171'
            clickhouse.backup/path: '/metrics'
        spec:
          securityContext:
            runAsUser: 101
            runAsGroup: 101
            fsGroup: 101
          containers:
            - name: clickhouse-pod
              image: clickhouse/clickhouse-server
              command:
                - clickhouse-server
                - --config-file=/etc/clickhouse-server/config.xml
            - name: clickhouse-backup
              image: clickhouse-backup:build-docker
              # image: altinity/clickhouse-backup:master
              imagePullPolicy: IfNotPresent
              command:
                # - bash
                # - -xc
                # - "/bin/clickhouse-backup server"
                - "/src/build/linux/amd64/clickhouse-backup"
                - "server"
                # required to avoid double scraping of the clickhouse and clickhouse-backup containers
              ports:
                - name: backup-rest
                  containerPort: 7171
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/clickhouse-backup/config.yml
                  subPath: config.yml
          volumes:
            - name: config-volume
              secret:
                secretName: clickhouse-backup-config
      - name: pod-clickhouse-only
        metadata:
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '8888'
            prometheus.io/path: '/metrics'
        spec:
          securityContext:
            runAsUser: 101
            runAsGroup: 101
            fsGroup: 101
          containers:
            - name: clickhouse-pod
              image: clickhouse/clickhouse-server
              command:
                - clickhouse-server
                - --config-file=/etc/clickhouse-server/config.xml
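
Both manifests can be applied as a single file; a sketch (the filename is hypothetical; the namespace matches the one used below):

kubectl apply -n backup -f one-sidecar-embedded.yaml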


frankwg commented Apr 8, 2024

Logs from clickhouse-backup:

2024/04/08 08:13:32.460629 debug calculate parts list `default`.`join` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024.04.08 08:13:32.456847 [ 10 ] {699cffe9-4cd2-4465-9483-8c090e67dd38} <Debug> executeQuery: (from 127.0.0.1:53702) SELECT sum(total_bytes) AS backup_data_size FROM system.tables WHERE concat(database,'.',name) IN ('default.join', 'default.table_for_dict', 'default.set', 'default.merge', 'default.memory', 'default.stripelog', 'default.log', 'default.tinylog', 'default.buffer', 'default.ndict', 'default.null', 'default.dict', 'default.distributed', 'default.generate_random') (stage: Complete)
2024/04/08 08:13:32.461204 debug /var/lib/clickhouse/backups/shard0/metadata/default/join.json created logger=backuper
2024/04/08 08:13:32.461226 debug calculate parts list `default`.`table_for_dict` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.461462 debug /var/lib/clickhouse/backups/shard0/metadata/default/table_for_dict.json created logger=backuper
2024/04/08 08:13:32.461482 debug calculate parts list `default`.`set` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.461629 debug /var/lib/clickhouse/backups/shard0/metadata/default/set.json created logger=backuper
2024/04/08 08:13:32.461639 debug calculate parts list `default`.`merge` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.461781 debug /var/lib/clickhouse/backups/shard0/metadata/default/merge.json created logger=backuper
2024/04/08 08:13:32.461798 debug calculate parts list `default`.`memory` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.461953 debug /var/lib/clickhouse/backups/shard0/metadata/default/memory.json created logger=backuper
2024/04/08 08:13:32.461968 debug calculate parts list `default`.`stripelog` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462110 debug /var/lib/clickhouse/backups/shard0/metadata/default/stripelog.json created logger=backuper
2024/04/08 08:13:32.462132 debug calculate parts list `default`.`log` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462393 debug /var/lib/clickhouse/backups/shard0/metadata/default/log.json created logger=backuper
2024/04/08 08:13:32.462411 debug calculate parts list `default`.`tinylog` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462560 debug /var/lib/clickhouse/backups/shard0/metadata/default/tinylog.json created logger=backuper
2024/04/08 08:13:32.462577 debug calculate parts list `default`.`buffer` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462777 debug /var/lib/clickhouse/backups/shard0/metadata/default/buffer.json created logger=backuper
2024/04/08 08:13:32.462795 debug calculate parts list `default`.`ndict` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462952 debug /var/lib/clickhouse/backups/shard0/metadata/default/ndict.json created logger=backuper
2024/04/08 08:13:32.462969 debug calculate parts list `default`.`null` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.463084 debug /var/lib/clickhouse/backups/shard0/metadata/default/null.json created logger=backuper
2024/04/08 08:13:32.463103 debug calculate parts list `default`.`dict` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.463209 debug /var/lib/clickhouse/backups/shard0/metadata/default/dict.json created logger=backuper
2024/04/08 08:13:32.463225 debug calculate parts list `default`.`distributed` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.463332 debug /var/lib/clickhouse/backups/shard0/metadata/default/distributed.json created logger=backuper
2024/04/08 08:13:32.463347 debug calculate parts list `default`.`generate_random` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.463453 debug /var/lib/clickhouse/backups/shard0/metadata/default/generate_random.json created logger=backuper
2024/04/08 08:13:32.463478  info SELECT value FROM `system`.`build_options` WHERE name='VERSION_DESCRIBE' logger=clickhouse
2024/04/08 08:13:32.465587 debug /var/lib/clickhouse/backups/shard0/metadata.json created logger=backuper
2024/04/08 08:13:32.465614  info done                      backup=shard0 duration=183ms logger=backuper operation=create_embedded
2024/04/08 08:13:32.465666  info clickhouse connection closed logger=clickhouse
2024/04/08 08:13:32.465752  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:32.466832  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:32.466868 error `general->remote_storage: s3` `clickhouse->use_embedded_backup_restore: true` require s3->compression_format: none, actual tar logger=validateUploadParams

Slach commented Apr 8, 2024

OK, the configuration looks correct.
Could you share the logs with the upload failures?

kubectl logs -n <your-namespace> pod/chi-one-sidecar-embedded-default-0-0-0 -c clickhouse-backup --since=24h

One suggestion:

        <disks>
          <backups>
            <type>local</type>
            <path>/var/lib/clickhouse/backups/</path>
          </backups>
        </disks>

makes sense only for standalone hardware servers where /var/lib/clickhouse/backups/ is mounted as a separate HDD, for example.

In Kubernetes it is better to use an s3-type disk; see the examples in
https://github.com/Altinity/clickhouse-backup/blob/master/test/integration/config-s3-embedded.yml#L23-L32
and
https://github.com/Altinity/clickhouse-backup/blob/master/test/integration/dynamic_settings.sh#L214-L238
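
A rough sketch of such an s3-type backup disk in config.d/backup_disk.xml, reusing the MinIO endpoint and credentials from the Secret above (the disk name and the path inside the bucket are illustrative, not taken from the linked examples):

<clickhouse>
  <storage_configuration>
    <disks>
      <backups_s3>
        <!-- object-storage disk instead of a local path -->
        <type>s3_plain</type>
        <endpoint>http://s3-backup-minio:9000/clickhouse/backups_embedded/</endpoint>
        <access_key_id>backup-access-key</access_key_id>
        <secret_access_key>backup-secret-key</secret_access_key>
      </backups_s3>
    </disks>
  </storage_configuration>
  <backups>
    <allowed_disk>backups_s3</allowed_disk>
  </backups>
</clickhouse>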

frankwg commented Apr 8, 2024

kubectl logs -n backup pod/chi-one-sidecar-embedded-default-0-0-0 -c clickhouse-backup --since=24h
2024/04/08 08:13:33.799590  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.803256  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.803397  info Create integration tables logger=server
2024/04/08 08:13:33.803439  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.804733  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.804776  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/04/08 08:13:33.810063  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/04/08 08:13:33.816762  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/04/08 08:13:33.823185  info SELECT engine FROM system.databases WHERE name = 'system' logger=clickhouse
2024/04/08 08:13:33.827989  info DROP TABLE IF EXISTS `system`.`backup_actions` NO DELAY logger=clickhouse
2024/04/08 08:13:33.829256  info CREATE TABLE system.backup_actions (command String, start DateTime, finish DateTime, status String, error String) ENGINE=URL('http://127.0.0.1:7171/backup/actions', JSONEachRow) SETTINGS input_format_skip_unknown_fields=1 logger=clickhouse
2024/04/08 08:13:33.836864  info SELECT engine FROM system.databases WHERE name = 'system' logger=clickhouse
2024/04/08 08:13:33.841018  info DROP TABLE IF EXISTS `system`.`backup_list` NO DELAY logger=clickhouse
2024/04/08 08:13:33.842460  info CREATE TABLE system.backup_list (name String, created DateTime, size Int64, location String, required String, desc String) ENGINE=URL('http://127.0.0.1:7171/backup/list', JSONEachRow) SETTINGS input_format_skip_unknown_fields=1 logger=clickhouse
2024/04/08 08:13:33.849179  info clickhouse connection closed logger=clickhouse
2024/04/08 08:13:33.849778  info Starting API server on 0.0.0.0:7171 logger=server.Run
2024/04/08 08:13:33.852641  info Update backup metrics start (onlyLocal=false) logger=server
2024/04/08 08:13:33.852713  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.852800  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.854251  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.854283  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/04/08 08:13:33.854455  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.854483  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/04/08 08:13:33.856983  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/04/08 08:13:33.857263  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/04/08 08:13:33.869224  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/04/08 08:13:33.878475 error ResumeOperationsAfterRestart return error: open /var/lib/clickhouse/backup: no such file or directory logger=server.Run
2024/04/08 08:13:33.880908  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/04/08 08:13:33.886506  info clickhouse connection closed logger=clickhouse
2024/04/08 08:13:33.886537  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.889603  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.889652  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/04/08 08:13:33.893202  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/04/08 08:13:33.895191  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/04/08 08:13:33.898985  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/04/08 08:13:33.901912  info [s3:DEBUG] Request
GET /clickhouse?versioning= HTTP/1.1
Host: s3-backup-minio:9000
User-Agent: aws-sdk-go-v2/1.26.1 os/linux lang/go#1.22.1 md/GOOS#linux md/GOARCH#amd64 api/s3#1.53.1
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: f4e5a6fd-30db-45f1-ad62-5da57bdcf3ed
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=backup-access-key/20240408/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=ae5b666f1e1b560bc36e259126e21e8a28555fdd75d99dcd234b753eb902eff9
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240408T081333Z


2024/04/08 08:13:33.904818  info [s3:DEBUG] Response
HTTP/1.1 200 OK
Content-Length: 99
Accept-Ranges: bytes
Content-Type: application/xml
Date: Mon, 08 Apr 2024 08:13:33 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
X-Amz-Request-Id: 17C43FE5A7DC2AD8
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block


2024/04/08 08:13:33.905007 debug /tmp/.clickhouse-backup-metadata.cache.S3 not found, load 0 elements logger=s3
2024/04/08 08:13:33.905839  info [s3:DEBUG] Request
GET /clickhouse?delimiter=%2F&list-type=2&max-keys=1000&prefix=backup%2Fshard-0%2F HTTP/1.1
Host: s3-backup-minio:9000
User-Agent: aws-sdk-go-v2/1.26.1 os/linux lang/go#1.22.1 md/GOOS#linux md/GOARCH#amd64 api/s3#1.53.1
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 1225429f-7ccc-4a8c-aad4-32d681253c34
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=backup-access-key/20240408/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=418ee0c64337f86817a6c3eb53ca41834c2e47f36bd8823fa4a598fdbd242913
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240408T081333Z


2024/04/08 08:13:33.907610  info [s3:DEBUG] Response
HTTP/1.1 200 OK
Content-Length: 280
Accept-Ranges: bytes
Content-Type: application/xml
Date: Mon, 08 Apr 2024 08:13:33 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
X-Amz-Request-Id: 17C43FE5A800167C
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block


2024/04/08 08:13:33.908370 debug /tmp/.clickhouse-backup-metadata.cache.S3 save 0 elements logger=s3
2024/04/08 08:13:33.908473  info clickhouse connection closed logger=clickhouse
2024/04/08 08:13:33.908509  info Update backup metrics finish LastBackupCreateLocal=2024-04-08 08:13:32.463475027 +0000 UTC LastBackupCreateRemote=<nil> LastBackupSizeLocal=13636 LastBackupSizeRemote=0 LastBackupUpload=<nil> NumberBackupsLocal=1 NumberBackupsRemote=0 duration=56ms logger=server

Slach commented Apr 8, 2024

The root cause is in the logs:

error `general->remote_storage: s3` `clickhouse->use_embedded_backup_restore: true` require s3->compression_format: none, actual tar logger=validateUploadParams

Just add

s3:
  compression_format: none

to your secret.
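
Merged into the Secret above, the s3 section would then read:

s3:
  acl: private
  endpoint: http://s3-backup-minio:9000
  bucket: clickhouse
  path: backup/shard-{shard}
  access_key: backup-access-key
  secret_key: backup-secret-key
  force_path_style: true
  disable_ssl: true
  debug: true
  compression_format: none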

Slach closed this as completed on Apr 8, 2024
@frankwg
Copy link
Author

frankwg commented Apr 9, 2024

Would you mind adding the following instead of an empty string as the response?

{
  "command": "create_remote <backup_name>",
  "status": "error",
  "start": "2024-03-26 08:15:42",
  "finish": "2024-03-26 08:17:12",
  "error": "`general->remote_storage: s3` `clickhouse->use_embedded_backup_restore: true` require s3->compression_format: none"
}
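
For what it's worth, since create_integration_tables: true is enabled, the latest command status is also reachable from SQL through the system.backup_actions URL table that the server creates at startup (see the CREATE TABLE in the logs above); a sketch:

-- latest executed command and its outcome
SELECT command, status, start, finish, error
FROM system.backup_actions
ORDER BY start DESC
LIMIT 1;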

Slach reopened this on Apr 9, 2024
Slach changed the title from "/backup/status returns empty while previous requests are /backup/actions with {"command", "create_remote <backup_name>"} or /backup/upload" to "/backup/status should return latest command status if no one in progress commands executed" on Apr 9, 2024
Slach self-assigned this on Apr 9, 2024
Slach added this to the 2.6.0 milestone on Apr 9, 2024

Slach commented Apr 9, 2024

@frankwg good suggestion, thanks
