
ClickHouse Operator leaves orphan S3 files when scaling down replicas that use S3-backed MergeTree #1388

Open
hodgesrm opened this issue Apr 7, 2024 · 2 comments

Comments


hodgesrm commented Apr 7, 2024

When scaling down replicas, clickhouse-operator does not ensure that the S3 files backing MergeTree tables are fully deleted. This leaves orphan files in the S3 bucket. This behavior was tested with ClickHouse 24.3.2.3 and clickhouse-operator 0.23.3.

Here's how to reproduce in general, followed by a detailed scenario.

  1. Create a ClickHouse cluster with two replicas (replicaCount=2) with a storage policy that allows data to be stored on S3.
  2. Run DDL to create a replicated table that uses S3 storage (a sketch of the kind of DDL involved follows this list).
  3. Add data to the table.
  4. Confirm that data is stored in S3.
  5. Change the replicaCount to 1 and update the CHI resource definition.
  6. Drop the replicated table on the remaining replica.
  7. Check data in the S3 bucket. You will see orphan files.
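For step 2, the DDL has roughly the following shape. This is a sketch only: the exact statements are in sql-11-create-s3-tables.sql in the repository linked below, and the column list, the storage policy name (s3_main), and the ZooKeeper path used here are assumptions.

-- Sketch only; see sql-11-create-s3-tables.sql for the real DDL.
-- 's3_main' must match a storage policy that places data on S3.
CREATE TABLE test_s3_direct_local ON CLUSTER '{cluster}'
(
    id UInt64,
    payload String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_s3_direct_local', '{replica}')
ORDER BY id
SETTINGS storage_policy = 's3_main';

The sample project also layers a distributed table over this local table, which is what the inserts below go through.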

To reproduce in detail use the examples in https://github.com/Altinity/clickhouse-sql-examples/tree/main/using-s3-and-clickhouse. Here is a detailed script.

# Grab sample code. 
git clone https://github.com/Altinity/clickhouse-sql-examples
cd clickhouse-sql-examples/using-s3-and-clickhouse
# Generate S3 credentials in a secret. (See script header for instructions.)
./generate-s3-secret.sh
# Create the cluster. 
kubectl apply -f demo2-s3-01.yaml
# Wait for both pods to come up, then run the following commands. 
./port-forward-2.sh
alias cc-batch='clickhouse-client -m -n --verbose -t --echo -f Pretty'
cc-batch < sql-11-create-s3-tables.sql
cc-batch < sql-12-insert-data.sql
cc-batch < sql-03-statistics.sql
# Check the data in S3 using a command like the following. Note the number of objects.
# Run this command until the number of S3 files stops growing; the sample inserts via a distributed table.
# (A SQL check from the ClickHouse side follows this script.)
# In my sample runs I end up with 3392 files and 4.3 GiB of data stored in S3.
aws s3 ls --recursive --human-readable --summarize s3://<bucket>/clickhouse/mergetree/
# Scale down the replicaCount from 2 to 1 and apply. 
kubectl edit chi demo2
# Check the data in S3 again. It should not have changed.  
aws s3 ls --recursive --human-readable --summarize s3://<bucket>/clickhouse/mergetree/
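To confirm from the ClickHouse side that the active parts really sit on the S3 disk, a query along these lines works. The disk names reported come from the storage policy, so they will be whatever the demo configuration defines.

-- Show how many active parts of the test table sit on each disk and their total size.
SELECT disk_name, count() AS parts, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND table = 'test_s3_direct_local'
GROUP BY disk_name;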

You can now prove that S3 files are orphaned and see which ones they are. One way is as follows.

  1. On the remaining ClickHouse server, run TRUNCATE TABLE test_s3_direct_local;
  2. Check the S3 files. About half of them remain. In my sample runs there were 1707 files and 2.1 GiB of data remaining. A query to pin down exactly which objects are orphaned follows this list.
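One way to identify the orphans precisely is to ask the remaining replica which remote objects it still references and diff that list against the aws s3 ls output; anything in the bucket but missing from the query result is orphaned. A sketch, assuming system.remote_data_paths is available in your release and that the S3 disk is named 's3_disk' (check system.disks for the real name):

-- Remote objects the remaining replica still knows about on the S3 disk.
-- Objects present in the bucket but absent from this list are orphans.
SELECT remote_path
FROM system.remote_data_paths
WHERE disk_name = 's3_disk'
ORDER BY remote_path;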

hodgesrm commented Apr 7, 2024

It appears that one workaround for this problem is to drop tables explicitly before decommissioning the replica. For example, you can log in to the departing replica and issue the following command:

DROP TABLE test_s3_direct_local SYNC

It's unclear whether SYNC is strictly required, since it is not covered in the official docs, but the Altinity KB indicates that it drops table data synchronously. In any case, when I run this command before scaling down, the S3 files are properly removed. A sketch of generalizing this to every local table follows.
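For replicas with more than one table, one option is to generate the DROP statements from system.tables on the replica that is about to be removed and then execute them before applying the scaled-down CHI. This is a sketch only; the MergeTree-family match and the system-database filter are assumptions about what should be dropped.

-- Run on the departing replica. Produces one DROP ... SYNC statement per
-- MergeTree-family table outside the system databases; execute the generated
-- statements (e.g. by pasting them back into clickhouse-client).
SELECT concat('DROP TABLE ', database, '.', name, ' SYNC;') AS stmt
FROM system.tables
WHERE engine LIKE '%MergeTree%'
  AND database NOT IN ('system', 'information_schema', 'INFORMATION_SCHEMA');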


hodgesrm commented Apr 7, 2024

Final notes:

  1. The reproduction described above did not use zero-copy replication.
  2. This issue also extends to ordinary (non-replicated) MergeTree tables. It appears the operator only deletes ReplicatedMergeTree tables, replicated databases, views, and dictionaries when removing a replica. See https://github.com/Altinity/clickhouse-operator/blob/master/pkg/model/chi/schemer/sql.go#L31 for details.
