[BUG] Redis Cluster: post-provision pods end in Error after horizontally scaling out shards following a benchmark #7203

Open
JashBook opened this issue Apr 29, 2024 · 1 comment
Labels: kind/bug (Something isn't working), Stale

JashBook commented Apr 29, 2024

Describe the bug
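The failure reproduces intermittently on minikube: after running a benchmark and scaling the Redis Cluster out from 3 to 4 shards, the kb-post-provision-job pods end in Error.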

To Reproduce
Steps to reproduce the behavior:

  1. Create the cluster:
kubectl apply -f -<<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: redisc-ozklak
  namespace: default
spec:
  terminationPolicy: DoNotTerminate
  shardingSpecs:
    - name: shard
      shards: 3
      template:
        name: shard-cxk
        componentDef: redis-cluster
        replicas: 1
        switchPolicy:
          type: Noop
        resources:
          limits:
            cpu: 100m
            memory: 0.5Gi
          requests:
            cpu: 100m
            memory: 0.5Gi
        volumeClaimTemplates:
          - name: data
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
EOF
  2. Run the redis benchmark:
kubectl apply -f -<<EOF
apiVersion: v1
kind: Pod
metadata:
  name: benchtest-redisc-ozklak
  namespace: default
spec:
  containers:
    - name: test-benchmark
      imagePullPolicy: IfNotPresent
      image: docker.io/apecloud/redis-benchmark:latest
      args:
        - "-h"
        - "redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless.default.svc"
        - "-p"
        - "6379"
        - "-a"
        - "O3605v7HsS"
        - "-n"
        - "1000"
        - "-c"
        - "2"
        - "--cluster"
        - "-q"
  restartPolicy: Never
EOF
  3. Horizontally scale out the shards to 4:
kubectl patch cluster redisc-ozklak --namespace default --type json \
  -p '[{"op": "replace", "path": "/spec/shardingSpecs/0/shards", "value": 4}]'
  4. See the error:
kubectl get pod 
NAME                                                  READY   STATUS    RESTARTS   AGE
kb-post-provision-job-redisc-ozklak-shard-twh-pjjc8   0/1     Error     0          10m
kb-post-provision-job-redisc-ozklak-shard-twh-tj5bg   0/1     Error     0          9m57s
kb-post-provision-job-redisc-ozklak-shard-twh-vnnds   0/1     Error     0          9m40s
kb-post-provision-job-redisc-ozklak-shard-vvn-bjwm5   0/1     Error     0          9m42s
kb-post-provision-job-redisc-ozklak-shard-vvn-jgp6n   0/1     Error     0          10m
kb-post-provision-job-redisc-ozklak-shard-vvn-p5cdn   0/1     Error     0          10m
redisc-ozklak-shard-27h-0                             3/3     Running   0          8m18s
redisc-ozklak-shard-6s8-0                             3/3     Running   0          10m
redisc-ozklak-shard-twh-0                             3/3     Running   0          10m
redisc-ozklak-shard-vvn-0                             3/3     Running   0          10m

kubectl get cluster
NAME            CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS    AGE
redisc-ozklak 
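
To gather logs from every failed job pod at once, the `kubectl get pod` table above can be filtered on its STATUS column. This is a minimal sketch that assumes the default table layout (NAME, READY, STATUS, ...). `failed_pods` is a hypothetical helper name, not part of kbcli or KubeBlocks.

```shell
#!/usr/bin/env bash
# failed_pods: read `kubectl get pod` table output on stdin and print the
# names of pods whose STATUS column (third field) is exactly "Error".
# The header row (first line) is skipped.
failed_pods() {
  awk 'NR > 1 && $3 == "Error" { print $1 }'
}

# Hypothetical usage against a live cluster (assumes kubectl is configured):
#   kubectl get pod -n default | failed_pods | while read -r pod; do
#     kubectl logs -n default "$pod" > "${pod}.log"
#   done
```

Filtering the text table keeps the sketch dependency-free. Against a live cluster, `kubectl get pod --field-selector=status.phase=Failed -o name` is an alternative that avoids parsing columns.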

Logs from one of the failed pods:

kubectl logs kb-post-provision-job-redisc-ozklak-shard-twh-pjjc8
+ declare -gA initialize_redis_cluster_primary_nodes
+ declare -gA initialize_redis_cluster_secondary_nodes
+ declare -gA initialize_pod_name_to_advertise_host_port_map
+ declare -gA scale_out_shard_default_primary_node
+ declare -gA scale_out_shard_default_other_nodes
+ '[' 1 -eq 1 ']'
+ case $1 in
+ initialize_or_scale_out_redis_cluster
+ wait_random_second 10 1
+ local max_time=10
+ local min_time=1
+ local random_time=10
+ echo 'Sleeping for 10 seconds'
+ sleep 10
Sleeping for 10 seconds
+ is_redis_cluster_initialized
+ '[' -z 10.244.2.26,10.244.2.27,10.244.2.25 ']'
+ local initialized=false
++ echo 10.244.2.26,10.244.2.27,10.244.2.25
++ tr , ' '
+ for pod_ip in $(echo "$KB_CLUSTER_POD_IP_LIST" | tr ',' ' ')
++ redis-cli -h 10.244.2.26 -a O3605v7HsS cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_info cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
total_cluster_links_buffer_limit_exceeded:0
+ cluster_info='cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
+ echo 'cluster_info cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
++ echo 'cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
++ grep -oP '(?<=cluster_state:)[^\s]+'
+ cluster_state=fail
+ '[' -z fail ']'
+ '[' fail == ok ']'
+ for pod_ip in $(echo "$KB_CLUSTER_POD_IP_LIST" | tr ',' ' ')
++ redis-cli -h 10.244.2.27 -a O3605v7HsS cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
+ cluster_info='cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
+ echo 'cluster_info cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
cluster_info cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
total_cluster_links_buffer_limit_exceeded:0
++ echo 'cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
++ grep -oP '(?<=cluster_state:)[^\s]+'
+ cluster_state=fail
+ '[' -z fail ']'
+ '[' fail == ok ']'
+ for pod_ip in $(echo "$KB_CLUSTER_POD_IP_LIST" | tr ',' ' ')
++ redis-cli -h 10.244.2.25 -a O3605v7HsS cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
+ cluster_info='cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
+ echo 'cluster_info cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
cluster_info cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
total_cluster_links_buffer_limit_exceeded:0
++ echo 'cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:1
cluster_size:0
cluster_current_epoch:0
cluster_my_epoch:0
cluster_stats_messages_sent:0
cluster_stats_messages_received:0
'otal_cluster_links_buffer_limit_exceeded:0
++ grep -oP '(?<=cluster_state:)[^\s]+'
+ cluster_state=fail
+ '[' -z fail ']'
+ '[' fail == ok ']'
+ '[' false = true ']'
+ echo 'Redis Cluster not initialized, initializing...'
+ initialize_redis_cluster
+ gen_initialize_redis_cluster_primary_node
+ gen_initialize_redis_cluster_node true
+ local is_primary=true
+ '[' -z redisc-ozklak-shard-vvn-0,redisc-ozklak-shard-twh-0,redisc-ozklak-shard-6s8-0 ']'
+ local shard_name
+ local shard_advertised_infos
+ local shard_advertised_svc
+ local shard_advertised_port
+ local shard_advertised_svc_ordinal
+ local pod_host_ip
Redis Cluster not initialized, initializing...
++ echo redisc-ozklak-shard-vvn-0,redisc-ozklak-shard-twh-0,redisc-ozklak-shard-6s8-0
++ tr , ' '
+ for pod_name in $(echo "$KB_CLUSTER_POD_NAME_LIST" | tr ',' ' ')
++ extract_ordinal_from_object_name redisc-ozklak-shard-vvn-0
++ local object_name=redisc-ozklak-shard-vvn-0
++ local ordinal=0
++ echo 0
+ pod_name_ordinal=0
+ '[' true = true ']'
+ '[' 0 -ne 0 ']'
+ '[' true = false ']'
+ '[' -n '' ']'
+ local port=6379
++ extract_pod_name_prefix redisc-ozklak-shard-vvn-0
++ local pod_name=redisc-ozklak-shard-vvn-0
+++ echo redisc-ozklak-shard-vvn-0
+++ sed 's/-[0-9]\+$//'
++ prefix=redisc-ozklak-shard-vvn
++ echo redisc-ozklak-shard-vvn
+ pod_name_prefix=redisc-ozklak-shard-vvn
+ local pod_fqdn=redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless
+ '[' true = true ']'
+ initialize_redis_cluster_primary_nodes["$pod_name"]=redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless:6379
+ initialize_pod_name_to_advertise_host_port_map["$pod_name"]=redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless:6379
+ for pod_name in $(echo "$KB_CLUSTER_POD_NAME_LIST" | tr ',' ' ')
++ extract_ordinal_from_object_name redisc-ozklak-shard-twh-0
++ local object_name=redisc-ozklak-shard-twh-0
++ local ordinal=0
++ echo 0
+ pod_name_ordinal=0
+ '[' true = true ']'
+ '[' 0 -ne 0 ']'
+ '[' true = false ']'
+ '[' -n '' ']'
+ local port=6379
++ extract_pod_name_prefix redisc-ozklak-shard-twh-0
++ local pod_name=redisc-ozklak-shard-twh-0
+++ echo redisc-ozklak-shard-twh-0
+++ sed 's/-[0-9]\+$//'
++ prefix=redisc-ozklak-shard-twh
++ echo redisc-ozklak-shard-twh
+ pod_name_prefix=redisc-ozklak-shard-twh
+ local pod_fqdn=redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless
+ '[' true = true ']'
+ initialize_redis_cluster_primary_nodes["$pod_name"]=redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless:6379
+ initialize_pod_name_to_advertise_host_port_map["$pod_name"]=redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless:6379
+ for pod_name in $(echo "$KB_CLUSTER_POD_NAME_LIST" | tr ',' ' ')
++ extract_ordinal_from_object_name redisc-ozklak-shard-6s8-0
++ local object_name=redisc-ozklak-shard-6s8-0
++ local ordinal=0
++ echo 0
+ pod_name_ordinal=0
+ '[' true = true ']'
+ '[' 0 -ne 0 ']'
+ '[' true = false ']'
+ '[' -n '' ']'
+ local port=6379
++ extract_pod_name_prefix redisc-ozklak-shard-6s8-0
++ local pod_name=redisc-ozklak-shard-6s8-0
+++ echo redisc-ozklak-shard-6s8-0
+++ sed 's/-[0-9]\+$//'
++ prefix=redisc-ozklak-shard-6s8
++ echo redisc-ozklak-shard-6s8
initialize_command: redis-cli --cluster create redisc-ozklak-shard-6s8-0.redisc-ozklak-shard-6s8-headless:6379 redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless:6379 redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless:6379  -a O3605v7HsS --cluster-yes
+ pod_name_prefix=redisc-ozklak-shard-6s8
+ local pod_fqdn=redisc-ozklak-shard-6s8-0.redisc-ozklak-shard-6s8-headless
+ '[' true = true ']'
+ initialize_redis_cluster_primary_nodes["$pod_name"]=redisc-ozklak-shard-6s8-0.redisc-ozklak-shard-6s8-headless:6379
+ initialize_pod_name_to_advertise_host_port_map["$pod_name"]=redisc-ozklak-shard-6s8-0.redisc-ozklak-shard-6s8-headless:6379
+ '[' 3 -eq 0 ']'
+ primary_nodes=
+ for primary_pod_name in "${!initialize_redis_cluster_primary_nodes[@]}"
+ primary_nodes+='redisc-ozklak-shard-6s8-0.redisc-ozklak-shard-6s8-headless:6379 '
+ for primary_pod_name in "${!initialize_redis_cluster_primary_nodes[@]}"
+ primary_nodes+='redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless:6379 '
+ for primary_pod_name in "${!initialize_redis_cluster_primary_nodes[@]}"
+ primary_nodes+='redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless:6379 '
+ '[' -z O3605v7HsS ']'
+ initialize_command='redis-cli --cluster create redisc-ozklak-shard-6s8-0.redisc-ozklak-shard-6s8-headless:6379 redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless:6379 redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless:6379  -a O3605v7HsS --cluster-yes'
+ echo 'initialize_command: redis-cli --cluster create redisc-ozklak-shard-6s8-0.redisc-ozklak-shard-6s8-headless:6379 redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless:6379 redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless:6379  -a O3605v7HsS --cluster-yes'
+ redis-cli --cluster create redisc-ozklak-shard-6s8-0.redisc-ozklak-shard-6s8-headless:6379 redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless:6379 redisc-ozklak-shard-vvn-0.redisc-ozklak-shard-vvn-headless:6379 -a O3605v7HsS --cluster-yes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless:6379: Name or service not known
+ echo 'Failed to create Redis Cluster'
+ exit 1
Failed to create Redis Cluster
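
The log ends with `redis-cli --cluster create` failing because `redisc-ozklak-shard-twh-0.redisc-ozklak-shard-twh-headless` does not resolve ("Name or service not known"), after which the script exits immediately. One possible mitigation, offered as a sketch and not as the actual KubeBlocks fix, is to retry a name-resolution check for each node before creating the cluster. The `retry_until_ok` helper and the use of `getent hosts` are assumptions and are not taken from the original script.

```shell
#!/usr/bin/env bash
# retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have been made.
# Returns 0 on the first success, 1 if every attempt fails.
retry_until_ok() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
    # Sleep only between attempts, not after the last one.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical usage before `redis-cli --cluster create` (assumption: the
# array values are host:port strings, as the xtrace output above suggests):
#   for node in "${initialize_redis_cluster_primary_nodes[@]}"; do
#     retry_until_ok 30 2 getent hosts "${node%:*}" || exit 1
#   done
```

This only helps if the headless-service DNS record appears shortly after the job starts. If the name never resolves, the job still fails, just later and with clearer per-attempt output.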

Expected behavior
After scaling out to 4 shards, the post-provision jobs should complete successfully and the new shard should join the Redis Cluster.

Desktop (please complete the following information):

kbcli version
Kubernetes: v1.26.3
KubeBlocks: 0.9.0-beta.15
kbcli: 0.9.0-beta.4

@JashBook JashBook added kind/bug Something isn't working severity/major Great chance user will encounter the same problem labels Apr 29, 2024
@JashBook JashBook added this to the Release 0.9.0 milestone Apr 29, 2024
@JashBook JashBook removed the severity/major Great chance user will encounter the same problem label Apr 30, 2024

github-actions bot commented Jun 3, 2024

This issue has been marked as stale because it has been open for 30 days with no activity

@github-actions github-actions bot added the Stale label Jun 3, 2024