
Loadbalancing not working as expected in cross cluster scenario #34026

Open
lkalaivanan opened this issue May 8, 2024 · 2 comments
lkalaivanan commented May 8, 2024

In the Envoy proxy shipped with Istio 1.18, the following configuration worked fine when the service in one of the clusters was brought down. With the same configuration, the Envoy proxy shipped with Istio 1.20 is not performing the expected load balancing, causing requests to be consistently directed to a local service that is currently down.

With respect to the config: if "catalog.default" is down, requests should get routed to "istio-egressgateway.istio-system".

Dynamic Endpoint Config:

 "endpoint_config": {
      "@type": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
      "cluster_name": "outbound|8082||catalog.acme.com",
      "endpoints": [
       {
        "locality": {},
        "lb_endpoints": [
         {
          "endpoint": {
           "address": {
            "socket_address": {
             "address": "10.30.92.15",
             "port_value": 8082
            }
           },
           "health_check_config": {},
           "hostname": "catalog.default"
          },
          "health_status": "HEALTHY",
          "metadata": {
           "filter_metadata": {
            "istio": {
             "workload": ";;;;Kubernetes"
            }
           }
          },
          "load_balancing_weight": 4294967293
         },
         {
          "endpoint": {
           "address": {
            "socket_address": {
             "address": "10.30.98.6",
             "port_value": 443
            }
           },
           "health_check_config": {},
           "hostname": "istio-egressgateway.istio-system"
          },
          "health_status": "HEALTHY",
          "metadata": {
           "filter_metadata": {
            "istio": {
             "workload": ";;;;Kubernetes"
            }
           }
          },
          "load_balancing_weight": 1
         }
        ],
        "load_balancing_weight": 4294967294 
       }
      ],
      "policy": {
       "overprovisioning_factor": 140
      }
     } 

Cluster Loadbalancing Config:

 "load_assignment": {
       "cluster_name": "outbound|8082||catalog.acme.com",
       "endpoints": [
        {
         "locality": {},
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "catalog.default",
              "port_value": 8082
             }
            }
           },
           "metadata": {
            "filter_metadata": {
             "istio": {
              "workload": ";;;;Kubernetes"
             }
            }
           },
           "load_balancing_weight": 4294967293
          },
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "istio-egressgateway.istio-system",
              "port_value": 443
             }
            }
           },
           "metadata": {
            "filter_metadata": {
             "istio": {
              "workload": ";;;;Kubernetes"
             }
            }
           },
           "load_balancing_weight": 1
          }
         ],
         "load_balancing_weight": 4294967294
        }
       ]
      }
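To illustrate what the weight skew in these configs implies: with weights 4294967293 vs 1, the egress gateway is effectively a last-resort endpoint that only receives traffic if catalog.default is excluded from the healthy pool. A minimal Python sketch (not Envoy's actual implementation) of weighted selection over endpoints the control plane reports as HEALTHY:

```python
import random

# Weights taken from the ClusterLoadAssignment above; "healthy" here
# stands in for the health_status the control plane reports.
endpoints = [
    {"hostname": "catalog.default", "weight": 4294967293, "healthy": True},
    {"hostname": "istio-egressgateway.istio-system", "weight": 1, "healthy": True},
]

def pick(endpoints):
    """Weighted random choice among endpoints considered healthy."""
    pool = [e for e in endpoints if e["healthy"]]
    total = sum(e["weight"] for e in pool)
    r = random.uniform(0, total)
    for e in pool:
        r -= e["weight"]
        if r <= 0:
            return e["hostname"]
    return pool[-1]["hostname"]

# While both endpoints are reported HEALTHY, essentially every request
# goes to catalog.default, even if it is actually unreachable.
counts = {"catalog.default": 0, "istio-egressgateway.istio-system": 0}
for _ in range(10000):
    counts[pick(endpoints)] += 1

# Only once catalog.default drops out of the healthy pool does traffic
# fail over to the egress gateway.
endpoints[0]["healthy"] = False
print(pick(endpoints))  # istio-egressgateway.istio-system
```

This is why the failover behavior hinges on whether the proxy actually marks catalog.default unhealthy (via health checking or outlier detection) rather than on the weights themselves.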
lkalaivanan added the bug and triage (Issue requires triage) labels on May 8, 2024
phlax (Member) commented May 8, 2024

cc @wbpcode @tonya11en @nezdolik

phlax added the area/load balancing label and removed the triage label on May 8, 2024
tonya11en (Member) commented

> if the "catalog.default" is down

Some questions:

  • Can you be more specific about what "down" means?
  • How are you creating the conditions for this behavior?
  • How are you measuring this behavior?

Also, it would be helpful to provide:

  1. Output from the proxy's /stats endpoint both before and after reproducing the issue.
  2. Logs. Ideally, you'd run envoy with this minimal config to reproduce the issue via something like:
envoy -c repro_config.yaml -l off --component-log-level upstream:trace
