
Loadbalancing not working as expected in cross cluster scenario #34026

Open
lkalaivanan opened this issue May 8, 2024 · 2 comments
lkalaivanan commented May 8, 2024

In the Envoy proxy shipped with Istio 1.18, the following configuration worked fine when the service in one of the clusters was brought down. With the same configuration, the Envoy proxy shipped with Istio 1.20 is not performing the expected load balancing, causing requests to be consistently directed to a local service that is currently down.

With respect to the config: if "catalog.default" is down, requests should get routed to "istio-egressgateway.istio-system".

Dynamic Endpoint Config:

 "endpoint_config": {
      "@type": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
      "cluster_name": "outbound|8082||catalog.acme.com",
      "endpoints": [
       {
        "locality": {},
        "lb_endpoints": [
         {
          "endpoint": {
           "address": {
            "socket_address": {
             "address": "10.30.92.15",
             "port_value": 8082
            }
           },
           "health_check_config": {},
           "hostname": "catalog.default"
          },
          "health_status": "HEALTHY",
          "metadata": {
           "filter_metadata": {
            "istio": {
             "workload": ";;;;Kubernetes"
            }
           }
          },
          "load_balancing_weight": 4294967293
         },
         {
          "endpoint": {
           "address": {
            "socket_address": {
             "address": "10.30.98.6",
             "port_value": 443
            }
           },
           "health_check_config": {},
           "hostname": "istio-egressgateway.istio-system"
          },
          "health_status": "HEALTHY",
          "metadata": {
           "filter_metadata": {
            "istio": {
             "workload": ";;;;Kubernetes"
            }
           }
          },
          "load_balancing_weight": 1
         }
        ],
        "load_balancing_weight": 4294967294 
       }
      ],
      "policy": {
       "overprovisioning_factor": 140
      }
     } 

Cluster Loadbalancing Config:

 "load_assignment": {
       "cluster_name": "outbound|8082||catalog.acme.com",
       "endpoints": [
        {
         "locality": {},
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "catalog.default",
              "port_value": 8082
             }
            }
           },
           "metadata": {
            "filter_metadata": {
             "istio": {
              "workload": ";;;;Kubernetes"
             }
            }
           },
           "load_balancing_weight": 4294967293
          },
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "istio-egressgateway.istio-system",
              "port_value": 443
             }
            }
           },
           "metadata": {
            "filter_metadata": {
             "istio": {
              "workload": ";;;;Kubernetes"
             }
            }
           },
           "load_balancing_weight": 1
          }
         ],
         "load_balancing_weight": 4294967294
        }
       ]
      }
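To illustrate what the weight skew in these configs implies: with weights 4294967293 vs 1, the egress gateway is effectively a last-resort endpoint that only receives traffic if catalog.default is excluded from the healthy pool. A minimal Python sketch (not Envoy's actual implementation) of weighted selection over endpoints the control plane reports as HEALTHY:

```python
import random

# Weights taken from the ClusterLoadAssignment above; "healthy" here
# stands in for the health_status the control plane reports.
endpoints = [
    {"hostname": "catalog.default", "weight": 4294967293, "healthy": True},
    {"hostname": "istio-egressgateway.istio-system", "weight": 1, "healthy": True},
]

def pick(endpoints):
    """Weighted random choice among endpoints considered healthy."""
    pool = [e for e in endpoints if e["healthy"]]
    total = sum(e["weight"] for e in pool)
    r = random.uniform(0, total)
    for e in pool:
        r -= e["weight"]
        if r <= 0:
            return e["hostname"]
    return pool[-1]["hostname"]

# While both endpoints are reported HEALTHY, essentially every request
# goes to catalog.default, even if it is actually unreachable.
counts = {"catalog.default": 0, "istio-egressgateway.istio-system": 0}
for _ in range(10000):
    counts[pick(endpoints)] += 1

# Only once catalog.default drops out of the healthy pool does traffic
# fail over to the egress gateway.
endpoints[0]["healthy"] = False
print(pick(endpoints))  # istio-egressgateway.istio-system
```

This is why the failover behavior hinges on whether the proxy actually marks catalog.default unhealthy (via health checking or outlier detection) rather than on the weights themselves.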
lkalaivanan added the bug and triage (Issue requires triage) labels on May 8, 2024
phlax (Member) commented May 8, 2024

cc @wbpcode @tonya11en @nezdolik

phlax added the area/load balancing label and removed the triage label on May 8, 2024
tonya11en (Member) commented

> if the "catalog.default" is down

Some questions:

  • Can you be more specific about what "down" means?
  • How are you creating the conditions for this behavior?
  • How are you measuring this behavior?

Also, it would be helpful to provide:

  1. Output from the proxy's /stats endpoint both before and after reproducing the issue.
  2. Logs. Ideally, you'd run envoy with this minimal config to reproduce the issue via something like:
envoy -c repro_config.yaml -l off --component-log-level upstream:trace
