Add gRPC health check #3040

Open
flands opened this issue Apr 28, 2021 · 15 comments
Labels
collector-telemetry: healthchecker and other telemetry collection issues
help wanted: Good issue for contributors to OpenTelemetry Service to pick up

Comments

@flands
Contributor

flands commented Apr 28, 2021

Is your feature request related to a problem? Please describe.
If you want to deploy the OpenTelemetry Collector in gateway mode behind an AWS Application Load Balancer (ALB) to support gRPC (e.g. OTLP), the ALB requires a gRPC health check. Today we have a health check extension, but it only supports HTTP.

Describe the solution you'd like
Either extend the health check extension or add health check endpoints to gRPC receivers.

Describe alternatives you've considered
Switch from gRPC to HTTP for receiving data

@tigrannajaryan added the "help wanted" label Apr 28, 2021

@rakyll
Contributor

rakyll commented Apr 29, 2021

Might be related to #3002.

@ligeogeorge

+1

Unable to set up ALB->(gRPC)->ECS because there is no support for the gRPC health checks that AWS ALB requires. This would be a common use case, IMO.

#3002 is only talking about enhancing HTTP health checks.

@obitech

obitech commented Dec 30, 2021

You can specify the allowed return codes on the ALB. As a workaround, just set them to 0-99 and leave the health check path as the default.

@g3kr

g3kr commented Apr 22, 2022

@flands @obitech can you share your collector config? I am running into problems using the ALB as the OTLP exporter endpoint. Or could you shed some light on #5246? Much appreciated. Thank you.

@jangaraj
Contributor

I use this config for ALB for OTLP/GRPC, which works like a charm:

 TargetGroupGRPC:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VpcId
      Port: 4317
      Protocol: HTTP
      ProtocolVersion: GRPC
      HealthCheckIntervalSeconds: 5
      HealthCheckPath: /
      HealthCheckProtocol: HTTP
      Matcher:
        GrpcCode: 0-99
      HealthCheckTimeoutSeconds: 4
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 2
      TargetType: ip

Unfortunately, I have a problem with OTLP/HTTP. It works fine with JSON payloads, but it sometimes has problems with protobuf payloads (SSL handshake timeout, client headers timeout):

 TargetGroupHTTP:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VpcId
      Port: 4318
      Protocol: HTTP
      HealthCheckIntervalSeconds: 5
      HealthCheckPath: /
      HealthCheckProtocol: HTTP
      HealthCheckPort: 13133
      HealthCheckTimeoutSeconds: 4
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 2
      TargetType: ip

Any idea how to configure AWS ALB to work with OTLP HTTP Protobuf?
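
For context, both target groups above presuppose a collector that serves OTLP/gRPC on 4317, OTLP/HTTP on 4318, and the HTTP health check extension on 13133. The thread does not include the collector config, so the following is only a minimal sketch under that assumption. The `debug` exporter is a placeholder (older collector releases call it `logging`), and the `0.0.0.0` bind addresses are assumed so that the ALB can reach the task IP.

```yaml
extensions:
  # HTTP-only health check; this is what HealthCheckPort: 13133 targets.
  health_check:
    endpoint: 0.0.0.0:13133

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # traffic port of TargetGroupGRPC
      http:
        endpoint: 0.0.0.0:4318   # traffic port of TargetGroupHTTP

exporters:
  # Placeholder exporter for illustration; replace with a real backend.
  debug:

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```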

@pkalvagit

@jangaraj, I am trying to set up the OpenTelemetry image in ECS Fargate with ELB, but it is failing the gRPC health check. Do you have a reference blog or repo I can refer to? Since you were able to set up gRPC communication, I am checking with you.

@jangaraj
Contributor

jangaraj commented Dec 8, 2022

ELB doesn't support gRPC; only ALB does.

@pkalvagit

pkalvagit commented Dec 9, 2022

@jangaraj, the load balancer type is Application. Could you please review the template below and suggest any changes? Any suggestions are highly appreciated.

 PublicLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Scheme: internet-facing
      Type: application
      Name: !Sub ${Environment}-otel-gw-elb
      Subnets:
        - !Ref PublicELBSubnetAZ1
        - !Ref PublicELBSubnetAZ2
        - !Ref PublicELBSubnetAZ3
      LoadBalancerAttributes:
        - Key: idle_timeout.timeout_seconds
          Value: 300
      SecurityGroups:
        - !Ref PublicLoadBalancerSecurityGroup
      Tags:
        - Key: Name
          Value: !Sub ${Environment}-otel-gw-elb
        - Key: Service
          Value: monocle-otel-gateway-collector
 PublicGrpcListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      Certificates:
        - CertificateArn: !Ref CertificateArn
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref ApiServiceGrpcTargetGroup
      LoadBalancerArn: !Ref PublicLoadBalancer
      Port: 443
      Protocol: HTTPS
 ApiGrpcListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn:
            Ref: ApiServiceGrpcTargetGroup
      Conditions:
        - Field: host-header
          Values:
            - !Ref ApiDomainName
        - Field: path-pattern
          Values:
            - '*'
      ListenerArn: !Ref PublicGrpcListener
      Priority: 1
 ApiServiceGrpcTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 120
      UnhealthyThresholdCount: 2
      HealthyThresholdCount: 5
      HealthCheckTimeoutSeconds: 60
      HealthCheckPath: /
      HealthCheckProtocol: HTTP
      HealthCheckPort: 13133
      Matcher:
        GrpcCode: 0-99
      Port: 4317
      Protocol: HTTP
      ProtocolVersion: GRPC
      VpcId: !Ref VpcId
      TargetType: ip
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: '60'
        - Key: load_balancing.algorithm.type
          Value: 'least_outstanding_requests'
      Tags:
        - Key: service
          Value: monocle-otel-gateway-collector
        - Key: Product
          Value: monocle

 ApiContainerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Sub ${Environment}-otel-gateway-collector
      GroupDescription: Security group for otel gateway collector
      VpcId: !Ref VpcId
      Tags:
        - Key: Service
          Value: monocle-otel-gateway-collector
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 4317
          ToPort: 4317
          SourceSecurityGroupId: !GetAtt PublicLoadBalancerSecurityGroup.GroupId
        - IpProtocol: tcp
          FromPort: 4318
          ToPort: 4318
          SourceSecurityGroupId: !GetAtt PublicLoadBalancerSecurityGroup.GroupId
        - IpProtocol: tcp
          FromPort: 13133
          ToPort: 13133
          SourceSecurityGroupId: !GetAtt PublicLoadBalancerSecurityGroup.GroupId
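
One visible difference between this template and the target group reported as working earlier in the thread is `HealthCheckPort: 13133`. With `ProtocolVersion: GRPC`, the ALB health check is itself a gRPC request, while 13133 is the port where the HTTP-only health check extension usually listens. The sketch below leaves the health check on the traffic port, as the earlier config does. Whether this resolves the failure is an assumption and is not confirmed anywhere in this thread.

```yaml
ApiServiceGrpcTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    VpcId: !Ref VpcId
    Port: 4317
    Protocol: HTTP
    ProtocolVersion: GRPC
    TargetType: ip
    # No HealthCheckPort: the check goes to the traffic port (4317),
    # where the OTLP gRPC server can answer it.
    HealthCheckPath: /
    HealthCheckProtocol: HTTP
    HealthCheckIntervalSeconds: 120
    HealthCheckTimeoutSeconds: 60
    HealthyThresholdCount: 5
    UnhealthyThresholdCount: 2
    Matcher:
      # Accept any gRPC status code, so a non-OK reply (for example
      # UNIMPLEMENTED for the unknown method) still counts as healthy.
      GrpcCode: 0-99
```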

@pkalvagit

@jangaraj, could you at least share your CloudFormation template? I will use it as a reference to find a way to get the gRPC health check working.

@mx-psi added the "collector-telemetry" label Apr 14, 2023

@jammymalina

> I use this config for ALB for OTLP/GRPC, which works like a charm: [...]

(Quoting both target group configs from @jangaraj's comment above.)

This doesn't work for me. The load balancer health check still fails.

@jammymalina

I made an extension for gRPC health checks, if anyone is interested: https://github.com/jammymalina/otel-grpc-healthcheck. You have to use the otel builder to be able to use it in the collector. I'd like to make it part of opentelemetry-collector-contrib, so if anyone can help me with that, it would be great.

My extension uses the standard gRPC health check, /grpc.health.v1.Health/Check
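
If the collector exposes the standard gRPC health service, an ALB target group could call it directly and require an OK status instead of accepting any code. This is only a sketch: the port `13134` is a hypothetical placeholder for wherever the extension actually listens (check its config), and the resource name is made up for illustration.

```yaml
TargetGroupGRPCHealth:            # hypothetical resource name
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    VpcId: !Ref VpcId
    Port: 4317
    Protocol: HTTP
    ProtocolVersion: GRPC
    TargetType: ip
    HealthCheckProtocol: HTTP
    # Hypothetical port; use the port the gRPC health check extension binds to.
    HealthCheckPort: "13134"
    # Standard gRPC health checking method.
    HealthCheckPath: /grpc.health.v1.Health/Check
    Matcher:
      # Only gRPC status OK (0) counts as healthy.
      GrpcCode: "0"
```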

@seankhliao
Contributor

I filed #8397 to add it as an option for gRPC servers.

@ksandrmatveyev

Hello, any updates on this?

@kasun-bandara

We're waiting for the gRPC health check too.

If you are looking to set up ALB (gRPC) -> ECS, one workaround would be to deploy a gRPC proxy as a sidecar that routes health requests to the HTTP endpoint.

@jammymalina

> We're waiting for the gRPC health check too.
>
> If you are looking to set up ALB (gRPC) -> ECS, one workaround would be to deploy the gRPC proxy as a sidecar to route health requests to the HTTP endpoint.

> Hello, any updates on this?

You can use this extension for now: https://github.com/jammymalina/otel-grpc-healthcheck. We have used it in prod with an Application Load Balancer for quite a while, so I dare say it's pretty stable. The docs are still missing (too lazy to add them), but the config is pretty straightforward. It works with the defaults; you just have to make sure you also include the original healthcheck extension with default settings. You have to use the otel collector builder to build a collector that includes the extension.
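
A sketch of what an OpenTelemetry Collector Builder manifest for such a build could look like. The Go module path of the extension is assumed to match the repository URL, the `<version>` placeholders must be replaced with real releases, and the OTLP receiver and exporter are included only to make the example a plausible gateway build. None of this is taken from the extension's own documentation.

```yaml
dist:
  name: otelcol-custom
  description: Collector with gRPC health check (sketch)
  output_path: ./otelcol-custom

extensions:
  # Original HTTP health check extension, which the comment above says
  # must be included alongside the gRPC one.
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension <version>
  # Assumed module path, derived from the repository URL.
  - gomod: github.com/jammymalina/otel-grpc-healthcheck <version>

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver <version>

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter <version>
```

Both extensions would then also need to be listed under `extensions` and `service.extensions` in the collector config; the config key of the gRPC extension is not stated in this thread.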
