TLS not working in tempo #3478

Open
samyak335 opened this issue Mar 11, 2024 · 17 comments
Labels
stale Used for stale issues / PRs

Comments

@samyak335

samyak335 commented Mar 11, 2024

Describe the bug
When TLS is enabled in Tempo, the error below appears continuously in the pod logs, and the startup log still labels port 3100 as http (gRPC is on 9095):

level=info ts=2024-03-11T09:01:30.803116066Z caller=server.go:228 msg="server listening on addresses" http=[::]:3100 grpc=[::]:9095

error:
level=info ts=2024-03-11T09:01:30.829038354Z caller=tempodb.go:399 msg="compaction and retention enabled."
level=info ts=2024-03-11T09:01:30.829080668Z caller=worker.go:180 msg="adding connection" addr=127.0.0.1:9095
level=error ts=2024-03-11T09:01:30.830097384Z caller=frontend_processor.go:63 msg="error contacting frontend" address=127.0.0.1:9095 err="rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: EOF""
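
A note on the "error reading server preface: EOF" message: one common cause, which is an assumption here and not confirmed in this thread, is a gRPC client dialing in plaintext while the server on that port expects TLS (or the reverse). The minimal Python sketch below only illustrates the mismatch by classifying the first bytes each side would send. The function names are hypothetical and nothing here is Tempo code.

```python
# HTTP/2 client connection preface (RFC 9113). A plaintext gRPC client
# sends these 24 bytes before anything else.
H2_CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def classify_first_bytes(data: bytes) -> str:
    """Guess what kind of client produced the first bytes on a connection."""
    if data.startswith(H2_CLIENT_PREFACE):
        return "plaintext-http2"
    # A TLS record starts with content type 0x16 (handshake) and major version 0x03.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls-handshake"
    return "unknown"


def will_handshake(client_first_bytes: bytes, server_expects_tls: bool) -> bool:
    """Return True only when the client's opening bytes match what the server expects."""
    kind = classify_first_bytes(client_first_bytes)
    if server_expects_tls:
        return kind == "tls-handshake"
    return kind == "plaintext-http2"
```

With a TLS listener, `will_handshake(H2_CLIENT_PREFACE, True)` is False: the server cannot parse the plaintext preface as a TLS record and typically drops the connection, which the client then sees as an EOF while waiting for the server preface.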

To Reproduce
Steps to reproduce the behavior:

  1. Add the TLS configuration below to the Tempo values.yaml:
> server:
>   # -- HTTP server listen port
>   http_listen_port: 3100
>   tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
>   tls_min_version: VersionTLS12
>   grpc_tls_config:
>     cert_file: /etc/tempo/cert/example.crt
>     key_file: /etc/tempo/key/example.key
>   http_tls_config:
>     cert_file: /etc/tempo/cert/example.crt
>     key_file: /etc/tempo/key/example.key

> receivers:
>   jaeger:
>     protocols:
>       grpc:
>         endpoint: 0.0.0.0:14250
>       thrift_binary:
>         endpoint: 0.0.0.0:6832
>       thrift_compact:
>         endpoint: 0.0.0.0:6831
>       thrift_http:
>         endpoint: 0.0.0.0:14268
>   opencensus:
>   otlp:
>     protocols:
>       grpc:
>         endpoint: "0.0.0.0:4317"
>         tls:
>           cert_file: /etc/tempo/cert/example.crt
>           key_file: /etc/tempo/key/example.key
>       http:
>         endpoint: "0.0.0.0:4318"
>         tls:
>           cert_file: /etc/tempo/cert/example.crt
>           key_file: /etc/tempo/key/example.key
> 

> grpc_client_config:
>   tls_enabled: true
>   tls_cert_path: /etc/tempo/cert/example.crt
>   tls_key_path: /etc/tempo/cert/example.key
>   tls_server_name: tempo.trace.svc.cluster.local
>   tls_insecure_skip_verify: true
>   tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
>   tls_min_version: VersionTLS12
  2. Deploy Tempo using this values.yaml.

Expected behavior
gRPC should be listening with TLS, and the OTLP endpoint 0.0.0.0:4317 should receive traces over https.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Additional Context
I took the TLS config from https://grafana.com/docs/tempo/latest/configuration/network/tls/. I want to receive traces at https://ip:4317.

@joe-elliott
Member

Are you setting the grpc client config in all 3 places:

https://grafana.com/docs/tempo/latest/configuration/network/tls/#client-configuration

@samyak335
Author

samyak335 commented Mar 12, 2024

I tried setting the grpc client config in all 3 places, but for querier.query-frontend.grpc_client_config I see the error below in the Tempo pod logs at startup.

failed parsing config: failed to parse configFile /conf/tempo.yaml: yaml: unmarshal errors:
line 53: field grpc_client_config not found in type frontend.Config

This is the setting in the values.yaml file:

>   ingester: {}
> 
>   querier: {}
> 

>   queryFrontend:
>     grpc_client_config:
>       tls_enabled: true
>       tls_cert_path: /etc/tempo/cert/example.crt
>       tls_key_path: /etc/tempo/cert/example.key
>       tls_server_name: tempo.trace.svc.cluster.local
>       tls_insecure_skip_verify: true
>       tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
>       tls_min_version: VersionTLS12

@joe-elliott
Member

I do not know much about how the values.yaml interacts with the tempo configuration. Can you share the actual, rendered tempo config? It would be easier for me to help you.

Also the values.yaml content you shared is not tabbed properly which also makes it tough to understand what is happening.

My guess, based on your error, is that the grpc_client_config is being rendered somewhere into the query_frontend config and not under the querier.

@samyak335
Author

I have now formatted the config in my previous comments properly.
The values.yaml has separate querier: {} and queryFrontend: {} sections, but the documentation names querier.query-frontend.grpc_client_config, which I took to mean the following:

querier:
  query-frontend:
    grpc_client_config:

If I follow the same format in values.yaml:

ingester_client:
  grpc_client_config:
    tls_enabled: true
    tls_cert_path: /etc/tempo/cert/example.crt
    tls_key_path: /etc/tempo/cert/example.key
    tls_server_name: tempo.trace.svc.cluster.local
    tls_insecure_skip_verify: true
    tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    tls_min_version: VersionTLS12
metrics_generator_client:
  grpc_client_config:
    tls_enabled: true
    tls_cert_path: /etc/tempo/cert/example.crt
    tls_key_path: /etc/tempo/cert/example.key
    tls_server_name: tempo.trace.svc.cluster.local
    tls_insecure_skip_verify: true
    tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    tls_min_version: VersionTLS12

querier:
  queryFrontend:
    grpc_client_config:
      tls_enabled: true
      tls_cert_path: /etc/tempo/cert/example.crt
      tls_key_path: /etc/tempo/cert/example.key
      tls_server_name: tempo.trace.svc.cluster.local
      tls_insecure_skip_verify: true
      tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
      tls_min_version: VersionTLS12


I get this error when the Tempo pod is starting:

failed parsing config: failed to parse configFile /conf/tempo.yaml: yaml: unmarshal errors: line 51: field query-frontend not found in type querier.Config

I cannot get the rendered config file from the pod because the pod is not in a running state.

@joe-elliott
Member

I believe the querier should be:

querier:
  frontend_worker:
    grpc_client_config:

Can you see if that works?
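
The two "field ... not found in type ..." errors earlier in this thread are what strict config decoding looks like: a key that does not exist at that level of the config is rejected instead of being ignored. A minimal Python sketch of the idea follows. The `ALLOWED` tree is a hypothetical, heavily trimmed stand-in that contains only the key names mentioned in this thread, not Tempo's real schema, and `unknown_fields` is an illustrative helper.

```python
# Hypothetical, trimmed tree of accepted keys. The real Tempo config has
# many more fields; only names that appear in this thread are listed.
ALLOWED = {
    "querier": {
        "frontend_worker": {
            "grpc_client_config": {},
        },
    },
    "ingester_client": {"grpc_client_config": {}},
    "metrics_generator_client": {"grpc_client_config": {}},
}


def unknown_fields(config: dict, allowed: dict, prefix: str = "") -> list:
    """Return dotted paths of keys that the allowed-key tree does not contain."""
    problems = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in allowed:
            problems.append(path)
            continue
        # An empty subtree means "do not check below this level".
        if isinstance(value, dict) and allowed[key]:
            problems.extend(unknown_fields(value, allowed[key], path))
    return problems
```

Under these assumptions, a config with `querier` > `query-frontend` is reported as `querier.query-frontend`, while `querier` > `frontend_worker` > `grpc_client_config` passes.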

@samyak335
Author

Tempo starts now, but I see two more errors. My application throws the error below:

2024-03-13T04:17:47,743 ERROR [grpc-default-executor-2] i.o.e.o.trace.OtlpGrpcSpanExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message:UNAVAILABLE

In Grafana, in Explore with the Tempo data source, I see the error below when I click Search:
Query error Error (upstream: (500) error querying ingesters in Querier.Search: failed to execute f() for 127.0.0.1:9095: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: EOF" ). Please check the server logs for more details.

Below is the Tempo configuration from my pod:

multitenancy_enabled: false

usage_report:
  reporting_enabled: true
compactor:
  compaction:
    block_retention: 24h
distributor:
  receivers:
        jaeger:
          protocols:
            grpc:
              endpoint: 0.0.0.0:14250
            thrift_binary:
              endpoint: 0.0.0.0:6832
            thrift_compact:
              endpoint: 0.0.0.0:6831
            thrift_http:
              endpoint: 0.0.0.0:14268
        otlp:
          protocols:
            grpc:
              endpoint: 0.0.0.0:4317
              tls:
                cert_file: /etc/tempo/cert/example.crt
                key_file: /etc/tempo/key/example.key
            http:
              endpoint: 0.0.0.0:4318
              tls:
                cert_file: /etc/tempo/cert/example.crt
                key_file: /etc/tempo/key/example.key
ingester:
      {}
server:
      grpc_tls_config:
        cert_file: /etc/tempo/cert/example.crt
        key_file: /etc/tempo/key/example.key
      http_listen_port: 3100
      http_tls_config:
        cert_file: /etc/tempo/cert/example.crt
        key_file: /etc/tempo/key/example.key
      tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
      tls_min_version: VersionTLS12
storage:
      trace:
        backend: local
        local:
          path: /var/tempo/traces
        wal:
          path: /var/tempo/wal
querier:
      frontend_worker:
        grpc_client_config:
          tls_cert_path: /etc/tempo/cert/example.crt
          tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
          tls_enabled: true
          tls_insecure_skip_verify: true
          tls_key_path: /etc/tempo/key/example.key
          tls_min_version: VersionTLS12
          tls_server_name: tempo.trace.svc.cluster.local
query_frontend:
      {}
overrides:
      per_tenant_override_config: /conf/overrides.yaml
      metrics_generator_processors:
      - 'service-graphs'
      - 'span-metrics'
metrics_generator:
      storage:
        path: "/tmp/tempo"
        remote_write:
          - url: https://prometheus-kube-prometheus-prometheus:9090/api/v1/write
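
The rendered config above sets `grpc_client_config` only under `querier.frontend_worker`. It has no `ingester_client` or `metrics_generator_client` section, which may explain why the Grafana search error comes from the ingester client on 127.0.0.1:9095 (an inference from the posted config, not something confirmed in this thread). A minimal Python sketch of that check, using a hand-transcribed and trimmed copy of the rendered config as a dict. The helper names are hypothetical.

```python
# The three client sections named in this thread.
CLIENT_PATHS = (
    ("ingester_client", "grpc_client_config"),
    ("metrics_generator_client", "grpc_client_config"),
    ("querier", "frontend_worker", "grpc_client_config"),
)


def lookup(config, path):
    """Walk nested dicts along path; return None if any key is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_client_tls(config):
    """List client sections that are absent or do not set tls_enabled: true."""
    missing = []
    for path in CLIENT_PATHS:
        section = lookup(config, path)
        if not (isinstance(section, dict) and section.get("tls_enabled") is True):
            missing.append(".".join(path))
    return missing


# Trimmed transcription of the rendered config above (relevant keys only).
rendered = {
    "ingester": {},
    "querier": {"frontend_worker": {"grpc_client_config": {"tls_enabled": True}}},
    "query_frontend": {},
}

print(missing_client_tls(rendered))
```

For the transcription above this prints the `ingester_client` and `metrics_generator_client` paths, the two sections that the values.yaml attempt did not carry through to the rendered file.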

@samyak335
Author

I also deployed the Tempo yum package on a Linux machine and tested it. Even then I get the error below from the application.

2024-03-13T09:41:48,043 ERROR [BatchSpanProcessor_WorkerThread-1] i.o.e.o.trace.OtlpGrpcSpanExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message:UNAVAILABLE

Here is the Tempo config I used:

> stream_over_http_enabled: true
> server:
>   http_listen_port: 3200
>   log_level: info
>   tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
>   tls_min_version: VersionTLS12
>   grpc_tls_config:
>         cert_file: /etc/tempo/example.crt
>         key_file: /etc/tempo/example.key
>         client_auth_type: VerifyClientCertIfGiven
>         client_ca_file: /etc/tempo/ca.crt
>   http_tls_config:
>         cert_file: /etc/tempo/example.crt
>         key_file: /etc/tempo/example.key
>         client_auth_type: VerifyClientCertIfGiven
>         client_ca_file: /etc/tempo/ca.crt
> 
> querier:
>   frontend_worker:
>     grpc_client_config:
>        tls_enabled: true
>        tls_cert_path: /etc/tempo/example.crt
>        tls_key_path:  /etc/tempo/example.key
>        tls_ca_path: /etc/tempo/ca.crt
>        tls_insecure_skip_verify: true
>        tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
>        tls_min_version: VersionTLS12
> 
> 
> query_frontend:
> 
>   search:
>     duration_slo: 5s
>     throughput_bytes_slo: 1.073741824e+09
>   trace_by_id:
>     duration_slo: 5s
> 
> metrics_generator_client:
>  grpc_client_config:
>    tls_enabled: true
>    tls_cert_path: /etc/tempo/example.crt
>    tls_key_path:  /etc/tempo/example.key
>    tls_ca_path: /etc/tempo/ca.crt
>    tls_insecure_skip_verify: true
>    tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
>    tls_min_version: VersionTLS12
> 
> ingester_client:
>    grpc_client_config:
>      tls_enabled: true
>      tls_cert_path: /etc/tempo/example.crt
>      tls_key_path:  /etc/tempo/example.key
>      tls_ca_path: /etc/tempo/ca.crt
>      tls_insecure_skip_verify: true
>      tls_cipher_suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
>      tls_min_version: VersionTLS12
> 
> 
> 
> distributor:
>   receivers:                           # this configuration will listen on all ports and protocols that tempo is capable of.
>     jaeger:                            # the receivers all come from the OpenTelemetry collector.  more configuration information can
>       protocols:                       # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
>         thrift_http:                   #
>         grpc:                          # for a production deployment you should only enable the receivers you need!
>         thrift_binary:
>         thrift_compact:
>     zipkin:
>     otlp:
>       protocols:
>         http:
>              tls:
>                  ca_file: /etc/tempo/ca.crt
>                  cert_file: /etc/tempo/example.crt
>                  key_file:  /etc/tempo/example.key
> 
> 
> 
>         grpc:
>              tls:
>                  ca_file: /etc/tempo/ca.crt
>                  cert_file: /etc/tempo/example.crt
>                  key_file:  /etc/tempo/example.key
> 
>     opencensus:
> 
> ingester:
>   max_block_duration: 5m               # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally
> 
> compactor:
>   compaction:
>     block_retention: 1h                # overall Tempo trace retention. set for demo purposes
> 
> metrics_generator:
>   registry:
>     external_labels:
>       source: tempo
>       cluster: docker-compose
>   storage:
>     path: /tmp/tempo/generator/wal
>     remote_write:
>       - url: http://192.168.1.90:9090/api/v1/write
>         send_exemplars: true
> 
> memberlist:
>     tls_enabled: true
>     tls_cert_path: /etc/tempo/example.crt
>     tls_key_path: /etc/tempo/example.key
>     tls_ca_path: /etc/tempo/ca.crt
>     tls_insecure_skip_verify: true
> 
> 
> storage:
>   trace:
>     backend: local                     # backend configuration to use
>     wal:
>       path: /tmp/tempo/wal             # where to store the wal locally
>     local:
>       path: /tmp/tempo/blocks
> 
> overrides:
>   defaults:
>     metrics_generator:
>       processors: [service-graphs, span-metrics] # enables metrics generator
> 

In Grafana I am able to add the Tempo data source with https.

@joe-elliott
Member

2024-03-13T09:41:48,043 ERROR [BatchSpanProcessor_WorkerThread-1] i.o.e.o.trace.OtlpGrpcSpanExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message:UNAVAILABLE

This is a networking issue and likely unrelated to Tempo. I would begin debugging by trying to open a TCP connection to the appropriate port using a tool like nc.

@samyak335
Author

samyak335 commented Mar 13, 2024

I tested with the TLS configuration removed from Tempo, and the application is able to send traces to the OTLP endpoint. But if I enable the TLS configuration in Tempo and set the endpoint to https://x.x.x.x:4317, I see this error from the application.

2024-03-13T09:41:48,043 ERROR [BatchSpanProcessor_WorkerThread-1] i.o.e.o.trace.OtlpGrpcSpanExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message:UNAVAILABLE
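
Since plain http works and https does not, it can help to rule out the endpoint string itself before looking further. OTLP exporters typically decide whether to use TLS from the endpoint scheme, so the scheme has to match what the receiver is configured for. The Python sketch below is a minimal check under that assumption. `check_otlp_endpoint` and its `receiver_uses_tls` parameter are hypothetical names for illustration, not part of any exporter API.

```python
from urllib.parse import urlsplit


def check_otlp_endpoint(endpoint: str, receiver_uses_tls: bool) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unsupported scheme {parts.scheme!r}")
    host = parts.hostname
    try:
        port = parts.port
    except ValueError:
        # urlsplit raises ValueError from .port when the port is not a valid number.
        port = None
    if not host:
        problems.append("no host found; expected scheme://host:port")
    if port is None:
        problems.append("no valid port found")
    if parts.scheme == "http" and receiver_uses_tls:
        problems.append("plaintext scheme against a TLS receiver")
    if parts.scheme == "https" and not receiver_uses_tls:
        problems.append("TLS scheme against a plaintext receiver")
    return problems
```

A well-formed `https://host:4317` endpoint against a TLS receiver yields no problems here. In that case the remaining suspects are on the TLS side, for example whether the client trusts the certificate the receiver presents.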

@joe-elliott
Member

Have you confirmed that the process is running and the expected ports are open using a tool like netstat? Can nc connect to the configured port from wherever you're receiving that error?

@samyak335
Author

samyak335 commented Mar 14, 2024

I have confirmed that the port is accessible from the application machine, but using https, as in https://192.168.1.90:4317, still fails. If I use http://192.168.1.90:4317 instead, I get the error below:

2024-03-14T06:42:54,990 ERROR [grpc-default-executor-2] i.o.e.o.trace.OtlpGrpcSpanExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message:UNAVAILABLE: End of stream or IOException

nc port confirmation

> root@bwapp-67c5884d8f-dcdq4:/# nc -zv 192.168.1.90 4317
> Connection to 192.168.1.90 4317 port [tcp/*] succeeded!
> 
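
A successful `nc -zv` only shows that the TCP port accepts connections. It does not show that a TLS handshake on that port succeeds. The Python sketch below separates the two stages. It is an illustration only: `probe_tls` is a hypothetical helper, and certificate verification is deliberately disabled so that the result reflects only whether the listener speaks TLS, not whether its certificate is trusted.

```python
import socket
import ssl


def probe_tls(host: str, port: int, timeout: float = 3.0) -> str:
    """Report whether a port accepts TCP and whether it completes a TLS handshake."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Verification is off on purpose: this probe checks protocol, not trust.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"tcp failed: {type(exc).__name__}"
    try:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return f"tls ok: {tls.version()}"
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        raw.close()
        return f"tcp ok, tls failed: {type(exc).__name__}"
```

Calling `probe_tls("192.168.1.90", 4317)` from the application pod would distinguish a closed port from a listener that accepts TCP but does not complete TLS. `openssl s_client -connect 192.168.1.90:4317` gives similar information from the command line and also prints the certificate chain that the receiver presents.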

@samyak335
Author

I tested by removing the tls blocks below (distributor.receivers.otlp.protocols.grpc.tls and distributor.receivers.otlp.protocols.http.tls) from the Tempo configuration. With them removed, the application is able to send traces to the http://192.168.1.90:4317 endpoint.

> distributor:
>   receivers:                           # this configuration will listen on all ports and protocols that tempo is capable of.
>     jaeger:                            # the receivers all come from the OpenTelemetry collector.  more configuration information can
>       protocols:                       # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
>         thrift_http:                   #
>         grpc:                          # for a production deployment you should only enable the receivers you need!
>         thrift_binary:
>         thrift_compact:
>     zipkin:
>     otlp:
>       protocols:
>         http:
>              tls:
>                  ca_file: /etc/tempo/ca.crt
>                  cert_file: /etc/tempo/example.crt
>                  key_file:  /etc/tempo/example.key
>         grpc:
>              tls:
>                  ca_file: /etc/tempo/ca.crt
>                  cert_file: /etc/tempo/example.crt
>                  key_file:  /etc/tempo/example.key
> 
> 

@joe-elliott
Member

If you run a local netstat is Tempo holding the expected port open? Any helpful logs in Tempo? Tempo version?

@samyak335
Author

samyak335 commented Mar 19, 2024

I can see the port listening with the netstat command. The Tempo version is 2.4.
[image: netstat output]

From the application pod:
[image]

I still see the error from the application:

2024-03-17T23:57:46,581 ERROR [BatchSpanProcessor_WorkerThread-1] i.o.e.o.trace.OtlpGrpcSpanExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message:UNAVAILABLE

The Tempo log is attached here:
tempo.log

The configuration file is attached here:
config.yml.txt

@joe-elliott
Member

It appears that Tempo is listening on the expected port. If you run the collector and Tempo in the same network namespace do you see the error?

Is the error transient? Are any spans being consumed by Tempo?

@samyak335
Author

samyak335 commented Mar 20, 2024

The error isn't transient. It persists indefinitely, and no spans are consumed by Tempo at all. This happens only when distributor.receivers.otlp.protocols.grpc.tls and distributor.receivers.otlp.protocols.http.tls are enabled and https://x.x.x.x:4317 is configured in the application. The collector and Tempo are both in the same namespace, and I also tested the monolithic mode of Tempo.

Contributor

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply keepalive label to exempt this Issue.

@github-actions github-actions bot added the stale Used for stale issues / PRs label May 20, 2024