[Feature]: set the 'number_of_replicas' in elasticsearch to a variable #533

Open
Stevenpc3 opened this issue Jan 4, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@Stevenpc3
Contributor

Stevenpc3 commented Jan 4, 2024

Requirement

As a user who uses Elasticsearch as the data store, I deploy Elasticsearch in single-node mode (or with multiple replicas). The default index created by Jaeger should match the number of replicas that I am using.

Problem

If I deploy Elasticsearch with a single replica, Jaeger creates an index with number_of_replicas = 1 (I am not sure whether this is simply the default or intentional), which causes issues in the Elasticsearch instance.

The index is created fine, but if the Elasticsearch pod goes down, then when it comes back up it stays "yellow" and never becomes ready, because the index waits for a replica shard that has no second node to be assigned to. This can be fixed by setting number_of_replicas = 0 manually or via a template.

The values.yaml "replicas" for Elasticsearch was set to 1:

elasticsearch:
    imageTag: "7.17.3"
    replicas: 1
    fullnameOverride: "jaeger-elasticsearch"

The following modifications to the values.yaml set the template correctly, specifically the section

lifecycle:
      postStart:
        exec:
          command:
jaeger:
  # -- enable or disable Jaeger
  enabled: true
  # -- version of Jaeger to use
  #tag:
  # -- Set the storage type to use for long term storage
  storage:
    type: elasticsearch
    elasticsearch:
      # make this a template that decides based on devMode and can configure properly
      host: "jaeger-elasticsearch"
      usePassword: false
      antiAffinity: "soft"

  # -- Preferred long term backend storage
  elasticsearch:
    imageTag: "7.17.3"
    replicas: 1
    fullnameOverride: "jaeger-elasticsearch"
    esConfig:
      elasticsearch.yml: |
        ingest.geoip.downloader.enabled: false
    lifecycle:
      postStart:
        exec:
          command:
            - bash
            - -c
            - |
              #!/bin/bash
              # Add a template to adjust number of shards/replicas
              TEMPLATE_NAME=no_replicas
              INDEX_PATTERN1="jaeger-span-*"
              INDEX_PATTERN2="jaeger-service-*"
              ES_URL=http://localhost:9200
              while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
              curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN1"\"','\""$INDEX_PATTERN2"\"'],"settings":{"number_of_replicas":"0"}}'
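The quoting in the curl call above is easy to get wrong when editing the patterns. As a rough sketch, the same legacy `_template` body could be built with `printf` instead. The function name `template_body` is hypothetical (it is not part of the chart or Jaeger), and it assumes the patterns contain no characters that need JSON escaping, which holds for the `jaeger-*` patterns used here.

```shell
# Hypothetical helper: print the legacy _template body for two index
# patterns and a replica count. Assumes the arguments need no JSON
# escaping (true for the jaeger-* patterns in the hook above).
template_body() {
  printf '{"index_patterns":["%s","%s"],"settings":{"number_of_replicas":"%s"}}\n' "$1" "$2" "$3"
}

# Prints the same JSON that the hook above sends to Elasticsearch.
template_body "jaeger-span-*" "jaeger-service-*" 0
```

In the hook, the output could then be passed to curl with `-d "$(template_body "$INDEX_PATTERN1" "$INDEX_PATTERN2" 0)"`.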

Further examples of what happens can be found at:

https://medium.com/fred-thougths/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode-ce196e20ba95
https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/

Proposal

Ensure that the number_of_replicas set on the index is equal to the Elasticsearch "replicas" value minus 1.
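To make the arithmetic concrete: with a single Elasticsearch node there is nowhere to place a replica shard, so the index needs number_of_replicas = 0, and with three nodes it could hold up to 2. A minimal sketch of that rule follows. The function name `replicas_for_nodes` is hypothetical and exists only for illustration; it is not something Jaeger or the chart provides.

```shell
# Hypothetical helper: derive number_of_replicas from the Elasticsearch
# node count (the chart's "replicas" value). Illustration only.
replicas_for_nodes() {
  case "$1" in
    # Reject empty input, non-digits, zero, and leading zeros.
    ''|*[!0-9]*|0*)
      echo "node count must be a positive integer" >&2
      return 1
      ;;
  esac
  echo $(( $1 - 1 ))
}

replicas_for_nodes 1   # single node: prints 0
replicas_for_nodes 3   # three nodes: prints 2
```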

Open questions

No response

@Stevenpc3 Stevenpc3 added the enhancement New feature or request label Jan 4, 2024
@Stevenpc3
Contributor Author

I have found a simpler way to set this instead of the lifecycle hack on start:

collector:
  cmdlineParams:
    es.num-replicas: "0"

Since we use a wrapper chart, I am sure I can template that "0" to a higher value if I need more replicas, but this does not seem straightforward when using the out-of-the-box chart.

@Stevenpc3
Contributor Author

Stevenpc3 commented May 13, 2024

This problem also exists with Spark. It defaults to a replica count of 1, so in Kubernetes the index starts as yellow in Elasticsearch and the pod then fails to come up. You have to remove the replica by setting it to 0, by exec'ing into the pod and running curl:

curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/jaeger-dependencies-2024-05-11/_settings' -d '{"index.number_of_replicas" : 0}'

Replace the index name with the proper value.
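Since the index name changes every day, the URL has to be rebuilt each time. A small sketch of that step is below. The function name `dependencies_settings_url` is hypothetical, the `jaeger-dependencies-YYYY-MM-DD` name format is taken from the command above, and the default host is an assumption that matches running inside the Elasticsearch pod.

```shell
# Hypothetical helper: build the _settings URL for one day's dependencies
# index. Name format copied from the command above; the default host
# (localhost:9200) assumes the command runs inside the Elasticsearch pod.
dependencies_settings_url() {
  day="$1"
  es_url="${2:-localhost:9200}"
  printf '%s/jaeger-dependencies-%s/_settings\n' "$es_url" "$day"
}

# Prints the URL used in the curl command above.
dependencies_settings_url 2024-05-11
```

The result can be dropped into the same curl call, for example `curl -XPUT -H 'Content-Type: application/json' "$(dependencies_settings_url 2024-05-11)" -d '{"index.number_of_replicas" : 0}'`.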

I will test with the same cmdlineParams setting and hope it fixes it...

@Stevenpc3
Contributor Author

So cmdlineParams does not work for Spark. The only thing that did work was setting a hook to adjust the replicas on start when using a single master:

master:
      masterOnly: false
      replicaCount: 1
      lifecycleHooks:
        postStart:
          exec:
            command:
              - bash
              - -c
              - |
                #!/bin/bash
                # Add a template to adjust number of shards/replicas
                TEMPLATE_NAME=no_replicas
                # INDEX_PATTERN1="jaeger-span-*"
                # INDEX_PATTERN2="jaeger-service-*"
                INDEX_PATTERN1="jaeger-dependencies-*"
                ES_URL=http://localhost:9200
                while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
                curl -XPUT "$ES_URL/_index_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN1"\"'],"template":{"settings":{"number_of_replicas":"0"}}}'

In the new charts you can no longer set the jaeger-span or jaeger-service patterns this way, as it throws errors. So in total you need to set the hook for Spark and the cmdlineParams for the collector.
