Create a production ready values files #473

friedrichg · 2023-07-20T17:50:33Z

Most users of the Helm chart have to set a few things in values.yaml to get Cortex to a production-ready state.
We should create an alternate values file to make this path easier, perhaps copying some of the values specified in cortex-jsonnet.

jessequinn · 2024-01-10T13:32:48Z

@friedrichg, could you start by sharing what you believe is a good values.yaml so others can evaluate it?

abctaylor · 2024-02-25T20:59:08Z

Chiming in to say this would be incredibly helpful, especially for folks new to Kubernetes and Helm.

danfinn · 2024-04-01T18:07:18Z

As a new user working through the Helm install, I fully agree: quite a bit is missing to get this installed and production ready. I'm currently working through issues with my distributor and ingester pods being OOM-killed, and it looks like that's because no memory limits are set on the pods and GOMEMLIMIT is not configured.

If I'm able to get Cortex running correctly, I will be sure to share my values here.

danfinn · 2024-04-17T20:13:45Z

Here is the Ansible task we use to do the Helm install of Cortex. The values are specified inline:

- name: Helm Cortex for Prometheus
  kubernetes.core.helm:
    name: cortex
    binary_path: "{{ helm310_binary_path }}"
    kubeconfig: "{{ context_file }}"
    context: "{{ aks_name }}"
    chart_ref: cortex-helm/cortex
    chart_version: v2.2.0
    wait: true
    wait_timeout: 600s
    release_namespace: "{{ namespace }}"
    values:
      ingress:
        enabled: true
        ingressClass:
          name: "nginx"
        annotations:
          cert-manager.io/cluster-issuer: "{{ namespace }}-letsencrypt-issuer"
          kubernetes.io/ingress.class: internal-nginx
        hosts:
          - host: "cortex.{{ dns_zone }}"
            paths:
              - /
        tls:
          - hosts:
              - "cortex.{{ dns_zone }}"
            secretName: cortex-tls
      ingester:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
        replicas: 4
        resources:
          limits:
            memory: "16Gi"
        env:
          - name: GOMEMLIMIT
            value: 14000MiB
      distributor:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
        replicas: 4
        resources:
          limits:
            memory: "8Gi"
        env:
          - name: GOMEMLIMIT
            value: 7000MiB
      alertmanager:
        enabled: false
      ruler:
        enabled: false
      query_frontend:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      querier:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      nginx:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      store_gateway:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      compactor:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      config:
        limits:
          max_label_names_per_series: 50
          max_series_per_metric: 0
        auth_enabled: true
        memberlist:
          abort_if_cluster_join_fails: false
          join_members:
            - cortex-memberlist.cortex.svc.cluster.local
        querier:
          store_gateway_addresses: cortex-store-gateway-headless.cortex.svc.cluster.local:9095
        blocks_storage:
          backend: azure
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
          tsdb:
            dir: /data/tsdb
          bucket_store:
            sync_dir: /data/tsdb
            bucket_index:
              enabled: true
        ruler_storage:
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
        alertmanager_storage:
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"

Things I had issues with and had to tweak:

- The memory limits and GOMEMLIMIT sizing for the ingester and distributor pods.
- Automatic DNS detection for memberlist was not working, so the FQDN had to be specified in join_members.
- We also had to set some limits for max labels (max_label_names_per_series) and max series (max_series_per_metric).
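For the GOMEMLIMIT sizing, the values in the task above work out to roughly 85% of the container memory limit (14000MiB against 16Gi, 7000MiB against 8Gi). Below is a minimal sketch of that arithmetic, not anything shipped with the chart: the helper name `gomemlimit_for` and the default ratio of 0.85 are assumptions for illustration, and the right headroom depends on your workload.

```python
import re

# Binary suffixes used by Kubernetes memory quantities, expressed in MiB.
# Only the suffixes needed for this sketch are handled (assumption).
_MIB_PER_UNIT = {"Mi": 1, "Gi": 1024, "Ti": 1024 * 1024}


def gomemlimit_for(k8s_limit: str, ratio: float = 0.85) -> str:
    """Return a GOMEMLIMIT string that is `ratio` of a Kubernetes memory limit.

    `k8s_limit` is a quantity such as "16Gi" or "512Mi". The result uses the
    MiB suffix, which the Go runtime accepts for GOMEMLIMIT.
    """
    match = re.fullmatch(r"(\d+)(Mi|Gi|Ti)", k8s_limit.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {k8s_limit!r}")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the range (0, 1]")

    limit_mib = int(match.group(1)) * _MIB_PER_UNIT[match.group(2)]
    # Round down so the soft limit never exceeds the requested share.
    return f"{int(limit_mib * ratio)}MiB"
```

Calling `gomemlimit_for("16Gi")` gives a value close to the 14000MiB used for the ingester above; the point is only to keep the Go soft limit below the pod's hard limit so the garbage collector reacts before the kernel OOM-kills the container.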

friedrichg added the enhancement label on Jul 20, 2023

friedrichg mentioned this issue Mar 6, 2024

Distributor requires ingester configuration to access ring state cortexproject/cortex#5800

Open
