Create a production ready values files #473

Open
friedrichg opened this issue Jul 20, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@friedrichg
Member

Most users of the Helm chart have to override several settings in values.yaml before their Cortex deployment is production-ready.
We should create an alternate values file to make this path easier, possibly copying some of the values specified in cortex-jsonnet.

friedrichg added the enhancement (New feature or request) label Jul 20, 2023
@jessequinn

@friedrichg perhaps you could start by sharing what you consider a good values.yaml so others can evaluate it.

@abctaylor

Chiming in to say this would be incredibly helpful, especially for folks new to Kubernetes and Helm.

@danfinn

danfinn commented Apr 1, 2024

As a new user working through the Helm install, I fully agree: quite a bit is missing to get this installed and production-ready. I'm currently working through issues with my distributor and ingester pods OOM'ing, and it looks like the cause is that no memory limits are set on the pods and GOMEMLIMIT is not configured.

If I'm able to get Cortex running correctly, I will be sure to share my values here.

@danfinn

danfinn commented Apr 17, 2024

Here is the Ansible task we use to install Cortex with Helm; the values are specified inline:

- name: Helm Cortex for Prometheus
  kubernetes.core.helm:
    name: cortex
    binary_path: "{{ helm310_binary_path }}"
    kubeconfig: "{{ context_file }}"
    context: "{{ aks_name }}"
    chart_ref: cortex-helm/cortex
    chart_version: v2.2.0
    wait: true
    wait_timeout: 600s
    release_namespace: "{{ namespace }}"
    values:
      ingress:
        enabled: true
        ingressClass:
          name: "nginx"
        annotations:
          cert-manager.io/cluster-issuer: "{{ namespace }}-letsencrypt-issuer"
          kubernetes.io/ingress.class: internal-nginx
        hosts:
          - host: "cortex.{{ dns_zone }}"
            paths:
              - /
        tls:
          - hosts:
              - "cortex.{{ dns_zone }}"
            secretName: cortex-tls
      ingester:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
        replicas: 4
        resources:
          limits:
            memory: "16Gi"
        env:
          - name: GOMEMLIMIT
            value: 14000MiB
      distributor:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
        replicas: 4
        resources:
          limits:
            memory: "8Gi"
        env:
          - name: GOMEMLIMIT
            value: 7000MiB
      alertmanager:
        enabled: false
      ruler:
        enabled: false
      query_frontend:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      querier:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      nginx:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      store_gateway:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      compactor:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      config:
        limits:
          max_label_names_per_series: 50
          max_series_per_metric: 0
        auth_enabled: true
        memberlist:
          abort_if_cluster_join_fails: false
          join_members:
            - cortex-memberlist.cortex.svc.cluster.local
        querier:
          store_gateway_addresses: cortex-store-gateway-headless.cortex.svc.cluster.local:9095
        blocks_storage:
          backend: azure
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
          tsdb:
            dir: /data/tsdb
          bucket_store:
            sync_dir: /data/tsdb
            bucket_index:
              enabled: true
        ruler_storage:
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
        alertmanager_storage:
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"

The things I had issues with and had to tweak were the memory limits and GOMEMLIMIT sizing for the ingester and distributor pods. Automatic DNS detection for memberlist was not working, so the FQDN had to be specified. We also had to set some limits for max labels and max series.
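
For anyone not using Ansible, the three tweaks described above could be pulled out into a standalone values file, roughly as sketched below. Every key and value is copied from the task above and was sized for that cluster, so treat the numbers as a starting point rather than a recommendation. The file name `values-production.yaml` is hypothetical, and the memberlist FQDN assumes the release lives in the `cortex` namespace. In both cases GOMEMLIMIT works out to roughly 85% of the container memory limit (14000 MiB of 16384 MiB, 7000 MiB of 8192 MiB), which leaves the Go runtime some headroom below the point where the container would be killed.

```yaml
# values-production.yaml (hypothetical file name)
# Contains only the overrides called out in the comment above.
ingester:
  replicas: 4
  resources:
    limits:
      memory: "16Gi"          # container limit: 16384 MiB
  env:
    - name: GOMEMLIMIT
      value: 14000MiB         # Go runtime soft limit, about 85% of the container limit

distributor:
  replicas: 4
  resources:
    limits:
      memory: "8Gi"           # container limit: 8192 MiB
  env:
    - name: GOMEMLIMIT
      value: 7000MiB          # same ratio as the ingester

config:
  limits:
    max_label_names_per_series: 50
    max_series_per_metric: 0  # 0 means no per-metric series limit
  memberlist:
    abort_if_cluster_join_fails: false
    join_members:
      # Explicit FQDN, since automatic DNS detection did not work;
      # assumes the "cortex" namespace.
      - cortex-memberlist.cortex.svc.cluster.local
```

A file like this would be passed to Helm with `-f values-production.yaml` on top of the chart defaults. The storage, ingress, and nodeSelector settings from the task are environment-specific and are left out here.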
