Tolerations added in Workflow spec not being applied to pods #13020

Open
3 of 4 tasks
kunalmehta-eve opened this issue May 8, 2024 · 9 comments
Labels: area/controller, area/templates/container, area/workflow-templates, problem/more information needed, type/bug

Comments

@kunalmehta-eve

kunalmehta-eve commented May 8, 2024

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

We need to add some tolerations and a node selector to the workflow definition.

Pods are rendered with the nodeSelector, but the tolerations we added are not rendered into the pod definition.

Version

3.5.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

  apiVersion: argoproj.io/v1alpha1
  kind: Workflow
  metadata:
    generateName: x-y-
  spec:
    workflowTemplateRef:
      name: dummy-template
      namespace: argo-workflows
    entrypoint: aa-bb
    nodeSelector:
      stack: workflows
    tolerations:
    - key: stack
      operator: Equal
      value: workflows
      effect: NoSchedule
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"

Logs from the workflow controller

No Node found, as nodes have tolerations

Logs from your workflow's wait container

No Node found, as nodes have tolerations
@Joibel
Member

Joibel commented May 8, 2024

Do you have a complete example for this?

The code under the comment "// Set tolerations (if specified)" appears to do the right thing, and the same thing as for nodeSelector, so I'd like to be sure you don't have tolerations in the workflowTemplate you're running. They don't get merged.

@kunalmehta-eve
Author

@Joibel Yes, I confirm there are no tolerations or node selectors in the workflowTemplate, but the tolerations are still not rendered in the pods.

@kunalmehta-eve
Author

apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: sample-sensor
  namespace: argo-workflows
spec:
  template:
    serviceAccountName: sample-workflow-sa
  dependencies:
    - name: sample-jobs
      eventSourceName: azure-queue-storage
      eventName: processing-preview-jobs
      transform:
        jq: .body|=(@base64d |@base64d | fromjson)
      filters:
        data:
          - path: "body.data.filePath"
            type: string
            value:
              - '.-da-.-(full|snapshot|base)_[0-9]+.mcap'
  triggers:
    - template:
        name: sample-workflow
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: sample-preview-
              spec:
                entrypoint: sample-entrypoint
                nodeSelector:
                  stack: workflows
                tolerations:
                  - key: stack
                    operator: Equal
                    value: workflows
                    effect: NoSchedule
                  - key: "kubernetes.azure.com/scalesetpriority"
                    operator: "Equal"
                    value: "spot"
                    effect: "NoSchedule"
                synchronization:
                  semaphore:
                    configMapKeyRef:
                      name: workflow-config
                      key: SAMPLE_JOBS
                arguments:
                  parameters:
                    - name: filePath
                      # this is the value that should be overridden
                      value: empty
                workflowTemplateRef:
                  name: sample-workflow-template
                  namespace: argo-workflows
          parameters:
            - src:
                dependencyName: sample-jobs
                dataKey: body.data.filePath
              dest: spec.arguments.parameters.0.value

@Joibel
Member

Joibel commented May 13, 2024

I still need a complete example.

I don't think the trigger mechanism (events) matters here, but I can't run it as is because the workflow template referred to isn't provided.

@Joibel added the problem/more information needed label on May 13, 2024
@kunalmehta-eve
Author

As I said, the issue is only with tolerations; node selectors are rendered into the pod YAML.

@kunalmehta-eve
Author

@Joibel

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: sample-workflow-template
  namespace: argo-workflows
spec:
  ttlStrategy:
    secondsAfterCompletion: 86400 # 1 day
    secondsAfterSuccess: 86400
    secondsAfterFailure: 86400
  securityContext:
    runAsNonRoot: true
    runAsUser: 8737 #; any non-root user
    fsGroup: 8737
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 15Gi
  entrypoint: sample-entrypoint
  metrics:
    prometheus:
      - name: custom_workflow_execution_duration
        labels:
          - key: name
            value: "{{workflow.name}}"
        help: "Time taken by workflow execution"
        gauge:
          realtime: true # This metric will be emitted in real time. For more info see: docs/metrics.md
          value: "{{workflow.duration}}" # Use {{workflow.duration}} in workflow-level and {{duration}} in template-level
      - name: custom_workflow_execution_status_count
        help: "Status of workflow execution"
        labels:
          - key: name
            value: "{{workflow.name}}"
          - key: status
            value: "{{workflow.status}}"
        counter:
          value: "1"
  templates:
    - name: sample-preview
      inputs:
        parameters:
          - name: filePath
      container:
        image: dummy.azurecr.io/sample-gen-job
        resources:
          requests:
            memory: "500Mi"
            cpu: 1
          limits:
            memory: "700Mi"
            cpu: 1
        command: ["/sample_gen_job_bin"]
        args: [process-bag, --path, "{{inputs.parameters.filePath}}"]
        volumeMounts: # same syntax as k8s Pod spec
          - name: data
            mountPath: /mnt/data
        env:
          - name: AZURE_STORAGE_CONNECTION_STRING
            valueFrom:
              secretKeyRef:
                name: workflows
                key: jobs-queue-connection-string
          - name: DB_URL
            valueFrom:
              secretKeyRef:
                name: workflows
                key: sample-db-url
          - name: DB_SCHEMA
            value: sample
          - name: KEYCLOAK_REALM
            value: sample
          - name: KEYCLOAK_CLIENT_ID
            value: sample-api-pipeline-token
          - name: KEYCLOAK_CLIENT_SECRET
            valueFrom:
              secretKeyRef:
                name: workflows
                key: sample-client-secret
          - name: KEYCLOAK_SERVER_URL
            valueFrom:
              configMapKeyRef:
                name: sample-config
                key: KEYCLOAK_SERVER_URL
          - name: ARTIFACTORY_USER
            value: sample_processing_reader
          - name: ARTIFACTORY_PASSWORD
            valueFrom:
              secretKeyRef:
                name: workflows
                key: sample-password
          - name: SAMPLE_PREVIEW_DIR
            value: "/tmp/"
          - name: SAMPLE_DIR
            value: "/mnt/data/"
          - name: DUMMY_URL
            valueFrom:
              configMapKeyRef:
                name: workflow-config
                key: DUMMY_URL

@kunalmehta-eve
Author

@shuangkun

@shuangkun
Member

@shuangkun

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-submittable
spec:
  arguments:
    parameters:
      - name: message
        value: hello world
  templates:
    - name: whalesay-template
      inputs:
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: x-y-
spec:
  entrypoint: whalesay-template
  workflowTemplateRef:
    name: workflow-template-submittable
  nodeSelector:
    stack: workflows
  tolerations:
  - key: stack
    operator: Equal
    value: workflows
    effect: NoSchedule
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"

I used this example and finally found the corresponding tolerations on the pod. @kunalmehta-eve
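
To repeat this check, one option is to dump the pod with `kubectl get pod <pod-name> -n <namespace> -o json` and compare its `spec.tolerations` with what the Workflow asked for. The sketch below is illustrative only: `missing_tolerations` and `EXPECTED` are hypothetical names, not part of Argo, and it assumes the pod JSON has already been loaded into a dict. It does a subset match because Kubernetes typically adds its own default tolerations (for example `node.kubernetes.io/not-ready`) to every pod.

```python
import json

# Tolerations requested in the Workflow spec from this issue.
EXPECTED = [
    {"key": "stack", "operator": "Equal",
     "value": "workflows", "effect": "NoSchedule"},
    {"key": "kubernetes.azure.com/scalesetpriority", "operator": "Equal",
     "value": "spot", "effect": "NoSchedule"},
]


def missing_tolerations(pod, expected):
    """Return the expected tolerations that are absent from pod.spec.tolerations.

    `pod` is the dict form of `kubectl get pod ... -o json`.
    Extra tolerations on the pod are ignored.
    """
    actual = pod.get("spec", {}).get("tolerations") or []

    def matches(want, have):
        # Every field we asked for must be present with the same value.
        return all(have.get(field) == value for field, value in want.items())

    return [want for want in expected
            if not any(matches(want, have) for have in actual)]


# Inline sample standing in for real kubectl output (an assumption for illustration).
sample_pod = json.loads("""
{"spec": {"tolerations": [
  {"key": "stack", "operator": "Equal", "value": "workflows", "effect": "NoSchedule"},
  {"key": "node.kubernetes.io/not-ready", "operator": "Exists",
   "effect": "NoExecute", "tolerationSeconds": 300}
]}}
""")

for toleration in missing_tolerations(sample_pod, EXPECTED):
    print("missing:", toleration["key"])
```

With real output, replace `sample_pod` with `json.load(open("pod.json"))`; an empty result means every requested toleration reached the pod.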

@shuangkun self-assigned this on May 16, 2024
@kunalmehta-eve
Author

@shuangkun Unfortunately, when we use a Sensor to trigger the workflow, the tolerations are not rendered into the pods.
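
Since the same spec works when submitted directly, it may help to narrow down where the tolerations disappear in the Sensor case: on the Workflow object the Sensor creates, or between that Workflow and its pods. The following is a rough sketch under assumptions, not a diagnosis: `where_dropped` and `contains_all` are hypothetical helper names, and the inputs are assumed to be the dict forms of `kubectl get workflow <name> -o json` and `kubectl get pod <name> -o json`.

```python
# Tolerations requested in the Sensor's embedded Workflow spec.
EXPECTED = [
    {"key": "stack", "operator": "Equal",
     "value": "workflows", "effect": "NoSchedule"},
    {"key": "kubernetes.azure.com/scalesetpriority", "operator": "Equal",
     "value": "spot", "effect": "NoSchedule"},
]


def contains_all(tolerations, expected):
    """True if every expected toleration appears in `tolerations` (subset match)."""
    return all(
        any(all(have.get(k) == v for k, v in want.items()) for have in tolerations)
        for want in expected
    )


def where_dropped(workflow, pod, expected):
    """Report the first object on which the expected tolerations are missing."""
    wf_tolerations = workflow.get("spec", {}).get("tolerations") or []
    pod_tolerations = pod.get("spec", {}).get("tolerations") or []
    if not contains_all(wf_tolerations, expected):
        # The created Workflow never carried them: look at the Sensor trigger.
        return "missing on Workflow"
    if not contains_all(pod_tolerations, expected):
        # The Workflow has them but the pod does not: look at the controller.
        return "missing on Pod"
    return "present on both"
```

If this reports "missing on Workflow", the field was lost before the workflow controller ever saw it, which would point at the trigger rather than at pod creation.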
