
[BUG]: SIGTERM signal not passed to Docker task process #4548

Open · 2 of 4 tasks
natbprice opened this issue Nov 30, 2023 · 3 comments

@natbprice
What happened?

My pipeline step needs to perform some cleanup if it is terminated (e.g., cancelled or timed out). This works correctly if I run the task on the host machine or in a Docker container on my local machine. However, if I run the step inside a container in Azure Pipelines, the termination is not handled correctly. I believe this is because the process responsible for running tasks in Azure Pipelines runs as PID 1 and exits without passing the SIGTERM signal to my task.

Here is a minimal reproducible example:

trigger: none

pool:
  vmImage: 'ubuntu-latest'

container:
  image: ubuntu:22.04
  options: --init
    
steps:
- checkout: none
- bash: |
    # Function to handle SIGTERM signal
    terminate() {
      echo "SIGTERM signal received. Exiting..."
      exit 0
    }
    trap terminate SIGTERM
    echo "Waiting for SIGTERM signal..."
    while true; do
      sleep 1
    done
  timeoutInMinutes: 1
  target: host
  displayName: HostTimeout
- bash: |
    # Function to handle SIGTERM signal
    terminate() {
      echo "SIGTERM signal received. Exiting..."
      exit 0
    }
    trap terminate SIGTERM
    echo "Waiting for SIGTERM signal..."
    while true; do
      sleep 1
    done
  condition: always()
  timeoutInMinutes: 1
  displayName: DockerTimeout

The "HostTimeout" step on the host has the expected output:

Waiting for SIGTERM signal...
SIGTERM signal received. Exiting...
##[error]The task has timed out.
Finishing: HostTimeout

The "DockerTimeout" step runs in the container and exits prematurely:

Waiting for SIGTERM signal...
##[error]The task has timed out.
Finishing: DockerTimeout

I have tried running with and without the Docker --init flag and calling my script with exec, but neither resolved the issue.
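For reference, the exec variant looked roughly like this; the script name is just a placeholder for my real entrypoint, which contains the same trap logic as the inline example above:

- bash: |
    # Placeholder script containing the trap/cleanup logic shown above
    exec ./handle-sigterm.sh
  timeoutInMinutes: 1
  displayName: DockerTimeoutExec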

For this simple example I could probably add a separate pipeline step to perform the cleanup, but that doesn't work for my real use case, where the process called in the step has complex cleanup logic built in.

Versions

Agent Version 3.230.0 / Ubuntu 22.04.3 LTS

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

No response

Version control system

No response

Relevant log output

No response

@DenisNikulin5
Contributor

Hi @natbprice, thanks for reporting! We are working on higher-priority issues at the moment, but will get back to this one soon.

@technic

technic commented Dec 17, 2023

The Azure Pipelines worker executes steps inside the Docker container with docker exec. When the step is cancelled, SIGINT is sent to the docker exec process. This does not forward the signal to the process inside the container, which keeps running in the background (moby/moby#9098), so the cancelled task is not stopped at all. The following tasks, which are configured to run with the always() condition, are started while that process is still running. Afterwards the container is removed and all processes created with docker exec inside it are killed.
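A rough way to reproduce this behaviour outside the agent (container name is arbitrary; the agent sends SIGINT, but SIGTERM to the docker exec client shows the same non-forwarding):

# Start a disposable container
docker run -d --name sig-demo ubuntu:22.04 sleep infinity

# Start a long-lived process inside it via docker exec, backgrounded on the host
docker exec sig-demo sleep 600 &
EXEC_PID=$!
sleep 2

# Signal the docker exec client process on the host, as the agent does on cancellation
kill -TERM "$EXEC_PID"
sleep 2

# The sleep 600 started via docker exec is still running inside the container (moby/moby#9098)
docker top sig-demo

# Cleanup
docker rm -f sig-demo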

@natbprice
Author

natbprice commented Dec 17, 2023

@technic I was just using always() to show the difference between a task running on the host versus in a container. Without always(), the second demo task would not run.

In my testing, removing the container at the end of the pipeline does not properly terminate running processes, in that the task does not get an opportunity to catch the signal and terminate cleanly.

You can also use always() with a cleanup step that manually stops the running process that was not properly terminated. This is my current workaround (rough sketch below); I am not sure if that is what you were suggesting.
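Roughly, the workaround looks like this; the process name pattern is a placeholder for whatever the earlier step actually starts, and it assumes procps (pkill) is available in the container image:

- bash: |
    # Stop any process the cancelled/timed-out step left running inside the container
    pkill -TERM -f 'long-running-task' || true
  condition: always()
  displayName: CleanupStrayProcesses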

As it relates to this ticket, it would be better if pipeline tasks were properly terminated without the need for a second cleanup task. I am not sure whether this is the job of the pipeline agent or the bash task. However, it doesn't seem unsolvable for the agent or the bash task to at least stop the processes it has started. I believe it already assigns a unique ID to the task, so it just needs to exec some cleanup if there is a timeout or cancellation.
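Purely as a hypothetical sketch of what I mean: instead of only signalling the docker exec client, the agent (or the bash task) could signal the process group it started inside the container. The variable names here are placeholders for identifiers the agent would have to track; this is not the agent's actual behaviour:

# Hypothetical cleanup on timeout/cancellation
# "$CONTAINER_NAME" and "$STEP_PGID" stand in for identifiers tracked by the agent
docker exec "$CONTAINER_NAME" bash -c "kill -TERM -- -$STEP_PGID"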
