Data race when calling Child.Stop with a process that takes too long to stop #1753

Open
averche opened this issue May 25, 2023 · 0 comments

Consul Template version

v0.32.0

Configuration

The following minimal example demonstrates the race. The full code can be found at averche/test-go/consul-template-child-gorace:

// 0s : start the child process
// 2s : stop the child process
// 7s : the library attempts to kill the child process
func run() (int, error) {
	process, err := child.New(&child.NewInput{
		Command:     "./my-script.sh", // a script that disregards SIGTERM
		Stdout:      os.Stdout,
		KillSignal:  syscall.SIGTERM,
		KillTimeout: 5 * time.Second,
	})
	if err != nil {
		return -2, err
	}

	if err := process.Start(); err != nil {
		return -3, fmt.Errorf("could not start the process: %s", err)
	}

	select {
	case <-time.After(2 * time.Second):
		process.Stop()
		return SuccessfullyStoppedTheProcess, nil

	case exitCode := <-process.ExitCh():
		return exitCode, nil
	}
}

func TestRunOnce(t *testing.T) {
	c, err := run()
	if err != nil {
		t.Fatal(err)
	}
	if c != SuccessfullyStoppedTheProcess {
		t.Fatalf("unexpected return code: %d", c)
	}
}

Debug output

$ go test -run TestRunOnce --race

2023/05/25 13:55:39 [INFO] (child) spawning: ./my-script.sh
sleeping for 20s
sleeping for 19s
2023/05/25 13:55:41 [INFO] (child) stopping process
received SIGTERM; ignoring it
sleeping for 18s
sleeping for 17s
sleeping for 16s
sleeping for 15s
sleeping for 14s
==================
WARNING: DATA RACE
Write at 0x00c0000f20c0 by goroutine 6:
  github.com/hashicorp/consul-template/child.(*Child).kill.func1()
      /Users/avean/go/pkg/mod/github.com/hashicorp/consul-template@v0.32.0/child/child.go:439 +0x84
  runtime.deferreturn()
...

Previous read at 0x00c0000f20c0 by goroutine 9:
  github.com/hashicorp/consul-template/child.(*Child).kill.func2()
      /Users/avean/go/pkg/mod/github.com/hashicorp/consul-template@v0.32.0/child/child.go:457 +0x70

... 

The full output can be found at averche/test-go/consul-template-child-gorace.

Expected behavior

We attempt to stop the script; once the 5-second kill timeout elapses, the script is killed immediately, and the race detector reports nothing.

Actual behavior

The script is killed correctly, but the Go race detector reports a data race during the test run.

Steps to reproduce

  1. git clone https://github.com/averche/test-go && cd test-go/consul-template-child-gorace/
  2. go test -run TestRunOnce --race
  3. See the "DATA RACE" output