Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] ExecArtifact crash on concurrency execution (native API) #1168

Open
bozaro opened this issue Mar 27, 2024 · 8 comments
Open

[Bug] ExecArtifact crash on concurrency execution (native API) #1168

bozaro opened this issue Mar 27, 2024 · 8 comments
Assignees
Labels
api Issues or PRs related to kcl rust native APIs and multi-lang APIs enhancement New feature or request

Comments

@bozaro
Copy link

bozaro commented Mar 27, 2024

Bug Report

ExecArtifact crash on parallel execution (serial execution works fine).

1. Minimal reproduce step (Required)

Clone small test (go lang):

git clone https://github.com/bozaro/kcl-issue.git --branch concurrency-exec-issue .
go test .

Test source code:

package kcl_parse_file_issue

import (
	"encoding/json"
	"kcl-lang.io/kcl-go/pkg/service"
	"path"
	"strconv"
	"sync"
	"testing"

	"github.com/stretchr/testify/require"
	"kcl-lang.io/kcl-go/pkg/native"
	"kcl-lang.io/kcl-go/pkg/spec/gpyrpc"
)

func Test(t *testing.T) {
	client := native.NewNativeServiceClient()
	// Compile file
	output := buildProgram(t, client)
	require.NotEmpty(t, output)

	t.Run("serial execution", func(t *testing.T) {
		for i := 0; i < 1000; i++ {
			checkExecute(t, client, output, int64(i))
		}
	})
	t.Run("parallel execution", func(t *testing.T) {
		var wg sync.WaitGroup
		threads := 10
		wg.Add(threads)
		c := make(chan int64, threads*10)
		for i := 0; i < threads; i++ {
			go func() {
				defer wg.Done()
				for id := range c {
					checkExecute(t, client, output, id)
				}
			}()
		}
		for i := 0; i < 1000; i++ {
			c <- int64(i)
		}
		close(c)
		wg.Wait()
	})
}

func buildProgram(t *testing.T, client service.KclvmService) string {
	tempDir := t.TempDir()
	result, err := client.BuildProgram(&gpyrpc.BuildProgram_Args{
		ExecArgs: &gpyrpc.ExecProgram_Args{
			KFilenameList: []string{"source.k"},
			KCodeList:     []string{`v=option("foo")`},
		},
		Output: path.Join(tempDir, "kcl"),
	})
	require.NoError(t, err, "BuildProgram returns error")
	return result.Path
}

func checkExecute(t *testing.T, client service.KclvmService, output string, id int64) {
	value := strconv.FormatInt(id, 10)
	result, err := client.ExecArtifact(&gpyrpc.ExecArtifact_Args{
		ExecArgs: &gpyrpc.ExecProgram_Args{
			Args: []*gpyrpc.CmdArgSpec{{
				Name:  "foo",
				Value: value,
			}},
		},
		Path: output,
	})
	require.NoError(t, err, "ExecArtifact returns errors")

	var config struct {
		V int64 `json:"v"`
	}
	err = json.Unmarshal([]byte(result.JsonResult), &config)
	require.NoError(t, err, "Can't parse configuration")

	require.Equal(t, id, config.V)
}

2. What did you expect to see? (Required)

All test passed

3. What did you see instead (Required)

Errors like:

        kcl_concurrency_issue_test.go:72: 
            	Error Trace:	kcl_concurrency_issue_test.go:72
            	            				kcl_concurrency_issue_test.go:36
            	            				/usr/lib/go-1.22/src/runtime/asm_amd64.s:1695
            	Error:      	Received unexpected error:
            	            	already borrowed: BorrowMutError
            	Test:       	Test/parallel_execution
            	Messages:   	ExecArtifact returns errors

Or crash like:

malloc_consolidate(): unaligned fastbin chunk detected
SIGABRT: abort
PC=0x7d947a8969fc m=9 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 14 gp=0xc000232fc0 m=9 mp=0xc000480008 [syscall]:
runtime.cgocall(0x8d1260, 0xc000496c68)
	/usr/lib/go-1.22/src/runtime/cgocall.go:157 +0x4b fp=0xc000496c40 sp=0xc000496c08 pc=0x40a9eb
kcl-lang.io/kcl-go/pkg/native._Cfunc_kclvm_service_call(0x7d941a996d76, 0x4934ce0, 0x7d9410000d50, 0x7d9411005560)
	_cgo_gotypes.go:106 +0x4c fp=0xc000496c68 sp=0xc000496c40 pc=0x8c4c2c
...

4. What is your KCL components version? (Required)

kcl-lang.io/kcl-go v0.8.2

@Peefy
Copy link
Contributor

Peefy commented Mar 28, 2024

Sorry, I did not mark on this API that it is indeed not thread safe, this is a feature to be implemented.

@Peefy Peefy added this to the v0.9.0 Release milestone Mar 28, 2024
@Peefy Peefy added api Issues or PRs related to kcl rust native APIs and multi-lang APIs enhancement New feature or request labels Mar 28, 2024
@bozaro
Copy link
Author

bozaro commented Mar 28, 2024

I tried to research this problem a little (and update testcase repo).

It looks like parallel execution breaks down in at least the following cases:

  • If the same .so file is called (it seems dlopen loads it into the same address space and uses global variables). I can workaround this by copying the file with different name (hard link is not enough).
  • If plugins are used. I do not know how to workaround this problem.

@Peefy
Copy link
Contributor

Peefy commented Mar 28, 2024

You are right! They use the same address and global variables to share the KCL runtime, which currently does not support concurrent running. We are working hard to solve this problem, which is a bit tricky, similar to the parallelism issue of Python GIL and JVM, but it is expected that the 0.9 version KCL runtime will support concurrency.

@Peefy Peefy self-assigned this Mar 28, 2024
bozaro added a commit to bozaro/kcl that referenced this issue Apr 18, 2024
```
cargo test --package kclvm-api --features llvm -- --nocapture
```
@bozaro
Copy link
Author

bozaro commented Apr 22, 2024

Since version 0.8.4+, the BuildProgram method has also became non-thread-safe.
In version 0.8.3, the BuildProgram successfully works for us with parallelism.
Unfortunatelly, I was unable to create an example for stable reproducing BuildProgram concurrency issue.

@Peefy
Copy link
Contributor

Peefy commented Apr 22, 2024

Thank you for your feedback. I will investigate it.

I did modify the build logic before two versions, but added locks for different build threads. 😷

@bozaro
Copy link
Author

bozaro commented Apr 26, 2024

Looks there are some memoty corruption between 0.8.3 and 0.8.4+.
I add mutex to every BuildProgram, ParseFile and ExecArtifact call. And application still crash after 0.8.3 -> 0.8.4 upgrade.

@Peefy
Copy link
Contributor

Peefy commented Apr 26, 2024

Thank you very much for your feedback. I will carefully review the recent updates to identify factors. It is expected that version 0.8.6 will be released next week.

@bozaro
Copy link
Author

bozaro commented May 13, 2024

Looks there are some memoty corruption between 0.8.3 and 0.8.4+.
I add mutex to every BuildProgram, ParseFile and ExecArtifact call. And application still crash after 0.8.3 -> 0.8.4 upgrade.

I found my crashes-after-upgrade root cause: kcl-lang/lib#73 (I have locally applications with differ kcl version).

@Peefy Peefy changed the title ExecArtifact crash on concurrency execution (native API) [Bug] ExecArtifact crash on concurrency execution (native API) May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Issues or PRs related to kcl rust native APIs and multi-lang APIs enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants