Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test TestBatchProcessor/ForceFlush/ErrorPartialFlush #5342

Open
pellared opened this issue May 9, 2024 · 2 comments
Open

Flaky test TestBatchProcessor/ForceFlush/ErrorPartialFlush #5342

pellared opened this issue May 9, 2024 · 2 comments
Labels
area:logs Part of OpenTelemetry logs bug Something isn't working help wanted Extra attention is needed pkg:SDK Related to an SDK package

Comments

@pellared
Copy link
Member

pellared commented May 9, 2024

go test -timeout 60s -short ./sdk/log/...
panic: test timed out after 1m0s
running tests:
	TestBatchProcessor (1m0s)
	TestBatchProcessor/ForceFlush (1m0s)
	TestBatchProcessor/ForceFlush/ErrorPartialFlush (1m0s)

goroutine 229 [running]:
testing.(*M).startAlarm.func1()
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:2366 +0x38c
created by time.goFunc
	/opt/hostedtoolcache/go/1.22.3/x64/src/time/sleep.go:177 +0x2f

goroutine 1 [chan receive, 1 minutes]:
testing.(*T).Run(0x9542008, {0x8287f28, 0x12}, 0x8298f74)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1750 +0x3d8
testing.runTests.func1(0x9542008)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:2161 +0x45
testing.tRunner(0x9542008, 0x9523dfc)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1689 +0x125
testing.runTests(0x9410260, {0x8439600, 0x2a, 0x2a}, {0xc1875a91adb24fc1, 0xdf8516329, 0x843c1a0})
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:2159 +0x39d
testing.(*M).Run(0x9472370)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:2027 +0x6d8
main.main()
	_testmain.go:137 +0x141

goroutine 15 [chan receive]:
testing.(*T).Run(0x9542a08, {0x8285381, 0xa}, 0x94102a0)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1750 +0x3d8
go.opentelemetry.io/otel/sdk/log.TestBatchProcessor(0x9542a08)
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/batch_test.go:317 +0x215
testing.tRunner(0x9542a08, 0x8298f74)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1689 +0x125
created by testing.(*T).Run in goroutine 1
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1742 +0x3b9

goroutine 258 [chan receive]:
testing.(*T).Run(0x9542408, {0x82879b5, 0x11}, 0x94a80b0)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1750 +0x3d8
go.opentelemetry.io/otel/sdk/log.TestBatchProcessor.func7(0x9542408)
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/batch_test.go:343 +0xe9
testing.tRunner(0x9542408, 0x94102a0)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1689 +0x125
created by testing.(*T).Run in goroutine 15
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1742 +0x3b9

goroutine 17 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.exportSync.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter.go:129 +0xf2
created by go.opentelemetry.io/otel/sdk/log.exportSync in goroutine 16
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter.go:127 +0xce

goroutine 18 [select]:
go.opentelemetry.io/otel/sdk/log.(*BatchProcessor).poll.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/batch.go:145 +0x119
created by go.opentelemetry.io/otel/sdk/log.(*BatchProcessor).poll in goroutine 16
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/batch.go:140 +0x15f

goroutine 20 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 19
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 35 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 34
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine [178](https://github.com/open-telemetry/opentelemetry-go/actions/runs/9019965885/job/24784041133?pr=5338#step:4:179) [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 177
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 208 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 207
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 242 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.TestBatchProcessor.func7.2(0x9644108)
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/batch_test.go:394 +0x51c
testing.tRunner(0x9644108, 0x94a80b0)
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1689 +0x125
created by testing.(*T).Run in goroutine 258
	/opt/hostedtoolcache/go/1.22.3/x64/src/testing/testing.go:1742 +0x3b9

goroutine 212 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 211
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 223 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 222
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 245 [select, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.(*BatchProcessor).poll.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/batch.go:145 +0x119
created by go.opentelemetry.io/otel/sdk/log.(*BatchProcessor).poll in goroutine 242
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/batch.go:140 +0x15f

goroutine 227 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 226
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 219 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine [218](https://github.com/open-telemetry/opentelemetry-go/actions/runs/9019965885/job/24784041133?pr=5338#step:4:219)
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 260 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 259
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 243 [chan receive, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.run.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:63 +0x9a
created by go.opentelemetry.io/otel/sdk/log.run in goroutine 242
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:59 +0x81

goroutine 244 [select, 1 minutes]:
go.opentelemetry.io/otel/sdk/log.(*testExporter).Export(0x964a030, {0x82ddb20, 0x98dc000}, {0x979f448, 0x1, 0x1})
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter_test.go:86 +0x119
go.opentelemetry.io/otel/sdk/log.(*timeoutExporter).Export(0x94a80f0, {0x82dda08, 0x8468620}, {0x979f448, 0x1, 0x1})
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter.go:119 +0xa3
go.opentelemetry.io/otel/sdk/log.chunkExporter.Export({{0x82dd7d4, 0x94a80f0}, 0x1}, {0x82dda08, 0x8468620}, {0x979f448, 0x1, 0x1})
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter.go:89 +0xda
go.opentelemetry.io/otel/sdk/log.exportData.DoExport({{0x82dda08, 0x8468620}, {0x979f448, 0x1, 0x1}, 0x0}, 0x94dcfdc)
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter.go:156 +0x50
go.opentelemetry.io/otel/sdk/log.exportSync.func1()
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter.go:130 +0xde
created by go.opentelemetry.io/otel/sdk/log.exportSync in goroutine [242](https://github.com/open-telemetry/opentelemetry-go/actions/runs/9019965885/job/24784041133?pr=5338#step:4:243)
	/home/runner/work/opentelemetry-go/opentelemetry-go/sdk/log/exporter.go:127 +0xce
FAIL	go.opentelemetry.io/otel/sdk/log	60.010s
ok  	go.opentelemetry.io/otel/sdk/log/logtest	0.003s
FAIL
@pellared pellared added bug Something isn't working pkg:SDK Related to an SDK package area:logs Part of OpenTelemetry logs labels May 9, 2024
@MrAlias MrAlias added the help wanted Extra attention is needed label May 9, 2024
@amanakin
Copy link
Contributor

Hi folks!
Digged a little bit and found race condition in test exporter
Here we waiting when one export starts (and locks because of ExportTrigger higher in this func).

assert.Eventually(t, func() bool {

But also we assume, that another record already in buffer chan (because if buffer chan is empty, ForceFlush will write records to it and won't return expected error. That is what happens when test fails).
input buffer chan:
input := make(chan exportData, size)

However, in this goroutine when EnqueueExport with first record ended and Exporter fetched it, it is not necessary that second record will be in buffer chan.

To reproduce failed test try this from sdk/log/ dir:
go test -v -timeout 10s -short -count=100 ./... -run='TestBatchProcessor/ForceFlush/ErrorPartialFlush'

To fix this, for example, we could add trigger to assert.Eventually, that buffer chan is full. Or make buffer chan size configurable (see

// TODO: explore making the size of this configurable.
) and set size 0.

(In this particular test with newBufferExporter(exporter, 0) test always completes successfully)

@amanakin
Copy link
Contributor

@MrAlias @pellared What do you think about adding BatchOption for BufferExporter size (like mentioned in #5238)? With that we could fix this test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logs Part of OpenTelemetry logs bug Something isn't working help wanted Extra attention is needed pkg:SDK Related to an SDK package
Projects
Status: Todo
Development

No branches or pull requests

3 participants