Improve filtering performance of Stream #440
Conversation
Are we going to include this in 0.6?
@ButterBright Please fix the e2e tests.
I will leave it to @hanahmily whether we include this in 0.6, or cut 0.6 first and do more testing on this commit.
I prefer to hold this change for a while. Since this is a performance improvement, we should introduce benchmark testing in addition to the traditional UTs and E2E tests.
@ButterBright @hanahmily Let's think about how we could build tests to verify this. @hanahmily Is your local benchmark suitable for this case?
Existing stress tests could also verify it, but we need a lightweight benchmark suite for this job. I have discussed with @ButterBright adding some benchmarks to the package.
* Add order_asc and order_desc cases for stream
* Polish stream query and filtering
banyand/measure/query.go
blankCursorList := []int{}
var mu sync.Mutex
var wg sync.WaitGroup
You could use a single channel to implement it:
resultsChan := make(chan int, len(qr.data))
...
if qr.loadData(i, tmpBlock) {
if qr.orderByTimestampDesc(i) {
qr.data[i].idx = len(qr.data[i].timestamps) - 1
}
resultsChan <- -1 // Indicate success
} else {
resultsChan <- i // Indicate failure with the index
}
...
var blankCursorList []int
completed := 0
// Process results from all goroutines.
for completed < len(qr.data) {
result := <-resultsChan
if result != -1 {
blankCursorList = append(blankCursorList, result)
}
completed++
}
close(resultsChan) // Close the channel as we're done with it.
qr.data[i].idx = len(qr.data[i].timestamps) - 1
}
wg.Add(1)
go func(i int) {
Please limit the maximum number of goroutines to twice the number of CPU cores reported by GOMAXPROCS. See the Go runtime documentation for more information about GOMAXPROCS.
banyand/stream/block.go
return false
idxList := make([]int, 0)
var start, end int
if applyFilter {
The "applyFilter" flag is unnecessary. Instead, you can simply check whether "expectedTimestamps" is nil.
if pl.IsEmpty() {
	continue
}
Please move it up a bit.
banyand/stream/query.go
defer releaseBlock(tmpBlock)
blankCursorList := []int{}
var mu sync.Mutex
var wg sync.WaitGroup
The same as the measure's query.
@ButterBright The benchmark result shows a significant improvement. Can you analyze the benchmark to understand why the allocation count is 0? This could provide valuable insights.
Sure.
The specification of the test environment is:
OS: Ubuntu 22.04.3 LTS x86_64
CPU: AMD EPYC 7282 (4) @ 2.794GHz
Memory: 7951MiB
The parameters for the benchmark test are:
{batchCount: 2, timestampCount: 500, seriesCount: 100, tagCardinality: 10, startTimestamp: 1, endTimestamp: 1000, scenario: "large-scale"}
{batchCount: 2, timestampCount: 500, seriesCount: 100, tagCardinality: 10, startTimestamp: 900, endTimestamp: 1000, scenario: "latest"}
{batchCount: 2, timestampCount: 500, seriesCount: 100, tagCardinality: 10, startTimestamp: 300, endTimestamp: 400, scenario: "historical"}
Query benchmark results before optimization:
BenchmarkFilter/filter-4 1 4685496299 ns/op 2221001152 B/op 9874080 allocs/op
BenchmarkFilter/filter#01-4 1000000000 0.4689 ns/op 0 B/op 0 allocs/op
BenchmarkFilter/filter#02-4 1 1107566728 ns/op 221140904 B/op 990019 allocs/op
Query benchmark results after optimization:
BenchmarkFilter/filter-large-scale-4 1000000000 0.06766 ns/op 0 B/op 0 allocs/op
BenchmarkFilter/filter-latest-4 1000000000 0.01630 ns/op 0 B/op 0 allocs/op
BenchmarkFilter/filter-historical-4 1000000000 0.01633 ns/op 0 B/op 0 allocs/op
Memory allocation details of Filter before optimization:
Memory allocation details of Filter after optimization:
Memory allocation details of Pull after optimization:
The improvement is significant and evident.