Improve filtering performance of Stream #440
Conversation
Are we going to include this in 0.6?
@ButterBright Please fix the e2e tests.
I will leave it to @hanahmily whether we include this in 0.6, or cut 0.6 first and do more testing on this commit.
I prefer to hold this change for a while. Since this is a performance improvement, we should introduce benchmark testing in addition to the traditional UTs and E2E tests.
@ButterBright @hanahmily Let's think about how we could build tests to verify this. @hanahmily Is your local benchmark suitable for this case?
Existing stress tests could also verify it, but we need a lightweight benchmark suite for this job. I have discussed with @ButterBright adding some benchmarks to the package.
* Add order_asc and order_desc cases for stream
* Polish stream query and filtering
banyand/measure/query.go
blankCursorList := []int{}
var mu sync.Mutex
var wg sync.WaitGroup
You could use a single channel to implement it:
resultsChan := make(chan int, len(qr.data))
...
if qr.loadData(i, tmpBlock) {
if qr.orderByTimestampDesc(i) {
qr.data[i].idx = len(qr.data[i].timestamps) - 1
}
resultsChan <- -1 // Indicate success
} else {
resultsChan <- i // Indicate failure with the index
}
...
var blankCursorList []int
completed := 0
// Process results from all goroutines.
for completed < len(qr.data) {
result := <-resultsChan
if result != -1 {
blankCursorList = append(blankCursorList, result)
}
completed++
}
close(resultsChan) // Close the channel as we're done with it.
qr.data[i].idx = len(qr.data[i].timestamps) - 1
}
wg.Add(1)
go func(i int) {
Please limit the maximum number of goroutines to twice the number of CPU cores reported by GOMAXPROCS. See the Go runtime documentation for more information about GOMAXPROCS.
banyand/stream/block.go
return false
idxList := make([]int, 0)
var start, end int
if applyFilter {
The "applyFilter" flag is unnecessary. Instead, you can simply check whether "expectedTimestamps" is nil.
if pl.IsEmpty() {
	continue
}
Please move it up a bit.
banyand/stream/query.go
defer releaseBlock(tmpBlock)
blankCursorList := []int{}
var mu sync.Mutex
var wg sync.WaitGroup
The same as the measure's query.
@ButterBright The benchmark result shows a significant improvement. Can you analyze the benchmark to understand why the allocation count is 0? This could provide valuable insights.
Sure.
The specification of the test environment is:
OS: Ubuntu 22.04.3 LTS x86_64
CPU: AMD EPYC 7282 (4) @ 2.794GHz
Memory: 7951MiB
The parameters for the benchmark test are:
{batchCount: 2, timestampCount: 500, seriesCount: 100, tagCardinality: 10, startTimestamp: 1, endTimestamp: 1000, scenario: "large-scale"}
{batchCount: 2, timestampCount: 500, seriesCount: 100, tagCardinality: 10, startTimestamp: 900, endTimestamp: 1000, scenario: "latest"}
{batchCount: 2, timestampCount: 500, seriesCount: 100, tagCardinality: 10, startTimestamp: 300, endTimestamp: 400, scenario: "historical"}
Query benchmark results before optimization:
BenchmarkFilter/filter-4 1 4685496299 ns/op 2221001152 B/op 9874080 allocs/op
BenchmarkFilter/filter#01-4 1000000000 0.4689 ns/op 0 B/op 0 allocs/op
BenchmarkFilter/filter#02-4 1 1107566728 ns/op 221140904 B/op 990019 allocs/op
Query benchmark results after optimization:
BenchmarkFilter/filter-large-scale-4 1000000000 0.06766 ns/op 0 B/op 0 allocs/op
BenchmarkFilter/filter-latest-4 1000000000 0.01630 ns/op 0 B/op 0 allocs/op
BenchmarkFilter/filter-historical-4 1000000000 0.01633 ns/op 0 B/op 0 allocs/op
Memory allocation details of Filter before optimization:
Memory allocation details of Filter after optimization:
Memory allocation details of Pull after optimization:
The improvement is significant and evident.