Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try a branchless approach for the union of two sorted arrays (array containers) #327

Open
lemire opened this issue Jul 14, 2021 · 1 comment

Comments

@lemire
Copy link
Member

lemire commented Jul 14, 2021

See blog post Faster sorted array unions by reducing branches

Related to PR #301

credit @puzpuzpuz

@lemire lemire changed the title Try a branchless approach for the union of two sorted arrays Try a branchless approach for the union of two sorted arrays (array containers) Jul 14, 2021
@puzpuzpuz
Copy link

puzpuzpuz commented Jul 15, 2021

Run a new real data benchmark with the following changes
puzpuzpuz@f127594 (see #301 for more context)

The benchmark looks like the following (in theory, it may yield to perfect branch prediction if the bitmaps under test are small enough):

func BenchmarkRealDataOr(b *testing.B) {
	benchmarkRealDataAggregate(b, func(bitmaps []*Bitmap) uint64 {
		t := uint64(0)
		for i := 1; i < len(bitmaps); i++ {
			t += Or(bitmaps[i-1], bitmaps[i]).GetCardinality()
		}
		return t
	})
}

Environment: Go 1.16.5, i5-8300h, BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataOr -count 5 -run - | tee new.txt
Baseline: union2by2 implementation that can be found here https://github.com/RoaringBitmap/roaring/blob/ff33c3b226c3ac033bf1a0b0f3ed647fc9cd2efa/setutil_generic.go

$ benchstat --delta-test none -sort -delta  old.txt new.txt
name                                 old time/op  new time/op  delta
RealDataOr/wikileaks-noquotes_srt-8  1.21ms ± 9%  1.15ms ± 0%   -4.75%
RealDataOr/uscensus2000-8             470µs ± 3%   449µs ± 3%   -4.59%
RealDataOr/dimension_008-8           12.5ms ± 1%  12.0ms ± 1%   -4.45%
RealDataOr/dimension_003-8           11.8ms ± 1%  11.6ms ± 1%   -1.80%
RealDataOr/wikileaks-noquotes-8      2.04ms ± 3%  2.01ms ± 0%   -1.77%
RealDataOr/dimension_033-8           1.05ms ± 4%  1.04ms ± 0%   -1.61%
RealDataOr/census1881_srt-8          1.79ms ± 0%  1.78ms ± 0%   -0.12%
RealDataOr/weather_sept_85_srt-8     6.70ms ± 0%  6.79ms ± 0%   +1.37%
RealDataOr/census-income_srt-8       3.08ms ± 0%  3.16ms ± 1%   +2.53%
RealDataOr/census-income-8           3.09ms ± 1%  3.50ms ± 1%  +13.24%
RealDataOr/census1881-8              1.38ms ± 0%  1.62ms ± 0%  +17.44%
RealDataOr/weather_sept_85-8         10.5ms ± 1%  13.2ms ± 0%  +25.56%

Not all of the real data scenarios assume array bitmap representation, but, for instance, weather_sept_85 and census1881 use it. So, it seems to be that this particular branchless code is not worth it. However, this doesn't mean that someone won't be able to improve it and proof the execution time improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants