"error in roaringArray.readFrom: did not find expected serialCookie in header" when reading a bitmap written by roaring64 #409

Open
wjohnson-aurora opened this issue Jan 23, 2024 · 5 comments
@wjohnson-aurora
Contributor

wjohnson-aurora commented Jan 23, 2024

We've occasionally seen the following error when using roaring64.Bitmap.ReadFrom to read data written by roaring64.Bitmap.WriteTo:

error in roaringArray.readFrom: did not find expected serialCookie in header

I was able to find random data that replicates the error. To replicate:

  1. Download the data that replicates the error (5.4 MB of random uint64s): roaring_error_items.txt
  2. Create main.go containing the following code:
package main

import (
	"bufio"
	"bytes"
	"os"
	"strconv"

	"github.com/RoaringBitmap/roaring/roaring64"
)

func main() {
	var items []uint64

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()

		item, err := strconv.ParseUint(line, 10, 64)
		if err != nil {
			panic(err)
		}

		items = append(items, item)
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}

	bitmap := roaring64.NewBitmap()
	for _, item := range items {
		bitmap.Add(item)
	}

	var bitmapBuf bytes.Buffer
	if _, err := bitmap.WriteTo(&bitmapBuf); err != nil {
		panic(err)
	}

	readBitmap := roaring64.NewBitmap()
	if _, err := readBitmap.ReadFrom(&bitmapBuf); err != nil {
		panic(err)
	}
}
  3. Create go.mod with the following contents:
module roaring-replication

go 1.20

require github.com/RoaringBitmap/roaring v1.7.0

require (
	github.com/bits-and-blooms/bitset v1.12.0 // indirect
	github.com/mschoch/smat v0.2.0 // indirect
)
  4. Run go mod tidy
  5. Run with:
cat roaring_error_items.txt | go run main.go
  6. Observe error from ReadFrom:
panic: error in roaringArray.readFrom: did not find expected serialCookie in header

goroutine 1 [running]:
main.main()
	main.go:42 +0x27d
wjohnson-aurora added a commit to wjohnson-aurora/roaring-issue-409 that referenced this issue Jan 23, 2024
@grantwwu

Note: sorting the input file seems to make replication much faster. Also, here's a smaller test case (also found by @wjohnson-aurora):

roaring_error_items_2_sorted.txt

@lemire
Member

lemire commented Jan 23, 2024

The bug is absolutely real.

The issue is that when deserializing a 64-bit roaring bitmap, for some reason, the code first tries to deserialize a 32-bit version. I don't know why it is done, but in the instances you have created, it gets confused. It thinks it is dealing with a 32-bit bitmap, and then everything breaks after that.

(Of course, it is not: you serialized a 64-bit roaring bitmap.)

lemire added a commit that referenced this issue Jan 23, 2024
…to deserialize a 32-bit bitmap. It would usually work, but sometimes it would fail (see #409) because the number of containers would be recognized as a cookie.

This PR just disables this attempt at supporting 32-bit bitmaps loading from 64-bit bitmaps.
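
For readers following along, here is a rough sketch of how a container count can be mistaken for a cookie. It is not the library's code. It assumes that the 64-bit serialization starts with a little-endian uint64 count, and that the 32-bit detection compares the leading bytes against the two cookie values from the Roaring format specification (12346 and 12347). The helper names looksLike32BitHeader and header64 are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Cookie values defined by the 32-bit Roaring serialization format.
const (
	serialCookieNoRunContainer = 12346
	serialCookie               = 12347
)

// looksLike32BitHeader is a hypothetical stand-in for the kind of check a
// reader might use to detect a 32-bit bitmap. It is an assumption made for
// illustration, not the actual roaring64 implementation.
func looksLike32BitHeader(header []byte) bool {
	first := binary.LittleEndian.Uint32(header[:4])
	return first == serialCookieNoRunContainer || first&0xFFFF == serialCookie
}

// header64 builds the assumed first 8 bytes of a serialized 64-bit bitmap:
// the count n written as a little-endian uint64.
func header64(n uint64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, n)
	return buf
}

func main() {
	// Most counts are harmless, but a count whose low 16 bits equal
	// 12347 (or a count of exactly 12346) collides with a cookie.
	for _, n := range []uint64{1000, 12346, 12347, 65536 + 12347} {
		fmt.Printf("count %d looks like a 32-bit header: %v\n",
			n, looksLike32BitHeader(header64(n)))
	}
}
```

Under these assumptions, only the first count (1000) is reported as not matching. That is consistent with the failure being rare on random data and depending on how many containers the bitmap happens to hold.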
@lemire
Member

lemire commented Jan 23, 2024

Feel free to review my potential fix at #410

Note that the data is not corrupted or any such thing. It is just that the code gets confused at the deserialization stage.

@lemire lemire added the bug label Jan 23, 2024
@lemire
Member

lemire commented Jan 23, 2024

In fact, a review would be much appreciated.

@wjohnson-aurora
Contributor Author

Thank you for fixing this so quickly! I can confirm that commit 94aeb2b resolves this issue.
