Tree can't serialize []byte as keys #113

Open
cryptix opened this issue Jun 13, 2019 · 5 comments

@cryptix

cryptix commented Jun 13, 2019

I have a large dataset of 32-byte hashes that I need to search. Loading and searching worked, but when I serialize the tree to JSON it can't be reloaded, because the keys get turned into strings like "[1 2 3 4 5]". I guess it uses fmt.Sprintf("%v", s). The right thing to do would be to marshal them as base64.

This is bad. Why is there a custom JSON marshaller at all? And why not just implement https://godoc.org/encoding#BinaryMarshaler? Then you get all kinds of marshalling for free as well.
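For reference, a minimal stdlib-only sketch of the two behaviours described above. It does not call into the tree; whether the tree's ToJSON really formats keys with %v is the guess from this comment, not something confirmed here. The helper names sprintKey and jsonKey are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sprintKey formats a key with fmt's %v verb, which prints a []byte
// as a bracketed list of decimal numbers.
func sprintKey(key []byte) string {
	return fmt.Sprintf("%v", key)
}

// jsonKey marshals a key with encoding/json, which encodes []byte
// values as a base64 string.
func jsonKey(key []byte) (string, error) {
	out, err := json.Marshal(key)
	return string(out), err
}

func main() {
	key := []byte{1, 2, 3, 4, 5}

	fmt.Println(sprintKey(key)) // [1 2 3 4 5]

	encoded, err := jsonKey(key)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded) // "AQIDBAU="

	// The base64 form round-trips back into the original []byte.
	var decoded []byte
	if err := json.Unmarshal([]byte(encoded), &decoded); err != nil {
		panic(err)
	}
	fmt.Println(decoded) // [1 2 3 4 5]
}
```

The %v output is readable but cannot be unmarshalled back into a []byte, which matches the reload failure described above.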

@emirpasic
Owner

@cryptix

most serialization is done using the native (de)serialization, e.g. https://github.com/emirpasic/gods/blob/master/lists/arraylist/serialization.go . however, in some cases i needed more control over the (de)serialization and had to implement it myself. this implementation does not differ much from how other libraries do it; there is really nothing magical about it, just basic string/binary manipulation.

could you give me a simple example so i can reproduce your problem, and please describe what the expected output should be?

i am not sure what you mean by 32-byte hashes, but i'll assume you are talking about an array of bytes? there isn't a general consensus on how to (de)serialize binary data into JSON. you mention base64, why? it could also be something else, e.g. base32 or a hexadecimal string representation. this project tries to stay unopinionated in that regard, flexible and generic, i.e. you may need base64 (i like it too), but that doesn't mean that everyone is working with base64.

there are examples here of how to implement a custom (de)serialization that suits your needs, and i can point you in the right direction if you'd like and if i have understood your problem.

cheers

@cryptix
Author

cryptix commented Jun 13, 2019

Hey @emirpasic, thanks for the quick response! I only mentioned base64 since that is what the stdlib does when it encounters a []byte in a struct field when marshaling to JSON. I wouldn't mind a more compact format; gob would also suffice, for instance. I just need it to persist for my own application when shutting down and starting again. It's not meant as an exchange format or anything.

Here is some contrived code with hardcoded examples of what I tried before opening the issue. I just shortened the values because their length doesn't matter.

func main() {
	// msg.Hash is a []byte of the same size as the keys in the tree
	msg, err := ssb.ParseMessageRef(os.Args[1])
	check(err)

	tree := btree.NewWith(3, bytesCompare)

	b, err := ioutil.ReadFile("tree.json")
	if os.IsNotExist(err) {
		fmt.Println("rebuilding tree...")

		vals := []struct {
			Hash []byte
			Int  int
		}{
			{[]byte{1, 2, 3, 4, 5}, 23},
			{[]byte{6, 6, 6, 6, 6}, 42},
			{[]byte{0, 0, 0, 0, 1}, 131},
			{[]byte{0, 0, 0, 0, 2}, 555},
		}

		start := time.Now()
		i := 0
		for _, val := range vals {
			tree.Put(val.Hash, val.Int)
			i++ // count inserted elements for the report below
		}

		fmt.Printf("building took %v for %d elements\n", time.Since(start), i)
		fmt.Println("size:", tree.Size())

		b, err := tree.ToJSON()
		check(err)
		check(ioutil.WriteFile("tree.json", b, 0600))
	} else {
		err = tree.FromJSON(b)
		check(err)
		fmt.Println("loaded existing tree")
	}

	kv, ok := tree.Get(msg.Hash)
	fmt.Println(ok, kv)

}

func bytesCompare(a, b interface{}) int {
	bytesA := a.([]byte)
	bytesB := b.([]byte)
	return bytes.Compare(bytesA, bytesB)
}

@emirpasic
Owner

@cryptix i will look into this later in more detail and propose a solution, or fix the bug if it is one.

however, judging from what you need, i would suggest a custom binary (de)serialization for this, unless you really need it to be json for readability? binary would make the (de)serialization a lot faster and more space efficient. you could also implement custom compression this way to make it even smaller (varints, delta compression, etc.).

i intended to implement a binary (de)serialization for all the structures, but unfortunately never found the time.

i frankly have no strong opinion on how to json-encode binary data and i am not sure there is a "one size fits all" solution to this. another issue i am having: if i encode it as a string (e.g. exact hexadecimal representation or base64), how would the deserialization know that it should be deserialized as an array of bytes when it only sees a string?

this could all be solved with the binary (de)serialization i was planning, which would hold some metadata internally in order to deserialize to the same types that were serialized.
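To make the type-metadata idea concrete, here is a small sketch of a tagged envelope, shown in JSON for readability rather than in a binary format. Nothing here is part of the library: taggedKey, encodeKey, decodeKey and the "bytes"/"string" tags are all invented for illustration, and only two key types are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taggedKey stores a key together with the name of its original type,
// so the decoder knows what to turn the value back into.
type taggedKey struct {
	Type  string          `json:"type"`
	Value json.RawMessage `json:"value"`
}

// encodeKey wraps a []byte or string key in a taggedKey envelope.
func encodeKey(key interface{}) ([]byte, error) {
	var tag string
	switch key.(type) {
	case []byte:
		tag = "bytes"
	case string:
		tag = "string"
	default:
		return nil, fmt.Errorf("unsupported key type %T", key)
	}
	raw, err := json.Marshal(key)
	if err != nil {
		return nil, err
	}
	return json.Marshal(taggedKey{Type: tag, Value: raw})
}

// decodeKey restores the key with the type recorded in the envelope.
func decodeKey(data []byte) (interface{}, error) {
	var tk taggedKey
	if err := json.Unmarshal(data, &tk); err != nil {
		return nil, err
	}
	switch tk.Type {
	case "bytes":
		var b []byte
		err := json.Unmarshal(tk.Value, &b)
		return b, err
	case "string":
		var s string
		err := json.Unmarshal(tk.Value, &s)
		return s, err
	}
	return nil, fmt.Errorf("unknown key type %q", tk.Type)
}

func main() {
	data, err := encodeKey([]byte{1, 2, 3, 4, 5})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // {"type":"bytes","value":"AQIDBAU="}

	key, err := decodeKey(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %v\n", key, key) // []uint8 [1 2 3 4 5]
}
```

A binary format could carry the same information as a one-byte type tag in front of each value.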

@cryptix
Author

cryptix commented Jun 13, 2019

Yup, binary would totally work for me! If you point me to an example of that I'm sure I can take the next steps by myself.

@emirpasic
Owner

@cryptix

here is one of the few examples where i had to implement a custom serialization in order to have more control over the ordering of keys in an ordered map, because the default behavior did not suffice.

https://github.com/emirpasic/gods/blob/master/maps/linkedhashmap/serialization.go

it basically writes and reads text (json), so it should be easy to implement reading/writing of binary data instead. as you suggested, make use of the native BinaryMarshaler and BinaryUnmarshaler to conform to go's ideology and keep things simple.

please let me know how this goes and paste the code here so we can make use of it.
