Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What about maps ? #16

Open
stephanenicolas opened this issue Jan 14, 2017 · 9 comments
Open

What about maps ? #16

stephanenicolas opened this issue Jan 14, 2017 · 9 comments

Comments

@stephanenicolas
Copy link

It would be nice to serialize maps, maybe as a simple 2D array ?

@pascaldekloe
Copy link
Owner

To keep the unmarshaller performant you'd need access to the key-value pairs one by one. Generally speaking maps cause performance degradation and they are overused with I/O out of lazyness.

I'd like to make the choice a compilation option such that one can have maps only where needed and only when possible for the respective language.

  • C has no native maps
  • Go has limited struct key support
  • JavaScript requires string keys

How about a compiler flag to specify the key in struct lists? When the struct has two fields then the other field will be used as a value. Otherwise the entire struct is.

@stephanenicolas
Copy link
Author

stephanenicolas commented Jan 14, 2017 via email

@pascaldekloe
Copy link
Owner

I like this read-only map idea. The problem is that unmarshaling would rely on the data for the uniqueness constraint which goes against the resilient against mallicious input principal. Imagine a Map of size n with a #keySet() of size n - 1.
Plus some languages may not be able to deliver uniquene keys for some types without a ton of work. The development has been just me in my spare time until now.

What I can do is make this work for Java as stated before. Duplicate keys would disappear (in the HashMap) during the unmarshaling. The struct list wire format has a size prefix which should help allocation.

On second thought with the forward compatibility in mind this value extraction idea for the cases with 2 struct fields might be a bit whobly b.t.w..

@mzaks
Copy link

mzaks commented Mar 11, 2017

It would be possible to do the same trick as FlatBuffers does. Annotate one field of the struct as key and if another struct has a field which is list of structs with key annotated struct, it will unmarshal it as a Map. When you marshal such a struct with dictionary you can marshal just the array of values. Performance should be fine I guess.

@pascaldekloe
Copy link
Owner

Sounds good @mzaks. With the key declaration in the schema the compiler can make explicit declarations about it's uniqueness on platforms without map support too.

@mzaks
Copy link

mzaks commented Mar 13, 2017

One more consideration from my side. While FlatBuffers is marshalling, it sorts the array by key and it provides accessor by key methods which internally do binary search on the array to have a fast lookup. This way FlatBuffers mimics map semantics by providing O log(n) for average lookup time.

This strategy totally makes sense in FlatBuffers but is a bit more "complicated" for colfer.
In FlatBuffers there is no unmarshmaling step. Buffers are build in the way which gives user random access directly into the buffer. This is why lookup by key is a cool feature to have. And it also implies that the data is read only.

Colfer has an unmarshmaling step and the resulting objects are mutable. It still fits for languages which have a natural Map concept like for example Java and JavaScript, but for languages which don't have such concept e.g. C the generated struct would need to hide the backing arrays and provide add, remove, get, has accessor methods. Which internally perform binary search and keep the underlying array sorted. It also implies that even for languages which support Map, the marshalling step will be slower because they would need to store the array sorted by key.

All in all it is possible to implement maps in colfer, but IMHO, there should be a cost/value consideration. I see colfer as a format to describe data transfer objects. AFAIK maps are not common for data transfer objects.

@pascaldekloe
Copy link
Owner

Interesting @mzaks! So by offering only those 4 map operators you bypass the issue of malformed data.

Sorted keys do give better performance with unmarshalling, especially when you're using an array instead of tree nodes. It is quite easy to detect unsorted data in which case you sort after unmarshaling the entire list.

Writing your own map alike functions is quite a bit more work yet it seems like the right thing to do ™️. Go's sort.Interface and java.utils.Arrays#sort should help a little.

@mzaks
Copy link

mzaks commented Mar 13, 2017

Well you can even use SortedMap implementation in languages which support it, like for example Javas TreeMap. In JavaScript it would imply foot work.

@pascaldekloe
Copy link
Owner

The problem with java.util.(Sorted)Map is that it offers methods such as #size() and #keySet() which require the map to be consistent all the time. If not users could suffer from serious bugs and even security issues. Sure java.util.TreeMap does that yet it is notoriously slow and the unmarshaller needs another error for receiving duplicates. All Java collections (including the hash based) allocate memory like it's Christmas. With the array backing sorted set you can do operations on the array directly and sort once if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants