Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support re-use of existing arrays if of same size. #30

Open
ImbanessItself opened this issue Jun 21, 2017 · 6 comments
Open

Support re-use of existing arrays if of same size. #30

ImbanessItself opened this issue Jun 21, 2017 · 6 comments

Comments

@ImbanessItself
Copy link

This is Java-specific, I'm not sure how other generations work. When unmarshalling arrays (any list or byte array) a new array is always constructed. This needlessly creates garbage when the old array has the exact same size as the new array and can be directly re-used. This can be a common use-case for some applications (fixed size images or buffers, for example).

@pascaldekloe
Copy link
Owner

Great idea and this could work for several languages too.

First a word of warning for reuse. For plain serialization Colfer is a diff against the zero value. That means fields with the zero value are skipped. For example when you reuse an object and the serialized data contains an integer field with the value 0 then that field would keep its previous value. This could also be used to support versioning as one could concatenate the diffs and have self contained document that holds its entire history.

Garbage reuse is seen with C++ under the name memory arena. In Java it could be done like the following.

/** Returns a new or reused instance. */
public static T newInstance()

/**
 * Releases the object including all field data for reuse.
 * Caution! Access to any of the mutable fields after
 * calling {@code free} may cause severe bugs.
 */
void free()

@ImbanessItself
Copy link
Author

The API is to allow existing references to instances to keep unmutated data? I think it's reasonable to assume that if you re-use an object by calling unmarshal on it then there is no guarantee that existing data will remain the same.

@pascaldekloe
Copy link
Owner

Fair point. I'm about to drop this diff / skip feature in the the upcoming format revision.

@inaneverb
Copy link

Hi there. Any update on this issue?

I have a bunch of objects, 90% size of which is binary data. The rest 10% is some metadata. Seeing an allocation with copying that slice during unmarshaling makes me sad. It's up to 10x slower than marshalling and sync.Pool in that case looks like useless toy.

Would it be bad if I'd try to implement this feature but only for Go?

@pascaldekloe
Copy link
Owner

Go-only (or any other language for that matter) is no problem @qioalice. Reading back my comments made me realise that the issue is not explained well.

The unmarshal methods in Go only set the fields present in the serial. For example, when a JSON document is missing some property, then the respective field value remains untouched, i.e. it keeps the previous value. Therefore unmarshal should only be used on new (zero) structs, unless you know what you are doing.

With Colfer, you could store the full revision history of a document quite efficiently with a just sequence of serials. Start with a zero struct (version 0) and then for each unmarshal you get the next version, all the way until the latest state. I did this for a project. Others may have too.

We need a way to declare the field data as free—ready to reuse. If any mutable field-data is still used after such free, then people may run into some serious trouble. I have a client who ran into a similar issue just this month. Even when you clearly document not to retain any data, some will still do so by accident. Such bugs can be hard to understand, especially if the behaviour is hidden in some generated code. 😬

I propose a very explicit API for such functionality so that the respective code advertises what is going on.

func (o *T) ReleaseAllForRecyle() {

… and:

func UnmarshalWithRecycling(serial []byte) (*T, error) {

@pascaldekloe
Copy link
Owner

The concurrency needed to make UnmarshalWithRecycling 👆 work may outweigh the savings made for many use-cases.

Then there is the memory "leak" issue. ColferSizeMax does help here, but most people will skip the math. 😄

All and all, lots of documentation will be required to describe good appliance of memory reuse.

Much simpler would be to leave the object reuse to the user. A func (o *T) UnmarshalWithDangerousReuse(serial []byte) error may be a better fit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants