Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow turning a few optimisations off for Kryo 5.4.0 #943

Open
chrisr3 opened this issue Mar 19, 2023 · 20 comments
Open

Allow turning a few optimisations off for Kryo 5.4.0 #943

chrisr3 opened this issue Mar 19, 2023 · 20 comments

Comments

@chrisr3
Copy link
Contributor

chrisr3 commented Mar 19, 2023

We are upgrading our code-base from Kryo 4.0.2 to 5.4.0, and have hit an issue when serialising a Collection that only contains instances of Collections$UnmodifiableRandomAccessList.

We have sub-classed Kryo to support serialising classes with writeResolve and readReplace methods; UnmodifiableRandomAccessList implements writeResolve like this:

private Object writeReplace() {
    return new UnmodifiableList<>(list);
}

Kryo 5's CollectionSerializer seems to have an optimisation for the case when all of a collection's elements are of the same type, whereby Kryo serializes the element type once, and then serializes all of the elements' object data afterwards. Unfortunately, UnmodifiableRandomAccessList.writeResolve transforms the elements into instances of UnmodifiableList, but Kryo is unable to discover this until it has invoked writeResolve at least once. This happens after it has incorrectly written UnmodifiableRandomAccessList as the element type to the output stream, of course.

I cannot see a way of working around this apart from ignoring UnmodifiableRandomAccessList.writeResolve, although I cannot ignore every writeResolve method because that would break lambda serialization. Ideally, I'd like to be able to configure CollectionsSerializer not to apply this optimisation at all.

@theigl
Copy link
Collaborator

theigl commented Mar 20, 2023

Hi @chrisr3. Could you create a minimal reproducer (ideally a unit test) for the issue you are facing?

I'm open to adding a flag to CollectionSerializer, but there might be other solutions.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 21, 2023

This sample project should demonstrate what is happening: kryo-example.tar.gz

@theigl
Copy link
Collaborator

theigl commented Mar 21, 2023

OK thanks!

I quickly glanced over the code:

Do you really need to globally override Kryo's behavior? Or would it be enough to create a custom serializer that uses writeResolve/readReplace internally and register it only for the classes where you actually want this behavior?

Lambda serialization is handled by ClosureSerializer that uses readResolve/writeReplace internally. So lambda serialization won't break if you change your global serialization logic.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 21, 2023

I think the problem there is that any class (including unknown "third party" ones) can potentially implement readReplace or writeResolve, so we cannot possibly know them all in advance.

@theigl
Copy link
Collaborator

theigl commented Mar 21, 2023

So you are serializing arbitrary classes in your case and have no control over which classes are serialized?

We could add a flag to CollectionSerializer that disables the optimization, but I'm not sure if any other Kryo user will ever need it.

If you want, you can create a PR for this and we can discuss it further.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 21, 2023

We could add a flag to CollectionSerializer that disables the optimization, but I'm not sure if any other Kryo user will ever need it.

Maybe a better and more general solution would be to allow us to override Kryo's built-in CollectionSerializer with one of our own?

@theigl
Copy link
Collaborator

theigl commented Mar 21, 2023

Can you try this:

addDefaultSerializer(Collection.class, MyCollectionSerializer.class);

I'm not 100% sure about the ordering of serializers right now, but this might work.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 21, 2023

Hmm, interesting. I wouldn't expect that to work because the built-in CollectionSerializer is already registered for Collection.class and so wouldn't MyCollectionSerialzier have lower precedence?

Is there a detailed explanation anywhere of how Kryo determines serializer precedence please?

@theigl
Copy link
Collaborator

theigl commented Mar 21, 2023

Default serializers registered after the built-in serializers should take priority. I just did a quick test and it looks like MyCollectionSerializer is chosen over CollectionSerializer. Please give it a try.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 21, 2023

Yes, I agree. However, I have now discovered that my example code is actually testing the ObjectArraySerializer and not the CollectionSerializer 🤦. It looks like both of these default serializers have this optimisation, and hence will need overriding. Are there any others I need to worry about please?


I have successfully added MyCollectionSerializer, but I also needed to re-add some other built-in serializers in order to prevent it from serializing all collection types:

addDefaultSerializer(EnumSet.class, EnumSetSerializer.class);
addDefaultSerializer(Collections.EMPTY_LIST.getClass(), CollectionsEmptyListSerializer.class);
addDefaultSerializer(Collections.EMPTY_SET.getClass(), CollectionsEmptySetSerializer.class);
addDefaultSerializer(Collections.singletonList(null).getClass(), CollectionsSingletonListSerializer.class);
addDefaultSerializer(Collections.singleton(null).getClass(), CollectionsSingletonSetSerializer.class);
addDefaultSerializer(TreeSet.class, TreeSetSerializer.class);
addDefaultSerializer(Collection.class, MyCollectionSerializer.class);

That's a little bit arcane... 🧙

@theigl
Copy link
Collaborator

theigl commented Mar 22, 2023

Yes, I agree. However, I have now discovered that my example code is actually testing the ObjectArraySerializer and not the CollectionSerializer 🤦. It looks like both of these default serializers have this optimisation, and hence will need overriding. Are there any others I need to worry about please?

I think these two are the only serializers that use this optimization.

I have successfully added MyCollectionSerializer, but I also needed to re-add some other built-in serializers in order to prevent it from serializing all collection types:

OK this is unfortunate. It might sense to add something like Kryo.replaceDefaultSerializer().

In the meantime, you could try this:

Kryo kryo = new Kryo() {
    public Serializer getDefaultSerializer(Class type) {
        final Serializer serializer = super.getDefaultSerializer(type);
        if (serializer instanceof CollectionSerializer<?>) {
            return new MyCollectionSerializer();
        }
        return serializer;
    }
};

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 22, 2023

Oh dear, the CollectionSerializer has a few sub-classes!

  • ArraysAsListSerializer
  • CopyForIterateCollectionSerializer
  • JdkImmutableListSerializer
  • JdkImmutableSetSerializer
  • PriorityQueueSerializer
  • TreeSetSerializer

All of these will need extra handling too if I replace CollectionSerializer with MyCollectionSerializer... 👷‍♂️.

Maybe explicit re-registration isn't so much worse after all?

@theigl
Copy link
Collaborator

theigl commented Mar 22, 2023

OK, this looks like a dead end.

It seems that in order to support your use case, we would indeed need some configuration options for both CollectionSerializer and ObjectArraySerializer.

Then you could override getDefaultSerializer and set the flag on these serializers and their sub-classes.

Do you want to create a PR? I'm still not sure about this, because you are probably the only one who needs this feature. But if we can think of a really good name for the flags and the code changes are minimal, I might merge it.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 22, 2023

OK, I'll try that. I've also experimented with a CollectionSerializerAdapter that can be applied via:

Kryo kryo = new Kryo() {
    public Serializer getDefaultSerializer(Class type) {
        final Serializer serializer = super.getDefaultSerializer(type);
        if (serializer instanceof CollectionSerializer<?>) {
            return new CollectionSerializerAdapter((CollectionSerializer<?>) serializer);
        } else {
            return serializer;
        }
    }
};

although this has the following drawbacks:

  • The writeHeader and create methods are protected, and so must be invoked via reflection.
  • Some CollectionSerializer sub-classes override the read and write methods too.

I'll push this idea a bit further, just to see how terrible it really is...

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 22, 2023

So how would you rate this code on the scale of Pure Evil? 🦹

package org.testing.kryo;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import com.esotericsoftware.kryo.serializers.CollectionSerializer;
import com.esotericsoftware.kryo.serializers.DefaultSerializers.ArraysAsListSerializer;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;

import static com.esotericsoftware.kryo.Kryo.NULL;
import static java.lang.invoke.MethodHandles.lookup;
import static java.lang.invoke.MethodHandles.privateLookupIn;
import static java.lang.invoke.MethodType.methodType;

class CollectionSerializerAdapter<T extends Collection<? super Object>> extends CollectionSerializer<T> {
    private static final MethodHandle writeHeaderMethod;
    private static final MethodHandle createMethod;

    static {
        try {
            final MethodHandles.Lookup lookup = privateLookupIn(CollectionSerializer.class, lookup());
            writeHeaderMethod = lookup.findVirtual(CollectionSerializer.class, "writeHeader", methodType(void.class, Kryo.class, Output.class, Collection.class));
            createMethod = lookup.findVirtual(CollectionSerializer.class, "create", methodType(Collection.class, Kryo.class, Input.class, Class.class, int.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new InternalError(e);
        }
    }

    private final CollectionSerializer<T> underlying;
    private final MyKryo myKryo;

    public CollectionSerializerAdapter(MyKryo myKryo, CollectionSerializer<T> underlying) {
        this.underlying = underlying;
        this.myKryo = myKryo;
    }

    public void setElementsCanBeNull(boolean elementsCanBeNull) {
        underlying.setElementsCanBeNull(elementsCanBeNull);
    }

    public void setElementClass(Class elementClass) {
        underlying.setElementClass(elementClass);
    }

    public Class<?> getElementClass() {
        return underlying.getElementClass();
    }

    public void setElementClass(Class elementClass, Serializer serializer) {
        underlying.setElementClass(elementClass, serializer);
    }

    public void setElementSerializer(Serializer elementSerializer) {
        underlying.setElementSerializer(elementSerializer);
    }

    public Serializer<?> getElementSerializer() {
        return underlying.getElementSerializer();
    }

    public T copy(Kryo kryo, T original) {
        return underlying.copy(kryo, original);
    }

    @SuppressWarnings("unchecked")
    protected T create(Kryo kryo, Input input, Class<? extends T> type, int size) {
        try {
            return (T) createMethod.invoke(underlying, kryo, input, type, size);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t.getMessage(), t);
        }
    }

    protected void writeHeader(Kryo kryo, Output output, T collection) {
        try {
            writeHeaderMethod.invoke(underlying, kryo, output, collection);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t.getMessage(), t);
        }
    }

    @Override
    public void write(Kryo kryo, Output output, T collection) {
        if (collection == null) {
            output.writeByte(NULL);
            return;
        }

        final int length = collection.size();
        if (length == 0) {
            output.writeByte(1);
            writeHeader(kryo, output, collection);
            return;
        }

        try {
            output.writeVarInt(length + 1, true);
            writeHeader(kryo, output, collection);

            for (Object element : collection) {
                kryo.writeClassAndObject(output, element);
            }
        } finally {
            kryo.getGenerics().popGenericType();
        }
    }

    @Override
    public T read(Kryo kryo, Input input, Class<? extends T> type) {
        try {
            int length = input.readVarIntFlag(true);
            if (length == 0) {
                return null;
            }

            --length;
            final T collection = create(kryo, input, type, length);
            kryo.reference(collection);

            for (int i = 0; i < length; ++i) {
                collection.add(kryo.readClassAndObject(input));
            }
            return adapt(collection);
        } finally {
            kryo.getGenerics().popGenericType();
        }
    }

    @SuppressWarnings("unchecked")
    private T adapt(T collection) {
        final Class<?> underlyingClass = underlying.getClass();
        if (underlyingClass == ArraysAsListSerializer.class) {
            return (T) Arrays.asList(collection.toArray());
        } else if (underlyingClass == myKryo.immutableListSerializerClass) {
            return (T) List.of(collection.toArray());
        } else if (underlyingClass == myKryo.immutableSetSerializerClass) {
            return (T) Set.of(collection.toArray());
        } else {
            return collection;
        }
    }
}

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 22, 2023

I suppose the question becomes: Are there any ways that this CollectionSerializerAdapter can be made less Evil? One way that occurs to me is that the individual ImmutableCollectionSerializer classes could be public instead of package private/. But the real Dark Magic is how I currently need to invoke the protected methods.


It's no use. There are too many code paths like:

Registration registration = kryo.getRegistration(type);
kryo.writeObject(output, object, registration.getSerializer());

for the "adapter" to work 😿.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 23, 2023

Then you could override getDefaultSerializer and set the flag on these serializers and their sub-classes.

Do you think that would be enough? E.g. the UnmodifiableCollectionsSerializers can only be installed via:

UnmodifiableCollectionsSerializer.registerSerializers(kryo);

and invoking:

ImmutableCollectionsSerializers.registerSerializers(kryo);

would override that kind of configuration too. There doesn't seem to be a unique "point of contact" for all new serializers, so I'd likely also need to modify these functions:

public Registration register(Class type, Serializer serializer);
public Registration register(Class type, Serializer serializer, int id);

@theigl
Copy link
Collaborator

theigl commented Mar 24, 2023

There doesn't seem to be a unique "point of contact" for all new serializers, so I'd likely also need to modify these functions

I just took another quick look and a possible point of contact could be ClassResolver.register(Registration registration). Did you consider that method?

I suppose the question becomes: Are there any ways that this CollectionSerializerAdapter can be made less Evil? One way that occurs to me is that the individual ImmutableCollectionSerializer classes could be public instead of package private/. But the real Dark Magic is how I currently need to invoke the protected methods.

I have no issue with making the ImmutableCollectionSerializer classes public. The protected methods in CollectionSerializer are a different topic. They really are extension points for sub-classes and exposing them publicly doesn't really make sense from a design point of view.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 24, 2023

I just took another quick look and a possible point of contact could be ClassResolver.register(Registration registration). Did you consider that method?

Heh, yes, but configuring serializers via the ClassResolver seemed like an odd thing to do... 😉

I have no issue with making the ImmutableCollectionSerializer classes public.

👍

The protected methods in CollectionSerializer are a different topic.

That's OK, I wouldn't want to make those methods public either. I was just wondering if you could think of a nicer way than using MethodHandles of getting the same result. Although curiously, the JVM seems happy to invoke a protected method on a sibling class via a MethodHandle 🤔.

@chrisr3
Copy link
Contributor Author

chrisr3 commented Mar 24, 2023

I've created two PRs. The first should not be controversial, but I'll leave the second open to discussion for a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants