Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow propagation of class based discriminator settings to subclasses #196

Open
Sanavesa opened this issue Mar 3, 2024 · 3 comments
Open
Labels
enhancement New feature or request

Comments

@Sanavesa
Copy link

Sanavesa commented Mar 3, 2024

  • mashumaro version: 3.12
  • Python version: 3.11.7
  • Operating System: Debian 12 (Bookworm)

Description

Issue Summary:
When serializing and deserializing a hierarchy of dataclasses involving abstract classes using mashumaro's to_dict() and from_dict() methods, attempting to deserialize a dictionary back to an object through an abstract parent class results in a TypeError due to the inability to instantiate the abstract class, even when the dictionary represents a concrete subclass.

Expected Behavior:
Deserialization of a dictionary to an object should be successful through any class in the hierarchy (including abstract classes) as long as the dictionary represents a concrete subclass that implements all abstract methods. This is expected to work because of the specified discriminator field that indicates the concrete subclass type.

Actual Behavior:
Deserializing a dictionary to an object through an abstract class using the from_dict() method throws a TypeError, indicating an attempt to instantiate an abstract class with unimplemented abstract methods.

What I Did

  1. Define a base class Superclass that inherits from DataClassDictMixin. This class includes a Config subclass with a Discriminator configured on field type and includes all subtypes.
  2. Define an abstract class AbstractSubclass inheriting from Superclass with an abstract method foo().
  3. Define a concrete subclass ConcreteSubclass inheriting from AbstractSubclass, implementing the foo() method and setting the discriminator field type to "concrete_subclass".
  4. Serialize an instance of ConcreteSubclass to a dictionary using to_dict().
  5. Attempt to deserialize the dictionary back to an object using from_dict() on all classes in the hierarchy.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import Discriminator


@dataclass
class Superclass(DataClassDictMixin):
    def __post_serialize__(self, d: dict) -> dict:
        # Embed the discriminator into the serialized dictionary
        return {"type": self.type, **d}

    class Config(BaseConfig):
        discriminator = Discriminator(field="type", include_subtypes=True)


@dataclass
class AbstractSubclass(Superclass, ABC):
    weight: float

    @abstractmethod
    def foo(self) -> None:
        raise NotImplementedError

    # NOTE: If this was uncommented, the error disappears!
    # class Config(BaseConfig):
    #     discriminator = Discriminator(field="type", include_subtypes=True)


@dataclass
class ConcreteSubclass(AbstractSubclass):
    type = "concrete_subclass"
    height: float

    def foo(self) -> None:
        print("bar")


inst = ConcreteSubclass(weight=80.0, height=180.0)
inst_dict = inst.to_dict()

assert inst_dict == {"type": "concrete_subclass", "weight": 80.0, "height": 180.0}  # OK

assert inst == ConcreteSubclass.from_dict(inst_dict)  # OK

assert inst == Superclass.from_dict(inst_dict)  # OK

assert inst == AbstractSubclass.from_dict(
    inst_dict
)  # TypeError: Can't instantiate abstract class AbstractSubclass with abstract method foo

Error Message:

TypeError: Can't instantiate abstract class AbstractSubclass with abstract methods foo

Temporary Workaround:
Adding the same Config subclass with a Discriminator configuration to the AbstractSubclass resolves the issue. This suggests a potential inconsistency in how discriminator configurations are inherited or recognized in the class hierarchy.

@Fatal1ty
Copy link
Owner

Fatal1ty commented Mar 9, 2024

Hi @Sanavesa

Deserialization of a dictionary to an object should be successful through any class in the hierarchy (including abstract classes) as long as the dictionary represents a concrete subclass that implements all abstract methods. This is expected to work because of the specified discriminator field that indicates the concrete subclass type.

Since you mentioned abstract classes in the context of inclusion, I assume you expect a concrete subclass to be used to deserialize any of its subclasses? What would you be expecting in the following case? An error or a Bar instance? And if it's a second option, then how should we be able to validate that the incoming dictionary contains exactly the "foo" type?

from dataclasses import dataclass
from typing import Literal

from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import Discriminator


@dataclass
class Base(DataClassDictMixin):
    class Config(BaseConfig):
        discriminator = Discriminator(field="type", include_subtypes=True)


@dataclass
class Foo(Base):
    type: Literal["foo"] = "foo"


@dataclass
class Bar(Foo):
    type: Literal["bar"] = "bar"


obj = Foo.from_dict({"type": "bar"})  # what to expect here?

@Fatal1ty Fatal1ty added question Just a question needs information Further information is requested and removed question Just a question labels Mar 9, 2024
@Sanavesa
Copy link
Author

In the code example provided, I noticed that type is a Literal. This raises a question: should validation or discrimination take precedence? I lean towards validation (and so does the current implementation), which means an error like mashumaro.exceptions.InvalidFieldValue: Field "type" of type Literal['foo'] in Foo has invalid value 'bar' should occur. However, if type were a str (specifically, ClassVar[str]), I'd then expect the output to be a Bar instance instead.

Regarding class hierarchy and their subtype variants, each class seems to have its own lookup map (__mashumaro_subtype_variants__) including itself (unless it's abstract) and its descendants. Given the hierarchy:

    Base
   /    \
Foo    AbstractFoo
 |          |
Bar     ConcreteBar

The expected lookup maps would be:

  • Base.__mashumaro_subtype_variants__ = {"foo": Foo, "bar": Bar, "concrete_bar": ConcreteBar}
  • Foo.__mashumaro_subtype_variants__ = {"foo": Foo, "bar": Bar}
  • Bar.__mashumaro_subtype_variants__ = {"bar": Bar}
  • AbstractFoo.__mashumaro_subtype_variants__ = {"concrete_bar": ConcreteBar}
  • ConcreteBar.__mashumaro_subtype_variants__ = {"concrete_bar": ConcreteBar}

In the from_dict() method, it should first identify the discriminator field, check its value in the data dictionary, and then look it up in the __mashumaro_subtype_variants__. If a matching class is found (and all fields are present), it should return data as that class. If not, it should raise a mashumaro.exceptions.SuitableVariantNotFoundError.

What are your thoughts on this approach? Thank you for considering my feedback.

@Fatal1ty
Copy link
Owner

Sorry for being silent for a long time and thank you for the detailed answer. I like your proposal, however, its implementation may require heavy improvements.

Regarding class hierarchy and their subtype variants, each class seems to have its own lookup map (__mashumaro_subtype_variants__) including itself (unless it's abstract) and its descendants.

Unfortunately, this is not true in the current implementation. When a dataclass has a discriminator in its config, a map __mashumaro_subtype_variants__ is being created only for that dataclass. I dropped a note in README about this limitation:

The only difference is that you can't use include_supertypes=True because it would lead to a recursion error.

I see the following way to resolve this issue:

  1. Allow using include_supertypes=True along with include_subtypes=True in class level discriminators. This can be done by generating two methods for the same class — the first one will search for a suitable class in __mashumaro_subtype_variants__ and the second one will be used for deserialization of the current class itself.
  2. Add an optional flag to Discriminator that will enable propagation of discriminator settings to the subclasses.

With an existing workaround, I think this issue will have a low priority. However, if you or someone else can handle it, PR welcome.

@Fatal1ty Fatal1ty added enhancement New feature or request and removed needs information Further information is requested labels Mar 31, 2024
@Fatal1ty Fatal1ty changed the title Deserialization Error with Abstract Class Discriminators Allow propagation of class based discriminator settings to subclasses Mar 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants