Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typing.Self not supported in generic fields #9391

Open
1 task done
strangemonad opened this issue May 4, 2024 · 5 comments
Open
1 task done

typing.Self not supported in generic fields #9391

strangemonad opened this issue May 4, 2024 · 5 comments

Comments

@strangemonad
Copy link

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

It seems that with the recent introduction to support the Self type in pydantic 2.7 #5992, this doesn't cover support for generic fields where Self is the type var.

Alternatives:

  1. For the code below, when making instantiate a new model, the FieldInfo likely needs to be updated. eg for the example below, Product has the following field info.
{'ref': FieldInfo(annotation=Ref[Self], required=True), 'name': FieldInfo(annotation=str, required=True)}

At the point of model creation, I believe it should be possible to update Ref[Self] to Ref[model_class]? The downside here is that the FieldInfo, while it reflects the reality of the semantics of the Self type, no longer matches the annotation verbatim making it harder to potentially debug, easier to introduce bugs (e.g. if the annotation value is used to look up the cached generic subclass) ad potentially incompatible with whatever direction the python core team takes __future__ annotations in.

  1. Keep the existing field info and handle the case when building the field's core schema. This would mean a generic model subclass might need to be instantiated just in time and registered in then cache in the correct order for it to be visible to the core_schema machinery.

Example Code

from typing import TypeVar, Generic, Self
from pydantic import BaseModel

T = TypeVar("T", bound="Entity")


class Ref(BaseModel, Generic[T]):
    path: str


class Entity(BaseModel):
    ref: "Ref[Self]"


class Product(Entity):
    name: str

p = Product(name="test", ref=Ref[Product](path="f"))


```console
Traceback (most recent call last):
  File "/Users/shawn/Code/instance-bio/instance/apps/admin/src/pdg.py", line 18, in <module>
    p = Product(name="test", ref=Ref[Product](path="f"))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shawn/Code/instance-bio/instance/apps/admin/.venv/lib/python3.11/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Product
ref
  Input should be a valid dictionary or instance of Ref[Self] [type=model_type, input_value=Ref[Product](path='f'), input_type=Ref[Product]]
    For further information visit https://errors.pydantic.dev/2.6/v/model_type


### Python, Pydantic & OS Version

```Text
pydantic version: 2.7.1
        pydantic-core version: 2.18.2
          pydantic-core build: profile=release pgo=true
                 install path: /Users/shawn/Code/instance-bio/instance/libs/instance-firestore-entity/.venv/lib/python3.11/site-packages/pydantic
               python version: 3.11.4 (main, Aug 14 2023, 09:41:08) [Clang 14.0.3 (clang-1403.0.22.14.1)]
                     platform: macOS-14.4.1-arm64-arm-64bit
             related packages: typing_extensions-4.11.0 pyright-1.1.321
                       commit: unknown
@strangemonad strangemonad added bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels May 4, 2024
@sydney-runkle sydney-runkle added feature request and removed bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels May 16, 2024
@sydney-runkle
Copy link
Member

@strangemonad,

Yep, you're right, this isn't currently supported. I've marked this as a feature request. PRs with support are welcome!

@NeevCohen
Copy link
Contributor

@sydney-runkle Hey, I started looking into this, and it seems a little tricky to me.

In the definition of Entity

class Entity(BaseModel):
    ref: Ref[Self]

Ref's __class_getitem__ method gets called which is defined here, though when its called, the typevar_values parameter is just Self, without a way to access or determine what Self is actually referring to (none that I could tell at least).

My best idea was to perhaps look try to look up the stack to figure out what Self refers to, kind of like what super() does, though I'm not sure that's the best idea.

Do you perhaps have any ideas/directions into how this should be implemented?

@strangemonad
Copy link
Author

strangemonad commented Jun 2, 2024

@NeevCohen I don't believe it's a trial patch to handle the generic case. The code example above it's too hard (you use the new cls instance being constructed in the pydantic base model meta-class but e.g. what happens when Product has a subclass or what happens when there's an intermediate abstract class in the MRO / class hierarchy chain. There's also the issue that Self might occur in some arbitrarily complex type expression that needs to be interpreted. There's lots of edge cases to handle properly.

For this simple case, you can change Entity to have the following and it works for the main cases but it's not ideal

class Entity(BaseModel):
    ref: "Ref[Self]"

        @classmethod
    @override
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)

        # Specialize the ref field type to the concrete Entity subclass.
        # This works around https://github.com/pydantic/pydantic/issues/9391
        # This patching of field-info must happen in __init_subclass__ (before
        # __pydantic_init_subclass__) so the correct core schema is generated for the model.
        ref_fi = cls.model_fields["ref"]
        # if ref_fi.annotation == Union[Ref[Self], None]:
        # but only if the type is not yet specialized so we can handle model lineages in a
        # slightly hacky way by setting the ref type to be the same across all versions.
        ref_fi.annotation = Union[Ref[cls], None]  # noqa: UP007  # we actually need to create a UnionType instance here.
        cls.model_fields["ref"] = ref_fi

@NeevCohen you could also look at BaseModel.model_construct and ModelMetaclass.__new__ to see where the new cls instance is created (you don't have to explicitly inspect the call stack).

@sydney-runkle
Copy link
Member

@NeevCohen,

@dmontagu is our resident generics expert, so I'll chat with him on Monday and see what he thinks about this :).

@dmontagu
Copy link
Contributor

dmontagu commented Jun 10, 2024

Okay so I did some investigation. The following relatively-small patch seems to make the example above work:

diff --git a/pydantic/_internal/_forward_ref.py b/pydantic/_internal/_forward_ref.py
index 231f81d11..5281353e2 100644
--- a/pydantic/_internal/_forward_ref.py
+++ b/pydantic/_internal/_forward_ref.py
@@ -1,7 +1,13 @@
 from __future__ import annotations as _annotations
 
 from dataclasses import dataclass
-from typing import Union
+from typing import Any, Union
+
+
+@dataclass
+class PydanticDeferredSelfGeneric:
+    cls: Any
+    args: tuple[Any, ...]
 
 
 @dataclass
diff --git a/pydantic/_internal/_generate_schema.py b/pydantic/_internal/_generate_schema.py
index 335edb915..642d36dc5 100644
--- a/pydantic/_internal/_generate_schema.py
+++ b/pydantic/_internal/_generate_schema.py
@@ -75,7 +75,7 @@ from ._decorators import (
 )
 from ._docs_extraction import extract_docstrings_from_cls
 from ._fields import collect_dataclass_fields, get_type_hints_infer_globalns
-from ._forward_ref import PydanticRecursiveRef
+from ._forward_ref import PydanticDeferredSelfGeneric, PydanticRecursiveRef
 from ._generics import get_standard_typevars_map, has_instance_in_type, recursively_defined_type_refs, replace_types
 from ._mock_val_ser import MockCoreSchema
 from ._schema_generation_shared import CallbackGetCoreSchemaHandler
@@ -754,6 +754,11 @@ class GenerateSchema:
             with self.model_type_stack.push(obj):
                 return self._model_schema(obj)
 
+        if isinstance(obj, PydanticDeferredSelfGeneric):
+            self_type = self.model_type_stack.get()
+            args = tuple(self_type if is_self_type(x) else x for x in obj.args)
+            return self.generate_schema(obj.cls[*args])
+
         if isinstance(obj, PydanticRecursiveRef):
             return core_schema.definition_reference_schema(schema_ref=obj.type_ref)
 
diff --git a/pydantic/main.py b/pydantic/main.py
index 752718326..82b5d64c3 100644
--- a/pydantic/main.py
+++ b/pydantic/main.py
@@ -688,7 +688,7 @@ class BaseModel(metaclass=_model_construction.ModelMetaclass):
 
     def __class_getitem__(
         cls, typevar_values: type[Any] | tuple[type[Any], ...]
-    ) -> type[BaseModel] | _forward_ref.PydanticRecursiveRef:
+    ) -> type[BaseModel] | _forward_ref.PydanticRecursiveRef | _forward_ref.PydanticDeferredSelfGeneric:
         cached = _generics.get_cached_generic_type_early(cls, typevar_values)
         if cached is not None:
             return cached
@@ -702,6 +702,10 @@ class BaseModel(metaclass=_model_construction.ModelMetaclass):
 
         if not isinstance(typevar_values, tuple):
             typevar_values = (typevar_values,)
+
+        if any(_typing_extra.is_self_type(x) for x in typevar_values):
+            return _forward_ref.PydanticDeferredSelfGeneric(cls, typevar_values)
+
         _generics.check_parameters_count(cls, typevar_values)
 
         # Build map from generic typevars to passed params

However, I worry that there are about a million edge cases that merit consideration, such as "what happens if you have a typing.Self and another recursive typevar reference?" or "what happens if you have multiple typevars"? Or "what happens if you try to parametrize a basemodel with a dangling Self"? You can imagine some nasty things like:

class MyGenericModel(Generic[R, S, T]):
    x: R
    y: MyGenericModel[Self, S, T]
    z: MyGenericModel[Self, Self, T]

MyGenericModel[MyGenericModel[int], int, str]

What would happen? No idea, but I'd be pretty surprised if there weren't bugs 😅 . We don't necessarily need to resolve them all, but I'd want to see some semi-thorough testing before we merge this, even if that testing just results in adding some xfail'ing tests to the test suite (because it works as desired in the more common case).

Also, I'll note that for this to be implemented properly, we'd need to change my patch above to recursively look for typing.Self in other generics, not just at the top level, otherwise, things like my_field: dict[str, Self] wouldn't get deferred properly; this is definitely a test case I'd want to see before merging.


Also, I noticed that PyCharm doesn't like it if you use Self as a parameter in a generic type:
image

PyCharm is often slow to implement advanced type-checking features, so this isn't necessarily a problem, but before we consider merging support for this I would want to confirm whether existing type-checkers actually support this syntax. In particular, if either mypy or pyright report a type error for this code (which I haven't checked), it would take some convincing for me to want to add support for this feature in pydantic, as I strongly suspect this feature will add a nontrivial maintenance/backwards-compatibility burden to pydantic, and I don't want to do that if even the typecheckers do not consider this valid. But if the popular typecheckers handle this as expected/intended, then I think it's reasonable to add support in pydantic despite the potential for maintenance challenges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants