# Upgrading to pydantic v2 (#2543)

bcdurak wants to merge 274 commits into `develop` from `feature/OSSK-316-upgrading-to-pydantic-v2`.
## Why?

### Why do we upgrade `pydantic` to v2?

When it comes to the advantages of upgrading our `pydantic` dependency, there are mainly two big ones that come with `pydantic` v2. At the same time, we face a set of challenges with `pydantic`.

Moreover, the `pydantic` team stopped the active development of V1. They still make releases with critical bug fixes and some security maintenance, but that will also stop at the end of June 2024.

### Why do we upgrade `sqlmodel` and `sqlalchemy` along with `pydantic`?

- Our previous `sqlmodel` version (0.0.8) does not support `pydantic` v2.
- `sqlmodel` started supporting `pydantic` v2 with version 0.0.14. Since then, they have supported both v1 and v2.
- These newer releases support `sqlalchemy` v2 as well; in fact, the `sqlmodel` version that supports `pydantic` v2 requires you to work with `sqlalchemy` v2.
- Due to our `sqlmodel` dependency, we need to upgrade all of these packages altogether.

### Other packages

Due to the `pydantic` v2 support, there are a few more important dependencies, like `fastapi`, that were affected by this upgrade. To see the full list, check the changes in the `pyproject.toml`.

## How?
### Migration Guides

Before I explain the changes in our codebase, I want to mention that both the `pydantic` and `sqlalchemy` upgrades come with significant changes. You can check the respective migration guides for more info: the pydantic v2 migration guide and the sqlalchemy v2 migration guide.

The `pydantic` team was also kind enough to offer a tool called `bump-pydantic`, which helped a lot at the start. It roughly modified 80 files, mostly focused on the configuration of models and some validators. But, as you can see from the number of changed files, there was a lot that we still had to adapt by hand after the tool did its migration.

### The most critical changes w.r.t. the `pydantic` upgrade

Configuration for models has been reworked.
Many configuration options have been either deprecated or removed. The most important ones include:

- `allow_mutation` is now called `frozen` (with the opposite meaning) and is set to `False` by default.
- `underscore_attrs_are_private` has been removed, and models now behave as if this value were set to `True`.
- The `smart_union` configuration parameter has been removed; smart-union behavior is now the default. This means that if we had a `Union` field that was not parameterized with `smart_union` before, we have to manually set `union_mode='left_to_right'` to keep the same behavior. Check here for more details.
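A minimal sketch of the `union_mode` change, using a hypothetical model (not from the ZenML codebase):

```python
from typing import Union

from pydantic import BaseModel, Field


class LeftToRight(BaseModel):
    # Restores the v1 non-smart behavior: union members are tried left to
    # right, so the string "5" is coerced by the first member, int.
    value: Union[int, str] = Field(union_mode="left_to_right")


class Smart(BaseModel):
    # v2 default "smart" mode prefers the exact input type.
    value: Union[int, str]


print(LeftToRight(value="5").value)  # 5 (an int)
print(Smart(value="5").value)        # '5' (stays a str)
```

The same input therefore ends up with a different type depending on the union mode, which is exactly why unparameterized `Union` fields need the explicit `union_mode='left_to_right'` to keep the v1 behavior.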
`json_encoders` were removed at first, added back afterward, and deprecated later. We used them for `pydantic.SecretStr`s in our codebase. Now, we have replaced this functionality with a custom type annotation called `ZenSecretStr`, which serves simply as a `SecretStr` with a custom pydantic serializer.

The `regex` parameter has been removed, and a new parameter called `pattern` has been introduced.
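A quick sketch of the renamed parameter (the model and field here are hypothetical):

```python
from pydantic import BaseModel, Field


class Account(BaseModel):
    # v1: Field(regex=r"^[a-z]+$")
    # v2: the parameter is now called `pattern`
    username: str = Field(pattern=r"^[a-z]+$")


print(Account(username="alice").username)  # alice
```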
The `__fields__` attribute has been replaced by `model_fields`. Previously, in V1 you were able to get a `.type_` for each field, but this is not the case anymore. The replacement is called `.annotation`; however, it acts in a slightly different way. For instance, an `Optional[int]` field previously had a `.type_` of `int`, but now the `.annotation` is `Optional[int]`.
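A small sketch of the difference, with a hypothetical model:

```python
from typing import Optional

from pydantic import BaseModel


class Example(BaseModel):
    count: Optional[int] = None


# v1: Example.__fields__["count"].type_  -> int
# v2: the annotation is returned exactly as written
print(Example.model_fields["count"].annotation)  # typing.Optional[int]
```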
Validators have been heavily reworked. There are no `@validator`s or `@root_validator`s anymore. The new validators are called `@field_validator` and `@model_validator`. They now feature a lot more flexibility and functionality. IMO, this change is one of the most critical ones, and it has a lot of implications when it comes to our codebase. So, if you would like to get a detailed explanation, you can check all the changes here.
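A minimal sketch of the new decorators, using a hypothetical model (not from the ZenML codebase):

```python
from pydantic import BaseModel, field_validator, model_validator


class Pipeline(BaseModel):
    name: str
    steps: int = 1

    @field_validator("name")
    @classmethod
    def name_not_empty(cls, value: str) -> str:
        # replaces a v1 @validator("name")
        if not value:
            raise ValueError("name must not be empty")
        return value

    @model_validator(mode="after")
    def check_consistency(self) -> "Pipeline":
        # replaces a v1 @root_validator; mode="after" runs on the built model
        if self.steps < 1:
            raise ValueError("a pipeline needs at least one step")
        return self


print(Pipeline(name="training", steps=3))
```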
The `skip_on_failure` parameter in the validator decorator has been removed. The only validator of this type that we had before now throws a warning instead of failing.

There is an important change when it comes to the serialization of subclasses. Check this issue I have created on their GitHub page for more detail. TL;DR: if you are using subclasses in your models, do not forget to use `SerializeAsAny[NameOfBaseModel]` as the annotation to keep the same serialization behavior as in v1.
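A sketch of the subclass-serialization difference, with hypothetical models:

```python
from pydantic import BaseModel, SerializeAsAny


class Animal(BaseModel):
    name: str


class Dog(Animal):
    barks: bool = True


class KennelV1Style(BaseModel):
    # SerializeAsAny keeps the v1 duck-typed behavior: the subclass
    # fields of Dog survive serialization.
    pet: SerializeAsAny[Animal]


class KennelDefault(BaseModel):
    pet: Animal  # v2 default: serialized as the declared base class


dog = Dog(name="rex")
print(KennelV1Style(pet=dog).model_dump())  # {'pet': {'name': 'rex', 'barks': True}}
print(KennelDefault(pet=dog).model_dump())  # {'pet': {'name': 'rex'}}
```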
You can not use subclasses so easily anymore. For instance, if you subclass `int`, you can not directly use it as a type in a `pydantic` class. This is by design. You need to define a method called `__get_pydantic_core_schema__` on this new class in order to be able to use it as an annotation.
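A minimal sketch of such a schema hook; `UserId` and `Owner` are hypothetical names, and the schema simply validates like a plain `int` before wrapping the result:

```python
from typing import Any

from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import core_schema


class UserId(int):
    """A hypothetical int subclass used as a field annotation."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        # Validate like a plain int, then wrap the result in UserId.
        return core_schema.no_info_after_validator_function(
            cls, core_schema.int_schema()
        )


class Owner(BaseModel):
    id: UserId


print(Owner(id=42).id)  # 42, as a UserId instance
```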
Let's say you define a new pydantic class `A` and you annotate one of its fields with another pydantic class `B`. Now, you subclass `A`, call it `A'`, and it requires a subclass version of `B`; let's call that `B'`. Previously, you could use an instance of `B` to create an instance of `A'`, but this is not the case anymore. You have to explicitly convert `B` to `B'` before you can pass it to the constructor of `A'`. Check the `base_zen_store` and `base_secret_store` implementations for more details.
The special `pydantic` definition of generic models has been removed as well.

Fields do not have a `required` attribute anymore; instead, they have an `is_required()` method. Due to this, if you would like to make a field non-required, you have to set a default value or a default factory.

There is also a very significant change with regard to optional and nullable fields. Most importantly in our case, if you want to define an optional value, you have to provide at least `None` as a default value. Otherwise, in contrast to V1, even if you write `Optional[int]`, it will still be a required field.
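A small sketch of the new optional-field semantics, with a hypothetical model:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class TimeoutConfig(BaseModel):
    # In v2, a bare Optional annotation is still REQUIRED;
    # only an explicit default makes the field optional.
    timeout: Optional[int]          # required, but may be None
    retries: Optional[int] = None   # truly optional


try:
    TimeoutConfig()  # missing `timeout`
except ValidationError:
    print("timeout is required even though it is Optional[int]")

print(TimeoutConfig(timeout=None).retries)  # None
```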
`parse_obj` and `parse_raw` have been deprecated; instead, the recommendation is to use `model_validate`. However, this method functions in a slightly different way: in contrast to V1, if you feed it an instance of a subclass, it fails with a validation error.
The `update_forward_refs` method has been reworked and renamed. Now it is enough to just do `MyModel.model_rebuild()`.
There is a new Python package called `pydantic-settings`. Classes such as `SettingsConfigDict` are now a part of this package.
`ValidatedFunction` has been deprecated. Check `utils/pydantic_utils.py` for further info and see if we can remove this. (tagging @schustmi here)
`ModelMetaclass` has been moved to the `pydantic._internal` module. Check `global_config.py` and `typed_model.py` in our codebase.
They removed the collection of utility methods in their `typing` module (including functions such as `get_args` and `get_origin`). Since our codebase heavily used these functions, I carried the original versions over to our codebase.
There were instances where we used `some_model_instance.json()`. This call is now replaced with `some_model_instance.model_dump_json()`. However, if you would like to parameterize this process with keys like `sort_keys`, this is unfortunately not possible anymore. As an alternative, I have applied `some_model_instance.model_dump()` first and then used the `json` package manually to dump it with the `sort_keys` parameter.
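The workaround above can be sketched as follows (the model is hypothetical):

```python
import json

from pydantic import BaseModel


class RunSettings(BaseModel):
    zeta: int = 1
    alpha: int = 2


settings = RunSettings()
# model_dump_json() no longer accepts sort_keys, so dump to a dict first
# and serialize with the stdlib json module instead.
sorted_json = json.dumps(settings.model_dump(), sort_keys=True)
print(sorted_json)  # {"alpha": 2, "zeta": 1}
```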
`pydantic.Field`s with the `max_length` setting now fail if they have `UUID` in their annotation. In such cases, I have separated out the validation function.
When it comes to fields, `field_info.extra` has been renamed to `field.json_schema_extra`. You can find an example of how this is being used by checking the changes in `zenml.utils.secret_utils`.
This one was a bit interesting and hard to figure out. When you do `zenml up`, if there is a response model that has an `Enum` field defined with a `pydantic.Field` and the field is parameterized with `max_length`, the local server deployment will fail. I still can not figure out the root cause of this issue. However, this is not a critical use case, so I removed these instances and now we can do `zenml up` successfully.
With `pydantic` V2, the issue regarding multiple inherited config classes is now resolved. The related ignore tags have been removed.
The `schema_json` method is deprecated; we are using `model_json_schema` and `json.dumps` instead.
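A minimal sketch of that replacement (the model is hypothetical):

```python
import json

from pydantic import BaseModel


class Artifact(BaseModel):
    name: str


# v1: Artifact.schema_json(indent=2)
# v2: build the schema dict first, then serialize it yourself.
schema_str = json.dumps(Artifact.model_json_schema(), indent=2)
print(schema_str)
```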
The `copy` method is deprecated; we are using `model_copy` instead. You can check the docstring of `BaseModel.copy` for details about how to handle `include` and `exclude`.
Our update model decorator has been removed. At first, this change was mainly triggered by various failing `mypy` linting issues, because they changed the way of defining required/optional values in `pydantic` v2. However, it soon helped us reveal some linting issues that were suppressed by the relationship between our previous `...Request` and `...Update` models. Each update model is now implemented properly with optional annotations.
With `pydantic` v2, the error handling within the validators has been reworked as well: code that used to execute successfully in `pydantic` v1 can now throw a `ValidationError`.
Similarly, a code block that used to print out `True` and `True` now outputs `False` and `False` with the new changes in `pydantic`.
Fields that are provided as extra fields to any model can now be accessed via `.model_extra`.
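A small sketch of `.model_extra`, with a hypothetical model; note that extra inputs must be explicitly allowed for them to be collected:

```python
from pydantic import BaseModel, ConfigDict


class Payload(BaseModel):
    # extra inputs must be allowed for model_extra to collect them
    model_config = ConfigDict(extra="allow")
    name: str


payload = Payload(name="run", mode="fast")
print(payload.model_extra)  # {'mode': 'fast'}
```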
In contrast to `pydantic` v1, defining any `pydantic` class without properly annotating its fields will now raise a `pydantic.errors.PydanticUserError`.
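A minimal sketch of that error, with a hypothetical class; the error is raised at class-definition time:

```python
from pydantic import BaseModel
from pydantic.errors import PydanticUserError

try:
    class Broken(BaseModel):
        name = "default"  # no type annotation: rejected when the class is built
except PydanticUserError:
    print("non-annotated fields are rejected in v2")
```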
### Critical changes w.r.t. the `sqlmodel` upgrade

The most critical factor in this upgrade stems from the PR right here. With this change, they have changed the way they handle `Enum` values.

For instance, if you are familiar with our component schema (which we defined through `sqlmodel`), we have a field called `type` which was a `StackComponentType`. With this setup, when we registered, for instance, a new artifact store, we created an entry in the components table of our DB where the column `type` had the string `artifact_store` stored in it as a value. However, with the new changes, `sqlmodel` now gives higher priority to `Enum` fields and saves the value `ARTIFACT_STORE` instead. While this is alright if you are starting from scratch, if you have any entry in a table with an `Enum` field, zenml will fail after the upgrade. Instead of taking the migration route, we decided to adjust our schemas to use `str` fields instead and updated the corresponding `to_model`, `update`, and `from_request` methods.
### Critical changes w.r.t. the `sqlalchemy` upgrade

The new `sqlalchemy` v2 has a lot of functional and syntactic changes as well. Luckily, the pure `sqlalchemy` code in our codebase is mostly confined to our `sql_zen_store` implementation and the migration scripts. I have tried my best to fix all the deprecation issues, but I ask you to pay extra attention to these changes, especially around the migration scripts.

### Other critical changes
`pydantic` has changed how you can define optional, required, and nullable fields. Moreover, they removed the `required` attribute from the `FieldInfo`. With the new update, there is a method called `.is_required()` for each field, which checks whether the field has a default value or default factory. Due to these changes, I had to rework our `update_model` decorator. However, in my experiments, the new possible solutions created a lot of problems with `mypy`, and I ended up removing this decorator altogether and implementing the exact equivalent version of these update models. This revealed a bunch of issues that were hidden before (because previously, fields in the update models were considered by `mypy` to be required instead of optional). That's why you may see a few updates in the codebase, especially when it comes to the `ServiceConnector` models.

## Integration Corner
I will try to update the following subsections as the fixes come along. Here you will find a list of all the integrations affected by the aforementioned changes to our codebase and dependencies.
### AWS @wjayesh

The upgrade to `kfp` V2 (in integrations like `kubeflow`, `tekton`, or `gcp`) bumps our `protobuf` dependency from `3.X` to `4.X`. This is why we need to relax the `sagemaker` dependency.

### Airflow ✅ @schustmi
I believe this was the most critical update. Airflow still has a dependency on `sqlalchemy` V1, and this conflicts with this entire PR as we have to migrate to `sqlalchemy` V2. However, we managed to figure out a way where we can still run pipelines on Airflow by keeping the Airflow and ZenML installations separated.

### Evidently @safoinme
Relaxing the main dependency here resolved the installation issue. They started supporting `pydantic` V2 from version `0.4.16`. As their latest version is `0.4.22`, the dependency is limited between the two. When you install `zenml` and then the `evidently` integration afterward, it installs `0.4.22`. However, if you use the `install-zenml-dev` script, it ends up installing `0.4.16`. This is why it might make sense to test both versions:

- `0.4.16`
- `0.4.22`
### Feast ✅ @strickvl
To fix the installation issues, we had to remove the `redis` extra from the `feast` integration. As the latest version is `0.37.1`, the dependency is capped at `0.37.1`.

### GCP @safoinme @strickvl
This is also one of the major changes. As they switched to their own V2, the Python SDK of `kfp` removed their `pydantic` v1 dependency, which ultimately solved our installation issues. However, this means that we have to adapt our integration accordingly to work with `kfp>=2.0`. You can find the migration guide for KFP SDK V1 to V2 here. Also, Felix previously worked on this issue, and you can find his changes right here in this PR.

### Great Expectations ✅ @stefannica
Similar to the previous integrations, relaxing the main dependencies here resolved the installation issue. As they started supporting installations with `pydantic` v2 from `0.17.15`, the minimum requirement was changed. There was a note in the requirements of this integration stating that `typing_extensions>4.6.0` does not work with GE, and the resolved version is `4.10.0`. We need to figure out whether this is still an issue. Moreover, they are closing in on their 1.0 release. Since this might include major breaking changes, I put the upper limit at `<1.0` for now.

### Kubeflow @safoinme @strickvl
Similar to the GCP integration, relaxing the `kfp` Python SDK dependency resolved the installation issue; however, the code still needs to be migrated. You can find the migration guide for KFP SDK V1 to V2 here. Also, Felix previously worked on this issue, and you can find his changes right here in this PR.
### mlflow @avishniakov
This was an interesting change. As they stand right now, the dependencies of the `mlflow` integration are compatible with `zenml` using `pydantic` v2. However, if you install `zenml` first and then do `zenml integration install mlflow -y`, it downgrades `pydantic` to v1. (I think this is an important problem that we have to solve separately in a generalized manner!) This is why I had to manually add the same duplicated `pydantic` requirement in the integration definition as well.

### Label Studio ✅ @strickvl
They still have a hard dependency on `pydantic = "<=1.11.0,>=1.7.3"` for the `label_studio` package. @strickvl has opened up an issue on their GitHub page. We decided to remove that dependency and just rely on the `label_studio_sdk` package, as that allows for pydantic >2.x.

### Skypilot @safoinme
While `uv` was able to compile a list of requirements using `pydantic>=2.7.0` with both `skypilot[aws]<=0.5.0` and `skypilot[gcp]<=0.5.0`, `skypilot[azure]<=0.5.0` is still creating issues.

### Tensorflow @avishniakov
The new version of `pydantic` creates a drift between the `tensorflow` and `typing_extensions` packages. Relaxing the dependencies here resolves the issue; however, there is a known issue between `torch` and `tensorflow`, and we need to test whether this is still problematic.

Additionally, the upgrade to `kfp` V2 (in integrations like `kubeflow`, `tekton`, or `gcp`) bumps our `protobuf` dependency from `3.X` to `4.X`. This is another reason why the `tensorflow` upgrade is necessary.

### Tekton @safoinme @strickvl
The `tekton` integration should go through a major change as well, since it is affected by the kfp changes. You can find the migration guide for KFP SDK V1 to V2 here. Also, Felix previously worked on this issue, and you can find his changes right here in this PR.

## Docs changes
Keep in mind that, much like the changes in the `airflow` integration, some future updates will probably require changes to our documentation.

## Special Thanks
Special thanks to the `pydantic` team (especially @sydney-runkle) for helping us out when we got stuck. It has been a blast to work on this upgrade. Looking forward to V3 😄

## Leftover TODOs
- In the `mlflow` integration, we realized the data type in the artifact versions (that have a `DistributionPackageSource`) was deserialized incorrectly when combined with the tenant setup, leading to a failure when you do `artifact_version.load()`. We need to investigate and solve this issue.
- For `Union` fields that were not parameterized with `smart_union` before, we have to manually set `union_mode='left_to_right'` to keep the same behavior.
- There is a serialization problem with `SecretStr`s. I have created an issue in the pydantic repository for this problem. Simply put, our service connectors have a configuration field. This field is a dictionary that might or might not have `SecretStr`s inside. When `fastapi` tries to serialize the response model of such a service connector, it fails with a pydantic `SerializationError`.
- Check whether `serialize_as_any` would work within fields of a `BaseModel`.
- Find an alternative solution to `pydantic_encoder`, which is deprecated.
- The `any_pydantic_model.dict()` method is now deprecated. Even though I fixed and removed most of these calls, it is really hard to scan the codebase for similar instances. So, anytime we run into such deprecation warnings, we have to remove these calls.
- There are deprecation warnings from `sqlalchemy` as well. We need to find replacements for those.
- The `__init__` call from the `BaseService`.

## Pre-requisites
Please ensure you have done the following:

- The branch is based on `develop` and the open PR is targeting `develop`. If your branch wasn't based on develop, read the Contribution guide on rebasing your branch to develop.

## Types of changes