Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should serialization check type by default? #234

Open
wyfo opened this issue Oct 31, 2021 · 6 comments
Open

Should serialization check type by default? #234

wyfo opened this issue Oct 31, 2021 · 6 comments
Labels
breaking change Breaking change in the API question Further information is requested

Comments

@wyfo
Copy link
Owner

wyfo commented Oct 31, 2021

With Cython, performance penality caused by serialization type checking is less an issue. apischema is already fast enough to afford this feature as default behavior, as it will stay a lot faster than competitors.

With serialization type checking by default, disabling it could then become an option to improve performance.

@wyfo wyfo added question Further information is requested breaking change Breaking change in the API labels Oct 31, 2021
@wyfo wyfo assigned wyfo and unassigned wyfo Oct 31, 2021
@stas-dubrovskyi
Copy link

Hi @wyfo,

Data classes direct serialization is done by orjson like a charm. Type checking is important at some earlier stages.

  • In most cases, API responses don't have validation of values but should check typical mistakes, like an assigned value type, because this is handy for the development process.
  • In most cases, response payload size is much larger than request, so looks like this should be a focus.

Please check how just Pydantic's type-checking impacts performance in real-like scenarios. In this benchmark, we use data classes decorated by Pydantic. If the validation decorator is changed to built-ins we have serialization speed equal to primitives (raw benchmarks).

https://github.com/mtag-dev/py-rest-stress-testing/tree/squall_integration

@wyfo
Copy link
Owner Author

wyfo commented Nov 24, 2021

Data classes direct serialization is done by orjson like a charm.

Actually, not really, because orjson only works with raw dataclasses and doesn't cover apischema features like aliasing (for example camelCase), (conditional) field skipping, serialized methods, field ordering, etc. There are some issues opened in orjson repository, for example ijl/orjson#45 about aliasing, but I don't know if they will ever be implemented (and I personally think they should not, it's kind of a slippery slope).

For simple dataclasses however, orjson is indeed fine. By the way, apischema has a passthrough feature that allow arbitrary types like dataclasses to be handled directly by the JSON serializer. Moreover, in the original design of this feature, there was an option for dataclass that allows to automatically pass through dataclasses that use no apischema specific feature. This option has been removed before the release to simplify the code in anticipation of the cythonisation. That option should come back in the next release, even if the performance of the new compiled code makes it less useful: apischema (without dataclass passthrough) + orjson is almost as fast than orjson only for simple dataclasses.

In most cases, API responses don't have validation of values but should check typical mistakes, like an assigned value type, because this is handy for the development process.

For my part, I would more rely on static type checking, but I admit it's quite limited on Python.

Please check how just Pydantic's type-checking impacts performance in real-like scenarios.

Actually, pydantic and apischema are no longer really comparable in terms of performance since the cythonization (coming with the next release). apischema is indeed about 20x faster than pydantic in serialization (using its own benchmark), without type checking; with type checking activated, it still more than 17x.
Apischema is very optimized (even pure Python code is still 3x faster than cythonized pydantic code), and so is type checking in serialization.

In the end, I didn't understand your opinion on this issue. Do you think type checking should be the default?

@wyfo
Copy link
Owner Author

wyfo commented Nov 25, 2021

I did not think about it when I was writing the issue, but the best solution could be to have simply apischema.settings.serialization.type_checking = __debug__ by default. It would disable type checking in optimized/production mode (and we can see type checking as runtime assertion, so they would be disabled with other assertions of the code) and enable it in debug/development mode. Best of two worlds.

@stas-dubrovskyi
Copy link

@wyfo all depends on the stage.
Options:

  • If it can highlight problems during instance creation, like for example does the strongtyping. It will be handier to present by default, because of useful information from exception.
  • If it is in the serialization stage, probably there is no difference.

Switching off in production is not an option. Because some wrong values can appear from a data source for many reasons. In this case, issue presence will be opaque and investigation will take much longer. This only works for configs/widgets.

Are these improvements already in the master?

@stas-dubrovskyi
Copy link

I'll try to integrate apischema in squall ASAP and will provide you feedback based on benchmarks.

@wyfo
Copy link
Owner Author

wyfo commented Nov 25, 2021

Because some wrong values can appear from a data source for many reasons.

I think you misunderstood this issue. There is no question about deactivating type checking at deserialization. The issue only concerns serialization — that's why the settings key is apischema.settings.serialization.type_checking.

Are these improvements already in the master?

Yes, they are, however, installing from Github with git+https://github.com/wyfo/apischema.git@master will not compile anything. I think you should rather clone the repository, compile the files in place by executing scripts/cythonize.py (which will generate .c files) and then python setup.py build_ext --inplace, and install the local package with pip install -e path/to/cloned/apischema.
You must use the same version of Python in the command python setup.py build_ext --inplace than the one you use in your benchmark.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Breaking change in the API question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants