
Question on your approach to strict mode #200

Open · ryanhiebert opened this issue Nov 9, 2022 · 2 comments

ryanhiebert commented Nov 9, 2022

Thank you so much for the work you've done on this library. I haven't actually used it yet (so I may be missing important details, and I'd appreciate being told so), but I'm extremely impressed. It takes the right approach to design in exactly the areas that have caused me trouble with other typing and serialization libraries, or that have left me unhappy enough not to want to use them.

The guiding principles you lay out are spot on. By leaning into the standard tooling and providing extensibility over those tools, rather than working a parallel path or even in opposition to them, we're better able to improve our code gradually: bit by bit, as we learn and grow, without having to risk an entire rewrite to get the benefits.


When I think about working with Python instead of against it (a principle your work in typical embodies so well for dataclasses and other built-in types), I find the default non-strict mode to be a very notable departure from that principle. Python doesn't do implicit type coercion; that has been a deliberate design choice of the language for as long as I've known it, and probably since its inception.

The non-strict default turns that design principle on its head. You've provided strict mode, which helps work around the concern, but the philosophy is still reversed from Pythonic first principles: if I want to preserve this principle, I'm left with a variety of (admittedly easy-to-use) options, all of which require me to constantly re-affirm that I agree with it.
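To make the contrast concrete (using typic's v2 API as I understand it from the docs; treat this as a sketch):

```python
import typic

# Python itself refuses implicit coercion:
try:
    "1" + 1
except TypeError:
    pass  # can only concatenate str (not "int") to str

# typical's non-strict default coerces silently:
assert typic.transmute(int, "1") == 1

# Honoring Python's behavior means opting in, call site by call site,
# via whatever strict knobs typical provides.
```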

The global strict mode is a non-starter as a solution for non-trivial applications, because it breaks assumptions that other libraries may very reasonably be making about that global state. I want everyone to use typical, so I expect that concern to be a common one. IMO, it is a misfeature to even offer that API; it's far too powerful a footgun.

All that said, easy, loose type coercion is extremely valuable. You had an eye toward this with your initial definition of @typic.al. That is a great tool, and I don't wish for you to take it out of the toolbox. However, I think it should be a different tool from the typing and serialization layer's defaults.


Personally, I find this core philosophical difference between typical as it currently exists and the principles that I believe have guided its design to be significant enough that it's worth having an entirely separate API, if needed, that defaults to strict mode. There are a few approaches I can see to doing this, depending on what you want typical to be.

  1. Do nothing, because the current design is what you intend typical to be.
  2. Cut a major, breaking release of typical, fixing and reversing this default.
  3. Release new APIs in this package that are reliably strict by default.
  4. Create a new package in this distribution with strict by default APIs.

Gut reactions to which of these is best might be further informed by these additional considerations:

  • If the current design is the right choice for typical, I can see myself releasing another distribution package to PyPI that perhaps uses typical under the hood but exposes strict mode by default.
  • A new package in this distribution could be named typical, matching the distribution package name. That would leave the cute typic.al shortcut names defaulting to non-strict mode, which many people may prefer, while allowing others to opt into strict APIs with more no-frills, business-mode names, like the ones attrs ended up adding.
  • I'm sure we can find a nice keyword for non-strict mode, less boring than "non-strict". magic=True, maybe? Magic is cool, as long as you've asked for it. friendly? autoconvert? (A sketch of this split follows the list.)
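A purely hypothetical sketch of that split; none of these names exist today, and magic= is just the suggestion above:

```python
# The cute namespace keeps the coercing behavior, but only on request:
import typic

@typic.klass(magic=True)  # hypothetical flag: opt-in implicit coercion
class Loose:
    x: int                # Loose(x="1") would coerce to Loose(x=1)

# A "serious business" namespace is strict by default:
import typical            # hypothetical strict-by-default package

@typical.klass            # hypothetical; Strict(x="1") would raise
class Strict:
    x: int
```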

If you got through all that, I hope that it came through in the intended spirit of gratitude and deference. I greatly appreciate that you've released this to the world, and that I get to see it. Still, this is your project, and it is and should be what you say it is, and I respect that.

What would you like to be the future of strict mode in typical? Do you agree with me that it's critical enough to warrant one of these significant options to allow for really changing the default mode? Or is that just not what you want typical to be? Have I perhaps missed something important?

seandstewart (Owner) commented
Hey @ryanhiebert -

Thanks so much for the thoughtful post. It's clear you have read through my documentation, and you've hit the nail on the head, as it were, regarding the fundamental conflict in this library to date.

A Brief History of Time

Typical began as a part of my first foray into typed Python. At the time, I was a new convert to not only type hints, but Python 3. I was struggling with the concept of strongly typed code, but I wanted the guarantees it provided. Too much of the code in my functions and methods at the time was devoted to boilerplate validation and coercion of inputs. Thus came the @typic.al decorator - which was inspired by the attrs library's own "cute" aliases. Along the way I came up with the @typic.klass decorator, which took the same ideas from @typic.al and put them into a dataclass-style decorator.
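For readers who haven't seen it, this is roughly what @typic.al does (adapted from memory of the v2 docs; specifics may be approximate):

```python
import typic

@typic.al
def multi(a: int, b: int) -> int:
    return a * b

multi("2", "3")  # arguments are coerced to int at call time -> 6
```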

Speaking frankly, these two features were definitely good learning experiences, but I regret them. You can see from the design of the library that my view of type hints changed: from a means to save developers from lazily-typed upstream code, to a means to describe protocols for serialization, deserialization, and runtime validation of types described by the Python type system.

If you look at code written in the v1 era vs the v2 era, you can start to see this evolution.

This has to do with my own experience with SerDes libraries in statically-typed languages like Java, Go, etc. I also gained a critical understanding of how to write strongly-typed Python, and realized that the "magic" of auto-coercion was largely unnecessary if I was just very careful and explicit about the types I was passing around. Today, my production code is still a heavy user of typical, but basically only at network boundaries. Within my applications, my type hints and mypy do the rest of the work and give me much greater peace of mind.

Moving Forward

I've been hard at work on v3 for the last 6 months. In v3, which you can take a look at here: https://github.com/seandstewart/typical/tree/v3-routine-factories, you can see I now view this library as a SerDes library first and foremost. I've given very little thought to those two areas of the typical API (@typic.al and @typic.klass), but I'm a fan of your thinking and like the idea of essentially "quarantining" them behind a magic sub-package. They do have their use, especially in larger code-bases where a developer may have less control over how well-typed external callers are, so I don't want to get rid of them entirely.

Some notable changes in v3:

  1. Limited code-gen, instead preferring explicitly-defined "routines" and closures. This will aid in debugging and make the library much less mysterious.
  2. Promoting the constraints engine to a "core" feature set.
  3. Turning off implicit coercion in the class decorator - users must opt in to this behavior (see the sketch after this list).
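A hypothetical shape for that opt-in; v3 isn't final, and the coerce= flag here is illustrative only:

```python
import typic

@typic.klass                # v3 default: no implicit coercion
class Strict:
    x: int                  # Strict(x="1") raises

@typic.klass(coerce=True)   # hypothetical opt-in flag; real spelling may differ
class Coercing:
    x: int                  # Coercing(x="1") -> Coercing(x=1)
```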

Things I've considered:

  1. Completely dropping the jsonschema generation.
  2. Completely dropping the class decorator.

Thus far I have done neither. There is even a core schema package for schema generation, built to be completely extensible and pluggable. I personally follow the schema-first approach in my own code, but I haven't moved forward with isolating or dropping schema generation because it's a very popular feature, and I'm unsure of the impact if it were to just go missing from the library. As for the class decorator... I personally know of a few heavy users of it, but even there I've been discouraging its use for some time, pointing people instead to the Protocol or Functional API. Still, I'm wary of dropping it completely for fear of drastically increasing the pain of upgrading for the sake of what amounts to a personal style preference on my part.

Things You've Made Me Consider

I want to close this comment out by saying: your submission has opened my mind to a middle way. Typical can ship two isolated packages. The first can keep the cutesy typic namespace and contain the useful-yet-problematic feature set (@typic.al, @typic.klass, maybe schema-gen?). The second could be your "serious business" package, beginning with typical.

WRT "strict" mode - yes... it's honestly quite nasty to wrestle with; I'm not even sure what typical looks like without it at this point! The constraints engine has its own limitations, which make it less than desirable for SerDes. Perhaps the solution is simply to do away with the juggling.

When you invoke transmute(...), you are telling typical to take this input and make it the targeted output type. This is explicit, and there's no reason to toggle the underlying behavior. When you invoke validate(...), you are telling typical to check whether the input can be considered a member of the target type.
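In code, the distinction looks something like this (v2-era API from memory; treat the exact error behavior as a sketch):

```python
import typic

# transmute: "make this input into the target type" - coercion is the point.
typic.transmute(int, "1")   # -> 1

# validate: "is this input already a member of the target type?"
typic.validate(int, 1)      # -> 1
typic.validate(int, "1")    # should raise a constraint error: "1" is not an int
```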

A major caveat: currently, the constraints engine allows for validating mappings against user types (e.g., dataclasses). So validate(MyType, input_dict) could succeed if the values in input_dict align with MyType. I still think this is valid behavior, because the structure of the input meets the requirements of the defined type. The problem is that we lose a critical guarantee: the validated input is not actually an instance of MyType. One option could be to transmute valid complex data types, but that breaks a different guarantee: the output type is now different from the input type, which could be surprising. Additionally, it's computationally expensive to do both operations. So I've now thought myself into a recursive loop, and I break out without a decision on how to handle it.
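Concretely, the ambiguity described above looks like this (a sketch; exact constraints-engine behavior may differ):

```python
from dataclasses import dataclass

import typic

@dataclass
class MyType:
    x: int

data = {"x": 1}

validated = typic.validate(MyType, data)  # may pass: the structure matches MyType
isinstance(validated, MyType)             # False - it's still a dict; guarantee lost

coerced = typic.transmute(MyType, data)   # -> MyType(x=1), an actual instance...
type(coerced) is type(data)               # ...but now the output type != input type
```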

What do you think about all this?

ryanhiebert (Author) commented

Wow, thank you for taking the time to respond so thoroughly to my inquiry!

Thank you for telling me a bit more about the history, and about how your thinking has changed since then. One thing I'll point out is that less-than-ideal APIs made while learning are inevitable and shouldn't be regretted. Instead, it's better to think about what the legacy and future of the API are, and how to make those transitions most effectively.

I think that JSON Schema generation is a really neat feature. I also, at least currently, define JSON Schemas directly, but I see great independent value in being able to validate that the schemas match or are compatible. That's a challenging problem in its own right, involving which kinds of interface changes are breaking and which are not. But I can also see it being tangential to the focus of a small package. This is really the question: what is the scope of the package? How much is too much to expect to live well together in the same package?


I agree that serialization and deserialization are the central aspect of Typical. I think this is necessary because (a) it's a hard, large problem at the center of everything Typical does, and (b) as Python's typing and other language features grow, Typical's own explicitness principles encourage you to discourage or even remove now-redundant interfaces, and serialization and deserialization are the capabilities least likely to be added to the language any time soon.

I see serialization, validation, and coercion as different things, and I think you do as well. In the spirit of explicit being better than implicit, I think it's wise to separate them as much as possible. You have some neat interfaces for validation, and it's interesting how you're working them into types; I wonder how much of that Python will eventually do for itself. It certainly seems to be doing more and more.

The serialization piece is what I'm most focused on, followed by validation. Like you, I'm using this primarily at network boundaries. I think there is room for multiple approaches to all of these problems, but it's the first-class support for Python primitives that really drives me. I want to be able to start with native Python features, sprinkle in some hints about how they should behave in different contexts, and have the redundant parts of serialization reduced and simplified, minimizing human error.

I think that validation is best done in the destination type. In fact, I'd probably define validation that way: deserialization and coercion deal with putting things into the right types, while validation enforces further constraints. Your validation is interesting in that it often uses subclasses to implement those constraints. In that sense, it rather blurs the line between serialization and validation, and I think that's actually a good thing. I expect that over time more and more validation will become analyzable by type checkers. For that reason, I suspect your approach to validation is less likely to stand the test of time than a strict serialization library, largely because new ways of writing these constraints are likely to be added to the type checker.
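For reference, the subclass-based constraint style under discussion (approximate v2 API from the docs; treat as a sketch):

```python
import typic

@typic.constrained(ge=0)
class NonNegativeInt(int):
    """An int the constraints engine guarantees is >= 0."""

NonNegativeInt(1)    # ok
NonNegativeInt(-1)   # should raise a constraint error
```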

For deserialization, the rule that I want to enforce is that the type coming in matches the type that I expect the serializer to produce. Anything outside of that would fall under coercion, and I think can be left as a different concern.
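A minimal sketch of that rule, independent of any library: accept only the wire type the paired serializer would have produced, and treat everything else as coercion:

```python
import datetime

def deserialize_date(value: object) -> datetime.date:
    # The serializer emits ISO-8601 strings, so a string is the only
    # acceptable input; anything else is coercion, not deserialization.
    if not isinstance(value, str):
        raise TypeError(f"expected str (serialized date), got {type(value).__name__}")
    return datetime.date.fromisoformat(value)

deserialize_date("2022-11-09")  # ok: matches the serialized form
# deserialize_date(20221109)    # TypeError under this rule
```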

A good many of typical's features are, by its own principles, likely to become redundant, and therefore counterproductive, over time. It is wise to think about how even good features should come to be discouraged once language-preferred alternatives are available.


OK, depending on how in sync we are on those thoughts, here's what I might suggest:

  1. Decide what typical is and should be, and communicate that effectively. Perhaps it's best, as its legacy, to leave its original purpose intact, because the stability of what this package fundamentally means is most important. Or perhaps a change of focus, including breaking backward compatibility when needed, is a better fit for what typical is meant to be. Either way, document that in the form of a compatibility policy, so that (hopefully) people know what to expect.
  2. Start with serialization at the center. This could be a new package distribution if we don't want to change typical too much.
  3. Build features that are reasonably likely to be deprecated or discouraged in the future as some form of extension module, perhaps as separate packages installable as extras. A well-designed plugin system could even allow us to support more non-core Python libraries (a discovery sketch follows this list).
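One standard way to wire such a plugin system (the "typical.plugins" entry-point group is hypothetical; the group= filter on importlib.metadata.entry_points requires Python 3.10+):

```python
from importlib.metadata import entry_points

def load_plugins() -> dict:
    # Discover extension packages that registered themselves under a
    # hypothetical "typical.plugins" entry-point group in their metadata.
    return {ep.name: ep.load() for ep in entry_points(group="typical.plugins")}
```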

Let's see if I can distill it down to a shorter call to action. If you agree with me that nailing a serialization and deserialization protocol and extensibility approach is critical, is it better to (a) do that under the typical package name, or (b) create a new one? You've done a lot of work on this project already in your new branch, and I'm sure you've learned a ton, and I'd be interested to see what a super-minimal serialization library should look like. I'll admit, that PR was too big for me to attempt any kind of decent review.

It's scary, but if it were me, I think I'd document that there's going to be a hard pivot in the focus of this package for the purpose of nailing down this core API. By using the typical package instead of the cute typic namespace, similar to what attrs did, we can maintain backward compatibility while we do this work. We can go slower in that core namespace and nail the generic serialization and deserialization API.

JSON Schema generation feels like it would ultimately fit really well as a separate extension package: hugely useful, but tangential to the core mission. Much of the validation feels like it falls into a similar category, though of that I'm less sure. Validation and JSON Schema feel likely to be tightly coupled to each other, and less generic, because the space of solutions is already very large, and being generic at that level is probably too much work.


OK, time for you to gather your thoughts before I keep going. I keep getting the feeling that I'm being too handwavy about how this could work, and that I'm missing important details, like the fact that the serialization format (e.g., JSON) is a critical input to how serialization and deserialization work, and that even that might ultimately not be something we can unify into a single protocol effectively.
