Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Expose enum types as actual enums in py-rattler #621

Open
iamthebot opened this issue Apr 23, 2024 · 12 comments
Open

Feat: Expose enum types as actual enums in py-rattler #621

iamthebot opened this issue Apr 23, 2024 · 12 comments
Labels
needs-design Needs discussion, investigation, or design python-bindings

Comments

@iamthebot
Copy link
Contributor

There are types like PyPathType that currently represent enums in rust but we expose them with an odd interface where the end-user needs to check for truthiness to ascertain the current value of the enum. Wouldn't it be cleaner to expose these on the python side as enum types?

@baszalmstra
Copy link
Collaborator

Do you have an example and how you would change it?

@iamthebot
Copy link
Contributor Author

iamthebot commented Apr 24, 2024

For example, the current PathType class in rattler (eg; here) has getters for hardlink, softlink, etc. that just return True if the PathType is of that value. In other words, to get the path type you have to do a bunch of if/else until you hit a possible value that returns True. That's not really idiomatic and a major hassle if you have a lot of values. It also means downstream code needs to worry about any new enum values not being handled.

So for example, to determine the type of a path in Python I have to do something like:

(currently)

...
if path_type.directory:
  return "directory"
elif path_type.softlink:
  return "softlink"
elif path_type.hardlink:
  return "hardlink"

In the rust code in rattler-conda-types however (eg; here this is an enum. That is, the values are mutually exclusive. This means it would be also safe to expose these types as python enums.

If we did that, now it would be easy in python to:

  1. Get all possible valid values for the enum
  2. Convert from a string to the enum type easily
  3. Convert from the enum type to a string
  4. Check equality across enums of the same type

For example:

>>> print(path_type.value)
PathType.hardlink

>>> PathType["hardlink"] == PathType.hardlink
`True`

>>> assert path_type == PathType.symlink, "some assertion error"

Doing some cursory research, PyO3 does not natively have a way to export rust enums as Python enums. But a few approaches are suggested in this SO post. Another approach I don't see listed there that might be cleaner (and backwards compatible with older versions of Python) is to expose an int member for these types and on the python side have a matching int based enum with those values. That's more memory efficient too.

@baszalmstra
Copy link
Collaborator

baszalmstra commented Apr 24, 2024

Thanks for the writeup! That makes total sense to me. Would you be able to whip up a Pr?

@iamthebot
Copy link
Contributor Author

iamthebot commented Apr 24, 2024

Yeah, I can put up a PR Friday.

Update: Ended up a little busy today (4/26) so didn't get to this but will add it next week.

@baszalmstra
Copy link
Collaborator

baszalmstra commented May 6, 2024

I have been looking at how polars does this. Essentially what they do is that they define an enum in python as a Literal. These are then converted on the Rust side back to enums.

This approach is relatively lightweight on the Python side which I kinda like, but doesn't introduce an "actual type" either which may or may not be nice.

@iamthebot wdyt?

@baszalmstra baszalmstra added the needs-design Needs discussion, investigation, or design label May 6, 2024
@iamthebot
Copy link
Contributor Author

@baszalmstra sorry, been ridiculously busy so haven't gotten back to this issue. FWIW there's an open issue in PyO3 on how to best handle enums. Until then...

I'm not a super big fan of the literal option tbh. You still don't get a proper python enum. In that case it's really no better than just using strings. And you miss out on all the magic you get from enums (converting to string representation and back, static analyzers like pyright catching impossible enum values, etc.)

Then again, the least performant option is similar to what we have today in that checking which value you actually in Python involves a ladder of if/else checks on each possible value.

A more performant approach would be to expose the numeric value of the enum to Python (i.e mapping rust enum value -> int inside rust) and inside the python class map that "back" to a proper python enum. But this has the downside of needing to keep the list of possible values in sync in both places. I think we can "cheat" and have a function in rust that returns all the possible enum values as strings. Then python-side we can use Enum's functional API to construct the enum class with those values. That negates us needing to keep the enum values in sync in the python code (since they come from rust).

@baszalmstra
Copy link
Collaborator

Mypy does still do type checking when using literals. It will complain when you assign a random string "bla" to a literal. But Im sure mypy can do more magic when using "proper" enums.

With the literal type we do also need to keep the strings in sync between Rust and python though. Unless we generate the enums statically at "compile-time" (whatever that means in python terms) I see no way around that. I realize we could build the type dynamically at runtime, but that will make static analysis like type checking and documentation generation harder.

@iamthebot
Copy link
Contributor Author

@baszalmstra what would be the downside of having a fn that returns all the enum strings to Python and at import time we construct the corresponding python enum type? Nothing to keep in sync in that case and we get a proper python enum. We could pregenerate stubs for static analysis tools (pyright/mypy/pyre-check)

@baszalmstra
Copy link
Collaborator

That would be fine with me as long as we have proper stubs to support documentation generation, IDE autocomplete, and static type checking, otherwise the values will be "invisible" for the user.

We would be happy to update all the enums in the entire codebase but a PR that shows what you have in mind would go a long way!

@Wackyator
Copy link
Collaborator

I would love to get started on this issue, @iamthebot do you think you could showcase a simple example for the same?

@iamthebot
Copy link
Contributor Author

@Wackyator yeah let me put something together today. Sorry, would have started on this earlier but been super busy this week.

@iamthebot
Copy link
Contributor Author

OK @Wackyator sorry for the delay, was doing some testing on my side to see what would:

  1. Work with type checkers (using pyright since that's what we're using here but also tested mypy)
  2. Prevent the likelihood of eg; updating enum values in Rust but forgetting to do that in the python package
  3. Minimize duplication of enum values as much as possible (since ideally we want to do this for all enums)

It seems that explicitly defining the enum values in Python is going to be the most compatible approach. Then using a unit test to verify that python enums match those in rust. We can send strings from rust to Python which we then convert into the corresponding python enum.

On the rust side (not in py-rattler but in the actual rattler type) the easiest thing would be to eg; adopt something like the strum_macros crate which lets you stringify an enum. Then in the py-rattler rust code for the enums eg; here we just return the string representation of the PrefixPathType enum. Something like a single enum_value method that returns a string instead of the individual boolean methods for is_hardlink, etc. This avoids the need to maintain an extra mapping inside the py-rattler rust code.

Then on the python side..

For example, for PrefixPathType

(note: this uses StrEnum which is Python 3.11+ but a backport package exists for Python 3.6+)

from enum import StrEnum
class PrefixPathType(StrEnum):
    hardlink = "hardlink"
    directory = "directory"
    pyc_file = "pyc_file"
    softlink = "softlink"
    windows_python_entrypoint_script = "windows_python_entrypoint_script"
    windows_python_entrypoint_exe = "windows_python_entrypoint_exe"
    unix_python_entrypoint = "unix_python_entrypoint"
    @classmethod
    def from_py_path_type(cls, py_path_type: PyPrefixPathType) -> PrefixPathType:
        """Construct Rattler PathType from FFI PyPathType object."""
        return PrefixPathType[py_path_type.enum_value()] 

Then in a unit test you can just check that you can check the enum can be constructed for all values of the rust enum.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-design Needs discussion, investigation, or design python-bindings
Projects
None yet
Development

No branches or pull requests

3 participants