Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uuid.UUID or str in SQLAlchemy models #6743

Open
rdimaio opened this issue Apr 30, 2024 · 4 comments
Open

uuid.UUID or str in SQLAlchemy models #6743

rdimaio opened this issue Apr 30, 2024 · 4 comments
Assignees
Labels

Comments

@rdimaio
Copy link
Contributor

rdimaio commented Apr 30, 2024

This is a follow-up from this PR comment: #6497 (comment), where it was suggested to change all instances of Mapped[uuid.UUID] to Mapped[str].

uuid.UUID

This is the Python-implemented UUID type: https://docs.python.org/3/library/uuid.html#uuid.UUID, which is the type hint we're currently using, e.g.:

dest_rse_id: Mapped[uuid.UUID] = mapped_column(GUID())
src_rse_id: Mapped[uuid.UUID] = mapped_column(GUID())

str

There is this SO answer where this is discussed https://stackoverflow.com/a/41434923, and there are a couple comments that mention some downsides to storing UUIDs as strings:

there are many downsides - a few off the top of mind: size - string uuid takes up twice the space - 16bytes vs 32 chars - not including any formatters. Processing time - more bytes = more processing time by the CPU as your dataset gets bigger. uuid string formats differ by language, adding addition required translations. Easier for someone to misuse the column, as you can put anything in there, things that aren't uuids.

sqlalchemy.types.Uuid

As an alternative to both uuid.UUID and str, there is the SQLA2.0 Uuid type: https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Uuid

class sqlalchemy.types.Uuid
Represent a database agnostic UUID datatype.
For backends that have no “native” UUID datatype, the value will make use of CHAR(32) and store the UUID as a 32-character alphanumeric hex string.
For backends which are known to support UUID directly or a similar uuid-storing datatype such as SQL Server’s UNIQUEIDENTIFIER, a “native” mode enabled by default allows these types will be used on those backends.

This seems to be a similar implementation to the GUID type we're currently using:

class GUID(TypeDecorator):
"""
Platform-independent GUID type.
Uses PostgreSQL's UUID type,
uses Oracle's RAW type,
uses MySQL's BINARY type,
otherwise uses CHAR(32), storing as stringified hex values.
"""
impl = CHAR
cache_ok = True
def load_dialect_impl(self, dialect):
if dialect.name == 'postgresql':
return dialect.type_descriptor(UUID())
elif dialect.name == 'oracle':
return dialect.type_descriptor(RAW(16))
elif dialect.name == 'mysql':
return dialect.type_descriptor(BINARY(16))
else:
return dialect.type_descriptor(CHAR(32))

See also this SO answer: https://stackoverflow.com/a/76059829

To do

I think sqlalchemy.types.Uuid seems to be the appropriate type to use in this context. However, I think it might be worth investigating this a bit further before we commit to a specifc type.

@rdimaio rdimaio self-assigned this Apr 30, 2024
@dchristidis
Copy link
Contributor

This is the Python-implemented UUID type: https://docs.python.org/3/library/uuid.html#uuid.UUID, which is what we're currently using.

To avoid any misunderstanding, this is what the type hints claim we are using. As far as I can tell, we are not.

There is this SO answer where this is discussed https://stackoverflow.com/a/41434923, and there are a couple comments that mention some downsides to storing UUIDs as strings:

I’m not sure if this particular comment is relevant because we do not store UUIDs as strings in the database.

sqlalchemy.types.Uuid

At some point, we will have to convert this to a string (e.g. in the gateway or REST layer).

@rdimaio
Copy link
Contributor Author

rdimaio commented Apr 30, 2024

This is the Python-implemented UUID type: https://docs.python.org/3/library/uuid.html#uuid.UUID, which is what we're currently using.

To avoid any misunderstanding, this is what the type hints claim we are using. As far as I can tell, we are not.

Thanks for the correction on the first point, I updated the main post.

There is this SO answer where this is discussed https://stackoverflow.com/a/41434923, and there are a couple comments that mention some downsides to storing UUIDs as strings:

I’m not sure if this particular comment is relevant because we do not store UUIDs as strings in the database.

Good point. I mistakenly conflated using str as the Mapped type hint in the SQLA models with the type that would actually end up being stored in the database.

sqlalchemy.types.Uuid

At some point, we will have to convert this to a string (e.g. in the gateway or REST layer).

The sqlalchemy.types.Uuid.__init__as_uuid parameter (https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Uuid.params.as_uuid) mentions: "if True, values will be interpreted as Python uuid objects, converting to/from string via the DBAPI.", which seems to indicate that it allows for conversion to string in a DB-agnostic way

@dchristidis
Copy link
Contributor

The sqlalchemy.types.Uuid.__init__as_uuid parameter (https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Uuid.params.as_uuid) mentions: "if True, values will be interpreted as Python uuid objects, converting to/from string via the DBAPI.", which seems to indicate that it allows for conversion to string in a DB-agnostic way

Do I understand correctly that you are suggesting replacing our own GUID class with this Uuid, in a way like:

rse_id: Mapped[str] = mapped_column(Uuid(as_uuid=False))

@rdimaio
Copy link
Contributor Author

rdimaio commented Apr 30, 2024

The sqlalchemy.types.Uuid.__init__as_uuid parameter (https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Uuid.params.as_uuid) mentions: "if True, values will be interpreted as Python uuid objects, converting to/from string via the DBAPI.", which seems to indicate that it allows for conversion to string in a DB-agnostic way

Do I understand correctly that you are suggesting replacing our own GUID class with this Uuid, in a way like:

rse_id: Mapped[str] = mapped_column(Uuid(as_uuid=False))

Yeah, I think it might be better in the long run to move to their version of Uuid instead of maintaining our own implementation of this similar DB-agnostic logic. Also, here's the SQLA source code to double_check that the type of Uuid is str when as_uuid=False:

https://github.com/sqlalchemy/sqlalchemy/blob/f53f261ad85fb251ac8989d7d9b0ccf1c099a892/lib/sqlalchemy/sql/sqltypes.py#L3532-L3544

and thepython_type property:

https://github.com/sqlalchemy/sqlalchemy/blob/f53f261ad85fb251ac8989d7d9b0ccf1c099a892/lib/sqlalchemy/sql/sqltypes.py#L3565-L3567

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants