Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation of type_code #348

Open
1 task done
abhiaagarwal opened this issue Mar 16, 2023 · 6 comments
Open
1 task done

Documentation of type_code #348

abhiaagarwal opened this issue Mar 16, 2023 · 6 comments

Comments

@abhiaagarwal
Copy link

Describe the feature

I'm working with adding some plumbing between our backend system and Trino, and we're trying to derive more type information to speed up Python. Trino gives access to column information with the cursor.description method, but it gives a type_code integer. I've been struggling to figure out what type_code maps to, and my google-fu has failed me. I'd appreciate any sort of guidance.

Describe alternatives you've considered

I've been running queries to try and manually figure out type_code, but it hasn't been exhaustive.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@abhiaagarwal
Copy link
Author

I did a bit of investigation by myself, and discovered a few things:

  1. The standard follows PEP-249, the DBAPI standard, so it's technically correct to have type_code returned rather than the type itself.
  2. While type_code should be an integer, it is actually a string. For example, a BOOLEAN trino value returns boolean. I'll attach a table below of the various mappings I've found.

I think the best solution is to change type_code to actually an integer, and then introduce a mapping function/table from these integers to their string representation, both trino and python-based types. This can also help reduce the boilerplate in functions like RowMapperFactory. The constructs relevant to PEP249 type_code matching are already available.

I'm willing to open a PR for this, but it might be a somewhat wide-reaching change.

ASIDE: some types, and their type_code representation

Trino Type type_code
BOOLEAN boolean
TINYINT tinyint
SMALLINT smallint
INTEGER integer
BIGINT bigint
REAL real
DOUBLE double
DECIMAL decimal(10, 7)
VARCHAR varchar
CHAR char(1)
VARBINARY varbinary
JSON json
DATE date
TIMESTAMP timestamp(0)
TIMESTAMP_TIMEZONE timestamp(3) with time zone
INTERVAL_MONTH INTERVAL YEAR TO MONTH
INTERVAL_DAY INTERVAL DAY TO SECOND
ARRAY array(integer)
IPADDRESS ipaddress
UUID uuid

@hashhar
Copy link
Member

hashhar commented Mar 30, 2023

Nice find @abhiagarwal01, this makes sense to fix. It'll be a breaking change but it'll make us follow the PEP correctly and allow third party tools to work better which make use of this information.

We'd appreciate if you want to submit a PR for this. Just beware that it probably won't be in the upcoming release since we like to include breaking changes in separate releases.

@abhiaagarwal
Copy link
Author

@hashhar Thank you! I'll likely be refactoring any type-specific logic to a separate types.py file (and this can be used in the sqlalchemy module as well).

@hashhar
Copy link
Member

hashhar commented Mar 30, 2023

Is it possible to do this in two steps - separate PRs or different commits in same PR?
Fix the type codes being used and any REQUIRED changes.
Refactor sqlalchemy and dbapi.py to reduce boilerplate type handling?

@abhiaagarwal
Copy link
Author

I think the idea of a type_code requires some level of refactoring, given that this library currently has no logical way to group multiple different data types. What I'll do is first a minimalistic PR that just fixes the type_code by defining constants in constants.py, and then add a separate PR that does any major refactoring. Does that sound good?

@hashhar
Copy link
Member

hashhar commented Mar 30, 2023

That sounds ideal. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants