feat: STRUCT and ARRAY support #318

Merged
merged 37 commits into from Sep 9, 2021
Changes from 25 commits
52cee8c
feat: STRUCT and ARRAY support
jimfulton Aug 30, 2021
a0b02f7
Merge branch 'main' into struct
jimfulton Aug 30, 2021
6bacc0d
Fixed test that expected JSON rather than STRUCT
jimfulton Aug 31, 2021
1ec0f88
Merge branch 'struct' of github.com:jimfulton/python-bigquery-sqlalch…
jimfulton Aug 31, 2021
74aab64
Added system test I neglected to check in before :(
jimfulton Aug 31, 2021
c5653e2
blacken
jimfulton Aug 31, 2021
a7f0b41
Merge branch 'main' into struct
jimfulton Aug 31, 2021
9df1804
Don't strip <ARRAY > from parameter types
jimfulton Aug 31, 2021
0df1701
Added system tests to verift PR 67 and issue 233
jimfulton Aug 31, 2021
7aad07f
Merge branch 'struct' of github.com:jimfulton/python-bigquery-sqlalch…
jimfulton Aug 31, 2021
f10a571
blacken
jimfulton Aug 31, 2021
ec31040
Renamed test file to conform to samples test-file naming conventions
jimfulton Sep 1, 2021
accf762
Require google-cloud-bigquery 2.25.2 to get struct field-name undersc…
jimfulton Sep 1, 2021
ef5f891
Added STRUCT documentation
jimfulton Sep 1, 2021
cce9dbb
fix bigquery version
jimfulton Sep 1, 2021
290d955
Merge branch 'main' into struct
jimfulton Sep 1, 2021
b697df6
get blacken to leave sample code alone.
jimfulton Sep 1, 2021
6a278b9
Check in missing file :(
jimfulton Sep 1, 2021
bc62a56
Merge branch 'struct' of github.com:jimfulton/python-bigquery-sqlalch…
jimfulton Sep 1, 2021
84426bd
need sqla 1.4 for unnest
jimfulton Sep 1, 2021
587a0f7
fixed typo
jimfulton Sep 1, 2021
e6f4adf
Merge branch 'main' into struct
jimfulton Sep 2, 2021
ffb5aa9
Merge branch 'main' into struct
jimfulton Sep 2, 2021
47fa14f
Merge branch 'main' into struct
jimfulton Sep 3, 2021
402bbbe
Merge branch 'main' into struct
jimfulton Sep 7, 2021
5bf07b4
Update sqlalchemy_bigquery/_struct.py
jimfulton Sep 7, 2021
e937167
added STRUCT docstring
jimfulton Sep 7, 2021
8661f5b
Add doc link
jimfulton Sep 7, 2021
b550aa1
Merge branch 'struct' of github.com:jimfulton/python-bigquery-sqlalch…
jimfulton Sep 7, 2021
af68a54
Added some comments
jimfulton Sep 7, 2021
da43fd2
Localize logic for getting subtye column specifications
jimfulton Sep 8, 2021
f04cac2
explain semi-private name mangling
jimfulton Sep 8, 2021
5af05bb
Make name magling more explicit
jimfulton Sep 8, 2021
09866c6
explain why we have different implementations of _field_index for SQL…
jimfulton Sep 8, 2021
054c227
get rid of cur_fields, we're not using it anymore.
jimfulton Sep 8, 2021
1a79305
Add a todo to find out why Sqlalchemy doesn't generate an alias when …
jimfulton Sep 8, 2021
5e2ae32
user `repr` rather than `str` to shpow an object in an error message
jimfulton Sep 8, 2021
2 changes: 1 addition & 1 deletion docs/alembic.rst
@@ -43,7 +43,7 @@ Supported operations:
<https://alembic.sqlalchemy.org/en/latest/ops.html#alembic.operations.Operations.rename_table>`_

Note that some of the operations above have limited capability, again
do to `BigQuery limitations
due to `BigQuery limitations
<https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language>`_.

The `execute` operation allows access to BigQuery-specific
1 change: 1 addition & 0 deletions docs/index.rst
@@ -3,6 +3,7 @@
:maxdepth: 2

README
struct
geography
alembic
reference
69 changes: 69 additions & 0 deletions docs/struct.rst
@@ -0,0 +1,69 @@
Working with BigQuery STRUCT data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The BigQuery `STRUCT data type
<https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type>`_
provides data that are collections of named fields.

`sqlalchemy-bigquery` provides a STRUCT type that can be used to
define tables with STRUCT columns:

.. literalinclude:: samples/snippets/STRUCT.py
   :language: python
   :dedent: 4
   :start-after: [START bigquery_sqlalchemy_create_table_with_struct]
   :end-before: [END bigquery_sqlalchemy_create_table_with_struct]

`STRUCT` types can be nested, as in this example. Struct fields can
be defined in two ways:

- Fields can be provided as keyword arguments, as with the `cylinder`
  and `horsepower` fields in this example.

- Fields can be provided as name-type tuples passed as positional
  arguments, as with the `count` and `compression` fields in this example.

STRUCT columns are automatically created when existing database tables
containing STRUCT columns are introspected.

Struct data are represented in Python as Python dictionaries:

.. literalinclude:: samples/snippets/STRUCT.py
   :language: python
   :dedent: 4
   :start-after: [START bigquery_sqlalchemy_insert_struct]
   :end-before: [END bigquery_sqlalchemy_insert_struct]

When querying struct fields, you can use attribute access syntax:

.. literalinclude:: samples/snippets/STRUCT.py
   :language: python
   :dedent: 4
   :start-after: [START bigquery_sqlalchemy_query_struct]
   :end-before: [END bigquery_sqlalchemy_query_struct]

or mapping access:

.. literalinclude:: samples/snippets/STRUCT.py
   :language: python
   :dedent: 4
   :start-after: [START bigquery_sqlalchemy_query_getitem]
   :end-before: [END bigquery_sqlalchemy_query_getitem]

and field names are case insensitive:

.. literalinclude:: samples/snippets/STRUCT.py
   :language: python
   :dedent: 4
   :start-after: [START bigquery_sqlalchemy_query_STRUCT]
   :end-before: [END bigquery_sqlalchemy_query_STRUCT]

When using attribute-access syntax, field names may conflict with
column attribute names. For example, SQLAlchemy columns have `name`
and `type` attributes, among others. When accessing a field whose name
conflicts with a column attribute name, either use mapping access or
spell the field name with upper-case letters.




90 changes: 90 additions & 0 deletions samples/snippets/STRUCT.py
@@ -0,0 +1,90 @@
# Copyright (c) 2021 The sqlalchemy-bigquery Authors
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


def example(engine):
    # fmt: off
    # [START bigquery_sqlalchemy_create_table_with_struct]
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy import Column, String, Integer, Float
    from sqlalchemy_bigquery import STRUCT

    Base = declarative_base()

    class Car(Base):
        __tablename__ = "Cars"

        model = Column(String, primary_key=True)
        engine = Column(
            STRUCT(
                cylinder=STRUCT(("count", Integer),
                                ("compression", Float)),
                horsepower=Integer)
        )

    # [END bigquery_sqlalchemy_create_table_with_struct]
    Car.__table__.create(engine)

    # [START bigquery_sqlalchemy_insert_struct]
    from sqlalchemy.orm import sessionmaker

    Session = sessionmaker(bind=engine)
    session = Session()

    sebring = Car(model="Sebring",
                  engine=dict(
                      cylinder=dict(
                          count=6,
                          compression=18.0),
                      horsepower=235))
    townc = Car(model="Town and Counttry",
                engine=dict(
                    cylinder=dict(
                        count=6,
                        compression=16.0),
                    horsepower=251))
    xj8 = Car(model="XJ8",
              engine=dict(
                  cylinder=dict(
                      count=8,
                      compression=10.75),
                  horsepower=575))

    session.add_all((sebring, townc, xj8))
    session.commit()

    # [END bigquery_sqlalchemy_insert_struct]

    # [START bigquery_sqlalchemy_query_struct]
    sixes = session.query(Car).filter(Car.engine.cylinder.count == 6)
    # [END bigquery_sqlalchemy_query_struct]
    sixes1 = list(sixes)

    # [START bigquery_sqlalchemy_query_STRUCT]
    sixes = session.query(Car).filter(Car.engine.CYLINDER.COUNT == 6)
    # [END bigquery_sqlalchemy_query_STRUCT]
    sixes2 = list(sixes)

    # [START bigquery_sqlalchemy_query_getitem]
    sixes = session.query(Car).filter(Car.engine["cylinder"]["count"] == 6)
    # [END bigquery_sqlalchemy_query_getitem]
    # fmt: on
    sixes3 = list(sixes)

    return sixes1, sixes2, sixes3
27 changes: 27 additions & 0 deletions samples/snippets/STRUCT_test.py
@@ -0,0 +1,27 @@
# Copyright (c) 2021 The sqlalchemy-bigquery Authors
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


def test_struct(engine):
    from . import STRUCT

    sixeses = STRUCT.example(engine)

    for sixes in sixeses:
        assert sorted(car.model for car in sixes) == ["Sebring", "Town and Counttry"]
2 changes: 1 addition & 1 deletion setup.py
@@ -83,7 +83,7 @@ def readme():
# Until this issue is closed
# https://github.com/googleapis/google-cloud-python/issues/10566
"google-auth>=1.25.0,<3.0.0dev", # Work around pip wack.
"google-cloud-bigquery>=2.24.1",
"google-cloud-bigquery>=2.25.2,<3.0.0dev",
"sqlalchemy>=1.2.0,<1.5.0dev",
"future",
],
4 changes: 3 additions & 1 deletion sqlalchemy_bigquery/__init__.py
@@ -23,7 +23,7 @@
from .version import __version__ # noqa

from .base import BigQueryDialect, dialect # noqa
from .base import (
from ._types import (
ARRAY,
BIGNUMERIC,
BOOL,
@@ -38,6 +38,7 @@
NUMERIC,
RECORD,
STRING,
STRUCT,
TIME,
TIMESTAMP,
)
@@ -58,6 +59,7 @@
"NUMERIC",
"RECORD",
"STRING",
"STRUCT",
"TIME",
"TIMESTAMP",
]
124 changes: 124 additions & 0 deletions sqlalchemy_bigquery/_struct.py
@@ -0,0 +1,124 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import Mapping, Tuple

import packaging.version
import sqlalchemy.sql.default_comparator
import sqlalchemy.sql.sqltypes
import sqlalchemy.types

from . import base

sqlalchemy_1_4_or_more = packaging.version.parse(
    sqlalchemy.__version__
) >= packaging.version.parse("1.4")

if sqlalchemy_1_4_or_more:
    import sqlalchemy.sql.coercions
    import sqlalchemy.sql.roles

# We have to delay getting the type compiler, because of circular imports. :(
type_compiler = None


class STRUCT(sqlalchemy.sql.sqltypes.Indexable, sqlalchemy.types.UserDefinedType):
    def __init__(
        self,
        *fields: Tuple[str, sqlalchemy.types.TypeEngine],
        **kwfields: Mapping[str, sqlalchemy.types.TypeEngine],
    ):
        self.__fields = tuple(
            (
                name,
                type_ if isinstance(type_, sqlalchemy.types.TypeEngine) else type_(),
            )
            for (name, type_) in (fields + tuple(kwfields.items()))
        )

        self.__byname = {name.lower(): type_ for (name, type_) in self.__fields}

    def __repr__(self):
        fields = ", ".join(f"{name}={repr(type_)}" for name, type_ in self.__fields)
        return f"STRUCT({fields})"

    def get_col_spec(self, **kw):
        global type_compiler

        try:
            process = type_compiler.process
        except AttributeError:
            type_compiler = base.dialect.type_compiler(base.dialect())
            process = type_compiler.process
Collaborator

Could we put this in a `_get_type_compiler` / `_get_process` function? I don't see anywhere else we initialize `type_compiler`, but I'd be more comfortable having this logic closer to the `# We have to delay getting the type compiler, because of circular imports. :(` comment.

Contributor Author

I refactored so this is combined and isolated in one place using a new, better named _get_subtype_col_spec function.


        fields = ", ".join(f"{name} {process(type_)}" for name, type_ in self.__fields)
Collaborator

I assume `process` is able to handle nested arrays/structs?

Contributor Author

yes
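
To illustrate why nesting falls out naturally, here is a small stand-alone sketch of recursive column-spec rendering. It does not use SQLAlchemy. The tuple-based type model and the `col_spec` function are assumptions made for illustration, with `col_spec` playing the role that `process` plays in the diff.

```python
# Hypothetical, simplified type model: a type is either a scalar type name
# (str), ("ARRAY", item_type), or ("STRUCT", [(field_name, field_type), ...]).
def col_spec(type_):
    if isinstance(type_, str):
        return type_
    kind, payload = type_
    if kind == "ARRAY":
        return f"ARRAY<{col_spec(payload)}>"
    if kind == "STRUCT":
        # Each field's type is rendered by the same function, so nested
        # structs and arrays need no special handling.
        fields = ", ".join(f"{name} {col_spec(sub)}" for name, sub in payload)
        return f"STRUCT<{fields}>"
    raise ValueError(f"unknown type kind: {kind!r}")


engine = ("STRUCT", [
    ("cylinder", ("STRUCT", [("count", "INT64"), ("compression", "FLOAT64")])),
    ("horsepower", "INT64"),
])

print(col_spec(engine))
# STRUCT<cylinder STRUCT<count INT64, compression FLOAT64>, horsepower INT64>
print(col_spec(("ARRAY", engine)))
```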

        return f"STRUCT<{fields}>"

    def bind_processor(self, dialect):
        return dict

    class Comparator(sqlalchemy.sql.sqltypes.Indexable.Comparator):
        def _setup_getitem(self, name):
            if not isinstance(name, str):
                raise TypeError(
                    f"STRUCT fields can only be accessed with string field names,"
                    f" not {name}."
                )
            subtype = self.expr.type._STRUCT__byname.get(name.lower())
Collaborator

Where does `_STRUCT__byname` come from? I'm assuming somewhere from SQLAlchemy, but I'm not getting any results when searching for `byname`.

Collaborator

Oh, I think I figured it out: https://docs.python.org/3/tutorial/classes.html#private-variables

> Any identifier of the form `__spam` (at least two leading underscores, at most one trailing underscore) is textually replaced with `_classname__spam`, where `classname` is the current class name with leading underscore(s) stripped.

Can we comment about this? I assume we have to do it because we know `self.expr.type` is a `STRUCT`, but it's not `self`.

Contributor Author

I refactored this to make name mangling more explicit and consistent, so I don't think comments are needed anymore. See if you agree. :)

I mainly use "private" variables, which aren't :), to avoid namespace conflicts when subclassing across responsibility boundaries. Arguably, explicit naming is better.
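
The mangling behavior discussed in this thread can be reproduced without SQLAlchemy. The sketch below uses hypothetical, stripped-down `STRUCT` and `Comparator` classes (not the ones in the diff) to show why the nested class has to spell out `_STRUCT__byname`: inside `Comparator`, a bare `__byname` is mangled with the innermost class name instead.

```python
class STRUCT:
    def __init__(self, **fields):
        # Inside the STRUCT class body, `self.__byname` is stored on the
        # instance as `_STRUCT__byname`.
        self.__byname = {name.lower(): type_ for name, type_ in fields.items()}

    class Comparator:
        def __init__(self, struct_type):
            self.struct_type = struct_type

        def lookup_mangled_by_hand(self, name):
            # Spell the mangled name explicitly, as the diff does.
            return self.struct_type._STRUCT__byname.get(name.lower())

        def lookup_private(self, name):
            # Here `__byname` is mangled with the innermost class name,
            # giving `_Comparator__byname`, which STRUCT instances lack.
            return self.struct_type.__byname.get(name.lower())


comparator = STRUCT.Comparator(STRUCT(count="INT64"))
print(comparator.lookup_mangled_by_hand("COUNT"))  # INT64

try:
    comparator.lookup_private("count")
except AttributeError as error:
    print("AttributeError:", error)
```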

            if subtype is None:
                raise KeyError(name)
            operator = struct_getitem_op
            index = _field_index(self, name, operator)
            return operator, index, subtype

        def __getattr__(self, name):
            if name.lower() in self.expr.type._STRUCT__byname:
Collaborator

I'm a bit confused why `self.__byname` doesn't work in this case.

Edit: I see now that it's part of the `Comparator` class. Still probably worth a comment similar to the one I recommended in `_setup_getitem`.

Contributor Author

See my response on name mangling
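
Since `__getattr__` is what provides attribute-style field access here, a short stand-alone sketch may help show how it interacts with existing attributes such as `name`, and why the documentation in this PR suggests upper-case spelling or mapping access for conflicting field names. The `Column` class is a hypothetical stand-in, not SQLAlchemy's. Unlike the method in the diff, this sketch raises `AttributeError` for unknown names, which is a choice made for the sketch only.

```python
class Column:
    """Hypothetical stand-in for a column expression with STRUCT fields."""

    def __init__(self, name, fields):
        self.name = name  # a regular attribute, like a SQLAlchemy column's
        self._byname = {field.lower(): value for field, value in fields.items()}

    def __getitem__(self, field):
        # Mapping access: field lookup is case insensitive.
        return self._byname[field.lower()]

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        if attr.startswith("_"):
            raise AttributeError(attr)  # avoid recursing on missing internals
        if attr.lower() in self._byname:
            return self[attr]
        raise AttributeError(attr)


person = Column("person", {"name": "field:name", "age": "field:age"})

print(person.age)      # field:age
print(person.name)     # person  (the regular attribute wins)
print(person.NAME)     # field:name
print(person["name"])  # field:name
```

Because `person.name` is found by normal lookup, `__getattr__` never runs for it; `person.NAME` is not a regular attribute, so it falls through to the case-insensitive field lookup.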

                return self[name]

    comparator_factory = Comparator


if sqlalchemy_1_4_or_more:

    def _field_index(self, name, operator):
        return sqlalchemy.sql.coercions.expect(
            sqlalchemy.sql.roles.BinaryElementRole,
            name,
            expr=self.expr,
            operator=operator,
            bindparam_type=sqlalchemy.types.String(),
        )


else:

    def _field_index(self, name, operator):
        return sqlalchemy.sql.default_comparator._check_literal(
            self.expr, operator, name, bindparam_type=sqlalchemy.types.String(),
        )


def struct_getitem_op(a, b):
    raise NotImplementedError()


sqlalchemy.sql.default_comparator.operator_lookup[
    struct_getitem_op.__name__
] = sqlalchemy.sql.default_comparator.operator_lookup["json_getitem_op"]


class SQLCompiler:
    def visit_struct_getitem_op_binary(self, binary, operator_, **kw):
        left = self.process(binary.left, **kw)
        return f"{left}.{binary.right.value}"