Fix partitioning by DATE column #1057

Open · wants to merge 1 commit into base: main

Conversation

@bnaul commented Apr 5, 2024

Fixes #1056
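For context, a minimal sketch of the use case this addresses, assuming sqlalchemy-bigquery's `bigquery_time_partitioning` table option together with `google.cloud.bigquery.TimePartitioning`; the project, dataset, and column names are made up:

```python
# Hypothetical example of the use case fixed here: partitioning on a plain DATE
# column with type_="DAY" (previously only *_TRUNC-based PARTITION BY clauses
# were generated, so this configuration failed -- see #1056).
import sqlalchemy
from google.cloud import bigquery

metadata = sqlalchemy.MetaData()

events = sqlalchemy.Table(
    "my_dataset.events",  # made-up dataset/table name
    metadata,
    sqlalchemy.Column("event_date", sqlalchemy.DATE),
    sqlalchemy.Column("value", sqlalchemy.Integer),
    bigquery_time_partitioning=bigquery.TimePartitioning(
        field="event_date", type_="DAY"
    ),
)

# Requires credentials for the (made-up) project; with this change the emitted
# DDL should contain `PARTITION BY event_date`.
engine = sqlalchemy.create_engine("bigquery://my-project")
metadata.create_all(engine)
```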

@bnaul requested review from a team as code owners on April 5, 2024 16:45
@bnaul requested a review from farhan0102 on April 5, 2024 16:45
The product-auto-label bot added the size: s, api: bigquery, and size: xl labels and removed the size: s label on Apr 5, 2024
The product-auto-label bot added the size: s label and removed the size: xl label on Apr 5, 2024
Comment on lines 831 to 840
```python
if time_partitioning.field is not None:
    field = time_partitioning.field
    if isinstance(
        table.columns[time_partitioning.field].type,
        sqlalchemy.sql.sqltypes.DATE,
    ):
        return f"PARTITION BY {field}"
    elif isinstance(
        table.columns[time_partitioning.field].type,
        sqlalchemy.sql.sqltypes.TIMESTAMP,
```
chalmerlowe (Collaborator):

@tswast this PR adds logic to handle partitioning on a plain DATE column, above and beyond the existing DATE_TRUNC and TIMESTAMP_TRUNC functions.

Looking at the breakdown of how BigQuery accepts time partitioning via TIMESTAMP_TRUNC and DATE_TRUNC, it feels like we are missing a variety of the options/possibilities.

Am I missing something?

bnaul (Author):

@chalmerlowe can you clarify what you think is missing? It looks like the API expects one of:

Supported values are: HOUR, DAY, MONTH, YEAR

So it seems like TIMESTAMP_TRUNC would work for all of those and DATE_TRUNC would cover MONTH and YEAR on a DATE column, so all that's missing is plain DATE partitioning (i.e. DAY).
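To ground the quoted list of supported values, a small sketch using the BigQuery client library's `TimePartitioningType` constants (the field name is hypothetical):

```python
# The four granularities the BigQuery API accepts for time partitioning, as
# exposed by the google-cloud-bigquery client library.
from google.cloud.bigquery import TimePartitioning, TimePartitioningType

for granularity in (
    TimePartitioningType.HOUR,
    TimePartitioningType.DAY,
    TimePartitioningType.MONTH,
    TimePartitioningType.YEAR,
):
    # "event_date" is a made-up field name.
    tp = TimePartitioning(field="event_date", type_=granularity)
    print(tp.type_, tp.field)
```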

tswast (Collaborator):

| Column Data Type | HOUR | DAY | MONTH | YEAR |
| --- | --- | --- | --- | --- |
| DATE | N/A | Fixed by #1057 | via DATE_TRUNC | via DATE_TRUNC |
| DATETIME | Currently via DATE_TRUNC; TODO: use DATETIME_TRUNC | Currently via DATE_TRUNC; TODO: use DATETIME_TRUNC | Currently via DATE_TRUNC; TODO: use DATETIME_TRUNC | Currently via DATE_TRUNC; TODO: use DATETIME_TRUNC |
| TIMESTAMP | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC | via TIMESTAMP_TRUNC |
| _PARTITIONDATE | N/A | via DATE_TRUNC | via DATE_TRUNC | via DATE_TRUNC |
| _PARTITIONTIME | Not supported; TODO: use TIMESTAMP_TRUNC | Not supported; TODO: use TIMESTAMP_TRUNC | Not supported; TODO: use TIMESTAMP_TRUNC | Not supported; TODO: use TIMESTAMP_TRUNC |
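To illustrate the TODO entries in the matrix, a hypothetical sketch (not the code in this PR; the helper name and structure are made up) of what a fuller dispatch on column type could look like:

```python
# Hypothetical sketch only, not the actual implementation in this PR.
# Illustrates the matrix: pick the truncation function from the column type,
# and fall back to the bare column for DATE + DAY (the case #1057 adds).
import sqlalchemy


def _partition_by_clause(table, field, type_):
    """Return a BigQuery PARTITION BY clause for `field` at granularity `type_`."""
    column_type = table.columns[field].type

    if isinstance(column_type, sqlalchemy.sql.sqltypes.DATE):
        if type_ == "DAY":
            # A DATE column partitions directly by day.
            return f"PARTITION BY {field}"
        # MONTH / YEAR granularity needs DATE_TRUNC.
        return f"PARTITION BY DATE_TRUNC({field}, {type_})"

    if isinstance(column_type, sqlalchemy.sql.sqltypes.DATETIME):
        # Matrix TODO: use DATETIME_TRUNC rather than DATE_TRUNC.
        return f"PARTITION BY DATETIME_TRUNC({field}, {type_})"

    if isinstance(column_type, sqlalchemy.sql.sqltypes.TIMESTAMP):
        return f"PARTITION BY TIMESTAMP_TRUNC({field}, {type_})"

    raise ValueError(f"Unsupported partitioning column type: {column_type!r}")
```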

chalmerlowe (Collaborator):

@bnaul For clarity, my question was less about whether this PR is correct and more focused on the bigger question:

Since we have the function _process_time_partitioning() and it handles two use cases (soon to be three based on this PR), does it cover all the use cases that it ought to?

Tim and I looked at this. I had a matrix to help me identify what the function does, doesn't do, and should do.

Tim cleaned it up and posted it above. Some capabilities are satisfied via an existing BigQuery function (e.g. DATE_TRUNC), but other use cases are either not fully satisfied or not supported at all. Tim provides suggestions for how to resolve those gaps.

NOTE: In no way am I saying you need to do all this work. This is more for me as a maintainer to figure out what I want the end state to be either through this PR or a follow-on PR.

bnaul (Author):

Ah yes, that matrix is very helpful, thanks!

bnaul (Author):

Just curious @chalmerlowe @tswast, any more thoughts on whether this PR should go ahead or wait on additional changes? To me it's distinct enough from the other "Not supported" cases above that it makes sense to keep them separate, but obviously I'm a bit biased 🙂

@chalmerlowe (Collaborator) commented May 2, 2024

@bnaul

Updated: I copied the matrix to a new issue. Beyond that, I intend to accept this PR.
I will need to run it through the Kokoro test suite and will likely take a last look at it as it stands to make sure I am not missing anything with regard to adding this nugget of functionality.

@chalmerlowe added the kokoro:run and kokoro:force-run labels (force Kokoro to re-run the tests) on May 2, 2024
@yoshi-kokoro removed the kokoro:run and kokoro:force-run labels on May 2, 2024
@chalmerlowe (Collaborator):

The Kokoro check is failing because we do not detect full test coverage.
The coverage tool is set to monitor our overall unit test suite and confirm that the unit tests cover all of the paths through the code. We do NOT apply coverage to the tests that run during system testing, which live in the system folder.

Because the test included here is in the system folder, we do not get credit for covering the new branch.

I recommend adding a test to the unit folder that covers this pathway.
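For example, a unit test along these lines might exercise the new branch without a live BigQuery project (the table/column names and the exact assertions are assumptions; the project's existing unit-test fixtures may offer a more idiomatic way to compile DDL):

```python
# Hypothetical unit-test sketch: compile the CREATE TABLE statement offline so
# the coverage tool sees the DATE branch, without needing BigQuery credentials.
import sqlalchemy
from sqlalchemy.schema import CreateTable
from google.cloud import bigquery
from sqlalchemy_bigquery import BigQueryDialect


def test_create_table_date_column_day_partitioning():
    metadata = sqlalchemy.MetaData()
    table = sqlalchemy.Table(
        "some_table",
        metadata,
        sqlalchemy.Column("event_date", sqlalchemy.DATE),
        bigquery_time_partitioning=bigquery.TimePartitioning(
            field="event_date", type_="DAY"
        ),
    )
    ddl = str(CreateTable(table).compile(dialect=BigQueryDialect()))
    # Per the new branch, a DATE column partitions directly, with no *_TRUNC wrapper.
    assert "PARTITION BY event_date" in ddl
    assert "DATE_TRUNC" not in ddl
```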

@chalmerlowe (Collaborator):

@bnaul

I was looking at the possibility of adding a unit test or two to this and ran into a snag.
I am checking with @tswast offline to see if I can get some clarity on what I am seeing.

More to come.

Labels: api: bigquery (Issues related to the googleapis/python-bigquery-sqlalchemy API.), size: s (Pull request size is small.)
Projects: None yet
Development: Successfully merging this pull request may close these issues:

Time partitioning on DATE column fails with type_="DAY"
4 participants