Bug Report: BigQuery columns sort_order attribute is incorrect #2225

Open
dlahyani opened this issue Jan 1, 2024 · 1 comment
Labels
  • status:needs_triage: For all issues that need to be processed
  • type:bug: An unexpected problem or unintended behavior

Comments

dlahyani commented Jan 1, 2024

Expected Behavior

When using the BigQueryMetadataExtractor to extract table schemas from BigQuery, the value of the ColumnMetadata.sort_order attribute should reflect the ordinal position of the column in BigQuery, i.e. the ordinal_position of the column as reported by the <data_set_name>.INFORMATION_SCHEMA.COLUMNS view. The column with ordinal_position=1 should get sort_order=1, the column with ordinal_position=2 should get sort_order=2, the column with ordinal_position=3 should get sort_order=3, or more generally, the column with ordinal_position=i should get sort_order=i.

Current Behavior

While the order of columns seems to be correct, the values in ColumnMetadata.sort_order seem to be inaccurate and do not match the ordinal_position of the column as specified in the information schema table.

ColumnMetadata.sort_order seems to receive only odd numbers: the column with ordinal_position=1 gets sort_order=1, the column with ordinal_position=2 gets sort_order=3, the column with ordinal_position=3 gets sort_order=5, or more generally, a column with ordinal_position=i gets sort_order=(i*2 - 1).
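The pattern above can be reproduced with a small model. This is only a sketch, assuming the cause is the double increment described under Possible Solution. `iterate_over_cols_buggy` and `extract_buggy` are hypothetical stand-ins, not the databuilder code. They handle flat columns only and store `(name, sort_order)` tuples instead of ColumnMetadata instances.

```python
from typing import Dict, List, Tuple


def iterate_over_cols_buggy(column: Dict[str, str],
                            cols: List[Tuple[str, int]],
                            total_cols: int) -> int:
    """Hypothetical stand-in for _iterate_over_cols (flat columns only)."""
    # sort_order is taken directly from the value passed in by the caller.
    cols.append((column["name"], total_cols))
    # The return value already accounts for the column just processed.
    return total_cols + 1


def extract_buggy(fields: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    cols: List[Tuple[str, int]] = []
    total_cols = 0
    for column in fields:
        # The caller adds 1 again, so the counter advances by 2 per column.
        total_cols = iterate_over_cols_buggy(column, cols, total_cols + 1)
    return cols


fields = [{"name": "id"}, {"name": "name"}, {"name": "created_at"}]
print(extract_buggy(fields))
# [('id', 1), ('name', 3), ('created_at', 5)]
```

In this model the counter is incremented once by the caller and once by the return value, which yields sort_order = 2*i - 1 for the i-th column.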

Possible Solution

When calling the _iterate_over_cols method, the total_cols argument should be the actual number of columns processed so far. For example, in this line, pass total_cols as is instead of total_cols + 1.

Inside _iterate_over_cols, when creating the ColumnMetadata instance, sort_order should be set to total_cols + 1, so that it matches the return value of the function.
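The two changes proposed above might look like the following. This is a minimal sketch on a simplified model, not a patch against the databuilder source. `iterate_over_cols_fixed` and `extract_fixed` are hypothetical names. The model handles flat columns only and stores `(name, sort_order)` tuples in place of ColumnMetadata instances.

```python
from typing import Dict, List, Tuple


def iterate_over_cols_fixed(column: Dict[str, str],
                            cols: List[Tuple[str, int]],
                            total_cols: int) -> int:
    """Hypothetical stand-in for _iterate_over_cols (flat columns only).

    total_cols is the number of columns processed before this one.
    """
    # The new column's position is one past the columns already processed.
    sort_order = total_cols + 1
    cols.append((column["name"], sort_order))
    # Return the updated count, which equals the sort_order just assigned.
    return sort_order


def extract_fixed(fields: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    cols: List[Tuple[str, int]] = []
    total_cols = 0
    for column in fields:
        # Pass the running count as is; no extra + 1 at the call site.
        total_cols = iterate_over_cols_fixed(column, cols, total_cols)
    return cols


fields = [{"name": "id"}, {"name": "name"}, {"name": "created_at"}]
print(extract_fixed(fields))
# [('id', 1), ('name', 2), ('created_at', 3)]
```

With the increment applied in exactly one place, the i-th column receives sort_order=i, matching the expected behavior described in the report.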

Your Environment

  • Amundsen version used: amundsen-databuilder version 7.4.4
  • Data warehouse stores: BigQuery
  • Python: 3.11.3

dlahyani added the status:needs_triage and type:bug labels Jan 1, 2024

boring-cyborg bot commented Jan 1, 2024

Thanks for opening your first issue here!
