Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong results returned when querying a RECORD column #265

Open
prismec opened this issue Jan 31, 2024 · 5 comments
Open

Wrong results returned when querying a RECORD column #265

prismec opened this issue Jan 31, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@prismec
Copy link

prismec commented Jan 31, 2024

What happened?

Step 1: Create a table record_test with a record column containing a numeric field

[
  {
    "name": "rec",
    "mode": "REQUIRED",
    "type": "RECORD",
    "fields": [
      {
        "name": "value",
        "mode": "REQUIRED",
        "type": "NUMERIC",
      }
    ]
  }
]

Step 2: Insert a row

INSERT INTO record_test (rec) VALUES (STRUCT(47.123456789 AS value))

Step 3: Query the table using the bq tool

SELECT * FROM record_test

+------------------------------------+
|                rec                 |
+------------------------------------+
| {"value":"47123456789/1000000000"} |
+------------------------------------+

SELECT rec FROM record_test

+------------------------------------+
|                rec                 |
+------------------------------------+
| {"value":"47123456789/1000000000"} |
+------------------------------------+

SELECT rec.* FROM record_test

+--------------+
|    value     |
+--------------+
| 47.123456789 |
+--------------+

SELECT rec.value FROM record_test

+--------------+
|    value     |
+--------------+
| 47.123456789 |
+--------------+

Selecting rec.value with the Java BigQuery client works, but selecting rec or * fails with the error that 47123456789/1000000000 can't be parsed when iterating the field value lists.

What did you expect to happen?

Step 3: Query the table using the bq tool

The following results are returned when using BigQuery instead of the emulator.

SELECT * FROM record_test

+--------------+
|  rec_value   |
+--------------+
| 47.123456789 |
+--------------+

SELECT rec FROM record_test

+--------------------------+
|           rec            |
+--------------------------+
| {"value":"47.123456789"} |
+--------------------------+

SELECT rec.* FROM record_test

+--------------+
|  rec_value   |
+--------------+
| 47.123456789 |
+--------------+

SELECT rec.value FROM record_test

+--------------+
|  rec_value   |
+--------------+
| 47.123456789 |
+--------------+

Also iterating via BigQuery client libraries should work as expected for all the statements.

Note that also the returned column names are different.

How can we reproduce it (as minimally and precisely as possible)?

Follow the steps as described above

Anything else we need to know?

No response

@prismec prismec added the bug Something isn't working label Jan 31, 2024
@prismec
Copy link
Author

prismec commented Jan 31, 2024

The behaviour for TIMESTAMP fields of records is also affected (and maybe also others)

In this case, selecting the record returns '2023-08-01T21:22:23Z' for the inserted timestamp value 2023-08-01T21:22:23.123456Z. Educated guess is that the expected results should be a numeric timestamp value. Also the precision of the returned value is degraded.

bq outputs for this case

SELECT * FROM record_test

+------------------------------------------------------------------------------------------------+
|                                              rec                                               |
+------------------------------------------------------------------------------------------------+
| {"numeric_value":"47123456789/1000000000","timestamp_value":"<date out of range for display>"} |
+------------------------------------------------------------------------------------------------+

SELECT rec.timestamp_value FROM record_test

+---------------------+
|   timestamp_value   |
+---------------------+
| 2023-08-01 21:22:23 |
+---------------------+

@totem3
Copy link
Sponsor Collaborator

totem3 commented Jan 31, 2024

Thank you for the detailed report.
I have confirmed that there are indeed differences in output compared to BigQuery, and also outputs vary between values within a structure and those that are not.

I briefly investigated the cause and would like to document it.

Internally, in go-zetasqlite which we use for constructing responses, there is a difference in the method of construction for structured and non-structured data.

For Numeric types, we convert to a string using NumericValue's ToString.
https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/rows.go#L246-L251

In the case of a Struct, we first convert it to a StructValue using StructValue's ToStruct.
https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/rows.go#L297-L302

Then, we use StructValue's Interface to convert it to interface{}. In this process, we recursively call Interface for each field to convert all fields.

The definition of NumericValue's Interface is as follows, nv.Rat.String().
https://github.com/goccy/go-zetasqlite/blob/e0a36047b564fab7bc36f17031c86225dabf9fea/internal/value.go#L798-L800
This indeed returns a value like 47123456789/1000000000.

On the other hand, the definition of ToString is as follows, which indeed returns a value like 47.123456789.

Due to these differences, it seems that outputs within structures are different for other types as well.

@ohaibbq
Copy link
Contributor

ohaibbq commented Feb 1, 2024

I can take a look at this soon if you could assign it to me @totem3.
I have a pending PR up that fixes some API-level value encoding issues similar to this one at goccy/go-zetasqlite#111

@totem3
Copy link
Sponsor Collaborator

totem3 commented Feb 1, 2024

@ohaibbq

Thank you for your response. As I haven’t started working on it yet, I would be grateful if you could address this issue.

@ohaibbq
Copy link
Contributor

ohaibbq commented Feb 7, 2024

I haven't taken a look at this yet. If you have some time @totem3, that'd be appreciated.
I'm currently looking at if it'd be possible to rewrite how go-zetasqlite handles window functions to fix goccy/go-zetasqlite#153 and goccy/go-zetasqlite#161 as it'd block our usage from further adoption.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants