Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON_GROUP_STRUCTURE is now (incorrectly?) case sensitive #11886

Closed
2 tasks done
ankrgyl opened this issue Apr 30, 2024 · 4 comments · Fixed by #11948
Closed
2 tasks done

JSON_GROUP_STRUCTURE is now (incorrectly?) case sensitive #11886

ankrgyl opened this issue Apr 30, 2024 · 4 comments · Fixed by #11948

Comments

@ankrgyl
Copy link
Contributor

ankrgyl commented Apr 30, 2024

What happens?

select json_group_structure(json('{"a": 1, "A": 1}'));

now returns

Invalid Input Error: Duplicate key "A" in object {"a":1,"A":1}

This did not repro with 10.1

To Reproduce

see above

OS:

mac os x, wasm, etc

DuckDB Version:

v0.10.2

DuckDB Client:

all

Full Name:

Ankur Goyal

Affiliation:

Braintrust

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
@ankrgyl ankrgyl changed the title JSON_GROUP_STRUCTURE is now case sensitive JSON_GROUP_STRUCTURE is now (incorrectly?) case sensitive Apr 30, 2024
@lnkuiper
Copy link
Contributor

lnkuiper commented May 3, 2024

Hi @ankrgyl, we use the same function to auto-detect the schema in our JSON reader. DuckDB is case-insensitive, and duplicate column names are not allowed in tables/structs, so this behavior is the result of a fix (#11271).

The json_structure function is just a function that exposes this automatic type detection, so its behavior also changed.

There are two options, we can either:

  1. Throw the error (like we do now)
  2. Ignore the error and return {"a":"UBIGINT","A":"UBIGINT"}, but you will not be able to use this as an input to json_transform (like before)

I'd happily receive your input on what you would prefer here.

@ankrgyl
Copy link
Contributor Author

ankrgyl commented May 4, 2024

Thanks for the clarification. In our case, we actually use json_group_structure without using json_transform. Specifically, we:

  • Call json_group_structure to find the distinct set of fields
  • Populate a UI dropdown based on those fields
  • If the user selects one of the subfields, run a query using json_extract (which is case sensitive) to project / group by it

So my preference would be option 2. One suggestion could be that json_transform could error if you try to use it with something like {"a":"UBIGINT","A":"UBIGINT"}, or you could have a flag to json_group_structure which allows you to pick your adventure.

@lnkuiper
Copy link
Contributor

lnkuiper commented May 6, 2024

Thanks, I've sent out a PR that fixes this for json_structure specifically, without changing the behavior of our JSON reader.

@ankrgyl
Copy link
Contributor Author

ankrgyl commented May 6, 2024

Appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants