nan123 #3023

Open
pkoppstein opened this issue Jan 29, 2024 · 6 comments
@pkoppstein
Contributor

jq is inconsistent in how it handles JSON texts that are squished together without whitespace when reading from STDIN:

# sometimes it's able to distinguish them properly
$ echo '1[]' | jq .
1
[]

# ... and sometimes an error is raised:
$ echo '1 +6 1+6' | jq .
1
6
jq: parse error: Invalid numeric literal at line 2, column 0

# and in the case of `nan` it simply skips over the interloper:
$ echo 'nan123 456' | jq .
null
456

Swallowing input silently probably isn't such a good idea.

Incidentally, nan123 was an oss-fuzz idea (so to speak).

@nicowilliams
Contributor

jq can and does parse JSON texts concatenated without whitespace when the parse would be unambiguous, but one really should include some whitespace when concatenating JSON texts. It's not clear to me that true123 or nan123 are unambiguous -- they kinda look like they are, but they also kinda look like they are just malformed texts (after all, nan is not valid JSON). Whereas {}{} is unambiguous, and so is [][]. It's also not clear to me that we have committed to parsing concatenations of JSON texts w/o whitespace -- even if it looks like we might have, I'm willing to declare that in fact we don't.

@nicowilliams
Contributor

So I guess maybe we want to discuss whether we want to support parsing of JSON texts concatenated w/o whitespace(s).

How should jq parse truefalse? Options: a) true, then false, b) error.

How should jq parse truef? Options: a) true then error, b) error.

Maybe truefalse really is meant to be two JSON texts concatenated w/o whitespace. And maybe truef is too, but then ENOSPC struck and only the f in false got appended, or maybe the process doing the concatenation got killed in the middle of writing false. But maybe it's just a typo. How should we know?

And so it goes for all combinations of texts other than objects, arrays, and strings.
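
The two options above can be made concrete with a small sketch. It is written in Python rather than against jq's C parser, and `split_texts`, `ERROR`, and the `strict` flag are hypothetical names introduced only for illustration. It models the policy choice, not jq's actual tokenizer.

```python
import json

ERROR = object()  # hypothetical sentinel marking "parse error here"
_decoder = json.JSONDecoder()
# Characters that could plausibly continue a bare literal or a number.
_CONTINUES_TOKEN = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-."
)

def split_texts(data, strict):
    """Split concatenated JSON texts; stop at the first error.

    strict=False models option (a): accept whatever parses, then report an error.
    strict=True models option (b): a scalar immediately followed by a
    token-like character makes the whole run an error.
    """
    results = []
    pos = 0
    n = len(data)
    while True:
        # Skip JSON whitespace between texts.
        while pos < n and data[pos] in " \t\r\n":
            pos += 1
        if pos >= n:
            return results
        try:
            value, end = _decoder.raw_decode(data, pos)
        except json.JSONDecodeError:
            results.append(ERROR)
            return results
        # Objects, arrays, and strings carry their own closing delimiter.
        is_scalar = not isinstance(value, (dict, list, str))
        if strict and is_scalar and end < n and data[end] in _CONTINUES_TOKEN:
            results.append(ERROR)
            return results
        results.append(value)
        pos = end
```

Under these assumptions, `split_texts("truefalse", strict=False)` yields `true` then `false`, while `strict=True` yields a single error. For `truef`, the lenient mode yields `true` followed by an error. `1[]` is accepted in both modes because `[` cannot continue a number.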

Back when I wrote RFC 7464 we had lengthy discussions on the WG mailing list about this. My original intent had been to standardize the "jsonlines" data format, but as this was going to be a Standards-Track RFC the WG consensus is what mattered, and the WG consensus was that we should do something to disambiguate append failures, and thus JSON text sequences are nothing like "jsonlines". But the discussion you'll find in the JSON WG mailing list archives may yield some insights.
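
To illustrate how a JSON text sequence lets a reader tell a truncated append from a complete one, here is a minimal Python sketch. It assumes the RFC 7464 framing as I understand it: each text is preceded by an RS byte (0x1E) and followed by a newline, and a top-level number, `true`, `false`, or `null` that is not followed by whitespace is treated as possibly truncated. `parse_json_seq` and the `("ok", ...)` / `("error", ...)` tuples are hypothetical names, not part of any library.

```python
import json

RS = "\x1e"      # record separator that precedes each text
WS = " \t\r\n"   # JSON whitespace

def parse_json_seq(data):
    """Parse an RS-delimited sequence, flagging texts that look truncated."""
    results = []
    for chunk in data.split(RS):
        if chunk.strip(WS) == "":
            continue  # nothing between two separators (or before the first)
        try:
            value = json.loads(chunk)
        except json.JSONDecodeError:
            results.append(("error", chunk))
            continue
        # Objects, arrays, and strings end with their own delimiter; other
        # texts need trailing whitespace to prove that they are complete.
        self_delimited = isinstance(value, (dict, list, str))
        if not self_delimited and chunk[-1] not in WS:
            results.append(("error", chunk))
            continue
        results.append(("ok", value))
    return results
```

With this framing, `"\x1etrue\n\x1ef"` gives `true` followed by an error for the stray `f`, and a final `34` with no trailing newline is rejected because it might be the start of a longer number.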

The argument that truef should be parsed as true followed by an error is not at all devoid of value. But I want specifically to understand why we should devote any of our limited developer cycles to making jq do that.

@itchyny
Contributor

itchyny commented Jan 29, 2024

No, this is positive NaN with payload so nan123 is parsed as one value. This is confusing behavior though.
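
For readers unfamiliar with the term, the sketch below shows what a NaN "payload" means at the IEEE 754 binary64 level: the low mantissa bits of a NaN can hold an arbitrary integer while the value remains a NaN. This is only an illustration in Python. It makes no claim about how jq stores or parses `nan123` internally, and `nan_with_payload` and `payload_of` are hypothetical helper names.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8000000000000  # exponent all ones, quiet bit set
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF    # the remaining 51 mantissa bits

def nan_with_payload(payload):
    """Build a positive quiet NaN whose mantissa carries `payload`."""
    bits = QUIET_NAN_BITS | (payload & PAYLOAD_MASK)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def payload_of(x):
    """Read the payload bits back out of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & PAYLOAD_MASK

x = nan_with_payload(123)
print(math.isnan(x), payload_of(x))
```

The value still compares as a NaN, and no arithmetic or JSON output exposes the payload, which is why digits absorbed into such a token vanish without a trace.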

@pkoppstein
Contributor Author

To be clear, the point is that the current processing of nan123 entails silent loss of data (or at least silent loss of presumptive data) and thus differs from the other exemplars that I mentioned: (1) 1[] (no error, no loss of data) and (2) 1+6 (error, no silent loss of data).

It seems to me that silent loss of data is akin to silent errors, and thus devoutly to be avoided.

@nicowilliams
Contributor

@itchyny

> No, this is positive NaN with payload so nan123 is parsed as one value. This is confusing behavior though.

I'm aware of signalling NaNs, but again, JSON doesn't support NaNs, and while jq does, jq doesn't support creating a NaN with a particular signal or reading the signal from a NaN value either. jq does have a JSON extension to parse NaNs, but it doesn't support signals in those either.

@pkoppstein

> It seems to me that silent loss of data is akin to silent errors, and thus devoutly to be avoided.

Ah, I see. Sure, nan123 456 should either a) produce a NaN, 123, and 456, or b) produce an error and 456. I'm inclined to go with (b) for now.
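
Option (b) could look roughly like the following Python sketch. It is a deliberate simplification and not a proposal for jq's C parser: it splits the input on whitespace only (so `1[]` would be rejected here, unlike in jq), accepts a bare `nan` as an extension, and reports any other malformed token as an error before carrying on. `parse_tokens` and the `("ok", ...)` / `("error", ...)` tuples are hypothetical names.

```python
import json
import math

def parse_tokens(line):
    """Parse whitespace-separated tokens, reporting bad ones without losing the rest."""
    results = []
    for tok in line.split():
        if tok == "nan":
            # The NaN extension: a bare `nan` only, with no trailing digits.
            results.append(("ok", math.nan))
            continue
        try:
            results.append(("ok", json.loads(tok)))
        except json.JSONDecodeError:
            # Surface the offending token instead of silently swallowing it.
            results.append(("error", tok))
    return results
```

Here `parse_tokens("nan123 456")` reports an error for `nan123` and then yields `456`, so nothing is dropped silently.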

@itchyny
Contributor

itchyny commented Jan 30, 2024

I just wanted to mention that the digits are consumed as part of the NaN token, even though I know the payload is not actually used during execution. https://jqplay.org/s/7vOnfN4jJIM

@itchyny itchyny added the bug label Feb 7, 2024