Null bytes are handled inconsistently #3110

Open
SOF3 opened this issue May 4, 2024 · 5 comments

@SOF3
Contributor

SOF3 commented May 4, 2024

Describe the bug

Whitespace-delimited NUL bytes are sometimes parsed as zero values but sometimes not.

To Reproduce

$ for cmd in xxd jq; do printf '1\r\x00\n\x00\n1\n\x00 \x00' | $cmd; done
00000000: 310d 000a 000a 310a 0020 00              1.....1.. .
1
0
0
1

(Btw, U+000D is a valid whitespace character according to RFC 8259, but does not seem to be included in the lexer. I am not familiar with flex so I don't know if there's some magic going on there)

jq/src/lexer.l

Line 133 in ed8f715

[ \n\t]+ {}
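
To make the whitespace remark concrete, here is a minimal sketch that compares the whitespace set from RFC 8259 section 2 (space, tab, line feed, carriage return) with the three characters in the rule quoted above. The names `JSON_WHITESPACE`, `LEXER_RULE_CHARS` and `is_json_whitespace` are made up for illustration. A later comment in this thread points out that src/lexer.l is not the JSON lexer, so the comparison is only illustrative.

```python
# RFC 8259, section 2: ws = *( %x20 / %x09 / %x0A / %x0D )
JSON_WHITESPACE = frozenset(b" \t\n\r")

# Characters matched by the quoted rule `[ \n\t]+` (hypothetical name).
LEXER_RULE_CHARS = frozenset(b" \n\t")


def is_json_whitespace(byte: int) -> bool:
    """Return True if the byte value is insignificant whitespace per RFC 8259."""
    return byte in JSON_WHITESPACE


# Whitespace allowed by the RFC but absent from the quoted rule.
missing = JSON_WHITESPACE - LEXER_RULE_CHARS

if __name__ == "__main__":
    print(sorted(hex(b) for b in missing))   # the carriage return, 0xd
    print(is_json_whitespace(0x00))          # NUL is not JSON whitespace
```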

Expected behavior

To be honest, I don't know what to expect for null bytes, but I would expect them to be handled more consistently.

RFC 8259 does not permit NUL bytes as input, so it is reasonable (although probably unnecessary) to treat them, when outside string literals, either as invalid characters or whitespace. But magically creating a Number(0) value does not look right.
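
One way to picture the "invalid character" option is the sketch below. It reads a stream of whitespace-separated JSON values with Python's standard `json` module and rejects a NUL byte found between values. This is not how jq works internally; `parse_stream` and `JSON_WS` are hypothetical names, and the choice to raise an error (rather than skip the NUL as whitespace) is an assumption made for the example.

```python
import json

JSON_WS = " \t\n\r"  # whitespace characters permitted by RFC 8259


def parse_stream(text: str):
    """Yield each JSON value from a whitespace-separated stream.

    A NUL character outside a value is reported as an error instead of
    being skipped or turned into a number.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip RFC 8259 whitespace between values.
        while pos < len(text) and text[pos] in JSON_WS:
            pos += 1
        if pos == len(text):
            return
        if text[pos] == "\x00":
            raise ValueError(f"NUL byte at offset {pos}")
        # raw_decode parses one value starting at pos and returns its end offset.
        value, pos = decoder.raw_decode(text, pos)
        yield value


if __name__ == "__main__":
    print(list(parse_stream("1 2\n")))
    try:
        list(parse_stream("1\r\x00\n\x00\n1\n\x00 \x00"))
    except ValueError as exc:
        print("rejected:", exc)
```

With this policy the input from the report fails at the first NUL after yielding the leading `1`, so no value is invented for the NUL bytes.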

Environment (please complete the following information):

  • jq version:
$ jq --version
jq-1.6

@SOF3
Contributor Author

SOF3 commented May 4, 2024

Meanwhile, \x22\x00\x22 (a NUL byte between two double quotes) reports the following error, which appears to suggest that null bytes in general should not be allowed:

parse error: Unfinished string at EOF at line 1, column 1

@emanuele6
Member

src/lexer.l is the jq lexer, not the JSON lexer.

@emanuele6
Member

emanuele6 commented May 4, 2024

jq 1.6 is an old version; I tried your example and I get a parse error:

$ printf '1\r\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 2, column 0

So, if NUL is supposed to be whitespace as you are saying (have not checked), that is wrong; but it does not return 0 for the NULs.

@emanuele6
Member

emanuele6 commented May 4, 2024

> Meanwhile, \x22\x00\x22 (a NUL byte between two double quotes) reports the following error, which appears to suggest that null bytes in general should not be allowed:
>
> parse error: Unfinished string at EOF at line 1, column 1

@SOF3 That is just standard JSON as specified in https://json.org

You cannot have literal ASCII control characters (with the exception of DEL, U+007F; mentioned in the RFC) in JSON strings.
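
The rule can be checked against another parser. The sketch below uses Python's standard `json` module in its default strict mode, which rejects raw characters in the range U+0000 to U+001F inside strings. It says nothing about jq's own error messages, and `accepts` is a hypothetical helper.

```python
import json


def accepts(doc: str) -> bool:
    """Return True if Python's json module parses the document."""
    try:
        json.loads(doc)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(accepts('"\x00"'))      # raw NUL inside a string
    print(accepts('"\x7f"'))      # raw DEL inside a string
    print(accepts('"\\u0000"'))   # NUL written as an escape sequence
```

A raw NUL inside the quotes is rejected, while a raw DEL and the escaped form `\u0000` are both accepted.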

@emanuele6
Member

But the parser does seem to get confused by NUL when it is used as whitespace in the input:

$ printf '1\0 2 ' | jq      # stops parsing after NUL
1
$ printf '1\0 2\n' | jq     # treats NUL as whitespace
1
2
$ printf '1\r\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 2, column 0
$ printf '1\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 3, column 0
$ printf '1\x00\x00\n1\n\x00 \x00' | jq
1
1
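
Since the five inputs above differ only in a few invisible bytes, it can help to print them with each byte named. This is a small sketch for reading the test cases, with hypothetical names `NAMES`, `label_bytes` and `INPUTS`; it does not run jq and makes no claim about what jq prints.

```python
# Readable names for the invisible bytes used in the test inputs.
NAMES = {0x00: "NUL", 0x09: "TAB", 0x0A: "LF", 0x0D: "CR", 0x20: "SP"}


def label_bytes(data: bytes) -> str:
    """Render each byte as a name (for invisible bytes) or as its character."""
    return " ".join(NAMES.get(b, chr(b)) for b in data)


# The byte strings produced by the printf commands in this comment.
INPUTS = [
    b"1\x00 2 ",
    b"1\x00 2\n",
    b"1\r\x00\n\x00\n1\n\x00 \x00",
    b"1\x00\n\x00\n1\n\x00 \x00",
    b"1\x00\x00\n1\n\x00 \x00",
]

if __name__ == "__main__":
    for data in INPUTS:
        print(label_bytes(data))
```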
