Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should L24T/01/in.yaml and JEF9/02/in.yaml end with \n? #118

Open
camilstaps opened this issue Jun 18, 2022 · 2 comments
Open

Should L24T/01/in.yaml and JEF9/02/in.yaml end with \n? #118

camilstaps opened this issue Jun 18, 2022 · 2 comments

Comments

@camilstaps
Copy link

camilstaps commented Jun 18, 2022

I'm looking at an issue of my parser with L24T/01, added in #105. My parser gives back a string without trailing newline.

If I look at L24T/01/in.yaml in the data-2022-01-17 tag, the file does not have a trailing newline:

$ hexdump -C yaml-test-suite/L24T/01/in.yaml 
00000000  66 6f 6f 3a 20 7c 0a 20  20 78 0a 20 20 20        |foo: |.  x.   |
0000000e

If my understanding of the spec is correct, b-chomped-last would then use the alternative b-as-line-feed rather than <end-of-input>, so there should be no trailing newline parsed. Is this correct? Should the file be generated with a trailing newline, or should out.yaml be adapted? Or is my understanding of the spec incorrect?

I have the same issue with JEF9/02, added in #90.

Pinging @ingydotnet & @perlpunk because you were involved in the PR that added this test.

@camilstaps camilstaps changed the title Should L24T/01/in.yaml end with \n? Should L24T/01/in.yaml and JEF9/02/in.yaml end with \n? Jun 18, 2022
@ingydotnet
Copy link
Member

I recall looking into something like this where the presence of a final newline might affect the meaning.

I am of the firm opinion that the presence or absence of a final line break should not ever change the meaning of a yaml stream.

That said, I'm not 100% sure the 1.2.2 spec enforces that.

All the implementations we regularly test agree on the meaning as seen here:
https://play.yaml.io/main/parser?input=Zm9vOiB8CiAgeAogIA==

This should probably be addressed as 1.2.2 errata (if it shown to be misrepresented in the productions) and fixed in 1.2.3.

Looking at the spec link you provided:

b-chomped-last(CLIP)  ::= [b-as-line-feed](https://yaml.org/spec/1.2.2/#rule-b-as-line-feed) | <end-of-input>

would match here and CLIP is explained as

Clipping is the default behavior used if no explicit chomping indicator is specified. In this case, the final line break character is preserved in the scalar’s content. However, any trailing empty lines are excluded from the scalar’s content.

In my opinion, this explanation is assuming that all lines end with a line break.
This should be explicitly made clear in that explanation.

I'd go forward with handling all lines as if they end with a break.

Looking forward to your implementation, Ingy

@camilstaps
Copy link
Author

Thanks a lot!

In my opinion, this explanation is assuming that all lines end with a line break.

This makes sense to me.

Could you clarify a bit where you plan to fix this? At first I thought I could just interpret end-of-line as \n in b-chomped-last(KEEP|CLIP). But this is not enough for JEF9/02. Trying to follow the spec by hand, I end up in l-literal-content(3,KEEP). Then the first optional part of l-literal-content does not apply, because l-nb-literal-text requires nb-char+ and the spaces are all indentation. So l-chomped-empty(3,KEEP) should match ␣␣␣∎. This boils down to l-empty(3,BLOCK_IN)*. But this rule ends with a b-as-line-feed which is not in the input.

Alternatively I suppose you could let b-as-line-feed and perhaps b-non-content accept end-of-input. In my implementation this leads to infinite recursion. I assume that this is because there is an environment in which the end-of-input is then quantified over with * or + so that the parser does not move forward. Then this may just be a problem of my implementation and could possibly be fixed by requiring the parser to move forward in */+.

Or yet another way could be to accept end-of-input next to b-as-line-feed in l-empty(n,KEEP).

As my implementation uses parser combinators it follows the spec very closely, so knowing which way you're going would help to keep the two in sync. But there is no urgency on my side so I'm also perfectly happy to wait for an erratum / change for 1.2.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants