Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on indents #9855

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented May 3, 2024

Pull Request Description

Modify the parser to report an error on wrong indentation fixes #9419.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

changes, a screencast is preferred.

  • All code follows the Rust and Java style guides.
  • Unit tests have been written where possible.

@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label May 3, 2024
@JaroslavTulach JaroslavTulach self-assigned this May 3, 2024
@@ -675,6 +675,16 @@ fn code_block_bad_indents1() {
test(&code.join("\n"), expected);
}

#[test]
fn fail_on_bad_indents() {
let code = ["group =", " space4", " space8", " space5"];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is:

group =
    space4
        space8
     space5

e.g. block with indentation 4, followed by nested block with indentation 8 and then block with indentation 5 - a typical sign of error.

Right now the test succeeds - the goal is to make it fail!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the 350c2c3 change I am getting a failure in following tests:

code_block_bad_indents1
code_block_bad_indents2
fail_on_bad_indents

all the tests seem to resemble to problem reported in #9419 (comment) - e.g. it seems good that they are now failing. However I am not sure if the failure is the right one? I tried to insert Variant::invalid() - is that the right way to mark an indentation problem, @kazcw? Thanks in advance for your guidance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expecting invalid node since 3e8e6c0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

token::Variant::invalid() is for an illegal token (e.g. an interpolation-quote outside of a text literal)--in this case the error it produces wouldn't be associated with the right span (which should be the rest of the tokens in the line). The marker token will also need to be located at the end of the preceding newline, not the beginning--token code references must cover the source code in order.

If this indentation-handling change is to be made, it could be done like this:

  • In the lexer, construct a token of a new type InvalidRestOfLine { error: Cow<'static, str> }. Use the marker_token function to attach the correct location information for a zero-length marker.
  • In the parser, we'll use a macro to handle it. The macro resolver isn't token-type aware, so we'll need to add the ability to use a zero-length marker token of a specific type as a macro segment header.
    • Define a macro that matches the header <InvalidRestOfLine> (this is the string we'll use to name the marker token; it would never match any real single token), followed by everything(). Its implementation parses its tokens and wraps them in an Invalid1.
    • In the macro resolver, in process_token, before looking up the token code in any macro maps, check if it is zero-length; if it is, match against the token's variant, and if it is InvalidRestOfLine look up <InvalidRestOfLine> instead of the actual (empty) token code.

Let me know if you have any questions!

Footnotes

  1. In the macro implementation, one tricky case will be if precedence.resolve returns None. This shouldn't be reachable now, but it might be possible with future uses of InvalidRestOfLine, or in case of a bug (like the one I noted in another comment). The difficulty is that a Tree must still be created, and its spans (though 0-length) must still be appropriately-positioned within the document. You can handle this by applying into_ident to the header token, and putting the result in a dummy Tree::Ident node, to have something to wrap with Invalid.

@JaroslavTulach JaroslavTulach changed the title Currently the test parses bad indents without an error Error on indents May 6, 2024
@JaroslavTulach
Copy link
Member Author

var errors = ir.preorder().filter((node) -> node instanceof Syntax);
assertEquals(1, errors.size());
var first = (Syntax) errors.head();
assertEquals(Syntax.UnexpectedExpression$.MODULE$, first.reason());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the error message?
Is it possible to have the message mention 'invalid indentation level'? That would be much more helpful than just 'unexpected expression'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I need to get that information from the Rust side. How do you want me to do it, @kazcw?

Copy link
Member

@radeusgd radeusgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool.

It would be great to have a situation-specific syntax error message for this, to make the errors more user-friendly.

@@ -1397,6 +1397,9 @@ impl<'s> Lexer<'s> {
// The new line indent is smaller than current block but bigger than the
// previous one. We are treating the line as belonging to the
// block. The warning should be reported by parser.
let location = newline.left_offset.code.position_before();
let offset = Offset(VisibleOffset(0), location.clone());
self.submit_token(Token(offset, location, token::Variant::invalid()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When end_blocks is called when the input is at EOF, an error should not be emitted for a partial-indent in a whitespace-only final line. This test should pass:

fn partial_indent_at_eof_is_not_an_error() {
    test("main =\n    42\n   ", block![(Function /* ... */)]);
}

@@ -675,6 +675,16 @@ fn code_block_bad_indents1() {
test(&code.join("\n"), expected);
}

#[test]
fn fail_on_bad_indents() {
let code = ["group =", " space4", " space8", " space5"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

token::Variant::invalid() is for an illegal token (e.g. an interpolation-quote outside of a text literal)--in this case the error it produces wouldn't be associated with the right span (which should be the rest of the tokens in the line). The marker token will also need to be located at the end of the preceding newline, not the beginning--token code references must cover the source code in order.

If this indentation-handling change is to be made, it could be done like this:

  • In the lexer, construct a token of a new type InvalidRestOfLine { error: Cow<'static, str> }. Use the marker_token function to attach the correct location information for a zero-length marker.
  • In the parser, we'll use a macro to handle it. The macro resolver isn't token-type aware, so we'll need to add the ability to use a zero-length marker token of a specific type as a macro segment header.
    • Define a macro that matches the header <InvalidRestOfLine> (this is the string we'll use to name the marker token; it would never match any real single token), followed by everything(). Its implementation parses its tokens and wraps them in an Invalid1.
    • In the macro resolver, in process_token, before looking up the token code in any macro maps, check if it is zero-length; if it is, match against the token's variant, and if it is InvalidRestOfLine look up <InvalidRestOfLine> instead of the actual (empty) token code.

Let me know if you have any questions!

Footnotes

  1. In the macro implementation, one tricky case will be if precedence.resolve returns None. This shouldn't be reachable now, but it might be possible with future uses of InvalidRestOfLine, or in case of a bug (like the one I noted in another comment). The difficulty is that a Tree must still be created, and its spans (though 0-length) must still be appropriately-positioned within the document. You can handle this by applying into_ident to the header token, and putting the result in a dummy Tree::Ident node, to have something to wrap with Invalid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expressions not being run if indented by 1 more space than current indentation level
4 participants