Error on indents #9855

JaroslavTulach · 2024-05-03T11:06:02Z

Pull Request Description

Modify the parser to report an error on wrong indentation fixes #9419.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

changes, a screencast is preferred.

All code follows the Rust and Java style guides.
Unit tests have been written where possible.

JaroslavTulach · 2024-05-03T11:08:05Z

lib/rust/parser/debug/tests/parse.rs

@@ -675,6 +675,16 @@ fn code_block_bad_indents1() {
    test(&code.join("\n"), expected);
 }

+#[test]
+fn fail_on_bad_indents() {
+    let code = ["group =", "    space4", "        space8", "     space5"];


The code is:

group = space4 space8 space5

e.g. block with indentation 4, followed by nested block with indentation 8 and then block with indentation 5 - a typical sign of error.

Right now the test succeeds - the goal is to make it fail!

With the 350c2c3 change I am getting a failure in following tests:

code_block_bad_indents1 code_block_bad_indents2 fail_on_bad_indents

all the tests seem to resemble to problem reported in #9419 (comment) - e.g. it seems good that they are now failing. However I am not sure if the failure is the right one? I tried to insert Variant::invalid() - is that the right way to mark an indentation problem, @kazcw? Thanks in advance for your guidance.

Expecting invalid node since 3e8e6c0

token::Variant::invalid() is for an illegal token (e.g. an interpolation-quote outside of a text literal)--in this case the error it produces wouldn't be associated with the right span (which should be the rest of the tokens in the line). The marker token will also need to be located at the end of the preceding newline, not the beginning--token code references must cover the source code in order.

If this indentation-handling change is to be made, it could be done like this:

In the lexer, construct a token of a new type InvalidRestOfLine { error: Cow<'static, str> }. Use the marker_token function to attach the correct location information for a zero-length marker.

In the parser, we'll use a macro to handle it. The macro resolver isn't token-type aware, so we'll need to add the ability to use a zero-length marker token of a specific type as a macro segment header.

Define a macro that matches the header <InvalidRestOfLine> (this is the string we'll use to name the marker token; it would never match any real single token), followed by everything(). Its implementation parses its tokens and wraps them in an Invalid¹.

In the macro resolver, in process_token, before looking up the token code in any macro maps, check if it is zero-length; if it is, match against the token's variant, and if it is InvalidRestOfLine look up <InvalidRestOfLine> instead of the actual (empty) token code.

Let me know if you have any questions!

Footnotes

In the macro implementation, one tricky case will be if precedence.resolve returns None. This shouldn't be reachable now, but it might be possible with future uses of InvalidRestOfLine, or in case of a bug (like the one I noted in another comment). The difficulty is that a Tree must still be created, and its spans (though 0-length) must still be appropriately-positioned within the document. You can handle this by applying into_ident to the header token, and putting the result in a dummy Tree::Ident node, to have something to wrap with Invalid. ↩

JaroslavTulach · 2024-05-06T04:23:37Z

We have few failures:

AvoidIdInstrumentationTagTest.avoidIdInstrumentationInLambdaMapFunctionWithFloor - Compilation aborted due to errors.
Cannot compile Standard/Database.
lexer::tests::test_case_block_bad_indents

radeusgd · 2024-05-06T09:58:21Z

engine/runtime-parser/src/test/java/org/enso/compiler/core/EnsoParserTest.java

+    var errors = ir.preorder().filter((node) -> node instanceof Syntax);
+    assertEquals(1, errors.size());
+    var first = (Syntax) errors.head();
+    assertEquals(Syntax.UnexpectedExpression$.MODULE$, first.reason());


What is the error message?
Is it possible to have the message mention 'invalid indentation level'? That would be much more helpful than just 'unexpected expression'.

I guess I need to get that information from the Rust side. How do you want me to do it, @kazcw?

radeusgd

Cool.

It would be great to have a situation-specific syntax error message for this, to make the errors more user-friendly.

kazcw · 2024-05-06T18:53:15Z

lib/rust/parser/src/lexer.rs

@@ -1397,6 +1397,9 @@ impl<'s> Lexer<'s> {
                // The new line indent is smaller than current block but bigger than the
                // previous one. We are treating the line as belonging to the
                // block. The warning should be reported by parser.
+                let location = newline.left_offset.code.position_before();
+                let offset = Offset(VisibleOffset(0), location.clone());
+                self.submit_token(Token(offset, location, token::Variant::invalid()));


When end_blocks is called when the input is at EOF, an error should not be emitted for a partial-indent in a whitespace-only final line. This test should pass:

fn partial_indent_at_eof_is_not_an_error() { test("main =\n 42\n ", block![(Function /* ... */)]); }

kazcw · 2024-05-06T19:18:05Z

lib/rust/parser/debug/tests/parse.rs

@@ -675,6 +675,16 @@ fn code_block_bad_indents1() {
    test(&code.join("\n"), expected);
 }

+#[test]
+fn fail_on_bad_indents() {
+    let code = ["group =", "    space4", "        space8", "     space5"];


token::Variant::invalid() is for an illegal token (e.g. an interpolation-quote outside of a text literal)--in this case the error it produces wouldn't be associated with the right span (which should be the rest of the tokens in the line). The marker token will also need to be located at the end of the preceding newline, not the beginning--token code references must cover the source code in order.

If this indentation-handling change is to be made, it could be done like this:

In the lexer, construct a token of a new type InvalidRestOfLine { error: Cow<'static, str> }. Use the marker_token function to attach the correct location information for a zero-length marker.

In the parser, we'll use a macro to handle it. The macro resolver isn't token-type aware, so we'll need to add the ability to use a zero-length marker token of a specific type as a macro segment header.

Define a macro that matches the header <InvalidRestOfLine> (this is the string we'll use to name the marker token; it would never match any real single token), followed by everything(). Its implementation parses its tokens and wraps them in an Invalid¹.

In the macro resolver, in process_token, before looking up the token code in any macro maps, check if it is zero-length; if it is, match against the token's variant, and if it is InvalidRestOfLine look up <InvalidRestOfLine> instead of the actual (empty) token code.

Let me know if you have any questions!

Footnotes

In the macro implementation, one tricky case will be if precedence.resolve returns None. This shouldn't be reachable now, but it might be possible with future uses of InvalidRestOfLine, or in case of a bug (like the one I noted in another comment). The difficulty is that a Tree must still be created, and its spans (though 0-length) must still be appropriately-positioned within the document. You can handle this by applying into_ident to the header token, and putting the result in a dummy Tree::Ident node, to have something to wrap with Invalid. ↩

Currently the test parses bad indents without an error

f2ec4e0

JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label May 3, 2024

JaroslavTulach requested review from radeusgd and Akirathan May 3, 2024 11:06

JaroslavTulach self-assigned this May 3, 2024

JaroslavTulach requested review from mwu-tow, farmaazon, kazcw, vitvakatu and Frizi as code owners May 3, 2024 11:06

JaroslavTulach commented May 3, 2024

View reviewed changes

Make the fail_on_bad_indents test fail

350c2c3

enso-bot bot mentioned this pull request May 4, 2024

Treat errors from Frgaal compiler as errors, not as warnings #8513

Closed

Let wrong indentation produce Tree.Invalid

3e8e6c0

JaroslavTulach requested review from 4e6 and hubertp as code owners May 6, 2024 03:05

JaroslavTulach changed the title ~~Currently the test parses bad indents without an error~~ Error on indents May 6, 2024

hubertp approved these changes May 6, 2024

View reviewed changes

Fixing broken indentation

fc82e3a

JaroslavTulach requested review from jdunkerley, GregoryTravis and AdRiley as code owners May 6, 2024 07:42

radeusgd reviewed May 6, 2024

View reviewed changes

radeusgd approved these changes May 6, 2024

View reviewed changes

kazcw requested changes May 6, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error on indents #9855

Error on indents #9855

JaroslavTulach commented May 3, 2024 •

edited

JaroslavTulach May 3, 2024

JaroslavTulach May 3, 2024 •

edited

JaroslavTulach May 6, 2024

kazcw May 6, 2024

JaroslavTulach commented May 6, 2024

radeusgd May 6, 2024

JaroslavTulach May 6, 2024

radeusgd left a comment

kazcw May 6, 2024

kazcw May 6, 2024

Error on indents #9855

Are you sure you want to change the base?

Error on indents #9855

Conversation

JaroslavTulach commented May 3, 2024 • edited

Pull Request Description

Checklist

JaroslavTulach May 3, 2024

Choose a reason for hiding this comment

JaroslavTulach May 3, 2024 • edited

Choose a reason for hiding this comment

JaroslavTulach May 6, 2024

Choose a reason for hiding this comment

kazcw May 6, 2024

Choose a reason for hiding this comment

Footnotes

JaroslavTulach commented May 6, 2024

radeusgd May 6, 2024

Choose a reason for hiding this comment

JaroslavTulach May 6, 2024

Choose a reason for hiding this comment

radeusgd left a comment

Choose a reason for hiding this comment

kazcw May 6, 2024

Choose a reason for hiding this comment

kazcw May 6, 2024

Choose a reason for hiding this comment

Footnotes

JaroslavTulach commented May 3, 2024 •

edited

JaroslavTulach May 3, 2024 •

edited