Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unterminated ifs and loops execute instead of throwing syntax error #2327

Open
maetshju opened this issue Feb 21, 2023 · 7 comments
Open

Unterminated ifs and loops execute instead of throwing syntax error #2327

maetshju opened this issue Feb 21, 2023 · 7 comments

Comments

@maetshju
Copy link

When using control structures like if and for, the scripting language parser is not checking for a terminating symbol for the structure before executing the code.

For example, all of the following scripts run without raising a syntax error:

if 1 = 1
    writeInfoLine: 1
for i from 1 to 10
    writeInfoLine: i
while 1
    writeInfoLine: 1
repeat
    writeInfoLine: 1

The loops will run the interior code block once. Other programming languages I work with (like Julia, R, and Bash) would raise a syntax error here for not terminating the code block appropriately with equivalents of endif, endfor, endwhile, or until.

This feels like a bug in how the language grammar is parsed, but I'm not sure if this was perhaps intended for some reason. If this was intended in the parser, I would put forth as a feature request that these unterminated control structures throw syntax errors instead of executing.

@maetshju
Copy link
Author

maetshju commented Feb 22, 2023

I've been reading some of the code in sys/Interpreter.cpp, and I suspect some of the error-checking lines aren't actually being hit.

For example, this line

Melder_throw (U"Unmatched 'for' (matching 'endfor' not found).");

is guarded by

if (loopVariable > toValue) {

If the for loop is written as

for i from 11 to 10
    writeInfoLine: 1

to make sure the guard is passed, the Umatched 'for' line is reached and the Umatched 'for' error is thrown.

Similarly, writing a while loop as

i = 2
while i < 2
    writeInfoLine: 1

seems to hit

Melder_throw (U"Unmatched 'while'.");
and throw the Umatched 'while' error.

@maetshju
Copy link
Author

I've spent some time looking over how loops are interpreted this afternoon. The general while interpretation strategy I have understood from Interpreter.cpp is the following (Markdown formatting for this is a bit weird):

  1. Encounter the keyword while
  2. Evaluate the guard after while
    • 2.1. If the guard is true, continue eagerly interpreting each subsequent line.
      • 2.1.1. If endwhile encountered, then work backward and repeat from 1.
    • 2.2. If the guard is false, find endwhile.isomorphicisomorphic
      • 2.2.1. If endwhile found, move past it.
      • 2.2.2. If endwhile not found, throw unmatched while error.

There is, of course, some loop-depth checking and other error checking (e.g., text after endwhile), but I think I've captured the core strategy. I haven't looked at for or repeat until as much, but they seemed similar (and loops are isomorphic to each other). I am also not entirely sure what the calls to the Melder functions do, outside of interpretation as a general concept.

Regardless, the final syntax check for the loop only appears to occur at 2.2.2., after the entire loop has been run and terminated. Ideally, the syntax would be checked at 1. and then would not need to be checked again.

I don't see a simple change to suggest to have the syntax check occur earlier; any substantive change would require a change in the parsing strategy. Some thoughts on how the syntax check could be implemented:

  1. Do an overall syntax check before executing any line of code. This could be done as part of the scan that finds procedures before executing the rest of the code. In theory, this would make all Praat programs safer since nothing will be executed if there are any syntax errors.
  2. When encountering while (and analogous structures):
    1. Check loop guard and skip over loop if false. (As already performed)
    2. Scan for endwhile. If not found, throw Unmatched 'while'
    3. Otherwise, work backward and collect lines between endwhile and while. Once while is reached, execute the collected lines. Repeat from i.

As an alternative in strategy 2, the lines could be collected as part of the scan to find endwhile, executed after endwhile is found, and then the interpreter moves back up to the while statement.

@PaulBoersma
Copy link
Member

This is all intentional, and meanwhile many scripts will rely on it, e.g. by freely writing text after exit. Taking a one-line-at-a-time approach used to be crucial when the role of variable substitution (i.e. self-modifying code) was stronger. This doesn't really extend to for/endfor, if/else/endif, while/endwhile and repeat/until, though, so a stricter version of the language could already be devised, apart from breaking some unusual scripts. If we don't want to break too much, we could take your version 2 (scanning for block-ends at runtime), but in a lot simpler way (the "collecting lines" is superfluous).

We would still break the following trick, for which I don't know how often people use it (can you guess what it outputs?):

a = 3
b = 0
while a < 10
    a += 1
    if a <= 5
        b = 2*a
endwhile
    endif
writeInfoLine: a, " ", b

We have been thinking about a stricter language, without variable substitution, for some time, the major goal being optimization (by compiling all lines before running the first) rather than syntax checking, but syntax checking would automatically come with it.

@maetshju
Copy link
Author

maetshju commented Mar 4, 2023

(the "collecting lines" is superfluous)

Yes, it is. There are manifold ways a change could be accomplished. The line collection was intended to be as small a modification to the current code as possible since the backward progression is already part of the loop. I personally think it would be preferable to not have to collect the lines, but I am not a dev for this project, so my opinion only matters so much.

(can you guess what it outputs?):

It produces the function of a break statement, taking advantage of how evaluating the if condition to false will advance to the next endif, thereby moving past endwhile. I will say that I have often wanted to use explicit break and continue keywords in Praat scripts. However, I am not interested in writing code where the blocks are not entirely nested within each other to achieve that functionality.

I can only speak for myself (and slightly for some other users I know), but I would have expected that block of code to error out, rather than be an intended programming trick. I also can't think of any programming language I know or have used or studied where such block mixing is possible (even though that lack of imagination is evidence of nothing much but my own lack of exposure).

I recognize, ultimately, that there is no "good" solution to constraints of keeping old scripts runnable while also changing the execution model. I have witnessed too much of the Python 2 to 3 debacle for too long to think that would be easy. I will say, though, that I am hesitant to write non-trivial Praat scripts knowing that the syntax of control structures is not rigorously checked.

@PaulBoersma
Copy link
Member

The example above cannot indeed be seriously recommended. The only common use of variable substitution in control structures is "computed goto", as in

goto 'label$'

This works in Praat, and is used in jump tables, as when you're using the Demo window to create a PowerPoint-like presentation (with Praat capabilities, such as on-the-fly simulations). This run-time computation of where to jump should be kept alive.

For break and continue, both very useful, I would like to have an unconditional version and a conditional version, just as already exists for goto. For instance, one should be able say

if numberOfSamplesLeft = 0
    break
endif

but also

break numberOfSamplesLeft = 0

and

continue vowel$ = ""

or would that be clearer if written as?:

break if numberOfSamplesLeft = 0
continue if vowel$ = ""

@maetshju
Copy link
Author

maetshju commented Mar 7, 2023

I feel a bit more live-and-let-live about goto statements, and I recognize the utility of variable substitution there. I'm generally disinclined from using goto statements (perhaps I was just told too many times never to use them in computer science classes a decade+ ago), but I think preserving that kind of functionality if it could be of use to someone is fine. And, hopefully, anyone using a goto is aware of their reputation.

For the conditional break and continue, I find the versions using if easier to read. It's also somewhat reminiscent of how those statements could look in Python as one-liners, e.g., if number_of_samples_left == 0: break. I wonder, though, if it might be better to have separate keywords breakif and continueif. In particular, I'm wondering if someone may try to use an else clause after break if X. I'm not sure what semantics I would expect from that besides normal if-else operation, and else is superfluous after a break or continue.

There is also possibly an option similar to a common shortcut that is written in Julia, where short-circuiting in Boolean evaluation is taken advantage of to write numberOfSamplesLeft == 0 && break. It could look like if numberOfSamplesLeft = 0 and break in Praat, though perhaps that's too esoteric.

@PaulBoersma
Copy link
Member

PaulBoersma commented Mar 19, 2023

Yes, I had written break_if and continue_if into the previous comment before I decided that there was too much detail there, but these do seem to be clear as well as well as prevent some elses. We don't use underscores much, but I guess they can be used for other compound keywords as well, as for instance for the parallel_for that we are planning, or for error strategies (e.g. min(...) giving an undefined but min_e(...) raising an error, if any argument is not a finite number).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants