
Include list of reserved identifiers (keywords) in the written spec #524

Open
jlapeyre opened this issue Mar 25, 2024 · 4 comments


@jlapeyre
Contributor

It might be useful to maintain a list of reserved words and/or keywords in the spec, along with some consistent language for talking about them. Poking around the internet, it looks like there is some variation in the terminology used.

The spec uses the word "shadow" for an identifier in an inner scope shadowing one in an outer scope. It also says

Identifiers may not override a reserved identifier.

I think the meaning is clear enough, but something like the following might be better:

  The following is the list of reserved words. These words may not be used as identifiers in OpenQASM 3 programs:
    `gate`, `def`, etc.

  The following words have a special meaning in certain contexts. Outside of these contexts, they may be used as identifiers in OpenQASM 3 programs:
    `im`, `s`, `ns`, etc.

But the last statement might not be what we want.
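A minimal Python sketch of how a checker might apply the proposed two-tier rule. The word lists are abbreviated and hypothetical (pinning down the real lists is exactly what this issue proposes), and `check_identifier` is an illustrative name, not part of any existing tool.

```python
# Hypothetical, abbreviated word lists, for illustration only.
RESERVED = {"gate", "def", "qubit", "if", "else", "for", "while", "return"}
# Words that are special only in certain contexts (imaginary and time suffixes).
CONTEXTUAL = {"im", "dt", "ns", "us", "µs", "ms", "s"}


def check_identifier(name: str) -> str:
    """Classify a candidate identifier under the proposed two-tier rule."""
    if name in RESERVED:
        raise ValueError(f"'{name}' is a reserved word and cannot be an identifier")
    if name in CONTEXTUAL:
        return "contextual"  # allowed as an identifier outside its special contexts
    return "ordinary"
```

Under this rule `check_identifier("s")` returns `"contextual"`, so `s` remains usable as a gate name, while `check_identifier("gate")` raises.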

Timing and imaginary suffixes

Doing things like making the meaning of s context-dependent, and allowing both 10ns and 10 ns, makes the language harder to reason about and to implement. I suppose there are advantages. In any case, we are stuck with this for the moment.

What kind of language elements are im, s, ms, etc.?

Making s a reserved word is presumably not intended. Are any of the others reserved?

Python: My best understanding is that imaginary literals are a kind of token, representing a value of type complex. Since no space is allowed before the j in 123.0j, this requires no special machinery: a sequence of non-whitespace characters that looks like a numeric literal but has the suffix j is an imaginary-literal token. I think this would be a great fit for OQ3, but we must allow spaces.
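Python's standard `tokenize` module makes this concrete: the `j` suffix is part of the NUMBER token itself, and inserting a space splits it into a number followed by a name (which Python's parser then rejects as invalid syntax). `number_tokens` is a small helper defined here for illustration.

```python
import io
import tokenize


def number_tokens(src: str) -> list[str]:
    """Return the text of every NUMBER token Python's tokenizer finds in src."""
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return [t.string for t in toks if t.type == tokenize.NUMBER]


print(number_tokens("x = 123.0j"))   # ['123.0j']  (suffix is inside the token)
print(number_tokens("x = 123.0 j"))  # ['123.0']   (j is now a separate NAME)
```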

Julia: im is a global constant whose value is Complex{Bool}(false, true). Julia allows juxtaposition for multiplication if the first operand is a numeric literal, so both 3im and 3 * im work, but 3 im does not. Like any scoped identifier, im can be shadowed by a local variable. Giving im a value of type Complex{Bool} has worked well. The juxtaposition rule, however, was added for convenience but causes headaches. We could perhaps use the same design for OQ3, but I don't like the idea of introducing a juxtaposition rule, which evidently would be necessary.

Current implementation in openqasm3_parser

The lexer normally consumes any suffix on a numeric literal, for example e-3, as part of the token's value. But because 1.23 im and 1.23im mean the same thing, we search specifically for imaginary and timing suffixes and do not collect them as part of the token. Instead, we leave them on the stream, where they are lexed as ordinary identifiers.
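A rough Python sketch of this lexing strategy, under simplified assumptions: the regexes are not the parser's real grammar (no underscores, hex, or leading-dot floats), and `lex` is a hypothetical name, not openqasm3_parser code. The exponent is consumed greedily as part of the number, while `im` and time-unit suffixes fall through to the identifier rule, so spacing does not change the token sequence.

```python
import re

# Simplified numeric literal: digits, optional fraction, optional exponent (e.g. e-3).
NUMBER = re.compile(r"\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
IDENT = re.compile(r"[A-Za-zµ_][A-Za-z0-9µ_]*")
SPACE = re.compile(r"[ \t]+")


def lex(src: str) -> list[tuple[str, str]]:
    """Split src into (kind, text) tokens; suffixes like im or ns become IDENTs."""
    tokens, pos = [], 0
    while pos < len(src):
        if m := SPACE.match(src, pos):
            pass  # whitespace separates tokens but is not emitted
        elif m := NUMBER.match(src, pos):
            tokens.append(("NUMBER", m.group()))  # exponent is part of the number
        elif m := IDENT.match(src, pos):
            tokens.append(("IDENT", m.group()))  # im / ns / s land here
        else:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        pos = m.end()
    return tokens
```

With this sketch, `lex("1.23im")` and `lex("1.23 im")` both give `[("NUMBER", "1.23"), ("IDENT", "im")]`.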

Later, we add syntax tags to the stream of tokens to create a concrete syntax tree. The combination of a numeric literal followed by an identifier is tagged as a timing-or-imaginary literal. At a later stage, these suffixes are validated.
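A companion Python sketch of the tagging and validation stages. The names `tag_literals` and `validate` and the tuple node shapes are hypothetical illustrations, not the parser's data structures; the time units are those listed in the OpenQASM 3 spec (`dt`, `ns`, `us`/`µs`, `ms`, `s`).

```python
TIME_UNITS = {"dt", "ns", "us", "µs", "ms", "s"}
IMAG_SUFFIX = "im"


def tag_literals(tokens):
    """Merge a NUMBER immediately followed by an IDENT into one tagged node."""
    out, i = [], 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "NUMBER" and i + 1 < len(tokens) and tokens[i + 1][0] == "IDENT":
            out.append(("TIMING_OR_IMAG", text, tokens[i + 1][1]))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def validate(node):
    """Later stage: resolve a TIMING_OR_IMAG node, or reject an unknown suffix."""
    _, value, suffix = node
    if suffix == IMAG_SUFFIX:
        return ("IMAGINARY", complex(0, float(value)))
    if suffix in TIME_UNITS:
        return ("DURATION", float(value), suffix)
    raise SyntaxError(f"invalid literal suffix {suffix!r}")
```

Deferring validation this way keeps the tagging pass simple: it only recognizes the shape "number then identifier" and leaves the question of which suffixes are legal to a single later check.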

@jlapeyre
Contributor Author

jlapeyre commented Apr 3, 2024

The effort to write the proposed text seems rather small, and having it would help in deciding whether this is a good idea. I can do this.

@jakelishman
Contributor

s, im and the rest of the duration units aren't implemented (in any implementation I know of) as separate tokens; they're suffixes that form part of the tokenisation rules, and so they don't cause identifier problems (s is the obvious one, since that's a gate in stdgates.inc). They're resolved in the ANTLR lexer by maximal munch: just as 12 is tokenised as one integer rather than [1, 2], 12s is tokenised as one duration and not [12, s].
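To illustrate maximal munch, here is a toy Python lexer that tries every rule at the current position and keeps the longest match (ties go to the rule listed first). The rules are simplified stand-ins, not the ANTLR grammar, and this version deliberately allows no space before the unit; `munch` is a hypothetical name.

```python
import re

RULES = [
    ("DURATION", re.compile(r"\d+(?:dt|ns|us|µs|ms|s)")),
    ("INTEGER", re.compile(r"\d+")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("SPACE", re.compile(r"\s+")),
]


def munch(src: str) -> list[tuple[str, str]]:
    """Tokenize src, choosing the longest rule match at each position."""
    tokens, pos = [], 0
    while pos < len(src):
        best = None
        for kind, rx in RULES:
            m = rx.match(src, pos)
            # Strictly longer wins, so earlier rules win ties.
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        kind, m = best
        if kind != "SPACE":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

Here `munch("12s")` yields a single `DURATION` token, while `munch("s q")` still yields `s` as an `IDENT`, which is why the gate name is unaffected.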

im is super messy because the text, at least historically (I'm not sure whether it still does), implied that you could use im with general expressions. In that case it would need to be a separate token, probably much like how Julia does it, and that might imply implicit multiplication by juxtaposition, which probably isn't ideal for a language like OpenQASM 3.

I think it's a good idea to produce a list of the actually reserved keywords for sure.

@jlapeyre
Contributor Author

jlapeyre commented Apr 3, 2024

Hmm, wait a minute. I don't see any examples in the spec of spaces between digits and time units in timing literals. For some reason I thought this was allowed, which spurred me to write this issue. The spec should be cleaned up in this respect, perhaps by defining "suffix" in the context of literals and then saying that "e+3" is a suffix, "ns" is a suffix, etc.

In any case, this makes the question of timing literals easier. The suffixes are just part of the token, as you say. In the new parser, I currently allow spaces, which adds some clumsiness and complexity.

But the spec does show examples of spaces before im in what would be an imaginary literal if there were no space. I suppose it's too late to make im strictly a suffix, so that it becomes part of the token, as in Python.

@jakelishman
Contributor

The ANTLR lexer's rule for timing literals is

TimingLiteral: (DecimalIntegerLiteral | FloatLiteral) [ \t]* TimeUnit;

so it permits spaces as well. I think permitting spaces is fine as an idea, and I'd be a little surprised if it were particularly difficult to implement in a lexer. The main case I have in mind is floating-point values with an exponent, where one might well want a space between the exponent and the time unit for clarity (1.0e-3 s instead of 1.0e-3s).
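For comparison, a rough Python transcription of that ANTLR rule as a single regex, keeping the `[ \t]*` gap. The number part is simplified (an assumption: no underscores or leading-dot floats), and `parse_timing` is a hypothetical helper. Spaces and tabs between the exponent and the unit are accepted, but a newline is not, matching the character class in the rule.

```python
import re

# Simplified (DecimalIntegerLiteral | FloatLiteral), then [ \t]*, then TimeUnit.
NUMBER = r"\d+(?:\.\d*)?(?:[eE][+-]?\d+)?"
TIME_UNIT = r"(?:dt|ns|us|µs|ms|s)"
TIMING_LITERAL = re.compile(rf"({NUMBER})[ \t]*({TIME_UNIT})")


def parse_timing(text: str):
    """Return (value, unit) if text is exactly one timing literal, else None."""
    m = TIMING_LITERAL.fullmatch(text)
    return (float(m.group(1)), m.group(2)) if m else None
```

Both `parse_timing("1.0e-3s")` and `parse_timing("1.0e-3 s")` give `(0.001, "s")`.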

I have no strong feelings, really.
