Update grammar to accept user defined literal suffixes #14054

matheusaaguiar · 2023-03-16T19:05:50Z

Task of #12656.

matheusaaguiar · 2023-03-16T19:29:05Z

AFAIU, whitespace is ignored by the lexer, so values like the one below (from a failing test) are parsed by ANTLR with no errors:

solidity/test/libsolidity/syntaxTests/literalSuffixes/application/invalid_suffix_no_whitespace.sol

Line 4 in ce63256

uint x = 1000suffix;

The same already happened before for denominations, but we don't seem to have any tests for this case. The following will also be parsed successfully by antlr:

contract C {
 uint x = 1000gwei;
}

docs/grammar/SolidityParser.g4

cameel · 2023-03-20T10:28:26Z

> AFAIU, whitespace is ignored by the lexer, so values like the one below (from a failing test) are parsed by ANTLR with no errors

I see we have a WS token that represents whitespace. Try putting it before identifier and see if that makes ANTLR require whitespace there.

Also, please add a syntax test for each of the whitespace characters that is allowed there - mine only has the space.

> The same already happened before for denominations, but we don't seem to have any tests for this case.

Good find. We should also do the same for subdenominations: fix the grammar and add syntax tests.

docs/grammar/SolidityParser.g4

matheusaaguiar · 2023-03-21T03:45:40Z

> I see we have a WS token that represents whitespace. Try putting it before identifier and see if that makes ANTLR require whitespace there.

I tried it, but it doesn't work.
In the previous message, I should have explained better that the WS token defined in the lexer recognizes whitespace and then discards it using the skip command. So, by the time the parser gets the tokens, none of them are WS, and thus it never chooses the suffixedLiteral or numberLiteralWithSubdenomination rules once they require a WS token.
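
To illustrate the point about skipped whitespace, here is a minimal Python sketch (not the ANTLR lexer itself). The token names DecimalNumber, Identifier and WS follow the grammar, but the regular expressions and the `tokenize` helper are simplified assumptions made up for this example. It shows that once WS is discarded, `1000 suffix` and `1000suffix` reach the parser as the same token stream:

```python
import re

# Hypothetical, heavily simplified token rules. The real ones live in
# docs/grammar/SolidityLexer.g4 and are more involved.
TOKEN_SPEC = [
    ("DecimalNumber", r"[0-9]+"),
    ("Identifier", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("WS", r"[ \t\r\n]+"),  # matched, then discarded (mimics `-> skip`)
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (token type, text) pairs, dropping WS tokens like a skip action."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# Both spellings produce the same tokens, so a parser rule cannot tell them apart.
print(tokenize("1000 suffix"))
print(tokenize("1000suffix"))
```

Since the parser only ever sees the filtered list, a parser rule that mentions WS has nothing to match against.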

I also found a problem with the functionDefinition rule. Since it can recognize multiple modifierInvocation occurrences and suffix is not a language keyword, the ANTLR parser will fail to reject a function with repeated suffix specifiers:

function suffix1(uint) pure suffix suffix returns (uint) {}
// ----
// ParserError 2878: (82-88): Suffix already specified.

cameel · 2023-03-21T14:14:59Z

> I also found a problem with the functionDefinition rule. Since it can recognize multiple modifierInvocation occurrences and suffix is not a language keyword, the ANTLR parser will fail to reject a function with repeated suffix specifiers:

Hmm... Indeed, that looks like a problem. Actually, why don't virtual and override cause similar problems? The current grammar allows them on free function definitions. Do we just have no syntax test that tries to put virtual on a free function?

But anyway, one way out would be to split functionDefinition into freeFunctionDefinition and contractFunctionDefinition. It actually would not be that bad because it would make the grammar more precise and freeFunctionDefinition would be quite simple because it does not allow a lot of stuff that's allowed in contract functions.

Another one would be to allow only either modifierInvocation or Suffix. But I suspect that the only nice way to achieve that in ANTLR may be the split I mentioned above.

cameel · 2023-03-21T14:35:53Z

> I tried it, but it doesn't work.

OK, so here's another idea. Looks like our scanner actually properly scans 1000suffix as a number followed by an identifier and just has an extra validation that rejects it:

solidity/liblangutil/Scanner.cpp

Lines 1004 to 1009 in 2ca349c

    
    // The source character immediately following a numeric literal must
    // not be an identifier start or a decimal digit; see ECMA-262
    // section 7.8.3, page 17 (note that we read only one decimal digit
    // if the value is 0).
    if (isDecimalDigit(m_char) || isIdentifierStart(m_char))
        return setError(ScannerError::IllegalNumberEnd);

We could simulate this by recognizing a number followed by an identifier as its own thing. Try to define DecimalNumberFollowedByIdentifier, which basically consists of the chars allowed in DecimalNumber followed by the chars allowed in an identifier. Just defining such a rule should make ANTLR recognize that 1000suffix is not a decimal number.

Also, to make sure it works, please add test cases for hex and fractional numbers (I suspect you'll need separate rules to handle them): 0x1000suffix, 1000.0suffix, 1000.0e-5suffix and 0x1000abcdefgh.
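
The reason such a rule should help is that an ANTLR lexer prefers the rule that matches the longest input, and breaks ties in favor of the rule listed first. Below is a rough Python sketch of that selection logic. The patterns are simplified assumptions (decimal digits only, no exponents, hex or fractions), and `tokenize` is a helper invented for this example, not part of the grammar:

```python
import re

# Simplified, hypothetical rules. Order matters only for breaking ties.
RULES = [
    ("DecimalNumberFollowedByIdentifier", re.compile(r"[0-9]+[A-Za-z_$][A-Za-z0-9_$]*")),
    ("DecimalNumber", re.compile(r"[0-9]+")),
    ("Identifier", re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(source):
    """Longest-match tokenizer: the rule consuming the most characters wins."""
    tokens = []
    pos = 0
    while pos < len(source):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            match = pattern.match(source, pos)
            # Strictly longer matches replace the current best, so on a tie
            # the rule that appears earlier in RULES is kept.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if best_name != "WS":  # whitespace is skipped, as in the real lexer
            tokens.append((best_name, source[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("1000suffix"))   # one token that no parser rule accepts
print(tokenize("1000 suffix"))  # still a number followed by an identifier
```

With the extra rule in place, `1000suffix` becomes a single token of a type that no parser rule uses, so parsing fails, while `1000 suffix` is unaffected.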

cameel · 2023-03-21T14:38:47Z

Also, these grammar changes (i.e. the split into free/contract function, number+identifier, renaming of NumberUnit) are quite general and do not depend on suffixes. Once you get them to pass all tests, I'd suggest moving them (along with the relevant test cases) to another PR, straight to develop, and leaving here only the changes that introduce suffixes. We can get them merged separately and keep the main suffix PR smaller that way.

docs/grammar/SolidityParser.g4

test/libsolidity/syntaxTests/literalSuffixes/application/invalid_suffix_denomination.sol

matheusaaguiar · 2023-03-21T18:08:28Z

> Actually, why don't virtual and override cause similar problems? The current grammar allows them on free function definitions. Do we just have no syntax test that tries to put virtual on a free function?

We do have tests for that, but they generate SyntaxError rather than ParserError, which is the only category test_antlr_grammar.sh looks for in order to detect divergence from the solc parser.

solidity/scripts/test_antlr_grammar.sh

Lines 83 to 88 in 2ca349c

    
    if grep -qE "^\/\/ ParserError" "${SOL_FILE}"; then
      if [[ "${output}" != "" ]]
      then
        echo -e "${SGR_BLUE}[${cur}/${max}] Testing ${SOL_FILE}${SGR_RESET} ${SGR_BOLD}${SGR_GREEN}FAILED AS EXPECTED${SGR_RESET}"
      else
        echo -e "${SGR_BLUE}[${cur}/${max}] Testing ${SOL_FILE}${SGR_RESET} ${SGR_BOLD}${SGR_RED}SUCCEEDED DESPITE PARSER ERROR${SGR_RESET}"
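
As a rough Python paraphrase of the check quoted above (only the two visible branches are modelled, and the function names are made up for this sketch), a test whose expectation is a SyntaxError never enters the branch that compares ANTLR's result with solc's. The SyntaxError line in the second sample is a hypothetical placeholder, not a real expectation from the test suite:

```python
import re

# Mirrors the grep pattern "^\/\/ ParserError" from the quoted script lines.
PARSER_ERROR_EXPECTATION = re.compile(r"^// ParserError", re.MULTILINE)


def expects_parser_error(test_source):
    """True if the test file contains a '// ParserError' expectation line."""
    return PARSER_ERROR_EXPECTATION.search(test_source) is not None


def verdict(test_source, antlr_output):
    """Classify a run. Branches not shown in the quoted snippet are not modelled."""
    if not expects_parser_error(test_source):
        return "NOT COVERED BY THIS SKETCH"
    if antlr_output != "":
        return "FAILED AS EXPECTED"
    return "SUCCEEDED DESPITE PARSER ERROR"


repeated_suffix = (
    "function suffix1(uint) pure suffix suffix returns (uint) {}\n"
    "// ----\n"
    "// ParserError 2878: (82-88): Suffix already specified.\n"
)
# Hypothetical expectation line, used only to show the SyntaxError case.
virtual_free_function = (
    "function f() virtual {}\n"
    "// ----\n"
    "// SyntaxError 0000: (0-0): hypothetical message\n"
)

print(verdict(repeated_suffix, ""))        # ANTLR accepted it: divergence is reported
print(verdict(virtual_free_function, ""))  # not checked against a ParserError
```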

> But anyway, one way out would be to split functionDefinition into freeFunctionDefinition and contractFunctionDefinition.

Ok, agreed.

cameel · 2023-03-21T20:02:16Z

> We do have tests for that, but they generate SyntaxError rather than ParserError, which is the only category test_antlr_grammar.sh looks for in order to detect divergence from the solc parser.

Interesting. Well, then maybe working around it in test_antlr_grammar.sh in some way would not be out of the question if adjusting the grammar turns out to be too complicated. But first let's try to do it the nice way. We should be able to match the grammar.

cameel

ok, looks good now apart from some minor tweaks.

Please now extract the generic part into a PR on develop as I suggested in an earlier comment (note that this will require changing some tests to use denominations rather than suffixes).

test/libsolidity/syntaxTests/invalid_octal_number.sol

test/libsolidity/syntaxTests/invalid_hex_number.sol

docs/grammar/SolidityParser.g4

cameel · 2023-03-22T12:26:25Z

By the way, please remember to mark PRs as reviewable once you're done implementing them.

cameel · 2023-03-24T18:46:31Z

I rebased the suffix PR on #14066, so if you rebase this one on the suffix PR, we can now continue review here.

cameel · 2023-03-24T20:27:08Z

docs/grammar/SolidityParser.g4

+ 	(
+		{!$mutabilitySet}? stateMutability {$mutabilitySet = true;}
+		| {!$suffixSpecifierSet}? Suffix {$suffixSpecifierSet = true;}
+ 	 )*


With just mutability and suffix we could still do it without variables and keep it simple:

Suggested change

-	(
-		{!$mutabilitySet}? stateMutability {$mutabilitySet = true;}
-		| {!$suffixSpecifierSet}? Suffix {$suffixSpecifierSet = true;}
-	)*
+	stateMutability Suffix?
+	| Suffix stateMutability?
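
To compare the two forms, here is a small Python sketch that enumerates short specifier sequences and checks which ones each variant accepts. The two acceptor functions are my own reading of the grammar fragments above, not generated parser code, and how the alternation is embedded in the surrounding rule is not shown in this thread:

```python
from itertools import product

SPECIFIERS = ("stateMutability", "Suffix")


def accepted_by_predicates(sequence):
    """The `( {!set}? X {set = true;} | ... )*` form: each specifier at most once."""
    mutability_set = False
    suffix_set = False
    for item in sequence:
        if item == "stateMutability" and not mutability_set:
            mutability_set = True
        elif item == "Suffix" and not suffix_set:
            suffix_set = True
        else:
            return False
    return True


def accepted_by_alternation(sequence):
    """The `stateMutability Suffix? | Suffix stateMutability?` form."""
    return tuple(sequence) in {
        ("stateMutability",),
        ("stateMutability", "Suffix"),
        ("Suffix",),
        ("Suffix", "stateMutability"),
    }


def all_sequences(max_length):
    for length in range(max_length + 1):
        yield from product(SPECIFIERS, repeat=length)


differences = [
    sequence
    for sequence in all_sequences(3)
    if accepted_by_predicates(sequence) != accepted_by_alternation(sequence)
]
print(differences)
```

Under these assumptions the only difference is the empty sequence, so the alternation would presumably have to be made optional wherever a function may omit both specifiers.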

cameel · 2023-03-24T20:28:40Z

docs/grammar/SolidityParser.g4


 literal: stringLiteral | numberLiteral | booleanLiteral | hexStringLiteral | unicodeStringLiteral;

 literalWithSubDenomination: numberLiteral SubDenomination;

+suffixedLiteral: literal identifier;


Actually, just realized that this is not entirely correct. It does not have to be a single literal. Can be member access as well.

Not sure if I follow here. Should we be doing something like this?

Suggested change

-suffixedLiteral: literal identifier;
+memberAcess: identifier (Period identifier)+;
+suffixedLiteral: (literal | memberAcess) identifier;

What would be a member access that is equivalent to a literal?

I mean that something like 123 M.suffix is allowed because you can do import "A.sol" as M and A.sol can have suffix defined in it. identifier alone does not allow it.

Not sure if we want to define memberAccess explicitly though. I see that inside expression that's defined in place like this:

expression Period (identifier | Address) # MemberAccess

So I guess the solution is either to repeat that for suffixedLiteral or indeed define it as memberAccess.

Ah, ok, understood it now. I was thinking about the literal, not the identifier.
We have a test with suffix function imported from other file:

solidity/test/libsolidity/syntaxTests/literalSuffixes/usableAsSuffix/imported_function_as_suffix.sol

Lines 6 to 14 in 65188ca

    import "A.sol" as A;
    import "B.sol" as B;

    contract C {
        function f() pure public {
            1 s;
            2 z;
            3 A.s;
            4 B.B.B.A.s;

I have removed the failing test in chk_antlr_grammar before running it locally and then got no errors, so I think that the current grammar already covers that somehow, but not sure.

> I think that the current grammar already covers that somehow, but not sure.

But how? It can't be. expression does account for that explicitly so the literals should have to as well.

cameel · 2023-03-27T17:57:46Z

docs/grammar/SolidityLexer.g4

 Ufixed: 'ufixed' | ('ufixed' [1-9][0-9]+ 'x' [1-9][0-9]+);
 Unchecked: 'unchecked';
+Unicode: 'unicode';
 /**


Oh, so a keyword was missing here. This is a general change that should go straight to develop.

With a test case preferably.

Ok. Created #14078.

cameel · 2023-03-27T18:01:21Z

docs/grammar/SolidityParser.g4


 literal: stringLiteral | numberLiteral | booleanLiteral | hexStringLiteral | unicodeStringLiteral;

 literalWithSubDenomination: numberLiteral SubDenomination;

+suffixedLiteral: literal identifier;


> I think that the current grammar already covers that somehow, but not sure.

But how? It can't be. expression does account for that explicitly so the literals should have to as well.

matheusaaguiar · 2023-03-28T14:23:12Z

> But how? It can't be. expression does account for that explicitly so the literals should have to as well.

Yeah, I am not sure why it works. I will investigate it.

cameel · 2023-04-03T10:57:36Z

> Yeah, I am not sure why it works. I will investigate it.

So, did you figure it out?

matheusaaguiar · 2023-04-03T14:18:01Z

> So, did you figure it out?

Not yet. I will prioritize it though.

matheusaaguiar · 2023-04-03T19:38:16Z

I checked the test_antlr_grammar script and found out that tests that are multi-source files like the one I mentioned before are excluded before actually running the ANTLR parser.

solidity/test/libsolidity/syntaxTests/literalSuffixes/usableAsSuffix/imported_function_as_suffix.sol

Lines 1 to 5 in 65188ca

    
    ==== Source: A.sol ====
    function s(uint) pure suffix returns (uint) { return 1; }
    ==== Source: B.sol ====
    import {s} from "A.sol";
    import {s as z} from "A.sol";
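
For reference, a multi-source test can be recognized by its `==== Source: ... ====` separator lines. The Python sketch below only illustrates that idea. How test_antlr_grammar.sh actually excludes such files is not shown in this thread, and the helper names are invented for the example:

```python
import re

# Matches separator lines such as "==== Source: A.sol ====".
SOURCE_SEPARATOR = re.compile(r"^==== Source: (.+) ====[ \t]*$", re.MULTILINE)


def source_names(test_text):
    """Return the source unit names declared by separator lines, in order."""
    return SOURCE_SEPARATOR.findall(test_text)


def is_multi_source(test_text):
    """True if the test file declares at least one named source unit."""
    return len(source_names(test_text)) > 0


sample = (
    "==== Source: A.sol ====\n"
    "function s(uint) pure suffix returns (uint) { return 1; }\n"
    "==== Source: B.sol ====\n"
    'import {s} from "A.sol";\n'
)
print(source_names(sample))
print(is_multi_source("contract C {}\n"))
```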

Still, seems like the parser accepts member access as literal suffixes...

cameel · 2023-04-03T21:09:45Z

> Still, seems like the parser accepts member access as literal suffixes...

You mean ANTLR or solc parser? Because the latter obviously does, but ANTLR should too.

In any case, this explains why it did not fail. This means that the grammar is really incomplete, we just can't properly test it. It needs to be updated to cover this case.

matheusaaguiar · 2023-04-03T21:51:57Z

> You mean ANTLR or solc parser?

I meant the ANTLR parser. I just removed the ==== Source ... separators from the file and then let the script run the ANTLR parser on it, which reported success.

> It needs to be updated to cover this case.

Are we going to do this in the context of the literal suffix PR or is it a task we leave for later?

cameel · 2023-04-03T21:59:50Z

> Are we going to do this in the context of the literal suffix PR or is it a task we leave for later?

In this PR. Without it the grammar is incomplete.

> I just removed the ==== Source ... separators from the file and then let the script run the ANTLR parser on it, which reported success.

This sounds weird. Is it only for suffixes or does it let you use dotted paths in other places where the grammar does not officially allow them?

matheusaaguiar · 2023-04-03T23:13:59Z

I checked with the ANTLR test tool using the options -tokens and -gui (typing the input manually in the CLI). When parsing the declaration uint constant x = 1 A.b.c; it produces the following output:

[@0,0:3='uint',<UnsignedIntegerType>,1:0]
[@1,5:12='constant',<'constant'>,1:5]
[@2,14:14='x',<Identifier>,1:14]
[@3,16:16='=',<'='>,1:16]
[@4,18:18='1',<DecimalNumber>,1:18]
[@5,20:20='A',<Identifier>,1:20]
[@6,21:21='.',<Period>,1:21]
[@7,22:22='b',<Identifier>,1:22]
[@8,23:23='.',<Period>,1:23]
[@9,24:24='c',<Identifier>,1:24]
[@10,25:25=';',<Semicolon>,1:25]
[@11,27:26='<EOF>',<EOF>,2:0]

The visualization tool produces this tree (image not included):

It first recognizes the value of the assignment as a member access, and then decomposes the expression before the . as either another member access or finally as a suffixedLiteral.

matheusaaguiar · 2023-04-03T23:15:13Z

> This sounds weird. Is it only for suffixes or does it let you use dotted paths in other places that grammar does not officially include?

I am trying to find out if there are other cases.

cameel · 2023-04-04T10:57:24Z

Oh, that's very helpful. I see what's happening there.

Basically, ANTLR sees it as if it was uint x = (1 A).b.c;, which is already a valid expression. solc parser on the other hand interprets it as uint x = 1 (A.b.c);. We do need an extra rule to get that second interpretation.
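
The difference between the two readings can be made concrete with a small Python sketch that builds both trees from the same tokens. The node labels and both helper functions are invented for illustration and do not correspond to generated ANTLR code or to the solc AST:

```python
def parse_suffix_is_identifier(tokens):
    """Reading `(1 A).b.c`: the suffix is a single identifier, and the
    remaining `.name` pairs become member accesses on the suffixed literal."""
    literal, suffix, *rest = tokens
    node = ("suffixedLiteral", literal, suffix)
    for index in range(0, len(rest), 2):
        if rest[index] != ".":
            raise ValueError("expected '.'")
        node = ("memberAccess", node, rest[index + 1])
    return node


def parse_suffix_is_path(tokens):
    """Reading `1 (A.b.c)`: the whole dotted path is the suffix."""
    literal, *path_tokens = tokens
    if any(token != "." for token in path_tokens[1::2]):
        raise ValueError("expected '.'")
    return ("suffixedLiteral", literal, ".".join(path_tokens[0::2]))


tokens = ["1", "A", ".", "b", ".", "c"]
print(parse_suffix_is_identifier(tokens))
print(parse_suffix_is_path(tokens))
```

Both functions accept the same token sequence, which is why the grammar test passes, but only the second one treats `A.b.c` as the suffix.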

cameel · 2023-04-04T15:25:29Z

I pushed a new version of the base PR. Please rebase.

matheusaaguiar · 2023-04-05T14:11:52Z

scripts/test_antlr_grammar.sh

@@ -129,7 +129,8 @@ done < <(
      grep -v -E 'license/license_hidden_unicode.sol' |
      grep -v -E 'license/license_unicode.sol' |
      # Skipping tests with 'something.address' as 'address' as the grammar fails on those
-      grep -v -E 'inlineAssembly/external_function_pointer_address.*.sol'
+      grep -v -E 'inlineAssembly/external_function_pointer_address.*.sol' |
+      grep -v -E 'literalSuffixes/application/invalid_address_member_on_suffix.sol'


I added this because I noticed we already had a workaround for the case.

cameel

ok, looks good now!

matheusaaguiar added the roadmap and has dependencies labels Mar 16, 2023

matheusaaguiar self-assigned this Mar 16, 2023

cameel requested changes Mar 20, 2023

cameel reviewed Mar 20, 2023

matheusaaguiar force-pushed the literal_suffix_functions_update_grammar branch from 7ef6987 to 4af093a Compare March 21, 2023 03:22

cameel reviewed Mar 21, 2023

cameel reviewed Mar 22, 2023

cameel force-pushed the literal_suffix_functions branch from ce63256 to 4adfb76 Compare March 22, 2023 12:20

cameel mentioned this pull request Mar 22, 2023

User defined operators/literals #13718

Closed

5 tasks

matheusaaguiar mentioned this pull request Mar 22, 2023

Update grammar functionDefinition parser rule and rename NumberUnit lexer rule #14066

Merged

matheusaaguiar force-pushed the literal_suffix_functions_update_grammar branch 2 times, most recently from d3b8067 to d133142 Compare March 24, 2023 16:00

cameel force-pushed the literal_suffix_functions branch from 11e308c to c4aee73 Compare March 24, 2023 18:44

matheusaaguiar force-pushed the literal_suffix_functions_update_grammar branch from d133142 to f54ddfc Compare March 24, 2023 20:20

cameel requested changes Mar 24, 2023

cameel reviewed Mar 27, 2023

matheusaaguiar mentioned this pull request Mar 28, 2023

Fix missing keyword unicode in grammar #14078

Merged

cameel force-pushed the literal_suffix_functions branch from c4aee73 to 559d62f Compare April 4, 2023 15:24

cameel force-pushed the literal_suffix_functions branch from 559d62f to f229c36 Compare April 4, 2023 15:31

matheusaaguiar force-pushed the literal_suffix_functions_update_grammar branch from b21d2f9 to 1dc56e6 Compare April 4, 2023 15:46

matheusaaguiar added 5 commits April 5, 2023 10:42

Updated grammar with suffix specifier for free functions

bec531a

Added rule for suffixed literal

46ca718

Added missing tests

93d456f

Simplified freeFunctionDefinition

c65d15a

Adjusted suffixedLiteral to account for member access as suffix

a1059cf

matheusaaguiar force-pushed the literal_suffix_functions_update_grammar branch from 1dc56e6 to a1059cf Compare April 5, 2023 13:45

Added workaround for test which grammar fails to parse

3210232

matheusaaguiar commented Apr 5, 2023

cameel approved these changes Apr 5, 2023

cameel marked this pull request as ready for review April 5, 2023 15:36

cameel merged commit 67ca210 into literal_suffix_functions Apr 5, 2023
1 check passed

cameel deleted the literal_suffix_functions_update_grammar branch April 5, 2023 15:36

cameel mentioned this pull request Apr 17, 2023

User-defined literal suffixes. #12656

Closed

52 tasks
