C grammar, standard C90 #3325

Open: wants to merge 126 commits into base: master


@andr1972 andr1972 commented Apr 8, 2023

C grammar for the old C90 standard, tested on 216 standard library headers, mainly from /usr/include, with some from /usr/include/x86_64-linux-gnu/bits. Only headers that did not cause an error when GCC was called with "-std=c90 -fsyntax-only -E" were used, with output to the file /dev/shm/file.c, which was then parsed with ANTLR4 and the C++ runtime.
Grammars for C99, C11, and C17 can be created incrementally and could be tested on big C repositories.

@teverett teverett added the c label Apr 8, 2023
@andr1972
Author

andr1972 commented Apr 8, 2023

FuncCallwithVarArgs.c:
"int a2(int param1, param2)" passes, but that is a Clang quirk, and the parser doesn't distinguish between a parameter without a type and a named type without a parameter name.

FunctionReturningPointer.c
My compilers don't compile
int MyStruct *
f3 (
int param1,
char param2
);
where both int and MyStruct are types

pr403.c:
Joining lines at \ is one of the first jobs of the preprocessor (translation phase 2), so \ has the highest priority.
GCC even allows:
#include <stdio.\
h>
But at least the 3rd commit handles joining of lines inside strings.
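
A minimal sketch of the line joining described above, assuming it runs as a separate pass over the raw text before the lexer sees it. The function name `splice_lines` is hypothetical and is not part of the PR; the sketch only handles a backslash followed directly by '\n' (not "\r\n").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PR.
   Copies src to dst, dropping every backslash-newline pair so that the
   two physical lines become one logical line.
   dst must have room for strlen(src) + 1 bytes.
   Returns the length of the resulting string. */
size_t splice_lines(const char *src, char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;               /* drop the pair: the lines are joined */
        } else {
            dst[out++] = *src++;    /* every other character is kept as-is */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Because the pass does not look at tokens at all, it joins lines in the middle of a header name, an identifier, or a string literal in the same way, which is why a grammar rule alone cannot easily express it.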

@andr1972
Author

Although I created FunctionReturningPointer.c.errors, I still get:

========
Parsing tests of c for Java failed.
Difference in output.
diff --git a/c/examples/FunctionReturningPointer.c.errors b/c/examples/FunctionReturningPointer.c.errors
index b2bae35a..5634b8f9 100644
--- a/c/examples/FunctionReturningPointer.c.errors
+++ b/c/examples/FunctionReturningPointer.c.errors
@@ -1 +1 @@
-line 16:13 no viable alternative at input 'intMyStruct*'
\ No newline at end of file
+line 16:13 no viable alternative at input 'intMyStruct*'
Test failed.
========
Error: Process completed with exit code 1.

https://github.com/parstools/grammars-v4/actions/runs/4657790061/jobs/8242732094#step:18:45

@kaby76
Contributor

kaby76 commented Apr 10, 2023

No newline at end of file

Every single error message from the parser must end with a newline. You should use trgen to remaster your test files. I don't know why you left the bogus input in the file though.

@kaby76
Contributor

kaby76 commented Apr 10, 2023

NB!!

You placed an entirely new grammar for C in the same directory containing an existing grammar. They are not the same and it confuses everyone--and the build tool. The build tool can handle multiple grammars, but you have to program for the two, which you have to do with the desc.xml. At some point, there should be more recent versions of the C grammar based on scraping of the ISO Spec. So, it's best you start the directory structure right now.

The build tool opens every grammar in the directory and parses it. It then looks for the grammar name and the start rule. There are two different grammar names and two start rules.

Please create the following directory structure.

c/
c/c/
c/c90/

Place the existing files before your changes in c/c/. Place your new grammar in c/c90/. Add a desc.xml to c/c90/. Copy the c/c/examples/ to c/c90/examples.

@teverett teverett requested a review from kaby76 April 11, 2023 01:04
Contributor

@kaby76 kaby76 left a comment


  • git mv c/desc.xml c/C/desc.xml. A copy of desc.xml is placed in the same directory as the .g4. (There should not be a desc.xml without a .g4; every directory that contains a .g4 needs a desc.xml.)
  • cp c/C/desc.xml grammars-v4/c/C90/desc.xml.
  • In c/C/pom.xml, change <packageName>c</packageName> to <packageName></packageName>. (mvn clean test was failing.)
  • In c/C90/pom.xml, change <packageName>c90</packageName> to <packageName></packageName>. (mvn clean test was failing.)
  • For the new C90 grammar, can we add the tests you used to validate the grammar? We may need to place those in another directory, like "c/C90/validation-large/" (or whatever you think). We can then validate the grammar against this for Java and CSharp since the other targets may be too slow.

By the way, excellent work. Thank you for adding this grammar!

@andr1972
Author

andr1972 commented Apr 11, 2023 via email

@kaby76
Contributor

kaby76 commented Apr 12, 2023

This looks good @borneq

I tested out the grammar with all the targets, and it looks like Cpp, CSharp, Dart, and Java all parse fast. The Go target is fast too, but only with the most recent dev fixes, which we're not testing yet. Of the targets mentioned, all take 3-6 s. Dart is the fastest at 3 s; CSharp and Java are the slowest of these 5 targets. (My machine is a standard AMD Ryzen 7 2700, 16G DDR4 1311 MHz, SSD, B450 motherboard. This was a single test from the command line, i.e., not a statistical analysis.)

@borneq Let's add in two test scenarios, one for short tests that all targets can pass, and the longer test scenario for the "fast" targets. Please update the c/C90/desc.xml file as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
   <targets>CSharp;Cpp;Dart;Go;Java;JavaScript;PHP;Python3;TypeScript</targets>
   <test>
      <name>long</name>
      <targets>Cpp;CSharp;Dart;Java</targets>
   </test>
   <test>
      <name>short</name>
      <targets>CSharp;Cpp;Dart;Go;Java;JavaScript;PHP;Python3;TypeScript</targets>
   </test>      
</desc>

Contributor

@kaby76 kaby76 left a comment


@teverett This PR looks good.

…nd statement; simplification and fix - example
@KvanTTT KvanTTT added the example New example of file(s) parsed by grammar-generated parser label Apr 12, 2023
@kaby76
Contributor

kaby76 commented Apr 20, 2023

If you aren't doing so now, please use the Antlr Intellij profiler to find problems in the grammar. If you decide to add symbol table semantics, you'll need to use other tools. I would first sort the profiler table by "Ambiguities", and try to eliminate those problems.

Here are a couple of obvious problems:

You might want to add an EOF-start rule for typeName (e.g., typeNameStart: typeName (';' typeName)* EOF;) and test the rule on a large number of inputs in order to clean up the rule.

@andr1972
Author

andr1972 commented Apr 20, 2023

The grammar contains many GCC extensions, so it is not pure C90. Maybe rename C90 to GCC90 and later add a second grammar, C90 or IsoC90, with the extensions removed (matching what GCC accepts with -pedantic-errors). Parsing would be about 2x faster if K&R-style function definitions were removed, but a pure ISO C90 grammar would not parse validation-large/std90.c and other preprocessed GCC includes.
K&R style is slow because the definition contains semicolons, so the parser needs very long lookahead.
Second proposal: after testing all examples, remove K&R style. GCC extensions without K&R style would still allow parsing preprocessed GCC includes, but not some odd GCC examples.
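
To illustrate the lookahead cost mentioned above: in a K&R-style definition such as `int f(a, b) int a; char b; { ... }`, the parameter declarations (with their semicolons) sit between the `)` and the `{`, so a parser cannot tell a definition from a declaration until it has scanned past all of them. The sketch below is a simplified model that only counts tokens up to the first `{`; it is not how ANTLR's prediction actually works, and `tokens_until_body` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the PR.
   Counts how many tokens, starting at tokens[start], must be examined
   before a "{" shows that a function body follows.
   Returns 0 if no "{" is found. */
size_t tokens_until_body(const char *const tokens[], size_t n, size_t start)
{
    size_t i;

    assert(tokens != NULL);
    for (i = start; i < n; i++) {
        if (strcmp(tokens[i], "{") == 0) {
            return i - start + 1;   /* tokens examined, including the "{" */
        }
    }
    return 0;
}
```

For `int f ( a , b ) int a ; char b ; {`, starting right after the `)`, the count is 7 and grows with every extra parameter; for the prototype-style `int f ( int a , char b ) {` it is always 1.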

@andr1972
Author

I will correct typeName: a simple generator of type names produces enormously deep lookahead.

@andr1972
Author

andr1972 commented Apr 22, 2023

I changed intTypeName so that it can be any combination of components:

intTypeName
    : ('int'| complex | signedUnsigned | longShort)+
    ;
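
Since the `+` in this rule accepts any sequence of components, it also accepts sequences that C90 forbids, such as `short long int` or `int int`, so a later semantic pass would presumably have to reject them. A minimal sketch of such a check, assuming the specifiers arrive as an array of keyword strings; the function name is hypothetical, and the `complex` component (a GCC extension) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the PR.
   Returns 1 if the keywords form a valid C90 integer type
   (for example "unsigned long int"), 0 otherwise.
   The order of the keywords does not matter in C90. */
int is_valid_c90_int_type(const char *const specs[], size_t n)
{
    size_t i;
    int n_int = 0, n_sign = 0, n_size = 0;

    assert(specs != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(specs[i], "int") == 0) {
            n_int++;
        } else if (strcmp(specs[i], "signed") == 0 ||
                   strcmp(specs[i], "unsigned") == 0) {
            n_sign++;
        } else if (strcmp(specs[i], "long") == 0 ||
                   strcmp(specs[i], "short") == 0) {
            n_size++;
        } else {
            return 0;   /* not an integer specifier handled by this sketch */
        }
    }
    /* At most one of each kind; C90 has no "long long". */
    return n > 0 && n_int <= 1 && n_sign <= 1 && n_size <= 1;
}
```

With this split, the grammar stays permissive and fast to predict, and the rejection of invalid combinations moves out of the parser.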

floatTypeName is listed explicitly:

floatTypeName
    : 'float'
    | 'double'
    | 'long' 'double'
    | 'double' 'long'
....other

If I test the file

void fun() {
    long double f1;
    long double f2;    
    long double f3;
...etc

I get an unbounded maximum k for lookahead.
Prediction runs from "long double f1" to "}", although typeName is now unambiguous:
"long" can start the type "long int" or be the type "long" on its own.
But if the next token is "double", why predict further, especially when ';' follows?
Next step: I will test a minimal grammar with only type names and a few other rules to see what causes it.

I found two problems:

  1. K&R-style function definitions caused problems for the whole grammar. They can be removed; only ancient code needs them for compatibility.
  2. The general declarator, shared by functions and variables, is also used for function definitions.

@andr1972
Author

The syntax now distinguishes between variable and function declarations, which gives a speedup.
K&R style slows parsing down only slightly, negligibly.

@andr1972
Author

Disambiguation: a 2x speedup on the test file.

@andr1972
Author

Simple expressions like "n = 0" have a deep expression chain, of depth 20.
I tried changing the expression chain to a grammar of the form expr: expr op expr, with precedence given by the order of alternatives as in the Rust ANTLR grammar, but it slows parsing down by orders of magnitude.
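
The depth comes from giving each precedence level its own rule, so even a bare `0` is wrapped once per level. As a rough illustration, the sketch below lists the expression nonterminals of the ISO C90 grammar, outermost first. The names are the standard's, not the rule names in this PR's grammar, which reports a depth of 20 and so presumably has a few additional rules; the helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Expression nonterminals of ISO C90, outermost first. */
static const char *const c90_expr_chain[] = {
    "expression",
    "assignment-expression",
    "conditional-expression",
    "logical-OR-expression",
    "logical-AND-expression",
    "inclusive-OR-expression",
    "exclusive-OR-expression",
    "AND-expression",
    "equality-expression",
    "relational-expression",
    "shift-expression",
    "additive-expression",
    "multiplicative-expression",
    "cast-expression",
    "unary-expression",
    "postfix-expression",
    "primary-expression"
};

/* Number of rules a constant passes through in the standard's grammar. */
size_t c90_chain_depth(void)
{
    return sizeof c90_expr_chain / sizeof c90_expr_chain[0];
}

/* Name of the rule at the given level (0 is the outermost). */
const char *c90_chain_rule(size_t level)
{
    assert(level < c90_chain_depth());
    return c90_expr_chain[level];
}
```

Calling `c90_chain_depth()` gives 17 for the standard's chain, which is the nesting a parse tree shows for the right-hand side of "n = 0" before any grammar-specific rules are added.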

@andr1972
Author

Is it possible for me to resolve the conflict and synchronize the repositories, or must I close this and create a new pull request?
By the way, creating a new pull request would be desirable, because my c90.g4 grammar has GNU extensions; I should remove these extensions, or maybe rename the grammar to c90gnu.

Labels
c example New example of file(s) parsed by grammar-generated parser

4 participants