Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird behavior #209

Open
Victorious3 opened this issue May 14, 2021 · 7 comments
Open

Weird behavior #209

Victorious3 opened this issue May 14, 2021 · 7 comments

Comments

@Victorious3
Copy link
Collaborator

I have the following grammar for a small parser that parses C types:

function_decl = ret: type_1 '(' args: ','.{ type_1 | varargs } ')';

sign = 'signed' | 'unsigned';
modifier = 'long' 'long' @:`'llong'` | 'long' | 'short';
specifier = 'int' | 'char' | 'float' | 'double' | '__int128';

identifier::Identifier      = /(?!\d)\w+/;
tagged_idenitifier::Tagged  = ('struct' | 'union' | 'enum') identifier;
primitive::Primitive        = [sign] [modifier] specifier | [sign] modifier | sign;
pointer::Pointer            = type: type '*' ['const'] ['volatile'] ['restrict'];
array::Array                = type: type '[' size: [/\d+/] ']';
function::Function          = ret: type '(' '*' ')' '(' args: ','.{ type_1 | varargs } ')';

void::Void = 'void';
varargs::Varargs = '...';

type = pointer | function | array | void | primitive | tagged_idenitifier | identifier;
type_1 = ['const'] ['volatile'] @:type;

This fails for the input

void (*)(int, siginfo_t *, void*)

Looking at the trace I see this here:

≡type↙function↙type↙type_1 ~1:5
 (*)(int, siginfo_t *, void*)
≢'('

For some reason it doesn't seem to recognize the open paren.
I have a workaround which looks as follows, changing the relevant line to:

    function::Function = ret: type / \(\*\)/ '(' args: ','.{ type_1 | varargs } ')';
                                   ~~~~~~~~

Notice that the space in the regex is significant.

Now the trace looks as follows:

≡type↙function↙type↙type_1 ~1:5
 (*)(int, siginfo_t *, void*)
≡' (*)' / \(\*\)/

This might be a problem with leftrec, I'm not entirely sure. I'll work on finding a minimal example. I remember that I had to change some strings to regexes in my other project as well to work around potentially the same bug.

@apalala
Copy link
Collaborator

apalala commented Jul 9, 2022

Can you revisit this, @Victorious3 ?

@Victorious3
Copy link
Collaborator Author

I'll see what I can do, I'm just really busy with university at this point.

@Victorious3
Copy link
Collaborator Author

Victorious3 commented May 10, 2023

I have come to revisit this and I have an updated sample.,.. It seems like left recursion still has some problems that need to be addressed.

´sign = 'signed' | 'unsigned';
modifier = 'long' 'long' @:`'llong'` | 'long' | 'short';
specifier = 'int' | 'char' | 'float' | 'double' | '__int128' | '_Bool';
winptr = '__ptr32' | '__sptr' | '__uptr';

identifier::Identifier      = /(?!\d)\w+/;
tagged_idenitifier::Tagged  = ('struct' | 'union' | 'enum') identifier;
primitive::Primitive        = [sign] [modifier] specifier | [sign] modifier | sign;
pointer_suffix              = '*' {winptr | 'const' | 'volatile' | 'restrict'};
array_suffix                = /\[/ size: [/\d+/] ']';
pointer::Pointer            = type: type >pointer_suffix;
array::Array                = type: type >array_suffix;
function::Function          = ret: type / \(/ ~ inner: {array_suffix | pointer_suffix} ')' '(' args: ','.{ start | varargs } ')';

void::Void = 'void';
varargs::Varargs = '...';

type = pointer | array | function | void | primitive | tagged_idenitifier | identifier;
start = {winptr | 'const' | 'volatile' | '__unaligned'} @:type;
Trace for input bfd *
↙start ~1:1
bfd *
↙winptr↙start ~1:1
bfd *
≢'__ptr32'
bfd *
≢'__sptr'
bfd *
≢'__uptr'
bfd *
≢winptr↙start ~1:1
bfd *
≢'const'
bfd *
≢'volatile'
bfd *
≢'__unaligned'
bfd *
↙winptr↙start ~1:1
bfd *
≢winptr↙start ~1:1
bfd *
≢'const'
bfd *
≢'volatile'
bfd *
≢'__unaligned'
bfd *
↙type↙start ~1:1
bfd *
↙pointer↙type↙start ~1:1
bfd *
↙type↙pointer↙type↙start ~1:1
bfd *
⟲ type↙pointer↙type↙start ~1:1
bfd *
⟲ pointer↙type↙start ~1:1
bfd *
↙array↙type↙start ~1:1
bfd *
↙type↙array↙type↙start ~1:1
bfd *
⟲ type↙array↙type↙start ~1:1
bfd *
⟲ array↙type↙start ~1:1
bfd *
↙function↙type↙start ~1:1
bfd *
↙type↙function↙type↙start ~1:1
bfd *
⟲ type↙function↙type↙start ~1:1
bfd *
⟲ function↙type↙start ~1:1
bfd *
↙void↙type↙start ~1:1
bfd *
≢'void'
bfd *
≢void↙type↙start ~1:1
bfd *
↙primitive↙type↙start ~1:1
bfd *
↙sign↙primitive↙type↙start ~1:1
bfd *
≢'signed'
bfd *
≢'unsigned'
bfd *
≢sign↙primitive↙type↙start ~1:1
bfd *
↙modifier↙primitive↙type↙start ~1:1
bfd *
≢'long'
bfd *
≢'long'
bfd *
≢'short'
bfd *
≢modifier↙primitive↙type↙start ~1:1
bfd *
↙specifier↙primitive↙type↙start ~1:1
bfd *
≢'int'
bfd *
≢'char'
bfd *
≢'float'
bfd *
≢'double'
bfd *
≢'__int128'
bfd *
≢'_Bool'
bfd *
≢specifier↙primitive↙type↙start ~1:1
bfd *
↙sign↙primitive↙type↙start ~1:1
bfd *
≢sign↙primitive↙type↙start ~1:1
bfd *
↙modifier↙primitive↙type↙start ~1:1
bfd *
≢modifier↙primitive↙type↙start ~1:1
bfd *
↙sign↙primitive↙type↙start ~1:1
bfd *
≢sign↙primitive↙type↙start ~1:1
bfd *
≢primitive↙type↙start ~1:1
bfd *
↙tagged_idenitifier↙type↙start ~1:1
bfd *
≢'struct'
bfd *
≢'union'
bfd *
≢'enum'
bfd *
≢tagged_idenitifier↙type↙start ~1:1
bfd *
↙identifier↙type↙start ~1:1
bfd *
≡'bfd' /(?!\d)\w+/
 *
≡identifier↙type↙start ~1:4
 *
↙pointer↙type↙start ~1:1
bfd *
⟲ pointer↙type↙start ~1:1
bfd *
↙array↙type↙start ~1:1
bfd *
↙type↙array↙type↙start ~1:1
bfd *
≡type↙array↙type↙start ~1:4
 *
≢'' /\[/
 *
↙function↙type↙start ~1:1
bfd *
↙type↙function↙type↙start ~1:1
bfd *
≡type↙function↙type↙start ~1:4
 *
≢'' / \(/
 *
↙void↙type↙start ~1:1
bfd *
≢void↙type↙start ~1:1
bfd *
↙primitive↙type↙start ~1:1
bfd *
≢primitive↙type↙start ~1:1
bfd *
↙tagged_idenitifier↙type↙start ~1:1
bfd *
≢tagged_idenitifier↙type↙start ~1:1
bfd *
↙identifier↙type↙start ~1:1
bfd *
≡identifier↙type↙start ~1:4
 *
≡type↙start ~1:4
 *
≡start ~1:4
 *
I initially made some changes to be able to parse code like bfd_boolean (*[4])(bfd *) but it seems to fail for more simple cases as well.

For some reason it recurses into pointer and then never considers that rule again?

@apalala
Copy link
Collaborator

apalala commented May 11, 2023

I can't see where the problem is off the top of my head.

It could be that we need to implement leftrec using what @gvanrossum used for Python?

@apalala
Copy link
Collaborator

apalala commented Aug 24, 2023

Once again, @Victorious3 ?

@Victorious3
Copy link
Collaborator Author

Sorry, I have a new job and I don't have much time for personal projects at the moment.
I don't know if the code I wrote is serviceable, it's been a while since I implemented this. If Python's PEG has solved left recursion in a better way then it might be a good idea to try implementing it like that.

@apalala
Copy link
Collaborator

apalala commented Oct 15, 2023

In #305 @dnicolodi reported that parseinfo positions were off for nodes because space was not skipped. I solved this by adding an eat_whitespace() to memokey(), the memoization function.

Because ho have space in the regexp for your patch, perhaps the above fix is a fix for your case?

Can you do a quick check, @Victorious3?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants