Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning progress when parsing fails #41

Open
zenMaya opened this issue Feb 3, 2023 · 8 comments
Open

Returning progress when parsing fails #41

zenMaya opened this issue Feb 3, 2023 · 8 comments

Comments

@zenMaya
Copy link

zenMaya commented Feb 3, 2023

There is a use case for this library, where the parser should report, where it failed. Since that would allow for some "graceful" failing. Instead of crashing, there is some merit in reporting where. It would suffice to report the position/rest, as everything else can be calculated if that happens, but now I think there is no way to report progress.

This would essentially mean moving the Error inside the ParserResult, which isn't ideal and is an architecture change. I understand if such modification is out of scope, and I need to implement it myself.

@Hejsil
Copy link
Owner

Hejsil commented Feb 3, 2023

If we can come up with something that is decently ergonomic, then I wouldn't mind the change. All my libs will not have a 1.0 release before zig 1.0 so let's just break things while we can.

@zenMaya
Copy link
Author

zenMaya commented Feb 4, 2023

I think the change would require moving out of Error!Result(T) into something like:

pub fn Result(comptime T: type) type {
    return struct {
        pub const Value = union(enum) {
            Ok: T,
            Err: Error, // using the pub const Error
        };

        value: Value,
        rest: []const u8 = "",
    };
}

But that would mean rewriting all combinators and their respective functions

@zenMaya
Copy link
Author

zenMaya commented Feb 5, 2023

maybe it would be even better to support in what state the parser left, as there are some recovery strategies that can be implemented, to report as many errors as possible

for example:
c strings are always one line:

char* str = "aaaaa
";

is invalid but most parsers would assume that the string was just improperly terminated, although the input is invalid, the parser would continue assuming that the string is valid:

char* str = "aaaaa"
";";

erroring with unterminated string x2 and missing semicolon x2

or even some smarter error recovery strategy. Like having a special rule for multiline strings that fails as it is not valid in the C grammar, but continuing with parsing as the parser can still find more errors. Failing at the first error and not knowing what the error is: "ParserError", is not sufficient for recovery.

If there could be either a "panic mode" (discard characters until a "safe character" is found, like }, end of a block)

or fail with continuation, so the implementor could modify the state to get it into a safe state. For example if string literal fails, continue by inserting a ghost ", then continue with parsing.

@zenMaya
Copy link
Author

zenMaya commented Feb 7, 2023

I have starting to test the "continuation" tactic.

The Result structure would look like this:

pub fn Result(comptime T: type) type {
    return struct {
        pub const Value = union(enum) {
            Ok: T,
            InternalError: Error, // Parser failed irrecoverably
            ParserError: Parser(T), // Parser failed at this state
        };
        value: Value,
        rest: []const u8 = "",
    };
}

and for example the string combinator would look like this (I had to remove the comptime since the parsing must be able to dynamically recreate the parser):

pub fn string(str: []const u8) Parser(void) {
    return struct {
        fn func(_: mem.Allocator, s: []const u8) Void {
            const diff = mem.indexOfDiff(u8, s, str); // find the first characters where input differs
            if (diff) |diff_index| { // if there is one
                if (diff_index == str.len) // if the first difference is the ending of str, the parse was successful
                    return Void{ .value = .{ .Ok = {} }, .rest = s[str.len..] };
                // else create a new string, that expects the rest of the string to be parsed
                // for example: if s = "strang" and str = "string"
                // => string("ing") and rest = "ang"
                const resume_string = string(str[diff_index..]);
                return Void{ .value = .{ .ParserError = resume_string }, .rest = s[diff_index..] };
            } else return Void{ .value = .{ .Ok = {} }, .rest = s[str.len..] };
        }
    }.func;
}

@Hejsil please let me know what you think! Does it seem interesting to you? Should I continue with the implementation?

PS. I noticed that this may not be possible, I am not that versed in every zig implementation detail, and string not beeing a comptime function might make it not function.

@Hejsil
Copy link
Owner

Hejsil commented Feb 7, 2023

PS. I noticed that this may not be possible, I am not that versed in every zig implementation detail, and string not beeing a comptime function might make it not function.

const resume_string = string(str[diff_index..]);

Yes correct, this will not work. I'm also not sure I understand why we would want ParserError to have Parser(T) as the result. How would one use this? I think you need to show me how you imagine error recovery looking.

@zenMaya
Copy link
Author

zenMaya commented Feb 7, 2023

Yes correct, this will not work. I'm also not sure I understand why we would want ParserError to have Parser(T) as the result. How would one use this? I think you need to show me how you imagine error recovery looking.

Okay maybe I haven't thought this through properly, sorry. I have been thinking about this for a few hours, and parser combinators aren't really suitable for what I had in mind. It would work for LL(1) parsers, but you cannot really inspect combinators like you can inspect Finite State Machines with explicit transition tables. The opt function essentially does almost everything that you need for error recovery I think, maybe some more ergonomic wrapper for collecting errors, but that doesn't require drastic API changes.

I'm really sorry that I have bothered you with this, I should've thought about it much longer before posting this.

Maybe the original Ok/Err proposal would be still useful? At least if you don't want to modify your combinators to recover from errors using opt, you can at least report the syntax error position.

@Hejsil
Copy link
Owner

Hejsil commented Feb 7, 2023

Maybe the original Ok/Err proposal would be still useful? At least if you don't want to modify your combinators to recover from errors using opt, you can at least report the syntax error position.

Yea, I think there is a usecase for the Ok/Err thing. Being able to see how far the parser got furthest can be quite useful for simple error reporting, but for things more complicated, I think that is outside this libraries scope.

@zenMaya
Copy link
Author

zenMaya commented Feb 7, 2023

Yea, I think there is a usecase for the Ok/Err thing. Being able to see how far the parser got furthest can be quite useful for simple error reporting,

Okay, I'll try to get it done soon'ish and make a PR!

but for things more complicated, I think that is outside this libraries scope.

You are right, I think there could be some combinators for opt and error message, or combine opt with map, but otherwise the library is already featurefull enough. Once again I'm sorry for all the other stuff I suggested, it was dumb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants