Migrating to package:characters? #80

Open
domesticmouse opened this issue Aug 16, 2020 · 5 comments
Comments

@domesticmouse

Hi Lukas,

I'm curious how you feel about porting Petit Parser to the characters package.

I'm wondering whether the CharacterRange iterator is sufficient for Petit Parser's backtracking requirements. The upside of migrating to the characters package is that Petit Parser would parse in terms of Unicode grapheme clusters instead of UTF-16 code units.

brett
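To illustrate the distinction raised above, here is a minimal sketch that counts the same text in three units. It assumes package:characters is available as a dependency; `unitCounts` is a hypothetical helper name, not part of either package.

```dart
import 'package:characters/characters.dart';

/// Counts [text] in three different units.
/// `unitCounts` is a hypothetical helper used only for illustration.
List<int> unitCounts(String text) => [
      text.length, // UTF-16 code units, the unit that codeUnitAt indexes
      text.runes.length, // Unicode code points
      text.characters.length, // grapheme clusters, via package:characters
    ];

// 'e' followed by U+0301 (combining acute accent) renders as one character:
//   unitCounts('e\u0301')    gives [2, 2, 1]
// U+1F600 (an emoji outside the Basic Multilingual Plane):
//   unitCounts('\u{1F600}')  gives [2, 1, 1]
```

A parser that advances one code unit at a time sees two steps in both inputs, while a grapheme-based parser would see one.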

@renggli
Member

renggli commented Aug 16, 2020

Hi Brett,

The characters package is great, and I've already converted some of my other code to use it. For example, the more package uses it for its configurable printers. I've also tried to use it for the char_matcher in the same package (which PetitParser uses in similar form), with less success. I found it challenging to implement character-based predicates efficiently, for example character ranges or character sets (whitespace, letters, digits, lower-case, upper-case, ...).
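For context, here is a rough sketch of why such predicates are cheap on code units: a range check is two integer comparisons. The class names below are hypothetical and are not the API of the more package or of PetitParser. With grapheme clusters the input to the predicate would be a variable-length string, so a simple ordered range has no equally direct counterpart.

```dart
/// A predicate over a single UTF-16 code unit.
/// Hypothetical interface for illustration only.
abstract class CodeUnitPredicate {
  bool test(int codeUnit);
}

/// Matches code units in the inclusive range from [start] to [stop].
class RangePredicate implements CodeUnitPredicate {
  RangePredicate(String first, String last)
      : start = first.codeUnitAt(0),
        stop = last.codeUnitAt(0);

  final int start;
  final int stop;

  @override
  bool test(int codeUnit) => start <= codeUnit && codeUnit <= stop;
}

/// Matches if any of the given [predicates] matches.
class AnyOfPredicate implements CodeUnitPredicate {
  AnyOfPredicate(this.predicates);

  final List<CodeUnitPredicate> predicates;

  @override
  bool test(int codeUnit) {
    for (final predicate in predicates) {
      if (predicate.test(codeUnit)) return true;
    }
    return false;
  }
}

/// ASCII letters or digits, composed from three ranges.
final CodeUnitPredicate asciiWord = AnyOfPredicate([
  RangePredicate('a', 'z'),
  RangePredicate('A', 'Z'),
  RangePredicate('0', '9'),
]);
```

For example, `asciiWord.test('q'.codeUnitAt(0))` is true and `asciiWord.test('-'.codeUnitAt(0))` is false.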

PetitParser needs random access to characters. It currently does so mostly through codeUnitAt; however, the original Smalltalk implementation used something similar to CharacterRange to navigate forward and backward through the input. I expect this could easily be adopted in Dart.
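A minimal sketch of what cursor-style navigation with backtracking might look like on top of CharacterRange. It uses `characters`, `iterator`, `copy`, `moveNext` and `current` from package:characters; `matchLiteral` is a hypothetical helper, not something PetitParser provides. Backtracking is done by working on a copy and leaving the caller's range untouched.

```dart
import 'package:characters/characters.dart';

/// Tries to match [literal] grapheme by grapheme, starting right after the
/// current range of [position].
///
/// Returns a new range whose current element is the last matched grapheme,
/// or null if the match fails. [position] itself is never moved, so the
/// caller can retry another alternative from the same place.
/// Hypothetical helper for illustration only.
CharacterRange? matchLiteral(CharacterRange position, String literal) {
  final cursor = position.copy(); // allocates, as noted in the thread
  for (final expected in literal.characters) {
    if (!cursor.moveNext() || cursor.current != expected) {
      return null; // backtrack: the copy is simply dropped
    }
  }
  return cursor;
}
```

With `final start = 'e\u0301tude'.characters.iterator;`, the call `matchLiteral(start, 'e\u0301t')` succeeds and its `current` is `'t'`, while `matchLiteral(start, 'e')` returns null because the first grapheme is the accented cluster, not a bare `e`. CharacterRange also offers `moveBack` for stepping in the other direction.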

What I am more concerned about is performance: I expect moveNext, current and copy to be the most frequently called methods. They all execute quite a bit of code, and most of them also allocate new objects. In version 2.2.0 I introduced a fast-parse mode that avoids memory allocations during parsing, which brought a huge speed improvement on some inputs for lexer-like matching. I'm not sure how this could be adapted to characters.
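To make the allocation concern concrete, here is a sketch of the general idea behind an allocation-free matching mode: each matcher returns the next position as a plain int, or -1 on failure, instead of building a result object. This is an assumption-laden illustration with hypothetical names (`FastMatcher`, `single`, `plus`), not PetitParser's actual implementation.

```dart
/// Hypothetical allocation-free matcher: returns the position after the
/// match, or -1 on failure. No result object is created per call.
typedef FastMatcher = int Function(String buffer, int position);

/// Matches exactly one code unit accepted by [test].
FastMatcher single(bool Function(int codeUnit) test) {
  return (buffer, position) {
    if (position < buffer.length && test(buffer.codeUnitAt(position))) {
      return position + 1;
    }
    return -1;
  };
}

/// Matches [matcher] one or more times.
/// Assumes [matcher] always consumes input when it succeeds.
FastMatcher plus(FastMatcher matcher) {
  return (buffer, position) {
    var current = matcher(buffer, position);
    if (current < 0) return -1;
    for (;;) {
      final next = matcher(buffer, current);
      if (next < 0) return current;
      current = next;
    }
  };
}

bool isDigit(int codeUnit) => 0x30 <= codeUnit && codeUnit <= 0x39;

/// One or more ASCII digits.
final FastMatcher digits = plus(single(isDigit));
```

Here `digits('123abc', 0)` returns 3 and `digits('abc', 0)` returns -1. The closures are created once when the grammar is built; matching itself only reads code units and passes ints around, which is what a cursor that must be copied on every backtrack point would make harder.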

@domesticmouse
Author

I'm sure we'd love some feedback at https://github.com/dart-lang/characters if you have a chance =)

@Hamza5

Hamza5 commented Nov 9, 2021

More than a year later, is any Unicode support available? It is 2021, and Unicode support is really needed.

@renggli
Member

renggli commented Nov 9, 2021

This library is designed to parse over streams of bytes (or UTF-16 code units), and as such it is agnostic to the encoding of your input. For example, the xml library successfully parses Unicode input without either library needing to "understand" the underlying characters.

While I understand that out-of-the-box decoding would be desirable, it comes at a hefty cost in performance. So far I haven't seen a compelling need to support it natively. A real-world use case where the current infrastructure doesn't work would definitely help to motivate investing time into this issue.
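A small sketch of why literal matching works on Unicode input without decoding: if the grammar and the input use the same encoding, comparing raw code units is enough. `matchCodeUnits` is a hypothetical helper written for this illustration, not a PetitParser function.

```dart
/// Returns the position after [literal] if it occurs in [input] at
/// [position], otherwise -1. Compares raw UTF-16 code units and never
/// decodes them. Hypothetical helper for illustration only.
int matchCodeUnits(String input, int position, String literal) {
  if (position + literal.length > input.length) return -1;
  for (var i = 0; i < literal.length; i++) {
    if (input.codeUnitAt(position + i) != literal.codeUnitAt(i)) return -1;
  }
  return position + literal.length;
}

// U+1F600 occupies two code units (a surrogate pair), at indices 3 and 4:
//   matchCodeUnits('<p>\u{1F600}</p>', 3, '\u{1F600}')  returns 5
```

The limitation shows up only for rules defined per "character": a rule that consumes a single code unit would stop between the two halves of the surrogate pair.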

@RandalSchwartz

To only slightly hijack this thread, I'm wondering if it's possible to use the higher-level logic provided by the parser to extract values out of an arbitrary jsonDecode result. It'd be nice if I could specify a pattern of a JSON object containing some strings and bools and a JSON list of further objects, and then the .map action could map that into a Dart object via a constructor. I suspect it's only a matter of coming up with a useful set of primitives, and then using the rest of the mechanics without change, but if there's been any thought about this, I'd be interested. I know the switch stuff makes some of this easier, but it still doesn't feel as satisfying as just writing a composable Parser rule.
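As a purely speculative sketch of what such primitives could look like, the following uses plain functions over the output of `jsonDecode` from dart:convert. Every name here (`Extractor`, `value`, `field`, `listOf`, `Task`) is hypothetical; nothing like this is confirmed to exist in PetitParser, and it does not reuse the parser machinery.

```dart
import 'dart:convert';

/// Hypothetical primitive: extracts a T from a decoded JSON value or throws
/// a FormatException. Not part of PetitParser.
typedef Extractor<T> = T Function(Object? json);

/// Accepts a value that already has type T (String, bool, num, ...).
Extractor<T> value<T>() => (json) =>
    json is T ? json : throw FormatException('Expected $T, got $json');

/// Applies [inner] to the entry [key] of a JSON object.
Extractor<T> field<T>(String key, Extractor<T> inner) => (json) {
      if (json is Map && json.containsKey(key)) return inner(json[key]);
      throw FormatException('Missing field "$key" in $json');
    };

/// Applies [inner] to every element of a JSON list.
Extractor<List<T>> listOf<T>(Extractor<T> inner) => (json) {
      if (json is List) return [for (final item in json) inner(item)];
      throw FormatException('Expected a list, got $json');
    };

/// Example target class, invented for this sketch.
class Task {
  Task(this.title, this.done, this.subtasks);

  final String title;
  final bool done;
  final List<Task> subtasks;
}

/// A composable, recursive rule that maps JSON into [Task] objects.
Task task(Object? json) => Task(
      field('title', value<String>())(json),
      field('done', value<bool>())(json),
      field('subtasks', listOf(task))(json),
    );
```

Calling `task(jsonDecode(source))` on an object with `title`, `done` and a nested `subtasks` list would yield a tree of `Task` objects, and a missing or mistyped field would raise a `FormatException`.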


4 participants