Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add no_std support #294

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

stevefan1999-personal
Copy link
Contributor

@stevefan1999-personal stevefan1999-personal commented Apr 16, 2022

This patch will replace the use of std crate to their no_std equivalent. One notable change is that peg_runtime::error::ExpectedSet would use BTreeSet instead of HashSet before, as I observed that the expected set order is random during a parser debug.

I did not add a HashSet shim for peg_runtime::error::ExpectedSet because this technically means the underlying structure can be changed many times over. Although it wouldn't hurt that much since it is not exposed, this could cause confusion on document generation. This does imply that the expected order will be based on lexicographical order and I do think this would make it more stable which is a nice thing.

@kevinmehall
Copy link
Owner

no_std support would be great to have!

ExpectedSet switching to BTreeMap seems fine. That set is not expected to be very large, and is used only after a parse failure, so performance is less critical there.

Rather than adding the hashbrown dep, I wonder if it makes sense to add syntax for configuring the map type used for the cache. For instance, it might be desirable to specify a non-default hasher. I am not sure of the syntax to use in the grammar, possibly type ParserCacheMap<K, T> = HashMap<K, T>;, or #![cache_map_type(HashMap)].

It's unfortunate that the Error trait requires the configuration option, but it looks like there is work in progress to get Error in core. If we don't want to wait for that, I think the best practice is to call the feature "std" (as a default feature) rather than "no_std" so it is additive.

@stevefan1999-personal
Copy link
Contributor Author

stevefan1999-personal commented May 23, 2022

@kevinmehall for hashbrown how should we expose the type for user input as a macro in the grammar? It should be allowed in Rust right?

peg::parser! {
    #[cache_map_type(HashMap)]
    grammar lol(config: bool) for str {
        #[cache_left_rec]
        rule one() -> ()
            = one() / "foo"
    }
}

but wouldn't that attach cache_map_type attribute to lol?

Since both HashMap and BTreeMap does not have a common set of trait, maybe we should make a specialization for the cache as a minimal set of functions required so user can supply a custom cache implementation without having to worry about the type recursion problem as well.

@stevefan1999-personal
Copy link
Contributor Author

Hmm, we could make use of super, so that we can define the cache type to whatever we want

@kevinmehall
Copy link
Owner

A proc macro gets everything inside of the peg::parser! { ... } as a token tree. So there's no such thing as an attribute at that level, just a # token and a [] group containing a cache_map_type ident, etc. and it's up to us to parse that and do whatever we want with it. That does mean that if we want to support both a #[] attribute syntax outside the {} and a #![] attribute inside, we must explicitly parse both.

Yes, we'd want to document what methods the map type must have.

It already injects use super::*; in the generated mod so that types (e.g. rule return types) resolve like they do in the surrounding code.

@stevefan1999-personal
Copy link
Contributor Author

stevefan1999-personal commented Jul 20, 2022

A proc macro gets everything inside of the peg::parser! { ... } as a token tree. So there's no such thing as an attribute at that level, just a # token and a [] group containing a cache_map_type ident, etc. and it's up to us to parse that and do whatever we want with it. That does mean that if we want to support both a #[] attribute syntax outside the {} and a #![] attribute inside, we must explicitly parse both.

Yes, we'd want to document what methods the map type must have.

It already injects use super::*; in the generated mod so that types (e.g. rule return types) resolve like they do in the surrounding code.

Agreed, so should we further attach the cache type information to #[cache_left_rec], and also expand the type parser to handle generic as well? Recently I've been trying to use cache on stack that requires const generic which resulted in things like FixedHashMap<_, 1024> for 1024 entries only. Although it doesn't really fit in the use case of compilers (as there could be infinite lookaheads) it does make it allocator-free.

I've also starting to think about repeated tokens, as the use of Vec type (either std::vec::Vec or alloc::vec::Vec) means it is not allocator-free. can also be treated the same way to have a fixed set before overflowing to make it very suitable for parsing data with formal structure in kernel. It also helps for people who use immutable data structure in compiler, but the way it works is not the same (rather than directly pushing into the vector vec.push(123), you do a new assignment vec = vec.push(123)) and the new vector can now be copied freely. I've been thinking of integrating of peg with rpds lately with huge potential from my own test use, but the problem is the use of alloc::vec:Vec means additional heap and into conversion is needed.

@stevefan1999-personal
Copy link
Contributor Author

I think we should make the cache type a facade trait that can be defined by the user externally rather than applying it in macro time. This way we can have better flexibility because we can even use it for things like tracing -- capturing the event when a rule is cached and observe what is going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants