Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recursive-leak: fix leak by using middle-man #494

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Zij-IT
Copy link
Contributor

@Zij-IT Zij-IT commented Jul 28, 2023

This introduces a middle man to prevent both memory leaks (#486) in recursive parsers, and prevents the user from foot-gunning themselves by calling the wrong clone (referring to craftspider's weak_clone) suggestion.

Copy link
Owner

@zesterer zesterer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat trick!

src/recursive.rs Outdated
Comment on lines 159 to 162
pub fn define<P: Parser<'a, I, O, E> + Clone + MaybeSync + 'a + 'b>(
parser: P,
declaration: Recursive<Indirect<'a, 'b, I, O, E>, Declared>,
) -> Self {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this function be on Recursive<..., Declared> and return a Recursive<..., Defined>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you could do that too. For type inference it doesn’t matter, but it could make more sense logically.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say so, yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed with 4572bb5

@CraftSpider
Copy link
Collaborator

While this method seems nice, it also makes more complicated recursive patterns harder - you can't have a single context struct on which you both use and define the recursive parser, since the two things are separate types. While you may be able to re-architect code in most cases to avoid this, I'm curious if that's an intended limitation?

@Zij-IT
Copy link
Contributor Author

Zij-IT commented Aug 7, 2023

While this method seems nice, it also makes more complicated recursive patterns harder - you can't have a single context struct on which you both use and define the recursive parser, since the two things are separate types. While you may be able to re-architect code in most cases to avoid this, I'm curious if that's an intended limitation?

No, it's not intended. We could wrap them both in an enum (somewhat like Cow), and have an arm for both Defined and Declared, which would mean that the user would only have to deal with Declared and Defined. The one thing I don't like about the enum is that it opens the door for the user to attempt to use define on a parser that is already been defined, which isn't possible with the way it's currently structured.

@Zij-IT
Copy link
Contributor Author

Zij-IT commented Aug 7, 2023

Given that those checks fail for reasons outside of this PR, how would you like me to proceed?

@zesterer
Copy link
Owner

zesterer commented Aug 7, 2023

You should be able to rebase to fix the failing CI now :)

@Zij-IT
Copy link
Contributor Author

Zij-IT commented Aug 17, 2023

Finally got around to it. My bad for the mess that is the commit history in this pull request 😅 One would think that after using it for four+ years that I would be able to keep it somewhat clean

@zesterer
Copy link
Owner

I totally forgot about this PR, I'll take a final look at it tomorrow.

@zesterer
Copy link
Owner

Was the issue mentioned by CraftSpider above resolved?

@zesterer zesterer mentioned this pull request Oct 10, 2023
34 tasks
@faassen
Copy link

faassen commented Oct 10, 2023

Is there any way I could help on this issue?

@zesterer
Copy link
Owner

zesterer commented Oct 10, 2023

I think it probably just needs a rebase and then it's ready to go. That said, I am a bit concerned about the issue @CraftSpider mentioned above: it will probably break the way some people are using Recursive. That said, arguably such uses are just wrong and should be discouraged? Interested in thoughts on this.

@CraftSpider
Copy link
Collaborator

I think that you can probably change it to always store one or the other in the state, but I haven't really gotten around to trying it out with this change. I'm not too worried personally.

@zesterer
Copy link
Owner

Right, in that case it's probably fine. I would appreciate the commit history being cleaned up a bit here before merging too (otherwise I think I'll squash).

@faassen
Copy link

faassen commented Oct 10, 2023

I need to look at my use of recursive tomorrow to see whether whether I can understand the limitation and whether it hurts my usage, I don't get it yet without staring at the code.

@zesterer
Copy link
Owner

I believe it's specifically for the case of Recursive::declare/Recursive::define. Doing my_recursive.define(...) previously had the signature fn(self: &mut Recursive, impl Parser) (i.e: it was an in-place mutation) but now has the signature fn(self: Recursive<A>, impl Parser) -> Recursive<B> (i.e: it consumes self and transforms it to a new type, thereby turning it into a strong reference, as is the case internally with the recursive function).

@Zij-IT
Copy link
Contributor Author

Zij-IT commented Jan 28, 2024

So, my apologies for dropping off of the face of this earth for so long. If I get this rebased in the next coming days, would it be good to be merged, or is there anything else that it is needing?

@zesterer
Copy link
Owner

zesterer commented Jan 29, 2024

No worries, there's no time constraint here :) I think this change is probably the way to go, on reflection. It's unfortunate that it breaks certain uses of Recursive::declare/define but those modes of use are inherently leaky in a way that seems like it would require generalised garbage collection to disentangle. So, yes, I'd be happy to see this merged.

@Zij-IT
Copy link
Contributor Author

Zij-IT commented Jan 30, 2024

So... I'll be honest. I tried to rebase this thing for like an hour :') Merging took all of two seconds, so obviously I suck at rebasing.

Edit: It seems to fail with an issue that is unrelated to this PR. Should I fix it up anyway and push the change?

@zesterer
Copy link
Owner

zesterer commented Jan 31, 2024

Maybe it would be better to just take the diff from main, then go back to main and apply that diff on top, creating a new commit entirely? Then you can force-push that to this branch.

Don't worry about unrelated CI stuff, I'll address that elsewhere :)

@Zij-IT
Copy link
Contributor Author

Zij-IT commented Jan 31, 2024

Sounds good! I’ll take care of that after I get outta work!

@Flygrounder
Copy link

@zesterer Could we have a new alpha release once this gets merged?

@Flygrounder
Copy link

Maybe it would be better to just take the diff from main, then go back to main and apply that diff on top, creating a new commit entirely? Then you can force-push that to this branch.

Don't worry about unrelated CI stuff, I'll address that elsewhere :)

Wanted to ask, isn't it the same as using "Squash and merge" option when merging on GitHub? It seems like @Zij-IT got busy and it's a bit sad that this PR is essentially ready, but not merged. We want to use this library in production in our company, but this memory leak is a problem in our use case (long-running process that can parse thousands of inputs).

@zesterer
Copy link
Owner

I believe the behaviour is the same, yes: with the exception that the commit message will be a concatenation of every intermediate commit, which is undesirable. I would do it myself, but I don't want to be taking credit for @Zij-IT's work in the commit log.

@Zij-IT
Copy link
Contributor Author

Zij-IT commented Feb 29, 2024

My apologies to everyone that has been waiting for me to make the change. I just did what was recommended by @zesterer, so this should be good to roll.

The test that fails fails because what was a runtime error became a compiler error in the tests case. There isn't AFAIK a way to test that something fails to compile, so I can't write an equivalent test. As such, I just deleted the test :/

@wackbyte
Copy link
Contributor

wackbyte commented Feb 29, 2024

You can test something that fails to compile by writing a doctest:

/// ```compile_fail
/// // ...
/// ```
fn test_name() {}

A specific error code can be provided, separated by a comma: compile_fail,E9999.

There might be some more setup required, but this is a method I've used in the past.

This has the drawback of not being able to specify an expected error (other than its code, and not all errors have codes). A better solution would likely use trybuild.

(rustdoc documentation)

src/recursive.rs Outdated
Comment on lines 157 to 160
pub fn define<P: Parser<'a, I, O, E> + Clone + MaybeSync + 'a + 'b>(
parser: P,
declaration: Self,
) -> Recursive<Indirect<'a, 'b, I, O, E>, Defined> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for not having the declaration argument be self?

Copy link
Contributor Author

@Zij-IT Zij-IT Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly nothing in particular. I'll swap it over to having it be a method!

Edit: Actually, there is a small factor for why. If it's a method that consumes self, than you end up having to do something like:

let mut chain = Recursive::declare();

// Note the clone  vvvvvvv
let result = chain.clone().define(just::<_, _, extra::Err<Simple<char>>>('+')
    .then(chain.clone())
    .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
    .or_not()
    .map(|chain| chain.unwrap_or(Chain::End)));

// As opposed to:
let result = Recursive::define(just::<_, _, extra::Err<Simple<char>>>('+')
    .then(chain.clone())
    .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
    .or_not()
    .map(|chain| chain.unwrap_or(Chain::End)), chain);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zesterer Which one do you prefer? I'll do it whichever way you'd prefer. I personally like the symmetry between Recursive::declare and Recursive::define and the second way makes it more obvious that Recursive::define consumes the parser.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I prefer the first, personally. The extra parameter way off at the end of the definition feels very isolated and random if you're not already sure what it is. Also, putting it at the front makes clear what the target of the definition is.

@Zij-IT
Copy link
Contributor Author

Zij-IT commented May 4, 2024

I'm pretty sure GitHub decided the 7 commits before mine are different because the hashes are different... I try and do the right thing with a rebase I manage to create clutter :D

Anyhow, this should be the final commit for the PR to be merged, right?

@zesterer
Copy link
Owner

I'll try to find time to take another pass of this today. Hopefully, yep!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants