Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Diagnosis accuracy & how to deal with hidden state, closures and more... #33

Open
foreseaz opened this issue Sep 27, 2022 · 5 comments

Comments

@foreseaz
Copy link

foreseaz commented Sep 27, 2022

Hi memlab devs, I have a question on how memlab deals with the hidden state that cannot be captured by simply taking heap snapshots. As we know that, many native methods (implemented in C++) do not expose their state to JavaScript heap snapshots. For example Besides, the variables in closures cannot be easily diagnosed because of scoping. BLeak paper has a solution for this by rewriting the closure variables. https://github.com/plasma-umass/BLeak/blob/master/src/lib/closure_state_transform.ts As far as I go through the memlab codebase, I haven't seen the logic related to it. Will this issue affect the detection diagnosis accuracy? Is this kind of feature on the roadmap?

@JacksonGL
Copy link
Member

JacksonGL commented Sep 27, 2022

Looks like the following two statements are talking about two different problems:

  1. As we know that, many native methods (implemented in C++) do not expose their state to JavaScript heap snapshots.

MemLab is not designed to detect memory leaks in browser/JS engine/native library. I think rewriting JavaScript probably won't expose C++ native state.

  1. For example, the variables in closures cannot be easily diagnosed because of scoping.

Heap snapshot captures shared context (although less intuitive for debugging). Correct me if I am wrong, the purpose of rewriting the closure variables in BLeak is to collect additional information to assist debugging, not to improve detection accuracy.

Instrumentation or rewriting may bring subtle correctness issue, introduce new memory leaks that doesn't exist in the original code, or hide memory leaks from the original code. This actually decreases detection accuracy. I would be interested to see a more concrete example where the heap snapshot fails to capture a memory leak in JavaScript heap. If that happens, we might want to file a bug report or feature request to Chromium or V8.

@foreseaz foreseaz changed the title Discussion: Detection accuracy & how to deal with hidden state Discussion: Diagnosis accuracy & how to deal with hidden state Sep 27, 2022
@foreseaz
Copy link
Author

Thank @JacksonGL for the clarification and answers. Sorry for the confusion and let me rephrase my questions a little bit, my questions are in the context of a more accurate diagnosis, instead of detection.

@foreseaz foreseaz changed the title Discussion: Diagnosis accuracy & how to deal with hidden state Discussion: Diagnosis accuracy & how to deal with hidden state, closures and more... Sep 27, 2022
@JacksonGL
Copy link
Member

JacksonGL commented Sep 28, 2022

@foreseaz If you are talking specifically about the closure scope, I agree the closure context in heap snapshot is less intuitive for debugging. But in terms of accuracy, the same argument also applies to debugging, since the heap snapshot faithfully records what's happening in JS heap. Instrumentation may change heap layout and bring extra noises that confuse the debugging process.

Most of the time shared context in the retainer trace actually includes the information to narrow down to the source location. For example, each closure object has a context field, which contains a scope_info field, it contains the closure variable defined in the closure scope that defines the closure instance, there is also a previous field pointing to the previous closure context in the closure scope chain.

About JS instrumentation, can you share a small code example on what kind of instrumentation is on your mind and explain why it would help debugging?

@mdblr
Copy link

mdblr commented Sep 28, 2022

Yes, that will increase the memory foot print of the program. However, Bleak collects heap snapshots, detects leaks, and finds the leaks paths prior to changing the memory of the program in order to avoid this.

Although we are aware that Memlab focuses on front-end javascript, we are exploring how we may be able to leverage it to use with nodejs programs.

In this context, code rewriting and memory mirroring would have been especially helpful for leaks obfuscated by objects like anonymous or external. In theory, we would see something like what the paper includes:

37 body.jQuery223056319336220622061.events.mouseup in closure of $widgetContent.__proto__.mwheelIntent

I do not currently have a code block or heap snapshot that I can share with you since they are work related. Nevertheless, the BLeak paper reports that it outperformed other metrics used for growth reduction at the time, which the metaprogramming contributed to.

All this is to say, we're just interested in leveraging Memlab if we can to make it easier to debug the situations I mentioned earlier.

Thanks for sharing your insights into this.

@JacksonGL
Copy link
Member

JacksonGL commented Sep 30, 2022

@mdblr Bleak seems to intercept and rewrite JavaScript with mitmproxy. MemLab could potentially do that by intercepting and rewriting JavaScript in puppeteer for front-end JS leak detection and debugging. One ideas is to inject source information as a special closure variable in closure context, so that it's easier to link back to the source location when debugging.

For Node.js that's a different story, you may need to take heap snapshots in Node.js by yourself and then pass the snapshots to MemLab for heap analysis. One potential idea is to overwrite require so that you can intercept all require calls and rewrite JavaScript before the modules were loaded by node.js (I did something similar before for a different project) and it's feasible.

MemLab doesn't have any "front-end" that drives a node.js program (like something using puppeteer to drive browser). Let us know what kind of APIs you would like for better Node.js support if you have any ideas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants