High CPU usage when attach_stacktrace=true #597

Open
agrinman opened this issue Jul 12, 2023 · 4 comments

@agrinman

Environment

actix-web

What version are you running? Etc.
0.31.5

Steps to Reproduce

When setting up sentry to attach stack traces, as below:

    let _guard: sentry::ClientInitGuard = sentry::init((
        config.sentry_url.as_str(),
        sentry::ClientOptions {
            release: sentry::release_name!(),
            environment: Some(Cow::Owned(config.service_config.environment.clone())),
            sample_rate,
            attach_stacktrace: true, // <---- this right here
            ..Default::default()
        },
    ));

We see really high CPU usage (up to ~800% on my dev Linux machine with 16 cores) when using a load-test tool like vegeta:

For example,

echo "GET http://localhost:8000/just_return_an_error" | vegeta attack -duration=10s -rate=2000/1s -timeout=30s | tee results.bin | vegeta report

A bunch of requests don't even complete (they time out) because the CPU usage spikes so high. This all goes away when setting attach_stacktrace = false.

It's important to note that we have a large enum error return type, but it's been boxed:

struct Error(Box<ErrorInner>);

enum ErrorInner {
    Many,
    Different,
    Variants(WithMaybeLargeProperties),
    // ..
}

Any idea what's going on here or how we can still get stack traces?

@Swatinem
Member

Are you capturing a lot of errors or transactions during an average request?

I believe the problem here is simply that backtrace is slow. Especially if it is set up to do client-side symbolication, which it is.

Put simply, the first step of unwinding the stack trace involves accessing unwind info from all the binaries the call stack goes through.
Afterwards, the instruction addresses are resolved to source locations (function + file + line), often multiple per address as inlined functions are resolved as well.

This process is just slow in itself. The backtrace crate has a new_unresolved function which skips the expensive symbolication. But it means that people would have to rely on server-side symbolication to have something readable.
Mixing those two is also a bad idea due to getsentry/sentry#46435.

Long story short, we should be able to take half of the pain here away by using new_unresolved and relying on server-side symbolication by default. The Native SDK offers a symbolize_stacktraces runtime option which allows enabling that behavior. We could potentially add such an option to the Rust SDK as well.
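
As a rough illustration of the difference, using the backtrace crate directly (this is not the SDK's internal code path, just a sketch of the two capture modes):

use backtrace::Backtrace;

fn main() {
    // Captures the instruction addresses AND resolves every frame to
    // function/file/line by walking debug info for all loaded binaries.
    // This symbolication step is the expensive part.
    let resolved = Backtrace::new();
    println!("resolved frames: {}", resolved.frames().len());

    // Captures only the raw instruction addresses; symbolication is
    // skipped entirely and could instead happen server-side.
    let mut unresolved = Backtrace::new_unresolved();
    println!("unresolved frames: {}", unresolved.frames().len());

    // Resolution can still be done lazily, on demand.
    unresolved.resolve();
}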

@Swatinem
Member

Let's reopen this, as this is a good thing to eventually do, maybe not right away though :-)

@Swatinem Swatinem reopened this Jul 12, 2023
@gautamg795

gautamg795 commented May 10, 2024

Hi @Swatinem, I think my question is related to this: we're finding that stacktrace generation (specifically symbolication) causes the process's memory usage to jump significantly, from under 2 MB to over 400 MB, in a moderately complex process.
Removing all debug info from the binary and uploading it to Sentry instead fixes this (as we discussed a while ago in the linked issue); however, we'd actually like to keep debug info in the production binaries to ease debugging with gdb, profiling, etc. when needed.

So I think like the original discussion above, we'd like a way to disable symbolication even when debug info is present. It seems like disabling the backtrace feature means nothing useful is sent up, not even an unsymbolicated trace (which makes sense).

If you have any pointers on where this change would be made, we're happy to submit a PR or maintain a fork with the change. We're specifically using sentry-anyhow, which seems to just Debug-print the anyhow-provided backtrace, and that is what triggers the high memory usage (deep in gimli).

EDIT: I guess this isn't exactly the right place for my request, as our issue is not specific to attach_stacktrace, but applies any time Sentry adds a stacktrace to an event.
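
For reference, the capture path we're talking about is roughly this (do_work, the error message, and the DSN are made up for illustration; capture_anyhow is the sentry-anyhow entry point):

use anyhow::{anyhow, Result};

fn do_work() -> Result<()> {
    // anyhow attaches a backtrace to this error when backtraces are
    // enabled (e.g. RUST_BACKTRACE=1 / RUST_LIB_BACKTRACE=1).
    Err(anyhow!("something went wrong"))
}

fn main() {
    let _guard = sentry::init("https://examplePublicKey@o0.ingest.sentry.io/0");

    if let Err(err) = do_work() {
        // As described above, this ends up Debug-printing the
        // anyhow-provided backtrace; with debug info present in the
        // binary, that formatting symbolicates the frames (deep in
        // gimli), which is where memory usage climbs.
        sentry_anyhow::capture_anyhow(&err);
    }
}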

@Swatinem
Member

I had another look at this, and the situation is still quite bad.

Due to not having stable accessors for either the anyhow backtrace or std::backtrace::Backtrace, using Display is still the only way to do anything with the backtrace.

However, if you are relying on anyhow exclusively, then depending on how you interpret the docs (https://docs.rs/anyhow/latest/anyhow/struct.Error.html#stability), the backtrace feature would still give you the anyhow backtrace shim.

If you are comfortable maintaining a fork of that, you can just remove the lazy resolution of the frames here:
https://docs.rs/anyhow/latest/src/anyhow/backtrace.rs.html#321-327

That is however not possible with std unless you really want to build std yourself, which I highly doubt.

Figuring out a way to disable the lazy resolve should hopefully fix the memory usage problems.
Not quite sure if backtrace::clear_symbol_cache might help, as it's possible that std has its own isolated copy of that.

However, the roundtrip through String formatting and parsing things via regex will still hurt performance in those cases.
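
A rough sketch of that roundtrip with std's backtrace (this is not the SDK's actual parsing code, just the shape of the problem):

use std::backtrace::Backtrace;

fn main() {
    // Capturing the frames themselves is comparatively cheap...
    let bt = Backtrace::force_capture();

    // ...but std exposes no stable accessors for the frames, so the only
    // way to get at them is Display/Debug formatting, which is where the
    // (lazy) symbol resolution actually happens.
    let rendered = bt.to_string();

    // The SDK then has to parse this text back into structured frames,
    // which adds the String/regex overhead mentioned above.
    println!("{} rendered lines", rendered.lines().count());
}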
