Sentinel handling fixes for seastar-addr2line. #24

Open
wants to merge 4 commits into
base: master

travisdowns
Member

Previously we used 0x0 as a sentinel at the end of every address
sent to addr2line, but this fails with some binaries (e.g., redpanda
compiled with clang 14) because 0 actually resolves to something.

Instead, just use a string like 'sentinel' as the sentinel, since
addr2line (and llvm-addr2line) just echoes these unchanged.

See redpanda issue 5004.
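
As a rough illustration of the sentinel idea described above, here is a minimal Python sketch. It does not run addr2line: `FAKE_RESOLVER` is a hypothetical stand-in child process that prints two lines for hex input and echoes anything else unchanged, which is the behaviour the sentinel relies on. The names `start_resolver` and `resolve` are also made up for this example and are not taken from seastar-addr2line.

```python
import subprocess
import sys

# Non-address marker; the approach assumes the resolver echoes it back unchanged.
SENTINEL = "sentinel"

# Stand-in for the resolver (NOT addr2line): two output lines for hex input,
# plain echo for anything that does not parse as hex.
FAKE_RESOLVER = r"""
import sys
for line in sys.stdin:
    token = line.strip()
    try:
        int(token, 16)
        print("func_at_" + token)
        print("file.cc:1")
    except ValueError:
        print(token)
    sys.stdout.flush()
"""


def start_resolver():
    """Start the long-lived stand-in resolver with line-buffered text pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", FAKE_RESOLVER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )


def resolve(proc, address):
    """Send one address followed by the sentinel, then read until the echo."""
    proc.stdin.write(f"{address}\n{SENTINEL}\n")
    proc.stdin.flush()
    lines = []
    # The number of output lines per address is not known in advance,
    # so the echoed sentinel marks the end of this address's output.
    for raw in iter(proc.stdout.readline, ""):
        line = raw.rstrip("\n")
        if line == SENTINEL:
            break
        lines.append(line)
    return lines
```

Because the marker is not an address, it cannot collide with a real symbol the way 0x0 did, provided the resolver really does echo unparseable input.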
@jcsp

jcsp commented Sep 28, 2022

This looks like it got forgotten?

@travisdowns does this PR get us back into a state where crashes in integration tests will come with redpanda_backtrace.log populated?

@travisdowns
Member Author

travisdowns commented Sep 28, 2022

@jcsp - probably? What is odd is that even without this, decoding generally hasn't been hanging for me, so I'm not 100% sure what to make of that.

Do we have an example of any binary + stack where it hangs? I can test this change there.

Not sure what happened with the review; it looks like I removed Ben for some reason after adding him.

@travisdowns
Member Author

IIRC I may have removed him because the sentinel approach actually did not work with one of addr2line or llvm-addr2line, depending on the version.

@travisdowns
Member Author

I found a stash which has the correct fix, I believe, stand by.

@travisdowns
Member Author

@jcsp - I looked a bit more into this and played around locally, and I believe the hang is related to slowness in the addr2line binary, which can run almost indefinitely at 100% CPU for even just a few backtraces.

So I think the fix here is to pass the -a llvm-addr2line argument to seastar-backtrace, which makes it use the llvm version; that version is much faster (generally < 1 second per backtrace, IME).

I propose we wrap this up in a redpanda-addr2line.sh in our repo so both DT and people can use a single recommended way to call the underlying seastar script.
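
A rough sketch of what such a wrapper could look like, written in Python rather than shell purely for illustration. Everything here is an assumption except the -a llvm-addr2line argument mentioned above: the script path, the function names, the idea of feeding the backtrace on stdin, and the 60-second timeout are all hypothetical.

```python
import subprocess

# Hypothetical location of the seastar script; adjust to the actual checkout.
SEASTAR_SCRIPT = "seastar/scripts/seastar-addr2line"
# The faster resolver suggested in this thread.
FAST_ADDR2LINE = "llvm-addr2line"


def build_command(extra_args, script=SEASTAR_SCRIPT):
    """Prepend the recommended resolver choice and forward all other arguments."""
    return [script, "-a", FAST_ADDR2LINE, *extra_args]


def decode(extra_args, backtrace_text, timeout_s=60.0):
    """Run the decoder on backtrace_text (assumed to be read from stdin).

    Raises subprocess.TimeoutExpired instead of hanging indefinitely, and
    subprocess.CalledProcessError if the script exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(extra_args),
        input=backtrace_text,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout
```

Keeping the resolver choice in one place means both ducktape and people decoding by hand would get the same behaviour, and the timeout turns a slow resolver into a visible error rather than a stuck test.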

I can do this quickly, but I'm wondering if there's a backtrace and binary handy that I can try this on, or will just any old DT test hang?

@jcsp

jcsp commented Sep 29, 2022

I don't have anything handy -- would probably just hack a segfault into the code and run a ducktape test, that's what I've done in the past for testing things like the log scraping.

@travisdowns travisdowns self-assigned this Sep 29, 2022
@jcsp

jcsp commented Oct 11, 2022

@travisdowns shall we merge redpanda-data/redpanda#6550 while this one is pending? Not sure how imminent it is

@travisdowns
Member Author

@jcsp go ahead and merge 6550. I think I am too busy to get to this this week, and I imagine this is quite painful in its current state.

@travisdowns travisdowns requested review from a team and ballard26 and removed request for a team and ballard26 January 24, 2024 15:51