Add support for native extensions on ARM (for linux) #330
This takes a similar approach to the x86-64 support for native extensions:
- unwind the stack using libunwind
- look up the pthread ID from registers in early stack frames

The first of these is relatively simple (benfred/remoteprocess#5), and the second is answerable in a basic form, using a debugger: on my machine (see below for details), the pthread ID ended up in registers r5 and r9 in one of the initial stack frames.

Machine details:
- Raspberry Pi 4B
- kernel: 5.4.72-v7l+
- glibc: `GNU C Library (Debian GLIBC 2.28-10+rpi1) stable release version 2.28.`
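To make the second step more concrete, here is a rough Python sketch of the register lookup idea. It is not the code in this PR: the function name `find_pthread_id`, the representation of frames as plain register dictionaries, and the idea of validating candidates against a set of already-known pthread IDs are all assumptions made for illustration. Only the register names r5 and r9 come from the observation above.

```python
from typing import Dict, Iterable, List, Optional

# Registers where the pthread ID was observed on the machine described above.
# Other kernels / glibc versions may well use different registers.
CANDIDATE_REGISTERS = ("r5", "r9")


def find_pthread_id(
    frames: List[Dict[str, int]],
    known_ids: Iterable[int],
    max_frames: int = 2,
) -> Optional[int]:
    """Return a pthread ID found in the earliest stack frames, if any.

    `frames` is a hypothetical list of register snapshots, innermost frame
    first, so the "early" frames (thread start-up code) sit at the end.
    `known_ids` is an assumed set of pthread IDs obtained elsewhere, used
    to reject register values that merely happen to be there.
    """
    known = set(known_ids)
    # Walk from the outermost frame inwards, looking only at a few frames.
    for registers in reversed(frames[-max_frames:]):
        for reg in CANDIDATE_REGISTERS:
            value = registers.get(reg)
            if value in known:
                return value
    return None
```

Checking candidates against known IDs is one possible way to guard against a register holding an unrelated value; whether that check is needed in practice is an open question.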
I've updated this to use remoteprocess 0.4.0, which includes benfred/remoteprocess#5. I've also updated the README. I couldn't find any CI that tests
Hm, it looks like this isn't quite ready. I can keep poking at it over the next while, but no guarantees on time (sorry!):
I tried the following, and usually observed a good stack trace (mostly just
$ cat test.py
import os
with open("/tmp/pid", "w") as f:
    print(os.getpid(), file=f)
def func():
    pass
while True:
    func()
$ python3.9 test.py &
$ while :; do py-spy dump --native --pid $(cat /tmp/pid); done
...
Python v3.9.0 (/home/pi/.pyenv/versions/3.9.0/bin/python3.9)
Error: Failed to merge native and python frames (Have 1 native and 2 python)
...
I added some debug logging to the loop in
Click for diff
diff --git a/src/native_stack_trace.rs b/src/native_stack_trace.rs
index f9ec7e0..66cf347 100644
--- a/src/native_stack_trace.rs
+++ b/src/native_stack_trace.rs
@@ -64,7 +64,14 @@ impl NativeStack {
             let is_python_addr = self.python.as_ref().map_or(false, |m| m.contains(addr)) ||
                 self.libpython.as_ref().map_or(false, |m| m.contains(addr));
             let merge_frame = &mut |frame: &remoteprocess::StackFrame| {
-                match self.get_merge_strategy(is_python_addr, frame) {
+                let strategy = self.get_merge_strategy(is_python_addr, frame);
+                debug!(
+                    "using {:?} for native={} python={}",
+                    strategy,
+                    frame,
+                    frames.get(python_frame_index).map_or("<none>".to_string(), |f| format!("{} ({}:{})", f.name, f.filename, f.line))
+                );
+                match strategy {
                     MergeType::Ignore => {},
                     MergeType::MergeNativeFrame => {
                         if let Some(python_frame) = self.translate_native_frame(frame) {
This shows:
That trace suggests it might be something about "vector calls", which seem to be used more in 3.9 (https://docs.python.org/3.9/whatsnew/3.9.html). Strangely, I can't reproduce this on x86-64 Linux with the latest release (although in a virtualized docker-for-mac container rather than on bare metal).
Click for docker configuration
# Dockerfile
FROM python:3.9
RUN apt update && apt install libunwind8
RUN curl -L https://github.com/benfred/py-spy/releases/download/v0.3.3/py-spy-v0.3.3-x86_64-unknown-linux-gnu.tar.gz | tar xzv
COPY test.sh test.py ./
CMD bash ./test.sh
# test.py
import os
with open("/tmp/pid", "w") as f:
    print(os.getpid(), file=f)
def func():
    pass
while True:
    func()
# test.sh
python test.py &
sleep 0.1 # time to start up
pid=$(cat /tmp/pid)
while ./py-spy dump --pid $pid --native; do
    :;
done
Execute:
$ ls
Dockerfile test.py test.sh
$ docker build . -t py-spy-test
...
Successfully tagged py-spy-test:latest
$ docker run -it --privileged py-spy-test:latest
... lots of correct stack traces, loop never stops ...
I think the CI problem isn't related to the wrong remoteprocess version getting picked up, but is instead maybe related to missing unwind binaries on the Travis CI box. These seem to be the relevant lines from the Travis log file
For https://github.com/benfred/remoteprocess/pull/5/files - you added a step to install libunwind-dev in travis.yml for ARM. That might also be necessary here?

For the merging error, it does look like PyFunction_Vectorcall might need to be 'MergePythonFrame' rather than 'Ignore' when merging the two stacks together - though I'd be interested to see both the Python stack and the original native stack when the error occurs, if you can dump them out. I don't have an ARM machine to test on, unfortunately ...

I keep wondering if we should accept some level of mismatch when merging the two stacks together - and maybe do what vmprof does (https://vmprof.readthedocs.io/en/latest/native.html) and just switch from native to Python on the first Python call found, rather than discarding the frame entirely if things don't quite line up. This would mean we can't handle callbacks or other code that switches from native->python multiple times - but it would prevent these types of errors from happening.
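To illustrate the trade-off between the two merging approaches discussed above, here is a minimal Python sketch. It is not py-spy's implementation (that lives in `get_merge_strategy` in src/native_stack_trace.rs and is more involved); the names `classify`, `merge_strict` and `merge_lenient`, and the prefix-based classification of interpreter frames, are hypothetical simplifications. Stacks are plain lists of function names, innermost frame first.

```python
from typing import List

# Hypothetical classification, for illustration only.
EVAL_FRAME_FUNCS = {"_PyEval_EvalFrameDefault"}
INTERPRETER_PREFIXES = ("Py", "_Py")


class MergeError(Exception):
    """Raised when the native and Python stacks do not line up."""


def classify(native_name: str) -> str:
    """Classify a native frame as 'python', 'ignore' or 'native'."""
    if native_name in EVAL_FRAME_FUNCS:
        return "python"  # stands in for exactly one Python frame
    if native_name.startswith(INTERPRETER_PREFIXES):
        return "ignore"  # interpreter plumbing, dropped from the output
    return "native"


def merge_strict(native: List[str], python: List[str]) -> List[str]:
    """Replace each eval frame with the next Python frame; fail on mismatch."""
    merged = []
    py_index = 0
    for name in native:
        kind = classify(name)
        if kind == "python":
            if py_index < len(python):
                merged.append(python[py_index])
            py_index += 1
        elif kind == "native":
            merged.append(name)
    if py_index != len(python):
        raise MergeError(
            "Failed to merge native and python frames "
            f"(Have {py_index} native and {len(python)} python)"
        )
    return merged


def merge_lenient(native: List[str], python: List[str]) -> List[str]:
    """Keep native frames up to the first interpreter frame, then switch
    to the Python stack for everything else (the vmprof-like idea)."""
    merged = []
    for name in native:
        if classify(name) != "native":
            break
        merged.append(name)
    return merged + list(python)
```

With `native = ["_PyEval_EvalFrameDefault", "Py_RunMain", "main"]` and `python = ["func", "<module>"]`, `merge_strict` raises a mismatch error, while `merge_lenient` returns the Python stack unchanged. The cost, as noted above, is that the lenient version cannot represent code that switches between native and Python more than once.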
I've taken a pass at the CI here: I changed to use GitHub Actions (instead of Travis/AppVeyor) and am using a Raspberry Pi for this instead of just cross compiling (Build / build-linux-armv7). This will let us run the tests in addition to just checking that it builds. The build problems with libunwind should be sorted out here too 🤞 - both on the Raspberry Pi box and using rust-musl-cross docker containers for cross compiling w/ musl (also handles aarch64/i686 etc).
Note: CircleCI provides ARMv8 workers for free nowadays: https://circleci.com/docs/2.0/arm-resources/#pricing-and-availability
What would be required to get this to work as well on M1?
Fixes #327
As an example, the following `test2.py` program just does some operations using NumPy and its native extensions:

The following two traces compare the flamegraphs with and without `--native`. The `--native` one is clearly more detailed and informative, showing the internals of the NumPy calls.

py-spy record --native --output test2.svg python ./test2.py
py-spy record --output test2-no-native.svg python ./test2.py
For testing, one approach might be https://github.com/rust-embedded/cross/ on CI.