Add support for native extensions on ARM (for linux) #330

Open
wants to merge 4 commits into
base: master

huonw commented Dec 13, 2020

This takes a similar approach to the x86-64 support for native extensions:

  • unwind the stack using libunwind
  • look up the pthread ID from registers in early stack frames

The first of these is relatively simple (benfred/remoteprocess#5), and the
second is answerable in a basic form, using a debugger: on my machine
(see below for details), the pthread ID ended up in registers r5 and r9
in one of the initial stack frames.
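
To make the second step concrete, here is a minimal sketch of the register scan, written in Python for readability rather than as the actual Rust change. Everything in it is an assumption for illustration: `find_pthread_id`, the shape of `frames`, and the made-up register values are hypothetical, and the set of known pthread IDs is assumed to come from elsewhere (for example, the interpreter's thread states).

```python
def find_pthread_id(frames, known_ids, candidates=("r5", "r9")):
    """Return the first candidate register value that is a known pthread ID.

    frames:     list of {register name: value} dicts, outermost frame first
                (hypothetical shape, not the unwinder's real data structure)
    known_ids:  set of pthread IDs already known to belong to Python threads
    candidates: registers to inspect; r5 and r9 are the ones observed on the
                Raspberry Pi described here and may differ on other systems
    """
    for registers in frames:
        for name in candidates:
            value = registers.get(name)
            if value is not None and value in known_ids:
                return value
    return None


# Made-up register contents for two early frames of one thread.
frames = [
    {"r5": 0x10, "r9": 0x20},
    {"r5": 0xB6F0D000, "r9": 0xB6F0D000},
]
known_ids = {0xB6F0D000}

found = find_pthread_id(frames, known_ids)
print(hex(found) if found is not None else "not found")
```

Checking the value against a set of known IDs, rather than trusting a fixed register, is one way to tolerate the register assignment varying between glibc builds.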

Machine details:

  • Raspberry Pi 4B
  • kernel: 5.4.72-v7l+
  • glibc: GNU C Library (Debian GLIBC 2.28-10+rpi1) stable release version 2.28.

Fixes #327


As an example, the following test2.py program just does some operations using NumPy and its native extensions:

import numpy as np


def foo():
    data = np.random.gamma(2, 3, size=10000000)
    data.sort()
    print(data.mean())


def bar():
    foo()


def baz():
    foo()
    bar()


baz()

The following two traces compare flamegraphs recorded with and without --native. The --native one is clearly more detailed and informative, showing the internals of the NumPy calls.

py-spy record --native --output test2.svg python ./test2.py

[image: flamegraph test2.svg]

py-spy record --output test2-no-native.svg python ./test2.py

[image: flamegraph test2-no-native.svg]


For testing, one approach might be https://github.com/rust-embedded/cross/ on CI.

huonw (Author) commented Dec 20, 2020

I've updated this to use remoteprocess 0.4.0, which includes benfred/remoteprocess#5. I've also updated the README. I couldn't find any CI that tests --native, so I've not made any changes there.

huonw (Author) commented Dec 20, 2020

Hm, it looks like this isn't quite ready. I can keep poking at it over the next while, but no guarantees on time (sorry!):

  • CI is failing, as it seems to still be using remoteprocess 0.3.4, despite updating Cargo.toml and Cargo.lock
  • This occasionally fails on Python 3.9, it seems (details below)

I tried the following, and usually observed a good stack trace (mostly just <module> (test.py:12)), but occasionally hit the error shown in the transcript below:

$ cat test.py
import os

with open("/tmp/pid", "w") as f:
    print(os.getpid(), file=f)


def func():
    pass


while True:
    func()

$ python3.9 test.py &
$ while :; do py-spy dump --native --pid $(cat /tmp/pid); done
...
Python v3.9.0 (/home/pi/.pyenv/versions/3.9.0/bin/python3.9)

Error: Failed to merge native and python frames (Have 1 native and 2 python)
...

I added some debug logging to the loop in merge_native_stack:

diff --git a/src/native_stack_trace.rs b/src/native_stack_trace.rs
index f9ec7e0..66cf347 100644
--- a/src/native_stack_trace.rs
+++ b/src/native_stack_trace.rs
@@ -64,7 +64,14 @@ impl NativeStack {
             let is_python_addr = self.python.as_ref().map_or(false, |m| m.contains(addr)) ||
                     self.libpython.as_ref().map_or(false, |m| m.contains(addr));
             let merge_frame = &mut |frame: &remoteprocess::StackFrame| {
-                match self.get_merge_strategy(is_python_addr, frame) {
+                let strategy = self.get_merge_strategy(is_python_addr, frame);
+                debug!(
+                    "using {:?} for native={} python={}",
+                    strategy,
+                    frame,
+                    frames.get(python_frame_index).map_or("<none>".to_string(), |f| format!("{} ({}:{})", f.name, f.filename, f.line))
+                );
+                match strategy {
                     MergeType::Ignore => {},
                     MergeType::MergeNativeFrame => {
                         if let Some(python_frame) = self.translate_native_frame(frame) {

This shows:

[2020-12-20T23:08:54.180176497Z DEBUG py_spy::native_stack_trace] using MergePythonFrame for native=0x00000000000284b8 _PyEval_EvalFrameDefault (/home/pi/.pyenv/versions/3.9.0/bin/python3.9) python=func (/home/pi/projects/benfred/py-spy/test.py:7)
[2020-12-20T23:08:54.180658301Z DEBUG py_spy::native_stack_trace] using Ignore for native=0x000000000002774c function_code_fastcall (/home/pi/.pyenv/versions/3.9.0/bin/python3.9) python=<module> (/home/pi/projects/benfred/py-spy/test.py:12)
[2020-12-20T23:08:54.180865519Z DEBUG py_spy::native_stack_trace] using Ignore for native=0x000000000003c5c4 _PyFunction_Vectorcall (/home/pi/.pyenv/versions/3.9.0/bin/python3.9) python=<module> (/home/pi/projects/benfred/py-spy/test.py:12)
Error: Failed to merge native and python frames (Have 1 native and 2 python)

That trace suggests the problem might be related to "vector calls", which seem to be used more in 3.9 (https://docs.python.org/3.9/whatsnew/3.9.html). Strangely, I can't reproduce this on x86-64 Linux with the latest release (although that was in a virtualized docker-for-mac container rather than on bare metal).
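
For readers following along, the failure can be modelled in a few lines. This is a simplified Python sketch, not py-spy's Rust code: the strategy names come from the debug log above, while `classify`, `strict_merge`, and the rule that only `_PyEval_EvalFrameDefault` stands in for a Python frame are assumptions made for illustration.

```python
from enum import Enum, auto


class MergeType(Enum):
    # Names mirror the strategies printed in the debug log.
    IGNORE = auto()
    MERGE_PYTHON_FRAME = auto()
    MERGE_NATIVE_FRAME = auto()


# Assumption: only the interpreter's eval loop corresponds to a Python frame.
EVAL_SYMBOLS = {"_PyEval_EvalFrameDefault"}


def classify(symbol, is_python_addr):
    """Hypothetical stand-in for get_merge_strategy."""
    if not is_python_addr:
        return MergeType.MERGE_NATIVE_FRAME
    if symbol in EVAL_SYMBOLS:
        return MergeType.MERGE_PYTHON_FRAME
    return MergeType.IGNORE


def strict_merge(native, python):
    """Swap each eval-loop frame for the next Python frame, innermost first.

    Raises ValueError when the number of eval-loop frames differs from the
    number of Python frames.
    """
    merged = []
    remaining = list(python)
    eval_frames = 0
    for symbol, is_python_addr in native:
        strategy = classify(symbol, is_python_addr)
        if strategy is MergeType.MERGE_PYTHON_FRAME:
            eval_frames += 1
            if remaining:
                merged.append(remaining.pop(0))
        elif strategy is MergeType.MERGE_NATIVE_FRAME:
            merged.append(symbol)
    if eval_frames != len(python):
        raise ValueError(
            "Failed to merge native and python frames "
            f"(Have {eval_frames} native and {len(python)} python)"
        )
    return merged


# The three native frames from the log, all inside the python3.9 binary.
native = [
    ("_PyEval_EvalFrameDefault", True),
    ("function_code_fastcall", True),
    ("_PyFunction_Vectorcall", True),
]
python = ["func (test.py:7)", "<module> (test.py:12)"]

try:
    strict_merge(native, python)
except ValueError as err:
    print(err)
```

Under these assumptions the logged stack contains one eval-loop frame but two Python frames, so the strict count check fails with the same message as in the transcript.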

Docker configuration:
# Dockerfile
FROM python:3.9

RUN apt update && apt install libunwind8
RUN curl -L https://github.com/benfred/py-spy/releases/download/v0.3.3/py-spy-v0.3.3-x86_64-unknown-linux-gnu.tar.gz | tar xzv
COPY test.sh test.py ./

CMD bash ./test.sh
# test.py
import os

with open("/tmp/pid", "w") as f:
    print(os.getpid(), file=f)


def func():
    pass


while True:
    func()
# test.sh
python test.py &

sleep 0.1 # time to start up

pid=$(cat /tmp/pid)
while ./py-spy dump --pid $pid --native; do
  :;
done

Execute:

$ ls
Dockerfile	test.py		test.sh
$ docker build . -t py-spy-test 
...
Successfully tagged py-spy-test:latest
$ docker run -it --privileged py-spy-test:latest
... lots of correct stack traces, loop never stops ...

benfred (Owner) commented Jan 2, 2021

I think the CI problem isn't caused by the wrong remoteprocess version getting picked up, but is instead maybe related to missing libunwind libraries on the Travis CI box. These seem to be the relevant lines from the Travis log file:

  = note: /usr/lib/gcc-cross/arm-linux-gnueabihf/5/../../../../arm-linux-gnueabihf/bin/ld: cannot find -lunwind
          /usr/lib/gcc-cross/arm-linux-gnueabihf/5/../../../../arm-linux-gnueabihf/bin/ld: cannot find -lunwind-ptrace
          /usr/lib/gcc-cross/arm-linux-gnueabihf/5/../../../../arm-linux-gnueabihf/bin/ld: cannot find -lunwind-arm
          collect2: error: ld returned 1 exit status

In https://github.com/benfred/remoteprocess/pull/5/files you added a step to install libunwind-dev in travis.yml for ARM. That might also be necessary here?

For the merging error, it does look like _PyFunction_Vectorcall might need to be 'MergePythonFrame' rather than 'Ignore' when merging the two stacks together, though I'd be interested if you could dump out both the Python stack and the original stack on the error. I don't have an ARM machine to test on, unfortunately ...

I keep wondering if we should accept some level of mismatch when merging the two stacks together, and maybe do what vmprof does (https://vmprof.readthedocs.io/en/latest/native.html): just switch from native to Python on the first Python call found, rather than discard the frame entirely if things don't quite line up. This would mean we can't handle callbacks or other code that switches from native to Python multiple times, but it would prevent these types of errors from happening.
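
A rough sketch of that more lenient idea, in Python for brevity: keep native frames until the first interpreter eval-loop frame, then append the whole Python stack. The function name `lenient_merge`, the `is_eval_frame` predicate, and the native symbol names in the example are hypothetical, and this is not a description of how vmprof or py-spy is actually implemented.

```python
def lenient_merge(native, python, is_eval_frame):
    """Merge two innermost-first stacks without requiring them to line up.

    native:        native symbol names, innermost first
    python:        Python frame names, innermost first
    is_eval_frame: predicate marking the interpreter's eval-loop frames
    """
    merged = []
    for symbol in native:
        if is_eval_frame(symbol):
            # First Python call found: stop reading the native stack here.
            break
        merged.append(symbol)
    # Everything from here outwards comes from the Python stack alone.
    merged.extend(python)
    return merged


def is_eval(symbol):
    # Assumption: this one symbol marks a Python frame.
    return symbol == "_PyEval_EvalFrameDefault"


# Hypothetical stack for the test2.py example: native sort code called from
# foo(). The native stack is truncated and shows only one eval-loop frame.
native = ["native_sort_impl", "_PyEval_EvalFrameDefault", "_PyFunction_Vectorcall"]
python = ["foo", "baz", "<module>"]

print(lenient_merge(native, python, is_eval))
```

The trade-off matches the one described above: native frames that sit between two Python frames (callbacks, for example) are dropped, but a truncated or misclassified native stack no longer aborts the whole sample.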

benfred (Owner) commented Jan 16, 2021

I've taken a pass at the CI here: I switched to GitHub Actions (instead of Travis/AppVeyor) and am using a Raspberry Pi for this instead of just cross compiling (Build / build-linux-armv7). This will let us run the tests in addition to just checking that it builds. The build problems with libunwind should be sorted out here too 🤞 - both on the Raspberry Pi box and using rust-musl-cross docker containers for cross compiling w/ musl (also handles aarch64/i686 etc).

ogrisel commented Aug 18, 2021

Note: CircleCI provides ARMv8 workers for free nowadays: https://circleci.com/docs/2.0/arm-resources/#pricing-and-availability

@samaahitabelavadi

Hello @huonw @benfred, thanks for trying to add native extension support for ARM as well. Can you please take this PR to the finish line?

@battaglia01

What would be required to get this to work as well on M1?
