RenderDoc is unable to connect and debug on Samsung S23 Ultra #3283

Open
unrealsid opened this issue Mar 26, 2024 · 22 comments
Labels
Bug A crash, misbehaviour, or other problem Need More Info More information is needed from a user to work on this issue

Comments

@unrealsid

unrealsid commented Mar 26, 2024

Description

Hello @baldurk,

Hope you're well.
I've been using RenderDoc for some time to debug an Android application I'm developing. It worked quite well until an Android system update was installed on the Samsung S23 Ultra; now RenderDoc cannot connect to or debug on the device. I am running the latest version of RenderDoc.

Steps to reproduce

Reproduction steps are as follows:

  1. Enable developer mode on the mobile device and start wireless debugging.
  2. Ensure USB debugging is permitted for the computer.
  3. Open a command prompt on the PC and run adb connect xyz.xyz.xyz.xyz:port, where the address is the local IP and port shown on the Android device after enabling wireless debugging. Confirm that adb is correctly connected to the device.
  4. Start RenderDoc on the PC. The Samsung device should be visible under the Replay Context drop-down menu; select it. This starts the Remote Server prompt in RenderDoc on the PC. The RenderDocCMD application starts up on the Android device, but then shows a warning saying this is an older version of the application. The connection between RenderDoc on the PC and the Android application then fails, with RenderDoc on the PC reporting that it was unable to start the Remote CMD application.
    image
    Clicking on Check for Update does not help or resolve the issue.
  5. Any further connection attempts from RenderDoc on the PC fail and I am unable to debug the application I'm developing on my android device.
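
Step 3 asks you to confirm that adb is actually connected before starting RenderDoc. A minimal sketch of that check follows, assuming the usual `adb devices` output of a header line followed by one tab-separated `serial state` line per device. The helper names and the sample address and serial are hypothetical, not taken from this report.

```python
def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each listed serial (or ip:port) to the state adb reports for it."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header line, and adb daemon notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def is_ready(output: str, target: str) -> bool:
    """True only if the target is listed in the 'device' (usable) state."""
    return parse_adb_devices(output).get(target) == "device"


# Hypothetical sample output: one wireless target, one unauthorized USB target.
SAMPLE = (
    "List of devices attached\n"
    "192.168.0.10:5555\tdevice\n"
    "R5CT1234567\tunauthorized\n"
)
```

In practice the text would come from something like `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`. A target that is missing, or listed as `offline` or `unauthorized`, is worth fixing before involving RenderDoc at all.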

Environment

  • RenderDoc version: 1.31
  • Operating Systems:
  • PC: Windows 11 23H2
  • Android: 14 (1st March 2024 Patch)
  • Graphics API: OpenGL ES 3.2
    I also tried the latest nightly build with the same results.

Would be happy if you could please help me resolve this issue. RenderDoc is an integral part of the development and not being able to use it for debugging has put my Android app on hold. Should you require more information for debugging/helping resolve the issue, I would be glad to provide it. Thanks again,

@baldurk
Owner

baldurk commented Mar 26, 2024

The update message is a known problem with the Android system, it is a false positive and can be safely ignored. It's not clear from your description - did you try clicking "OK"? It should go away and function as normal, and should only appear the first time you run a new RenderDoc version.

I don't know what the adb connect command does, it's possible it is broken or causing problems. Can you try disconnecting everything, shutting down all running RenderDoc and android programs on the PC, restarting the phone, and then starting up and plugging in the phone without running adb connect to see if that works? Clicking OK on the update popup if it appears.

@baldurk baldurk added Bug A crash, misbehaviour, or other problem Need More Info More information is needed from a user to work on this issue labels Mar 26, 2024
@unrealsid
Author

Yes, I did click OK on the prompt, but if it's a false positive, then that likely won't be the root of the problem. And yes, it only appeared once.

adb connect is a command used to establish wireless debug connectivity to an Android device. It used to work until a few months ago. I managed to get the RenderDoc app working with my Android app after following your suggestions. It connects and works well with the device plugged in via USB.

I seem to be running into 2 other issues after this:
On RenderDoc on the PC, after setting the Replay Context to my Samsung S23U and having RenderDoc launch the mobile app I'm working on, I select the Capture Frames Immediately option in the PC app.
The app does some background work and captures frame data and I then get the following error:
image
I do see the Capture in the Captures Collected panel but the remote server disconnects.

I need to then reconnect RenderDoc on the PC via the Replay Context dropdown and it then seems to be able to open the Captured data well after that. It's not a major issue since I'm still able to read the data, but I'm wondering if there's a way around it?

And also, another thing I observed: I have two buttons in my app, they usually work well. One opens an Android System file picker and the other loads a mesh when clicked. But when I connect RenderDoc and I press either of the buttons, my app hangs. Is there an issue with clicking Android UI Buttons when RenderDoc is connected to Android?

Thanks

@cmannett85-arm
Contributor

Hi @unrealsid, I too have had problems with wireless ADB - my hunch is that it doesn't play nice with the port forwarding RD sets up but I've not looked at it too closely.

> I do see the Capture in the Captures Collected panel but the remote server disconnects.

As a quick sanity check, can you increase the network timeout? Go to Tools->Settings->Core->Config Editor, then RemoteServer->TimeoutMS, and up it to something stupid like 120000. If it works, then we know that something is stalling the phone a little but otherwise it's working fine. If it still fails instantly, then something at the capture end is likely crashing the server. And if it still fails but takes two minutes longer to do so, then we know something is stalling the device indefinitely (a real bug!).
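
The three outcomes of that timeout experiment can be written down as a small decision helper. This is only a sketch of the reasoning above: the function name and the `instant_ms` cutoff for what counts as "instantly" are assumptions, not RenderDoc settings.

```python
def interpret_capture_attempt(succeeded: bool, elapsed_ms: float,
                              timeout_ms: int = 120_000,
                              instant_ms: float = 5_000) -> str:
    """Classify one capture attempt made with a raised RemoteServer timeout."""
    if succeeded:
        # The longer timeout was enough, so the device was only slow.
        return "stall"
    if elapsed_ms < instant_ms:
        # Failing long before the timeout points at a crash on the capture end.
        return "crash"
    if elapsed_ms >= timeout_ms:
        # Failing only once the full timeout has passed suggests an indefinite hang.
        return "hang"
    # Failed somewhere in between: this experiment does not distinguish the cause.
    return "inconclusive"
```

For example, a failure after a couple of seconds classifies as `"crash"`, while a failure that arrives only after the full two minutes classifies as `"hang"`.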

> Is there an issue with clicking Android UI Buttons when RenderDoc is connected to Android?

Not in principle, but it depends on what the buttons do. If they both behave the same and one of them only opens the OS file picker then I can't see how RD would affect that - it's only interested in graphics API interception after all. Is there anything interesting in logcat at the time of the button presses?

@unrealsid
Author

Hello @cmannett85-arm, just to quickly confirm, after increasing the network timeout, you want me to connect via USB or via wireless ADB?

About the second issue, I'll check logcat and report my results back here shortly. Thanks.

@cmannett85-arm
Contributor

USB ADB for now, let's tackle one problem at a time.

@unrealsid
Author

I set the timeout to 120000 ms in the advanced settings and then proceeded as follows:

  1. I set the ReplayContext to my phone. RenderDoc runs some remote commands. Remote server is still connected.
  2. I launched the application via RenderDoc. The remote server was still connected.
  3. I proceeded to take a capture. RenderDoc on the PC hung for a bit and the remote server immediately got disconnected. The device data was captured, however.
  4. I then reselect my phone from the Replay Context in RenderDoc and I'm able to open the capture and view the information in it.

@unrealsid
Author

Regarding the freezing issue, all I get is the following log info when I press a UI button in my app and RenderDoc is connected:

Activity reported stop, but no longer stopping
ANR in com.viewer.fbxviewer (com.viewer.fbxviewer/.MainActivity)
                 PID: 22171
                 Reason: Input dispatching timed out (com.viewer.fbxviewer.MainActivity (server) is not responding. Waited 10001ms for FocusEvent(hasFocus=false))
                 Parent: com.viewer.fbxviewer/.MainActivity

And the app then crashes and the remote server disconnects.
I don't really see a reason for the ANR in the log.
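
When comparing several ANR reports like the one above, it can help to pull out the package, the wait time, and the event that timed out. A minimal sketch follows, assuming the log keeps the `ANR in <package>` and `Waited <n>ms for <Event>` wording shown here; the function name is hypothetical.

```python
import re

# Patterns based only on the wording of the log excerpt quoted in this thread.
ANR_RE = re.compile(r"ANR in (?P<package>\S+)")
WAIT_RE = re.compile(r"Waited (?P<ms>\d+)ms for (?P<event>\w+)")


def summarize_anr(log_text: str):
    """Return package, wait time, and event for the first ANR found, else None."""
    anr = ANR_RE.search(log_text)
    if anr is None:
        return None
    wait = WAIT_RE.search(log_text)
    return {
        "package": anr.group("package"),
        "waited_ms": int(wait.group("ms")) if wait else None,
        "event": wait.group("event") if wait else None,
    }


LOG = """ANR in com.viewer.fbxviewer (com.viewer.fbxviewer/.MainActivity)
PID: 22171
Reason: Input dispatching timed out (com.viewer.fbxviewer.MainActivity (server) is not responding. Waited 10001ms for FocusEvent(hasFocus=false))
"""
```

The summary only says that the main thread failed to handle a focus event in time. It does not say what the thread was blocked on, which is why a thread dump or debugger call stack is still needed.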

This isn't an issue when the objects I'm drawing on screen are loaded the moment the app loads. Which is how I'm testing content at the moment. Ideally, I'd be glad if I could test items as I load them on UI button presses.

Thanks.

@cmannett85-arm
Contributor

> RenderDoc on the PC hung for a bit and the remote server immediately got disconnected

Did the remote server disconnect at the start of the hang or at the end? In the device tab do you see the 'Capture in progress' progress bar filling up? What do you see in logcat during capture?

> This isn't an issue when the objects I'm drawing on screen are loaded the moment the app loads.

Loaded from where to where? Is this a Vulkan or GLES app?

@unrealsid
Author

unrealsid commented Mar 29, 2024

After restarting the Android device, I'm seeing one of three things happen at random each time I try a capture:

  • Yesterday, it automatically disconnected at the end of the hang.
  • Today, I had to manually shut the remote server down to finish the capture. Otherwise, RenderDoc would've remained permanently suspended.
  • And sometimes, I see RenderDoc capture device data successfully without a disconnect, but the remote server disconnects while trying to load the capture. RenderDoc then throws a Network I/O Operation failed error.

Yes, I see a 'Capture in Progress' bar fill up.

Also, logcat shows only the following, mostly in the first two cases:
failed to connect to socket 'localabstract:renderdoc_39920': could not connect to localabstract address 'localabstract:renderdoc_39920'
I don't know if this is relevant, but I see a lot of __rdoc_internal_android_logcat 345569 messages also.

The objects I'm drawing on screen are loaded from the disk using the Default file selector on Android and are drawn on screen. It is a GLES application. Thanks

@cmannett85-arm
Contributor

cmannett85-arm commented Apr 2, 2024

> I don't know if this is relevant, but I see a lot of __rdoc_internal_android_logcat 345569 messages also.

You can filter those out, they're internal messages used by RD.
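
One way to do that filtering on a saved logcat dump is sketched below. It assumes only that the internal messages contain the `__rdoc_internal_android_logcat` marker seen in this thread; the function name is hypothetical.

```python
# Marker string as it appears in the messages quoted in this thread.
RDOC_INTERNAL_MARKER = "__rdoc_internal_android_logcat"


def drop_internal_lines(lines):
    """Keep only the log lines that do not carry RenderDoc's internal marker."""
    return [line for line in lines if RDOC_INTERNAL_MARKER not in line]


SAMPLE_LINES = [
    "__rdoc_internal_android_logcat 345569",
    "failed to connect to socket 'localabstract:renderdoc_39920'",
    "__rdoc_internal_android_logcat 345570",
]
```

Applied to `SAMPLE_LINES`, only the socket failure line remains, which is the part of the log that matters here.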

> failed to connect to socket 'localabstract:renderdoc_39920': could not connect to localabstract address 'localabstract:renderdoc_39920'

This is the remote server port, you shouldn't be getting connection failures down a wired connection...

Are you able to share your APK so we can try debugging it? Or create a simple equivalent that displays the same issue?

@unrealsid
Author

Yes, I can create a simpler equivalent of that and send it across. But that will take me a bit of time.
I'll try to send it as soon as possible.

Do you have a Samsung S23 Ultra device on hand to test this?

Thanks.

@cmannett85-arm
Contributor

> Do you have a Samsung S23 Ultra device on hand to test this?

No, I have modern Samsung phones running the same Android version though, so if it's down to some funky Samsung bloatware on the device affecting RD I should be able to reproduce it.

The Samsung S23 Ultra uses a Qualcomm Adreno 740 GPU, so if your problem is down to an odd interaction between RD and the GPU driver I'll struggle to help you.

@unrealsid
Author

unrealsid commented Apr 9, 2024

Hello @cmannett85-arm, I've built a test app that somewhat mirrors my own production application. Where can I send it to you? I do not want to put a download link in the comments. Thanks.

@cmannett85-arm
Contributor

Thanks @unrealsid, you can send it to me in an email to camden.mannett@arm.com.

@cmannett85-arm
Contributor

@unrealsid have you sent the email? I haven't received anything and there's nothing in my email quarantine.

@unrealsid
Author

Hello @cmannett85-arm, sorry it's taking a while here. I had to make a few adjustments to the app before sending it and have been pretty caught up on other fronts.
I'll be making some time to make those adjustments and sending it to you in the next few days. Thanks for checking in. :)

@unrealsid
Author

Hello @cmannett85-arm, I've sent you an email with a link to the app. Thanks

@cmannett85-arm
Contributor

Hi @unrealsid, just letting you know we haven't forgotten about this. I tried your test app on a few different devices:

  • Pixel 7 (Mali G710, Android 13), works
  • Pixel 8 (Mali G715, Android 14), works
  • Samsung ZFlip (Adreno 650, Android 14), works
  • Samsung Galaxy A34 (Mali G68, Android 14), hangs!

However unlike your S23 Ultra, the A34 hangs immediately. Attaching a debugger, the OS fires a STOP signal at it once it has realised the app has frozen. Judging by the call stacks it looks like there are OpenGL ES calls coming from two different threads:
image
image

Both pass through RD and both are stuck waiting for a mutex. Sadly, without more debug info I can't know if they're waiting for the same mutex.

@unrealsid
Author

Hey @cmannett85-arm, thanks for the updates. What more information would you require?

@cmannett85-arm
Contributor

I've gotten a little further with this. RD uses a global OpenGL ES lock called glLock, and on my test device I see this issue on every run:

GLThread:
    glLock locked by glClear call:
        android::BufferQueueProducer::waitForFreeSlotThenRelock(android::BufferQueueProducer::FreeSlotCaller, std::unique_lock<…> &, int *) const
        android::BufferQueueProducer::dequeueBuffer(int *, android::sp<…> *, unsigned int, unsigned int, int, unsigned long, unsigned long *, android::FrameEventHistoryDelta *)
        android::Surface::dequeueBuffer(ANativeWindowBuffer **, int *)
        ...
        glClear 0x00000070014efbfc
        WrappedOpenGL::glClear(unsigned int) gl_draw_funcs.cpp:4572
        glClear_renderdoc_hooked(unsigned int) gl_hooks.cpp:167
        <unknown> 0x000000701e699914
        <unknown> 0x000000701e6997d0

RenderThread:
    glObjectLabelKHR blocked waiting for glLock:
        NonPI::MutexLockWithTimeout(pthread_mutex_internal_t *, bool, const timespec *) 0x00000070d67bb2bc
        Threading::CriticalSectionTemplate::Lock() posix_threading.cpp:95
        Threading::ScopedLock::ScopedLock(Threading::CriticalSectionTemplate<…> *) threading.h:39
        glObjectLabelKHR_renderdoc_hooked(RDCGLenum, unsigned int, int, const char *) gl_hooks.cpp:167
        set_khr_debug_label(GrGLGpu*, unsigned int, std::__1::basic_string_view<char, std::__1::char_traits<char>>) (.__uniq.111230615403708898952873255848304878871) 0x00000070c78d8708
        GrGLGpu::createTexture(SkISize, GrGLFormat, unsigned int, GrRenderable, GrGLTextureParameters::SamplerOverriddenState *, int, GrProtected, std::string_view) 0x00000070c78d6ec4
        GrGLGpu::onCreateTexture(SkISize, const GrBackendFormat &, GrRenderable, int, skgpu::Budgeted, GrProtected, int, unsigned int, std::string_view) 0x00000070c78d682c
        ...
        android::uirenderer::renderthread::RenderThread::threadLoop() 0x00000070c7557228
        android::Thread::_threadLoop(void *) 0x00000070bdde9310
        __pthread_start(void *) 0x00000070d67b9c30
        __start_thread 0x00000070d674da04

MainThread:
    Blocked waiting for a signal from the RenderThread:
        android::uirenderer::renderthread::DrawFrameTask::drawFrame()

The gist is that MainThread is blocked waiting for the RenderThread, which is blocked waiting for GLThread to release glLock.
GLThread is stuck because glClear calls back into the platform while holding glLock, and it is waiting on something there before it can release the lock.
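
The wait chain described above can be reproduced in miniature. The sketch below is an illustration of the pattern only, not RenderDoc's code: `gl_lock` stands in for glLock, `free_slot` stands in for a buffer queue with no free slot, and the second thread uses a timeout purely so the demo terminates (the real threads wait without one).

```python
import threading

# Hypothetical stand-ins: gl_lock plays the role of the global glLock, and
# free_slot plays the role of a buffer queue that currently has no free slot.
gl_lock = threading.Lock()
free_slot = threading.Semaphore(0)


def gl_thread(holding):
    """Take the global lock, then block on the platform while still holding it."""
    with gl_lock:
        holding.set()          # tell the other thread the lock is now held
        free_slot.acquire()    # blocks until a slot is released


def render_thread(holding, result, timeout_s):
    """Try to take the same lock once the other thread is known to hold it."""
    holding.wait()
    acquired = gl_lock.acquire(timeout=timeout_s)
    result["acquired"] = acquired
    if acquired:
        gl_lock.release()


def run_demo(timeout_s=0.2):
    """Return (render thread stalled, lock holder finished after a slot freed)."""
    holding = threading.Event()
    result = {}
    gl = threading.Thread(target=gl_thread, args=(holding,), daemon=True)
    render = threading.Thread(target=render_thread, args=(holding, result, timeout_s))
    gl.start()
    render.start()
    render.join()
    render_stalled = not result["acquired"]
    free_slot.release()        # free a slot so the lock holder can finish
    gl.join(timeout=1.0)
    return render_stalled, not gl.is_alive()
```

The second thread cannot make progress until a slot is released, no matter how long it waits. That is why the second question below, what is consuming the slots, matters as much as the first.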

There are two questions to resolve:

  1. Why are multiple threads doing GL calls? @unrealsid are you using multiple graphics frameworks?
  2. What's consuming the android::BufferQueueProducer slots?

@unrealsid
Author

Hello @cmannett85-arm:

  1. I'm only using OpenGL ES, including in the extended version of the application. But I did notice that the OpenGL ES calls seem to be wrapped in Vulkan calls. I ran AGI and got this for a single draw call. Maybe it's related?
    image
  2. Would you need any kind of source code from me?

Also, are these issues happening when you press a button on the app?

@cmannett85-arm
Contributor

  1. I wouldn't be surprised if on Adreno GLES is implemented in Vulkan in the driver. This isn't visible to RD, but if AGI is getting all its data from Perfetto then maybe the driver is reporting its Vulkanness through that. You'll have to ask Samsung about that though.
  2. Anything that might be relevant is always worth a look. You can send it to my email address and it'll remain private.

> Also, are these issues happening when you press a button on the app?

No, it happens on rendering start so we could be looking at two different issues.


3 participants