Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NimBLEDevice::init() deadlock problem #584

Open
wants to merge 1 commit into
base: release/1.4
Choose a base branch
from

Conversation

CW-B-W
Copy link

@CW-B-W CW-B-W commented Aug 19, 2023

NimBLEDevice::host_task() has priority (configMAX_PRIORITIES - 4), which is defined at

xTaskCreatePinnedToCore(host_task, "nimble_host", NIMBLE_HS_STACK_SIZE,
NULL, (configMAX_PRIORITIES - 4), &host_task_h, NIMBLE_CORE);

When the priority of NimBLEDevice::init() is greater than (configMAX_PRIORITIES - 4), NimBLEDevice::init() will block NimBLEDevice::host_task() but still wait for NimBLEDevice::host_task() infinitely (waiting for m_synced to be set), resulting in the deadlock between NimBLEDevice::init() and NimBLEDevice::host_task().

This is caused by this while loop

while(!m_synced){
taskYIELD();
}

When the priority of NimBLEDevice::init() is greater than NimBLEDevice::host_task(), it cannot yield and let NimBLEDevice::host_task() be executed.

Using a binary semaphore can resolve this problem.

When the priority of NimBLEDevice::init() is greater than
(tskIDLE_PRIORITY+1), NimBLEDevice::init() will block NimBLE host
task and wait for NimBLE host task infinitely
@h2zero
Copy link
Owner

h2zero commented Aug 24, 2023

I've never heard of or experienced this issue before, do you have any examples where this can be reproduced?

@CW-B-W
Copy link
Author

CW-B-W commented Aug 24, 2023

Hello @h2zero,

Here is the minimum reproducible example (it tested it with bc333cc),
In this example, it will never print Initialized.
It got stuck inside NimBLEDevice::init(), and the NimBLEDevice was never sucessfully initialized.
And after a few seconds the system will be reset by watchdog.

#include <NimBLEDevice.h>

void InitNimbleTask(void* args)
{
    Serial.println("Initializing NimBLEDevice");
    NimBLEDevice::init("NimBLE-Arduino");
    Serial.println("Initialized");

    NimBLEDevice::startAdvertising();

    vTaskDelete(NULL);
}

void setup() {
    Serial.begin(115200);

    // host_task has priority (configMAX_PRIORITIES - 4)
    const uint32_t priority = (configMAX_PRIORITIES - 3); 

    // host_task is pinned at core 0
    xTaskCreatePinnedToCore(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL, 0); 
}


void loop() {

}

As stated in the PR, I think it's because host_task has priority (configMAX_PRIORITIES - 4)

xTaskCreatePinnedToCore(host_task, "nimble_host", NIMBLE_HS_STACK_SIZE,
NULL, (configMAX_PRIORITIES - 4), &host_task_h, NIMBLE_CORE);

When NimBLEDevice::init() has priority greater than (configMAX_PRIORITIES - 4), the host_task starves.

And if we set the priority of InitNimbleTask to (configMAX_PRIORITIES - 4) or lower, it works again.


Some even trickier situation,

  1. If we add a print in the problematic loop

    while(!m_synced){
    taskYIELD();
    }

    while(!m_synced){
        Serial.println("YIELD");
        taskYIELD();
    }

    It works again!
    This is because Serial.println() make the NimBLEDevice::init() task go to block state, which gives host_task some time to be executed.

  2. If we don't pin InitNimbleTask at core 0
    If we created the task with xTaskCreate instead of xTaskCreatePinnedToCore, it magically works.

    const uint32_t priority = (configMAX_PRIORITIES - 3);
    xTaskCreate(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL);

    But if we change the priority to (configMAX_PRIORITIES - 2) or (configMAX_PRIORITIES - 1), it gets stuck again.

    const uint32_t priority = (configMAX_PRIORITIES - 2);
    xTaskCreate(InitNimbleTask, "InitNimbleTask", 8192, NULL, priority, NULL);

    I don't really know the reason behind, but I guess maybe it's because ESP32 has 2 cores, so the host_task may not be starved.

@h2zero
Copy link
Owner

h2zero commented Aug 25, 2023

Due to the nature of the BLE stack I would highly recommend using a task priority for the application that is less than the host task. Is there a specific purpose to have your application run at a higher priority than the host task?

@CW-B-W
Copy link
Author

CW-B-W commented Aug 26, 2023

Due to the nature of the BLE stack I would highly recommend using a task priority for the application that is less than the host task. Is there a specific purpose to have your application run at a higher priority than the host task?

Actually, I don't really need the priority of application be higher, I just accidentally found this problem when I set different task priority for my application.
I was porting this project to an MCU (other than Arduino, ESP32), and it uses polling to read user command from UART, one of the UART command is to turn BLE on.
To make the UART command handler be as responsive as possible, I made its priority the highest configMAX_PRIORITIES - 1, then I found when I launch NimBLEDevice::init() inside this command handler, which has the highest priority, it gets stuck.

@h2zero
Copy link
Owner

h2zero commented Nov 25, 2023

@CW-B-W That is a great explanation, thank you for this. I have tested your example and propose a different fix.

Instead of the semaphore, could you just replace the taskYIELD() in the while loop with ble_npl_time_delay(1) instead, that should resolve the issue as well without the extra memory requirements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants