ESP32 watchdog interrupt (IDFGH-12714) #13699

Open
3 tasks done
zzff-sys opened this issue Apr 26, 2024 · 19 comments

@zzff-sys

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

I want to use the TWDT so that an interrupt function runs when a watchdog timeout occurs, and to execute some of my own logic in that function (without using print functions). I define the function as esp_task_wdt_isr_user_handler(). However, the timeout does not seem to enter the function, and the chip restarts repeatedly. How do I correctly define a watchdog interrupt? @svenfuchs @paulreimer @kumekay

@espressif-bot espressif-bot added the Status: Opened Issue is new label Apr 26, 2024
@github-actions github-actions bot changed the title ESP32 watchdog interrupt ESP32 watchdog interrupt (IDFGH-12714) Apr 26, 2024
@zzff-sys
Author

As I said above, I want to execute the contents of my interrupt function on timeout. I think that at this point execution should stay in the watchdog interrupt and the chip should not reset (is my understanding correct?). Attached are my error log and program.
[Attachments: log.txt and two images]

@nopnop2002

nopnop2002 commented Apr 26, 2024

What happens if you enable this?

[Image: test]

@zzff-sys
Author

I think enabling this feature will solve the problem.

[Image: test]

I have actually enabled this feature already. I wanted it to execute my interrupt routine on timeout, but it did not. @nopnop2002

@nopnop2002

nopnop2002 commented Apr 27, 2024

I confirmed that esp_task_wdt_isr_user_handler() is called in the code below.

void esp_task_wdt_isr_user_handler(void)
{
    esp_rom_printf("called esp_task_wdt_isr_user_handler\n");
}

[Image: test]

I think at this point you should stay in the watchdog interrupt and not reset the behavior (is that understood correctly)

No.
esp_task_wdt_isr_user_handler() is called when a TWDT timeout occurs.
Because TWDT timeouts are considered fatal errors, the system will eventually be reset or halted.

@zzff-sys
Author

@nopnop2002 I see that your test did execute the interrupt handler, but in my case the handler is still not executed. Can you help me check whether there are any design issues in the code snippet above? I still constantly get the output shown in the log.

@zzff-sys
Author

@nopnop2002 Perhaps you mean that when a timeout occurs, the interrupt is entered and a reset still follows, rather than execution staying in the interrupt. Is my understanding correct?

@zzff-sys
Author

What I see now is that I can enter the interrupt and execute my handler, but the chip still restarts. It seems that no matter what code is in my handler, the chip restarts when the time is up. Can't we avoid this by staying in the handler instead of constantly restarting?
If the chip keeps resetting after the interrupt, I think the interrupt is meaningless.

@ESP-Marius
Collaborator

My current phenomenon is that I am able to enter the interrupt to execute the corresponding program, but still experience restart behavior.

Yes, this is correct. The callback esp_task_wdt_isr_user_handler is just for adding extra functionality; after that function is called, the watchdog ISR continues its execution and does what it is intended to do: restart the chip.

If you disable CONFIG_ESP_TASK_WDT_PANIC then you might get behavior closer to what you want: after executing your callback function, it will simply print a backtrace and exit the interrupt.
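
As a rough illustration of that setting, the relevant sdkconfig lines might look like the fragment below. The 5 s timeout is an assumption taken from the value the reporter mentions in this thread; the same options can also be changed through menuconfig.

```
CONFIG_ESP_TASK_WDT_TIMEOUT_S=5
# CONFIG_ESP_TASK_WDT_PANIC is not set
```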

Maybe if you explain in a bit more detail what you are trying to achieve and why, we might be able to suggest how to do this.

@zzff-sys
Author

zzff-sys commented Apr 28, 2024

My current phenomenon is that I am able to enter the interrupt to execute the corresponding program, but still experience restart behavior.

Yes, this is correct. The callback esp_task_wdt_isr_user_handler is just for adding extra functionality; after that function is called, the watchdog ISR continues its execution and does what it is intended to do: restart the chip.

If you disable CONFIG_ESP_TASK_WDT_PANIC then you might get behavior closer to what you want: after executing your callback function, it will simply print a backtrace and exit the interrupt.

Maybe if you explain in a bit more detail what you are trying to achieve and why, we might be able to suggest how to do this.

  1. Firstly, thank you very much for your reply. I am trying to understand the restart behavior.
  2. My goal is to run a loop inside the watchdog interrupt, so I use the following form. The chip still restarts in the end, but the restart happens after about 20 s rather than after the 5 s timeout we set.
    void esp_task_wdt_isr_user_handler(void) { while(1) { /* I want to do something */ } }
  3. I also found that reducing the startup timeout (shown below) reduces the 20 s I observed to some extent. I wonder whether there is a connection. I also hope you can tell me whether the watchdog is fed during the startup stage above, and where can I
    [Image]

@ESP-Marius
Collaborator

Now that my goal is to perform a loop action in a watchdog interrupt

I understand, but I would like to know why you want to do this, and what you want to do in this loop action. Since it is not a super-common use-case I suspect you might be trying to do something that would be better achieved in another way 😄

The final performance is still able to restart

If you stay in the WDT ISR for too long, it will eventually trigger the second-stage timeout, which forces a system reset. As shown in the picture above, that timeout is 2x the normal timeout, so 10 s by default.

@zzff-sys
Author

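Now that my goal is to perform a loop action in a watchdog interrupt

I understand, but I would like to know why you want to do this, and what you want to do in this loop action. Since it is not a super-common use-case I suspect you might be trying to do something that would be better achieved in another way 😄

The final performance is still able to restart

If you stay in the WDT ISR for too long it will eventually trigger the second stage timeout, which will force a system reset. As shown in the picture above that timeout time is 2x the normal timeout time, so 10s by default

1. My goal is this: when a timeout occurs, I want to send an error message over CAN. I do the sending inside the watchdog interrupt, in a continuous loop. The form of the code is shown below:
void esp_task_wdt_isr_user_handler(void) { while(1) { /* send message */ send_message(CAN_ERR_ID, Data, CAN_MSG_DATA_SIZE_BYTES); } }
2. So the second-stage timeout expired because I stayed in the interrupt for a long time?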

@ESP-Marius
Collaborator

  1. I see. In this case you will have to be careful with what happens in send_message: it should not have any blocking behavior, etc.
  2. Yes, the second stage timeout is there to ensure that even if something goes wrong in the WDT interrupt, the system will eventually be able to reset itself.

Regarding your use-case: do you want the system to reset itself after sending the message (1), or are you just using it to log an error (2)? If (2), then you could consider just setting a WDT-triggered flag, e.g. giving a semaphore from the callback (xSemaphoreGiveFromISR), and then handle the logging in a high-priority task instead.
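
A minimal sketch of the flag idea, written as a host-compilable model rather than real ESP-IDF code: the handler only records that the watchdog fired, and the reporting work happens later in task context. The names wdt_triggered, wdt_report_poll and error_reports_sent are hypothetical. On a real target the flag would more likely be a semaphore given with xSemaphoreGiveFromISR and taken by a high-priority task, and this only helps when CONFIG_ESP_TASK_WDT_PANIC is disabled so that tasks keep running after the interrupt.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set from the watchdog callback, consumed from task context. */
static atomic_bool wdt_triggered = false;

/* Hypothetical counter standing in for "send the CAN error frame". */
static int error_reports_sent = 0;

/* Overrides the ESP-IDF hook; it runs in ISR context,
 * so it only sets a flag and returns immediately. */
void esp_task_wdt_isr_user_handler(void)
{
    atomic_store(&wdt_triggered, true);
}

/* One iteration of a (hypothetical) high-priority reporting task.
 * Returns true if a pending watchdog event was handled. */
bool wdt_report_poll(void)
{
    /* atomic_exchange clears the flag and reports whether it was set. */
    if (!atomic_exchange(&wdt_triggered, false)) {
        return false;
    }
    error_reports_sent++;   /* real code would call send_message() here */
    return true;
}
```

The handler stays short enough that the second-stage timeout never comes into play, and everything that may block lives in the task.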

@zzff-sys
Author

  1. I see. In this case you will have to be careful with what is happening in send_message, it should not have any blocking behavior, etc.
  2. Yes, the second stage timeout is there to ensure that even if something goes wrong in the WDT interrupt, the system will eventually be able to reset itself.

Regarding your use-case: do you want the system to reset itself after sending the message (1), or are you just using it to log an error (2) ? If 2. then you could consider just setting a WDT triggered-flag, e.g. giving a semaphore from the callback (xSemaphoreGiveFromISR) and then handle the logging in a high priority task instead.

1. Thanks again for your reply.
2. My main idea was to send multiple message packets in real time on entering the watchdog interrupt, so I chose to loop inside the interrupt. However, only one packet is sent, and execution then seems to be stuck in the loop. I wonder whether this is related to how interrupts are implemented and used.
3. I tried the semaphore method you proposed in your second point, but the same thing happened.
4. One doubt: once I am in the watchdog interrupt, can the task that waits for the semaphore still run? That does not seem to fit how FreeRTOS works.

@ESP-Marius
Collaborator

ESP-Marius commented Apr 29, 2024

  1. This is possible, but it depends on the implementation of send_message: it should not do anything you cannot do in interrupts. E.g. it should not rely on FreeRTOS tasks, take locks, block, or need interrupts that cannot preempt the current interrupt. So this should mostly be done only with very simple bare-metal functions.
  2. No, in the WDT interrupt you should not wait to receive any FreeRTOS communication like that. But xSemaphoreGiveFromISR should be OK, I think (it is intended to be used from interrupts), as long as FreeRTOS continues running afterwards.
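
To make the first point concrete, here is a hedged sketch of a handler that sends a bounded number of frames and never waits. It is a host-compilable model: can_try_send is a hypothetical non-blocking stub (not an ESP-IDF API), and WDT_MAX_FRAMES, the 0x7FF identifier and the two-slot transmit queue are arbitrary assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WDT_MAX_FRAMES 4   /* arbitrary upper bound chosen for this sketch */

/* Hypothetical driver state: a real driver would write to a free
 * hardware transmit slot and fail immediately when none is free. */
static int frames_accepted = 0;
static int tx_slots_free = 2;

/* Non-blocking send: returns false at once instead of waiting. */
static bool can_try_send(uint32_t id, const uint8_t *data, size_t len)
{
    (void)id; (void)data; (void)len;
    if (tx_slots_free == 0) {
        return false;
    }
    tx_slots_free--;
    frames_accepted++;
    return true;
}

/* Watchdog hook: a bounded loop with no locks and no waiting,
 * so it returns well before the second-stage timeout. */
void esp_task_wdt_isr_user_handler(void)
{
    static const uint8_t payload[8] = {0};
    for (int i = 0; i < WDT_MAX_FRAMES; i++) {
        if (!can_try_send(0x7FF, payload, sizeof payload)) {
            break;  /* queue full: give up rather than spin */
        }
    }
}
```

Unlike a while(1) loop, this version always returns, so the watchdog ISR can finish its own work afterwards.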

@zzff-sys
Author

  1. This is possible, but it depends on the implemention of send_message, it should not do anything you cannot do in interrupts. E.g. it should not rely on freertos tasks, take locks, block, need interrupts that cannot preempt the current interrupt etc. So this should mostly only be done with very simple bare-metal functions.
  2. No, in the WDT interrupt you should not wait to receive any freertos communication like that. But xSemaphoreGiveFromISR should be OK I think (it is intended to be used from interrupts), as long as freerots continues running afterwards

  1. I understand your first point.
  2. On the second point, I think the semaphore can be given from the interrupt through xSemaphoreGiveFromISR. I take the semaphore in a separate receive task, but once a watchdog timeout enters the interrupt, can my receive task continue to run?
  3. My pseudo-code is shown below. Is my design strategy correct?
[Two images: pseudo-code]

@ESP-Marius
Collaborator

but if there is a watchdog timeout entry interrupt, can my receive task continue to run

If you use CONFIG_ESP_TASK_WDT_PANIC then no, the chip will immediately restart. If you do not use it, then after the WDT interrupt FreeRTOS will continue running tasks. And if this error-log task has a higher priority than whatever task is starving the watchdog, it should get an opportunity to run.

Your example looks correct to me.

@nopnop2002

nopnop2002 commented Apr 29, 2024

@ESP-Marius

If you do not use it then after the WDT interrupt freertos will continue running tasks.

You are right.

I didn't know this.

I (110356) MAIN: counter=496 --> from main task
E (115316) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (115316) task_wdt:  - IDLE0 (CPU 0)
E (115316) task_wdt: Tasks currently running:
E (115316) task_wdt: CPU 0: main
E (115316) task_wdt: CPU 1: IDLE1
called esp_task_wdt_isr_user_handler ---> from esp_task_wdt_isr_user_handler
E (115316) task_wdt: Print CPU 0 (current core) backtrace


Backtrace: 0x400D6B43:0x3FFB0E00 0x400D6F08:0x3FFB0E20 0x40082F15:0x3FFB0E50 0x4000C04D:0x3FFB4D70 0x400D527A:0x3FFB4D90 0x400D52A6:0x3FFB4DB0 0x400E2C3C:0x3FFB4DD0 0x40085FD5:0x3FFB4E00
0x400d6b43: task_wdt_timeout_handling at /home/nop/esp-idf/components/esp_system/task_wdt/task_wdt.c:441
0x400d6f08: task_wdt_isr at /home/nop/esp-idf/components/esp_system/task_wdt/task_wdt.c:515
0x40082f15: _xt_lowint1 at /home/nop/esp-idf/components/xtensa/xtensa_vectors.S:1240
0x4000c04d: _xtos_return_from_exc in ROM
0x400d527a: WaitForTimeout at /home/nop/rtos/watchdog-task/main/main.c:22
0x400d52a6: app_main at /home/nop/rtos/watchdog-task/main/main.c:32 (discriminator 1)
0x400e2c3c: main_task at /home/nop/esp-idf/components/freertos/app_startup.c:208
0x40085fd5: vPortTaskWrapper at /home/nop/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134


I (115356) MAIN: counter=496 ---> continue running tasks

@zzff-sys
Author

@ESP-Marius
Why does it seem that only one piece of logic is ever executed in void esp_task_wdt_isr_user_handler(void)? Is there a task in the watchdog design that interrupts it?

@ESP-Marius
Collaborator

Why does it seem that only one piece of logic is ever executed in void esp_task_wdt_isr_user_handler(void)? Is there a task in the watchdog design that interrupts it?

Not sure I understand what you mean here. I don't think anything should interrupt it (before the second stage timeout triggers, if you are taking too long).
