Watchdogs are a critical component of a robust, fail-safe embedded system. I often run into development teams that either have not fully thought through their watchdog strategy or have disabled their watchdog altogether. To improve system robustness and ensure that the watchdog can detect a system fault, here are ten simple tips developers can follow to improve their watchdog design.
- At start-up, determine why the system is starting up. Was the start-up due to a watchdog timer reset? Brown-out detection? An exception? User interaction, or perhaps some other unknown cause? Logging this information can be crucial to debugging a system that misbehaves sporadically and unpredictably.
- When selecting a microcontroller, select one that includes an independent watchdog. An independent watchdog runs from a clock that is generated independently of the system clock, which improves the chances of detecting a failure if the system clock glitches or locks up.
- Enable the watchdog timer early in the initialization sequence. The longer the watchdog is disabled, the greater the chance that it will be unable to detect something going wrong.
- Don’t blindly clear or pet the watchdog in an interrupt service routine! Create a watchdog task that monitors other software tasks and can determine whether the system health and wellness is acceptable before clearing the timer.
- For systems that are connected to the internet or that need to operate on their own without human intervention, consider adding an external watchdog that can periodically reset the microcontroller to clear any faults or errors that may have occurred.
- Use an external smart watchdog or supervisory processor to monitor that the microcontroller is behaving as expected.
- When using a smart watchdog, provide enough smarts so that it can monitor an external communication channel to receive a restart command and send a basic acknowledgement.
- Use a windowed watchdog whenever possible. A windowed watchdog must be cleared inside a defined time window. If the system fails in a way that continually clears the watchdog, the clear arrives too early and triggers a reset. If the system fails by not clearing the watchdog before the window closes, the watchdog resets the system as well.
- Set a specific period on external watchdogs and set up a heartbeat from the microcontroller that is generated only if the health and wellness of the processor is acceptable.
- Don’t forget that smart watchdogs are also microcontroller-based systems and may need their own watchdog strategy, such as enabling internal watchdog timers and adding external dumb watchdog timers, to ensure they are operating correctly.
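
To make the watchdog task tip more concrete, here is a minimal sketch in C of one way a watchdog task could decide whether to clear the timer. It is an illustration under stated assumptions rather than a definitive implementation: the task IDs and the names `wdg_checkin`, `wdg_service`, and `hw_watchdog_kick` are hypothetical, and the hardware refresh is replaced by a simple counter so the logic can be exercised on a desktop. On a real target, `hw_watchdog_kick` would wrap the vendor-specific refresh, and access to the shared flags would need to be atomic or otherwise protected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task IDs; a real system would list its own tasks here. */
enum { TASK_SENSOR, TASK_COMMS, TASK_CONTROL, TASK_COUNT };

static uint32_t checkin_flags; /* bit n set => task n reported healthy */
static uint32_t kick_count;    /* stands in for the hardware refresh */

/* Placeholder for the vendor-specific watchdog refresh (assumption). */
static void hw_watchdog_kick(void)
{
    kick_count++;
}

/* Each monitored task calls this once per cycle, after doing useful work. */
void wdg_checkin(unsigned task_id)
{
    assert(task_id < TASK_COUNT);
    checkin_flags |= (1u << task_id);
}

/*
 * The watchdog task calls this once per monitoring period.
 * The timer is cleared only if every task has checked in since the
 * previous call. Returns true if the watchdog was cleared.
 */
bool wdg_service(void)
{
    const uint32_t all_tasks = (1u << TASK_COUNT) - 1u;
    bool healthy = (checkin_flags == all_tasks);

    if (healthy) {
        hw_watchdog_kick();
    }

    /* Start a fresh period: every task has to check in again. */
    checkin_flags = 0;
    return healthy;
}
```

In this sketch, a single stalled task is enough to withhold the clear, so the hardware watchdog eventually resets the system. The watchdog timeout would typically be chosen longer than the monitoring period, and the same check could gate the heartbeat sent to an external watchdog.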
These are just a few simple steps developers can follow to ensure that their watchdog performs correctly. Can you think of any additional tips that developers should be following? If so, I would love to hear from you.