Recovering from a system failure or a software glitch is no easy task. The longer a fault persists, the harder it can be to identify and recover from. An external watchdog is a critical tool in the embedded systems engineer’s toolbox. Here are five tips to take into account when designing a watchdog system.
Tip #1 – Monitor a heartbeat
The simplest function that an external watchdog can have is to monitor a heartbeat that is produced by the primary application processor. Monitoring of the heartbeat should serve two distinct purposes. First, the microcontroller should only generate the heartbeat after functional checks have been performed on the software to ensure that it is functioning. Second, the heartbeat should be able to reveal if the real-time response of the system has been jeopardized.
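As a rough illustration of the first purpose, the application could require every critical task to "check in" before the heartbeat is allowed to toggle. This is only a sketch under assumptions: the names `task_checkin`, `heartbeat_allowed`, and `TASK_COUNT` are hypothetical, and a real system would choose its own functional checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical number of critical tasks that must report in. */
#define TASK_COUNT     3u
#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

/* One bit per task; set when the task has completed its work this cycle. */
static uint32_t checkin_flags;

/* Called by each task after it has finished a healthy iteration. */
void task_checkin(unsigned task_id)
{
    if (task_id < TASK_COUNT) {
        checkin_flags |= (1u << task_id);
    }
}

/* Called once per heartbeat period. Returns true only if every task
 * checked in since the last call, then clears the flags so that each
 * task has to prove itself again in the next period. */
bool heartbeat_allowed(void)
{
    bool ok = (checkin_flags & ALL_TASKS_MASK) == ALL_TASKS_MASK;
    checkin_flags = 0u;
    return ok;
}
```

The application would toggle the heartbeat pin only when `heartbeat_allowed()` returns true, so a single stalled task is enough to stop the heartbeat.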
Monitoring the heartbeat for software functionality and real-time response can be done using a simple, “dumb” external watchdog. The external watchdog should allow a heartbeat period to be set, along with a window within which the heartbeat must appear. The purpose of the heartbeat window is to allow the watchdog to detect that the real-time response of the system is compromised. If either the functional or the real-time checks fail, the watchdog attempts to recover the system by resetting the application processor.
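The window check itself is small. The sketch below shows one way it might look on the watchdog side, assuming a free-running millisecond counter is available; the type and function names (`hb_monitor_t`, `hb_on_beat`, `hb_timed_out`) are illustrative and not taken from any particular part or library.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { HB_OK, HB_TOO_EARLY, HB_TOO_LATE } hb_status_t;

typedef struct {
    uint32_t min_interval_ms; /* a beat sooner than this is too early */
    uint32_t max_interval_ms; /* a beat later than this is too late   */
    uint32_t last_beat_ms;    /* timestamp of the most recent beat    */
} hb_monitor_t;

void hb_init(hb_monitor_t *m, uint32_t min_ms, uint32_t max_ms,
             uint32_t now_ms)
{
    m->min_interval_ms = min_ms;
    m->max_interval_ms = max_ms;
    m->last_beat_ms = now_ms;
}

/* Call on every heartbeat edge. Unsigned subtraction keeps the
 * elapsed time correct across a wrap of the 32-bit tick counter. */
hb_status_t hb_on_beat(hb_monitor_t *m, uint32_t now_ms)
{
    uint32_t elapsed = now_ms - m->last_beat_ms;
    m->last_beat_ms = now_ms;

    if (elapsed < m->min_interval_ms) {
        return HB_TOO_EARLY;
    }
    if (elapsed > m->max_interval_ms) {
        return HB_TOO_LATE;
    }
    return HB_OK;
}

/* Call periodically to catch a heartbeat that has stopped entirely. */
bool hb_timed_out(const hb_monitor_t *m, uint32_t now_ms)
{
    return (uint32_t)(now_ms - m->last_beat_ms) > m->max_interval_ms;
}
```

With a 100 ms heartbeat and a window of 90 to 110 ms, for example, a beat arriving after 50 ms would be flagged as too early and a beat arriving after 150 ms as too late. Either result, or a timeout, would trigger the reset of the application processor.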
Tip #2 – Use a low capability MCU
External watchdogs that can only monitor a heartbeat are relatively low cost, but they can severely limit the capabilities and recovery possibilities of the watchdog system. A low capability microcontroller can cost nearly the same amount as an external watchdog timer, so why not add some intelligence to the watchdog and use a microcontroller? The microcontroller firmware can be developed to perform the windowed heartbeat monitoring and much more besides. A “smart” watchdog like this is sometimes referred to as a supervisor or safety watchdog and has been used for many years in industries such as automotive. Generally a microcontroller watchdog has been reserved for safety critical applications, but given today’s development tools and the cost of hardware it can be cost effective in other applications as well.
Tip #3 – Supervise critical system functions
The decision to use a small microcontroller as a watchdog opens nearly endless possibilities of how the watchdog can be used. One of the first roles of a smart watchdog is usually to supervise critical system functions such as a system current or sensor state. One example of how a watchdog could supervise a current would be to take an independent measurement and then provide that value to the application processor. The application processor could then compare its own reading to that of the watchdog. If there were disagreement between the two then the system would execute a fault tree that was deemed to be appropriate for the application.
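The comparison step could be as simple as the sketch below. It assumes both readings are in milliamps and that a fixed tolerance is acceptable; the function name `readings_agree` and the units are assumptions made for illustration, and the fault response is left to the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the application's own current reading and the
 * watchdog's independent reading differ by no more than tolerance_ma.
 * The difference is computed in 64 bits so that extreme values
 * cannot overflow. */
bool readings_agree(int32_t app_ma, int32_t watchdog_ma,
                    int32_t tolerance_ma)
{
    int64_t diff = (int64_t)app_ma - (int64_t)watchdog_ma;
    if (diff < 0) {
        diff = -diff;
    }
    return diff <= (int64_t)tolerance_ma;
}
```

If `readings_agree()` returns false, the application would enter whatever fault handling is appropriate. A practical design might also require several consecutive disagreements before reacting, so that a single noisy sample does not trigger a fault.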
Tip #4 – Observe a communication channel
Sometimes an embedded system can appear to the watchdog and the application processor to be operating as expected, yet to an external observer be in a non-responsive state. In such cases it can be useful to tie the smart watchdog to a communication channel such as a UART. When the watchdog is connected to a communication channel, it can not only monitor channel traffic but also receive commands that are specific to the watchdog. A great example of this is a watchdog designed for a small satellite that monitors radio communications between the flight computer and ground station. If the flight computer becomes non-responsive to the radio, a command can be sent to the watchdog, which executes it and resets the flight computer.
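One way the watchdog might pick its own command out of the traffic it observes is a small byte-by-byte matcher, sketched below. The command string `"WDRST\n"` and the names `cmd_parser_t` and `cmd_parser_feed` are invented for this example; a real link would more likely use a framed, checksummed or authenticated command.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical watchdog-specific reset command. */
static const uint8_t RESET_CMD[] = { 'W', 'D', 'R', 'S', 'T', '\n' };

typedef struct {
    size_t matched; /* number of command bytes matched so far */
} cmd_parser_t;

/* Feed one received byte. Returns true when the full reset command
 * has just been seen. The simple restart rule below is sufficient
 * because the first byte 'W' does not occur again inside the command. */
bool cmd_parser_feed(cmd_parser_t *p, uint8_t byte)
{
    if (byte == RESET_CMD[p->matched]) {
        p->matched++;
        if (p->matched == sizeof RESET_CMD) {
            p->matched = 0;
            return true;
        }
    } else {
        /* Mismatch: start over, allowing this byte to begin a new match. */
        p->matched = (byte == RESET_CMD[0]) ? 1u : 0u;
    }
    return false;
}
```

The watchdog's UART receive handler would pass each byte to `cmd_parser_feed()` and assert the flight computer's reset line when it returns true. All other traffic passes by without effect.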
Tip #5 – Consider an externally timed reset function
The question of who is watching the watchdog is undoubtedly on the minds of many engineers when using a microcontroller for a watchdog. Using a microcontroller to implement extra features adds some complexity and a new software element to the system. If the watchdog itself goes off into the weeds, how is it going to recover? One option would be to use the simple external watchdog timer discussed earlier. The smart watchdog would generate a heartbeat to keep itself from being reset by the watchdog timer. Another option would be to have the application processor act as the watchdog for the watchdog. Careful thought needs to be given to the best way to ensure both processors remain functioning as intended.
The purpose of the smart watchdog is to monitor the system and the primary microcontroller to ensure that they operate as expected. During the design of a system watchdog it can be very tempting to allow the number of features supported to creep. Developers need to keep in mind that as the complexity of the smart watchdog increases so does the probability that the watchdog itself will contain potential failure modes and bugs. Keeping the watchdog simple and to the minimum necessary feature set will ensure that it can be exhaustively tested and proven to work.
Most modern microcontrollers have a built-in hardware watchdog which can be configured to reset the microcontroller if not serviced within a specified amount of time. This essentially automates Tip #1, and can be very effective for any applications where there’s an appropriate place to do the servicing. (Such as a “main loop”.)
Thanks for the comment.
I probably wasn’t clear, but when I refer to a smart watchdog I mean an external system that helps to recover the main system. In some systems you can’t necessarily rely on the internal watchdog timer that is located inside the microcontroller. So the tip is to have the microcontroller generate a heartbeat that can be monitored by the external “Smart Watchdog”, which verifies that the system timing appears to be correct and not glitching.
Could you please detail in which cases the internal watchdog isn’t good enough?
Any case where the system may be in a high radiation environment, where the onboard clocks could stop.