Setting a watchdog strategy is easy. Just enable the microcontroller's internal watchdog timer and set up an interrupt to occasionally clear the timer and keep the dog happy, right? Not exactly. Watchdogs help ensure that the embedded system we are creating is robust and can detect when something goes amiss. The chances of something going wrong on the bench with a single unit are small, but once production starts and thousands, if not millions, of devices are deployed into the field, the chances that a product-shattering hiccup will occur increase dramatically. Quite a few decisions go into selecting a watchdog strategy, but the overall strategy can be determined by figuring out where the system's needs lie in the figure below:
In the figure above, the x-axis represents the system's ability to detect errors on its own, with no error detection on the far left and high ability on the far right. The y-axis represents the expectation that the system will recover from an error on its own. The further up the axis one goes, the higher that expectation. The lower on the axis, the higher the expectation that a human will intervene if something goes wrong.
Based on the general system needs, the chart can be broken up into four primary areas: autonomous, interactive, monitored and oblivious. Developers need to identify which region best represents their system in order to determine their watchdog strategy. Below are the definitions for each region.
Autonomous – these are systems that are expected to operate on their own with no human interaction. These systems need to be able to reliably detect when an error occurs and recover from it on their own.
Interactive – these are systems that are expected to detect errors but are not necessarily expected to recover on their own. In many circumstances an error may require human inspection, so the error must be detectable, and a human will then interact with the system to resolve the issue.
Monitored – these are systems that will be watched constantly by a human while they are operating. In these systems, the human is the error detection system.
Oblivious – these are systems that are expected to recover on their own since a human being is not nearby but, because of their design, are unable to detect whether an error has even occurred. No system should ever be in this category if a proper watchdog strategy is implemented. However, many teams that are in a hurry or don’t think through their design can accidentally end up in this category.
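The four regions can also be read as a simple two-question lookup. The sketch below is only an illustration of the definitions above, not a definitive implementation: it assumes each axis can be reduced to a yes/no answer, whereas the figure treats both as a continuum. The names classify_region, watchdog_region_t and the REGION_ constants are hypothetical and chosen purely for this example.

```c
#include <stdbool.h>

/* Hypothetical names for the four regions described above. */
typedef enum {
    REGION_AUTONOMOUS,   /* detects errors, expected to self-recover      */
    REGION_INTERACTIVE,  /* detects errors, a human resolves them         */
    REGION_MONITORED,    /* a human both detects and resolves errors      */
    REGION_OBLIVIOUS     /* expected to self-recover but cannot detect    */
} watchdog_region_t;

/*
 * Assumption: each axis of the figure is collapsed to a boolean.
 *   detects_errors    - x-axis: can the system detect errors on its own?
 *   must_self_recover - y-axis: is the system expected to recover
 *                       without a human stepping in?
 */
watchdog_region_t classify_region(bool detects_errors, bool must_self_recover)
{
    if (detects_errors) {
        return must_self_recover ? REGION_AUTONOMOUS : REGION_INTERACTIVE;
    }
    return must_self_recover ? REGION_OBLIVIOUS : REGION_MONITORED;
}

/* Returns a printable name for a region, for logs or design reviews. */
const char *region_name(watchdog_region_t region)
{
    switch (region) {
    case REGION_AUTONOMOUS:  return "autonomous";
    case REGION_INTERACTIVE: return "interactive";
    case REGION_MONITORED:   return "monitored";
    case REGION_OBLIVIOUS:   return "oblivious";
    default:                 return "unknown";
    }
}
```

For example, classify_region(false, true) returns REGION_OBLIVIOUS, which flags a design that is expected to recover unattended but has no way to notice that anything went wrong.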
Once the desired watchdog strategy is determined, a developer can use a simple Venn diagram to determine their options for watchdog implementation. The diagram below provides a basic idea for developers, but the full details are beyond this basic discussion. Keep in mind that each strategy should follow watchdog best practices, which can become quite involved.
Using the two charts above, developers can get a feel for what they should be considering when developing their system's watchdog strategy. For example, an autonomous device will require a combination of all three strategies. Each strategy will itself go into much more detail and could consist of many layers in order to properly handle the possible error modes. Those details are for another time, but for now, just identifying the right watchdog strategy is a step in the right direction.