7 Steps to Create a Stack Monitor

One of the most painstaking bugs to hunt down in an embedded system is a stack that overflows its boundaries and starts to overwrite nearby memory regions. The symptoms usually appear at random, only when a perfect storm of interrupts and function calls occurs, which makes them difficult to detect. To guard against stack overflow with a stack monitor, there are seven steps that developers can take to detect when the stack leaves its allocated memory region.

Step #1 – Perform a worst case stack analysis

Many compilers and toolchains default the stack size to 0x400 bytes, which is one kilobyte of RAM.  A kilobyte is sufficient for many applications, but this is computer science and not a guessing game, so how can an engineer be sure that the stack is properly sized?  The answer is to perform a worst-case stack analysis.

A worst-case stack analysis can be performed in many different ways, and a full treatment is beyond the scope of this article.  In general, there are three things a developer needs to understand.  First is the call depth of the application: how many functions call functions that call other functions before returning back up the chain.  Each of the return addresses is stored on the stack.  Second is the number and size of the variables within those functions, which gives an estimate of how much stack space each function will use.  Finally, the developer needs to determine how many interrupts could be active at the same time, along with the size of each interrupt frame.
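As a rough illustration of how those three inputs combine, the sketch below sums the frame sizes along the deepest call chain and then adds one interrupt frame for each interrupt that could be active at that moment. The type frame_info_t, the function worst_case_stack_bytes(), and all of the numbers are hypothetical; real figures would come from a map file, a compiler stack-usage report, or measurement on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one function on the deepest call chain. */
typedef struct {
    const char *name;        /* used for reporting only */
    uint32_t    frame_bytes; /* locals + saved registers + return address */
} frame_info_t;

/* Sum every frame on the deepest call chain, then add the worst-case
 * interrupt load. Assumes irq_frame_bytes already includes whatever the
 * interrupt handler itself places on the stack. */
uint32_t worst_case_stack_bytes(const frame_info_t *chain, size_t depth,
                                uint32_t nested_irqs, uint32_t irq_frame_bytes)
{
    uint32_t total = 0;

    for (size_t i = 0; i < depth; i++) {
        total += chain[i].frame_bytes;
    }
    return total + (nested_irqs * irq_frame_bytes);
}
```

For example, a chain three calls deep with frames of 32, 64 and 120 bytes, plus two nested interrupts of 32 bytes each, comes to 280 bytes.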

Step #2 – Set the stack size

The output of the worst-case stack analysis is the size that the stack should be.  Calculating stack size is difficult, so even after a careful analysis of the system it doesn’t hurt to multiply the final number by 1.5 to make sure there is a reasonable buffer for unforeseen circumstances.  The stack size can then be changed either through the project properties or through the linker file, depending on preference and tool capabilities.
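The 1.5 multiplier can be applied with integer math, which is handy if the size is computed in a build script or a header. This is only a sketch: stack_size_with_margin() is a hypothetical helper, and the 8-byte rounding is an assumption chosen to match the ALIGN(8) used in Figure 1.

```c
#include <stdint.h>

/* Assumed alignment, matching the ALIGN(8) in the linker example. */
#define STACK_ALIGN_BYTES 8u

/* Apply the 1.5x safety margin, then round up to the alignment. */
uint32_t stack_size_with_margin(uint32_t worst_case_bytes)
{
    /* x * 1.5 as x + x/2, rounding the half upwards. */
    uint32_t padded = worst_case_bytes + (worst_case_bytes + 1u) / 2u;

    /* Round up to the next multiple of STACK_ALIGN_BYTES (a power of two). */
    return (padded + STACK_ALIGN_BYTES - 1u) & ~(STACK_ALIGN_BYTES - 1u);
}
```

With this helper, a worst-case estimate of 280 bytes becomes a 424-byte stack, and the common 0x400-byte default becomes 0x600 bytes.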

Step #3 – Select a protection method

Properly sizing the stack is good progress towards preventing the stack from overflowing and clobbering nearby memory regions, but it still doesn’t allow an overflow to be detected.  In an embedded system there are a number of ways to detect such an event.  The first is to use a memory protection unit (MPU) and set the boundary of the stack so that if the stack crosses it, an exception is raised and the system can log the issue and follow its recovery procedure.  Second, if an RTOS is in use, a developer can enable stack overflow detection.  Many RTOSs have this detection enabled by default, but I have seen articles recommending turning this feature off to improve performance!  Disabling stack overflow detection is NOT recommended, or else you may feel the cold embrace of a stack overflow bug.  Finally, in a resource-constrained system where neither an MPU nor an RTOS is available, a developer can very easily create their own stack monitor.

Step #4 – Add guard section to the linker

A developer can create a stack guard section in a number of different ways, but one useful way to specify the guard size and location is through the linker file.  The size is arbitrary.  A rule of thumb is to make it large enough that an overflow would not run straight past the guard area before it is detected.  An example of what the guard section might look like can be seen in Figure 1.

GUARD_SIZE = DEFINED(__guard_size__) ? __guard_size__ : 0x00000100;

.guard :
{
  . = ALIGN(8);
  FILL(0xC0DE);
  . += GUARD_SIZE - 1;
  BYTE(0xE)
} > m_data

Figure 1 – Example of what the linker may look like

Step #5 – Populate guard space with pattern

Creating a guard section is great, but it isn’t terribly useful unless it is populated with a known pattern that the application code can check later.  Any pattern can be placed in the guard area, but I’ve found it useful to use one that is human readable.  The pattern 0xC0DE is one of my favorites.  Figure 1 shows an example of how the guard area might be populated.  The exact implementation will vary based on the toolchain that is used.
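Where the toolchain cannot pre-fill the section, the same pattern can be written by startup code instead. This is a different technique from the linker FILL shown in Figure 1, offered only as a sketch: guard_fill() is a hypothetical helper, and the 32-bit pattern 0xC0DEC0DE (0xC0DE repeated) is an assumption made so that every word read back is recognizable. On a target, the start address and size would come from linker symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit fill pattern: 0xC0DE repeated twice. */
#define GUARD_PATTERN 0xC0DEC0DEu

/* Write the pattern into every 32-bit word of the guard region.
 * size_bytes is expected to be a multiple of four. */
void guard_fill(volatile uint32_t *guard, size_t size_bytes)
{
    size_t words = size_bytes / sizeof(uint32_t);

    for (size_t i = 0; i < words; i++) {
        guard[i] = GUARD_PATTERN;
    }
}
```

Calling this once early in startup, before interrupts are enabled, means the check code and the fill code share a single pattern definition rather than relying on how the linker encodes its fill value.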

Step #6 – Periodically check the pattern

The application code should be set up to periodically check that the entire guard section still contains the correct pattern.  A change in the pattern most likely means the stack has overflowed.  Application code for this check is relatively simple.  A developer just needs to loop through each word in the guard and verify that it still holds the pattern.  Figure 2 shows an example loop using a pointer that is checking the stack guard fill pattern.  If a change is detected, the application can branch off, try to log the system stack and begin recovery procedures.

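#include <stddef.h>
#include <stdint.h>

/* GUARD_START and GUARD_SIZE (in bytes) are assumed to be provided by the
 * linker file or a project header. GUARD_PATTERN must match the bytes the
 * linker really writes; confirm the width and byte order in a memory view. */
#define GUARD_PATTERN 0xC0DEu

/* Hypothetical name; call this periodically, e.g. from the main loop. */
void StackGuard_Check(void)
{
    const volatile uint32_t * GuardPtr = (const volatile uint32_t *) GUARD_START;

    /* GUARD_SIZE is in bytes, so convert it to a count of 32-bit words. */
    for(size_t Index = 0; Index < (GUARD_SIZE / sizeof(uint32_t)); Index++)
    {
        if(GuardPtr[Index] == GUARD_PATTERN)
        {
           // Do Nothing or signal OK
        }
        else
        {
           // Flag error! Attempt recovery ...
        }
    }
}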

Figure 2 – Example of what the guard application may look like

Step #7 – Test the guard

The final step to creating a stack monitor is of course to test it!  One of the best ways to test it is to write a small piece of code that deliberately modifies the stack guard pattern.  The periodic check of the stack guard should detect that the pattern has changed, which is the indication that the stack has overflowed.
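A test along those lines can be tried on a host machine before it goes anywhere near the target. In the sketch below a plain array stands in for the guard region the linker would reserve, and every name (guard_region, guard_init, guard_is_intact, guard_corrupt_word) as well as the 32-bit pattern is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_PATTERN 0xC0DEC0DEu   /* hypothetical 32-bit fill pattern */
#define GUARD_WORDS   64u           /* 0x100 bytes, the size used in Figure 1 */

/* Host-side stand-in for the guard region the linker would reserve. */
static volatile uint32_t guard_region[GUARD_WORDS];

/* Populate the whole guard with the pattern. */
void guard_init(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        guard_region[i] = GUARD_PATTERN;
    }
}

/* Returns true only if every guard word still holds the pattern. */
bool guard_is_intact(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (guard_region[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}

/* Test hook: simulate an overflow by overwriting a single guard word. */
void guard_corrupt_word(size_t index, uint32_t value)
{
    if (index < GUARD_WORDS) {
        guard_region[index] = value;
    }
}
```

The test sequence is then: initialize the guard and confirm it reads as intact, corrupt one word, and confirm the check now fails. Corrupting the first and last words in separate runs is a cheap way to catch an off-by-one in the loop bounds.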

A tested stack monitor goes a long way towards improving the reliability and robustness of the system.  Once the stack monitor is able to detect the overflow, additional application code is necessary to decide what to do with that information.  Logging the call depth, register values and application state will help a developer reproduce the overflow and discover the root cause.

Final thoughts

The stack is often overlooked by developers when they start software development.  Stack overflow is one of those difficult-to-find bugs unless developers make the effort to monitor for it.  Detecting a stack overflow isn’t difficult, and the minor performance hit of a monitor is well worth it!
