If you have spent time developing a product or a DIY project in the embedded systems industry, you know that a lot of time is spent debugging. It’s not uncommon for developers to spend 50% or more of their time debugging their application code! If you are a full-time developer, that’s roughly a thousand hours a year! If you ask me, that sounds like hell. The good news is that there are a lot of strategies you can follow to help reduce your debug times. Honestly, early in my career, I spent 80% of my time debugging. Today, it’s closer to 10%. Let’s explore five strategies you can employ to minimize the amount of time you spend debugging.
Strategy #1 – Don’t put the bugs in your code from the start
If you don’t put bugs in your software, you won’t have to spend the time to remove them. Jack Ganssle often makes this point whenever the conversation turns to debugging techniques, and he’s right. The number one way to reduce debugging time is to avoid putting bugs into the code in the first place!
Keeping bugs out of code is, of course, challenging. You can’t keep them all out, but you can minimize how many get into the code. Minimizing bugs starts with having reasonable, traceable requirements. If developers know what they are building and how it should behave, it’s easier to create those features.
Knowing your requirements and keeping them locked can help, but that’s no guarantee. Developers also need well-defined processes that are designed to prevent bugs. If a bug does get in, we want to catch it as soon as possible. If I just added a line of code and I immediately detect that I broke the system, I know exactly which line of code to look at.
Perhaps this first strategy can be more clearly stated: “Develop the processes and discipline to avoid injecting bugs into code”.
Strategy #2 – Use Test-Driven Development
Test-Driven Development is an exciting technique that came out of the Agile movement. The core idea is that before we write any production code, we write a test, make it fail, and then write the production code to make it pass. The process then repeats itself indefinitely.
A test that we have watched fail, and that we therefore know can detect a failure, is a potent debugging tool. Every time we add a new test, we can rerun all the old ones. If our new code broke something, a test fails and points directly to what we broke. At that point, we know both what broke and which code we just wrote that broke it. It’s easy to see that debugging time in these cases should decrease dramatically.
Test-Driven Development can be applied to embedded systems. I’ve used it the most to test application code that I have decoupled from the low-level hardware. Thanks to the decoupling, I can easily inject data, probe results, and verify that the code is working as expected. Test-Driven Development can also be used on-target for low-level driver and middleware development. I’ve found that this goes much slower due to the erase/program cycles. However, it can be a great way to build out drivers and prove that they work under various conditions. The ultimate result is that we spend less time debugging.
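To make the decoupling idea concrete, here is a minimal sketch of a host-side test, written in Python for brevity. Everything in it is hypothetical: `FakeAdc` stands in for a hardware driver, and `OverTempMonitor` is an invented piece of application logic that trips after a number of consecutive high readings. In a test-first workflow, the test class would be written before `OverTempMonitor.poll()` exists, watched to fail, and then made to pass.

```python
import unittest


class FakeAdc:
    """Hypothetical stand-in for a hardware ADC driver; returns scripted raw counts."""

    def __init__(self, samples):
        self._samples = list(samples)

    def read(self):
        # Hand back the next scripted sample, oldest first.
        return self._samples.pop(0)


class OverTempMonitor:
    """Invented application logic: trips after N consecutive readings above a limit."""

    def __init__(self, adc, limit_counts, trip_after):
        self._adc = adc  # injected dependency, real driver or fake
        self._limit = limit_counts
        self._trip_after = trip_after
        self._consecutive = 0
        self.tripped = False

    def poll(self):
        if self._adc.read() > self._limit:
            self._consecutive += 1
        else:
            self._consecutive = 0
        if self._consecutive >= self._trip_after:
            self.tripped = True  # latched once set
        return self.tripped


class OverTempMonitorTest(unittest.TestCase):
    def test_trips_only_after_consecutive_high_readings(self):
        adc = FakeAdc([900, 100, 900, 900, 900])
        monitor = OverTempMonitor(adc, limit_counts=800, trip_after=3)
        results = [monitor.poll() for _ in range(5)]
        self.assertEqual(results, [False, False, False, False, True])

    def test_stays_clear_on_normal_readings(self):
        adc = FakeAdc([100, 200, 300])
        monitor = OverTempMonitor(adc, limit_counts=800, trip_after=3)
        for _ in range(3):
            monitor.poll()
        self.assertFalse(monitor.tripped)
```

Saved as a module, the tests can be run on the host with `python -m unittest`. Because the monitor only ever sees an object with a `read()` method, the same logic can be exercised with scripted data and no target hardware attached.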
Strategy #3 – Use an Emulator or Simulator
Emulators and simulators can be extraordinarily useful in reducing debug time. Debugging time is often increased due to the erase/program cycles involved in programming the embedded target. An emulator or simulator can be executed on a host environment, removing the need to program the embedded target. The result is that developers can quickly iterate, test, and debug their code.
A great example of how to use a simulator to debug an embedded system was given by Dave Nadler at the 2021 Embedded Online Conference. In his talk, “How to Get the Bugs Out of your Embedded Product”, Dave demonstrated how he used wxWidgets to build a simulator around his customer’s aerospace application code. As a result, he could run FreeRTOS on Windows and quickly find the issues the customer was experiencing.
Simulations don’t have to run your production code. For example, in my development efforts, I’ve often created a Python-based application that simulates the behavior of the embedded controller. The Python application can be written quickly to prove out the requirements and how the system should behave. With customer buy-in, I then move to the embedded application and use the simulator as the reference for how the system should respond to its inputs and what outputs it should produce. A simulator also gives the customer something to work and interface with early on, which is especially valuable in a more complex system.
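As a rough illustration of what such a Python-based behavioral simulator can look like, here is a small sketch. The heater controller, its setpoint, hysteresis band, and fault limit are all invented for this example and are not taken from any real project; the point is only that a few dozen lines can pin down how the controller should respond to its inputs.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IDLE = "idle"
    HEATING = "heating"
    FAULT = "fault"


@dataclass
class HeaterControllerSim:
    """Hypothetical behavioral model of a heater controller with hysteresis."""

    setpoint_c: float = 50.0
    hysteresis_c: float = 2.0
    max_safe_c: float = 90.0
    state: State = State.IDLE

    def step(self, temperature_c: float) -> bool:
        """Advance one control cycle; return True when the heater output is on."""
        if self.state is State.FAULT:
            return False  # fault is latched; heater stays off
        if temperature_c >= self.max_safe_c:
            self.state = State.FAULT
        elif temperature_c < self.setpoint_c - self.hysteresis_c:
            self.state = State.HEATING
        elif temperature_c > self.setpoint_c + self.hysteresis_c:
            self.state = State.IDLE
        # Inside the hysteresis band the previous state is kept.
        return self.state is State.HEATING


sim = HeaterControllerSim()
outputs = [sim.step(t) for t in (20.0, 49.0, 53.0, 49.0, 95.0, 20.0)]
print(outputs)  # [True, True, False, False, False, False]
```

A sequence of inputs and expected outputs like the one above can be reviewed with the customer and later replayed against the embedded implementation to check that it behaves the same way.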
Strategy #4 – Trace your application code
I can’t emphasize enough how helpful tracing technology is for understanding an embedded system. A trace tool, like Percepio’s Tracealyzer, can help you understand your code’s timing, CPU utilization, states, and much more. For example, when a system starts behaving weirdly, developers often jump in and randomly poke around to see where the problem might be. With a trace tool, developers can visualize what’s executing over time and see whether they are hitting a problem such as a priority inversion, task starvation, or some other issue.
There are a lot of trace technologies that developers can leverage, such as the Serial Wire Viewer (SWV) on Arm® Cortex®-M processors. Over these trace interfaces, developers can stream debug information out of the running target without halting it, which minimizes interference with real-time behavior and helps get to the root of a problem much quicker.
When I develop code, I enable tracing early in my development cycle. I also periodically monitor how my application is behaving. At a minimum, I will trace my code before I commit it. Frequent tracing helps me understand how the system is behaving. If I suddenly see a significant change after adding new code, it acts as a red flag for me to investigate further. The red flag might be nothing more than the feature being CPU heavy, which may be okay. On the other hand, it might point to an issue with the implementation or a bug that was injected into the code.
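The before-and-after comparison can also be scripted. The sketch below assumes the trace has been exported as simple `(task, start_us, end_us)` tuples; that format, the task names, and the 5-point threshold are all made up for illustration and are not the export format of Tracealyzer or any other tool.

```python
from collections import defaultdict


def cpu_utilization(events, window_us):
    """Return {task: percent of the window spent running}.

    events is an iterable of (task, start_us, end_us) tuples (assumed format).
    """
    busy_us = defaultdict(int)
    for task, start_us, end_us in events:
        busy_us[task] += end_us - start_us
    return {task: 100.0 * busy / window_us for task, busy in busy_us.items()}


def flag_changes(baseline, current, threshold_pct=5.0):
    """List (task, delta) for tasks whose utilization moved more than threshold_pct points."""
    flagged = []
    for task in sorted(set(baseline) | set(current)):
        delta = current.get(task, 0.0) - baseline.get(task, 0.0)
        if abs(delta) > threshold_pct:
            flagged.append((task, round(delta, 1)))
    return flagged


# Made-up traces over a 10 ms window, before and after a code change.
baseline_events = [("sensor", 0, 1000), ("comms", 1000, 2500), ("sensor", 5000, 6000)]
current_events = [("sensor", 0, 1000), ("comms", 1000, 4000), ("sensor", 5000, 6000)]

baseline = cpu_utilization(baseline_events, window_us=10_000)
current = cpu_utilization(current_events, window_us=10_000)
print(flag_changes(baseline, current))  # [('comms', 15.0)]
```

A flagged task is only a prompt to look at the trace more closely; as noted above, the jump may simply reflect a CPU-heavy feature.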
Strategy #5 – Know the CPU registers and instruction set
Now and then, developers will encounter a superbug: a bug that appears out of nowhere and results in a hard fault or other catastrophic behavior. The bug could be caused by a stack overflow or by an out-of-whack pointer that sends the CPU off to execute code in a memory region that doesn’t exist. When this happens, developers usually must roll up their sleeves and dig deep into the microcontroller hardware.
Understanding the CPU, peripheral registers, and instruction set can be critical to resolving tricky bugs like these. For example, in “How to Debug a Hard Fault on an Arm Cortex-M”, I walk the reader through an example debug session I did with a customer to resolve a hard fault that they were experiencing. The type of bug encountered could have taken weeks of debugging effort if I had been unfamiliar with the low-level hardware details.
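As one small illustration of why register knowledge pays off, the sketch below decodes a Configurable Fault Status Register (CFSR) value that has been read out with a debugger after a fault. This is not the procedure from the article mentioned above; it is a helper I’m assuming for illustration. The bit positions follow the Armv7-M layout used by Cortex-M3/M4/M7 parts as I understand it, so check them against the reference manual for your specific device.

```python
# Bit positions in the CFSR (address 0xE000ED28 on Armv7-M parts).
# Bits 0-7 are MemManage status, 8-15 BusFault status, 16-31 UsageFault status.
CFSR_FLAGS = {
    0: "IACCVIOL: instruction access violation",
    1: "DACCVIOL: data access violation",
    3: "MUNSTKERR: MemManage fault while unstacking on exception return",
    4: "MSTKERR: MemManage fault while stacking on exception entry",
    5: "MLSPERR: MemManage fault during lazy floating-point state preservation",
    7: "MMARVALID: MMFAR holds the faulting address",
    8: "IBUSERR: instruction bus error",
    9: "PRECISERR: precise data bus error",
    10: "IMPRECISERR: imprecise data bus error",
    11: "UNSTKERR: bus fault while unstacking on exception return",
    12: "STKERR: bus fault while stacking on exception entry",
    13: "LSPERR: bus fault during lazy floating-point state preservation",
    15: "BFARVALID: BFAR holds the faulting address",
    16: "UNDEFINSTR: undefined instruction",
    17: "INVSTATE: invalid execution state, e.g. branch to an address without the Thumb bit",
    18: "INVPC: invalid PC load on exception return",
    19: "NOCP: coprocessor access not permitted or not present",
    24: "UNALIGNED: unaligned memory access",
    25: "DIVBYZERO: divide by zero",
}


def decode_cfsr(value):
    """Return descriptions of every known flag set in a 32-bit CFSR value, lowest bit first."""
    return [text for bit, text in sorted(CFSR_FLAGS.items()) if value & (1 << bit)]


# Example value as it might be copied from a debugger's register view.
for line in decode_cfsr(0x00008200):
    print(line)
```

For the example value `0x00008200`, bits 9 and 15 are set, so the decoder reports a precise data bus error and indicates that the bus fault address register holds the offending address, which is a natural next place to look.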
Developers are never going to implement software without bugs. The systems we design and build today are just too complex. However, that doesn’t mean that we don’t have strategies and tools that we can use to minimize the time we spend debugging. As we’ve seen in this post, we can put processes in place to prevent most bugs from getting into the software. When bugs do get in, we can use Test-Driven Development, tracing, simulators, and other techniques to minimize the time spent debugging.