Many embedded engineers are full of hopes and dreams, but high-reliability code is not achieved overnight. It is an arduous process that requires developers to maintain and manage every bit and byte of the system. When an application is declared a “success”, there is usually a sense of relief, but software that runs correctly today under controlled conditions will not necessarily run correctly tomorrow or a year from now.
From a well-regulated development cycle to rigorous implementation and system checks, there are many techniques for developing high-reliability embedded systems. This article introduces seven easy-to-apply, long-lasting techniques that help a system run more reliably and catch abnormal behavior.
Tip 1: Fill the ROM with known values
Software developers tend to be an optimistic bunch: they expect their code to run faithfully for a long time, and that is that. It seems quite rare for a microcontroller to jump out of the application space and execute from an unintended region of memory. However, the chance of this happening is no lower than that of a buffer overflow or a dereference of an errant pointer. It will happen! The behavior of the system afterwards is undefined, because the unused memory may hold the erased default of 0xFF or, since the area is normally never written, values that only God knows.
However, there are well-established linker and IDE techniques that help detect such events and recover the system from them. The trick is to use the FILL command to fill unused ROM with a known bit pattern. Many different patterns could be used, but for a more reliable system the most obvious choice is to place an ISR fault handler at these locations. Then, if something goes wrong and the processor starts executing code outside the program space, the fault handler is triggered and gets an opportunity to store the processor registers and system state before deciding on corrective action.
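In practice the fill is done by the linker or by a hex-file tool, not by the firmware itself. As a rough illustration of the effect only, the sketch below fills the unused tail of a ROM image held in a buffer, as a hypothetical post-build step might. The function name, its parameters, and the pattern are assumptions made for the example; a real pattern would be an instruction or address that leads to the fault handler on the target CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill every byte of a ROM image that lies after the used region with a
 * repeating pattern. All names here are illustrative, not a real tool API. */
void rom_fill_unused(uint8_t *image, size_t image_size, size_t used_size,
                     const uint8_t *pattern, size_t pattern_len)
{
    if (image == NULL || pattern == NULL || pattern_len == 0u) {
        return;
    }
    /* The pattern restarts at the first unused byte; if used_size is not
     * below image_size, the loop body never runs. */
    for (size_t i = used_size; i < image_size; i++) {
        image[i] = pattern[(i - used_size) % pattern_len];
    }
}
```

Because the pattern restarts at the first unused byte, a multi-byte instruction pattern is only meaningful if the used region ends on a suitably aligned address.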
Tip 2: Check the CRC of the application
A great advantage for embedded engineers is that our IDEs and toolchains can automatically generate a checksum for the application or memory space, so that the application can be verified against it. Interestingly, in many cases the checksum is used only when the program code is loaded onto the device.
However, if the CRC or checksum is kept in memory, verifying that the application is still intact at startup (or even periodically in a long-running system) is an excellent way to ensure that nothing unexpected happens. Nowadays the probability of a programmed application changing is very small, but considering the billions of microcontrollers shipped each year and the potentially harsh operating environments, the chance of an application becoming corrupted is not zero. More likely, a defect in the system causes a flash write or flash erase in a sector, destroying the integrity of the application.
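A startup integrity check could look roughly like the sketch below. It assumes the common reflected CRC-32 (polynomial 0xEDB88320) computed bit by bit; the function names and the way the stored CRC reaches the check are assumptions for illustration. A real project would use whichever CRC its toolchain emits, or a hardware CRC unit where one exists.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF). Slow but small; no lookup table needed. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return ~crc;
}

/* Returns true when the image still matches the CRC stored at build time.
 * image, len and stored_crc are hypothetical; on a target they would come
 * from linker symbols or a fixed flash location. */
bool app_image_is_intact(const uint8_t *image, size_t len, uint32_t stored_crc)
{
    return crc32_compute(image, len) == stored_crc;
}
```

The same check can be called at boot and again from a low-priority periodic task in a long-running system.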
Tip 3: Perform a RAM check at startup
To build a more reliable and robust system, it is important to verify that the system hardware is working. After all, hardware fails. (Fortunately, software never fails; it only does what the code tells it to do, right or wrong.) Verifying at startup that the internal or external RAM has no problems is a good way to ensure that the hardware operates as expected.
There are many ways to perform a RAM check, but a common one is to write a known pattern, wait a short time, and read it back. What is read should be what was written. In most cases the RAM check passes, which is the result we want. But there is a very small chance that it fails, which gives the system an excellent opportunity to flag a hardware problem.
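A minimal sketch of such a pattern test follows. It assumes word-wide access, uses the two alternating-bit patterns 0xAAAAAAAA and 0x55555555, and restores each word afterwards so it can run over memory that already holds data. The function name and parameters are illustrative, and the short wait between write and read described above is omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write two complementary patterns to every word and read each one back.
 * Returns false at the first word that does not hold what was written.
 * The original contents of each word are restored before moving on. */
bool ram_check(volatile uint32_t *base, size_t words)
{
    static const uint32_t patterns[] = { 0xAAAAAAAAu, 0x55555555u };

    for (size_t i = 0; i < words; i++) {
        uint32_t saved = base[i];
        for (size_t p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
            base[i] = patterns[p];
            if (base[i] != patterns[p]) {
                base[i] = saved;
                return false;
            }
        }
        base[i] = saved;
    }
    return true;
}
```

A simple test like this catches stuck bits; it does not catch every address-line or coupling fault, for which more elaborate march tests exist.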
Tip 4: Use a stack monitor
For many embedded developers, the stack seems to be a rather mysterious force. When strange things start to happen and the engineers are finally stumped, they begin to wonder whether something is going on in the stack. The result is blind adjustment of the stack's size and location. Often the error has nothing to do with the stack, but how can anyone be sure? After all, how many engineers have actually performed a worst-case stack size analysis?
The stack size is statically allocated at compile time, but the stack is used dynamically. As the code executes, variables, return addresses, and other information the application needs are continually pushed onto the stack, so the stack grows within its allocated memory. Sometimes this growth exceeds the limit fixed at compile time, and the stack corrupts data in adjacent memory.
One way to be absolutely sure the stack is behaving is to implement a stack monitor as part of the system's “health” code (how many engineers do this?). The stack monitor creates a buffer zone between the stack and the “other” memory area and fills it with a known bit pattern. The monitor then checks the pattern continuously for changes. If the bit pattern changes, the stack has grown too far and the system is about to be plunged into darkness! At that point the monitor can log the event, the system state, and any other useful data for later diagnosis.
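The guard-zone idea can be sketched in a few lines. The pattern value, the function names, and the way the guard region is located are assumptions for the example; on a real target the region would be placed next to the stack by the linker script.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_GUARD_PATTERN 0xDEADBEEFu  /* arbitrary, easy to spot in a dump */

/* Fill the guard zone with the known pattern. Call once at startup. */
void stack_guard_init(volatile uint32_t *guard, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        guard[i] = STACK_GUARD_PATTERN;
    }
}

/* Returns false as soon as any guard word no longer holds the pattern,
 * which indicates that something (most likely the stack) wrote into it. */
bool stack_guard_intact(const volatile uint32_t *guard, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (guard[i] != STACK_GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}
```

Calling `stack_guard_intact` from a periodic health task gives the system a chance to log state before the overflow spreads further. A function call that skips over the whole guard zone in one large frame can still escape detection, which is why a larger zone is safer.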
A stack monitor is available in most real-time operating systems (RTOS) and in microcontroller systems that implement a memory protection unit (MPU). The scary thing is that these features are off by default, or are often turned off deliberately by developers. A quick search on the Internet reveals many people suggesting that the stack monitor in the RTOS be disabled to save 56 bytes of flash. That is a poor trade!
Tip 5: Use the MPU
In the past, it was difficult to find a memory protection unit (MPU) in a small, cheap microcontroller, but this has begun to change. Microcontrollers from the high end to the low end now include MPUs, and these give embedded software developers an opportunity to greatly improve the robustness of their firmware.
MPUs have gradually been coupled with the operating system to create memory spaces in which processes are isolated, so that a task can execute its code without fear of being stomped on. If something does go wrong, the runaway process is terminated and other protective measures are taken. Keep an eye out for microcontrollers with this component and, if one is available, make good use of it.
Tip 6: Build a robust watchdog system
One perennial favorite watchdog implementation that you will often find is one where the watchdog is enabled (a good start) but is then cleared from a periodic timer whose operation is completely isolated from anything happening in the program. The purpose of a watchdog is to ensure that if an error occurs, the watchdog is not cleared; that is, when the work stalls, the system is forced through a hardware reset to recover. A timer that is independent of system activity keeps clearing the watchdog even after the system has failed.
Embedded developers need to think carefully about how application tasks are integrated into the watchdog system. One technique, for example, has each task that runs within a given period flag that it completed successfully. If every task checks in, the watchdog is cleared; if any task fails to check in, the watchdog is not cleared and the system is forced to reset. There are also more advanced techniques, such as an external watchdog processor that monitors how the main processor behaves, and vice versa.
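The task check-in technique can be sketched as below. The task count, all names, and the counter that stands in for the hardware "clear watchdog" register write are assumptions for illustration; a real implementation would write to the device's watchdog peripheral and protect the shared flags against concurrent access.

```c
#include <stdbool.h>
#include <stdint.h>

#define WDT_TASK_COUNT 3u                            /* hypothetical number of tasks */
#define WDT_ALL_TASKS  ((1u << WDT_TASK_COUNT) - 1u) /* one bit per task */

static uint32_t wdt_checkins;   /* bit n set = task n has checked in */
static uint32_t wdt_kick_count; /* stand-in for the hardware watchdog clear */

/* Called by each task once it has completed its work for this period. */
void wdt_task_checkin(uint32_t task_id)
{
    if (task_id < WDT_TASK_COUNT) {
        wdt_checkins |= (1u << task_id);
    }
}

/* Called once per period. Clears the watchdog only if every task has
 * checked in; otherwise it does nothing and the hardware watchdog is
 * left to expire and reset the system. */
bool wdt_service(void)
{
    if (wdt_checkins != WDT_ALL_TASKS) {
        return false;
    }
    wdt_checkins = 0u;
    wdt_kick_count++;   /* real code would write the watchdog register here */
    return true;
}

/* Number of times the watchdog has been cleared (for the example only). */
uint32_t wdt_kicks(void)
{
    return wdt_kick_count;
}
```

With this arrangement a single stalled task is enough to stop the watchdog from being cleared, which is exactly the property the free-running timer approach lacks.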
A robust watchdog system is very important for a reliable system. There are too many techniques to cover fully in these few paragraphs, but the author will publish further articles on this topic in the future.
Tip 7: Avoid dynamic memory allocation
Engineers who are not used to working in resource-constrained environments may be tempted to use the language features that allow dynamic memory allocation. After all, this technique is common on general-purpose computers, where memory is allocated only when it is needed. For example, when developing in C, an engineer may be inclined to use malloc to allocate space on the heap, perform some operation, and then use free to return the memory to the heap.
In a resource-constrained system, this can be a disaster! One problem with dynamic memory allocation is that mistakes or improper technique can lead to memory leaks or heap fragmentation. If these problems occur, most embedded systems do not have the resources or the know-how to monitor the heap or handle it properly. And when they do occur, what happens if the application requests memory but none is available?
The problems caused by dynamic memory allocation are complicated, and handling them properly can be a nightmare! An alternative is simply to allocate memory statically. For example, create a 256-byte buffer in the program instead of requesting a buffer of that size through malloc. The memory then remains allocated for the life of the application, with no concerns about the heap or memory fragmentation.
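As a small illustration of the 256-byte buffer mentioned above, the sketch below keeps a fixed buffer with static storage in place of a malloc call. The buffer's purpose (holding a received message) and all names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUFFER_SIZE 256u

/* Reserved at link time; exists for the whole life of the application. */
static uint8_t rx_buffer[RX_BUFFER_SIZE];
static size_t  rx_length;

/* Copy a message into the static buffer. Oversized input is rejected
 * up front instead of failing later inside an allocator. */
bool rx_store(const uint8_t *data, size_t len)
{
    if (data == NULL || len > RX_BUFFER_SIZE) {
        return false;
    }
    memcpy(rx_buffer, data, len);
    rx_length = len;
    return true;
}

/* Read-only access to the stored message. */
const uint8_t *rx_data(void)
{
    return rx_buffer;
}

/* Length in bytes of the stored message. */
size_t rx_size(void)
{
    return rx_length;
}
```

The cost is that the 256 bytes are reserved even when unused, but the memory footprint is known at link time and the "no space left" case reduces to a simple length check.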
These are just some of the ways developers can start building more reliable embedded systems. There are many other techniques, such as following good coding standards, monitoring for bit flips, performing array and pointer bounds checks, and using assertions. All of these techniques are the secrets that allow designers to develop more reliable embedded systems.