INDUSTRY WATCH
Embedded Data Integrity
Building Fault Tolerance into Embedded Data Management
Maintaining data integrity in embedded applications while also ensuring 24/7 operation is a complex challenge, especially when the constraints of real-time performance are added.
DUNCAN BATES, BIRDSTEP TECHNOLOGY
Small, resource-constrained applications have grown so complex that today's requirements bear little comparison to those of 10 years ago. Nonetheless, there has been a trend toward implementing homegrown data management solutions for both volatile and persistent storage media. Building fault tolerance into these devices was not much of a requirement back then, but end customers increasingly demand it. Today we expect embedded applications to recover from any power failure and to operate 24/7 without system downtime.
Whatever the fault tolerance requirement, building transactional or replication capabilities into a homegrown data management solution is a complex task and should only be attempted if you have time to spare, money to waste, and can live with an inferior application. The fact of the matter is that data management gets complicated. You may want to manage parts of your data on disk and other parts in memory, add an efficient data indexing subsystem, manage concurrent access to the data across multiple threads and applications, and implement an elaborate data caching system to avoid I/O overhead and increase performance, to mention a few. That already amounts to a sophisticated piece of software, and it grows further once you must also cope with the fault tolerance scenarios that embedded applications face.
The first piece of the fault tolerance puzzle is the ability to recover from any application or power failure. Making sure that the data is not corrupt or that the loss of data hasn’t been too great is usually the first requirement. Data management solutions implement this by supporting the Atomic, Consistent, Isolated and Durable (ACID) transaction model. As a familiar example of the “Atomic” concept, consider what happens when you enter the bank and instruct the teller to move money from your checking account to savings. The money transaction breaks down into two operations, deduction from one account and addition to another, and it’s important to both parties that both happen or that neither happens. This is what’s meant by Atomic and is normally implemented through a data journaling system.

Figure 1 shows the current state of the database. Some of the information is in the consistent database image and the rest is in the transactional journal. Note also that the atomic operations are wrapped in Begin and Commit marks. The “A” in ACID is ensured by flushing the Commit mark to persistent storage: a transaction whose Commit mark never reached storage is treated as if it had not happened.
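The role of the Begin and Commit marks during recovery can be sketched as follows. This is a minimal illustration, not any particular product's journal format: the record layout (`"BEGIN"`, `"WRITE"`, `"COMMIT"` tuples) and the `recover` function are assumptions made for the example.

```python
def recover(image, journal):
    """Rebuild the database state from the image plus the journal.

    Only transactions whose COMMIT mark is present in the journal are
    applied; writes of a transaction without a COMMIT are discarded.
    The record layout here is hypothetical, chosen for illustration.
    """
    state = dict(image)          # start from the consistent image
    pending = {}                 # writes of transactions not yet committed
    for record in journal:
        kind, txn = record[0], record[1]
        if kind == "BEGIN":
            pending[txn] = []
        elif kind == "WRITE":
            _, _, key, value = record
            pending.setdefault(txn, []).append((key, value))
        elif kind == "COMMIT":
            for key, value in pending.pop(txn, []):
                state[key] = value
    return state


image = {"checking": 500, "savings": 100}
journal = [
    ("BEGIN", 1),
    ("WRITE", 1, "checking", 400),
    ("WRITE", 1, "savings", 200),
    ("COMMIT", 1),
    ("BEGIN", 2),
    ("WRITE", 2, "checking", 300),
    # power is lost here: transaction 2 never wrote its COMMIT mark
]
recovered = recover(image, journal)
```

In this sketch the first transfer survives recovery in full, while the half-written second transaction leaves no trace in the recovered state.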
The property of consistency is ensured by the data engine aborting transactions that break any defined rule. Say you define a rule that the checking account can’t drop below $0. If the money transfer above violates this rule, the system automatically invalidates the transaction and uses the journal to revert to the previous state of the database.
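A rule check of this kind might look like the sketch below. Note the simplification: instead of reverting through a journal, the example applies changes to a working copy and only publishes them if the rule holds. The names `transfer` and `RuleViolation` are hypothetical.

```python
class RuleViolation(Exception):
    """Raised when a transaction would break a defined rule."""


def transfer(accounts, source, target, amount):
    """Move money between accounts, or leave them untouched on a violation."""
    working = dict(accounts)     # changes go to a working copy first
    working[source] -= amount
    working[target] += amount
    # Rule from the text: an account balance may not drop below 0.
    if working[source] < 0:
        raise RuleViolation(f"{source} would drop below 0")
    accounts.update(working)     # publish only when every rule holds


accounts = {"checking": 500, "savings": 100}
transfer(accounts, "checking", "savings", 100)       # allowed
try:
    transfer(accounts, "checking", "savings", 1000)  # breaks the rule
    rejected = False
except RuleViolation:
    rejected = True
```

After the rejected transfer the balances are exactly what the first, valid transfer left behind.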
Now let’s add your spouse to the equation, who is tapping into an ATM to view the balance of your accounts just as you’ve instructed the teller to move the money. We need to make sure that only the state of the two accounts before your request, or the state after it, is visible, never a mix of the two. This is called transaction isolation and is achieved by exposing only the database state formed by the image and the committed transactions.
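One simple way to picture this is a store that hands readers a copy of the last committed state and replaces that state in a single step at commit time. The `Store` class below is a hypothetical sketch under that assumption; real engines use more elaborate locking or versioning schemes.

```python
import threading


class Store:
    """Readers only ever see a fully committed state (illustrative sketch)."""

    def __init__(self, initial):
        self._committed = dict(initial)
        self._lock = threading.Lock()

    def snapshot(self):
        # A reader gets a private copy of the last committed state.
        with self._lock:
            return dict(self._committed)

    def commit(self, changes):
        # All changes of a transaction become visible in one step.
        with self._lock:
            new_state = dict(self._committed)
            new_state.update(changes)
            self._committed = new_state


store = Store({"checking": 500, "savings": 100})
before = store.snapshot()                          # the ATM view before the transfer
store.commit({"checking": 400, "savings": 200})    # the teller's transfer
after = store.snapshot()                           # the ATM view after the transfer
```

Either view shows a combined balance of 600; no reader can observe the deduction without the matching addition.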
Lastly, the durability property means that a money transfer, once accepted, must not be lost if power fails or the application crashes at an inconvenient time. There are different ways to accomplish this but, simply put, we must at all times ensure that we have the information to either replay transactions or reinstate the old state of the database image. This process relies on quite a bit of disk I/O and disk cache flushing to guarantee crash recovery.
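The flushing involved can be illustrated with a small append-only journal. This is a sketch under stated assumptions: the JSON-lines record format and the helper names `append_durably` and `read_journal` are invented for the example, and `os.fsync` only requests that the operating system push data to the device; whether the storage hardware honors that is platform dependent.

```python
import json
import os
import tempfile


def append_durably(path, record):
    """Append one journal record and force it toward persistent storage."""
    with open(path, "a", encoding="utf-8") as journal:
        journal.write(json.dumps(record) + "\n")
        journal.flush()               # drain Python's own buffer
        os.fsync(journal.fileno())    # ask the OS to flush its cache


def read_journal(path):
    """Read records back, stopping at a torn (partially written) record."""
    records = []
    with open(path, encoding="utf-8") as journal:
        for line in journal:
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                break                 # incomplete tail from a crash
    return records


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "journal.log")
    append_durably(path, {"mark": "BEGIN", "txn": 1})
    append_durably(path, {"mark": "COMMIT", "txn": 1})
    # Simulate a crash in the middle of writing the next record.
    with open(path, "a", encoding="utf-8") as journal:
        journal.write('{"mark": "BEG')
    records = read_journal(path)
```

The per-record flush is what makes this approach costly in I/O, which is the price of being able to replay the journal after a crash.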
Implementing the four properties becomes quite complex. You will also want the flexibility to relax some of them for some operations, for example to trade safety for speed, which adds even more complexity to the transaction system needed to build fault tolerance into your application.
