RTC Magazine


SOLUTIONS ENGINEERING

Solid State Drives

Extend SSD Lifetime Using the Network Database Model

Solid-state drives are emerging as replacement storage devices for traditional hard drives and flash systems in embedded devices. Efficiently managing data on these devices is increasingly important to meet application needs without increasing the size of the SSDs or recalling products due to ‘bad blocks’.

BY JOHN PAI, RAIMA DIVISION OF BIRDSTEP TECHNOLOGY


Solid-State Drives (SSDs) have evolved to become a viable option to replace rotating Hard Disk Drives (HDDs) in many embedded systems. This is because SSDs eliminate the single largest failure mechanism in many embedded systems—the moving parts of HDDs.

Despite the clear appeal of the technology, designers already face a number of challenges as next-generation devices find their way into embedded applications. The most significant are endurance, limited storage capacity and storage management issues that affect product life and space utilization. Designers therefore need accurate knowledge of these concerns, and guidance on how to overcome the limited lifetime of a flash-based SSD and the limited capacity of a RAM-based SSD, which is constrained by the cost of RAM.

Device Lifespan and Performance

When deciding on the appropriate SSD for a project, system designers basically have two practical options: the flash-based SSD or the RAM-based SSD.

System designs with flash-based SSDs use various strategies to manage write endurance, but they share a common approach: tracking how many times each block of memory has been written, then dynamically and transparently reallocating physical blocks to logical blocks in order to spread the load across the disk. In a well-designed flash SSD, the system would have to write the rated number of endurance cycles to the whole disk before it was in any danger.
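
The reallocation described above can be illustrated with a minimal sketch. It is a toy model, not any vendor's controller firmware: the class name `WearLevelingFTL`, the block counts and the "pick the least-worn free block" policy are all assumptions made for illustration.

```python
class WearLevelingFTL:
    """Toy flash translation layer (hypothetical): maps logical to physical blocks."""

    def __init__(self, physical_blocks, logical_blocks):
        assert logical_blocks < physical_blocks, "need at least one spare block"
        # Per-block erase counter: the "how many times written" bookkeeping.
        self.erase_counts = [0] * physical_blocks
        # Logical -> physical map; initially the identity mapping.
        self.mapping = {lb: lb for lb in range(logical_blocks)}
        # Spare physical blocks that currently hold no logical data.
        self.free = set(range(logical_blocks, physical_blocks))

    def write(self, logical_block):
        """Redirect the write to the least-worn free block, then recycle the old one."""
        old = self.mapping[logical_block]
        target = min(self.free, key=lambda pb: (self.erase_counts[pb], pb))
        self.free.remove(target)
        self.mapping[logical_block] = target
        # The block that held the stale data is erased and returned to the pool.
        self.erase_counts[old] += 1
        self.free.add(old)


ftl = WearLevelingFTL(physical_blocks=8, logical_blocks=4)
for _ in range(1000):
    ftl.write(0)  # hammer a single logical block
print(ftl.erase_counts)
```

In this sketch the 1,000 rewrites of one logical block are spread over the five physical blocks that rotate through the free pool, instead of landing on a single block. Blocks holding data that is never rewritten are left alone, which is one reason real controllers use more elaborate policies.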

Flash SSDs are not likely to continue performing at the same level as when first operated. That’s important to know, given the speed with which SSDs have proliferated in the marketplace amid claims that they’re faster, use less power and can be more reliable since there are no moving parts. Flash SSD performance and endurance are related because the management overhead of a flash SSD is related to how many writes and erases to the drive take place. The more write/erase cycles there are, the shorter the drive’s service life.

Flash memory cells are nominally guaranteed for only one million write cycles. Once the quota is reached, the disk can become unreliable. Special firmware or flash SSD controller chips help mitigate this problem with dynamic reallocation rather than rewriting files to a single location. 
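
As a rough worked example of what such a quota means in practice, the sketch below estimates an upper bound on drive life under ideal wear leveling. Only the one-million-cycle figure comes from the text above; the 32 GB capacity, the 10 MB/s sustained write rate and the function name are assumptions, and the model ignores write amplification and spare area.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def ideal_lifetime_years(capacity_bytes, endurance_cycles, write_bytes_per_sec):
    """Upper-bound lifetime if every block is worn perfectly evenly."""
    # With ideal wear leveling, the whole disk must be rewritten
    # `endurance_cycles` times before any block exhausts its quota.
    total_writable_bytes = capacity_bytes * endurance_cycles
    return total_writable_bytes / write_bytes_per_sec / SECONDS_PER_YEAR


years = ideal_lifetime_years(
    capacity_bytes=32e9,        # assumed 32 GB drive
    endurance_cycles=1_000_000, # figure quoted in the article
    write_bytes_per_sec=10e6,   # assumed 10 MB/s, written continuously
)
print(f"about {years:.0f} years")
```

The bound shrinks in direct proportion to the endurance rating and to any extra writes the data management layer generates, which is why the number of writes per logical update matters.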

Although less popular than its flash counterpart, the RAM-based SSD is significantly faster at both read and write operations. A typical RAM SSD does not face the same write cycle limitation as flash SSD because most of the I/O is performed in SSD RAM. The data is then copied from volatile memory to nonvolatile memory when instructed or when powering down. RAM SSDs are usually armed with their own batteries, which last long enough to preserve data in case the system unexpectedly powers off.

Two Data Management Strategies

Embedded system designers have a few basic options when deciding on a data management strategy for embedded SSD devices. Currently, the most widespread data management model is the relational model.

The relational model stores data in tables composed of columns and rows. When data from more than one table is needed, a join operation relates the data using a duplicate column from each table (Figure 1). While the relational model is flexible, performance is limited by the need to create new tables holding the results of relational operations, and by the storage of redundant columns. Even when designed efficiently, there are several sources of overhead. The main source is data duplication to help preserve the relational database integrity, together with the need for a foreign key to manage relationships efficiently. The overhead results in excess file size and extra I/O to perform basic database operations. Such overhead is especially expensive in both flash- and RAM-based SSD devices.

Figure 1
Relational model (top). The cost of the relational model as the database grows (bottom).
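
The duplicated join column described above can be seen in a small sketch using Python's built-in `sqlite3` module. The `department` and `employee` tables, their columns and the sample rows are hypothetical and chosen only for illustration; SQLite stands in for any relational engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- key value repeated per row
    );
    -- The foreign-key index stores every dept_id value once more.
    CREATE INDEX employee_dept ON employee(dept_id);
    INSERT INTO department VALUES (1, 'Firmware'), (2, 'Hardware');
    INSERT INTO employee VALUES (10, 'Ada', 1), (11, 'Grace', 1), (12, 'Linus', 2);
""")

# The join relates the two tables by matching the duplicated column values.
rows = conn.execute("""
    SELECT d.name, e.name
    FROM department AS d
    JOIN employee AS e ON e.dept_id = d.dept_id
    ORDER BY d.name, e.name
""").fetchall()
print(rows)
```

Each `dept_id` value appears in the primary table, in every related `employee` row and again in the index, which is the duplication the paragraph refers to.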

Embedded systems designers can exploit the network database model for significant advancements in data management to mitigate the lifespan limitations on solid-state drives. The network model is conceived as a flexible way of representing objects and their relationships. It predates the relational model and can be viewed as a superset of it. This implies that anything expressed in the relational model can be expressed in the network model, including SQL support. The main advantage is the way the relationships are modeled.
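
One way to picture a network-model relationship is as an owner record linked directly to its member records by pointers, with no key value copied into the members. The sketch below is an in-memory illustration under that assumption; the `Owner` and `Member` classes and the sample names are hypothetical and do not represent any particular product's record layout.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)
class Member:
    name: str
    owner: "Owner"                    # pointer back to the owner record
    next: Optional["Member"] = None   # pointer to the next member in the set


@dataclass(eq=False)
class Owner:
    name: str
    first: Optional[Member] = None
    last: Optional[Member] = None

    def connect(self, name: str) -> Member:
        """Append a new member to this owner's set by updating pointers."""
        member = Member(name, owner=self)
        if self.last is None:
            self.first = member
        else:
            self.last.next = member
        self.last = member
        return member

    def members(self) -> Iterator[Member]:
        """Walk the set by following pointers; no key comparison is needed."""
        node = self.first
        while node is not None:
            yield node
            node = node.next


firmware = Owner("Firmware")
firmware.connect("Ada")
firmware.connect("Grace")
print([m.name for m in firmware.members()])
```

Traversal follows the `first` and `next` pointers, so related records are reached without looking up a duplicated key column or an index.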

Discuss

  • Juergen Unruh
  • November 20, 2009
  • 12:21am

This article is full of inaccuracies about the supposed disadvantages of the relational database model. This is not surprising, given that the source is a vendor of a network model database. The author writes “…performance is limited by the need to create new tables holding the results from relational operations…” Hogwash. There is no such need implicit in the relational model. The fact that a database vendor would write this really casts doubt on their authority as any kind of subject matter expert. “…overhead comes in the form of data duplication to help preserve database integrity…” Relative to the overhead introduced by network model database pointers, this is a non-issue. Data duplication doesn’t “help preserve” database integrity, it is the manifestation of database integrity (i.e. referential integrity), which is completely _absent_ in the network model, and database integrity goes out the window along with the integrity of the database pointers. “Embedded systems designers can exploit the network database model for significant advancements in data management to mitigate the lifespan limitations on solid-state drives.” This is also a lot of baloney. Adding a row to a one-to-many relationship with the network model is guaranteed to require at least 3 write operations, and sometimes 4 (owner pointer, previous member’s ‘next’ pointer, the new ‘current’ member, and sometimes the next member’s ‘previous’ pointer). This assumes that the added row has no associated index, which is rarely, if ever, the case. Adding a row to a relationship with the relational model is guaranteed to require 2 write operations, maybe 4 if an index node has to be split (the row itself, and a slot of a node in the index b-tree). Reorganizing the entire tree is exceptionally rare in the real-world of random writes; of course you can cook up a worst-case scenario in the test lab... 
So, exactly how is it that the network model is capable of extending the life of the SSD when it exacerbates write cycles? The network model is a superset of the relational model, but pre-dates the relational model? Does this mean that the inventors of the network model were prescient about the relational model, and incorporated all of it into the network model? Of course, that’s ridiculous, and ignores the small problem that the network model violates several of Codd’s twelve rules. “The main advantage is the way the relationships are modeled.” Ja? Nein. Relationships are modeled exactly the same way. They are implemented very differently, and the implementation of the relations in the network model (via hard-wired database pointers) is its great disadvantage and the reason that relational databases dominate the market. “Cache optimization customizes the cache to be large enough…” Come on, that’s not optimization (i.e. some fancy thing the DBMS is doing), that’s simply a run-time configuration setting. Further, “…write the updated pages in each file in ascending order by offset in the file, which may also lengthen the service life of a flash SSD.” How so? A write is a write is a write. The SSD doesn’t care if you write blocks 5, 16, 438 or 16, 5, 438. Not one sentence of the paragraph on in-memory databases makes sense. “…keeping unnecessary write cycles in main memory.” Huh? What’s an unnecessary write cycle? I don’t think I want a DBMS that is executing unnecessary write cycles, in main memory or anywhere else.
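
The write counts quoted in this comment for a pointer-linked one-to-many set can be checked with a small sketch. It assumes a doubly linked member chain and an owner record that is updated on every insert (here via a member count); it counts touched records rather than flushed pages, and all class and function names are hypothetical.

```python
class Record:
    """A member record with previous/next pointers (hypothetical layout)."""

    def __init__(self, rid):
        self.rid = rid
        self.prev = None
        self.next = None


class OwnerRecord:
    """An owner record holding first/last pointers and a member count."""

    def __init__(self, rid):
        self.rid = rid
        self.first = None
        self.last = None
        self.count = 0


def insert_after(owner, prev, new):
    """Link `new` after `prev` (or at the head if prev is None).

    Returns the set of record ids that were modified and would need writing.
    """
    dirty = {owner.rid, new.rid}  # owner bookkeeping and the new member itself
    nxt = prev.next if prev is not None else owner.first
    new.prev, new.next = prev, nxt
    if prev is not None:
        prev.next = new           # previous member's 'next' pointer
        dirty.add(prev.rid)
    else:
        owner.first = new
    if nxt is not None:
        nxt.prev = new            # next member's 'previous' pointer
        dirty.add(nxt.rid)
    else:
        owner.last = new
    owner.count += 1
    return dirty


owner = OwnerRecord("owner")
a, b, c = Record("a"), Record("b"), Record("c")
insert_after(owner, None, a)              # first member of an empty set
at_end = insert_after(owner, a, b)        # append after the last member
in_middle = insert_after(owner, a, c)     # insert between two members
print(len(at_end), len(in_middle))
```

Under these assumptions, appending to a non-empty set touches three records and inserting between two members touches four. How many page writes that becomes depends on how the records are distributed across pages.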

  • Duncan Bates
  • December 14, 2009
  • 11:38pm

Dear Juergen, Thank you for your enthusiastic interest in this article. Please allow me the opportunity to address some of your concerns. Without going into great detail, John is discussing the relational and network models in practical terms.

When John writes “…performance is limited by the need to create new tables holding the results from relational operations…” he’s addressing the limitation that the relational model imposes in terms of table relationships. The relational model only involves two tables in a relationship, the primary and foreign table. To enforce the relationship and maintain its performance, the foreign key is defined by mapping values back to the primary table. If the application use case needs to unite information from two relationships, something that’s fairly common, either the application or the access language is most often required to use temporary storage to execute the request.

Allow me to illustrate this through a practical example: an iPod user wants to list all works contributed by an artist, ordered alphabetically. Works may include music albums, movies, or other contributions. For the sake of this discussion let’s limit works to music and movies. Doing so gives us three tables: artist, music, and movies. In the relational model you’d need two relationships, one between the artist and music tables and another between the artist and movie tables, both based on the artist primary key fields. Since the primary key fields need to be unique, it’s not uncommon to represent this as an auto-generated integer that increments sequentially. The result is that you now have an integer primary key on the artist table, an integer foreign key on the music table and a second integer foreign key on the movie table. Running the query expressed in the use case breaks down into two queries, one on each relationship, whose results are unioned together and sorted in temporary storage.
The use of temporary storage, RAM or disk, results in undesired performance degradation and resource consumption, which was the point John was trying to make. With the network model you can express a single relationship between all three tables. This single relationship is pre-sorted, simplifying the query, which now just needs to scan the relationship and return the results.

You also commented on John’s statement, “…overhead comes in the form of data duplication to help preserve database integrity…” You say that “Relative to the overhead introduced by network model database pointers, this is a non-issue.” You are correct that there is additional overhead in the network model with the introduction of pointers. However, since relationships are done through pointers vs. values, the amount of overhead consumed by the two models is significantly different. Measurements done against public domain relational databases show space savings of 29% on 64-byte keys, and 29% space savings is huge when dealing with flash-based SSDs due to space reclaiming. It’s also huge in the sense that the extra 29% is there for a reason: it needs to be written and read, it needs to circle through the database cache, etc., resulting in an overall effect on application performance. Now, putting a number in the equation is always risky since it’s down to the specific implementation, not the model itself, and in many cases a 64-byte key may be unusual if you turn to the trick of replacing the actual unique information with an integer key as described above. In any case, with the relational model the database will need to duplicate the primary information into the foreign table, and once again into the foreign key as long as it’s not implemented as a sparse index (and sparse indexing has another performance implication we’d like to avoid).
You state that referential integrity is “completely _absent_ in the network model, and database integrity goes out the window along with the integrity of the database pointers”, which is not true. Enforcing relationships through the network sets is just as prevalent as primary/foreign key relationships. Please also note that the network data model does not preclude the definition of unique keys.

You go on to say: “This is also a lot of baloney. Adding a row to a one-to-many relationship with the network model is guaranteed to require at least 3 write operations, and sometimes 4 (owner pointer, previous member’s ‘next’ pointer, the new ‘current’ member, and sometimes the next member’s ‘previous’ pointer).” In any database implementation the I/O-intensive cycle is spent writing and flushing pages of data and, in some implementations, the change log information. Most network model databases implement the owner, first and next pointers as part of the record structure, so when a new record and its relationship data are added to a table it requires a fixed number of page writes to the system: 2-4 pages, depending on the distribution of its related records (previous, next and owner record). In the relational model, in the best case you have the same amount: you write the table page, page 0 or the statistics page for SQL optimization (as long as the engine supports SQL and a heuristic query optimizer), and the single FK B-tree node that the row belongs to. But as you are alluding to, B-tree nodes split and rotate, which triggers a few more pages to be written and flushed. These splits and rotations are usually local, involving a small number of pages, but in certain cases larger splits and rotations are done and the number of pages needing to be changed is a function of the size of the tree. So what John is getting at is that maintenance of a foreign key has an O(log(n)) performance characteristic while the network model relationship has a constant O(1) characteristic.
I would agree that in real-world modeling random writes trigger less B-tree reorganization, but in all practical terms it’s more common than not to use auto-incremented integer values as primary keys. Why? Well, if you use the real-world data you’ll end up with larger index trees, since the size of the indexed fields increases. E.g. if you were to have a real-world foreign key on a library book, in the best case you end up with the author’s social security number and the publisher’s company name, which translates into approximately 40 bytes of foreign key information per book. Most relational database engines will not implement the key information as sparse by referencing the index information; they will duplicate it. The reason for this is performance: another level of indirection, especially on non-clustered data, is extremely inefficient when scanning flash-based SSDs. The drawback of duplication is decreased space utilization, triggering other practical problems like RAM consumption, reclaiming cycles, etc. This is the main reason for the widely accepted use of auto-increment integers as identifiers. I think we can both agree that auto-incremented index values are the worst-case scenario that you’re referring to as a “cooked up, in the lab scenario”. In other words, you can reduce the page writes by indexing random real-world data, but then you’re increasing the amount of data that you need to manage due to duplication, which results in added page writes; or you can auto-generate sequence numbers, reducing the storage used but triggering much more frequent B-tree rotations. This is exactly why John is stating that the network model can extend the lifetime of flash-based SSDs by reducing the number of erase and rewrite cycles.

Another thing you point out is: “Does this mean that the inventors of the network model were prescient about the relational model, and incorporated all of it into the network model?
Of course, that’s ridiculous, and ignores the small problem that the network model violates several of Codd’s twelve rules.” No, of course that’s not what John is stating; he’s simply stating that what can be modeled in the relational model can also be modeled in the network model, basically having primary/foreign key relationships in both. The advantage of the network model is that in addition to the key-based relationships you can model with pointers, and it’s not uncommon to see SQL engines on top of network databases, as you see with relational databases.

You also say that “it’s a great disadvantage and the reason that relational databases dominate the market.” What’s a great disadvantage, that you’ve got more flexible modeling capabilities? I totally disagree that this is the reason why relational databases dominate the market; it has nothing to do with how relationships are made but rather with the close relationship the model has to SQL. The power of SQL is indisputable: guaranteeing a reply from a database on any dynamic SQL statement is extremely powerful. But nothing in the SQL spec guarantees response times and resource utilization. In real-time embedded applications it’s common to find all or most use cases defined at development time, where SQL in fact adds an unnecessary layer of abstraction. ISAM and proprietary access methods are much more commonly used, even though most embedded database systems offer a SQL interface (even if it is a network database).

And yes, I agree that “a write is a write is a write”, but where the write happens is important if you’re concerned about performance or flash erase cycles. A hybrid in-memory database allows you to split its data management between persistent storage and RAM storage. A good use case for this is if you have room for a database index to be maintained in RAM while the data is on disk.
The point John is making is that to ensure transactional consistency, ACID compliance, we’re forced to get all the transactions onto disk. But as discussed above, a B-tree index is simply an ordered duplication of the data. If the database engine can support it, moving index management into RAM vs. managing it on disk will reduce unnecessary flash write cycles, replacing them with RAM write cycles. This matters because you have now increased the application performance tremendously by utilizing RAM vs. flash for information that can be re-created at application startup. I invite you to have a look at the following blog entry, http://www.raima.com/blog/, which discusses next-generation databases. Duncan
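
The artist, music and movie example from this reply can be sketched with Python's built-in `sqlite3` module. The table layouts, column names and sample titles are hypothetical, and SQLite is used only as a convenient relational engine; the sketch shows the shape of the query, not how any engine executes it internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE music  (title TEXT, artist_id INTEGER REFERENCES artist(artist_id));
    CREATE TABLE movie  (title TEXT, artist_id INTEGER REFERENCES artist(artist_id));
    INSERT INTO artist VALUES (1, 'Example Artist');
    INSERT INTO music  VALUES ('Zebra Songs', 1), ('Blue Album', 1);
    INSERT INTO movie  VALUES ('Midnight Film', 1);
""")

# Two lookups, one per relationship, combined and then sorted as a whole.
works = conn.execute("""
    SELECT title, 'music' AS kind FROM music WHERE artist_id = :artist
    UNION ALL
    SELECT title, 'movie' AS kind FROM movie WHERE artist_id = :artist
    ORDER BY title
""", {"artist": 1}).fetchall()
print(works)
```

The combined result has to be ordered after both lookups complete, which is the step the reply attributes to temporary storage. Whether a given engine actually materializes a temporary structure for it depends on its planner and the available indexes.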
