XADM: Exchange DB and Caching Hard Disks and Controllers

ID: Q188589


The information in this article applies to:


SUMMARY

This article explains in detail how the use of write-caching hard disks and hard disk controllers can affect the transactional integrity of the Microsoft Exchange Server database engine.

Use of a write-caching disk controller can seriously jeopardize the normally reliable data integrity of the Exchange database. Significant data corruption can result from a system failure when a write-caching controller without an extremely reliable battery backup is used. This type of controller can also compromise the normally reliable Exchange database recovery mechanism.

Recent advances in hardware design coupled with a need for high disk performance on server platforms make it increasingly likely that an Exchange Server hardware platform uses a write-caching disk controller. It is advisable to determine whether a given Exchange Server computer has a write-caching controller, or whether the disk drives themselves contain a write cache. You should check with the hardware vendor for this information. Explain to the vendor that your system is to be used as a messaging server, and that the write-ahead log mechanism of the message store's database generally requires that writes not be cached. If you plan to turn on caching on the controller for performance reasons, you must ensure that cached writes will not be lost in the case of a system failure. The controller must provide battery backup and other fault tolerance measures.

Generally, to meet these criteria, the hardware write caching mechanism on the server must be designed with a messaging/database server in mind. It is technically possible for a hardware write cache to be safe for Exchange Server, but only if certain criteria are met by the hardware write cache design. Essentially, all possible conditions that could result in the discarding of dirty or updated pages in the write cache must be considered and protected against.

Disk drive write caching is always considered dangerous and is not recommended.


MORE INFORMATION

The Exchange database engine's data modification statements generate logical page writes. This stream of writes can be pictured as going two places: the log and the database itself. For performance reasons, the Exchange database defers writes to the database through its own cache buffer system. Writes to the log are only momentarily deferred until COMMIT time. They are not cached in the same manner as writes to the database. Because the log writes for a given page always precede that page's writes to the database, the log is sometimes referred to as a "write-ahead" log.
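
The ordering described above can be illustrated with a small sketch. This is not the Exchange database engine's code or file format; the class name TinyWal, the file names, and the JSON record layout are all hypothetical and chosen only to show log records being forced to disk at commit time while database page writes are deferred.

```python
import json
import os
import tempfile


class TinyWal:
    """Toy write-ahead log (hypothetical, for illustration only)."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "tiny.log")  # assumed file name
        self.db_path = os.path.join(directory, "tiny.db")    # assumed file name
        self.pending = []      # log records of the open transaction
        self.dirty_pages = {}  # page number -> value, not yet in the database file

    def write_page(self, page_no, value):
        # A logical page write is recorded for the log and held in the cache.
        self.pending.append({"page": page_no, "value": value})
        self.dirty_pages[page_no] = value

    def commit(self):
        # Log records are written and forced out before commit returns.
        with open(self.log_path, "a", encoding="utf-8") as log:
            for record in self.pending:
                log.write(json.dumps(record) + "\n")
            log.write(json.dumps({"commit": True}) + "\n")
            log.flush()
            # fsync asks the operating system to push the data to the device.
            # A write-caching controller or drive may still hold it in
            # volatile memory, which is the risk this article describes.
            os.fsync(log.fileno())
        self.pending = []

    def load_db(self):
        if not os.path.exists(self.db_path):
            return {}
        with open(self.db_path, encoding="utf-8") as db:
            return json.load(db)

    def checkpoint(self):
        # Only later are the cached pages written to the database file.
        pages = self.load_db()
        pages.update({str(k): v for k, v in self.dirty_pages.items()})
        with open(self.db_path, "w", encoding="utf-8") as db:
            json.dump(pages, db)
            db.flush()
            os.fsync(db.fileno())
        self.dirty_pages = {}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        wal = TinyWal(folder)
        wal.write_page(1, "hello")
        wal.commit()      # the log now holds the change; the database does not
        wal.checkpoint()  # the database file catches up afterwards
```

In this sketch the log is always at least as current as the database file, so a failure between commit and checkpoint loses nothing, provided the log write really reached stable storage.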

Transactional integrity is one of the fundamental concepts of a relational database system. Transactions are considered to be atomic units of work that are either totally applied or totally rolled back. The Exchange database write-ahead transaction log is a vital component in implementing transactional integrity.

Any relational database system must also deal with a concept closely related to transactional integrity, which is recovery from unplanned system failure. A variety of non-ideal, real-world effects may cause this failure. On many database management systems, system failure may result in a lengthy human-directed manual recovery process.

In contrast, the Exchange database recovery mechanism is completely automatic and operates without human intervention. For example, Exchange Server could be supporting a mission-critical production application, and experience a system failure due to a momentary power fluctuation. Upon restoration of power, the server hardware would restart, networking software would load and initialize, and Exchange Server would restart. As Exchange Server's Exchange database initializes, it will automatically run its recovery process based on data in the transaction log. This entire process occurs without human intervention. Whenever the client workstations are restarted, users will find all of their data present, up to the last transaction they entered.
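
As a rough illustration of log-based recovery, the sketch below replays a toy log and applies only the transactions that reached a commit record. The function name recover and the JSON record layout are hypothetical; the actual Exchange recovery process and log format are not described in this article.

```python
import json


def recover(log_lines):
    """Replay a toy transaction log (hypothetical format, one JSON record per line).

    Page writes are applied only when the commit record that follows them
    is present; anything after the last commit record is discarded.
    """
    pages = {}
    pending = []
    for line in log_lines:
        record = json.loads(line)
        if record.get("commit"):
            # The transaction committed: apply its page writes in order.
            for write in pending:
                pages[write["page"]] = write["value"]
            pending = []
        else:
            pending.append(record)
    # Writes still in `pending` belong to a transaction that never
    # committed, so they are dropped (rolled back).
    return pages


if __name__ == "__main__":
    log = [
        '{"page": 1, "value": "first message"}',
        '{"commit": true}',
        '{"page": 2, "value": "half-finished"}',
    ]
    print(recover(log))
```

The replay is only as good as the log it reads. If committed records never reached the disk, recovery has nothing to replay for those transactions.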

Exchange database transactional integrity and automatic recovery constitute a very powerful time-and-labor saving capability. Unfortunately, use of a write-caching disk controller can compromise the ability of the Exchange database to recover. Such a controller intercepts Exchange database transaction log writes, buffering them in a hardware cache on the controller board. This improves performance significantly, but if system failure occurs for any reason, the volatile data in the hardware cache may be lost, jeopardizing data integrity.
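
The failure mode can be pictured with a simplified model. The class below is a hypothetical simulation, not a driver or vendor API: it acknowledges every write immediately, holds it in a volatile cache, and loses the cache on a system failure unless the preserves_cache flag is set. That flag is an assumption standing in for a cache design that keeps its contents across the failure.

```python
class CachingControllerModel:
    """Simplified model of a write-caching controller (hypothetical)."""

    def __init__(self, preserves_cache=False):
        self.preserves_cache = preserves_cache
        self.cache = []  # writes acknowledged but not yet on the disk
        self.disk = []   # writes that actually reached stable storage

    def write(self, record):
        # The controller reports success as soon as the data is cached.
        self.cache.append(record)
        return True

    def flush(self):
        # Cached writes are moved to the disk in their original order.
        self.disk.extend(self.cache)
        self.cache = []

    def system_failure(self):
        if self.preserves_cache:
            # A protected cache survives and is written out after restart.
            self.flush()
        else:
            # A volatile cache simply loses its contents.
            self.cache = []


if __name__ == "__main__":
    controller = CachingControllerModel()
    controller.write("log record: COMMIT transaction 1")
    controller.system_failure()
    # The database engine was told the commit record was written,
    # but nothing is on the disk for recovery to read.
    print(controller.disk)
```

In the unprotected case the database engine believes the commit record is safely logged, so the write-ahead guarantee no longer holds.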

Most caching controllers perform write caching. The write caching function cannot always be disabled.

Even if the server uses an uninterruptible power supply (UPS), this does not guarantee the security of the cached writes. Many types of system failures can occur that a UPS does not address. For example, a memory parity error, an operating system trap, or a hardware glitch that causes a system reset can produce an uncontrolled system interruption. A memory failure in the hardware write cache can also result in the loss of vital log information.

Another possible problem related to a write-caching controller may occur at system shutdown. If the operating system is taking a long time to shut down gracefully, it is not uncommon for an operator to become impatient and "cycle" the machine manually. When the power to the machine is turned off or the RESET button is pressed before the operating system has shut down completely, cached writes can be discarded, potentially damaging the database.

It is possible to design a hardware write cache that takes into account all possible causes of discarding dirty cache data, and that would therefore be safe for use by a database server. Some of these design features include intercepting the RST bus signal to avoid uncontrolled reset of the caching controller, on-board battery backup, and mirrored or ECC (error checking and correcting) memory. Check with your hardware vendor to ensure that the write cache includes these and any other features necessary to avoid data loss.


Keywords          : XADM 
Version           : WINDOWS:4.0,5.0,5.5
Platform          : WINDOWS 
Issue type        : kbinfo 

Last Reviewed: April 20, 1999