INF: Understanding Bufwait and Writelog Timeout Messages

ID: Q167711


The information in this article applies to:


SUMMARY

When examining the SQL Server errorlog, you may periodically see the following types of messages:

bufwait: timeout, BUF_IO, bp 0xd37ab0, pg 0x5e74e, stat 0x1008/0x400003, obj 0x6d4f8c1b, bpss 0x2069e1f0
-or-
writelog: timeout, dbid 2, dbstat2 0x22(0x22), q1/q2 0/1, owner=3, waittype=0x81

This article explains the meaning of bufwait and writelog timeout messages.


MORE INFORMATION

SQL Server uses the asynchronous I/O calls ReadFile(), ReadFileEx(), WriteFile(), and WriteFileEx() when performing reads and writes. These Win32 API calls submit an I/O request to Windows NT Server, and then allow the application to continue with other operations while waiting for the operation to complete. At completion, it signals the application that the write has completed by means of a synchronization event or completion routine.

Both the bufwait and logwrite messages are actually warnings, and may not necessarily be a sign of any problem with the computer or SQL Server. If SQL Server has waited longer than the number of seconds configured in the 'resource timeout' setting (which by default is 10 seconds), it prints an error message indicating the timeout. However, it must continue waiting on the I/O request to complete before it can continue. The bufwait message corresponds to a delay in either reading a page from disk to put in cache, or flushing a page in cache out to disk. The logwrite message is specific to reading or writing log records.



SQL Server uses normal thread synchronization methods to guarantee ordered access to the transaction log. In a writelog timeout message, the database ID is supplied in the dbid field, as well as the process ID (owner) of the other connection on which it is waiting. In the sample message above, notice that the current process was using tempdb (dbid 2), and that it is waiting on the checkpoint process (owner=3). When investigating the cause of writelog timeouts, it may be beneficial to periodically query sysprocesses to gather information about what types of processing are being performed by the "owner" spid listed in the writelog message.

Heavy rollback activity adds additional log and disk I/O contention, and such activity has been observed as contributing to writelog timeouts. For best performance, attempt to minimize the number of transactions that must be rolled back; as activity on the server increases, using a query timeout that is too short may further exacerbate the problem by causing unnecessary rollbacks. Beginning with SQL Server 6.5 Service Pack 2, a new configuration option, LogLRU buffers, was added that may improve performance in environments requiring heavy log reads, such as would be required in processing a rollback or trigger. Note that using this option reserves space to hold log pages in cache, reducing the amount of data cache available. See the Readme.txt file included in the service pack for more information on determining an optimum value for this parameter.

If these messages appear only intermittently, they may not be indicative of any problem. However, if they appear on a fairly regular basis, it may be a sign that the disk subsystem is reaching capacity, and that you may benefit from spreading the I/O load over a larger number of controllers or disks, to reduce the length of time you have to wait for the operation to complete.

To determine what load the disk subsystem is handling, you can use Performance Monitor to watch the disk counters appropriate to your system. Monitor the %Disk Time counter for LogicalDisk and/or PhysicalDisk that correspond to your log and data devices; if the value is consistently high, consider ways to spread the I/O load, and look for indications that the disk subsystem is reaching its limits. You can perform additional fine tuning by using the %Disk Read Time and %Disk Write Time counters, which would allow finer control in spreading the I/O load across the disks and controllers. A large Disk Queue Length value indicates a large backlog in I/O requests, and you may want to investigate adding further disks or additional controllers to spread the I/O load. You can find additional information about the disk counters in the Windows NT Server documentation, the Windows NT Server Resource Kit, and the following article in the Microsoft Knowledge Base:

Q102020 : How to Monitor Disk Performance with Performance Monitor

While these messages result from delay in performing I/O, several SQL Server configuration values have a direct impact on the overall I/O throughput of the system: max async I/O; max lazywrite I/O; logwrite sleep; and resource timeout.

The 'max async I/O' configuration value determines the maximum number of outstanding asynchronous I/O requests that SQL Server will allow, throttling the number of pending requests. When an asynchronous I/O request is made, Windows NT Server must queue the request, which has an associated overhead. While asynchronous I/O requests can increase performance when used appropriately, a limit can be reached at which point further use may actually begin to degrade system performance. This limit largely depends on the controller, driver, and associated disk subsystem. If you persistently receive any of the above errors and have changed this setting from its default value, consider setting it back to its default value.



The 'max lazywrite I/O' parameter is closely related to the 'max async I/O' setting, but specifically throttles the number of outstanding requests made by the Lazywriter process. The Lazywriter process attempts to flush changed pages from cache to disk so that there is always some free space in data cache. The amount of space that the process attempts to keep free is controlled by the 'free buffers' configuration option.

The 'logwrite sleep' configuration setting is an advanced configuration option that forces an intentional delay before writing log pages to disk. Because the log pages must be physically committed to disk before data modifications are made, you generally want these writes to be performed as quickly as possible. In certain circumstances, you can enhance performance by intentionally forcing a delay so that multiple processes can "pack" log records on a single page and thus reduce the number of write operations. The amount of time used in this intentional delay is counted as part of the timeout period for the overall write operation, so a writelog timeout may be artificially induced by increasing the 'logwrite sleep' configuration setting. Changes to this configuration value should only be made in a controlled fashion, to ensure that performance is not hindered.



The 'resource timeout' configuration setting controls the overall timeout on the asynchronous I/O operations. By default, the setting is set to 10 (seconds), so any operation that takes longer than 10 seconds to complete results in a timeout warning. Setting this value too low may cause timeouts in an unrealistically short time. If you are frequently seeing bufwait or writelog timeouts, you can increase this setting to reduce the frequency with which these errors are generated.

Because SQL Server stresses the I/O subsystem so extensively, it is also important to verify that you are running with the latest versions of the drive and controller firmware, as well as the latest driver. Also consult your hardware manufacturer for any diagnostics they may have for stressing the disk subsystem.

If the above information does not help resolve the issue, you may want to consider stressing the system in a manner similar to that done by SQL Server. To assist you in doing this, a utility called SQLHDTST has been written. For more information about the utility and locations from which it can be downloaded, see the following article in the Microsoft Knowledge Base:
Q135582 : INF: SQL Server Utility Files Available

Additional query words: error message bufwait writelog configuration sp_configure


Keywords          : kbenv kbhw SSrvAdmin SSrvErr_Log SSrvGen 
Version           : WINNT:6.0 6.5
Platform          : winnt 
Issue type        : kbinfo 

Last Reviewed: April 9, 1999