INF: How to Determine When SQL Server Causes a Windows NT Blue Screen
ID: Q170576
Infrequently Windows NT may halt with a STOP screen, commonly called a
"blue screen", or it may hard hang, where the console is completely frozen
and non-responsive. This may sometimes happen on a computer where SQL
Server is running, or may coincide with a particular SQL Server operation
such as a DUMP or LOAD, BCP, a long-running query, and so on.
The vast majority of the time, this indicates an operating system, device
driver, or hardware problem and should be pursued as such. Windows NT
process isolation between user mode and kernel mode ensures that a user
mode application problem will not cause the operating system to stop
responding. This article discusses exceptions to this rule and ways to
determine whether to troubleshoot the problem at the system or application
layer.
Sometimes the cause of a machine hard hang or blue screen may be an
NMI (non-maskable interrupt) error. This is sometimes visible as an error
code stating NMI, parity check, or I/O parity check. NMI errors are almost
always hardware problems. Usually they are caused by a memory failure, but
they can originate in other hardware subsystems such as video boards. Even
if the NMI error happens only during certain SQL Server operations, and
even if the system passes initial hardware diagnostics, it should still be
considered a hardware problem and pursued as such. It may be necessary to
use a dedicated memory SIMM testing device, which can often find a
transient memory error that eludes software-based diagnostics. For more
information,
see the Windows NT Resource Kit under the heading "Memory Problems", and
the following article in the Microsoft Knowledge Base:
Q101272 : "Memory Parity Errors: Causes and Suggestions"
Processes exist on Windows NT in either user mode or kernel mode (sometimes
called supervisor or privileged mode). In the Intel i386 architecture, user
mode maps to ring 3 and kernel mode to ring 0 of the 4-ring protection
system. The i386 architecture has been carried forward with little change
in all Intel and compatible processors to date, including the Pentium Pro
and Pentium II. RISC processors such as the Alpha AXP likewise typically
have unprivileged and privileged modes.
Kernel mode is a privileged processor mode in which a thread has access to
system-wide memory (including that of all user-mode processes) and to
hardware. By contrast, user mode is a nonprivileged processor mode in which
a thread can only access system resources by calling system services.
A user mode process cannot access kernel mode memory, nor can it access
memory of another user mode process. This is enforced by processor
hardware, in conjunction with kernel mode data structures such as Page
Tables. For information on this see the 80386 Programmer's Reference
Manual, the 80386 System Software Writer's Guide, or equivalent Alpha AXP
documentation.
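The effect of this hardware-enforced protection can be observed from user
mode. The following is a minimal sketch in Python, not part of the original
article and not specific to Windows NT or SQL Server. It assumes a Python
interpreter with the standard ctypes module; the names CHILD_CODE and
run_faulting_child are hypothetical. A child process tries to read from
address 0, which is not mapped into its address space. The processor traps
the access, the fault is confined to the child, and the parent process
continues to run.

```python
import subprocess
import sys

# Code run in the child process: ctypes.string_at(0) tries to read a C
# string starting at address 0, which is not mapped into the process, so
# the processor raises a fault on the access.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"


def run_faulting_child():
    """Start a child that makes an invalid memory access; return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    code = run_faulting_child()
    # A nonzero exit code means only the child failed; this process and
    # the operating system keep running.
    print("child exit code:", code)
    print("parent still running")
```

The exact exit code depends on the platform (for example, a signal number
on UNIX-like systems or an access violation error on Windows), so the
sketch only distinguishes zero from nonzero.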
As a result of this protection system, a user mode application generally
cannot cause the operating system to stop responding, cause a blue screen,
or otherwise cause a failure in the Windows NT operating system. Such
problems should be pursued primarily at the system layer as an operating
system, device driver, or hardware issue.
While an application error cannot cause a failure in the operating system,
an operating system error can cause an application to stop responding. This
is because of the general rule: applications must call inward (to kernel
mode), but the operating system can reference outward to user mode freely
at any time. A microkernel-influenced architecture like Windows NT may in
turn dispatch certain work to a user-mode system process rather than
perform the work in kernel mode. However, the overall principle remains the
same: processor hardware enforces process context isolation, which prevents
one process from causing a failure in another, whether one or both are in
user mode.
If a user mode application passes an invalid parameter in a Win32 API call,
it is the operating system's responsibility to validate this parameter. In
very rare cases, passing an invalid parameter may cause a Windows NT blue
screen error. However, this is an operating system issue, and should be
debugged and pursued as such.
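The expected behavior, in which an invalid parameter is rejected with an
error code rather than causing a failure, can be illustrated with a short
sketch. This example is an assumption-laden analog, not a Win32 API call
from the original article: it uses Python's portable os.read function, and
the name read_from_invalid_descriptor is hypothetical. Depending on the
platform, the check may be performed by the kernel or by the C run-time
library before the kernel is reached.

```python
import errno
import os


def read_from_invalid_descriptor():
    """Pass an invalid descriptor to a read call and return the error code."""
    try:
        os.read(-1, 16)  # -1 is never a valid file descriptor
    except OSError as exc:
        # The call is rejected with an error code; nothing stops responding.
        return exc.errno
    return None


if __name__ == "__main__":
    code = read_from_invalid_descriptor()
    print("error code:", code, "EBADF:", code == errno.EBADF)
```

The caller receives an ordinary error and can handle it. A blue screen in
the same situation would point to a defect below the application layer.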
There are a few narrow exceptions to the above guidelines. These exceptions
can be easily and quickly eliminated:
Q111405 : SQL Server and Windows NT Thread Scheduling
Q166967 : Proper SQL Server Configuration Settings
Additional query words: hang hung bulk copy program crash app winnt perfmon
Keywords : kbenv kbinterop SSrvGen
Version : WINNT:6.5
Platform : winnt
Issue type : kbhowto
Last Reviewed: April 14, 1999