Memory Parity Errors: Causes and Suggestions
ID: Q101272
The information in this article applies to:
- Microsoft Windows NT operating system version 3.1
- Microsoft Windows NT Advanced Server version 3.1
- Microsoft Windows NT Workstation versions 3.5, 3.51, 4.0
- Microsoft Windows NT Server versions 3.5, 3.51, 4.0
WARNING: The information in this article includes suggestions regarding the
examination and cleaning of hardware. If you do not have chip maintenance
experience, Microsoft recommends that you closely examine your hardware
warranty information to avoid invalidating any warranty you may have and
seek help from a trained hardware technician to avoid damaging the
hardware. ANY USE BY YOU OF THE INFORMATION PROVIDED IN THIS ARTICLE IS AT
YOUR OWN RISK. Microsoft provides this information "as is" without warranty
of any kind, either express or implied, including but not limited to the
implied warranties of merchantability and/or fitness for a particular
purpose.
SUMMARY
This article discusses the results of an extensive study into the causes of
some NMI memory parity errors in Windows NT, carried out with the aid of a
high-tech SIMM tester. The results are not conclusive, and research into
the problem is ongoing.
MORE INFORMATION
Both IBM OS/2 2.x and Windows NT seem to experience problems which appear to
be associated with system memory in some circumstances. It can be
frustrating to have a system that is able to run DOS, Windows 3.1 or OS/2
1.x and suddenly find it cannot run Windows NT due to this problem. The
first issue to clear up is that not all NMI errors are due to memory. Other
boards in the system can cause this problem and components directly on the
system motherboard can be at fault.
When memory is at fault, it is usually for the following reasons:
- The memory is not functioning at the specified access rate as
required by the system board. If the system specification calls for
80 ns access rate, Windows NT most likely fails if memory is
accessing at a slower rate such as 90 ns. Even though the chips may
be marked as 80 ns, in testing, some fail to meet this access rate.
Quite often memory chips run at a slower speed when they reach
operating temperature. This produces an effect called "speed
drift." The symptom is a system which runs Windows NT when first
turned on but starts having memory errors after 15 minutes or so.
A high quality SIMM tester can cycle the chips through various
voltage and heat cycles, so this is fairly easy to see.
- The memory meets the system specifications, but the speeds are
different between individual SIMM modules. The average access rate
may be 70 ns on one SIMM module while the next is running at 60 ns.
We have found SIMMs stamped at the factory to be rated at a 70 ns
average access rate to actually be running as fast as 50 ns.
Although the SIMMs are obviously well under the system required
access specification, the difference of 10 ns or more between them
can often cause problems on some systems. An interesting note here
is that you can move these to a different system board which is
using a different BIOS and chip set, and it may not have any memory
problems. This is because each BIOS and chip set regulates the
"refresh wait states" used for timing, and this difference often
allows a variance in speed to be acceptable. If your system's
BIOS allows you to adjust the "wait states" for memory refresh,
this often will allow the system to run with SIMMs or DRAM memory
chips which are running at different access rates. The downside to
increasing the number of wait states is a slower system.
- The individual chips on the SIMM module are running at different
access rates. This requires a sensitive memory testing device to
determine. It must be able to gauge the access rate of each
individual bit (chip) on the module. A difference of 10 ns or more
between bits has been known to cause problems. This once again can
be regulated somewhat by the BIOS and chip set of the system board
if it allows you to lengthen the refresh wait states for memory
access.
- One of the memory chips is being affected by "cell leakage." This
ends up being a true parity error and is also known as a "soft
error." This occurs when a change in the state of an individual
cell (a zero or one) electrically leaks into a neighboring cell,
changing its state. When the memory is read back, it no longer
matches the stored parity bit, and an NMI is issued to the
processor signaling that a parity error has occurred. This memory SIMM
must be replaced. If problems persist with replacement chips, there
is quite possibly a voltage or heat anomaly occurring with the
socket or circuitry which is damaging the chips.
- Cache memory is another thing to suspect. We have seen instances
where the cache memory access rates were too slow and caused
enormous problems. On most Intel-based 486 computers, a cache
access rate of 15 ns to 25 ns is normal. You will most likely have
problems if it is slower than 25 ns. The system manufacturer can
provide the specifications
and locations of these chips.
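The memory conditions in the list above can be sketched in a few lines of
Python. This is an illustration only, not code from Windows NT or from any
SIMM tester. All function names are hypothetical. The example assumes an
even-parity scheme over an 8-bit value for simplicity; real hardware may use
odd parity. The access rates passed in are assumed to come from a memory
tester.

```python
# Illustrative sketch only. The function names and the even-parity
# scheme are assumptions made for this example.

def too_slow(measured_ns, required_ns):
    """Return True if memory is slower (more ns) than the board requires."""
    return measured_ns > required_ns


def speed_spread_ns(measured_ns_list):
    """Return the gap between the slowest and fastest measured rates."""
    return max(measured_ns_list) - min(measured_ns_list)


def parity_bit(byte):
    """Return the even-parity bit (assumed scheme) for an 8-bit value."""
    return bin(byte & 0xFF).count("1") % 2


def parity_error(byte_read, stored_parity):
    """Return True if the byte read back no longer matches its parity bit."""
    return parity_bit(byte_read) != stored_parity


# A 90 ns module on a board that calls for 80 ns is too slow.
print(too_slow(90, 80))

# Two SIMMs measured at 70 ns and 60 ns differ by 10 ns.
print(speed_spread_ns([70, 60]))

# A byte is written and its parity bit is stored alongside it.
written = 0b10110010
stored_parity = parity_bit(written)

# "Cell leakage" flips a single bit, so the read-back byte fails the check.
leaked = written ^ 0b00000100
print(parity_error(leaked, stored_parity))
```

A single flipped bit always changes the parity of the byte, which is why
cell leakage shows up as a true parity error. Two flipped bits in the same
byte would cancel out and go undetected by a single parity bit.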
In general, you should first carefully clean the system of dust. This
includes the areas allowing ventilation so that heat does not build up
abnormally. The contacts of all boards and SIMMs should be cleaned. You can
use the eraser of a pencil to do this, thus ensuring good contacts. Be
certain that all boards are firmly seated in their slots or sockets. It may
be necessary to replace old cabling which may degrade over time and under
high temperatures. Power supplies can also cause many problems, so, if
possible, have the output voltages checked. Monitors can cause strange
behaviors on your system as well. It is also highly recommended that
computers be placed on some type of surge suppression power strip, because
when power returns after an outage there is usually a fairly high surge
that can permanently damage sensitive electrical components of your
system.
Additional query words:
prodnt
Keywords : kbhw nthw
Version : 3.10 3.50 3.51 4.00
Platform : WINDOWS
Issue type :
Last Reviewed: January 13, 1999