Memory Parity Errors: Causes and Suggestions

ID: Q101272


The information in this article applies to:

WARNING: The information in this article includes suggestions regarding the examination and cleaning of hardware. If you do not have chip maintenance experience, Microsoft recommends that you closely examine your hardware warranty information to avoid invalidating any warranty you may have, and that you seek help from a trained hardware technician to avoid any damage to the hardware. ANY USE BY YOU OF THE INFORMATION PROVIDED IN THIS ARTICLE IS AT YOUR OWN RISK. Microsoft provides this information "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

SUMMARY

This article discusses an extensive study, carried out with the aid of a high-end SIMM tester, into the causes of some NMI memory parity errors in Windows NT. The results are not conclusive, and the research is ongoing.


MORE INFORMATION

In some circumstances, both IBM OS/2 2.x and Windows NT experience problems that appear to be associated with system memory. It can be frustrating to have a system that runs DOS, Windows 3.1, or OS/2 1.x and then find that it cannot run Windows NT because of this problem. The first point to clear up is that not all NMI errors are due to memory. Other boards in the system can cause this problem, and components directly on the system motherboard can also be at fault.

When memory is at fault, it is usually for the following reasons:

  1. The memory is not functioning at the access rate that the system board requires. If the system specification calls for an 80 ns access rate, Windows NT most likely fails if the memory is accessing at a slower rate, such as 90 ns. Even chips that are marked as 80 ns sometimes fail to meet this access rate in testing. Memory chips also quite often run slower once they reach operating temperature, an effect called "speed drift." The symptom is a system that runs Windows NT when first turned on but starts having memory errors after 15 minutes or so. A high-quality SIMM tester can cycle the chips through various voltage and heat cycles, so this is fairly easy to see.


  2. The memory meets the system specifications, but the speeds differ between individual SIMM modules. The average access rate may be 70 ns on one SIMM module while the next is running at 60 ns. We have found SIMMs that were stamped at the factory with a 70 ns average access rate but were actually running as fast as 50 ns. Although such SIMMs are well within the access specification the system requires, a difference of 10 ns or more between them can often cause problems on some systems. Interestingly, you can move these SIMMs to a different system board that uses a different BIOS and chip set, and that system may not have any memory problems. This is because each BIOS and chip set regulates the "refresh wait states" used for timing, and this difference often makes the variance in speed acceptable. If your system's BIOS allows you to adjust the wait states for memory refresh, doing so often allows the system to run with SIMMs or DRAM memory chips that are running at different access rates. The downside to increasing the number of wait states is a slower system.


  3. The individual chips on the SIMM module are running at different access rates. Determining this requires a sensitive memory testing device that can gauge the access rate of each individual bit (chip) on the module. A difference of 10 ns or more between bits has been known to cause problems. This once again can be regulated somewhat by the BIOS and chip set of the system board if it allows you to lengthen the refresh wait states for memory access.


  4. One of the memory chips is being affected by "cell leakage." This is a true parity error and is also known as a "soft error." It occurs when a change in the state of an individual cell (a zero or a one) electrically leaks into a neighboring cell and changes its state. When the memory is read back, the data no longer matches the stored parity bit, and an NMI is issued to the processor to signal that a parity error has occurred. The affected SIMM must be replaced. If problems persist with replacement chips, there is quite possibly a voltage or heat anomaly in the socket or circuitry that is damaging the chips.


  5. Cache memory is another thing to suspect. We have seen instances where the cache memory access rates were too slow and caused enormous problems. On most Intel-based 486 computers, a cache access rate of 15 ns to 25 ns is normal. You will most likely have problems if it is slower than 25 ns. The system manufacturer can provide the specifications and locations of these chips.
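
The timing conditions in items 1 through 3 can be sketched as a simple check over measured access rates. This is only an illustration: the function name `check_simm_timings`, its parameters, and the sample readings are hypothetical, and the 10 ns spread threshold is taken from the observations above rather than from any hardware specification. It assumes you already have per-module readings from a SIMM tester.

```python
def check_simm_timings(measured_ns, required_ns, max_spread_ns=10):
    """Return warnings for a list of measured SIMM access rates (in ns).

    measured_ns   -- one reading per module (hypothetical tester output)
    required_ns   -- the access rate the system board calls for
    max_spread_ns -- spread between modules at which problems were observed
    """
    warnings = []
    if not measured_ns:
        return warnings

    # Items 1: a module slower than the board requires (larger ns = slower).
    for slot, ns in enumerate(measured_ns):
        if ns > required_ns:
            warnings.append(
                f"slot {slot}: {ns} ns is slower than the required {required_ns} ns"
            )

    # Items 2 and 3: modules within spec but mismatched with each other.
    spread = max(measured_ns) - min(measured_ns)
    if spread >= max_spread_ns:
        warnings.append(
            f"modules differ by {spread} ns; consider adding wait states"
        )
    return warnings


# Sample readings (made up): all within an 80 ns spec, but mismatched.
for line in check_simm_timings([70, 60, 70, 70], required_ns=80):
    print(line)
```

The same check could be applied to per-chip readings within a single module, as in item 3, if the tester reports them.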


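The parity check described in item 4 can be illustrated in software. The sketch below assumes one parity bit per 8-bit byte and uses even parity purely for illustration; the actual parity scheme is implemented in hardware and depends on the system. The function names are hypothetical.

```python
def parity_bit(byte):
    """Return the even-parity bit for an 8-bit value.

    The bit is 1 when the byte holds an odd number of 1 bits, so that the
    byte plus its parity bit always contain an even number of 1 bits.
    """
    return bin(byte & 0xFF).count("1") % 2


def check_parity(byte, stored_parity):
    """Return True if the byte read back still matches its stored parity bit."""
    return parity_bit(byte) == stored_parity


# At write time the parity bit is computed and stored alongside the data.
stored_byte = 0b10110010
stored_parity = parity_bit(stored_byte)

# Simulate cell leakage: a single neighboring cell changes state.
corrupted = stored_byte ^ 0b00000100

print(check_parity(stored_byte, stored_parity))  # intact data passes
print(check_parity(corrupted, stored_parity))    # single-bit change is detected
```

Note that a single parity bit detects any odd number of flipped bits but cannot detect an even number of flips, and it cannot identify which bit changed.
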
In general, you should first carefully clean the system of dust. This includes the ventilation areas, so that heat does not build up abnormally. Clean the contacts of all boards and SIMMs; you can use a pencil eraser to do this and ensure good contact. Be certain that all boards are firmly seated in their slots or sockets. It may be necessary to replace old cabling, which can degrade over time and under high temperatures. Power supplies can also cause many problems, so have the output voltages checked if possible. Monitors can cause strange behavior on your system as well. It is also highly recommended that computers be plugged into some type of surge suppression power strip, because the return of power after an outage usually comes with a fairly high surge that can permanently damage sensitive electrical components of your system.

Additional query words: prodnt


Keywords          : kbhw nthw 
Version           : 3.10 3.50 3.51 4.00
Platform          : WINDOWS 
Issue type        : 

Last Reviewed: January 13, 1999