ID: Q46749
6.00 6.00a 6.00ax 7.00 | 1.00 1.50
MS-DOS | WINDOWS
kbtool
The information in this article applies to:
- Microsoft C for MS-DOS, versions 6.0, 6.0a, and 6.0ax
- Microsoft C/C++ for MS-DOS, version 7.0
- Microsoft Visual C++ for Windows, versions 1.0 and 1.5
This article discusses some reasons why programs might produce different floating-point results when compiled with different compiler options.
The program below produces different results when compiled using
cl -AM -FPi prog.c
than when using the following:
cl -AM -FPa prog.c
Part of the reason for the different results is that /FPa and /FPi select
different math libraries, and those libraries do not compute results the same
way. /FPi math emulates the 80x87, to the point of actually converting 8-byte
doubles to the 10-byte internal format and doing the math in that internal
format. /FPa performs its calculations in the 8-byte format, so it is less
accurate. This often accounts for differences in results.
Also, the second number printed in the /FPi case (product2) is smaller than DBL_MIN, as defined in FLOAT.H. This result is also correct, because DBL_MIN is only the smallest positive NORMALIZED value. (Normalized means that the high-order bit of the mantissa is a one.)
"Denormals" (numbers in which some of the high-order bits of the mantissa are zeros), however, can represent numbers "x" in the ranges DBL_MIN > x > 0 and 0 > x > -DBL_MIN. Although this is an unusual situation, it is not an error. A denormal is less precise than a normalized number, but it is still more precise than 0 (zero), which is the next best representation. By allowing denormal numbers, we make the floating-point result slightly more accurate. The alternate math library (/FPa) represents denormal numbers as 0 (zero).
Another possible cause of differences in floating-point results is the inclusion or omission of the /Op option. When /Op is omitted, the compiler may skip storing intermediate results as 64-bit objects in memory and instead leave them in the 80-bit registers of the 80x87 (or the emulator package). This makes the calculation faster and more accurate. However, it can make the calculations less consistent, because other intermediate results may still be stored as 64-bit objects in memory, so the same expression can be rounded differently depending on where its intermediate results are kept. Including /Op forces all intermediate results to be stored in memory, which gives more consistent results. This option is often useful in programs that involve complicated floating-point calculations.
The program and its output follow:
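#include <stdio.h>   /* START OF PROG.C */
#include <float.h>   /* FLOAT.H declares _fpreset() in Microsoft C */

int main(void)
{
    double a, b, c, prod1, prod2;

    _fpreset();      /* reinitialize the floating-point math package */
    a = 9.5788979e-283;
    b = 8.050847e-1;
    c = 9.5588526e-28;

    prod1 = a * b;
    printf("\n product1 = %.15e \n", prod1);

    prod2 = c * prod1;   /* the true product is below DBL_MIN (a denormal) */
    printf("\n product2 = %.15e \n", prod2);

    return 0;
}                    /* END OF PROG.C */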
// RESULTS OBTAINED USING CL -AM -FPi PROG.C
product1 = 7.711824142152130e-283
product2 = 7.371619025195353e-310 // This value is less than DBL_MIN
// RESULTS OBTAINED USING CL -AM -FPa PROG.C
product1 = 7.711824142152130e-283
product2 = 0.000000000000000e+000
Additional reference words: kbinf 1.00 1.50 6.00 6.00a 6.00ax 7.00 8.00
8.00c
KBCategory: kbtool
KBSubcategory: CLIss
Keywords : kb16bitonly
Last Reviewed: July 18, 1997