Extended Characters Different Under Windows

Last reviewed: January 15, 1998
Article ID: Q83461
3.00 3.10 WINDOWS kbprg

The information in this article applies to:

  • Microsoft Windows Software Development Kit (SDK) for Windows versions 3.1 and 3.0

SUMMARY

Applications in the Windows environment must typically deal with two different character sets: the ANSI (American National Standards Institute) character set and the OEM (original equipment manufacturer) character set. In contrast, applications in the MS-DOS environment must deal only with the OEM character set. This article describes how Windows deals with the ANSI and OEM character sets.

  1. When ALT+xxx is used to enter a character from the OEM character set into an application in the Windows environment that uses the ANSI character set, Windows displays the character in the ANSI character set that most closely matches the entered character.

  2. When a character from the OEM character set is entered into a file using a text editor under MS-DOS and the file is displayed under Windows, the character from the ANSI character set that has the same character code number as the OEM character is displayed.

MORE INFORMATION

OEM and ANSI Character Sets

MS-DOS uses the OEM character set. This character set varies between computers and depends on the code page ROM (read-only memory) installed by the computer manufacturer. For example, personal computers manufactured for use in the United States use a character set called code page 437, while computers manufactured for use in Portugal use code page 860. MS-DOS uses the OEM character set in applications and to create files and filenames.
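
As a rough illustration of how the OEM character set varies with the code page, the following Python sketch lists the positions at which two code pages disagree. It assumes that Python's built-in cp437 and cp860 codecs reflect the code pages named above; the helper name differing_positions is hypothetical and is not part of any Windows or MS-DOS interface.

```python
def differing_positions(page_a="cp437", page_b="cp860"):
    """Return (position, char_a, char_b) for positions 128-255 that differ."""
    diffs = []
    for pos in range(128, 256):
        byte = bytes([pos])
        # errors="replace" keeps the loop going if a position is undefined.
        char_a = byte.decode(page_a, errors="replace")
        char_b = byte.decode(page_b, errors="replace")
        if char_a != char_b:
            diffs.append((pos, char_a, char_b))
    return diffs


# Number of upper-half positions where the US and Portuguese pages differ.
print(len(differing_positions()))
```

Position 132, for example, holds a-umlaut in code page 437 and a-tilde in code page 860, so a file written on one system can look different on the other.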

For the most part, Windows uses fonts organized according to the ANSI character set (called ANSI-set fonts, in this article). Windows also supports fonts that use the same OEM character set that MS-DOS uses (called OEM-set fonts, in this article).

Character positions 32 through 127 are identical in the ANSI and OEM character sets for most code pages (including code pages 437, 850, 852, 860, 861, and 865). The remaining characters of the OEM character set (character positions 0 through 31 and 128 through 255) either do not appear in the ANSI character set, or exist at a different position in the ANSI character set. Therefore, some characters in the OEM character set cannot be displayed in Windows using an ANSI-set font. If an application must display such characters under Windows, an OEM-set font is required.
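
The relationship described above can be checked with a short Python sketch. It makes two assumptions: code page 437 is the OEM character set, and Python's cp1252 codec is a stand-in for the Windows ANSI character set (the Windows 3.x ANSI set defines fewer characters in positions 128 through 159 than cp1252 does). The function name classify_oem_position is hypothetical. Positions 0 through 31 are left out because Python's codec treats them as control codes rather than OEM glyphs.

```python
OEM = "cp437"    # assumed OEM code page
ANSI = "cp1252"  # assumed stand-in for the Windows ANSI character set


def classify_oem_position(pos):
    """Describe how OEM position `pos` (32-255) relates to the ANSI set."""
    char = bytes([pos]).decode(OEM)
    try:
        ansi_pos = char.encode(ANSI)[0]
    except UnicodeEncodeError:
        return "missing"   # the character has no counterpart in the ANSI set
    # "same": identical position in both sets; "moved": present elsewhere.
    return "same" if ansi_pos == pos else "moved"


# Collect the OEM positions that an ANSI-set font cannot display at all.
missing = [pos for pos in range(128, 256)
           if classify_oem_position(pos) == "missing"]
print(len(missing))
```

Under these assumptions every position from 32 through 127 is classified as "same", position 128 (C-cedilla) as "moved", and position 251 (square root) as "missing".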

Typing ANSI and OEM Characters in Windows

In the Windows environment, a user can enter any character in the character set by holding down the ALT key and typing 0xxx, where "xxx" is the decimal number of the desired character position in the font. For example, with an ANSI-set font in use, ALT+0123 will display the 123rd character in the ANSI character set. Similarly, with an OEM-set font in use, ALT+0123 will display the 123rd character in the OEM character set.

In the MS-DOS environment, a user can enter any character in the OEM character set by holding down the ALT key and typing xxx (no leading zero), where "xxx" is the decimal number of the desired character position in the font.

If a user enters an MS-DOS OEM character set code (ALT+xxx) in an application for Windows that uses an ANSI-set font, Windows converts the OEM-set character to the character that most closely matches in the ANSI set. This conversion is governed by a mapping table that is installed with Windows. Because some OEM-set characters with positions greater than 127 do not exist in the ANSI character set, the result of the conversion in Windows may differ from the character in the OEM set. The OemToAnsi function uses the same mapping table to perform its character conversions.

For example, while OEM character-set code page 437 contains a square-root symbol at position 251, the ANSI character set does not contain this character. Consequently, when the user types ALT+251 in an edit control that uses the ANSI character set, an underscore character appears because Windows defines the character mapping in this manner. As another example, the C-cedilla character exists in both the ANSI character set and in the OEM character-set code page 437. Therefore, typing ALT+128 in an edit control creates the desired C-cedilla character. Note that while the character exists in both character sets, its position is different in each set (128 in the OEM character set and 199 in ANSI). The alternative method to request a C-cedilla is to type ALT+0199, which specifies the character's position in the ANSI character set.
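
The two ALT-key entry methods can be modeled with a minimal Python sketch, assuming an ANSI-set font is in use, code page 437 as the OEM character set, and Python's cp1252 codec as a stand-in for the ANSI character set. The function alt_code_to_ansi_position is hypothetical. It reports only exact matches and returns None where Windows would consult its own mapping table for the closest match, because that table is not reproduced here.

```python
OEM = "cp437"    # assumed OEM code page
ANSI = "cp1252"  # assumed stand-in for the Windows ANSI character set


def alt_code_to_ansi_position(keys):
    """Map the digits typed with ALT held down to an ANSI position.

    `keys` is the digit string as typed, for example "128" or "0199".
    Returns None when the OEM character has no exact ANSI counterpart.
    """
    pos = int(keys)
    if keys.startswith("0"):
        # Leading zero: the number is already an ANSI position.
        return pos
    # No leading zero: the number is an OEM position and must be converted.
    char = bytes([pos]).decode(OEM)
    try:
        return char.encode(ANSI)[0]
    except UnicodeEncodeError:
        # Windows would substitute the closest match from its mapping table.
        return None


print(alt_code_to_ansi_position("128"), alt_code_to_ansi_position("0199"))
```

Both calls yield ANSI position 199 for C-cedilla, while alt_code_to_ansi_position("251") returns None because the square-root symbol has no exact ANSI equivalent.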

An edit control that uses the ES_OEMCONVERT style and a combo box that uses the CBS_OEMCONVERT style have a different behavior from that described above. These two styles cause their text contents to be converted from lowercase letters to uppercase letters, then from the ANSI set to the OEM set and then back to the ANSI set for display. This behavior is important for an edit control in which the user specifies a filename. If the user enters characters that do not exist in the underlying OEM character set, the name of the file will differ from the name specified by the user, which would be confusing. Because the characters are mapped into characters that exist in the OEM character set, the filename specified always matches the filename actually used. The contents are converted to uppercase characters because it is customary in some languages to eliminate diacritical marks when a character is in uppercase, and the OEM character set does not contain uppercase characters with these diacritical marks.
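
The conversion sequence described above (uppercase, then ANSI to OEM, then back to ANSI) can be approximated in Python. This is only a sketch under stated assumptions: code page 437 is the OEM character set, and the fallback that drops a diacritical mark (or substitutes an underscore) is an illustrative stand-in for the closest-match table that Windows actually uses. The function name oem_convert is hypothetical.

```python
import unicodedata

OEM = "cp437"  # assumed OEM code page


def oem_convert(text):
    """Approximate the uppercase and ANSI -> OEM -> ANSI round trip."""
    result = []
    for char in text.upper():
        try:
            char.encode(OEM)          # survives the trip to the OEM set
            result.append(char)
            continue
        except UnicodeEncodeError:
            pass
        # Stand-in for the closest-match mapping: keep only the base letter.
        base = unicodedata.normalize("NFD", char)[0]
        try:
            base.encode(OEM)
            result.append(base)
        except UnicodeEncodeError:
            result.append("_")        # assumed placeholder, not Windows' rule
    return "".join(result)


print(oem_convert("abc.txt"))
```

With this sketch, a name typed with a lowercase a-tilde comes back with a plain uppercase A, so the text shown in the control matches a name that the OEM character set can actually store.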

Displaying a String Containing OEM-Set Characters in an Application that Uses the ANSI Character Set

Text editors running under MS-DOS use the OEM character set for display and in the files they create. When a Windows-based text editor loads a file that uses the OEM character set, the editor interprets the characters according to the ANSI character set. Character positions 32 through 127 are not affected under most code pages because both the ANSI and OEM character sets have identical characters. However, character positions greater than 127 may be displayed differently than in the MS-DOS-based text editor because the character positions represent different characters in the ANSI character set.
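
The effect can be reproduced with a few lines of Python. The sketch assumes code page 437 for the MS-DOS editor and uses Python's cp1252 codec as a stand-in for the ANSI character set; the function name view_oem_text_as_ansi is hypothetical.

```python
def view_oem_text_as_ansi(text, oem="cp437", ansi="cp1252"):
    """Show how text saved by an MS-DOS editor reads in an ANSI editor."""
    raw = text.encode(oem)   # the bytes the MS-DOS editor writes to the file
    # The Windows editor reads the same bytes but looks them up in the
    # ANSI set; positions undefined there become a replacement marker.
    return raw.decode(ansi, errors="replace")


print(view_oem_text_as_ansi("plain ASCII text"))
```

Text limited to positions 32 through 127 comes back unchanged. A word containing n-tilde (OEM position 164) comes back with the currency sign that occupies position 164 in the ANSI character set.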

The solution to this difficulty is to use a Windows-based text editor that uses the ANSI character set when the text contains characters in both the OEM and ANSI character sets. A Windows-based editor accepts ANSI-set characters directly and converts OEM-set characters to the closest matching ANSI-set characters. The resulting text contains only ANSI-set characters, which can be displayed by any application running under the Windows environment. If an application must display OEM-set characters that are not in the ANSI character set, it must use an OEM-set font.

Consider the following example: An MS-DOS-based text editor is used to edit an application's resource file on a system with OEM character-set code page 437 installed. The user types ALT+129 as part of the static text to label a button in a dialog box. However, when the dialog box is displayed, the text is not as expected but contains a black rectangle where the u-umlaut character belongs. The black rectangle is used to signify character positions that are not defined in the ANSI character set.

The workaround for this problem is to edit the resource file with a Windows-based text editor that uses the ANSI character set. Typing ALT+129 will create a u-umlaut as desired because the editor will convert the OEM-set character to the closest matching ANSI-set character. In this case, OEM-set character position 129 maps to ANSI-set character position 252. The alternative method to specify u-umlaut in the Windows-based editor is to type ALT+0252, using its ANSI character set position directly.
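
The u-umlaut example can be traced in Python, again assuming code page 437 as the OEM character set and Python's cp1252 codec as a stand-in for the ANSI character set. Both function names are hypothetical. The conversion shown handles only exact matches and raises an error where Windows would choose a closest match.

```python
def is_defined_in_ansi(pos, ansi="cp1252"):
    """Return True if the ANSI set assigns a character to position `pos`."""
    try:
        bytes([pos]).decode(ansi)
        return True
    except UnicodeDecodeError:
        return False   # Windows draws a black rectangle for such positions


def oem_bytes_to_ansi(raw, oem="cp437", ansi="cp1252"):
    """Re-encode OEM bytes as ANSI bytes (exact matches only)."""
    return raw.decode(oem).encode(ansi)


# Position 129 typed in the MS-DOS editor, then converted for Windows.
print(is_defined_in_ansi(129), list(oem_bytes_to_ansi(bytes([129]))))
```

Position 129 is undefined in the assumed ANSI set, which corresponds to the black rectangle, and converting the byte yields position 252, the ANSI location of u-umlaut.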

As another example, an application requires the square-root symbol, which does not exist in the ANSI character set, as part of a button label. Assuming that code page 437 is installed and that the resource file is edited under Windows, enter ALT+0251 in the button label because the square-root symbol is the 251st character of the OEM character set. When the application is run, send a WM_SETFONT message to the control, specifying an OEM-set font. An OEM-set font is always available from the GetStockObject function through its OEM_FIXED_FONT index.
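
The reason ALT+0251 works is that the stored position stays the same and only the font's character set decides which glyph is drawn. The following Python sketch models that choice; it does not call the Windows API. It assumes code page 437 for the OEM-set font and Python's cp1252 codec for the ANSI-set font, and the function name glyph_for_position is hypothetical.

```python
def glyph_for_position(pos, font_charset):
    """Return the character a font with `font_charset` draws for `pos`."""
    return bytes([pos]).decode(font_charset)


ansi_glyph = glyph_for_position(251, "cp1252")  # what the ANSI editor shows
oem_glyph = glyph_for_position(251, "cp437")    # what the OEM-set font shows

# Print the Unicode code points rather than the glyphs themselves.
print(hex(ord(ansi_glyph)), hex(ord(oem_glyph)))
```

Under these assumptions, position 251 appears as u-circumflex in the Windows-based editor and as the square-root symbol once the control uses an OEM-set font.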

For more information on code pages and character sets under Windows, query on the following words in the Microsoft Knowledge Base:

   prod(winsdk) and code and pages and character and sets

For a reference to a Windows Developer's Note regarding this subject, query on the following word in the Microsoft Knowledge Base:

   INTLAPPS


Additional reference words: 3.00 3.10 folding
KBCategory: kbprg
KBSubcategory: UsrLoc KBIntlDev
Keywords : kb16bitonly


THE INFORMATION PROVIDED IN THE MICROSOFT KNOWLEDGE BASE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. MICROSOFT DISCLAIMS ALL WARRANTIES, EITHER EXPRESS OR IMPLIED, INCLUDING THE WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL MICROSOFT CORPORATION OR ITS SUPPLIERS BE LIABLE FOR ANY DAMAGES WHATSOEVER INCLUDING DIRECT, INDIRECT, INCIDENTAL, CONSEQUENTIAL, LOSS OF BUSINESS PROFITS OR SPECIAL DAMAGES, EVEN IF MICROSOFT CORPORATION OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES SO THE FOREGOING LIMITATION MAY NOT APPLY.

© 1998 Microsoft Corporation. All rights reserved.