Unicode and Notepad (Windows)

Opening Files

When Notepad opens a file, it attempts to determine the file's encoding using a detection algorithm. The following chart lists the results observed when opening test files in various encodings, with and without a byte order mark (BOM).

Encoding     | Byte Order Mark | Test Data          | Result
cp850        | none            | abc-àèìòù©         | Fail
windows-1252 | none            | abc-àèìòù©         | Pass
UTF-8        | none            | abc-àèìòù©-뮻뮼뮽 | Pass
UTF-8        | UTF-8 BOM       | abc-àèìòù©-뮻뮼뮽 | Pass
UTF-16LE     | none            | abc-àèìòù©-뮻뮼뮽 | Pass
UTF-16       | LE              | abc-àèìòù©-뮻뮼뮽 | Pass
UTF-16BE     | none            | abc-àèìòù©-뮻뮼뮽 | Pass
UTF-16       | BE              | abc-àèìòù©-뮻뮼뮽 | Pass
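
As a rough illustration of how encoding detection of this kind can work, the following Python sketch checks for a BOM first, then tries strict UTF-8, and finally falls back to a single-byte code page. This is not Notepad's actual algorithm; the function name `sniff_encoding` and the `cp1252` fallback are assumptions made for the example.

```python
import codecs

# BOM signatures to check, paired with a Python codec that consumes the BOM.
# Illustrative only: this is not the logic Notepad itself uses.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a codec name: BOM first, then strict UTF-8, then the fallback."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    try:
        data.decode("utf-8")  # strict decoding rejects most non-UTF-8 input
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: treat everything else as the ANSI code page.
        return fallback
```

A sketch like this cannot recognize UTF-16 without a BOM, and it cannot tell one single-byte code page from another, which is consistent with the cp850 failure in the chart.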

Note that a bug in the algorithm causes files containing certain character patterns to be opened incorrectly as UTF-16 (see Notepad (Windows) Unicode detection).
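
To see why a misdetection of this kind is possible at all, the short Python example below shows that an even number of plain ASCII bytes also decodes as UTF-16LE without any error. It only demonstrates the ambiguity; it does not reproduce Notepad's detection logic.

```python
# Four plain ASCII bytes: 0x61 0x62 0x63 0x64.
raw = "abcd".encode("ascii")

# The same bytes are also valid UTF-16LE: each byte pair becomes one code unit.
as_utf16 = raw.decode("utf-16-le")

# Two characters, U+6261 and U+6463, both in the CJK Unified Ideographs block.
code_points = [hex(ord(ch)) for ch in as_utf16]
print(code_points)
```

Because both interpretations are valid, a detector working without a BOM has to guess from statistics of the byte values, and such a guess can be wrong.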

Saving New Files

By default, files created in Notepad are saved using the system's current ANSI code page. However, files can be saved in other encodings by changing the "Encoding:" field of the "Save As" dialog.

Windows "Save As" Dialog

The four choices listed map to these standard encoding names.

Windows Encoding Name | Standard Encoding Name | Byte Order Mark
ANSI                  | active ANSI code page  | none
Unicode               | UTF-16LE               | FF FE
Unicode big endian    | UTF-16BE               | FE FF
UTF-8                 | UTF-8                  | EF BB BF
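
The mapping can be sketched in Python as follows. This assumes the four choices are ANSI, Unicode, Unicode big endian, and UTF-8, as in classic Notepad, and uses `cp1252` as a stand-in for the active ANSI code page, which varies by system. The names `NOTEPAD_CHOICES` and `encode_like_notepad` are hypothetical.

```python
import codecs

# "Encoding:" choice -> (Python codec, BOM bytes written before the text).
# Assumption: "cp1252" stands in for the active ANSI code page.
NOTEPAD_CHOICES = {
    "ANSI": ("cp1252", b""),
    "Unicode": ("utf-16-le", codecs.BOM_UTF16_LE),
    "Unicode big endian": ("utf-16-be", codecs.BOM_UTF16_BE),
    "UTF-8": ("utf-8", codecs.BOM_UTF8),
}

def encode_like_notepad(text: str, choice: str) -> bytes:
    """Return the bytes a file would contain for the given dialog choice."""
    codec_name, bom = NOTEPAD_CHOICES[choice]
    return bom + text.encode(codec_name)
```

For example, `encode_like_notepad("a", "Unicode")` yields the two BOM bytes FF FE followed by the UTF-16LE code unit for "a".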

Files saved with any encoding other than ANSI have a byte order mark (BOM) added at the beginning of the file. According to the Unicode standard, a BOM is optional in UTF-8 files, and some programs expect UTF-8 files without one. When such a program opens a UTF-8 file that includes a BOM, it may incorrectly display the BOM as these three printable characters: ï»¿.
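
The effect is easy to reproduce. The Python sketch below decodes BOM-prefixed UTF-8 bytes as windows-1252, which is one way the three stray characters can appear, and then decodes the same bytes with Python's `utf-8-sig` codec, which drops a leading BOM if one is present. The variable names are illustrative.

```python
import codecs

# UTF-8 bytes with a leading BOM (EF BB BF).
data = codecs.BOM_UTF8 + "abc".encode("utf-8")

# A reader that assumes windows-1252 renders the BOM as three characters.
misread = data.decode("cp1252")

# "utf-8-sig" strips a leading BOM and also accepts input without one.
clean = data.decode("utf-8-sig")

print(repr(misread), repr(clean))
```

Reading with `utf-8-sig` is a convenient choice when the input may or may not carry a BOM, since both cases decode to the same text.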

