Unicode: Unicode and SQL*Plus (Windows)

NLS_LANG

To display characters properly SQL*Plus (character mode version) needs to know the character set being used to display its output on the client. A setting called NLS_LANG controls this. NLS_LANG has three components -- language, territory, and character set. Any one, two, or three of these components can be specified in an NLS_LANG settings. Here are some examples.

NLS_LANG Setting	Components That Are Set
American	Language
_America	Territory
.AL32UTF8	Character Set
American_America	Language, Territory
American.AL32UTF8	Language, Character Set
American_America.AL32UTF8	Language, Territory, Character Set

It is important to note the following.

NLS_LANG identifies the character set used by the client, not the server
NLS_LANG cannot be changed via an ALTER SESSION command (though the language and territory components can be overridden with ALTER SESSION SET NLS_LANGUAGE or ALTER SESSION SET NLS_TERRITORY)

To find the current NLS_LANG setting SQL*Plus uses a three step process upon startup. The process stops wherever a setting is found.

if an environment variable named NLS_LANG is set in the shell that SQL*Plus is called from then that setting is used
if NLS_LANG is set in the Windows registry then that value is used
- NLS_LANG registry settings are automatically created when certain Oracle products are installed
- the character set in NLS_LANG registry settings is typically an ANSI code page character set (see Unicode on Windows XP) for more information about Windows code pages
the default value US7ASCII is used

Windows Command Prompt

When you open a Windows XP Command Prompt window it displays characters using the sytsem's active OEM code page. If NLS_LANG is not set in the session's environment then SQL*Plus sessions started in this window will either use an ANSI code page character set as defined in the windows registry's NLS_LANG setting or, failing that, it will use US7ASCII. Neither choice is acceptable when working with Unicode characters. In the following example the query should return the string "abc-àèìòù©", but because NLS_LANG is not set in either the environment or the registry only US7ASCII characters are returned.

D:\Work>set NLS_LANG
Environment variable NLLANG not defined

D:\Work>chcp
Active code page: 850

D:\Work>sqlplus 

SQL> select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9') as text from dual ;

TEXT
----------
abc-aeiou?

To resolve this problem we need to manually set an NLS_LANG environment variable. It should be set to the OEM character set that matches the session's active OEM code page. In the following example the session is using code page 850. So, after consulting the Windows Code Pages table at Unicode on Windows XP, we know we need to set NLS_LANG to the WE8PC850 character set.

D:\Work>set NLS_LANG
Environment variable NLS_LANG not defined

D:\Work>chcp
Active code page: 850

D:\Work>set NLS_LANG=.WE8PC850

D:\Work>sqlplus

SQL> select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9') as text from dual ;

TEXT
----------
abc-àèìòù©

Now we see the accents and copyright symbol properly.

Displaying Unicode Characters

Unfortunately OEM character sets like WE8PC850 only define 255 characters, typically from a single language group. The query in the following example attempts to return the string "abc-àèìòù©-뮻뮼뮽". While the Western European characters are printed properly with WE8PC850 the Asian characters are not.

D:\Work>set NLS_LANG=.WE8PC850

D:\Work>sqlplus

SQL> select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9-\BBBB\BBBC\BBBD') as text from dual ;

TEXT
--------------
abc-àèìòù©-■■■

If we want to display more than 255 distinct characters at a time from a mix of languages no single OEM or ANSI character set will suffice. Since Unicode character sets contain more than 255 distinct characters one would expect that switching to a Unicode character set would resolve our problem. To make the switch we would need to do the following however.

find a monspaced Unicode font compatible with cmd.exe and then configure cmd.exe to use it
configure cmd.exe for Unicode encodings with the chcp command
set NLS_LANG to an Oracle Unicode character set

While steps 2 and 3 are possible there is no known Unicode font for step 1. Lucida Console is the only non-Raster choice on a default installation and its range of Unicode characters is far from complete.

As we will see shortly the Windows Powershell ISE provides a better alternative.

Cutting and Pasting Unicode Characters

Despite the fact that we cannot see all Unicode characters in a cmd.exe window we can still manipulate them in other ways. Here is how. First we display some Unicode characters on the screen.

D:\Work>chcp 65001

D:\Work>set NLS_LANG=.AL32UTF8

D:\Work>sqlplus

SQL> select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9-\BBBB\BBBC\BBBD') as text from dual ;

TEXT
--------------
abc-àèìòù©-⌷⌷⌷

Even though the query results appear garbled we can still cut and paste them into Unicode aware clients like Notepad or Excel 2007. Doing so yields text that looks like this in Notepad.

abc-àèìòù©-뮻뮼뮽

Note how the Asian characters display properly. To save the file consult the appropriate "Saving New Files" sections in either Unicode and Excel 2007 (Windows) or Uniocode and Notepad (Windows).

Saving Unicode Characters in Spool Files

Spool files will use whichever encoding is related to the NLS_LANG character set, regardless of the Command Prompt's CHCP setting. Since SQL*Plus cannot be started with NLS_LANG set to a 16 bit character set spool files can only be created with 8 bit encodings.

To better understand how SQL*Plus stores Unicode characters in spool files consider the following example.

D:\Work>chcp 65001

D:\Work>set NLS_LANG=.AL32UTF8

D:\Work>sqlplus
SQL>set trimout on
SQL>set trimspool on
SQL>spool test-utf-8.txt
SQL>select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9-\BBBB\BBBC\BBBD') as text from dual ;

TEXT
--------------
abc-àèìòù©-⌷⌷⌷

SQL>spool off

In the example above the file test-utf-8.txt contains UTF-8 encoded characters with no BOM character at the start of the file. The fact that the Asian characters are displayed as boxes in the Command Prompt windows has no effect on the spool file contents. They will be stored with the proper UTF-8 encoding and will be visible in UTF-8 aware tools.

Windows Powershell ISE

In the Unicode on Windows XP topic we learned that Powershell ISE displays Unicode characters better than cmd.exe windows. Unfortunately it appears Powershell ISE cannot run applications like SQL*Plus in interactive mode. It can however run SQL*Plus scripts that do not require user input.

Before we see how though it is important to note that .sql script files should be created and changed using an editor like Notepad with an ANSI encoding. Do not use Powershell ISE's Script Pane to create .sql files because it will create a UTF-16 (BE BOM) encoded file which SQL*Plus cannot read.

As with cmd.exe windows, to see Unicode characters properly in Powershell ISE we need to set the acitve OEM code page and the NLS_LANG environment variable before invoking SQL*Plus. Also, as with cmd.exe, spool files will be created using the NLS_LANG character set's encoding. Here is an example of a Powershell ISE session that displays some Unicode characters and saves them in a UTF-8 (no BOM) spool file.

PS D:\work\unicode> type test-powershell.sql
set trimspool on
set trimout   on
set echo      off
spool test-powershell.txt
select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9-\BBBB\BBBC\BBBD') as text from dual ;
spool off
exit ;

PS D:\work\unicode> chcp 65001
Active code page: 65001

PS D:\work\unicode> $env:NLS_LANG=".AL32UTF8"

PS D:\work\unicode> sqlplus -s scott/tiger "@test-powershell.sql"

SQL*Plus: Release 11.1.0.7.0 - Production on Tue Nov 3 16:39:59 2009

TEXT
--------------------------------------------------------
abc-àèìòù©-뮻뮼뮽

GUI SQLPlus and iSQLPlus

In past versions of Oracle two types of SQL*Plus other than the character mode version were available, GUI SQL*Plus (aka sqlplusw.exe, plus80W.exe, or plus33W.exe) and iSQL*Plus. GUI SQL*Plus used ANSI code pages and iSQL*Plus was fully Unicode enabled. Both are now obsolete. They were replaced by SQL Developer.

SQL Snippets ™: Tutorials for Oracle Developers

Unicode

Unicode and SQL*Plus (Windows)

NLS_LANG

Windows Command Prompt

Displaying Unicode Characters

Cutting and Pasting Unicode Characters

Saving Unicode Characters in Spool Files

Windows Powershell ISE

GUI SQLPlus and iSQLPlus

Linking to SQL Snippets ™

SQL Snippets ™: Tutorials for Oracle Developers

Unicode

Unicode and SQL*Plus (Windows)

NLS_LANG

Windows Command Prompt

Displaying Unicode Characters

Cutting and Pasting Unicode Characters

Saving Unicode Characters in Spool Files

Windows Powershell ISE

GUI SQL*Plus and iSQL*Plus

Linking to SQL Snippets ™

GUI SQLPlus and iSQLPlus