To display characters properly SQL*Plus (character mode version) needs to know the character set being used to display its output on the client. A setting called NLS_LANG controls this. NLS_LANG has three components -- language, territory, and character set. Any one, two, or three of these components can be specified in an NLS_LANG settings. Here are some examples.
|NLS_LANG Setting||Components That Are Set|
Language, Character Set
Language, Territory, Character Set
It is important to note the following.
- NLS_LANG identifies the character set used by the client, not the server
- NLS_LANG cannot be changed via an ALTER SESSION command (though the language and territory components can be overridden with ALTER SESSION SET NLS_LANGUAGE or ALTER SESSION SET NLS_TERRITORY)
To find the current NLS_LANG setting SQL*Plus uses a three step process upon startup. The process stops wherever a setting is found.
- if an environment variable named NLS_LANG is set in the shell that SQL*Plus is called from then that setting is used
if NLS_LANG is set in the Windows registry then that value is used
- NLS_LANG registry settings are automatically created when certain Oracle products are installed
- the character set in NLS_LANG registry settings is typically an ANSI code page character set (see Unicode on Windows XP) for more information about Windows code pages
- the default value US7ASCII is used
Windows Command Prompt
When you open a Windows XP Command Prompt window it displays characters
using the sytsem's active OEM code page. If NLS_LANG is not set in the
session's environment then SQL*Plus sessions started in this window will either
use an ANSI code page character set as defined in the windows registry's
NLS_LANG setting or, failing that, it will use US7ASCII. Neither choice is
acceptable when working with Unicode characters.
In the following example the query should return the string
abc-àèìòù©", but because
NLS_LANG is not set in either the environment or the registry only US7ASCII
characters are returned.
D:\Work>set NLS_LANG Environment variable NLLANG not defined D:\Work>chcp Active code page: 850 D:\Work>sqlplus SQL> select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9') as text from dual ; TEXT ---------- abc-aeiou?
To resolve this problem we need to manually set an NLS_LANG environment variable. It should be set to the OEM character set that matches the session's active OEM code page. In the following example the session is using code page 850. So, after consulting the Windows Code Pages table at Unicode on Windows XP, we know we need to set NLS_LANG to the WE8PC850 character set.
D:\Work>set NLS_LANG Environment variable NLS_LANG not defined D:\Work>chcp Active code page: 850 D:\Work>set NLS_LANG=.WE8PC850 D:\Work>sqlplus SQL> select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9') as text from dual ; TEXT ---------- abc-àèìòù©
Now we see the accents and copyright symbol properly.
Displaying Unicode Characters
Unfortunately OEM character sets like WE8PC850 only define 255 characters, typically from a single language group. The query in the following example attempts to return the string "abc-àèìòù©-뮻뮼뮽". While the Western European characters are printed properly with WE8PC850 the Asian characters are not.
SQL> select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9-\BBBB\BBBC\BBBD') as text from dual ;
If we want to display more than 255 distinct characters at a time from a mix of languages no single OEM or ANSI character set will suffice. Since Unicode character sets contain more than 255 distinct characters one would expect that switching to a Unicode character set would resolve our problem. To make the switch we would need to do the following however.
- find a monspaced Unicode font compatible with cmd.exe and then configure cmd.exe to use it
- configure cmd.exe for Unicode encodings with the chcp command
- set NLS_LANG to an Oracle Unicode character set
While steps 2 and 3 are possible there is no known Unicode font for step 1. Lucida Console is the only non-Raster choice on a default installation and its range of Unicode characters is far from complete.
As we will see shortly the Windows Powershell ISE provides a better alternative.
Cutting and Pasting Unicode Characters
Despite the fact that we cannot see all Unicode characters in a cmd.exe window we can still manipulate them in other ways. Here is how. First we display some Unicode characters on the screen.
D:\Work>chcp 65001 D:\Work>set NLS_LANG=.AL32UTF8 D:\Work>sqlplus SQL> select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9-\BBBB\BBBC\BBBD') as text from dual ; TEXT -------------- abc-àèìòù©-⌷⌷⌷
Even though the query results appear garbled we can still cut and paste them into Unicode aware clients like Notepad or Excel 2007. Doing so yields text that looks like this in Notepad.
Saving Unicode Characters in Spool Files
Spool files will use whichever encoding is related to the NLS_LANG character set, regardless of the Command Prompt's CHCP setting. Since SQL*Plus cannot be started with NLS_LANG set to a 16 bit character set spool files can only be created with 8 bit encodings.
To better understand how SQL*Plus stores Unicode characters in spool files consider the following example.
D:\Work>chcp 65001 D:\Work>set NLS_LANG=.AL32UTF8 D:\Work>sqlplus SQL>set trimout on SQL>set trimspool on SQL>spool test-utf-8.txt SQL>select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9-\BBBB\BBBC\BBBD') as text from dual ; TEXT -------------- abc-àèìòù©-⌷⌷⌷ SQL>spool off
In the example above the file
test-utf-8.txt contains UTF-8
encoded characters with no BOM character at the start of the file. The fact
that the Asian characters are displayed as boxes in the Command Prompt windows
has no effect on the spool file contents. They will be stored with the proper
UTF-8 encoding and will be visible in UTF-8 aware tools.
Windows Powershell ISE
In the Unicode on Windows XP topic we learned that Powershell ISE displays Unicode characters better than cmd.exe windows. Unfortunately it appears Powershell ISE cannot run applications like SQL*Plus in interactive mode. It can however run SQL*Plus scripts that do not require user input.
Before we see how though it is important to note that .sql script files should be created and changed using an editor like Notepad with an ANSI encoding. Do not use Powershell ISE's Script Pane to create .sql files because it will create a UTF-16 (BE BOM) encoded file which SQL*Plus cannot read.
As with cmd.exe windows, to see Unicode characters properly in Powershell ISE we need to set the acitve OEM code page and the NLS_LANG environment variable before invoking SQL*Plus. Also, as with cmd.exe, spool files will be created using the NLS_LANG character set's encoding. Here is an example of a Powershell ISE session that displays some Unicode characters and saves them in a UTF-8 (no BOM) spool file.
PS D:\work\unicode> type test-powershell.sql set trimspool on set trimout on set echo off spool test-powershell.txt select unistr('abc-\00E0\00E8\00EC\00F2\00F9\00a9-\BBBB\BBBC\BBBD') as text from dual ; spool off exit ; PS D:\work\unicode> chcp 65001 Active code page: 65001 PS D:\work\unicode> $env:NLS_LANG=".AL32UTF8" PS D:\work\unicode> sqlplus -s scott/tiger "@test-powershell.sql" SQL*Plus: Release 126.96.36.199.0 - Production on Tue Nov 3 16:39:59 2009 TEXT -------------------------------------------------------- abc-àèìòù©-뮻뮼뮽
GUI SQL*Plus and iSQL*Plus
In past versions of Oracle two types of SQL*Plus other than the character mode version were available, GUI SQL*Plus (aka sqlplusw.exe, plus80W.exe, or plus33W.exe) and iSQL*Plus. GUI SQL*Plus used ANSI code pages and iSQL*Plus was fully Unicode enabled. Both are now obsolete. They were replaced by SQL Developer.