Note: If you are unfamiliar with Unicode see the brief primer at this link before continuing.
Oracle Character Sets
Unicode characters are stored in an Oracle database using one of Oracle's
Unicode character sets. An Oracle character set defines a group of characters
with the following common features:
- same region of the world
- same encoding
- same code point map
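The practical effect of the "same encoding" and "same code point map" features can be illustrated outside the database. The Python sketch below (an analogy, not Oracle code) shows how a single character maps to different bytes under the encodings behind three Oracle character sets: WE8ISO8859P1 corresponds to ISO 8859-1, AL32UTF8 to UTF-8, and AL16UTF16 to UTF-16 (big-endian).

```python
# One character, three character sets, three different byte sequences.
ch = "é"  # LATIN SMALL LETTER E WITH ACUTE, code point U+00E9

# ISO 8859-1 (Oracle WE8ISO8859P1): one byte, 0xE9
print(ch.encode("iso-8859-1").hex())   # e9

# UTF-8 (Oracle AL32UTF8): two bytes, 0xC3 0xA9
print(ch.encode("utf-8").hex())        # c3a9

# UTF-16 big-endian (Oracle AL16UTF16): two bytes, 0x00 0xE9
print(ch.encode("utf-16-be").hex())    # 00e9
```

The code point (U+00E9) never changes; only the stored bytes do. This is the distinction that matters later when comparing CHR and UNISTR.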
The following table contains a sample of Unicode character sets available in an Oracle database as well as some older character sets commonly used before Unicode was invented.
|Oracle Character Set Name|Can Be Used as Database Character Set|Can Be Used as National Character Set|
|---|---|---|
|AL32UTF8 (Unicode, UTF-8)|Yes|No|
|UTF8 (Unicode, CESU-8)|Yes|Yes|
|AL16UTF16 (Unicode, UTF-16)|No|Yes|
|WE8ISO8859P1 (ISO 8859-1, pre-Unicode)|Yes|No|
Database Character Set
Much of the following information was taken from Oracle Database Globalization Support Guide - Choosing a Character Set.
Oracle uses the database character set for:
- storing data in SQL CHAR datatypes (CHAR, VARCHAR2, CLOB, and LONG)
- storing table names, column names, and PL/SQL variable names
- storing SQL and PL/SQL source code.
The database character set is defined using the CREATE DATABASE statement. The database parameter NLS_CHARACTERSET identifies the database character set used on a given database.
The database character set must be a superset of either EBCDIC or 7-bit ASCII, whichever is the native character set on the platform. It is therefore not possible to use a fixed-width, multibyte character set as the database character set.
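The superset requirement means every 7-bit ASCII character must keep its ASCII byte value in the database character set. The Python sketch below (an analogy, not Oracle code) shows why variable-width UTF-8 qualifies while fixed-width UTF-16 does not.

```python
# UTF-8 (AL32UTF8) is an ASCII superset: every ASCII character encodes
# to a single byte with the same value it has in 7-bit ASCII.
ascii_chars = [chr(i) for i in range(128)]
print(all(c.encode("utf-8") == bytes([ord(c)]) for c in ascii_chars))   # True

# A fixed-width multibyte encoding such as UTF-16 (AL16UTF16) cannot
# preserve ASCII byte values: even 'A' becomes the two bytes 0x00 0x41.
print("A".encode("utf-16-be").hex())   # 0041
```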
In some cases it is possible to change the database character set after the database has been created. In most of these cases a full export/import will be required to do the conversion.
National Character Set
The national character set is used for data stored in NCHAR, NVARCHAR2, and NCLOB columns. It is defined using the CREATE DATABASE statement. The database parameter NLS_NCHAR_CHARACTERSET identifies the national character set used on a given database.
The national character set can be either UTF8 or AL16UTF16. The default is AL16UTF16.
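The trade-off between the two national character sets is storage width. AL16UTF16 stores every character in the Basic Multilingual Plane (BMP) in exactly 2 bytes, while UTF8 uses 1 to 3 bytes per BMP character. The Python sketch below uses standard UTF-8 as a stand-in for Oracle's UTF8 set (the two agree for BMP characters; Oracle's UTF8 is technically the older CESU-8 variant).

```python
# Bytes needed per character: UTF-16 is a fixed 2 bytes for BMP
# characters, UTF-8 varies from 1 to 3 bytes in the BMP.
for ch in ("A", "é", "中"):
    print(f"U+{ord(ch):04X}  utf-8: {len(ch.encode('utf-8'))} byte(s), "
          f"utf-16: {len(ch.encode('utf-16-be'))} byte(s)")
# U+0041  utf-8: 1 byte(s), utf-16: 2 byte(s)
# U+00E9  utf-8: 2 byte(s), utf-16: 2 byte(s)
# U+4E2D  utf-8: 3 byte(s), utf-16: 2 byte(s)
```

Mostly-ASCII data is smaller in UTF8; mostly-Asian-script data is smaller (and fixed-width) in AL16UTF16.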
Values returned by Oracle's UNISTR() function are always in the national character set.
Determining the Character Sets in Use for a Database
To determine the database character set and the national character set for a given database use the following query.
select parameter, value from nls_database_parameters where parameter like 'NLS%CHARACTERSET' ;
PARAMETER                      VALUE
------------------------------ ---------------
NLS_CHARACTERSET               AL32UTF8
NLS_NCHAR_CHARACTERSET         AL16UTF16
The CHR(n) Function
Oracle's CHR function takes a numeric argument n and returns the VARCHAR2 value that is the binary equivalent of n in the database character set (or the national character set if USING NCHAR_CS is specified). Since one character set's binary representation of a character can differ from another character set's, CHR is a poor choice for representing Unicode characters in SQL or PL/SQL code.
For example, say we wanted to write a query that returned values containing the copyright symbol © (U+00A9). If we were working on a database whose database character set was WE8ISO8859P1 a WHERE clause like this would work.
where description like '%' || chr(169) || '%'
However, if the code were later moved to a database with a database character set of AL32UTF8 the WHERE clause would no longer function properly. On an AL32UTF8 database the copyright symbol is encoded as the two bytes 0xC2 0xA9, whose binary representation is 49833, not 169.
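The 169-versus-49833 difference can be verified outside the database. The Python sketch below (an analogy, not Oracle code) encodes the copyright symbol under the encodings behind the two character sets and reads the resulting bytes as integers.

```python
# Why CHR(169) breaks when moving from WE8ISO8859P1 to AL32UTF8:
# the copyright sign's byte representation differs between the two sets.
copyright_sign = "\u00A9"  # ©

# WE8ISO8859P1 (ISO 8859-1): the single byte 0xA9 = 169
latin1 = copyright_sign.encode("iso-8859-1")
print(int.from_bytes(latin1, "big"))   # 169

# AL32UTF8 (UTF-8): the two bytes 0xC2 0xA9, read as one number = 49833
utf8 = copyright_sign.encode("utf-8")
print(int.from_bytes(utf8, "big"))     # 49833
```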
In place of CHR, programmers can use the UNISTR function. This function's return value depends on Unicode code point escapes in its string argument, not on a binary value in a particular character set. Thus, a WHERE clause like the following will work on any database regardless of its database character set.
where description like unistr( '%\00A9%' )
The following query returns text in many popular languages. It can be used to test an application's support for Unicode characters.
select 'Arabic : '      || unistr( '\0627\0644\0639\0631\0628\064A\0629' ) ||
       ' Chinese : '    || unistr( '\4E2D\6587' ) ||
       ' English : '    || unistr( 'English' ) ||
       ' French : '     || unistr( 'Fran\00E7ais' ) ||
       ' German : '     || unistr( 'Deutsch' ) ||
       ' Greek : '      || unistr( '\0395\03BB\03BB\03B7\03BD\03B9\03BA\03AC' ) ||
       ' Hebrew : '     || unistr( '\05E2\05D1\05E8\05D9\05EA' ) ||
       ' Japanese : '   || unistr( '\65E5\672C\8A9E' ) ||
       ' Korean : '     || unistr( '\D55C\AD6D\C5B4' ) ||
       ' Portuguese : ' || unistr( 'Portugu\00EAs' ) ||
       ' Russian : '    || unistr( '\0420\0443\0441\0441\043A\0438\0439' ) ||
       ' Spanish : '    || unistr( 'Espa\00F1ol' ) ||
       ' Thai : '       || unistr( '\0E44\0E17\0E22' )
       as unicode_test_string
from   dual ;
In an application that fully supports all the test characters the output will look like this.
Arabic : العربية Chinese : 中文 English : English French : Français German : Deutsch Greek : Ελληνικά Hebrew : עברית Japanese : 日本語 Korean : 한국어 Portuguese : Português Russian : Русский Spanish : Español Thai : ไทย
If any characters are missing or garbled in your application, it may indicate that the application does not use font substitution, or that it does use font substitution but requires additional fonts to be installed.
You can also cut and paste the sample output above into non-database applications such as editors or terminal emulators to test the range of characters they support.
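Because UNISTR's \XXXX escapes are plain Unicode code points, the same test strings can also be rebuilt outside the database. The Python sketch below (an illustration, not Oracle code) uses \uXXXX escapes, which parallel UNISTR's notation, to reproduce a subset of the test output above.

```python
# UNISTR('\4E2D\6587') in SQL corresponds to "\u4E2D\u6587" in Python:
# both notations name characters by Unicode code point.
pairs = [
    ("Arabic",   "\u0627\u0644\u0639\u0631\u0628\u064A\u0629"),
    ("Chinese",  "\u4E2D\u6587"),
    ("Greek",    "\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC"),
    ("Hebrew",   "\u05E2\u05D1\u05E8\u05D9\u05EA"),
    ("Japanese", "\u65E5\u672C\u8A9E"),
    ("Korean",   "\uD55C\uAD6D\uC5B4"),
    ("Russian",  "\u0420\u0443\u0441\u0441\u043A\u0438\u0439"),
    ("Thai",     "\u0E44\u0E17\u0E22"),
]
for name, text in pairs:
    print(f"{name:<10}: {text}")
```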