Unicode

Unicode on UNIX

Note: If you are unfamiliar with Unicode, see the brief primer at this link before continuing.

Information in this document was gathered from a Red Hat Enterprise Linux AS release 3 system. The behaviour of other systems may vary.

Locale

A UNIX session's character encoding is controlled via its locale plus a set of internationalization variables that identify a user's language, location, character encoding, and local preferences. The locale command displays a session's current locale settings. Sample output for a typical session follows.

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
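
In output like this, values shown in double quotes are implied rather than set explicitly; here they are inherited from LANG. The usual lookup order for a category is LC_ALL first, then the category's own variable, then LANG. The following is a minimal sketch of that lookup for the LC_CTYPE category, which governs character encoding. The function name effective_ctype is hypothetical and only illustrates the order; it is not a system command.

```shell
# Sketch: resolve the locale that applies to character handling (LC_CTYPE).
# Lookup order: LC_ALL, then LC_CTYPE, then LANG, else the POSIX default.
# effective_ctype is a hypothetical helper name used for illustration.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'POSIX\n'
    fi
}

# Show the value that applies to the current session.
effective_ctype
```

For the sample session above, where only LANG is set, this sketch would report en_US.UTF-8. On systems that support it, the command locale charmap reports the session's codeset name directly.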

Values for locale environment variables are in the following format.

language[_territory][.codeset]

language is an ISO 639-1 code like "en" for English or "it" for Italian.

territory is an ISO 3166-1 country code like "US" for United States or "IN" for India.

codeset is a character encoding or character set name such as "UTF-8" or "ISO-8859-1". Many UNIX/Linux systems use UTF-8 as the default codeset for most locales.
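
The format above can be taken apart with ordinary shell parameter expansion. The following is a minimal sketch, assuming a value with no trailing modifier (such as "@euro"); the function name split_locale is hypothetical.

```shell
# Sketch: split a value of the form language[_territory][.codeset].
# split_locale is a hypothetical helper; modifiers like "@euro" are not handled.
split_locale() {
    value=$1
    codeset=
    territory=
    # Everything after the first "." is the codeset.
    case $value in
        *.*) codeset=${value#*.}; value=${value%%.*} ;;
    esac
    # Everything after the first "_" in what remains is the territory.
    case $value in
        *_*) territory=${value#*_}; value=${value%%_*} ;;
    esac
    language=$value
    printf 'language=%s territory=%s codeset=%s\n' "$language" "$territory" "$codeset"
}

split_locale en_US.UTF-8   # prints: language=en territory=US codeset=UTF-8
split_locale it            # prints: language=it territory= codeset=
```

Both optional parts come back empty when they are absent, which mirrors the square brackets in the format description.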

Byte Order Mark

By convention, Unicode text files on UNIX/Linux systems do not begin with a Byte Order Mark (BOM) character.
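
Files that arrive from other platforms may still carry a BOM. The following is a minimal sketch for checking whether a file starts with the UTF-8 encoded BOM, the byte sequence EF BB BF. The function name has_utf8_bom is hypothetical, and the sketch assumes the od, tr, and mktemp utilities are available.

```shell
# Sketch: succeed if the named file starts with the UTF-8 BOM (EF BB BF).
# has_utf8_bom is a hypothetical helper name used for illustration.
has_utf8_bom() {
    # Dump the first three bytes as hex and strip all whitespace.
    first=$(od -An -tx1 -N3 "$1" | tr -d '[:space:]')
    [ "$first" = "efbbbf" ]
}

# Usage: create a temporary file that begins with a BOM, then test it.
tmp=$(mktemp)
printf '\357\273\277hello\n' > "$tmp"
if has_utf8_bom "$tmp"; then
    echo "BOM found"
else
    echo "no BOM"
fi
rm -f "$tmp"
```

The check only covers UTF-8. UTF-16 and UTF-32 files use different BOM byte sequences and would need their own comparisons.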



