Unicode and Internationalization
The effect of Unicode on Tcl scripts is quite limited. There is a new backslash sequence, \uXXXX, that specifies a 16-bit Unicode character. There are also facilities to work with character set encodings and message catalogs.
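As a minimal sketch of the \uXXXX sequence, the script below embeds one non-ASCII character in a string and inspects it. The variable names word and code are only illustrative.

```tcl
# \u00e9 is LATIN SMALL LETTER E WITH ACUTE.
set word "caf\u00e9"

# string length counts characters, not bytes.
set count [string length $word]
puts "characters: $count"

# scan with %c stores the code point of the first character.
scan "\u00e9" %c code
puts [format "code point: U+%04X" $code]
```

The string commands treat the escaped character like any other single character, which is why scripts rarely need to change.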
The Tcl I/O system supports character set translations. It automatically converts files to Unicode when it reads them in, and it converts them to the native system encoding during output. The fconfigure -encoding option can be used to specify alternate encodings for files. This option is described on page 209.
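The following sketch shows one way to use fconfigure -encoding: it writes a string as UTF-8, reads the raw bytes back, and then reads the file again with the encoding set. The file name encoding-demo.txt is a hypothetical scratch file in the current directory, not something the text prescribes.

```tcl
# Hypothetical scratch file; any writable path would do.
set path [file join [pwd] encoding-demo.txt]

# Write the text with an explicit UTF-8 channel encoding.
set out [open $path w]
fconfigure $out -encoding utf-8
puts -nonewline $out "caf\u00e9"
close $out

# Read the file as raw bytes to see what was actually stored.
set in [open $path r]
fconfigure $in -translation binary
set bytes [read $in]
close $in
set byteCount [string length $bytes]
puts "bytes on disk: $byteCount"

# Read it again, letting the channel decode UTF-8 into characters.
set in [open $path r]
fconfigure $in -encoding utf-8
set text [read $in]
close $in
set charCount [string length $text]
puts "characters read: $charCount"

file delete $path
```

The byte count is larger than the character count because the accented character occupies two bytes in UTF-8.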
The encoding Command
The encoding command provides access to the basic encoding mechanism used in Tcl. The encoding convertfrom and convertto operations convert strings between different encodings. The encoding system operation queries and sets the encoding used by the operating system. The encoding command is described on page 212.
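A short sketch of the three operations named above follows. The encoding names utf-8 and iso8859-1 are standard Tcl encoding names; the variable names are illustrative only.

```tcl
set text "caf\u00e9"

# convertto turns a Tcl string into bytes in the named encoding.
set utf8Bytes  [encoding convertto utf-8 $text]
set latinBytes [encoding convertto iso8859-1 $text]
puts "utf-8 bytes:     [string length $utf8Bytes]"
puts "iso8859-1 bytes: [string length $latinBytes]"

# convertfrom interprets bytes in the named encoding as a Tcl string.
set back [encoding convertfrom utf-8 $utf8Bytes]
puts "round trip ok:   [string equal $back $text]"

# With no argument, encoding system reports the current system encoding.
puts "system encoding: [encoding system]"
```

The system encoding printed on the last line depends on the platform and locale, so no particular value should be assumed.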
The msgcat Package
Message catalogs are implemented by the msgcat package, which is described on page 216. A message catalog stores translations of user messages into other languages. Tcl makes message catalogs easy to use.
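As a rough sketch of the idea, the script below registers two translations in memory and looks them up under different locales. The ::demo namespace, the greet procedure, and the sample strings are hypothetical; a real application would typically keep its translations in per-locale catalog files instead of calling mcset inline.

```tcl
package require msgcat

namespace eval ::demo {
    # mcset registers a translation for a locale in the current namespace.
    ::msgcat::mcset fr "Hello" "Bonjour"
    ::msgcat::mcset es "Hello" "Hola"

    # mc looks up the translation for the current locale.
    proc greet {} {
        return [::msgcat::mc "Hello"]
    }
}

::msgcat::mclocale fr
set french [::demo::greet]
puts $french

::msgcat::mclocale es
set spanish [::demo::greet]
puts $spanish

# No German translation was registered, so mc returns the source string.
::msgcat::mclocale de
set fallback [::demo::greet]
puts $fallback
```

Because untranslated messages fall back to the source string, a script can wrap its messages in mc before any translations exist.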
UTF-8 and Unicode C API
The effect of Unicode on the Tcl C API is more fundamental. Tcl uses UTF-8 to represent Unicode internally. This encoding is compatible with ASCII, so Tcl extensions that only pass ASCII strings to Tcl continue to work normally. However, to take advantage of Unicode, Tcl extensions need to translate strings into UTF-8 or Unicode before calling the Tcl C library. There is a C API for this. An example of its use is shown on page 629.