Chapter 15. Internationalization
This chapter describes features that support text processing for different character sets such as ASCII and Japanese. Tcl can read and write data in various character set encodings, but it processes data in a standard character set called Unicode. Tcl has a message catalog that lets you generate different versions of an application for different languages. Tcl commands described are: encoding and msgcat.
Different languages use different alphabets, or character sets. An encoding is a standard way to represent a character set. Tcl hides most of the issues associated with encodings and character sets, but you need to be aware of them when you write applications that are used in different countries. You can also write an application using a message catalog so that the strings you display to users can be in the language of their choice. Using a message catalog is more work, but Tcl makes it as easy as possible.
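As a rough preview of the message catalog idea, the sketch below registers translations for one string and looks it up under two locales. It assumes the standard msgcat package is available; the string "Hello" and its translations are made up for illustration, and a real application would normally keep them in per-locale catalog files rather than inline.

```tcl
package require msgcat

# Register translations for the illustrative source string "Hello".
# These inline calls stand in for per-locale catalog files.
::msgcat::mcset fr "Hello" "Bonjour"
::msgcat::mcset de "Hello" "Hallo"

# Select a locale, then look up the string to display.
::msgcat::mclocale fr
set french [::msgcat::mc "Hello"]
puts $french

::msgcat::mclocale de
set german [::msgcat::mc "Hello"]
puts $german

# A string with no translation falls back to the source string itself.
set untranslated [::msgcat::mc "Goodbye"]
puts $untranslated
```

The application code only ever mentions the source string; which translation is displayed depends on the locale in effect when the lookup happens.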
Most of the hard work in dealing with character set encodings is done "under the covers" by the Tcl C library. The Tcl C library underwent substantial changes to support international character sets. Instead of using 8-bit bytes to store characters, Tcl processes characters in Unicode, which Tcl 8.1 treats as a 16-bit character set, large enough to encode the alphabets of nearly all languages in current use. There is also plenty of room left over to represent special symbols.
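The difference between characters and encoded bytes can be seen from a script. The following is a minimal sketch using the encoding command; the sample word and the choice of the utf-8 and iso8859-1 encodings are only illustrative.

```tcl
# U+00E9 is e with an acute accent, written as a \u escape so that
# this script itself stays plain ASCII.
set word "caf\u00e9"

# Tcl counts characters, not bytes.
set nchars [string length $word]
puts $nchars          ;# 4

# Converting to a specific encoding yields a string of bytes.
set utf8  [encoding convertto utf-8 $word]
set latin [encoding convertto iso8859-1 $word]
puts [string length $utf8]    ;# 5: the accented letter takes two bytes in UTF-8
puts [string length $latin]   ;# 4: one byte per character in ISO 8859-1

# Converting back recovers the original characters.
set roundtrip [encoding convertfrom utf-8 $utf8]
puts [string equal $word $roundtrip]   ;# 1
```

The script works with the same four characters throughout; only the explicit conversions expose how many bytes a particular encoding needs.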
In spite of all the changes to support Unicode, there are few changes visible to the Tcl script writer. Scripts written for Tcl 8.0 and earlier continue to work fine with Tcl 8.1 and later versions. You only need to modify scripts if you want to take advantage of the features added to support internationalization.
This chapter begins with a discussion of what a character set is and why different encodings are used to represent it. It concludes with a discussion of message catalogs.