Chapter 11. Regular Expressions
This chapter describes regular expression pattern matching and string processing based on regular expression substitutions. These features provide the most powerful string processing facilities in Tcl. The Tcl commands described are regexp and regsub.
Regular expressions are a formal way to describe string patterns. They provide a powerful and compact way to specify patterns in your data. Even better, there is a very efficient implementation of the regular expression mechanism due to Henry Spencer. If your script does much string processing, it is worth the effort to learn about the regexp command. Your Tcl scripts will be compact and efficient. This chapter uses many examples to show you the features of regular expressions.
Regular expression substitution is a mechanism that lets you rewrite a string based on regular expression matching. The regsub command is another powerful tool, and this chapter includes several examples that do a lot of work in just a few Tcl commands. Stephen Uhler has shown me several ways to transform input data into a Tcl script with regsub and then use subst or eval to process the data. The idea takes a moment to get used to, but it provides a very efficient way to process strings.
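As a rough illustration of the regsub-then-subst idea, here is a minimal sketch that decodes URL-style %xx escapes. The procedure name url_decode_sketch is hypothetical and chosen only for this example; it assumes the input uses + for spaces and two-digit hexadecimal escapes. The data is first rewritten into a string containing embedded format commands, and subst then evaluates those commands.

```tcl
# Hypothetical helper, for illustration only: decode %xx escapes
# by rewriting the data into a small Tcl "script" and running subst on it.
proc url_decode_sketch {url} {
    # Turn each + into a space.
    regsub -all {\+} $url { } url
    # Quote characters that subst would otherwise interpret
    # (brackets, backslash, dollar sign) so the input data cannot
    # trigger command or variable substitution.
    regsub -all {[][\\$]} $url {\\&} url
    # Rewrite each %xx into a format command that yields the character.
    regsub -all {%([0-9a-fA-F][0-9a-fA-F])} $url {[format %c 0x\1]} url
    # Let subst evaluate the embedded format commands.
    return [subst $url]
}

puts [url_decode_sketch "a%20b+c"]
```

The quoting step matters: without it, a string such as [exit] in the input would be executed by subst instead of being passed through as data.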
Tcl 8.1 added a new regular expression implementation that supports Unicode and advanced regular expressions (AREs). This implementation adds syntax and escapes that make it easier to write patterns, once you learn the new features! If you know Perl, then you are already familiar with these features, because the Tcl advanced regular expressions are almost identical to Perl 5 regular expressions. The new features include a few very minor incompatibilities with the regular expressions implemented in Tcl 8.0 and earlier versions, but these rarely occur in practice. The new regular expression package supports Unicode, of course, so you can write patterns to match Japanese or Hindi documents!
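To give a feel for the newer syntax, the following sketch shows a few advanced regular expression features: the class shorthand escapes \d, \s, and \w, a non-greedy quantifier, and a match against Unicode text. The sample strings and variable names are made up for illustration, and the example assumes Tcl 8.1 or later.

```tcl
# \d, \s, and \w are shorthand escapes for digits, white space,
# and word characters.
set line "order 42   widgets"
set found [regexp {(\d+)\s+(\w+)} $line match count item]

# A non-greedy quantifier (+?) stops at the first closing bracket
# instead of the last one.
regexp {<(.+?)>} "<a><b>" tagMatch tag

# Unicode text: the \u escapes spell a three-character Japanese
# string, and the pattern looks for its first two characters.
set text "\u65e5\u672c\u8a9e"
set hasPrefix [regexp "\u65e5\u672c" $text]
```

After these commands run, count is 42, item is widgets, tag is a, and both found and hasPrefix are 1.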