Practical Programming in Tcl & Tk, Third Edition
By Brent B. Welch

Chapter 11.  Regular Expressions


Syntax Summary

Table 11-1 summarizes the syntax of regular expressions available in all versions of Tcl:

Table 11-1. Basic regular expression syntax.
.         Matches any character.
*         Matches zero or more instances of the previous pattern item.
+         Matches one or more instances of the previous pattern item.
?         Matches zero or one instance of the previous pattern item.
( )       Groups a subpattern. The repetition and alternation operators apply to the preceding subpattern.
|         Alternation.
[ ]       Delimits a set of characters. Ranges are specified as [x-y]. If the first character in the set is ^, the set matches any character that is not among the remaining characters.
^         Anchors the pattern to the beginning of the string. Only when first.
$         Anchors the pattern to the end of the string. Only when last.
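The basic operators in Table 11-1 can be tried out with the regexp command. The following is a minimal sketch; the sample strings and variable names (r1, animal, and so on) are illustrative and do not come from the book. Patterns are enclosed in braces so that Tcl does not substitute $, [ ], or backslashes before regexp sees them.

```tcl
# Repetition operators and anchors.
set r1 [regexp {^ab*c$} "abbbc"]        ;# 1: zero or more b
set r2 [regexp {^ab+c$} "ac"]           ;# 0: + requires at least one b
set r3 [regexp {^colou?r$} "color"]     ;# 1: the u is optional

# Grouping with alternation. The text matched by the group is
# stored in the second match variable.
set r4 [regexp {^(cat|dog)s?$} "dogs" whole animal]

# A character set with ranges, and a negated set.
set r5 [regexp {^[a-f0-9]+$} "c0ffee"]  ;# 1: every character is in the set
set r6 [regexp {^[^0-9]+$} "abc1"]      ;# 0: the string contains a digit

puts [list $r1 $r2 $r3 $r4 $animal $r5 $r6]
# prints: 1 0 1 1 dog 1 0
```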

Advanced regular expressions, which were introduced in Tcl 8.1, add more syntax that is summarized in Table 11-2:

Table 11-2. Additional advanced regular expression syntax.
{m}       Matches m instances of the previous pattern item.
{m}?      Matches m instances of the previous pattern item. Nongreedy.
{m,}      Matches m or more instances of the previous pattern item.
{m,}?     Matches m or more instances of the previous pattern item. Nongreedy.
{m,n}     Matches m through n instances of the previous pattern item.
{m,n}?    Matches m through n instances of the previous pattern item. Nongreedy.
*?        Matches zero or more instances of the previous pattern item. Nongreedy.
+?        Matches one or more instances of the previous pattern item. Nongreedy.
??        Matches zero or one instances of the previous pattern item. Nongreedy.
(?:re)    Groups a subpattern, re, but does not capture the result.
(?=re)    Positive look-ahead. Matches the point where re begins.
(?!re)    Negative look-ahead. Matches the point where re does not begin.
(?abc)    Embedded options, where abc is any number of option letters listed in Table 11-5.
\c        One of many backslash escapes listed in Table 11-4.
[: :]     Delimits a character class within a bracketed expression. See Table 11-3.
[. .]     Delimits a collating element within a bracketed expression.
[= =]     Delimits an equivalence class within a bracketed expression.
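A few of the advanced constructs in Table 11-2 are easier to see in action. The sketch below assumes Tcl 8.1 or later; the sample strings and variable names are made up for illustration. It also uses \d and \w, two of the backslash escapes from Table 11-4.

```tcl
# Bounds: exactly three digits, a dash, then exactly four digits.
set r1 [regexp {^\d{3}-\d{4}$} "555-1212"]      ;# 1

# Greedy versus nongreedy repetition.
regexp {<.+>}  "<b>bold</b>" greedy             ;# greedy is <b>bold</b>
regexp {<.+?>} "<b>bold</b>" nongreedy          ;# nongreedy is <b>

# A noncapturing group: the title is grouped for alternation,
# but only the name is captured.
regexp {^(?:Mr|Ms|Dr)\. (\w+)$} "Dr. Welch" -> name   ;# name is Welch

# Positive look-ahead: match a word only if .tcl follows it.
# The look-ahead text is not part of the match.
regexp {\w+(?=\.tcl)} "run demo.tcl now" base   ;# base is demo

# Negative look-ahead: foo must not be followed by bar.
set r2 [regexp {^foo(?!bar)} "foobar"]          ;# 0
set r3 [regexp {^foo(?!bar)} "foobaz"]          ;# 1
```

The -> in the third example is an ordinary variable name, used here as a visual placeholder for the overall match that is not needed.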

Table 11-3 lists the named character classes defined in advanced regular expressions and their associated backslash sequences, if any. Character class names are valid inside bracketed character sets with the [:class:] syntax.

Table 11-3. Character classes.
alnum     Upper and lower case letters and digits.
alpha     Upper and lower case letters.
blank     Space and tab.
cntrl     Control characters: \u0001 through \u001F.
digit     The digits zero through nine. Also \d.
graph     Printing characters that are not in cntrl or space.
lower     Lowercase letters.
print     The same as alnum.
punct     Punctuation characters.
space     Space, newline, carriage return, tab, vertical tab, form feed. Also \s.
upper     Uppercase letters.
xdigit    Hexadecimal digits: zero through nine, a-f, A-F.
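As a brief illustration of the [:class:] syntax, the following sketch uses several of the classes from Table 11-3 inside bracketed sets. The inputs and variable names are hypothetical examples, not taken from the book.

```tcl
# An identifier: a letter followed by letters, digits, or underscores.
set r1 [regexp {^[[:alpha:]][[:alnum:]_]*$} "item_42"]   ;# 1

# All digits? The trailing letter makes this fail.
set r2 [regexp {^[[:digit:]]+$} "12a"]                   ;# 0

# Hexadecimal digits in either case.
set r3 [regexp {^[[:xdigit:]]+$} "DEADbeef"]             ;# 1

# Collapse each run of spaces and tabs into a single space.
# The result is stored in the variable squeezed.
regsub -all {[[:blank:]]+} "a \t b\t\tc" " " squeezed    ;# squeezed is "a b c"

# A class can be negated like any other set member:
# find the first run of characters that are not white space.
regexp {[^[:space:]]+} "  hello world" word              ;# word is hello
```

Note the doubled brackets: the outer pair delimits the set, and the inner [: :] pair names the class.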

Table 11-4 lists backslash sequences supported in Tcl 8.1.

Table 11-4. Backslash escapes in regular expressions.
\a        Alert, or "bell", character.
\A        Matches only at the beginning of the string.
\b        Backspace character, \u0008.
\B        Synonym for backslash.
\cX       Control-X.
\d        Digits. Same as [[:digit:]]
\D        Not a digit. Same as [^[:digit:]]
\e        Escape character, \u001B.
\f        Form feed, \u000C.
\m        Matches the beginning of a word.
\M        Matches the end of a word.
\n        Newline, \u000A.
\r        Carriage return, \u000D.
\s        Space. Same as [[:space:]]
\S        Not a space. Same as [^[:space:]]
\t        Horizontal tab, \u0009.
\uXXXX    A 16-bit Unicode character code.
\v        Vertical tab, \u000B.
\w        Letters, digits, and underscore. Same as [[:alnum:]_]
\W        Not a letter, digit, or underscore. Same as [^[:alnum:]_]
\xhh      An 8-bit hexadecimal character code. Consumes all hex digits after \x.
\y        Matches the beginning or end of a word.
\Y        Matches a point that is not the beginning or end of a word.
\Z        Matches the end of the string.
\0        NULL, \u0000.
\x        Where x is a digit, this is a back-reference.
\xy       Where x and y are digits, either a decimal back-reference, or an 8-bit octal character code.
\xyz      Where x, y and z are digits, either a decimal back-reference or an 8-bit octal character code.
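The word-boundary escapes and back-references in Table 11-4 are among the more useful additions. The sketch below is illustrative only: the sample strings and variable names are assumptions, and the -all switch used to count matches is assumed to be available (it was added to regexp after Tcl 8.1).

```tcl
# \m and \M anchor a match to the start and end of a word.
set r1 [regexp {\mcat\M} "concatenate"]       ;# 0: cat is inside a word
set r2 [regexp {\mcat\M} "the cat sat"]       ;# 1

# \y matches at either end of a word. Count the whole-word
# occurrences of "is"; the "is" inside "this" does not count.
set n [regexp -all {\yis\y} "this is what it is"]   ;# 2

# A back-reference: \1 must repeat the text captured by group 1.
# This finds a doubled word.
set r3 [regexp {\m(\w+) \1\M} "it was the the best" -> dup]   ;# dup is the

# \A and \Z anchor to the ends of the string.
set r4 [regexp {\A\d+\Z} "12345"]             ;# 1

# \uXXXX names a character by its Unicode code; \u0041 is A.
set r5 [regexp {^\u0041$} "A"]                ;# 1

# \D matches anything that is not a digit; delete those characters.
regsub -all {\D} "(555) 123-4567" "" digits   ;# digits is 5551234567
```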

Table 11-5 lists the embedded option characters used with the (?abc) syntax.

Table 11-5. Embedded option characters used with the (?abc) syntax.
b         The rest of the pattern is a basic regular expression (a la vi or grep).
c         Case-sensitive matching. This is the default.
e         The rest of the pattern is an extended regular expression (a la Tcl 8.0).
i         Case-insensitive matching.
m         Synonym for the n option.
n         Newline-sensitive matching. Both lineanchor and linestop mode.
p         Partial newline-sensitive matching. Only linestop mode.
q         The rest of the pattern is a literal string.
s         No newline sensitivity. This is the default.
t         Tight syntax; no embedded comments. This is the default.
w         Inverse partial newline-sensitive matching. Only lineanchor mode.
x         Expanded syntax with embedded white space and comments.
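Embedded options appear at the very beginning of a pattern. The following sketch shows four of the options from Table 11-5; the sample text and variable names are hypothetical.

```tcl
# (?i) turns on case-insensitive matching.
set r1 [regexp {(?i)^hello$} "HeLLo"]        ;# 1

# (?n) makes ^ and $ match at line boundaries inside the string.
set text "first line\nsecond line"
set r2 [regexp {^second} $text]              ;# 0: ^ is only the string start
set r3 [regexp {(?n)^second} $text]          ;# 1: ^ also matches after newline

# (?x) allows white space and comments inside the pattern.
set r4 [regexp {(?x)
    ^ \d{3}     # three-digit prefix
    -
    \d{4} $     # four-digit number
} "555-1212"]                                ;# 1

# (?q) treats the rest of the pattern as a literal string,
# so the dot no longer matches any character.
set r5 [regexp {(?q)a.c} "abc"]              ;# 0
set r6 [regexp {(?q)a.c} "a.c"]              ;# 1
```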

