Syntax Summary
Table 11-1 summarizes the syntax of regular expressions available in all versions of Tcl:
Table 11-1. Basic regular expression syntax.
| Syntax | Description |
| --- | --- |
| . | Matches any character. |
| * | Matches zero or more instances of the previous pattern item. |
| + | Matches one or more instances of the previous pattern item. |
| ? | Matches zero or one instance of the previous pattern item. |
| ( ) | Groups a subpattern. The repetition and alternation operators apply to the preceding subpattern. |
| \| | Alternation. |
| [ ] | Delimits a set of characters. Ranges are specified as [x-y]. If the first character in the set is ^, then there is a match if the remaining characters in the set are not present. |
| ^ | Anchors the pattern to the beginning of the string. Only when first. |
| $ | Anchors the pattern to the end of the string. Only when last. |
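The following short sketch is not part of the original table. It runs each operator in Table 11-1 through the regexp command. The sample strings and variable names are made up for illustration. The patterns are grouped with braces so that Tcl does not substitute the [ ], $, and backslash characters before regexp sees them.

```tcl
# Exercise the basic syntax of Table 11-1 with the regexp command.
# Patterns are braced so Tcl leaves [ ], $, and backslashes alone.
# The sample strings and variable names are illustrative only.
set r {}
lappend r [regexp {^a.+z$} "abcz"]          ;# . and + : one or more of any character
lappend r [regexp {^ab*c$} "ac"]            ;# * allows zero instances of b
lappend r [regexp {^colou?r$} "color"]      ;# ? allows zero or one u
lappend r [regexp {^(ab)+$} "ababab"]       ;# ( ) makes + apply to the whole group
lappend r [regexp {^(cat|dog)s?$} "dogs"]   ;# | chooses between alternatives
lappend r [regexp {^[0-9]+$} "2024"]        ;# a range inside a bracketed set
lappend r [regexp {^[^0-9]+$} "abc1"]       ;# a negated set rejects the trailing digit
puts $r                                     ;# prints 1 1 1 1 1 1 0

# Parenthesized subpatterns also capture text into match variables.
# The first variable, named ->, receives the whole match and is ignored.
regexp {([a-z]+)=([0-9]+)} "x=42" -> name value
puts "$name $value"                         ;# prints x 42
```

Naming the whole-match variable `->` is only a readability convention. Any variable name works there.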
Advanced regular expressions, which were introduced in Tcl 8.1, add more syntax that is summarized in Table 11-2:
Table 11-2. Additional advanced regular expression syntax.
| Syntax | Description |
| --- | --- |
| {m} | Matches m instances of the previous pattern item. |
| {m}? | Matches m instances of the previous pattern item. Nongreedy. |
| {m,} | Matches m or more instances of the previous pattern item. |
| {m,}? | Matches m or more instances of the previous pattern item. Nongreedy. |
| {m,n} | Matches m through n instances of the previous pattern item. |
| {m,n}? | Matches m through n instances of the previous pattern item. Nongreedy. |
| *? | Matches zero or more instances of the previous pattern item. Nongreedy. |
| +? | Matches one or more instances of the previous pattern item. Nongreedy. |
| ?? | Matches zero or one instance of the previous pattern item. Nongreedy. |
| (?:re) | Groups a subpattern, re, but does not capture the result. |
| (?=re) | Positive look-ahead. Matches the point where re begins. |
| (?!re) | Negative look-ahead. Matches the point where re does not begin. |
| (?abc) | Embedded options, where abc is any number of option letters listed in Table 11-5. Allowed only at the start of the pattern. |
| \c | One of many backslash escapes listed in Table 11-4. |
| [: :] | Delimits a character class within a bracketed expression. See Table 11-3. |
| [. .] | Delimits a collating element within a bracketed expression. |
| [= =] | Delimits an equivalence class within a bracketed expression. |
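The following sketch is not from the original text. It exercises several entries from Table 11-2: a bound, greedy versus nongreedy quantifiers, a non-capturing group, both look-ahead forms, and an embedded option. It assumes Tcl 8.1 or later. The sample strings and variable names are illustrative. Each pattern uses only one kind of quantifier, because Tcl decides the greediness of a whole pattern from its first quantifier.

```tcl
# Exercise some of the advanced syntax of Table 11-2 (Tcl 8.1 and later).
# The sample strings and variable names are illustrative only.
set r {}
lappend r [regexp {^\d{3}-\d{4}$} "555-1234"]   ;# {m} bound: exactly 3, then 4 digits

# Greedy versus nongreedy matching of the same input
regexp {<.+>} "<b>bold</b>" greedy              ;# greedy is <b>bold</b>
regexp {<.+?>} "<b>bold</b>" lazy               ;# lazy is <b>

# (?:re) groups without capturing, so tail receives the only capture
regexp {^(?:ab)+(c+)$} "ababccc" -> tail         ;# tail is ccc

# (?=re) positive look-ahead: the word followed by a colon, colon excluded
regexp {\w+(?=:)} "key: value" word              ;# word is key

# (?!re) negative look-ahead: reject names that begin with tmp
lappend r [regexp {^(?!tmp)\w+$} "tmpfile"] [regexp {^(?!tmp)\w+$} "notes"]

# (?i) embedded option at the start of the pattern: case-insensitive match
lappend r [regexp {(?i)^hello} "HELLO world"]
puts $r                                          ;# prints 1 0 1 1
```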
Table 11-3 lists the named character classes defined in advanced regular expressions and their associated backslash sequences, if any. Character class names are valid inside bracketed character sets with the [:class:] syntax.
Table 11-3. Character classes.
| Class | Description |
| --- | --- |
| alnum | Uppercase and lowercase letters and digits. |
| alpha | Uppercase and lowercase letters. |
| blank | Space and tab. |
| cntrl | Control characters: \u0001 through \u001F. |
| digit | The digits zero through nine. Also \d. |
| graph | Printing characters that are not in cntrl or space. |
| lower | Lowercase letters. |
| print | The same as alnum. |
| punct | Punctuation characters. |
| space | Space, newline, carriage return, tab, vertical tab, form feed. Also \s. |
| upper | Uppercase letters. |
| xdigit | Hexadecimal digits: zero through nine, a-f, A-F. |
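A class name is valid only inside a bracketed set, so a pattern that uses a class on its own needs doubled brackets, as in [[:digit:]]. The sketch below is not part of the original text and uses illustrative sample strings. The last line relies on the -all switch of regexp, which assumes a newer tclsh than 8.1.

```tcl
# Character classes from Table 11-3 appear inside a bracketed set,
# which is why the brackets are doubled. Sample strings are illustrative.
set r {}
lappend r [regexp {^[[:digit:]]+$} "12345"]
lappend r [regexp {^0x[[:xdigit:]]+$} "0xBEEF"]
# A class can be combined with other characters in the same set:
# a letter or underscore, then any number of letters, digits, or underscores
lappend r [regexp {^[[:alpha:]_][[:alnum:]_]*$} "_var1"]
# \s is shorthand for [[:space:]]; the \t in the double-quoted string
# is turned into a real tab by Tcl before regexp sees it
lappend r [regexp {^[[:upper:]]\s+[[:lower:]]$} "A \t b"]
# -all counts every match: here the comma and the exclamation point
lappend r [regexp -all {[[:punct:]]} "Hello, world!"]
puts $r                                          ;# prints 1 1 1 1 2
```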
Table 11-4 lists backslash sequences supported in Tcl 8.1.
Table 11-4. Backslash escapes in regular expressions.
| Escape | Meaning |
| --- | --- |
| \a | Alert, or "bell", character. |
| \A | Matches only at the beginning of the string. |
| \b | Backspace character, \u0008. |
| \B | Synonym for backslash. |
| \cX | Control-X. |
| \d | Digits. Same as [[:digit:]]. |
| \D | Not a digit. Same as [^[:digit:]]. |
| \e | Escape character, \u001B. |
| \f | Form feed, \u000C. |
| \m | Matches the beginning of a word. |
| \M | Matches the end of a word. |
| \n | Newline, \u000A. |
| \r | Carriage return, \u000D. |
| \s | White space. Same as [[:space:]]. |
| \S | Not white space. Same as [^[:space:]]. |
| \t | Horizontal tab, \u0009. |
| \uXXXX | A 16-bit Unicode character code. |
| \v | Vertical tab, \u000B. |
| \w | Letters, digits, and underscore. Same as [[:alnum:]_]. |
| \W | Not a letter, digit, or underscore. Same as [^[:alnum:]_]. |
| \xhh | An 8-bit hexadecimal character code. Consumes all hex digits after \x. |
| \y | Matches the beginning or end of a word. |
| \Y | Matches a point that is not the beginning or end of a word. |
| \Z | Matches only at the end of the string. |
| \0 | NULL, \u0000. |
| \x | Where x is a nonzero digit, this is a back-reference. |
| \xy | Where x and y are digits, either a decimal back-reference or an 8-bit octal character code. |
| \xyz | Where x, y, and z are digits, either a decimal back-reference or an 8-bit octal character code. |
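Because the patterns below are grouped with braces, Tcl passes the backslash sequences through untouched and the regexp engine interprets them. The sketch is not part of the original table. It shows the word-boundary escapes, two character-code escapes, and a back-reference, using illustrative sample strings.

```tcl
# A few escapes from Table 11-4. Because the patterns are braced, the
# backslashes reach the regexp engine unchanged. Sample strings are illustrative.
set r {}
# \m and \M match only at word boundaries, so cat inside concatenate is rejected
lappend r [regexp {\mcat\M} "concatenate"] [regexp {\mcat\M} "the cat sat"]
# \uXXXX and \xhh are decoded by the regexp engine itself
lappend r [regexp {caf\u00e9} "caf\u00e9"] [regexp {^\x41} "ABC"]
puts $r                                   ;# prints 0 1 1 1

# \1 is a back-reference to the first capture: find a doubled word
regexp {\m(\w+)\s+\1\M} "this is is a test" -> dup
puts $dup                                 ;# prints is
```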
Table 11-5 lists the embedded option characters used with the (?abc) syntax.
Table 11-5. Embedded option characters used with the (?x) syntax.
| Option | Meaning |
| --- | --- |
| b | The rest of the pattern is a basic regular expression (as in vi or grep). |
| c | Case-sensitive matching. This is the default. |
| e | The rest of the pattern is an extended regular expression (as in Tcl 8.0). |
| i | Case-insensitive matching. |
| m | Synonym for the n option. |
| n | Newline-sensitive matching. Both lineanchor and linestop mode. |
| p | Partial newline-sensitive matching. Only linestop mode. |
| q | The rest of the pattern is a literal string. |
| s | No newline sensitivity. This is the default. |
| t | Tight syntax; no embedded comments. This is the default. |
| w | Inverse partial newline-sensitive matching. Only lineanchor mode. |
| x | Expanded syntax with embedded white space and comments. |
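The following sketch is not from the original text. It tries several of the option letters in Table 11-5, each placed at the very start of its pattern. The sample strings and the variable name datePattern are illustrative assumptions. The expanded-syntax pattern shows how (?x) lets you spread a pattern over several lines with comments.

```tcl
# Embedded options from Table 11-5. Each (?...) group must come first in
# the pattern. Sample strings and variable names are illustrative.
set r {}
# (?i): case-insensitive, like the -nocase switch of regexp
lappend r [regexp {(?i)^tcl$} "TCL"]
# (?q): the rest of the pattern is literal, so . and * are not special
lappend r [regexp {(?q)a.b*} "xa.b*y"] [regexp {(?q)a.b*} "axbb"]
# (?n): newline-sensitive, so ^ and $ also match at line boundaries
lappend r [regexp {(?n)^second$} "first\nsecond"] [regexp {^second$} "first\nsecond"]
puts $r                                   ;# prints 1 1 0 1 0

# (?x): expanded syntax, where white space is ignored and # starts a comment
set datePattern {(?x)
    ^(\d{4})    # year
    -(\d{2})    # month
    -(\d{2})$   # day
}
regexp $datePattern "2024-05-17" -> year month day
puts "$year $month $day"                  ;# prints 2024 05 17
```

Without (?n), ^ matches only at the very start of the string, which is why the last test on the multi-line string fails.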