

Practical Programming in Tcl & Tk, Third Edition
By Brent B. Welch

Chapter 4.  String Processing in Tcl


The binary Command

Tcl 8.0 added support for binary strings. Earlier versions of Tcl used null-terminated strings internally, which made it impossible to handle data that contains null bytes. Tcl now uses counted strings, so it can tolerate a null byte in a string value without truncating it.

This section describes the binary command that provides conversions between strings and packed binary data representations. The binary format command takes values and packs them according to a template. For example, this can be used to format a floating point vector in memory suitable for passing to Fortran. The resulting binary value is returned:

binary format template value ?value ...?

The binary scan command extracts values from a binary string according to a similar template. For example, this is useful for extracting data stored in binary format. It assigns values to a set of Tcl variables:

binary scan value template variable ?variable ...?
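The two commands are mirror images of each other. The sketch below packs values with binary format and then unpacks the same bytes with a matching template. It assumes any Tcl 8.x interpreter, and the variable names packed, numbers, and tag are only illustrative:

```tcl
# Pack two 16-bit big-endian integers (S2) and a 4-character string (a4).
set packed [binary format "S2a4" {1 258} abcd]
puts [string length $packed]          ;# 8 bytes: 2 + 2 + 4

# Unpack with the same template; binary scan returns how many
# variables it assigned.
set n [binary scan $packed "S2a4" numbers tag]
puts "$n $numbers $tag"               ;# 2 1 258 abcd
```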

Format Templates

The template consists of type keys and counts. The types are summarized in Table 4-6. In the table, count is the optional count following the type letter.

Table 4-6. Binary conversion types.
Type  Description
a     A character string of length count. Padded with nulls in binary format.
A     A character string of length count. Padded with spaces in binary format. Trailing nulls and blanks are discarded in binary scan.
b     A binary string of length count. Low-to-high order.
B     A binary string of length count. High-to-low order.
h     A hexadecimal string of length count. Low-to-high order.
H     A hexadecimal string of length count. High-to-low order. (More commonly used than h.)
c     An 8-bit character code. The count is for repetition.
s     A 16-bit integer in little-endian byte order. The count is for repetition.
S     A 16-bit integer in big-endian byte order. The count is for repetition.
i     A 32-bit integer in little-endian byte order. The count is for repetition.
I     A 32-bit integer in big-endian byte order. The count is for repetition.
f     A single-precision floating point value in native format. The count is for repetition.
d     A double-precision floating point value in native format. The count is for repetition.
x     Pack count null bytes with binary format. Skip count bytes with binary scan.
X     Back up count bytes.
@     Skip to the absolute position specified by count. If count is *, skip to the end.

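The count is interpreted differently depending on the type. For types like integer (i) and double (d), the count is a repetition count (e.g., i3 means three integers). When a count is given, binary format expects the corresponding value to be a list with at least that many elements. For strings, the count is a length (e.g., a3 means a three-character string). If no count is specified, it defaults to 1. If count is *, binary scan uses all the remaining bytes in the value. In binary format, a count of * uses all elements of the value's list for numeric types and the entire value for string types.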
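The difference between a repetition count and a length count is easiest to see side by side. This is a minimal sketch with illustrative variable names. It relies only on the i and a types from Table 4-6:

```tcl
# Repetition count: i3 packs three 32-bit integers from a list (12 bytes).
set ints [binary format "i3" {1 2 3}]
puts [string length $ints]            ;# 12

# Length count: a3 is one three-character field; the value is truncated.
set str [binary format "a3" abcdef]
puts $str                             ;# abc

# With no count, "i" reads a single integer into a scalar variable.
binary scan $ints "i" first
puts $first                           ;# 1

# A count of * makes binary scan consume all remaining bytes.
binary scan $ints "i*" all
puts $all                             ;# 1 2 3
```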

Several type keys can be specified in a template. Each key-count combination moves an imaginary cursor through the binary data. There are special type keys to move the cursor. The x key generates null bytes in binary format, and it skips over bytes in binary scan. The @ key uses its count as an absolute byte offset to which to set the cursor. As a special case, @* skips to the end of the data. The X key backs up count bytes.
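The cursor-moving keys are easier to follow on a small value. The sketch below builds a 6-byte string and then reads it back using x, X, and @. The data and variable names are hypothetical:

```tcl
# x emits null bytes in binary format: "AB", two nulls, "CD".
set data [binary format "a2x2a2" AB CD]
puts [string length $data]            ;# 6

# x skips bytes when scanning.
binary scan $data "a2x2a2" first last
# X backs up: read "AB", back up 2 bytes, then read "A" again.
binary scan $data "a2X2a1" again one
# @ jumps to an absolute offset.
binary scan $data "@4a2" tail
# @* goes to the end; X2 then backs up to the last two bytes.
binary scan $data "@*X2a2" fromEnd
puts "$first $last $again $one $tail $fromEnd"   ;# AB CD AB A CD CD
```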

Numeric types have a particular byte order that determines how their value is laid out in memory. The type keys are lowercase for little-endian byte order (e.g., Intel) and uppercase for big-endian byte order (e.g., SPARC and Motorola). Integers are 16-bit (s or S) or 32-bit (i or I); Tcl 8.4 later added 64-bit integers (w or W). Note that the official byte order for data transmitted over a network is big-endian. Floating point values are always machine-specific, so it only makes sense to format and scan these values on the same machine.
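To see byte order in practice, the following sketch formats the 16-bit value 258 (0x0102) with both keys and dumps the bytes in hex. It also shows what goes wrong when big-endian data is read with the little-endian key. The variable names are illustrative:

```tcl
# The same 16-bit value in both byte orders.
set little [binary format "s" 258]
set big    [binary format "S" 258]
binary scan $little "H*" lhex
binary scan $big    "H*" bhex
puts "$lhex $bhex"                    ;# 0201 0102

# Reading big-endian (network order) bytes with the wrong key
# silently produces a different number (0x0201).
binary scan $big "S" right
binary scan $big "s" wrong
puts "$right $wrong"                  ;# 258 513
```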

There are three string types: character (a or A), binary (b or B), and hexadecimal (h or H). With these types the count is the length of the string. The a type pads its value to the specified length with null bytes in binary format and the A type pads its value with spaces. If the value is too long, it is truncated. In binary scan, the A type strips trailing blanks and nulls.
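The padding and stripping rules for a and A can be checked by dumping the packed bytes as hex. This is a minimal sketch with hypothetical variable names:

```tcl
# a pads with nulls (00), A pads with spaces (20).
set padNull  [binary format "a6" abc]
set padSpace [binary format "A6" abc]
binary scan $padNull  "H*" nullHex
binary scan $padSpace "H*" spaceHex
puts "$nullHex $spaceHex"             ;# 616263000000 616263202020

# Values longer than the field are truncated.
puts [binary format "A2" abc]         ;# ab

# When scanning, a keeps trailing blanks while A strips them.
binary scan $padSpace "a6" keep
binary scan $padSpace "A6" strip
puts "[string length $keep] [string length $strip]"   ;# 6 3
```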

A binary string consists of zeros and ones. The b type specifies bits from low-to-high order, and the B type specifies bits from high-to-low order. A hexadecimal string specifies 4 bits (i.e., nybbles) with each character. The h type specifies nybbles from low-to-high order, and the H type specifies nybbles from high-to-low order. The B and H formats match the way you normally write out numbers.
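The ordering difference shows up clearly on a single byte. The sketch below uses the value 10 (binary 00001010, hex 0a) and scans it with all four string types. The variable names are illustrative:

```tcl
# One byte with the value 10.
set byte [binary format "c" 10]

# B and H list bits/nybbles high-to-low, the way numbers are written.
binary scan $byte "B8" high
binary scan $byte "b8" low
puts "$high $low"                     ;# 00001010 01010000

binary scan $byte "H2" hexHigh
binary scan $byte "h2" hexLow
puts "$hexHigh $hexLow"               ;# 0a a0

# Formatting follows the same rules, so these two produce the same byte.
set same [expr {[binary format "B8" 00001010] eq [binary format "b8" 01010000]}]
puts $same                            ;# 1
```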

Examples

When you experiment with binary format and binary scan, remember that Tcl treats things as strings by default. A "6", for example, is the character 6 with character code 54 or 0x36. The c type returns these character codes:

set input 6
binary scan $input "c" 6val
set 6val
=> 54

You can scan several character codes at a time:

binary scan abc "c3" list
=> 1
set list
=> 97 98 99

The previous example uses a single type key, so binary scan sets one corresponding Tcl variable. If you want each character code in a separate variable, use separate type keys:

binary scan abc "ccc" x y z
=> 3
set z
=> 99

Use the H format to get hexadecimal values:

binary scan 6 "H2" 6val
set 6val
=> 36

Use the a and A formats to extract fixed width fields. Here the * count is used to get all the rest of the string. Note that A trims trailing spaces:

binary scan "hello world " a3x2A* first second
puts "\"$first\" \"$second\""
=> "hel" " world"

Use the @ key to seek to a particular offset in a value. The following command gets the second double-precision number from a vector. Assume the vector is read from a binary data file:

binary scan $vector "@8d" double
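Because the vector in this command comes from a file, here is a self-contained sketch that builds a hypothetical three-element vector in memory instead. It then computes the offset of any element from the 8-byte size of a double:

```tcl
# A hypothetical vector of three doubles in native format.
set vector [binary format "d*" {1.5 2.5 3.5}]

# Each double occupies 8 bytes, so @8 lands on the second one.
binary scan $vector "@8d" double
puts $double                          ;# 2.5

# In general, element n (counting from 0) starts at byte offset n*8.
set n 2
binary scan $vector "@[expr {$n * 8}]d" third
puts $third                           ;# 3.5
```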

With binary format, the a and A types create fixed width fields. A pads its field with spaces, if necessary. The value is truncated if the string is too long:

binary format "A9A3" hello world
=> hello    wor

An array of floating point values can be created with this command. With a count, the values are passed as a single list:

binary format "f*" {1.2 3.45 7.43 -45.67 1.03e4}

Remember that floating point values are always in native format, so you have to read them on the same type of machine on which they were created. With integer data you specify either big-endian or little-endian formats. The tcl_platform variable described on page 182 can tell you the byte order of the current platform.
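One way to use tcl_platform(byteOrder), whose value is either littleEndian or bigEndian, is to choose the integer key at run time. The procedure name nativeIntKey below is hypothetical. The sketch picks the native key for data that stays on this machine and uses I for data sent over the network:

```tcl
# Hypothetical helper: the 32-bit integer key for this machine's byte order.
proc nativeIntKey {} {
    global tcl_platform
    if {$tcl_platform(byteOrder) eq "littleEndian"} {
        return i
    }
    return I
}

# Data that stays on this machine can use the native key ...
set key [nativeIntKey]
set local [binary format $key 305419896]
binary scan $local $key back
puts $back                            ;# 305419896

# ... while data for the network should always use big-endian I.
binary scan [binary format "I" 305419896] "H*" wire
puts $wire                            ;# 12345678
```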

Binary Data and File I/O

When working with binary data in files, you need to turn off the newline translations and character set encoding that Tcl performs automatically. These are described in more detail on pages 114 and 209. For example, if you are generating binary data, the following command puts your standard output in binary mode:

fconfigure stdout -translation binary -encoding binary
puts -nonewline [binary format "B8" 11001010]
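The same settings apply to files. The sketch below writes a binary record to a file and reads it back. The file name binary_demo.bin is hypothetical, and the file is deleted afterward. Setting -translation binary also sets the channel encoding to binary, so a single fconfigure option is used here:

```tcl
# Hypothetical scratch file for this demonstration.
set fileName binary_demo.bin

# Write in binary mode; -nonewline keeps puts from appending a newline byte.
set out [open $fileName w]
fconfigure $out -translation binary
puts -nonewline $out [binary format "S2a4" {513 1027} Tcl!]
close $out

# Read back in binary mode so no bytes are translated.
set in [open $fileName r]
fconfigure $in -translation binary
set data [read $in]
close $in
file delete $fileName

puts [string length $data]            ;# 8
binary scan $data "S2a4" numbers tag
puts "$numbers $tag"                  ;# 513 1027 Tcl!
```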
