Recipe 2.26. Encoding Chunks of Data

Problem

You need to encode some data; however, you will be receiving it in blocks of a certain size, not all at once. Your encoder needs to be able to append each block of data to the previous one to reconstitute the entire data stream.

Solution

Use the Convert method on the Encoder class. The following method, ConvertBlocksOfData, accepts character arrays and keeps passing them into the Convert method until there are no more character arrays to process. The final output of the Convert method is a byte array in a particular encoding, in this case UTF7. (Note that Encoding.UTF7 is marked obsolete in .NET 5 and later; the same technique applies to any other Encoding.)

	public static void ConvertBlocksOfData()
	{
	    // Create encoder.
	    Encoding encoding = Encoding.UTF7;
	    Encoder encoder = encoding.GetEncoder();

	    // Set up static size byte array.
	    // In your code you may want to increase this size
	    // to suit your application's needs.
	    byte[] outputBytes = new byte[20];

	    // Set up loop to keep adding to buffer until it's full
	    // or the inputBuffer is finished.
	    bool isLastBuffer = false;
	    int startPos = 0;
	    while (!isLastBuffer)
	    {
	        // Get the next block of character data.
	        // GetInputBuffer is defined at the end of the Solution section.
	        char[] inputBuffer = GetInputBuffer(out isLastBuffer);

	        // Check to see if we will overflow the byte array.
	        if ((startPos + inputBuffer.Length) >= outputBytes.Length)
	        {
	            Console.WriteLine("RESIZING ARRAY");

	            // Resize the array to handle the extra data.
	            byte[] tempBuffer = new byte[outputBytes.Length];
	            outputBytes.CopyTo(tempBuffer, 0);
	            outputBytes = new byte[tempBuffer.Length * 2];
	            tempBuffer.CopyTo(outputBytes, 0);
	        }

	        // Copy the input buffer into our byte[] buffer
	        // where the last copy left off.
	        int charsUsed;
	        int bytesUsed;
	        bool completed;
	        encoder.Convert(inputBuffer, 0, inputBuffer.Length, outputBytes, startPos,
	                        inputBuffer.Length, isLastBuffer, out charsUsed,
	                        out bytesUsed,
	                        out completed);

	        // Increment the starting position in the byte[]
	        // in which to add the next input buffer.
	        startPos += inputBuffer.Length;
	    }

	    // Display data.
	    Console.WriteLine("isLastBuffer == " + isLastBuffer.ToString());
	    foreach (byte b in outputBytes)
	    {
	        if (b > 0)
	            Console.Write(b.ToString() + " -- ");
	    }
	    Console.WriteLine();
	}

The following code simply creates a text string of the alphabet and returns character arrays containing incremental blocks of characters from this string. The character arrays returned are of a particular size, in this case six characters. Note that this code is used only to exercise the ConvertBlocksOfData method; in your code you may have a stream of data that originates from a network or local filesystem and arrives in chunks rather than as one long continuous stream of data.

	const int size = 6;     // The amount of data that we will return in the char[]
	static int index = 0;   // Where we are in the original text string

	// Dummy method to pass data into the calling method in chunks
	public static char[] GetInputBuffer(out bool isLastBuffer)
	{

	    // The input string
	    string text = "abcdefghijklmnopqrstuvwxyz";

	    char[] inputBuffer = null;
	    
	    if ((index + size) < text.Length)
	    {

	        // Create the buffer to return (we are not finished).
	        inputBuffer = text.ToCharArray(index, size);
	        isLastBuffer = false;
	    }
	    else
	    {
	        // Create the buffer to return (we are finished).
	        inputBuffer = text.ToCharArray(index, text.Length - index);
	        isLastBuffer = true;
	    }

	    // Increment the index to the next chunk of data in text.
	    index += size;

	    return (inputBuffer);
	}

Discussion

In this recipe you use the GetInputBuffer method to pass chunks of data, in this case character arrays of up to six characters, back to the ConvertBlocksOfData method. In this method the chunks of data are fed into the Convert method, which keeps accepting chunks until the GetInputBuffer method returns true in its isLastBuffer out parameter. That value is passed to Convert as its flush argument, which signals that this is the final chunk and that it is time to clean up. The result is that the Convert method builds a single continuous byte array, in a particular encoding, from individual chunks of data in the form of character arrays.

The Convert method was chosen because it is designed to encode data of an unspecified size, as well as data that arrives in chunks rather than all at once. This happens, for example, when a server application returns data over the network in packets of a specific size and the data is too large to fit into a single packet. In that situation a stream object cannot return the complete stream in one read. Instead, it returns chunks of the stream until there is no more to return.
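As a rough sketch of that situation, the following code pulls characters from a TextReader in fixed-size blocks and encodes each block as it arrives. The names ReaderChunks, EncodeFromReader, and chunkSize are hypothetical and are not part of the recipe; a StringReader can stand in for a network or file reader when trying it out.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReaderChunks
{
    // Hypothetical helper: reads up to chunkSize characters at a time
    // and encodes each block with a single, stateful Encoder.
    public static byte[] EncodeFromReader(TextReader reader, Encoding encoding,
                                          int chunkSize)
    {
        Encoder encoder = encoding.GetEncoder();
        char[] chunk = new char[chunkSize];

        // Worst-case byte count for one full chunk in this encoding.
        byte[] scratch = new byte[encoding.GetMaxByteCount(chunkSize)];
        MemoryStream result = new MemoryStream();

        int charsUsed;
        int bytesUsed;
        bool completed;
        int charsRead;
        while ((charsRead = reader.Read(chunk, 0, chunk.Length)) > 0)
        {
            // More data may follow, so flush is false.
            encoder.Convert(chunk, 0, charsRead, scratch, 0, scratch.Length,
                            false, out charsUsed, out bytesUsed, out completed);
            result.Write(scratch, 0, bytesUsed);
        }

        // The reader is exhausted: pass zero characters with flush set to
        // true so the encoder writes out anything it is still holding.
        encoder.Convert(chunk, 0, 0, scratch, 0, scratch.Length,
                        true, out charsUsed, out bytesUsed, out completed);
        result.Write(scratch, 0, bytesUsed);

        return result.ToArray();
    }
}
```

Because the end of the data is only known once Read returns zero, this sketch makes one final Convert call with no characters and flush set to true instead of flagging the last block in advance.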

The Convert method and its parameters are defined as follows:

	public virtual void Convert(char[] chars, int charIndex, int charCount,
	                            byte[] bytes, 
	                            int byteIndex, int byteCount,
	                            bool flush, out int charsUsed,
	                            out int bytesUsed, out bool completed)

This method's parameters are defined here:


chars

The character array used as the input.


charIndex

At what index to start encoding data in the chars array.


charCount

How much data to encode in the chars array.


bytes

The byte array that will hold the encoded chars character array.


byteIndex

At which position in the bytes array to start storing the encoded characters.


byteCount

The maximum number of bytes that the conversion is allowed to write into the bytes array.


flush

A false value should be passed into this parameter until the last chunk of data is converted. Upon receiving the last chunk of data, a true value should be passed into this parameter.


charsUsed

An out parameter that indicates how many characters were converted.


bytesUsed

An out parameter that indicates how many bytes were created and stored in the bytes array as a result of the encoding process.


completed

An out parameter that returns true if all of the characters specified by charCount were encoded, and false if they were not all encoded.
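To illustrate how charsUsed, bytesUsed, and completed work together, here is a minimal sketch that encodes a character array through a deliberately small output buffer, calling Convert repeatedly until completed is true. The names SmallBufferDemo, EncodeWithSmallBuffer, and scratchSize are hypothetical, and the sketch assumes scratchSize is large enough to hold at least one encoded character.

```csharp
using System;
using System.IO;
using System.Text;

public static class SmallBufferDemo
{
    // Hypothetical helper: encodes chars using a scratch buffer that may be
    // too small to hold the whole result in one call.
    public static byte[] EncodeWithSmallBuffer(Encoding encoding, char[] chars,
                                               int scratchSize)
    {
        Encoder encoder = encoding.GetEncoder();
        byte[] scratch = new byte[scratchSize];
        MemoryStream result = new MemoryStream();

        int charIndex = 0;
        bool completed = false;
        while (!completed)
        {
            int charsUsed;
            int bytesUsed;

            // Offer all remaining characters; Convert takes as many as fit.
            encoder.Convert(chars, charIndex, chars.Length - charIndex,
                            scratch, 0, scratch.Length, true,
                            out charsUsed, out bytesUsed, out completed);

            // Keep only the bytes that were actually produced and move
            // past the characters that were actually consumed.
            result.Write(scratch, 0, bytesUsed);
            charIndex += charsUsed;
        }

        return result.ToArray();
    }
}
```

The loop advances by charsUsed and copies bytesUsed bytes on each pass, so it does not depend on how many bytes a given character occupies in the chosen encoding.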

Of these parameters the flush parameter deserves a bit more discussion. This parameter is used to tell the Convert method that the current character array is the final bit of data that is being passed in. Only at this point should you pass in the value true to this parameter. This tells the Encoder object on which the Convert method was called to finish encoding the current character array that was passed in and then to clean up after itself. At this point you should not pass in any more data to the Convert method.
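A small sketch can make the role of flush more concrete. It assumes a UTF-8 encoder instead of the recipe's UTF7 and splits a single character, U+1F600 (stored in C# as the surrogate pair D83D DE00), across two Convert calls. FlushDemo and EncodeSplitPair are hypothetical names.

```csharp
using System;
using System.Text;

public static class FlushDemo
{
    // Returns the byte counts reported by the two Convert calls.
    public static int[] EncodeSplitPair()
    {
        Encoder encoder = Encoding.UTF8.GetEncoder();
        byte[] bytes = new byte[8];
        int charsUsed;
        int bytesUsed;
        bool completed;

        // First half of the pair; more data is coming, so flush is false.
        // The encoder holds on to this character instead of writing bytes.
        encoder.Convert(new char[] { '\uD83D' }, 0, 1, bytes, 0, bytes.Length,
                        false, out charsUsed, out bytesUsed, out completed);
        int firstBytes = bytesUsed;

        // Second half of the pair; this is the last block, so flush is true.
        encoder.Convert(new char[] { '\uDE00' }, 0, 1, bytes, firstBytes,
                        bytes.Length - firstBytes, true,
                        out charsUsed, out bytesUsed, out completed);

        return new int[] { firstBytes, bytesUsed };
    }
}
```

The first call should report zero bytes and the second should report four, because the Encoder object remembers the unfinished pair between calls. This retained state is what passing true for flush tells the encoder to write out and discard.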

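The Convert method will throw an ArgumentException if the space you give it in the outputBytes byte array is too small to hold any of the converted input. If only part of the input fits, it converts what it can and returns false in the completed out parameter. To prevent either from happening, you can resize the outputBytes buffer to allow it to hold the additional data. The code to do this is in the ConvertBlocksOfData method and is shown here: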

	      // Check to see if we will overflow the byte array.
	      if ((startPos + inputBuffer.Length) >= outputBytes.Length)
	      {

	          Console.WriteLine("RESIZING ARRAY");

	          // Resize the array to handle the extra data.
	          byte[] tempBuffer = new byte[outputBytes.Length];
	          outputBytes.CopyTo(tempBuffer, 0);
	          outputBytes = new byte[tempBuffer.Length * 2];
	          tempBuffer.CopyTo(outputBytes, 0);
	      }

This code simply copies the original outputBytes buffer into a temporary buffer called tempBuffer and then replaces outputBytes with a new buffer twice the original size. The tempBuffer data is then copied back into the outputBytes buffer, which is eventually passed into the Convert method. We chose to double the size of the buffer since that is the normal behavior of .NET collections, such as the ArrayList. You may want to look at this code and determine for yourself if this is the optimal size for your application or if this value needs to be tweaked.
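The same growth policy can be sketched more compactly with Array.Resize, which allocates the larger array and copies the existing elements in one call. BufferGrowth and EnsureCapacity are hypothetical names used only for this illustration.

```csharp
using System;

public static class BufferGrowth
{
    // Hypothetical helper: doubles the buffer until it can hold
    // requiredLength bytes, preserving the existing contents.
    public static void EnsureCapacity(ref byte[] buffer, int requiredLength)
    {
        if (requiredLength <= buffer.Length)
        {
            return;
        }

        int newLength = Math.Max(buffer.Length, 1);
        while (newLength < requiredLength)
        {
            newLength *= 2;
        }

        // Array.Resize replaces buffer with a larger array that starts
        // with the old elements.
        Array.Resize(ref buffer, newLength);
    }
}
```

Doubling in a loop also covers the case where a single block is larger than the current buffer, which a single doubling would not.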

If you want to change the encoding to another type such as Encoding.Unicode, which uses two bytes for each of the characters in this example where UTF7 uses one, you will need to adjust the starting position and lengths for your byte array. The following code shows the changes needed to the ConvertBlocksOfData method in order for it to work with the Unicode encoding:

	public static void ConvertBlocksOfData()
	{
	    // Create encoder.
	    Encoding encoding = Encoding.Unicode;
	    Encoder encoder = encoding.GetEncoder();

	    // Set up static size byte array.
	    // In your code you may want to increase this size
	    // to suit your application's needs.
	    byte[] outputBytes = new byte[20];

	    // Set up loop to keep adding to buffer until it's full
	    // or the inputBuffer is finished.
	    bool isLastBuffer = false;
	    int startPos = 0;
	    while (!isLastBuffer)
	    {

	        // Get the next block of character data.
	        char[] inputBuffer = GetInputBuffer(out isLastBuffer);

	        // Check to see if we will overflow the byte array.
	        if (((startPos * 2) + (inputBuffer.Length * 2)) >= outputBytes.Length)
	        {
	            Console.WriteLine("RESIZING ARRAY");
	            
	            // Resize the array to handle the extra data.
	            byte[] tempBuffer = new byte[outputBytes.Length];
	            outputBytes.CopyTo(tempBuffer, 0);
	            outputBytes = new byte[tempBuffer.Length * 2];

	            tempBuffer.CopyTo(outputBytes, 0);
	        }

	        // Copy the input buffer into our byte[] buffer
	        // where the last copy left off.
	        int charsUsed;
	        int bytesUsed;
	        bool completed;
	        encoder.Convert(inputBuffer, 0, inputBuffer.Length, outputBytes,
	                                   startPos * 2,
	                                   inputBuffer.Length * 2, isLastBuffer,
	                                   out charsUsed,
	                                   out bytesUsed, out completed);

	        // Increment the starting position in the byte[]
	        // in which to add the next input buffer.
	        startPos += inputBuffer.Length;
	    }

	    // Display data.
	    Console.WriteLine("isLastBuffer == " + isLastBuffer.ToString());
	    foreach (byte b in outputBytes)
	    {
	        if (b > 0)
	            Console.Write(b.ToString() + " -- ");
	    }
	    Console.WriteLine();
	}

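The changes are the Encoding.Unicode assignment, the doubled terms in the overflow check, and the startPos * 2 and inputBuffer.Length * 2 arguments passed to Convert. These changes simply take into account the larger size of the Unicode-encoded characters that will be placed in the outputBytes buffer.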
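Multiplying by a fixed factor works only while every character encodes to the same number of bytes. As a sketch of a variant that does not depend on the encoding, the following code asks the encoder how many bytes each block needs with GetByteCount and advances the write position by bytesUsed. ChunkEncoder and EncodeChunks are hypothetical names, and passing the blocks in as a char[][] is an assumption made to keep the example self-contained.

```csharp
using System;
using System.Text;

public static class ChunkEncoder
{
    // Hypothetical helper: encodes the blocks in order with one Encoder
    // and returns exactly the bytes that were produced.
    public static byte[] EncodeChunks(Encoding encoding, char[][] chunks)
    {
        Encoder encoder = encoding.GetEncoder();
        byte[] outputBytes = new byte[20];
        int bytesWritten = 0;

        for (int i = 0; i < chunks.Length; i++)
        {
            char[] chunk = chunks[i];
            bool flush = (i == chunks.Length - 1);

            // Ask how many bytes this block needs given the encoder's
            // current state; this call does not change that state.
            int needed = encoder.GetByteCount(chunk, 0, chunk.Length, flush);
            if (bytesWritten + needed > outputBytes.Length)
            {
                int newLength = Math.Max(outputBytes.Length * 2,
                                         bytesWritten + needed);
                Array.Resize(ref outputBytes, newLength);
            }

            int charsUsed;
            int bytesUsed;
            bool completed;
            encoder.Convert(chunk, 0, chunk.Length, outputBytes, bytesWritten,
                            outputBytes.Length - bytesWritten, flush,
                            out charsUsed, out bytesUsed, out completed);

            // Advance by the bytes actually written, not by character count.
            bytesWritten += bytesUsed;
        }

        // Trim the unused tail so callers see only encoded data.
        Array.Resize(ref outputBytes, bytesWritten);
        return outputBytes;
    }
}
```

With this approach, switching from one encoding to another means changing only the Encoding argument; the buffer arithmetic stays the same.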

See Also

See the "Encoder.Convert Method" topic in the MSDN documentation.

