10.8. Data Abstraction and Information Hiding
A class normally hides its implementation details from its clients. This is called information hiding. As an example of information hiding, let us consider the stack data structure introduced in Section 6.11.
Stacks can be implemented with arrays and with other data structures, such as linked lists. (We discuss stacks and linked lists in Chapter 14, Templates and Chapter 21, Data Structures.) A client of a stack class need not be concerned with the stack's implementation. The client knows only that when data items are placed in the stack, they will be recalled in last-in, first-out order. The client cares about what functionality a stack offers, not about how that functionality is implemented. This concept is referred to as data abstraction. Although programmers might know the details of a class's implementation, they should not write code that depends on these details. This enables a particular class (such as one that implements a stack and its operations, push and pop) to be replaced with another version without affecting the rest of the system. As long as the public services of the class do not change (i.e., every original public member function still has the same prototype in the new class definition), the rest of the system is not affected.
Many programming languages emphasize actions. In these languages, data exists to support the actions that programs must take. Data is "less interesting" than actions. Data is "crude." Only a few built-in data types exist, and it is difficult for programmers to create their own types. C++ and the object-oriented style of programming elevate the importance of data. The primary activities of object-oriented programming in C++ are the creation of types (i.e., classes) and the expression of the interactions among objects of those types. To create languages that emphasize data, the programming-languages community needed to formalize some notions about data. The formalization we consider here is the notion of abstract data types (ADTs), which improve the program development process.
What is an abstract data type? Consider the built-in type int, which most people would associate with an integer in mathematics. Rather, an int is an abstract representation of an integer. Unlike mathematical integers, computer ints are fixed in size. For example, type int on today's popular 32-bit machines is typically limited to the range 2,147,483,648 to +2,147,483,647. If the result of a calculation falls outside this range, an "overflow" error occurs and the computer responds in some machine-dependent manner. It might, for example, "quietly" produce an incorrect result, such as a value too large to fit in an int variable (commonly called arithmetic overflow). Mathematical integers do not have this problem. Therefore, the notion of a computer int is only an approximation of the notion of a real-world integer. The same is true with double.
Even char is an approximation; char values are normally eight-bit patterns of ones and zeros; these patterns look nothing like the characters they represent, such as a capital Z, a lowercase z, a dollar sign ($), a digit (5), and so on. Values of type char on most computers are quite limited compared with the range of real-world characters. The sevenbit ASCII character set (Appendix B) provides for 128 different character values. This is inadequate for representing languages such as Japanese and Chinese that require thousands of characters. As Internet and World Wide Web usage becomes pervasive, the newer Unicode character set is growing rapidly in popularity, owing to its ability to represent the characters of most languages. For more information on Unicode, visit www.unicode.org.
The point is that even the built-in data types provided with programming languages like C++ are really only approximations or imperfect models of real-world concepts and behaviors. We have taken int for granted until this point, but now you have a new perspective to consider. Types like int, double, char and others are all examples of abstract data types. They are essentially ways of representing real-world notions to some satisfactory level of precision within a computer system.
An abstract data type actually captures two notions: A data representation and the operations that can be performed on those data. For example, in C++, an int contains an integer value (data) and provides addition, subtraction, multiplication, division and modulus operations (among others)division by zero is undefined. These allowed operations perform in a manner sensitive to machine parameters, such as the fixed word size of the underlying computer system. Another example is the notion of negative integers, whose operations and data representation are clear, but the operation of taking the square root of a negative integer is undefined. In C++, the programmer uses classes to implement abstract data types and their services. For example, to implement a stack ADT, we create our own stack classes in Chapter 14, Templates and Chapter 21, Data Structures, and we study the standard library stack class in Chapter 23, Standard Template Library (STL).
10.8.1. Example: Array Abstract Data Type
We discussed arrays in Chapter 7. As described there, an array is not much more than a pointer and some space in memory. This primitive capability is acceptable for performing array operations if the programmer is cautious and undemanding. There are many operations that would be nice to perform with arrays, but that are not built into C++. With C++ classes, the programmer can develop an array ADT that is preferable to "raw" arrays. The array class can provide many helpful new capabilities such as
We create our own array class with many of these capabilities in Chapter 11, Operator Overloading; String and Array Objects. Recall that C++ Standard Library class template vector (introduced in Chapter 7) provides many of these capabilities as well. Chapter 23 explains class template vector in detail. C++ has a small set of built-in types. Classes extend the base programming language with new types.
Software Engineering Observation 10.12
New classes created in C++ environments can be proprietary to an individual, to small groups or to companies. Classes can also be placed in standard class libraries intended for wide distribution. ANSI (the American National Standards Institute) and ISO (the International Organization for Standardization) developed a standard version of C++ that includes a standard class library. The reader who learns C++ and object-oriented programming will be ready to take advantage of the new kinds of rapid, component-oriented software development made possible with increasingly abundant and rich libraries.
10.8.2. Example: String Abstract Data Type
C++ is an intentionally sparse language that provides programmers with only the raw capabilities needed to build a broad range of systems (consider it a tool for making tools). The language is designed to minimize performance burdens. C++ is appropriate for both applications programming and systems programmingthe latter places extraordinary performance demands on programs. Certainly, it would have been possible to include a string data type among C++'s built-in data types. Instead, the language was designed to include mechanisms for creating and implementing string abstract data types through classes. We introduced the C++ Standard Library class string in Chapter 3, and in Chapter 11 we will develop our own String ADT. We discuss class string in detail in Chapter 18.
10.8.3. Example: Queue Abstract Data Type
Each of us stands in line from time to time. A waiting line is also called a queue. We wait in line at the supermarket checkout counter, we wait in line to get gasoline, we wait in line to board a bus, we wait in line to pay a highway toll, and students know all too well about waiting in line during registration to get the courses they want. Computer systems use many waiting lines internally, so we need to write programs that simulate what queues are and do.
A queue is a good example of an abstract data type. Queues offer well-understood behavior to their clients. Clients put things in a queue one at a timeinvoking the queue's enqueue operationand the clients get those things back one at a time on demandinvoking the queue's dequeue operation. Conceptually, a queue can become infinitely long. A real queue, of course, is finite. Items are returned from a queue in first-in, first-out (FIFO) orderthe first item inserted in the queue is the first item removed from the queue.
The queue hides an internal data representation that somehow keeps track of the items currently waiting in line, and it offers a set of operations to its clients, namely, enqueue and dequeue. The clients are not concerned about the implementation of the queue. Clients merely want the queue to operate "as advertised." When a client enqueues a new item, the queue should accept that item and place it internally in some kind of first-in, first-out data structure. When the client wants the next item from the front of the queue, the queue should remove the item from its internal representation and deliver it to the outside world (i.e., to the client of the queue) in FIFO order (i.e., the item that has been in the queue the longest should be the next one returned by the next dequeue operation).
The queue ADT guarantees the integrity of its internal data structure. Clients may not manipulate this data structure directly. Only the queue member functions have access to its internal data. Clients may cause only allowable operations to be performed on the data representation; operations not provided in the ADT's public interface are rejected in some appropriate manner. This could mean issuing an error message, throwing an exception (see Chapter 16), terminating execution or simply ignoring the operation request.