Posted: Fri Nov 18, 2005 11:09 pm. Subject: Article - C++ for C Programmers Part II
C++ For C Programmers – A Partial Solution To The Buffer Overflow Problem - Part Two
Author: Andrew J. Bennieston aka Stormhawk (formerly known as Technetium)
The C Array
In C, an array is a sequential region of memory assigned to store several objects of the same type, one after the other. In most expressions the name of the array decays to a pointer to its first element; that is, array and &array[0] yield the same pointer (&array has the same address, but a different type: pointer to the whole array). Arrays in C may be created statically, by declaring their size at compile time, or dynamically, by calling malloc() with the size of the array in bytes and using realloc() and free() to resize and release it. The C array does not "know" its own size; the programmer must remember it. Furthermore, C does not attempt to protect against writing beyond the end of an array, which is the cause of the buffer overflow problems seen with character arrays (C strings).
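As a minimal sketch of what "the programmer must remember the size" means in practice, the following C-style code (written so that it also compiles as C++, the language of the other examples here) grows a heap array with realloc() and passes its length around by hand. The function names sumArray and buildAndSum are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>   // std::size_t
#include <cstdlib>   // std::realloc, std::free

// A function receiving a C array only sees a pointer to its first element,
// so the caller must pass the element count separately.
long sumArray(const int* values, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// Builds a dynamic array holding 1..n, growing it with realloc() one element
// at a time. The programmer, not the array, tracks the current size.
// Returns the sum of the elements, or -1 if an allocation fails.
long buildAndSum(std::size_t n) {
    int* data = 0;                       // realloc(NULL, ...) behaves like malloc
    std::size_t size = 0;                // the array itself does not know this
    for (std::size_t i = 1; i <= n; ++i) {
        int* grown = static_cast<int*>(std::realloc(data, (size + 1) * sizeof(int)));
        if (grown == 0) {                // on failure the old block is still valid
            std::free(data);
            return -1;
        }
        data = grown;
        data[size++] = static_cast<int>(i);
    }
    long total = sumArray(data, size);
    std::free(data);                     // free(NULL) is harmless when n == 0
    return total;
}
```

Growing by one element per call keeps the sketch short; real code would usually grow geometrically, which is exactly the kind of bookkeeping std::vector does automatically.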
Writing beyond the end of an array is undefined behaviour. It may cause a segmentation fault (raising SIGSEGV), or the program may carry on with whatever data lay after the array silently overwritten. In the latter case the program may behave unexpectedly, and it may be exploitable if it is possible to get arbitrary code written into, or beyond, an array and then executed.
The C++ Array
C++ supports arrays in exactly the same way as C; these arrays have the same properties, and the same weaknesses. They are supported because C++ provides direct support for (the majority of) the C standard. This is much the same as the ability to use char* style C strings in C++ but, just as it provides std::string, C++ also provides alternatives to the array.
The C++ Vector
The vector is the primary alternative to the C array. A vector is a sequential region of memory in which objects are stored. Vectors can be used almost identically to the C-style array, but they have many additional properties which make programming using vectors easier, and more secure, than programming using arrays.
The C++ vector also provides operations to reserve storage for elements yet to be created, resize the vector, change the order of objects within the vector, as well as many more operations provided through compatibility with the standard algorithms provided by C++.
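A short sketch of some of these operations, assuming nothing beyond the standard <vector> and <algorithm> headers: storage is reserved, the vector is resized, and its contents are reordered with the standard algorithms std::sort and std::reverse. The function name reorderDemo is hypothetical.

```cpp
#include <algorithm>  // std::sort, std::reverse
#include <vector>

// Demonstrates operations a plain C array lacks: reserving storage,
// resizing, and reordering with the standard algorithms.
std::vector<int> reorderDemo() {
    std::vector<int> v;
    v.reserve(8);                        // room for 8 elements, size() still 0
    v.push_back(5);
    v.push_back(3);
    v.push_back(9);                      // 5 3 9
    v.resize(5);                         // grows to 5; new elements are 0: 5 3 9 0 0
    std::sort(v.begin(), v.end());       // 0 0 3 5 9
    std::reverse(v.begin(), v.end());    // 9 5 3 0 0
    return v;
}
```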
Transformations I - Vectors To Arrays
It is possible to use a vector directly in place of a C-style array, including in calls to functions which expect an array, because a vector's elements are stored contiguously and the pointer semantics for arrays still work on them. That is, if myVec is a vector object, then &myVec[0] + n is the same as &myVec[n], where n is a valid index into the vector. As such, it is possible to pass &myVec[0] wherever a function expects an array, provided the vector is not empty and the function is also told the number of elements (myVec.size()).
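The sketch below passes a vector to a C-style function that expects a pointer and a length. The function scaleArray stands in for any existing array-based C routine and, like scaleVector, is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// A C-style function expecting a pointer to the first element of an array
// plus an element count (hypothetical, for illustration).
void scaleArray(int* values, std::size_t count, int factor) {
    for (std::size_t i = 0; i < count; ++i) {
        values[i] *= factor;
    }
}

// Vector elements are contiguous, so &vec[0] can be handed to array code.
// The vector must not be empty: vec[0] on an empty vector is undefined.
void scaleVector(std::vector<int>& vec, int factor) {
    if (!vec.empty()) {
        scaleArray(&vec[0], vec.size(), factor);
    }
}
```

Passing vec.size() alongside the pointer keeps the length information that the vector knows and the C function cannot discover on its own.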
Transformations II - Arrays To Vectors
Using a vector in place of an array is easily done by passing the address of element 0 (&vectorname[0]). Creating a vector from an array can be achieved through the vector constructor which takes a range of elements from another container (or array) with which to initialise the vector. Code sample 4.1 shows this.
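/* 4.1 - creating a vector from an array */
#include <vector>
int arr[5] = {1, 2, 3, 4, 5}; /* example values for arr[0]..arr[4]; the size 5 is arbitrary */
/* element count = size of the whole array / size of one element */
std::vector<int> vec(arr, arr + sizeof(arr) / sizeof(arr[0]));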
The constructor takes the beginning and the end of the range of values with which to fill the new vector. In this expression arr decays to a pointer to the first element of the array, so it serves as the beginning of the range. The end must point one beyond the last element, since the range is [beg, end), i.e. from the beginning (inclusive) to the end (exclusive). To compute it, the size of the whole array (in bytes) is divided by the size of one array element (in bytes) to obtain the number of elements in the array. Adding this count to the pointer to the beginning of the array yields a "one past the end" pointer. Note that this sizeof calculation only works where arr is a true array; once an array has been passed to a function it is just a pointer, and sizeof(arr) gives the size of the pointer instead.
A std::vector can also be constructed empty, with items added by enlarging it through the resize() method and then assigning values using array-subscript indexing. You may also push values onto the end of the vector (much like a stack) using the push_back() method. It is possible to insert() elements at an arbitrary position within the vector (although this is computationally more expensive than appending to the back, since the later elements must be moved), and to erase() elements from arbitrary positions within the vector.
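The following sketch walks through each of these ways of populating a vector in turn; the function name buildVector is hypothetical, and the comments show the contents after each step.

```cpp
#include <vector>

// Populates a vector using resize() + subscripting, push_back(),
// insert() and erase(), then returns it.
std::vector<int> buildVector() {
    std::vector<int> v;                  // starts empty
    v.resize(3);                         // three elements, all 0
    v[0] = 10;                           // assign through the subscript operator
    v[1] = 20;
    v[2] = 30;
    v.push_back(40);                     // append: 10 20 30 40
    v.insert(v.begin() + 1, 15);         // insert before index 1: 10 15 20 30 40
    v.erase(v.begin() + 3);              // erase index 3 (the 30): 10 15 20 40
    return v;
}
```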
Vectors also provide a way to reserve storage for future use, with the reserve() method. This does not change the size of the vector (the number of elements it holds), but it allocates memory for more elements, increasing its capacity. Reserving memory in advance prevents expensive reallocations later, much like using malloc() to allocate a large block of memory and filling it later.
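A minimal sketch of the size/capacity distinction: after reserve(n), appending up to n elements is guaranteed not to reallocate, so capacity() stays the same throughout. The function name fillWithoutReallocation is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Reserves room for n elements, then appends exactly n of them.
// Returns true if the capacity never changed, i.e. no reallocation occurred.
bool fillWithoutReallocation(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                        // capacity() >= n, but size() is still 0
    const std::size_t reserved = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v.size() == n && v.capacity() == reserved;
}
```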
The C++ vector class is actually a class template; it is designed to allow elements of many types to be stored within the vector. To create a vector which stores a certain type, you must include the <vector> header, then name the type within the <> after vector. Code sample 5.1 shows the creation of a vector of int, a vector of char, a vector of string, and a vector of a user-defined class Person.
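/* 5.1 - template instantiation */
#include <string>
#include <vector>
using namespace std;
struct Person { string name; }; /* minimal stand-in for the user-defined class Person */
Person people1[3]; /* an existing C-style array of Person (size chosen for illustration) */
vector<int> numbers(20); /* 20-element integer vector, each element initialised to int(), i.e. 0 */
vector<char> characters; /* empty character vector */
vector<string> strings; /* empty string vector */
vector<Person> people2(people1, people1 + sizeof(people1) / sizeof(people1[0])); /* vector of class Person, initialised with the contents of the people1 array */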
Range-Checked Vector Access
The std::vector provides the at() method, which performs range-checked access and throws a std::out_of_range exception if the index is beyond the end of the vector. This can be used to improve the security of array-manipulation code by ensuring that the code cannot write beyond the end of the vector (though unchecked operator[] style element access still can!). The vector also provides front(), which returns the first element, and back(), which returns the last element; neither may be called on an empty vector.
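A minimal sketch of turning an out-of-range write into a recoverable error with at(); the function name checkedStore is hypothetical, and reporting failure through a bool is just one possible design.

```cpp
#include <cstddef>
#include <stdexcept>  // std::out_of_range
#include <vector>

// Stores value at index using the range-checked at().
// Returns false instead of corrupting memory when index is out of range.
bool checkedStore(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;             // throws std::out_of_range if index >= v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;                    // v[index] here would be undefined behaviour
    }
}
```

For example, with a three-element vector, checkedStore(v, 2, 7) succeeds, while checkedStore(v, 3, 9) returns false and leaves the vector untouched.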
Iterators may be used in place of pointers when manipulating vectors, and the C++ standard library algorithms expect iterators to specify ranges. Vectors provide begin() and end(), which return iterators to the first element and to the position immediately after the last element; rbegin() and rend() are provided for reverse iteration. Iterators may be invalidated by operations which insert or erase elements at or before the iterator's position, or which reallocate the vector's memory.
Iterators are also used to specify positions when inserting and erasing elements with the insert() and erase() methods.
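Two small sketches of iterator use follow: reverse iteration with rbegin()/rend(), and erasing inside a loop while respecting iterator invalidation by continuing from the iterator that erase() returns. Both function names are hypothetical.

```cpp
#include <vector>

// Copies the elements in reverse order using reverse iterators.
std::vector<int> reversedCopy(const std::vector<int>& v) {
    std::vector<int> out;
    for (std::vector<int>::const_reverse_iterator it = v.rbegin(); it != v.rend(); ++it) {
        out.push_back(*it);
    }
    return out;
}

// Removes every even element. erase() invalidates the erased iterator (and
// those after it), so the loop continues from the iterator erase() returns.
void removeEvens(std::vector<int>& v) {
    std::vector<int>::iterator it = v.begin();
    while (it != v.end()) {
        if (*it % 2 == 0) {
            it = v.erase(it);
        } else {
            ++it;
        }
    }
}
```

Erasing one element at a time is quadratic in the worst case; for large vectors the std::remove_if algorithm followed by a single range erase() does the same job in one pass.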
Stack Operations, Resizing, Clearing And Erasing
The methods push_back() and pop_back() append an item to the end of a vector and remove the last item again, providing stack-like operations.
The resize() method resizes the vector, constructing new elements if the size grows and removing elements from the end if it shrinks. clear() removes all of the elements of the vector, whilst erase() removes a single element, or the elements within a range specified by iterators. A variant of resize() initialises new elements as copies of a specified value (for instance, vec.resize(20, vec[0]) makes the vector 20 elements long, with any new elements being copies of the first element).
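A sketch of the stack operations, resize() with a fill value, and range erase(); the function names are hypothetical and the comments show the contents after each step.

```cpp
#include <vector>

// Stack-like use: push three values, pop the last, return what remains.
std::vector<int> stackDemo() {
    std::vector<int> s;
    s.push_back(1);
    s.push_back(2);
    s.push_back(3);
    s.pop_back();                        // removes the 3
    return s;                            // 1 2
}

// resize() with a fill value, then erase() over an iterator range.
std::vector<int> resizeEraseDemo() {
    std::vector<int> v(2, 7);            // 7 7
    const int first = v[0];
    v.resize(5, first);                  // new elements copy the first: 7 7 7 7 7
    v[4] = 1;                            // 7 7 7 7 1
    v.erase(v.begin() + 1, v.begin() + 3); // removes indices [1, 3): 7 7 1
    return v;
}
```

Calling clear() on the result afterwards would leave it empty, with size() equal to 0.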
std::string As A Container
The discussion of std::string in part one of this article covered the use of strings as "drop-in" replacements for C strings, but did not mention that a string is simply a very specialised sequence container; as such, it provides most of the methods the vector class provides, as well as many specifically tailored to string operations. Using strings in this way is unusual, since a string object often has meaning only as the whole sequence of characters, in order. However, iterating through the characters may be necessary under certain circumstances, and the standard algorithms provided by C++ can help here. The for_each() algorithm, for instance, can make light work of converting an entire string to uppercase: write a function which works on a single character, and pass it to for_each() along with str.begin() and str.end() as the range.
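The for_each() approach described above might look like the following sketch. The names makeUpper and toUpperCopy are hypothetical; the per-character function takes a char& so that for_each can modify the string in place.

```cpp
#include <algorithm>  // std::for_each
#include <cctype>     // std::toupper
#include <string>

// Works on a single character; the reference parameter lets for_each
// change the string's characters in place.
void makeUpper(char& c) {
    // Cast to unsigned char first: passing a negative char to toupper is undefined.
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Takes the string by value and returns the uppercased copy.
std::string toUpperCopy(std::string str) {
    std::for_each(str.begin(), str.end(), makeUpper);
    return str;
}
```

The std::transform algorithm is a common alternative, writing each converted character through an output iterator instead of modifying it through a reference.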
C++ provides several other containers, each differing from std::vector in ways which make it better suited to certain tasks. Perhaps the most unusual to the C programmer is std::map, for which no direct analogue exists in C. std::map is an associative array class, designed to look up a value, given a unique key, in logarithmic time (it is typically implemented as a balanced binary tree, with the keys kept in sorted order). The keys and values may be of almost any type: keys must be comparable with a strict ordering (operator< by default) and values must be copyable. Virtually all standard types meet these requirements, and with the operator overloading facilities of C++ it is easy to add such support to classes created by the programmer.
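A minimal sketch of std::map used as an associative array, counting word occurrences; countWords and countOf are hypothetical names. operator[] inserts a zero count the first time a key is seen, while find() looks a key up without inserting it.

```cpp
#include <map>
#include <string>
#include <vector>

// Counts how often each word appears.
std::map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (std::vector<std::string>::size_type i = 0; i < words.size(); ++i) {
        ++counts[words[i]];              // a new key starts at int(), i.e. 0
    }
    return counts;
}

// Looks a word up without inserting it; find() returns end() when absent.
int countOf(const std::map<std::string, int>& counts, const std::string& word) {
    std::map<std::string, int>::const_iterator it = counts.find(word);
    return it == counts.end() ? 0 : it->second;
}
```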
Whilst the map does not directly improve the security of a program over and above the provisions of C, it has a subtle implication: it removes the need for the programmer to use a third-party associative array type, or to write their own. The standard map should be well tested and thoroughly documented, and this provides better security guarantees than a map written by the developer on top of ordinary C arrays; in such home-grown code it is easy to overlook one case where it may be possible to write beyond the end of an array, rendering the entire data structure potentially vulnerable to buffer-overflow based exploits. For this reason, std::map (and, indeed, the other specialised containers provided by C++) offers improved security over the provisions made by the core C language.
C++ provides the sequence containers vector, deque (a double-ended queue) and list, and the associative containers set (unique elements), multiset, map (associative array) and multimap. Each container is detailed fully, along with all member functions, constructors and specifications, in the standard library documentation.
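As a brief sketch of how two of these containers differ from vector: a set keeps a single, sorted copy of each element, and a deque supports efficient insertion at the front as well as the back. Both function names are hypothetical.

```cpp
#include <deque>
#include <set>
#include <vector>

// A set keeps one copy of each element, in sorted order.
std::vector<int> uniqueSorted(const std::vector<int>& in) {
    std::set<int> s(in.begin(), in.end());
    return std::vector<int>(s.begin(), s.end());
}

// A deque supports efficient insertion at both ends.
std::deque<int> bothEnds() {
    std::deque<int> d;
    d.push_back(2);
    d.push_front(1);                     // not available on std::vector
    d.push_back(3);
    return d;                            // 1 2 3
}
```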
C++ From The Perspective Of A C Programmer
The information relating to C++ types which provide extra safety over and above C types and structures should not be thought of as a way of forcing OOP (object-oriented programming) upon the unsuspecting C programmer. C++ is a language which provides the benefits of OOP, but does not force them. As both a C and C++ programmer, I personally mix the OOP features of C++ with a more conventional C-style approach: the C style where that approach is useful, and the safer C++ containers where they provide more security. At the very least, I would consider using C++ in a C-like manner, using std::string for user input to increase security with little extra effort, and retaining the rest of the program in a C-like form.
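A minimal sketch of that "C-like program, C++ input" approach: the line is read into a std::string, which grows as needed, and then handed to existing C-style code through c_str(). The names readLine, cStyleLength and lengthOfFirstLine are hypothetical, and an std::istringstream stands in for real user input such as std::cin.

```cpp
#include <cstring>   // std::strlen, std::size_t
#include <istream>
#include <sstream>   // std::istringstream
#include <string>

// Reads one line of any length. std::getline grows the string as needed,
// so there is no fixed-size buffer to overflow (unlike reading into a char[]).
bool readLine(std::istream& in, std::string& line) {
    if (std::getline(in, line)) {
        return true;
    }
    return false;
}

// Stand-in for existing C-style code that works on a NUL-terminated string.
std::size_t cStyleLength(const char* text) {
    return std::strlen(text);
}

// Example wiring: treat the given text as user input, read its first line,
// and pass it to the C-style code through c_str() (a read-only char array).
std::size_t lengthOfFirstLine(const std::string& input) {
    std::istringstream in(input);
    std::string line;
    if (!readLine(in, line)) {
        return 0;                        // no input at all
    }
    return cStyleLength(line.c_str());
}
```

In a real program, readLine(std::cin, line) would replace the string stream.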
In this article I have presented mechanisms for using C++ strings in place of C strings and C++ vectors instead of C arrays, and, indeed, methods of converting between the two.
Clearly, if a C programmer checks every attempt at access to an array, checks the length of their char* prior to using string operations, and ensures that every other programmer working on the project does the same, it is of little benefit to use C++ features in a C-like way, but such ideal situations are rare, and the use of C++ objects, at the very least in the portions of a program which deal with user input, should improve the robustness of the program when faced with unexpected input conditions.
The intention of this article was to inform. std::vector and std::string may be marginally slower than plain arrays (though that will depend on the implementation), but modern computers are fast enough that a few clock cycles can be sacrificed for improved security. It is a balance between speed and security which must be struck, and adding equivalent checks to C code would probably result in a similar decrease in speed. What, if anything, the reader chooses to take away from this article is their own decision.
Last edited by alt.don on Sat Nov 19, 2005 4:58 pm; edited 1 time in total