Article - C++ For C Programmers Part I

Posted by alt.don on Thu Oct 27, 2005 1:17 am

C++ For C Programmers – A Partial Solution To The Buffer Overflow Problem - Part One

Author: Andrew J. Bennieston

Overview

This article introduces some aspects of C++ from the perspective of a C programmer. In particular, C strings (char*) are replaced with the C++ std::string, and arrays are replaced with the C++ std::vector container. Checked access to containers is discussed, along with algorithms for converting arrays to vectors and vectors to arrays.

The C String

In C, a string is simply a sequence of characters (a character array) terminated by a null byte, \0. C strings are often represented as a char* in source code, since the name of an array is converted to a pointer to its first element in most expressions, and in a function parameter list char* string is identical to char string[]. In this article I will use char* when referring to character arrays, and will usually add the array brackets, [], to arrays of other objects, except in cases where pointers are explicitly required.

The C++ String

In C++, a string can be represented as a char*, as in C, or by using a standard class. This is actually a class template, which allows a number of different character types to be used in creating strings, so that ASCII characters, Unicode characters, Japanese characters and many more can be stored. The std::string is a typedef, a construct allowing a type to be referred to by another name. std::string in fact refers to std::basic_string<char>. The angle brackets are a template construct, an advanced C++ feature which will not be discussed in detail here. Put simply, an std::string is a form of std::basic_string (a class template providing string storage and related functions) which stores ordinary char data, as opposed to wchar_t, a wide character type whose size is implementation-defined (commonly 16 or 32 bits).
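The relationship between std::string and std::basic_string<char> can be checked by the compiler itself. The following is a minimal sketch, assuming a C++11 compiler (for static_assert and <type_traits>); the function name length_of is hypothetical and exists only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// std::string is only another name for std::basic_string<char>,
// so the compiler treats the two as the same type.
static_assert(std::is_same<std::string, std::basic_string<char> >::value,
              "std::string is std::basic_string<char>");

// The same template instantiated with wchar_t gives the wide string type.
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t> >::value,
              "std::wstring is std::basic_string<wchar_t>");

// A function declared to take basic_string<char> therefore accepts
// a std::string without any conversion.
std::size_t length_of(const std::basic_string<char>& s)
{
    return s.length();
}
```

If either static_assert were false, the file would fail to compile, which makes this a cheap way to confirm what a typedef really names.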

We'll look now at the differences in creating a string in C and in C++. Code sample 1.1 shows one way of creating a string in C, sample 1.2 shows the use of the C++ std::string.

Code:


/* 1.1 - C string usage */
char mystring[] = "Hello!";

/* 1.2 - C++ std::string usage */
#include <string>
std::string mystring = "Hello!";



As you can see, the only differences are the inclusion of a header which declares the std::string name, and the change to the declaration: the type becomes std::string and the array brackets are dropped. Note that the code in 1.3 would declare an array of std::strings, and that trying to initialize that array from a single string constant would result in a compile-time error.

Code:


/* 1.3 - array of std::string */
std::string mystring[] = "Hello!"; // Note this won't compile!



Whenever you use the std::string you must include the <string> C++ header file. The string is defined in namespace std, so if you do not wish to type std::string you can add a using directive, and simply refer to std::string as string. This is illustrated in code sample 1.4.

Code:


/* 1.4 - Importing namespace std into the global namespace */
#include <string>
using namespace std;
string mystring = "Hello!";



Of course, the std namespace contains many more names than just string. If you do not wish to “pollute” the global namespace with all of these, you can import individual names, as shown in code sample 1.5.

Code:


/* 1.5 - Importing std::string into the global namespace */
#include <string>
using std::string;
string mystring = "Hello!";



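For large projects, you are likely to be using many of the features provided under the std namespace, and so it can be convenient to import the entire namespace into global scope in your source files. Avoid doing so in header files, however, as the directive would then apply to every file which includes the header.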
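A middle ground between the two approaches is to place the using declaration inside a function, so that the shorter name is visible only there. A minimal sketch follows; the function greet is hypothetical.

```cpp
#include <string>

// The using declaration below is local to greet(); code outside the
// function must still write std::string in full.
std::string greet(const std::string& name)
{
    using std::string;            // visible only inside this function
    string greeting = "Hello, ";
    greeting += name;
    greeting += "!";
    return greeting;
}
```

Calling greet("Andrew") yields the string "Hello, Andrew!", and nothing is added to the global namespace.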

The C++ string class provides a wealth of string manipulation and I/O functions, some of which will be discussed later, but right now an important one is the c_str() member function. Calling the c_str() method of a string object returns a const char*, a pointer to a null-terminated char array holding a C string representation of the data in the std::string. One use of this is illustrated in code sample 1.6, where the system() function requires a const char* command to execute, but the command is provided within the program as an std::string.

Code:


/* 1.6 - Use of c_str() */
#include <cstdlib>   /* system() */
#include <string>
using std::string;
string command = "cd /home/andrew/";
system(command.c_str());



Of course, in this rather contrived example, using a string literal as an argument to the system() function would have sufficed in both C and C++, but it is easy to see how the flexibility of C++ strings does not prevent backward-compatibility with interfaces requiring a char*.
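As a further illustration of passing a std::string to a C interface, the sketch below hands the result of c_str() to strlen(). The helper name c_length is hypothetical. Two points are worth noting: the pointer remains valid only while the string exists and is not modified, and C functions stop at the first \0, whereas a std::string may contain embedded \0 bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: measure a std::string the way a C function would.
std::size_t c_length(const std::string& s)
{
    const char* raw = s.c_str();   // null-terminated; still owned by s
    return std::strlen(raw);       // counts up to the first '\0'
}
```

For an ordinary string, c_length() agrees with s.length(). For a string built with an embedded \0, such as std::string("ab\0cd", 5), length() reports 5 but c_length() reports 2.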

C++ String Input

Getting input from the user is an issue in C where security is paramount. You allocate a fixed-size buffer (an array of characters, char) into which the input is read. The C standard library provides two sets of input functions: one set which reads input until a newline or end of file, and one set which reads up to a newline or end of file, but at most n characters, where n is usually one less than the buffer size. The second set of functions is supposed to prevent buffer overflows by stopping reading input just before a buffer overflow would occur. They require two things, however: first, that the programmer knows of their existence and uses them on every occasion input is read, and secondly, that the programmer knows (or can be bothered to determine) the length of every array into which data is to be read. These two criteria are fairly restrictive, as they require the programmer to pay constant attention, never slipping up and using the unchecked form of the input function, and never forgetting to check the length of an array before reading into it.

C++ effectively eliminates these problems by providing input directly to string objects. A string object grows to accommodate its input, up to a maximum size, at which point an exception (length_error) is thrown. There is never a chance for a buffer overflow to occur when reading directly into a string, provided that the underlying standard library and operating system implement buffer length checking (the language is only as secure as the platform on which it is used!).

The C gets() function reads the next input line from standard input. It stores the data it reads in the buffer pointed to by the char* passed to it as an argument. This function does not take the length of the array as an argument, and is therefore not safe: input longer than the array would overflow it. (gets() was eventually removed from the C standard, in C11, for exactly this reason.) The fgets() function is similar, but reads from a file stream, and takes an argument specifying the size of the buffer. Code sample 2.1 illustrates gets() and fgets().

Code:


/* 2.1 - gets() and fgets() */
#include <stdio.h>   /* gets(), fgets(), stdin */
#include <stdlib.h>  /* malloc() */
#define BUFFERSIZE 1024
char* buffer = malloc(BUFFERSIZE); /* a 1024-character buffer */
char* s;
s = gets(buffer); /* reads into buffer until newline.
               UNSAFE! */
s = fgets(buffer, BUFFERSIZE, stdin); /* reads into buffer until
               newline -or- until 1023 characters have
               been read. SAFER! */



This type of programming, where you must think carefully about the function you are using, makes it easy to write insecure code, especially as many books which teach C programming use the first mechanism, gets(), rather than explaining why gets() is insecure and endorsing the use of fgets() at all times!

In C++, this confusion is eliminated. Because a string grows to accept its input, C++ provides a function std::getline() which reads the next line into an std::string.

Code:


/* 2.2 - std::getline() */
#include <iostream>
#include <string>
using namespace std;
string buffer;
getline(cin, buffer); /* reads from cin (the C++ stream
               equivalent to C's stdin) into string
               buffer, expanding buffer as needed */



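You should check the state of the stream after calling getline(), since that is how end of file and read errors are reported. Ideally you would also enclose getline() in a try ... catch block to catch the length_error exception which may be thrown in the (unlikely) event that the input is larger than the maximum string size, if the stream has been set up to throw exceptions.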
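A defensive way of reading lines might look like the sketch below. It checks the stream state after getline(), and also wraps the call in try ... catch in case an exception does escape. The names read_line and count_lines are hypothetical, and an in-memory std::istringstream stands in for cin so that the example does not depend on user input.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read one line into `line`.
// Returns true if a line was read, false at end of file or on failure.
bool read_line(std::istream& in, std::string& line)
{
    try {
        std::getline(in, line);
    } catch (const std::length_error&) {
        return false;              // input exceeded the maximum string size
    }
    return !in.fail();             // failbit is set when nothing could be read
}

// Example use: count the lines in a block of text.
std::size_t count_lines(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    std::size_t n = 0;
    while (read_line(in, line))
        ++n;
    return n;
}
```

To read from the keyboard instead, pass std::cin (from <iostream>) as the first argument to read_line().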

Furthermore, the C strcpy() function is responsible for a large number of buffer overflows. strcpy() takes a source char* and a destination char*, and copies characters from the source to the destination, until it reaches a \0 byte in the source. Clearly if the source is larger than the destination this can result in a buffer overflow. C offers the strncpy() function as a remedy, but again you have to remember that two different functions exist for the same purpose, and remember to use strncpy(). You must also remember that strncpy() does not add the terminating \0 when the source does not fit, so the terminator has to be written by hand. In C++, this problem is eliminated as the = assignment operator can perform string copy (see code sample 2.3).

Code:


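/* 2.3 - strcpy, strncpy and C++ string copy */
char p[] = "Hello\n"; /* 7 bytes, including the \0 */
char* q = (char*) malloc(4);
strcpy(q, p); /* Dangerous! Writes 7 bytes into a 4-byte buffer */
strncpy(q, p, 3); /* safe: copies at most 3 characters... */
q[3] = '\0'; /* ...but we must add the terminator ourselves */
string s = "Hello\n";
string t;
t = s; /* safe! */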



When you combine the power of the std::string to grow and accommodate its input with the large number of functions provided by the <string> header, it is hard to argue for the use of char* in software which must read data from external sources. These functions include searching, replacing, substring search and replace, element access (accessing a character in the string, much as you could perform array subscripting on a C char* string), conversion to C style strings, comparisons (in C++ it is possible to compare two std::strings directly using ==) and many other features.
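A few of these operations are sketched below. The helper names replace_first and char_at_or are hypothetical; the member functions they call (find(), replace() and at()) are part of std::string. Note in particular that at() performs checked access: it throws std::out_of_range rather than reading past the end of the string.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: replace the first occurrence of `from` with `to`.
// The string is taken by value, so the caller's copy is left untouched.
std::string replace_first(std::string text,
                          const std::string& from,
                          const std::string& to)
{
    std::string::size_type pos = text.find(from);   // npos if not found
    if (pos != std::string::npos)
        text.replace(pos, from.length(), to);
    return text;
}

// Hypothetical helper: checked element access with a fallback value.
// Unlike s[i] or a raw char* subscript, at() never reads out of bounds.
char char_at_or(const std::string& s, std::string::size_type i, char fallback)
{
    try {
        return s.at(i);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

Because std::string supports ==, the result of replace_first() can be compared directly against another string, with no need for strcmp().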

One situation where reading data as char* is required, however, is with the use of the UNIX sockets API. As the sockets API was designed to be implemented in C, char* is the data format used when sending and receiving data. We have already seen that the c_str() method of string objects can provide a const char* suitable to send to C functions, but how can we safeguard against buffer overflows when receiving data over a socket?

Receiving Socket Data In C++

The C recv() function is responsible for receiving data from a socket. We'll assume, for the purposes of this article, that we have already established a TCP socket connection and we have a valid socket descriptor, sockfd. Here, we will be concerned only with reading data from a socket into an std::string.

Obviously this cannot be done directly; the recv() function takes a pointer to a character array, which it fills with the data it receives. You can, however, specify a maximum number of characters to read in a single call to recv(). You can then create versions of recv() which return strings. Below I present a version of recv() which acts as a simple wrapper to the C recv() call, but working with std::string objects.

Code sample 3.1 shows an overloaded function (a function with the same name but different arguments; C++ resolves the correct function at compile time!) for recv(), which takes a reference to a string (similar to a pointer to a character array) and calls the C recv() function. The other arguments, and the return type, are identical to the C recv() function.

Code:


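/* 3.1 - std::string based recv() */
#include <string>
#include <sys/types.h>
#include <sys/socket.h>

using std::string;

ssize_t recv(int s, string& buf, size_t len, int flags)
{
   char* buffer = new char[len];
   /* the void* argument selects the C recv(), not this overload */
   ssize_t bytes_recv = recv(s, (void*) buffer, len, flags);
   if (bytes_recv > 0)
      buf = string(buffer, bytes_recv); /* copy only the bytes received;
                                           the data is not \0-terminated */
   else
      buf.clear(); /* connection closed or error: no data */
   delete[] buffer;
   return bytes_recv;
}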



This wrapper function is easy to understand; it creates a character array in memory using the C++ new operator, which is analogous to the malloc() function in C. The recv() call proceeds as normal, using this buffer, then a string object is created from the buffer using a constructor provided by the string class, and assigned to buf. The char* buffer is then deleted (analogous to free() in C) and the number of bytes actually received is returned. As the std::string buf was passed to the function as a reference, changes made to the string within the function also change the string outside of the function's scope. This recv() function behaves like the C recv() function, but provides the data in an std::string.
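As a variation on the wrapper, the sketch below uses a std::vector<char> as the temporary buffer, so that the memory is released automatically, even if an exception is thrown part way through. It assumes a POSIX system and a C++11 compiler (for vector::data()). The name recv_string is hypothetical, chosen so that the example does not overload recv() itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical variant of the wrapper: same arguments and return value
// as the C recv(), but the data is delivered in a std::string.
ssize_t recv_string(int s, std::string& buf, std::size_t len, int flags)
{
    std::vector<char> buffer(len);   // freed automatically on return
    ssize_t n = ::recv(s, buffer.data(), buffer.size(), flags);
    if (n > 0)
        // Copy exactly the bytes received. Socket data is not
        // null-terminated and may contain embedded '\0' bytes.
        buf.assign(buffer.data(), static_cast<std::size_t>(n));
    else
        buf.clear();                 // 0: peer closed; -1: error
    return n;
}
```

A caller would check the return value exactly as with recv(): a positive value is the number of bytes now held in the string, 0 means the peer closed the connection, and -1 indicates an error.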

A std::string based send() function would be even easier to write (see code sample 3.2).

Code:


/* 3.2 - std::string based send() */
#include <string>
#include <sys/types.h>
#include <sys/socket.h>
using std::string;
ssize_t send(int s, const string& buf, int flags)
{
   /* four arguments, so this calls the C send(), not this overload */
   return send(s, buf.c_str(), buf.length(), flags);
}



Note that the std::string based send() function does not need the length of the data to be passed as an argument. It can determine this from the std::string object itself, as string objects know their own length.
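One thing to be aware of is that the C send() function may transmit fewer bytes than requested, returning the number actually sent. Since the string knows its own length, a wrapper can loop until everything has gone out. The following is a minimal sketch of that idea, assuming a POSIX system and a blocking socket; the name send_all is hypothetical.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical helper: keep calling send() until the whole string
// has been transmitted. Returns true on success, false on error.
bool send_all(int s, const std::string& buf, int flags)
{
    std::string::size_type sent = 0;
    while (sent < buf.length()) {
        ssize_t n = ::send(s, buf.data() + sent, buf.length() - sent, flags);
        if (n <= 0)
            return false;          // error, or no progress could be made
        sent += static_cast<std::string::size_type>(n);
    }
    return true;
}
```

An empty string is treated as trivially sent: the loop body never runs and the function returns true.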

Conclusion

This completes the discussion of the C++ std::string features, for this article. In the second part, we look at the std::vector container, and how it can be used alongside C-style arrays, including looking at using vectors as arrays, and creating vectors from arrays.

This interview is copyright 2005 by the author and Security-Forums Dot Com, and may not be reproduced in any form in any media without the express permission of the author, or Security-Forums Dot Com.