Cee Language And Buffer Overflows

BufferOverflows have made C and C++ very vulnerable languages for internet programming. This is a discussion of the problem and how it can be solved.


CeeLanguage and CeePlusPlus use character buffers (aka CharStars) to represent strings. The character buffers have no intrinsic size; they are simply an area of memory. A common problem with applications written in C is that malicious users can provide string data that's longer than the program expects, and thus write to areas of memory which should be used for other purposes. In some cases, this technique can be used to insert executable code into the running program and get the program to run it. This is the all-too-common buffer overflow exploit. Creating programs in C and C++ which are not vulnerable to buffer overflows requires great care on the part of the programmer, and experience tells us that programmers won't catch everything.
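A minimal sketch of the failure mode (the function and names below are made up for illustration, not taken from any real program): the buffer knows nothing about its own size, so a long enough input simply keeps writing.

#include <string.h>

void greet(const char *attacker_controlled)
{
    char name[16];                       /* fixed-size buffer on the stack */
    strcpy(name, attacker_controlled);   /* no length check: anything longer
                                            than 15 characters plus the NUL
                                            writes past the end of 'name',
                                            clobbering whatever is adjacent
                                            on the stack */
}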

This has a grain of truth, but is overstated.

Really? How so? The only way to get a buffer overflow in a Python, Java or Ruby program is if there's a bug in the interpreter or there's a particularly badly behaved C library that they use. I am not aware of a buffer overflow exploit ever appearing in a program written in one of these languages in the wild. If anyone knows of an example, please put it here.

See GatedCommunityPattern. The C/C++ world has been dealing with this and many, many other potential abuses for a generation. That's why we're so durn good at error detection and correction...


Q: Couldn't someone write a library or a patch to fix that problem?

A: No. It is part of the C and C++ language standards that literal string expressions, such as "hello world", evaluate to a null-terminated character buffer (not a pointer to one). There are C libraries that use alternate, safer implementations for strings, but these don't seem to be used all that often, and you still have string literals to contend with.

A: Yes. Someone can write (and many have written) libraries in C that can never overflow a buffer. The fact that the C standard library uses null-terminated character arrays for strings, or that a lot of programmers use them, doesn't change that. Everyone is free to use any string representation in C, though string literals are always simple (dangerous) null-terminated buffers. (See NonNullTerminatedString for more information.)

A: The C++ standard library, the StandardTemplateLibrary, and third-party alternatives have classes and templates to manage automatically resizable buffers and range-checked buffers. Use of "raw" C-style strings and arrays in C++ programs is usually considered to be poor style.

A: Easily. I would start with the following:

#include <stddef.h>
#include <string.h>

typedef struct {
    int size;      /* number of characters, not counting the trailing NUL */
    char* self;    /* the underlying character buffer */
} String;

String str(char* input) {
    String tmp = { 0, input };
    while (*input++) tmp.size++;              /* count characters up to the NUL */
    return tmp;
}

String* StringCopy(String* to, String* from) {
    if (to->size < from->size) return NULL;   /* destination too small: refuse */
    strcpy(to->self, from->self);
    return to;
}
Or something like that.
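For what it's worth, a hypothetical caller of that sketch might look like the following; note that StringCopy() only protects you if the destination's size field honestly reflects the room available (here one byte is reserved for the NUL that strcpy() appends).

int main(void) {
    char room[64];
    String dst = { sizeof room - 1, room };   /* leave one byte for the NUL */
    String src = str("hello world");
    if (StringCopy(&dst, &src) == NULL) {
        /* destination too small: the copy is refused instead of overflowing */
    }
    return 0;
}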


Q: Do all C programs have buffer overflows?

A: No. Clearly, there are examples of programs which provably do not have buffer overflows, or are free of them for all practical purposes, yet every C or C++ program which is widely used on the Internet has had one or more buffer overflow problems in its history. Every single one, except possibly for qmail, authored by secure coding practices guru DanBernstein.

Someone here asserted that the Java runtime is an example of a C/C++ program with no buffer overflows. Java certainly has had buffer overflows in the past. (http://lists.netsys.com/pipermail/full-disclosure/2002-November/002642.html). Does it now? Your guess is as good as mine. Additionally, the Java VM is exactly the kind of program that will have fewer problems with buffer overflows, because the bulk of its work involves trundling through Java bytecode. It doesn't work with arbitrary string data very much. The Java compiler, which works with string data a great deal, is typically written in Java, and isn't a useful attack vector anyway.


Q: Why then are these buffer problems so important on the internet, and not so important for local, non-internet applications? It's the same problem after all!

A: Essentially because no one in his right mind would fool around with his own program?

These buffer problems are important in all sorts of applications, they just get more press when they happen on the internet.

In typical applications, users can only fill buffers from the user interface, and it is quite easy to restrict data size at that point. In internet applications, a malicious user can put whatever he wants into a series of packets; it is quite easy to feed a program input of almost unimaginable size this way.


Q: Also, can't the programmer write a line evaluating the length of the string data so that "malicious users cannot provide string data that's longer than the program expects, and thus write to areas of memory which should be used for other purposes"?

A: Yes, of course. As the introduction above states, the problem is not having to introduce such a check in hundreds of places, but the one place in which the programmer forgot to introduce the check. However, there must surely be string libraries and automatic code auditing tools that handle this automagically and perfectly, and I would think that no programmer in his right mind would write security critical programs without such aids? Could a CeeProgrammer? explain, please?

One technique is to use a malloc() routine during development which allocates a little more memory than the programmer requests and adds magic characters to the beginning and end of the allocated buffer. The program can later check that those magic characters have not been overwritten. This helps, but it doesn't solve the problem entirely.
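A rough sketch of that idea, with made-up names (guarded_malloc, guarded_check); real debug allocators are more elaborate, and as noted this only detects the corruption after it has already happened.

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD     "\xDE\xAD\xBE\xEF"   /* the magic characters */
#define GUARD_LEN 4

void *guarded_malloc(size_t n)
{
    char *raw = malloc(n + 2 * GUARD_LEN);          /* a little extra memory */
    if (raw == NULL) return NULL;
    memcpy(raw, GUARD, GUARD_LEN);                  /* magic before the buffer */
    memcpy(raw + GUARD_LEN + n, GUARD, GUARD_LEN);  /* ...and after it */
    return raw + GUARD_LEN;                         /* caller sees only the middle */
}

void guarded_check(const void *p, size_t n)
{
    const char *raw = (const char *)p - GUARD_LEN;
    assert(memcmp(raw, GUARD, GUARD_LEN) == 0);                  /* buffer underrun? */
    assert(memcmp(raw + GUARD_LEN + n, GUARD, GUARD_LEN) == 0);  /* buffer overrun? */
}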

Another technique is to avoid functions in the C library which are known to be dangerous and replace them with bounds-checking versions. The interfaces are different, though, and again, it's not foolproof. See this paper for at least one example: http://openbsd.org/papers/strlcpy-paper.ps. Similarly, C++'s STL fixes a lot of the problems inherent in the standard C library, but again, it's not foolproof, and you still have those nasty free pointers and string literals to contend with.
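To give a flavour of those different interfaces, here is a rough sketch in the spirit of the paper's bounded-copy routine. The name bounded_copy is invented for illustration and this is not the real strlcpy, but the contract is the same general idea: copy at most dstsize-1 characters, always NUL-terminate, and report the source length so the caller can detect truncation.

#include <stddef.h>

size_t bounded_copy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = 0;
    while (src[srclen] != '\0')
        srclen++;                                        /* length of the source */
    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        size_t i;
        for (i = 0; i < n; i++)
            dst[i] = src[i];
        dst[n] = '\0';                                   /* always NUL-terminate */
    }
    return srclen;   /* a return value >= dstsize means the copy was truncated */
}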


Q: If all other programmers who have written other languages have solved the buffer overflow problems, why can't their solutions be applied to C?

A: You can't evaluate the input's length until it's already in a buffer--at which point it's too late. The best you can do is use input widgets/functions which you can supply a length to, and which reject input beyond that length.

[You can always evaluate the length. Every low level read and copy method can take a buffer length.]

The solution in other languages is to, among other things, use references instead of free pointers and to check bounds on array operations at runtime, such that there's no way for a programmer to accidentally write to/read from the wrong piece of memory. Adding these things to C++ yields C# or Java.

More specifically, a user running his own program doesn't gain any privilege by hacking into it in this manner. A user who uses this technique to crack into a program run as another user can gain the privileges of that user--or of root.

Of course, part of the problem is that some operating systems (LinuxOs, for instance) allow data on the stack to be executable. That is, if you create a set of valid opcodes in memory (including on the stack) and jump to it, it will work. Most programs don't need this feature; the programs that do need it include things like virtual machines with just-in-time compilation.

Perhaps it would be useful if executable-stacks became an optional feature--a VM or similar program could enable them, but the vast majority of root programs that don't need executable stacks would thus be immune from at least a subset of the buffer overflow exploits.


Q: How does Perl manage the buffer overflow problem? Can't the solution adopted by Perl and Python be ported to C++?

A: See above. Perl and Python (and Java and Ruby and (to some extent) C# and Smalltalk and Lisp and . . . ) are more restrictive languages than C/C++. They disallow free pointers, so programmers can't access arbitrary memory locations, and bounds check all array and string operations at runtime. An attempt to write 4097 bytes into a 4096 byte string will either raise an exception or automagically grow the buffer. You could write a C++ buffer class that checks bounds on buffer accesses at runtime, but you'd have to convert all of these to dumb, low-level buffers when dealing with the operating system or third party libraries. Using a high-level buffer/string class is clearly a step in the right direction, and many programmers do this, but it's not a 100% solution.
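The paragraph above talks about a C++ class; the same idea sketched in plain C (illustrative names, minimal error handling) looks roughly like this. And the caveat stands: the moment data is handed to the operating system or a third-party library as a raw pointer, the checking is gone.

#include <stdlib.h>
#include <string.h>

typedef struct {
    char   *data;
    size_t  len;   /* bytes in use */
    size_t  cap;   /* bytes allocated */
} Buffer;

/* Append n bytes, growing the buffer if needed.  Returns 0 on success,
   -1 if allocation fails; it can never write past data[cap - 1]. */
int buffer_append(Buffer *b, const char *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;
        while (newcap < b->len + n)
            newcap *= 2;
        char *p = realloc(b->data, newcap);
        if (p == NULL) return -1;        /* refuse rather than overflow */
        b->data = p;
        b->cap  = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}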

Also, Perl, Python, Java, Ruby, et al, are free of buffer overflow problems to the extent that their interpreters (which are typically written in C/C++) are free of buffer overflows. I think we can be pretty confident, though. All such programs have high-level strings and arrays in their toolbox, because they use them to support the client language, and they get used extensively within the interpreters in place of null terminated strings and simple buffers.

This is another case of GreenspunsTenthRuleOfProgramming, though we can specialize it as: Any sufficiently secure C/C++ program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of CommonLisp's bounds-checked arrays and buffers.


Q: This is a real weakness of C++, well spotted and well identified. I am sure the programmer who corrects the problem, comes up with a solution and puts the solution in a book will make a lot of green dollars, and he will enable hundreds of C programmers to write programs for the internet.

A: There is always a solution in C/C++ for every problem. Usually a difficult solution but a solution nevertheless.

This is not really a "problem" with C that needs to be "corrected". A basic principle of the design of C was to provide efficient low-level bare-metal access to memory, and to provide simple efficient representations of basic types (like null-terminated strings). This may be unsafe, but is necessary for writing the low-level code (operating system kernels, device drivers, etc.) that C was designed for. Changing the definition of C to prevent low-level memory access or to require a safer string representation would make the language less useful for its niche. Making C safer by eliminating these capabilities would be like making aircraft safer by eliminating the wings.

If you don't need that efficiency, or don't know how to do it safely, then don't use C. That's why we have Java, Perl, .NET, Python, C++ STL, Lisp, ML, Smalltalk, VB, ...

That's exactly the point. Overrun is a possibility not just with C but with assembler and any other low-level language, Forth included. You don't hear about problems with assembler or Forth programs because their programmers are already handling so many gotchas that avoiding input buffer overflows is just part of the routine. --mt


Q: Can a cracker who fools around with the buffer overflow actually modify the source code and the exe file? How many crackers have that knowledge? Aren't we dramatizing a problem here?

A: A cracker can't directly modify the source code or the EXE via a buffer overflow. What a cracker can do is insert machine code into the process's memory space and get that code to run. So a cracker can insert code that will modify the EXE, find and change the source code, e-mail your credit card numbers to him, or whatever s/he wants.

Only a small percentage of "crackers" bother to find these security holes, but they love to brag. In any case, it only takes one cracker to do it and post a generalized exploit on the web. Many of the debilitating Internet worms have used buffer overflows as an attack vector.


"every C or C++ program which is widely used on the Internet has had one or more buffer overflow problems in its history. Every single one, except possibly for qmail''

That sounds like hyperbole to me. Sticking with MTAs, for instance, what about Wietse Venema's Postfix?


In Search of a Category: this page and others that talk about newbie-type problems. These pages need to be grouped and identified as problem discussions for people who aren't already familiar with the solutions.


(text moved from ThingsYouShouldNeverDo)

Never use the gets() library function.

Use fgets() or getch() instead. -- NCSA Secure Programming Guidelines: DRAFT
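For the common case of reading a line of input, the fgets() replacement looks roughly like this (a minimal sketch, not taken from the guidelines themselves).

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];

    /* fgets() is told how big the buffer is: it reads at most
       sizeof line - 1 characters and always NUL-terminates, so
       over-long input is truncated instead of overflowing. */
    if (fgets(line, sizeof line, stdin) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline, if any */
        printf("read: %s\n", line);
    }
    return 0;
}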

libsafe claims to make gets() function calls "safe". The library intercepts calls to gets() and runs its own replacement instead. The replacement examines the stack frame at run time, estimates the buffer size, and limits buffer writes accordingly.

Ask yourself: does "estimates the buffer size" sound like safe coding practice, something to recommend to neophytes?

Thought so.


See: GatedCommunityPattern CharStar

CategoryPitfall, CategoryTutorial?, CategoryCee, CategoryCpp

