A Foolish Way to Read a String

Back in the old days, the obvious and logical way to read a string was to use the gets() function, where gets stands for “get string.” That makes sense, but only a hardy fool would use that function today.

The gets() function was cousin to the getchar() function. They’re input functions, similar to puts() and putchar(), which are output functions. The getchar(), putchar(), and puts() functions are still actively used, whereas the gets() function has been deprecated and, as of the C11 standard, removed from the standard library.

A deprecated function was once a valid part of the C Library, but its use is now discouraged. The thought is that eventually the function will disappear, so programmers are encouraged to use something else, typically a newer function or some other alternative.

The problem with gets() is that it doesn’t measure input; it lacks bounds checking. So it’s possible to stuff 200 characters of input into storage for a 64-character string. What happens to the extra 136 characters? Why, they’re loaded into memory, stomping over whatever is already there.
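To make the missing check concrete, here is a minimal sketch of a line reader that does measure its input. The read_line_bounded() function is a hypothetical helper I made up for illustration, not part of the C library, and it assumes the caller passes the true size of the storage. The one comparison inside the loop is the test that gets() never performs.

```c
#include <stdio.h>

/* Hypothetical helper, not a library function: reads one line from `in`
   into `dst`, storing at most size - 1 characters plus the terminating
   null character. Characters that don't fit are read and thrown away
   instead of being written past the end of the storage.
   Returns the number of characters thrown away. */
size_t read_line_bounded(FILE *in, char *dst, size_t size)
{
    size_t stored = 0;
    size_t dropped = 0;
    int ch;

    if (size == 0)
        return 0;   /* no room even for the terminator; read nothing */

    while ((ch = fgetc(in)) != EOF && ch != '\n') {
        if (stored < size - 1)
            dst[stored++] = (char)ch;   /* it fits, so store it */
        else
            dropped++;                  /* the bounds check gets() lacks */
    }
    dst[stored] = '\0';

    return dropped;
}
```

Called as read_line_bounded(stdin, buffer, sizeof(buffer)) with storage for a 64-character string (65 bytes, counting the terminator), 200 characters of input would leave 64 characters in the buffer and report 136 discarded, rather than letting them land in memory.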

Clever and admittedly evil programmers exploited the gets() function’s weakness to code malicious software. Because of that, millions of lines of code — including code in every major operating system — were examined to find and remove every gets() statement.

Many compilers and C libraries still let you use gets() today, although I don’t recommend it. Here’s sample code:

#include <stdio.h>

int main()
{
    char buffer[32];

    printf("Type something: ");
    gets(buffer);
    printf("You typed '%s'\n",buffer);

    return(0);
}

The gets() function at Line 8 reads standard input. The characters typed, up to but not including the newline generated by the Enter key, are stored in the char array buffer, followed by a terminating null character.

If you build the code, you might see a warning, although not every compiler displays that warning.

When you run the code, you may see something like this:

warning: this program uses gets(), which is unsafe.

Then the program runs, reading a string and then displaying the string.

Of course, if you type more than 31 characters (the 32-character buffer needs one slot for the terminating null character), who knows where the extras go or how the code is affected?
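For comparison, here is a hedged sketch of the same read done with fgets(), a standard library function that takes the buffer size as an argument. The read_string() wrapper is a hypothetical name of my own, not a library function; it only adds the step of trimming the newline that fgets() keeps and gets() discards.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper, not a library function: reads one line from `in`
   into `buffer` using fgets(), which stores at most size - 1 characters
   plus the terminating null character.
   Returns `buffer` on success or NULL on end-of-file or error. */
char *read_string(char *buffer, int size, FILE *in)
{
    if (fgets(buffer, size, in) == NULL)
        return NULL;

    /* fgets() keeps the newline if it fit; cut the string there */
    buffer[strcspn(buffer, "\n")] = '\0';

    return buffer;
}
```

In the sample program, replacing gets(buffer) with read_string(buffer, sizeof(buffer), stdin) would cap the string at 31 characters. Anything typed beyond that stays in the input stream for the next read instead of spilling into memory.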

The Bad Guys would input binary data or executable code. Sometimes they’d load hundreds of thousands of bytes of data, knowing that it would overwrite something in memory that they could exploit. That sounds difficult to do, but as history shows, the process was successful and many systems were infected.

In the next Lesson, I’ll review some popular and unpopular alternatives to the gets() function.