String Storage Mysteries

String storage is one of those frustrating things in the C language. Specifically, it’s that null character, \0, that appears at the end of every string. Is that character counted when you input a string? Copy a string? Create storage for a string? It’s a mystery that could drive you nuts.

The null character, \0, is the final character in a string, which is really a special type of char array. The character is required. If your code manually constructs a string, you must remember to append the \0. Plus, you must account for that character when you allocate string storage. That means if you need up to 32 characters for a string, you allocate 33 characters of storage, with the extra byte for the null character.

For example, the fgets() function reads at most size minus one characters, where size is its second argument. So if you have a 32-character buffer, you might use the following fgets() statement to read text into that buffer:

fgets(buffer,32,stdin);

The above statement reads up to 31 characters into buffer, reserving the final character for the \0.

The snprintf() function works like printf(), but it writes its output to a string. The n in its name refers to its size argument, which is the total size of the output buffer. snprintf() writes at most that many characters minus one, then appends the null character.

And then you have the strlen() function.

The strlen() function returns the length of a string, but it doesn’t count the null character. A few programmers allocate storage based on the value returned from strlen(), forgetting about the \0 tagging along like the caboose on a train.

In the following code, the strlen() function returns a string's length. I then use a pointer to march through the string, stopping just after the null character. Subtracting the string's base address from this pointer's address yields how much storage the string actually uses.

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *string = "This string is 34 characters long.";
    const char *s;

    printf("The string '%s' is %d characters long.\n",
            string,
            (int)strlen(string));

    /* find the end of the string: s stops one past the \0 */
    s = string;
    while(*s++)
        ;
    printf("Storage size is %d.\n",(int)(s - string));

    return(0);
}

Here’s sample output:

The string 'This string is 34 characters long.' is 34 characters long.
Storage size is 35.

To further examine what’s going on, I ran the program through the Code::Blocks debugger. Figure 1 illustrates the string’s memory dump, showing how it’s stored inside the PC.

Figure 1. The string stored in memory, as viewed in the Code::Blocks debugger.


A dump of the variables, string and s, is shown in Figure 2.

Figure 2. The values stored in pointers string and s at Line 17 in the code.


The memory locations shown in Figures 1 and 2 differ from computer to computer, and even from one run to the next on the same computer, but the calculation works the same way: Pointer s is incremented until it roosts just past the null character. Subtracting the string's base address then yields the true storage used, as opposed to the strlen() function, which returns only the number of characters in the string.

In Figure 2, address 0x403047 minus address 0x403024 equals 0x23, which is 35 decimal: the string's 34 characters plus the null character.

The bottom line is that you must remember to account for the null character when you deal with a string. Functions such as fgets() and snprintf() reserve room for it automatically, as long as you pass them the buffer's full size, but strlen() doesn't count it.
