Safe Coding Practices – Terminating a String

String constants and strings created or manipulated by C library functions all set that terminating null character, '\0'. When you build your own strings, however, it’s easy to completely forget that null character. I know. I’ve done it.

A lot of safe coding practice in C deals with strings. From my books and courses, you’ve read that a string isn’t really a variable type in C; a string is a char array. The final character in the “string” array must be a null character, \0. That character’s appearance is how the C language and its library functions spy the end of a string, unless you forget to add that character. (Many other languages track a string’s length instead, but C relies on the terminator.)

The following code builds a string and forgets to cap it with a null character:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *alphabet;
    int a;

    alphabet = (char *)malloc( sizeof(char)*26);
    if( alphabet == NULL)
    {
        fprintf(stderr,"Memory allocation error.\n");
        exit(1);
    }

    for(a=0;a<26;a++)
        *(alphabet+a) = 'A'+a;

    printf("Here's your alphabet: %s\n",alphabet);

    return(0);
}

Storage space for the string is allocated by the malloc() function at Line 9. Only a 26-character buffer is created, one for each letter of the Latin alphabet. (No room for the null character!) And the array is built in Lines 16 and 17. Line 19 prints the result, which looks like this on my computer:

Here's your alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ[?

Output on another system might look different. For example, you might get lucky: If the memory right after the buffer happens to hold a zero-value byte, the output looks perfect. But the code is still wrong: The string lacks a terminating character, storage lacks room for that character, and printf() reads beyond the allocated buffer hunting for a null character, which is undefined behavior. The array is still valid, and it could be used as an array of single char values, just not as a string.

To ensure that you’re working with a string and not just a collection of char variables waiting at a bus stop, cap the string with a null character, \0. This practice must be done any time you cobble together a string, which happens frequently in C.

Here is the improved code:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *alphabet;
    int a;

    alphabet = (char *)malloc( sizeof(char)*(26+1));
    if( alphabet == NULL)
    {
        fprintf(stderr,"Memory allocation error.\n");
        exit(1);
    }

    for(a=0;a<26;a++)
        *(alphabet+a) = 'A'+a;
    *(alphabet+a) = '\0';

    printf("Here's your alphabet: %s\n",alphabet);

    return(0);
}

In Line 9, I added a +1 inside the malloc() function. This format is a mnemonic to ensure that the null character is accounted for, and I (try to) do it in all my code. In fact, last week’s Lesson used the +1 in a realloc() statement that combines both strings.

Room for the null character must also be accounted for when you allocate a string array directly, as in alphabet[27].

At Line 18, the null character is appended to the end of the alphabet array. The alphabet+a math works because the for loop increments a one final time before its condition, a<26, fails and the loop stops spinning. Therefore, a must equal 26 after the for loop is done. At that position, alphabet+a or the 27th element of the array, the character \0 is set, terminating the string.

The code is now safe and the string will be handled properly elsewhere in the code because it’s null-terminated. The output is also consistent:

Here's your alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ

4 thoughts on “Safe Coding Practices – Terminating a String”

  1. If you were writing an application that uses malloc a lot to create strings you could write a function called something like mallocstring which calls malloc but adds the 1 for you.

    In fact the number of string related functions you could write is vast!

  2. I was told that C was weak when it came to strings, and I didn’t believe it because the standard library features lots of string functions. Still, after seeing what some OO languages offer for strings, it makes me realize how weak C really is.

    Great idea on the mallocstring() function.

  3. string.h looks comprehensive at first glance, but you still need to do quite a bit of work yourself when dealing with strings. GLib has a comprehensive set of string handling functions which I need to explore some time.

    Obviously C cannot match an OO language in terms of ease of use and clarity, but anything you can do (by which I mean achieve as an end result) in, say, C++ you can do in C – you just cannot tie the variables and the functions which work on them together.

    You can simulate OOP in C by adding function pointers to structs. I have done it a few times but can never decide if it’s a good idea or whether it is just a futile attempt to make C something it is not.

    Basically I think we need to regard C as a specialist and relatively low-level language and accept that means a bit more work.

  4. I like the way you put that, Chris: “it is just a futile attempt to make C something it is not.” That explains a lot, especially to beginners. C has a great purpose, but it’s not well-designed for some things. It’s like being a vegan and eating tofu bacon. C is tofu and an OO language is bacon. I hope that makes sense.
