Conversion Character Abuse

The printf() function is most concerned with getting the number of conversion characters (the % placeholders) to match the number of arguments specified. Beyond that, it’s rather indifferent as to whether the types match properly.

Last week’s blog lesson discussed the virtues of signed and unsigned variable types, specifically with regard to how information is stored in memory. The compiler treats the two types differently, but the bits in memory are unaffected by how the compiler, and the running program, see the value.

Yeah, this is weird, but as long as you mind your signed and unsigned variable types in the code, you’ll be okay.

What gets even weirder is when the printf() conversion characters come into play.

If you’ve read my books or worked some of the examples on this web site, you’ve seen me occasionally use the %d placeholder in a printf() statement to display a char value. You’d expect to use the %c placeholder, which displays the character represented by the stored value. When you use %d, however, you see the character’s code value, which is closer to the information actually stored in memory.

The conversion characters are merely interpreters.

Consider the following code:

#include <stdio.h>

int main()
{
    char c = 0x40;

    printf("%c\n",c);
    printf("%d\n",c);
    printf("%u\n",c);

    return(0);
}

In the code, the char variable c is displayed by using three different conversion characters: %c shows the result as a character, %d shows it as a signed int value, and %u shows it as an unsigned int value. (A char passed to printf() is automatically promoted to an int, which is why the int placeholders accept it.) The code compiles without any warnings or errors.

Of course, it’s not a conversion character free-for-all in the printf() function. If you add another line to the code:

printf("%s\n",c);

You see a compiler warning about mismatched types, specifically that %s expects a pointer (char *) and variable c isn’t one. Pointer mismatches are the dangerous kind: %s treats whatever value it receives as a memory address and reads from that location, and the operating system jealously protects memory. Running the code after such a warning most likely generates a segmentation fault or other hideous error.

Beyond pointers, the use of the proper conversion character in printf() is pretty much up to interpretation, which brings me to the puzzle presented in last week’s lesson:

140	4294967180
141	4294967181
142	4294967182
143	4294967183

When you use the %u placeholder on a signed variable, the results are not open to interpretation: They’re wrong.

Here’s the code that generated the above output:

#include <stdio.h>

int main()
{
    unsigned char a;
    signed char b;
    int x;

    a = b = 0;
    for(x=0;x<400;x++)
    {
        printf("%3d\t%3u\n",a,b);
        a++; b++;
    }
    return(0);
}

Variable b is declared as a signed char. It can be positive or negative. When the %u (or in the code, %3u) placeholder is used, the signed value stored is interpreted as unsigned. The same bit pattern is stored in variable a, an unsigned char variable. So how does the computer make 4294967183 out of 143?

Before unraveling that riddle, remember that the range of a signed char variable is from -128 through 127. Because b is declared as signed, any value over 127 placed into that storage container is interpreted as negative. Variable a sees the value as positive. So when unsigned a is 128, signed b is -128. Likewise, when unsigned a is 255, signed b is -1. Then both values “roll over” to zero. (Review the output from last week’s blog post for a visual example.)

If you can fathom that concept, then you might expect the %u placeholder merely to display b’s -128 as 128, the same value held in variable a. That’s not what happens, of course.

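The reason the output jumps to 4294967183 is that printf() never receives a char. A char always occupies a single byte, but when it’s passed to a function that accepts a variable number of arguments, such as printf(), it’s automatically promoted to an int, which is four bytes wide on my compiler (and most others). Promotion keeps the value, so signed b’s -113 becomes an int -113, with the sign bit copied into the three new bytes: 0xFFFFFF8F. The %u placeholder reads those four bytes as an unsigned int, and 0xFFFFFF8F is 4294967183. Unsigned a is promoted as well, but its new bytes are filled with zeros, so it still shows up as 143. That’s why only the output for b looks screwy.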

Bottom line: Although you can get away with substituting conversion characters for some variable types, and such substitution can be put to good use, be careful! When you get weird output, double-check that the types match.
