The C Variable Myth

A variable in C is a myth. Oh, yeah, it’s a location in memory. That’s pretty much it. After declaring the variable, the compiler — and you, the programmer — pretty much rely upon faith that the variable works and can actually be useful.

I confess that this topic is weird. Higher-level languages have more of a lock on the variable types. When you declare a string or an integer in those languages, the compiler asserts that declaration everywhere. In C, however, a variable type can be silly putty, with the emphasis on putty.

For example:

int x;

The above declaration directs the program to set aside several bytes of storage to hold an integer value. On modern computers, that's usually four bytes. (Back when I learned C, it was two.)

The variable x isn’t initialized; its value isn’t set to zero or anything else. It’s basically four bytes of memory at a certain address. You can assign the variable a value, but you can also use it uninitialized to see what’s lurking in memory. (Uninitialized values are considered garbage.) Most programmers would opt to assign x a value and then use that integer for some useful purpose.
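Incidentally, if you want to confirm how much storage the compiler sets aside on your system, the sizeof operator reports it. Here's a minimal sketch; the exact values depend on your compiler and platform:

#include <stdio.h>

int main()
{
    int x;      /* declared but not initialized */

    /* sizeof reports the storage set aside, in bytes;
       %zu is the conversion character for its result */
    printf("Variable x occupies %zu bytes\n",sizeof(x));
    printf("A char occupies %zu byte\n",sizeof(char));
    return(0);
}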

The myth part comes from treating x as an integer. Specifically, the code above declares x as a signed integer value by default. You could also tell the compiler to treat x as an unsigned integer by declaring it like this:

unsigned x;

The above declaration does nothing more to the four bytes of storage than the signed declaration does. It’s still just four bytes of storage lurking somewhere in memory. What matters — the mythical part — is how you treat the variable in your code, signed or unsigned.

Specifically, if a function requires an unsigned value, the above declaration wouldn’t generate a compiler warning. But pass a signed int where a function requires an unsigned int and the compiler may warn you, depending on which warning flags are active. It’s not an error because the code still compiles, although the result may not be what you want.
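Here's a sketch of that situation. The show() function is invented for this example, and whether the mismatch actually draws a warning depends on your compiler and its settings; with gcc or clang, the -Wsign-conversion (or -Wconversion) flag reports it:

#include <stdio.h>

/* a function that requires an unsigned value */
void show(unsigned int value)
{
    printf("%u\n",value);
}

int main()
{
    unsigned u = 100;
    int s = -100;

    show(u);    /* types match; the compiler stays quiet */
    show(s);    /* signed passed as unsigned; may draw a warning */
    return(0);
}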

That’s one mythical aspect of signed and unsigned variables. Another is which printf() conversion character is used to display the variable: The %d character is used for signed int values; %u for unsigned. But you can swap those conversion characters willy-nilly and the compiler won’t balk.

Well, it might, but it doesn’t matter because the variable is just composed of bits floating inside a four-byte frame somewhere in memory. The conversion character merely interprets the value. Remember that, and consider the following code:

#include <stdio.h>

int main()
{
    unsigned char a;
    signed char b;
    int x;

    a = b = 0;
    for(x=0;x<400;x++)
    {
        printf("%3u\t%3d\n",a,b);
        a++; b++;
    }
    return(0);
}

I’ve used char variables in the code above because they’re smaller than an int, one byte instead of four. The range for a signed char goes from -128 through 127; an unsigned char goes from 0 through 255. The output confirms that:

  0	  0
  1	  1
  2	  2
  3	  3

On up to…

126	126
127	127
128	-128
129	-127
130	-126

The value 128 is interpreted as unsigned by %u for variable a, but as signed by %d for variable b, which is why b shows -128. The same bit pattern sits in memory in both variables; the conversion characters determine how it's displayed.
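You can prove that point with a single byte. This sketch stuffs the same bit pattern, 1000 0000 binary, into both flavors of char and displays it both ways; on the two's complement systems you're almost certainly using, the signed version reads back as -128:

#include <stdio.h>

int main()
{
    unsigned char a;
    signed char b;

    a = 128;                /* bit pattern 1000 0000 */
    b = (signed char)a;     /* the same eight bits, treated as signed */

    printf("%3u\t%3d\n",a,b);
    return(0);
}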

Hold that thought.

Because the range on a char variable is limited, you see its value roll over when x reaches 256:

253	 -3
254	 -2
255	 -1
  0	  0
  1	  1
  2	  2

I view the value (256) as being lopped off when it’s stored in the char variable: x holds the value 256, but that many bits can’t fit into the a or b variable’s one-byte container, so the value is truncated.
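Here's a quick way to watch that lopping-off happen without the loop. The assignment below crams a value that needs nine bits into an eight-bit container; your compiler may grumble about it if conversion warnings are enabled:

#include <stdio.h>

int main()
{
    int big = 256;      /* needs nine bits: 1 0000 0000 */
    unsigned char a;

    a = big;            /* only the low eight bits fit */
    printf("%d crammed into a char comes out as %u\n",big,a);
    return(0);
}

The program reports 0, the same rollover you see in the output above.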

Now pick up the thought I told you to hold a few paragraphs back. Edit the code, specifically Line 12, to read:

printf("%3d\t%3u\n",a,b);

I’m swapping the %3d and %3u in the printf() statement.

Next week I’ll explain what happened and why you see this type of output:

140	4294967180
141	4294967181
142	4294967182
143	4294967183
