ASCII Programming Tricks, Round II

You probably don’t even think about it. How do the toupper() and tolower() functions do their magic? Both functions take a letter of one case and convert it to the same letter of the opposite case. They don’t touch any other characters or symbols.

Well, how would you code it?

Don’t worry; this post isn’t an Exercise.

If the ASCII table weren’t laid out as carefully as it is, then converting between uppercase and lowercase would involve a lot of coding overhead. In fact, the conversion process would probably be 26 lines of code, one statement for each letter of the alphabet. Thankfully, that’s not the case.

Ha-ha.

If you whip out a handy ASCII chart (ASCII web site) or review the code from the ASCII Tricks Lesson, you can observe the differences in code values between uppercase and lowercase letters.

The code for letter 'A' is 65 and for letter 'a' is 97.

Those are decimal code values, so they’re unimpressive. Behold the hexadecimal code values:

The hex code for letter 'A' is 0x41 and for letter 'a' is 0x61.

That’s a difference of 0x20, a single bit, between the uppercase and lowercase forms of the letter. The difference holds true for all letters of the alphabet, A to Z: Add 0x20 to 'A' to get 'a'; subtract 0x20 from 'a' to get 'A'.
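
If you'd rather not take the chart's word for it, here is a minimal sketch that walks the alphabet and checks the gap for every letter. The function name show_case_gap() is my own invention for illustration, and the check assumes your system uses ASCII or an ASCII-compatible character set.

```c
#include <stdio.h>

/* Hypothetical helper: prints each letter pair with its hex codes and
   the gap between them. Returns 1 when every gap is 0x20, which is
   what you should see on an ASCII system. */
int show_case_gap(void)
{
    const char *upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *lower = "abcdefghijklmnopqrstuvwxyz";
    int i, all_match = 1;

    for (i = 0; upper[i]; i++)
    {
        int gap = lower[i] - upper[i];

        printf("%c = 0x%02X, %c = 0x%02X, gap = 0x%02X\n",
               upper[i], (unsigned)upper[i],
               lower[i], (unsigned)lower[i],
               (unsigned)gap);
        if (gap != 0x20)
            all_match = 0;
    }
    return all_match;
}
```

Call show_case_gap() from main() to see all 26 pairs; on an ASCII system every line should report a gap of 0x20.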

Oh, and by the way: Comparison-wise, uppercase letters are “less” than lowercase. That calculation is based on the character’s ASCII code value. It’s also what the strcmp() family of functions uses to compare two strings.
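
To see that ordering in action, here is a small sketch built on strcmp(). The wrapper sorts_before() is a hypothetical name of mine, not a library function, and the result described below assumes an ASCII character set.

```c
#include <string.h>

/* Hypothetical helper: returns 1 when strcmp() orders a before b,
   0 otherwise. strcmp() compares character code values, so on ASCII
   any uppercase letter sorts ahead of any lowercase letter. */
int sorts_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

On an ASCII system, sorts_before("Zebra", "apple") returns 1 because 'Z' is 0x5A and 'a' is 0x61, even though a dictionary would list "apple" first.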

Unlike working with ASCII character codes for characters '0' through '9', you can’t just add or subtract 0x20 from an alphabetic character code value without first knowing the character’s current case. Therefore, I recommend that you instead use binary logic operators to make the conversion. That way, for instance, converting an uppercase letter to uppercase by mistake doesn’t mess up anything.
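
Here is a quick sketch of why plain arithmetic is risky. The three function names are hypothetical, made up for this illustration, and the values described assume ASCII.

```c
/* Naive conversions: these are correct only when the caller already
   knows the case of the letter being passed in. */
int upper_by_subtract(int c)
{
    return c - 0x20;
}

int lower_by_add(int c)
{
    return c + 0x20;
}

/* Digit conversion has no such problem: there is only one form of
   each digit, so subtracting '0' always yields the value 0 to 9. */
int digit_value(int c)
{
    return c - '0';
}
```

Calling upper_by_subtract('a') gives 'A' as intended, but upper_by_subtract('A') gives 0x21, which is the '!' character. Likewise, lower_by_add('a') gives 0x81, which isn't an ASCII character at all.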

To convert a letter to uppercase, you need to clear bit 0x20. The operation is & 0xDF (AND 0xDF), illustrated in Figure 1.

Figure 1. Converting from 'a' to 'A' by using the bitwise & operator.
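
As a rough sketch of the same operation in code: the mask 0xDF is simply every bit in a byte except 0x20. The macro and function names here (CASE_BIT, UPPER_MASK, upper_by_mask) are my own labels for illustration, and the results assume ASCII letters as input.

```c
#define CASE_BIT   0x20                 /* the bit that differs between cases */
#define UPPER_MASK (0xFF & ~CASE_BIT)   /* 0xDF: all bits set except 0x20 */

/* Clear the case bit. A lowercase letter becomes uppercase; an
   uppercase letter already has the bit clear, so it passes through
   unchanged. */
int upper_by_mask(int c)
{
    return c & UPPER_MASK;
}
```

Both upper_by_mask('a') and upper_by_mask('A') return 'A', so applying the mask twice, or to a letter that's already uppercase, does no harm.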

To convert a letter to lowercase, you need to set bit 0x20. The operation is | 0x20 (OR 0x20), illustrated in Figure 2.

Figure 2. Converting from ‘A’ to ‘a’ by using the bitwise | operator.
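
If you want to watch the bit flip, the following sketch prints a character's bit pattern before and after the OR. The helpers bits8(), lower_by_mask(), and show_lower() are hypothetical names I've chosen for the example, and the patterns described assume ASCII.

```c
#include <stdio.h>

/* Hypothetical helper: writes the 8 bits of c into out as '0' and '1'
   characters, most significant bit first, plus a terminating null. */
void bits8(unsigned char c, char out[9])
{
    int i;

    for (i = 0; i < 8; i++)
        out[i] = (c & (0x80 >> i)) ? '1' : '0';
    out[8] = '\0';
}

/* Set the case bit. An uppercase letter becomes lowercase; a
   lowercase letter already has the bit set and is unchanged. */
int lower_by_mask(int c)
{
    return c | 0x20;
}

/* Print the character and its bits before and after the OR. */
void show_lower(int c)
{
    char before[9], after[9];

    bits8((unsigned char)c, before);
    bits8((unsigned char)lower_by_mask(c), after);
    printf("'%c' %s | 00100000 = %s '%c'\n",
           c, before, after, lower_by_mask(c));
}
```

For 'A', the pattern 01000001 becomes 01100001, which is 'a'. Only the 0x20 bit changes.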

Refer to my various For Dummies C programming books for details on binary logic operators and how they affect values.

Here’s sample code to demonstrate how the ASCII codes for alphabetic characters can be manipulated between cases:

#include <stdio.h>

int main()
{
    const char *alphabet = "ABcdeFGhIJkLmNoPqRstuvWXYz";
    const char *a;

/* Show the string as-is */
    printf("Default: %s\n",alphabet);

/* Convert to all caps */
    printf("  Upper: ");
    a = alphabet;
    while(*a)
    {
        putchar( *a & 0xDF);
        a++;
    }
    putchar('\n');

/* Convert to lowercase */
    printf("  Lower: ");
    a = alphabet;
    while(*a)
    {
        putchar( *a | 0x20);
        a++;
    }
    putchar('\n');

    return(0);
}

Here is the output:

Default: ABcdeFGhIJkLmNoPqRstuvWXYz
  Upper: ABCDEFGHIJKLMNOPQRSTUVWXYZ
  Lower: abcdefghijklmnopqrstuvwxyz

This trick works on all letters of the alphabet, but the bitwise operators also alter other characters and codes. For example, & 0xDF turns a space (0x20) into a null character, and | 0x20 turns '@' into a backtick. So if you use this technique, you must first test whether the character is a letter. You can pursue that solution on your own.
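
If you'd like a starting point, here is one possible sketch of that test. The names ascii_upper() and ascii_lower() are hypothetical, the range checks assume ASCII, and this is not how any particular C library implements toupper() or tolower().

```c
/* Convert to uppercase only when c is a lowercase ASCII letter;
   every other character is returned untouched. */
int ascii_upper(int c)
{
    if (c >= 'a' && c <= 'z')
        return c & 0xDF;
    return c;
}

/* Convert to lowercase only when c is an uppercase ASCII letter;
   every other character is returned untouched. */
int ascii_lower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c | 0x20;
    return c;
}
```

With the guard in place, ascii_upper(' ') stays a space and ascii_lower('@') stays '@', while letters still convert as before.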