An Update to My strcasecmp() Function

Many moons ago, I wrote about the non-standard function, strcasecmp(). It works like the C library function strcmp(), though it ignores character case. Turns out my return value from the function isn’t exactly correct.

The man page for strcasecmp() claims that the function returns “an integer greater than, equal to, or less than 0, according as s1 is lexicographically greater than, equal to, or less than s2 after translation of each corresponding character to lower-case.” This is the same language used for the strcmp() function. But this behavior isn’t consistent across all C libraries.

According to North Carolina State University Professor Dr. Greg Byrd, who emailed me on this topic, some implementations return 1, 0, or -1 rather than the difference between the characters’ code values. In my code, I used this comparison:

ch = toupper(a) - toupper(b);

This statement reflects the man page description quoted earlier. The value of variable ch is returned from the function: zero when the characters match, the difference between them otherwise. According to Dr. Byrd, however, non-GNU C implementations behave differently. The inconsistency trips up his students because they anticipate values 1, 0, or -1 and nothing larger.

To address this inconsistency, yet maintain compatibility with the man page for the (non-standard) strcasecmp() function, an update to the code is in order. Specifically, I chose to add my sign() function, which was part of this month’s Exercise.

Here is the updated code, which includes a call to my sign() function:

2021_03_13-Lesson.c

#include <stdio.h>
#include <ctype.h>

int sign(int a)
{
    /* value is negative */
    if( a<0 ) return(-1);

    /* value is positive */
    if( a>0 ) return(1);

    /* value is zero */
    return(0);
}

int strcasecmp(const char *s1, const char *s2)
{
    int offset,ch;
    unsigned char a,b;

    offset = 0;
    ch = 0;
    while( *(s1+offset) != '\0' )
    {
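        /* s2 ended first: s1 is longer, so it compares greater */
        if( *(s2+offset)=='\0')
            return(1);

        a = (unsigned char)*(s1+offset);
        b = (unsigned char)*(s2+offset);
        ch = toupper(a) - toupper(b);
        if( ch != 0 )
            return( sign(ch) );
        offset++;
    }
    /* s1 ended: zero if s2 ended too, negative if s2 is longer */
    return( sign( 0 - (unsigned char)*(s2+offset) ) );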
}

void test(const char *s1, const char *s2)
{
    printf("%s v. %s = ",s1,s2);
    if( strcasecmp(s1,s2)==0 )
        puts("match");
    else
        puts("no match");
}

int main()
{
    char string1[] = "I drink coffee";
    char string2[] = "I DRINK COFFEE";
    char string3[] = "I drink tea";

    test(string1,string1);
    test(string1,string2);
    test(string1,string3);

    return(0);
}

The sign() function is called from within the strcasecmp() function at Lines 33 and 37. The value passed to it, the difference between individual characters in the strings, is reduced to -1, 0, or 1. The raw difference between the characters is no longer returned, which is what Dr. Byrd suggested.

The code still obeys the man page format; the value returned is positive, zero, or negative, though the values are restricted to 1, 0, and -1. Of course, the effect is that other coders who rely upon the return value to reflect the difference between specific characters will now be unable to do so. Still, the beauty of C is that you can always write your own function that behaves the way you want it to.

Thanks again to Professor Byrd from North Carolina State University for pointing out my oversight.
