The Ctype Functions


Any C programmer eager to mess with characters or strings knows about the handy ctype functions. I use this name because these functions, which include a few macros, are defined in the ctype.h header file. Their job is to manipulate and examine characters.

I divide the ctype functions into the “to” and “is” categories.

The “to” functions manipulate characters. Only two of them are available: toupper() and tolower(), which convert a character from lowercase to uppercase and vice-versa, respectively. Both functions start with “to.”
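As a quick sketch of the “to” functions at work, the two helpers below convert a whole string in place. The names str_toupper() and str_tolower() are my own inventions for illustration, not part of ctype.h. Each character is cast to unsigned char before the call, because a plain char can be negative on some systems.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not in ctype.h): uppercase a string in place.
   Characters that aren't lowercase letters pass through unchanged. */
char *str_toupper(char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++)
        s[i] = (char)toupper((unsigned char)s[i]);
    return s;
}

/* Hypothetical helper (not in ctype.h): lowercase a string in place. */
char *str_tolower(char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++)
        s[i] = (char)tolower((unsigned char)s[i]);
    return s;
}
```

Both helpers return the pointer they were given, so a call can sit directly inside a printf() argument list. The string must be writable: pass an array, not a string literal.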

The “is” functions return TRUE or FALSE based on the character’s attributes. For example, isalpha() returns TRUE when the character examined is alphabetic, upper- or lowercase. The function starts with “is,” which is how I define this category. Lotsa “is” ctype functions are available:

isalnum()
isalpha()
isascii()
isblank()
iscntrl()
isdigit()
isgraph()
islower()
isprint()
ispunct()
isspace()
isupper()
isxdigit()

All of these functions are defined in the ctype.h header file. They all have a similar man page format. For example:

int isspace(int c)

The argument c is specified as an integer, though it must have the value of an unsigned char or EOF. (The EOF is why the prototype is an integer, which allows this function to work with standard I/O.)

The return value is non-zero for a TRUE or positive result, zero otherwise. For example, isspace(' ') returns TRUE because the space is a whitespace character.
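Two details from the prototype are easy to trip over, so here is a minimal sketch that handles both. The function name count_whitespace() is hypothetical. It casts each character to unsigned char before calling isspace(), and it tests the result for non-zero instead of comparing it with 1, because the function promises only "non-zero" for TRUE.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the whitespace characters in a string. */
size_t count_whitespace(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        /* Cast first: a plain char may hold a negative value, which is
           outside the range the ctype functions accept. */
        unsigned char c = (unsigned char)*s;

        /* Any non-zero result means TRUE; don't compare against 1. */
        if (isspace(c))
            count++;
    }
    return count;
}
```

Given the string "a b\tc\n", this function returns 3: one space, one tab, one newline.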

These functions work reliably on standard ASCII characters. Supposedly, they can also function in other languages when the locale is set, and variations on the functions are available to handle different alphabets. I’ve been unable to verify whether this feature works. So, my exploration of these functions is limited to standard ASCII, the Latin alphabet.
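For anyone who wants to experiment, the locale switch happens through setlocale() from locale.h. The sketch below is only a starting point under some assumptions: the helper name isalpha_in_locale() is made up, the locale names available differ from system to system, and for simplicity it resets the locale to "C" afterward instead of restoring whatever was set before.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: report whether byte value c counts as alphabetic
   under the named locale. Returns 1 or 0, or -1 when setlocale()
   rejects the locale name. */
int isalpha_in_locale(const char *locale_name, unsigned char c)
{
    int result;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    result = isalpha(c) != 0;

    /* Simplification: go back to the default "C" locale. */
    setlocale(LC_CTYPE, "C");
    return result;
}
```

On a system with a single-byte Latin-1 locale installed (the name might be something like fr_FR.ISO-8859-1, which is an assumption), I'd expect byte 0xE9 (é) to come back as alphabetic. Under a UTF-8 locale it generally won't, because é is encoded as two bytes and these functions see only one byte at a time.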

It’s easy to guess what each of the functions does based on the name, though some are kinda weird. Here are brief descriptions:

isalnum() returns TRUE for letters of the alphabet (both upper- and lowercase) as well as digits 0 through 9.
isalpha() returns TRUE for an alphabetic character, both upper- and lowercase.
isascii() returns TRUE if character c is an ASCII character, codes 0 through 127. It’s a POSIX extension, not part of the C standard.
isblank() returns TRUE for a space (' ') or tab ('\t') character.
iscntrl() returns TRUE for a control character, ASCII codes 0 through 31 (0x00 through 0x1F) and 127 (0x7F, DEL).
isdigit() returns TRUE when the character is a digit, 0 through 9.
isgraph() returns TRUE for all printable characters except for a space.
islower() returns TRUE for a lowercase character.
isprint() returns TRUE for all printable characters, including the space.
ispunct() returns TRUE for a printable character that is not a space or alphanumeric.
isspace() returns TRUE for any whitespace character, including space, tab, form feed, newline, carriage return, and vertical tab.
isupper() returns TRUE for an uppercase letter.
isxdigit() returns TRUE for characters used in hexadecimal values: 0 through 9 and A through F, both upper- and lowercase.
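To see how the categories overlap, here is a rough sketch that runs a single character through the standard “is” functions and lists the ones that say yes. The function name ctype_report() is hypothetical. I left out isascii() because it isn't part of standard C, and the results assume the default "C" locale on an ASCII system.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: print which "is" functions accept character c
   and return how many of them did. As with the ctype functions
   themselves, c must be an unsigned char value. */
int ctype_report(int c)
{
    const struct {
        const char *name;
        int result;
    } checks[] = {
        { "isalnum",  isalnum(c)  },
        { "isalpha",  isalpha(c)  },
        { "isblank",  isblank(c)  },
        { "iscntrl",  iscntrl(c)  },
        { "isdigit",  isdigit(c)  },
        { "isgraph",  isgraph(c)  },
        { "islower",  islower(c)  },
        { "isprint",  isprint(c)  },
        { "ispunct",  ispunct(c)  },
        { "isspace",  isspace(c)  },
        { "isupper",  isupper(c)  },
        { "isxdigit", isxdigit(c) },
    };
    const size_t total = sizeof checks / sizeof checks[0];
    int hits = 0;

    printf("0x%02X:", (unsigned)c);
    for (size_t i = 0; i < total; i++) {
        if (checks[i].result != 0) {   /* non-zero means TRUE */
            printf(" %s", checks[i].name);
            hits++;
        }
    }
    putchar('\n');
    return hits;
}
```

Calling ctype_report('A') prints "0x41: isalnum isalpha isgraph isprint isupper isxdigit" and returns 6. A single letter lands in six categories at once, which is why the names alone don't always tell the whole story.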

Over the next few weeks, I’ll cover these functions and how they work. I’ll also present code that emulates the functions just because it’s a fun thing to do!

These are “ctype” functions.

7 thoughts on “The Ctype Functions”

  1. I will admit to using the above functions myself from time to time. They can be useful, so I donʼt want to judge them too harshly.

    Even with setlocale() (and as noted in the man pages) they only work with extended ASCII, however. These functions donʼt offer any support for Uɴɪᴄᴏᴅᴇ… at all.

    The simplest solution to obtain Uɴɪᴄᴏᴅᴇ-compatible character classification would be the FSFʼs unistring library:

    /* sudo apt install -y libunistring-dev; Linker: -lunistring */
    #include <unictype.h>

    if (uc_is_alpha (U'ま')) /* Hiragana Letter 'ma' */
        fprintf (stdout, "Hiragana 'ma' is classified as a letter.\n");

    Regrettably, this library comes with its own weaknesses. For example, it doesnʼt provide any functions to recognize numerical values that arenʼt positional (like the Chinese logographic number system):

    if (!uc_is_digit (U'五')) /* Chinese Hanzi Numeral '5' */
        fprintf (stdout, "Wonʼt recognize 五 as a numeral (with value 5).\n");

    In practice, the Unicode Consortiumʼs International Components for Unicode is probably the best option:

    /* sudo apt install libicu-dev; pkg-config --cflags --libs icu-uc */
    #include <unicode/uchar.h>

    double value = u_getNumericValue (U'五');

    if (value != U_NO_NUMERIC_VALUE)
        fprintf (stdout, "Represents the numeric value %d\n", (int)value);

    Hereʼs a list of ICU character classification functions comparable to those in <ctype.h>:

    ctype.h     ICU equivalent
    isalpha()   u_isalpha()   All Unicode letters
    islower()   u_islower()   Unicode lowercase letters
    isupper()   u_isupper()   Unicode uppercase letters
    isdigit()   u_isdigit()   Unicode (positional) digits
    isxdigit()  u_isxdigit()  Hex digits [0-9a-fA-F]
    isalnum()   u_isalnum()   Letters + digits
    isspace()   u_isspace()   All unicode whitespace characters
    isblank()   u_isblank()   Horizontal whitespace only
    isgraph()   u_isgraph()   Visible (non-space) characters
    isprint()   u_isprint()   Printable characters (including space)
    ispunct()   u_ispunct()   Punctuation characters
    iscntrl()   u_iscntrl()   Control characters

    To illustrate the differences, I wrote a small example application that checks for all Uɴɪᴄᴏᴅᴇ characters that should be interpreted as whitespace. Here is its output:

    Code   Name                    isspace uc_is_space u_isspace
    ————————————————————
    U+0009  HT (TAB)                   1        1          1     
    U+000A  LF                         1        1          1     
    U+000B  VT                         1        1          1     
    U+000C  FF                         1        1          1     
    U+000D  CR                         1        1          1     
    U+001C  FS                         0        0          1     
    U+001D  GS                         0        0          1     
    U+001E  RS                         0        0          1     
    U+001F  US                         0        0          1     
    U+0020  SPACE                      1        1          1     
    U+0085  NEL                        0        0          1     
    U+00A0  NO-BREAK SPACE             0        0          1     
    U+1680  OGHAM SPACE MARK           0        1          1     
    U+2000  EN QUAD                    0        1          1     
    U+2001  EM QUAD                    0        1          1     
    U+2002  EN SPACE                   0        1          1     
    U+2003  EM SPACE                   0        1          1     
    U+2004  THREE-PER-EM SPACE         0        1          1     
    U+2005  FOUR-PER-EM SPACE          0        1          1     
    U+2006  SIX-PER-EM SPACE           0        1          1     
    U+2007  FIGURE SPACE               0        0          1     
    U+2008  PUNCTUATION SPACE          0        1          1     
    U+2009  THIN SPACE                 0        1          1     
    U+200A  HAIR SPACE                 0        1          1     
    U+2028  LINE SEPARATOR             0        1          1     
    U+2029  PARAGRAPH SEPARATOR        0        1          1     
    U+202F  NARROW NO-BREAK SPACE      0        0          1     
    U+205F  MEDIUM MATHEMATICAL SPACE  0        1          1     
    U+3000  IDEOGRAPHIC SPACE          0        1          1

    In my view <ctype.h> simply doesnʼt cut it anymore; libunistring / ICU4C for the win!

  2. Outstanding info.

    I’m too chicken to reconfigure a computer for another language. I did that once; Korean still shows up on one of my old Windows 10 boxes. But I agree that many of these functions were written in an era when locality was assumed to always be local.

  3. Thank you for your kind words! I was hoping that my comments would be useful ☺

    My main motivation for the above is that I believe the community needs to clarify that modern C is—of course—a fully Uɴɪᴄᴏᴅᴇ-capable language, a language that provides suitable functions for working with text in various languages and is therefore suitable for developing international applications.

    If we donʼt do this, C will inevitably be seen as inferior to modern languages like Rust or Swift… thus wonʼt be able to survive beyond this half-century outside of legacy applications.

  4. I don’t think C23 improves the standard, though there may be hope for the next standard – or even a subset of C that creates this level of compatibility. It would be nice to see an implementation as an update and not a reintroduction of C as yet another OOP language or some “improved” version of Java.

  5. “It would be nice to see an implementation as an update and not a reintroduction of C as yet another OOP language or some “improved” version of Java.”

    My sentiment exactly, I fully agree. Looking through WG14’s document log, the standards committee unfortunately seems to be bent on adding features like defer and Closures in C as well as quite a few other language extensions to bring C closer to C++ once again (also quite a few additions to the preprocessor). In the coming years, C will probably no longer remain the small language with the simple syntax that we have all come to love…

  6. BASIC was extremely popular in the 1970s and early 80s, specifically on microcomputers of the day. Each one came with its own BASIC.

    Kemeny and Kurtz, who developed BASIC at Dartmouth, wanted in on the action. So they produced TRUE BASIC, which to me smelled a lot like Pascal and wasn’t very BASIC-y at all. My guess is that the same thing may happen to C – not that it hasn’t already happened a dozen times already.

  7. Before I was introduced to Turbo Pascal in secondary school, I experimented with GW-BASIC under MS-DOS 3.x. Having never heard of True BASIC before I just took a look at it. Youʼre right, the syntax does suspiciously look like Pascal!

    Anyway, with the renewed momentum in the development of C itʼs unfortunately all too likely that C2y and its successors will feel markedly different from C89…
