Trigraph Sequences

I doubt you’ve ever used a trigraph. If you saw a trigraph in some C code, you might assume it was a typo or, from the early days of telecommunications, a modem burp. But trigraphs present a legitimate, if arcane, way to represent certain characters, a holdover from the days of teletype input and primitive, barely-ASCII keyboards.

As the primary input devices back in the mainframe days, teletype machines lacked certain symbols now common on computer keyboards:

# [ \ ] ^ { | } ~

If you’re going to code C, even in 1973 on a teletype machine connected to a mainframe across campus, you need to use these characters. To stand in for them, C offers the trigraph sequence. Here’s the definition from the C standard:

The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.

A trigraph sequence starts with two question marks, ??. The third character references the missing symbol, as shown in this table:

Trigraph  Char.    Trigraph  Char.    Trigraph  Char.
??=       #        ??(       [        ??/       \
??)       ]        ??'       ^        ??<       {
??!       |        ??>       }        ??-       ~

In older C code, you might see something like:

printf("phone ??= ");

The ??= trigraph represents the # character. The compiler swaps in the proper character during the very first phase of translation, before the preprocessor even runs. Then the source code is compiled.

Modern compilers dislike trigraphs. Even your editor may choke on the sequences, misinterpreting them and sending any context-based color coding haywire. Yet, according to the C standard, trigraphs are still valid in C, at least through C17; the C23 revision finally removes them.

The following code outputs the pantheon of trigraph characters. Presenting them in a string is the only way I could keep my editor from getting cross with me:

2021_10_16-Lesson.c

#include <stdio.h>

int main()
{
    char trigraph[] =  "??= ??( ??/ ??) ??' ??< ??! ??> ??-";

    printf("%s\n",trigraph);

    return(0);
}

Beyond making the editor uncomfortable, trigraphs are flagged with warnings by modern compilers. The above code generates nine warnings with clang, one for each trigraph, even though the trigraphs are enclosed in double quotes. The warning states that the trigraph is ignored.

Here’s the program’s output:

??= ??( ??/ ??) ??' ??< ??! ??> ??-

To enable the trigraphs, you must tell the compiler that you want them. Yep, even though the standard allows for trigraphs (through C17, anyway), your compiler may choose to ignore them by default. The -trigraphs switch does the job in both gcc and clang, as does any strict ISO mode. Here I use the -std=c89 switch to force the compiler to follow the C89 standard, trigraphs and all:

clang -std=c89 2021_10_16-Lesson.c

Warnings about the trigraphs may still appear, though the warning now states that the trigraph is converted into the proper, corresponding character. Further, because the ??/ trigraph is converted to the \ (backslash), and that backslash is followed by a space, the compiler complains about an unknown escape sequence. Here is the updated output:

# [ ] ^ { | } ~

To address the bad escape sequence, double up on the ??/ trigraph: ??/??/. The resulting double backslash is the escape sequence for a single backslash character, which is now output:

# [ \ ] ^ { | } ~

Trigraphs are an interesting and delightfully cryptic relic of days gone by. I really wish I had been actively coding back when these things were common and the nerds knew the trigraph sequences by heart. It’s sad to lose such legacies.
