Dumping the Screen in W-I-D-E Color

Updating the hexdump utility with color is good, but adding wide characters for output is even better. Continuing from last week's Lesson, I'm adding wide character output to represent the non-printing ASCII values 0 through 31 and 127.

I cover wide character output in a series of posts from 2017. The key is to set the locale, then use the wide character functions for input and output. Though these directions seem easy, it took me two tries to perfect the output.

My first attempt uses wide character output functions, as well as “long” strings and character values:

2025_12_13-Lesson-a.c

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

#define RED "\033[31m"      /* ANSI escape: red text */
#define NORMAL "\033[m"     /* ANSI escape: reset attributes */

int main()
{
    wchar_t chw;

    /* set the locale for wide characters */
    setlocale(LC_CTYPE,"UTF-8");

    for( chw=0; chw<=127; chw++ )
    {
        wprintf(L"%02X %03d ",chw,chw);
        if( chw<32 )
            wprintf(L"%s%lc%s",RED,chw+L'@',NORMAL);
        else if( chw==127 )
            wprintf(L"%s%lc%s",RED,9249,NORMAL);
        else
            putwchar(chw);
        putwchar(L'\n');
    }
    return 0;
}

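The definitions for the ANSI RED and NORMAL codes don't need to be long strings. I tried declaring them as such originally, but the output was weird. The reason is that the %s placeholder in a wprintf() format string expects a standard char string, even though the format string itself is wide; a wide string requires %ls instead. So RED and NORMAL are coded as standard ASCII strings.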

Wide character (wchar_t) variable chw is used for output and as the looping counter.

The setlocale() function is meant to set the character type (the LC_CTYPE category) to UTF-8: setlocale(LC_CTYPE,"UTF-8");

The wprintf() function outputs each line of text. The format string is prefixed with an L to indicate a wide string. For values 0 through 31, the character output is chw+L'@'. The L prefix makes '@' a wide character constant, so the expression yields @ for 0, A for 1, B for 2, and so on. For the value 127, Unicode character 9249 is output (U+2421, or ␡).

The putwchar() function outputs single wide characters; the wide newline character is prefixed with an L: L'\n'

Here’s the program’s truncated output:

00 000 @
01 001 A
02 002 B
03 003 C
04 004 D
05 005 E
06 006 F
07 007 G
...
78 120 x
79 121 y
7A 122 z
7B 123 {
7C 124 |
7D 125 }
7E 126 ~
7F 127 DEL

The control codes are output using their letter equivalents, color-coded red: @ for Ctrl+@, A for Ctrl+A, and so on.

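Curiously, the program generates multiple characters for code U+2421: DEL instead of ␡. At first, I assumed this behavior was normal, so I updated the code to output Unicode symbols for the other control codes as well, replacing the wprintf() statement under if( chw<32 ) with the following: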

wprintf(L"%s%lc%s",RED,chw+9216,NORMAL);

The expression chw+9216 adds 0x2400 (U+2400, the start of Unicode's Control Pictures block) to each control code. Here are the first 32 lines of output:

00 000 NUL
01 001 SOH
02 002 STX
03 003 ETX
04 004 EOT
05 005 ENQ
06 006 ACK
07 007 BEL
08 008 BS
09 009 HT
0A 010 LF
0B 011 VT
0C 012 FF
0D 013 CR
0E 014 SO
0F 015 SI
10 016 DLE
11 017 DC1
12 018 DC2
13 019 DC3
14 020 DC4
15 021 NAK
16 022 SYN
17 023 ETB
18 024 CAN
19 025 EM
1A 026 SUB
1B 027 ESC
1C 028 FS
1D 029 GS
1E 030 RS
1F 031 US

Again, each symbol appears as a group of individual letters instead of a single Unicode character, even though every value output is one Unicode character. The problem, I discovered, lies with the setlocale() function. To fix the output, the setlocale() statement must be updated to read:

setlocale(LC_ALL,"");

After making this change, the first 32 lines of output look like this:

00 000 ␀
01 001 ␁
02 002 ␂
03 003 ␃
04 004 ␄
05 005 ␅
06 006 ␆
07 007 ␇
08 008 ␈
09 009 ␉
0A 010 ␊
0B 011 ␋
0C 012 ␌
0D 013 ␍
0E 014 ␎
0F 015 ␏
10 016 ␐
11 017 ␑
12 018 ␒
13 019 ␓
14 020 ␔
15 021 ␕
16 022 ␖
17 023 ␗
18 024 ␘
19 025 ␙
1A 026 ␚
1B 027 ␛
1C 028 ␜
1D 029 ␝
1E 030 ␞
1F 031 ␟
...

The characters used for the control codes may not look exactly as shown above; their appearance depends on the terminal's typeface. Regardless, the updated code is available on GitHub.

Now that I’ve figured out how to represent non-printing ASCII codes, I could update my improved hexdump utility, but I still need a way to represent values 128 through 255. I cover this task in next week’s Lesson.

2 thoughts on “Dumping the Screen in W-I-D-E Color”

  1. I prefer the 2 or 3 separate characters rather than the single diagonal characters for the simple reason that I can actually read them! I can only read ␀ etc. if I zoom in to 150%.
