The hexdump utility is a marvelous tool for grabbing a sneak peek at a file’s innards, especially when debugging code that performs file access. As a text mode tool, however, it could stand to use some colorful character improvement.
The hexdump utility is a filter. Run at the command prompt, it dumps a file by name or by using redirected input. Here’s a quickie test run:
~$ hexdump
Hello!
0000000 6548 6c6c 216f 000a
0000007
~$
My favorite view is “canonical” mode, activated by using the -C switch:
~$ hexdump -C
Hello!
00000000 48 65 6c 6c 6f 21 0a |Hello!.|
00000007
~$
Seeing both the hex dump and ASCII equivalents can truly help disclose a file’s contents. In fact, I used hexdump a few days ago to dump some word processing files saved years ago by antique software. This tool helped me view the text portion of correspondence, which saved me the time of tussling with Microsoft Word to extract the text.
The Exercise for March 2020 was to code a hexdump-like utility. My solution involved reading a file for input, though it can easily be converted into a filter for more flexible input. I’ll use this program as a base for my updated hexdump/color utility, though it requires lotsa modification.
My first goal is to decide how to use colors to help present more than just a period in the ASCII column for non-printable characters. For example, control codes can be output color-coded red. Each code has a corresponding character: ^@ for the null character (\0), ^A for control code 1, ^B for 2, on up to ^_ for code 31. No standard equivalent exists for code 127, the “delete” character: Unicode U+2421 (␡) is often used on the web, as is U+247F (⑿) for some reason.
The following code churns through ASCII values zero through 127. For the control codes (values zero through 31), the corresponding character is output in red:
2025_12_06-Lesson.c
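#include <stdio.h>

/* ANSI escape sequences; \x1b is the portable spelling of \e */
#define RED "\x1b[31m"
#define NORMAL "\x1b[m"

int main()
{
    unsigned char ch;

    for( ch=0; ch<=127; ch++ )
    {
        /* hex and decimal values */
        printf("%02X %03d ",ch,ch);
        /* control codes appear as their caret-notation letter, in red */
        if( ch<32 )
            printf("%s%c%s",RED,ch+'@',NORMAL);
        else
            putchar(ch);
        putchar('\n');
    }

    return 0;
}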
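Variable ch is declared as an unsigned char so that the for loop can terminate: after 127, ch increments to 128, which fails the ch<=127 test. (A signed char can’t hold 128. On typical systems it wraps to a negative value after 127, so the condition stays true and the loop continues endlessly.)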
Within the for loop, the hexadecimal and decimal values of variable ch are output. When ch is less than 32, its ASCII equivalent character is output color-coded red. Otherwise, the character is output directly. Refer to this blog post for information on ANSI color text output.
Here’s a truncated sample run:
00 000 @
01 001 A
02 002 B
03 003 C
04 004 D
05 005 E
06 006 F
07 007 G
...
78 120 x
79 121 y
7A 122 z
7B 123 {
7C 124 |
7D 125 }
7E 126 ~
7F 127
For value 127 (del), I’d like to output the Unicode character ␡ as the equivalent. Additionally, for character code values 128 through 255, special characters can also be generated, perhaps in color. In fact, equivalents for the control code characters exist as well: code 0, ^@ is ␀. This update to the code requires that I retrofit it with wide character output. I begin this task in next week’s Lesson.
I did something similar once and used an array indexed by ASCII code, containing the printable characters and text descriptions of the non-printable ones. This is the code to create the array. The argument is an array of at least 128 char pointers, which the function fills in.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     // strlen() and strcpy(), used by set_value()
#include <stdbool.h>

// set_value() replaces the string stored at one index of the array
bool set_value(char** array, int index, char* value);

void populate_mappings(char** mappings)
{
    // initialize to default values: one character plus the terminator
    for(int i = 0; i <= 127; i++)
    {
        mappings[i] = malloc(2);
        if(mappings[i] == NULL)
            continue;   // entry stays NULL if allocation fails
        sprintf(mappings[i], "%c", i);
    }
    // replace non-printable characters with descriptions
    set_value(mappings, 0, "[null]");
    set_value(mappings, 1, "[start of heading]");
    set_value(mappings, 2, "[start of text]");
    set_value(mappings, 3, "[end of text]");
    set_value(mappings, 4, "[end of transmission]");
    set_value(mappings, 5, "[enquiry]");
    set_value(mappings, 6, "[acknowledge]");
    set_value(mappings, 7, "[bell]");
    set_value(mappings, 8, "[backspace]");
    set_value(mappings, 9, "[tab]");
    set_value(mappings, 10, "[line feed]");
    set_value(mappings, 11, "[vertical tab]");
    set_value(mappings, 12, "[form feed]");
    set_value(mappings, 13, "[carriage return]");
    set_value(mappings, 14, "[shift out]");
    set_value(mappings, 15, "[shift in]");
    set_value(mappings, 16, "[data link escape]");
    set_value(mappings, 17, "[device control 1]");
    set_value(mappings, 18, "[device control 2]");
    set_value(mappings, 19, "[device control 3]");
    set_value(mappings, 20, "[device control 4]");
    set_value(mappings, 21, "[negative acknowledge]");
    set_value(mappings, 22, "[synchronous idle]");
    set_value(mappings, 23, "[end of trans. block]");
    set_value(mappings, 24, "[cancel]");
    set_value(mappings, 25, "[end of medium]");
    set_value(mappings, 26, "[substitute]");
    set_value(mappings, 27, "[escape]");
    set_value(mappings, 28, "[file separator]");
    set_value(mappings, 29, "[group separator]");
    set_value(mappings, 30, "[record separator]");
    set_value(mappings, 31, "[unit separator]");
    set_value(mappings, 32, "[space]");
    set_value(mappings, 127, "[delete]");
}
Extended ASCII is a standard, with the proviso of the old joke: the great thing about standards is that there are so many to choose from.
I’ve just read Code by Charles Petzold, which includes an interesting chapter on the teleprinter origins of ASCII as well as a good description of the inner workings of Unicode.
I missed a bit, the set_value function.
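bool set_value(char** array, int index, char* value)
{
    // resize through a temporary so the old pointer survives a failure
    char* resized = realloc(array[index], strlen(value) + 1);

    if(resized == NULL)
        return false;
    array[index] = resized;
    strcpy(array[index], value);
    return true;
}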
Nice!