HexWords

Hexadecimal, or counting base 16, uses letters A through F to represent values 10 through 15. This base, “hex,” is common in programming because it works as a shorthand for binary values. But the digits A through F are also letters, which means that they can spell words.

A recent challenge on Rosetta Code is to use a digital dictionary to find all hex words with four or more letters. These are words like FACE or CAFE, which are hex values 64,206 and 51,966, respectively.

The challenge went on to order the results and perform other magic, but it got me curious.

Last year I wrote a series about accessing the Linux dictionary file and then plundering it for various words. I can use the same techniques to find hexwords, or those words in the electronic dictionary that are composed only of letters A through F.

I approach this problem in two steps.

First, read every word in the digital dictionary. It’s found at /usr/share/dict/words for most Linux configurations.

Second, scan each found word for matches with the letters A through F, both upper- and lowercase. This part may seem like a lot of work, but the scanf() function has a special mode that finds only specific letters. I wrote about this letter filter several years ago.

Here is my code, which outputs all hexwords found in the Linux dictionary:

2024_11_09-Lesson.c

/* Hex Words */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* this code assumes the following path is valid */
#define DICTIONARY "/usr/share/dict/words"
#define SIZE 32

int main()
{
    FILE *dict;
    char word[SIZE],hexword[SIZE],*r,*w;

    /* open the dictionary */
    dict = fopen(DICTIONARY,"r");
    if( dict==NULL )
    {
        fprintf(stderr,"Unable to open %s\n",DICTIONARY);
        exit(1);
    }

    /* read the dictionary */
    while( !feof(dict) )
    {
        /* read a word from the dictionary */
        r = fgets(word,SIZE,dict);
        if( r==NULL )    /* no word, done */
            break;

        /* remove newline */
        w = word;
        while(*w)
        {
            if( *w=='\n' )
            {
                *w = '\0';
                break;
            }
            w++;
        }

        /* pull out only hex characters; skip words that start with none,
           as sscanf() then leaves hexword[] unset */
        if( sscanf(word,"%[ABCDEFabcdef]",hexword)!=1 )
            continue;

        /* compare hexword with original word */
        if( strcmp(word,hexword)==0 )
            printf("%s\n",hexword);
    }


    /* clean-up */
    fclose(dict);
    return(0);
}

The code’s while loop is stolen directly from the earlier post, which scans all words in the dictionary. The found word is stored in buffer word[]. An inner while loop replaces the newline (read by the fgets() function) with a null character, which makes for better matching later in the code.

The sscanf() function scans the dictionary word and stores only its leading run of upper- and lowercase letters A through F. Scanning stops at the first character outside this set.

sscanf(word,"%[ABCDEFabcdef]",hexword);

This result is saved in buffer hexword[]. If both word[] and hexword[] match (the result of the strcmp() function is zero), a true hexword is found. A printf() statement outputs the results.

A sample run of the program generates 120 positive hits. Here’s a snapshot of the output:

A
AA
AAA
AB
ABC
AC
AF
AFC
...
facade
face
faced
fad
fade
faded
fed
fee
feed

The original Rosetta Code challenge limited output to words four characters long or greater. The code above doesn’t apply this restriction, which I add next week, along with other updates to sate my inner nerd.
