Sorting the Hexwords, Part I

The Linux dictionary stores its words sorted alphabetically. Therefore, the output from the program presented in last week’s Lesson shows valid hexwords (letters A through F, four letters or longer) in alphabetic order. But what if I want the words output in numerical order?

No correlation exists between a hexadecimal value’s alphabetic sort and its numeric sort. For example DECAF is 912,559 but DEED is 57,069. Alphabetically, DEED is higher but numerically DEFAC is higher. If I desire to sort the hexword output by value, I must collect all the words first, then sort them by value before outputting the results. Accomplishing this task requires some rewriting of the code from last week’s Lesson.

To build a list of valid hexwords I define a structure:

struct entry {
    char hex[SIZE];
    long value;
}

The entry structure contains members for both the hexword as well as its value. This second member (value) may seem redundant because the hex[SIZE] member is a hex value, albeit in a string. However, storing only the hexword string requires a double-pointer if I’m creating a dynamic list. No, using a structure requires fewer brain cells.

Based on last week’s Lesson, I know that just over 40 valid hexwords exist in the Linux dictionary. Even so, I opted to dynamically allocate the structures: Each time a valid hexword is found, storage for a new structure is allocated in memory. I could have worked this operation as a linked list, but opted instead to use offsets within a single memory chunk — like a dynamic array.

Here’s the code:

2024_11_23-Lesson.c

/* Hex Words */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* this code assumes the following path is valid */
#define DICTIONARY "/usr/share/dict/words"
#define SIZE 32

int main()
{
    FILE *dict;
    char word[SIZE],hexword[SIZE],*r,*w;
    int count,x;
    struct entry {
        char hex[SIZE];
        long value;
    } *list;

    /* open the dictionary */
    dict = fopen(DICTIONARY,"r");
    if( dict==NULL )
    {
        fprintf(stderr,"Unable to open %s\n",DICTIONARY);
        exit(1);
    }

    /* read the dictionary */
    count = 0;
    while( !feof(dict) )
    {
        /* read a word from the dictionary */
        r = fgets(word,SIZE,dict);
        if( r==NULL )    /* no word, done */
            break;

        /* remove newline */
        w = word;
        while(*w)
        {
            if( *w=='\n' )
            {
                *w = '\0';
                break;
            }
            w++;
        }

        /* pull out only hex characters */
        sscanf(word,"%[ABCDEFabcdef]",hexword);

        /* compare hexword with original word */
        if( strcmp(word,hexword)==0 )
        {
            if( strlen(hexword) > 3 )
            {
                if( count==0 )
                {
                    list = malloc( sizeof(struct entry) );
                    if( list==NULL )
                    {
                        fprintf(stderr,"Unable to allocate memory\n");
                        exit(1);
                    }
                    strcpy( list->hex,hexword );
                    list->value = strtol(hexword,NULL,16);
                }
                else
                {
                    list = realloc( list, sizeof(struct entry) * count+1 );
                    if( list==NULL )
                    {
                        fprintf(stderr,"Unable to reallocate memory\n");
                        exit(1);
                    }
                    strcpy( (list+count)->hex,hexword );
                    (list+count)->value = strtol(hexword,NULL,16);
                }
                count++;
            }
        }
    }

    /* output the list */
    for( x=0; x<count; ++x )
        printf("%3d: %-10s %10ld\n",
                x+1,
                (list+x)->hex,
                (list+x)->value
              );

    /* clean-up */
    fclose(dict);
    free(list);
    return(0);
}

The entry structure is declared as a pointer, *list. This pointer isn’t accessed until the first valid hexword is found. (Line 55 in the code.)

An if statement tests whether variable count is equal to zero, which means the list is just starting. When true, the malloc() function initially allocates storage for one entry structure at address list. The hexword is copied into the list->hex member and its value converted and stored in list->value.

The else condition uses the realloc() function to expand the list beyond the first entry. It adds new hexwords and their values as encountered:

list = realloc( list, sizeof(struct entry) * count+1 );

The word and its value are copied into the structure at an offset (count) within the list memory chunk.

After the dictionary is scanned, and the list built, a for loop outputs the contents, names and values of all the hexwords in the list. The list buffer is then freed. Stuff is cleaned up. The program ends.

Here is the output, which should match the same output as shown in last week’s Lesson:

  1: Bede            48862
  2: Beebe          782014
  3: DECed          912621
  4: Dacca          896202
  5: Dada            56026
...
 38: face            64206
 39: faced         1027309
 40: fade            64222
 41: faded         1027565
 42: feed       2314885530818453605

The output is almost the same . . . except for that final value. Obviously, hex value FEED isn’t equal to 2,314,885,530,818,453,605!

See if you can find the error in the code. It took me a while, but it’s in there. I’ll reveal where in next week’s Lesson. I’ll also add the code that sorts the output numerically.

Leave a Reply