Reading the Dictionary

I admit it: I’m a nerd and I read the dictionary. I know it’s a reference, not a work of fiction. The plot is weak. But I found it enjoyable as a kid to discover new words and their meanings. Alas, the Unix dictionary file lists only words and not definitions. But how many words are in there?

A word-counting program is one of the first useful programs introduced in the original K&R, The C Programming Language. It’s simple code, but you don’t even need it to count the words in the Unix dictionary file: Each word sits on a line by itself, so all you need to do is count the lines in the file. Or, to be more exact, count the newline characters, \n.

The point of counting the words is to know the limit: for example, so that a program can pluck out a random word without reading beyond the end of the file. Beyond that, accessing and counting the words (or lines) in a file is a good exercise.

The following code outputs the contents of the dictionary file, aliased to /usr/share/dict/words on Unix, Linux, and macOS systems (though the path may differ on some systems). I covered accessing this file in last week’s Lesson. The code opens the file, reads and outputs each line, increments a counting variable, and reports the result.

2023_10_14-Lesson-a.c

/* Look up the dictionary */
#include <stdio.h>
#include <stdlib.h>

/* this code assumes the following path is valid */
#define DICTIONARY "/usr/share/dict/words"
#define SIZE 32

int main()
{
    FILE *dict;
    int wc;
    char word[SIZE],*r;

    /* open the dictionary */
    dict = fopen(DICTIONARY,"r");
    if( dict==NULL )
    {
        fprintf(stderr,"Unable to open %s\n",DICTIONARY);
        exit(1);
    }

    /* read and tally the words; fgets() returns NULL at end of file */
    wc = 0;
    while( (r = fgets(word,SIZE,dict)) != NULL )
    {
        printf("%s",word);    /* words are \n terminated */
        wc++;
    }

    /* results */
    printf("The dictionary file contains %d words\n",wc);

    /* close */
    fclose(dict);

    return(0);
}

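The dictionary file name is held in the defined constant DICTIONARY. Variable wc counts the words, or lines, in the file. The fgets() function reads each line into the word[] buffer, whose size is set by SIZE: 32 characters, meaning at most 31 characters of text (including the newline) plus the terminating null character. A line longer than 31 characters would be read in two pieces and counted twice, so SIZE must be at least two more than the length of the longest word, leaving room for the newline and the null character.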

A printf() statement outputs each word. Removing this statement makes the code run faster, though it’s pretty fast either way.

Here is the final line of output from my system, after all the words scroll by:

The dictionary file contains 104334 words

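To count only the newlines in the file, which yields the same result, replace the word[] array with a single int variable ch. (It must be an int, not a char, because fgetc() returns EOF as an int value distinct from every character.) Modify the while loop as well: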

    /* read and tally the words; fgetc() returns EOF at end of file */
    wc = 0;
    while( (ch = fgetc(dict)) != EOF )
    {
        if( ch=='\n' )
            wc++;
    }

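The count is the same, though this update makes the code run slower, probably because of the overhead of calling fgetc() once for every character instead of calling fgets() once per line. (The stream is buffered either way.)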

You can obtain this updated code on GitHub.

I continue my exploration of the dictionary file in next week’s Lesson.
