Checking Your Spelling

At the basic level, a spell-checker works as a simple comparison program: The word in question is compared with each word in the dictionary. When the word isn’t found, it’s assumed to be misspelled. With a dictionary file available on your computer, a C programmer can code this type of program easily.

Borrowing code from last week’s Lesson makes this task easier, as the code to scan the dictionary is already written. What must be added are the steps to prompt for a word and then compare it with each word found in the dictionary file. By hand, such a chore would take hours. But with a computer, comparing words is quick and accurate.

The key is to use the strcmp() function, which compares strings character by character and is therefore case-sensitive. For example, if you search for “kentucky” it won’t match the dictionary’s “Kentucky.” Yes, the word is misspelled if it’s not capitalized. Then again, the purpose of this week’s Lesson isn’t to fuzzy-match words, but rather to match them verbatim.
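To illustrate the case sensitivity, here is a minimal sketch of a case-insensitive comparison. The function same_ignoring_case() is a hypothetical helper, not part of this Lesson's code; it assumes plain ASCII words and uses tolower() from ctype.h.

```c
#include <ctype.h>
#include <string.h>    /* strcmp(), for the comparison in the note below */

/* hypothetical helper: returns 1 when a and b match, ignoring case */
int same_ignoring_case(const char *a, const char *b)
{
    while( *a && *b )
    {
        /* the unsigned char cast keeps tolower() well-defined */
        if( tolower((unsigned char)*a) != tolower((unsigned char)*b) )
            return(0);
        a++;
        b++;
    }
    /* a match requires both strings to end at the same position */
    return( *a=='\0' && *b=='\0' );
}
```

With this helper, same_ignoring_case("kentucky","Kentucky") returns 1, whereas strcmp("kentucky","Kentucky") returns a nonzero value because the first characters differ.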

The following code prompts for input. The input word is compared with each word in the dictionary, tens of thousands of them in a fraction of a second, and any match is reported.

2023_11_04-Lesson.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* this code assumes the following path is valid */
#define DICTIONARY "/usr/share/dict/words"
#define SIZE 32
#define FALSE 0
#define TRUE 1

int main()
{
    FILE *dict;
    int found,x;
    char input[SIZE],word[SIZE],*r;

    /* gather user input */
    printf("Spell check a word: ");
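    if( fgets(input,SIZE,stdin)==NULL )
    {
        /* no input available, such as end-of-file on stdin */
        fprintf(stderr,"No input received\n");
        exit(1);
    }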

    /* open the dictionary */
    dict = fopen(DICTIONARY,"r");
    if( dict==NULL )
    {
        fprintf(stderr,"Unable to open %s\n",DICTIONARY);
        exit(1);
    }

    /* scan for matching word (including newline) */
    found = FALSE;
    /* fgets() returns NULL at end-of-file, which ends the loop */
    while( (r=fgets(word,SIZE,dict))!=NULL )    /* read a word */
    {
        if( strcmp(input,word)==0 )
        {
            found = TRUE;
            break;
        }
    }

    /* remove newline from input */
    for( x=0; x<SIZE && input[x]!='\0'; x++ )    /* stop at the end of the string */
    {
        if( input[x]=='\n' )
        {
            input[x] = '\0';
            break;
        }
    }

    /* output results */
    if(found)
        printf("'%s' is in the dictionary!\n",input);
    else
        printf("I cannot locate '%s' in the dictionary.\n",input);

    /* close */
    fclose(dict);

    return(0);
}

This code sets a few defined constants: DICTIONARY for the word file’s path; SIZE for the size of the input and word buffers; FALSE and TRUE to make the code more readable.

First, user input is gathered, a word to scan for in the dictionary.

Second, the dictionary file is opened, which is code pulled from previous Lessons.

Third, variable found is set to FALSE; the input word isn’t yet found. A while loop then scans the dictionary, reading one entry at a time with fgets(). The strcmp() function compares each entry with the input word. Both strings end with a newline because fgets() retains it, so they can be compared directly. If strcmp() returns zero, the words match: variable found is set to TRUE and the while loop is broken.
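The scanning step can also be sketched as a standalone function. This is only one possible arrangement: the name in_word_list() is hypothetical, and the function assumes the file is already open, that each entry fits within SIZE characters, and that the word passed in still carries its trailing newline.

```c
#include <stdio.h>
#include <string.h>

#define SIZE 32

/* hypothetical helper: returns 1 if word (newline included) is in fp */
int in_word_list(FILE *fp, const char *word)
{
    char entry[SIZE];

    /* fgets() returns NULL at end-of-file, which ends the loop */
    while( fgets(entry,SIZE,fp)!=NULL )
    {
        /* both strings keep their newline, so they compare directly */
        if( strcmp(word,entry)==0 )
            return(1);
    }
    return(0);
}
```

A call such as in_word_list(dict,input) could replace the while loop in main(), with the return value assigned to variable found. Note that a final entry lacking a newline would never match under this approach.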

Fourth, the newline is stripped from the input word. I added this code to clean up the output, which is the final step: to show the found word or report that it’s misspelled.
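As an alternative to the for loop, the newline can be removed in a single statement with the standard strcspn() function. The wrapper name strip_newline() below is hypothetical; this is a sketch of the technique, not the code used in this Lesson.

```c
#include <string.h>

/* hypothetical helper: truncate s at its first newline, if any */
void strip_newline(char *s)
{
    /* strcspn() returns the index of the first '\n', or the
       string's length when no newline is present */
    s[strcspn(s,"\n")] = '\0';
}
```

Calling strip_newline(input) after the dictionary scan has the same effect as the loop. When the string contains no newline, the assignment overwrites the existing null character, so the string is unchanged.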

Here’s a sample run:

Spell check a word: receive
'receive' is in the dictionary!

And:

Spell check a word: seperate
I cannot locate 'seperate' in the dictionary.

Obviously, more coding is required for a professional, real-time spell-checker. I believe such programs also track common incorrect spellings and apply various fuzzy-matching techniques. Now if only such technology were available with my texting app’s autocorrect feature . . .

I have even more fun with the dictionary in next week’s Lesson.
