A Tally of Unique Words, Part V

The next step in the unique words program is to tally the count of each word. The word list was sorted in last week’s Lesson, which makes the task of counting duplicates easy.

As a quick review, the program’s code has completed these steps:

  1. A text file is opened and read, with its contents stored in a dynamic char buffer creatively named buffer.
  2. The text in buffer is parsed, with double-pointer monster **list referencing each word.
  3. The word list is sorted in a case-insensitive manner.
  4. The sorted list is output.

To count the duplicates (a process that tallies the unique words as well), the code scans the list of sorted words, comparing each word in the list with the next word. When the neighbors match, an inner loop keeps working through the list, counting the matches.

To accomplish this task, the for loop (from Step 4 above) already in the code is updated from this:

for( x=0; x<count; x++ )
{
    printf("%3d:%s\n",x+1,*(list+x));
}

To this:

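dup = 1;
/* run through x<count so that a unique final word is also output */
for( x=0; x<count; x+=dup )
{
    dup = 1;
    /* test x+dup<count first to avoid reading beyond the end of the list */
    while( x+dup<count && strcasecmp(*(list+x),*(list+x+dup))==0 )
    {
        dup++;
    }
    printf("%s (%d)\n",*(list+x),dup);
}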

Variable dup counts the duplicates. It also helps work through the outer loop by skipping over any repeated words: The value of dup is used in the incrementing expression, x+=dup. When dup==1, the word is unique and the loop advances to the next word in the list.

The inner while loop compares the current word in the list, *(list+x), with the next word, *(list+x+dup). When the words match (case-insensitive), the value of dup is incremented, tallying the repeat count, and the while loop continues.

After all repeating words are found, a printf() statement outputs the word and its repeat value.

Here is the first part of the output:

a (2)
all (1)
And (5)
art (1)
as (1)
brag (1)
breathe (1)
buds (1)
But (1)
...

The full code is available on my GitHub page.

The final task is to output the unique and duplicate words. This job could be done now: Flag those words with a dup count of one as unique, with the rest output as duplicates along with their repeat value. Here is an if-else structure to replace the printf() statement in the above code snippet:

if( dup==1 )
    printf("Unique: %s\n",*(list+x));
else
    printf("Duplicate: %s (%d)\n",*(list+x),dup);

The output now looks like this:

Duplicate: a (2)
Unique: all
Duplicate: And (5)
Unique: art
Unique: as
Unique: brag
Unique: breathe
Unique: buds
Unique: But

This result accomplishes the task, but I want to list each group separately: first the unique words, then the duplicates along with their counts. This process continues with next week’s Lesson, the final in this series.
