A Tally of Unique Words, Part III

From last week’s Lesson, the text in a buffer is parsed, creating pointers to each word in the string. Alas, the addresses of these words (the pointers) aren’t saved, which is stupid. To handle the job, and to keep the Unique Words project moving forward, a dynamic array of pointers must be allocated.

The C language doesn’t let you reallocate an array; you’re stuck with the size chosen at build time. So to handle an array of unknown size, you must use pointers and dynamically allocate storage. Discussing this topic often sends C programmers into a tizzy.

For my unique words project, the number of words in the buffer is unknown. Back when I tried to avoid pointers, I would just allocate a monstrous array to handle all potential pointers:

char *words[1000];

The size is best-guess determined to handle (hopefully) anything. This approach may work. But if the quantity is too low, you’ve wasted memory. If the quantity isn’t big enough, the program crashes. Using pointers is the only way to keep track of an unknown number of values and dynamically allocate storage.

The data type required to store pointers is one of these monsters:

char **list;

Ugh.

An initial size must be set for the allocation. I use the value 100, which is assigned to int constant size:

const int size = 100;

Here are the statements to initially allocate storage for 100 char pointers in the list memory chunk:

list = malloc( sizeof(char *) * size );
if( list==NULL )
{
    fprintf(stderr,"Memory allocation error\n");
    exit(1);
}

To store the pointers, the parsed words, the code’s while loop is modified. The pointer returned from the strtok() function is saved at an offset within the list buffer. Variable count, which already exists in the code, provides the offset:

count = 0;
word = strtok(buffer,separators);
while( word )
{
    *(list+count) = word;
    word = strtok(NULL,separators);
    count++;
}

As words are parsed from buffer, the word pointer is saved into the list dynamic array. To deal with overflow — because the list buffer initially stores only 100 items — the value of count is compared with constant size:

count%size

When this expression returns zero, the list buffer’s number of items is evenly-divided by 100: 100, 200, 300, and so on. Because the loop is still spinning, 100 more items must be added to storage. The realloc() function does the job:

list = realloc(list,sizeof(char *)*(count+size));

Along with the error testing that follows the reallocation, here is the full while loop:

count = 0;
word = strtok(buffer,separators);
while( word )
{
    *(list+count) = word;
    word = strtok(NULL,separators);
    count++;
    if( count%size==0 )
    {
        list = realloc(list,sizeof(char *)*(count+size));
        if( list==NULL )
        {
            fprintf(stderr,"Unable to reallocate memory\n");
            exit(1);
        }
    }
}

The text output function is removed from the while loop because I want to prove that the word pointers are stored in the list buffer work. Therefore, a for loop now outputs the word list. These updates are found in the source code file on GitHub.

The code from this week’s Lesson has the same output as shown last week. The difference is that the words are all indexed and stored in a buffer. The next step is to find the unique words and duplicates. This topic is covered in next week’s Lesson.

Leave a Reply