Parsing Words with the strspn() Function

I’ve dabbled in the topic of parsing words from a string several times on this blog: Slicing Words from a String, Parse and Count Words in a String, and more. I just can’t get enough! In fact, this Lesson picks up the topic again, continuing last week’s discussion of the strspn() and strcspn() functions.

Yes, it’s possible to use the strspn() function to slice words from a string. This function, which is probably pronounced “string span,” scans one string for as long as its characters appear in a second string, then returns the length of that initial stretch. If the second string contains the letters of the alphabet, both upper- and lowercase (and forgetting about contractions), the strspn() function handily returns the offset of the first non-letter in the first string, which marks the end of a word.

2021_12_04-Lesson.c

#include <stdio.h>
#include <string.h>

int main()
{
    const char *a = "It was a dark and stormy night";
    const char *b =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        ;
    size_t r = 0;

    do
    {
        printf("%s\n",a);
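        r = strspn(a,b);    /* length of the leading run of letters */
        a += r+1;           /* skip that word plus one space */
    }
    while(r<strlen(a));     /* a has already advanced at this point */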
    printf("%s\n",a);

    return(0);
}

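At Lines 6 and 7, I declare const char *a and *b as pointers to string literals. If you use this construction, remember the const qualifier. A string literal may be stored in read-only memory, and attempting to modify it is undefined behavior; the const keyword makes the compiler flag any such attempt. Note that const applies to the characters, not to the pointer: a itself can still change, which the loop relies on when it advances a through the string.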

String b is declared across two lines as two separate string literals. This is a valid construction in C: the compiler joins adjacent string literals into a single string. Refer to this Lesson if confusion overwhelms you.

The strspn() function appears in a do-while loop starting at Line 13. First, at Line 15, whatever remains of the string is output. Next, at Line 16, the strspn() function returns the offset of the first non-alpha character in string a, which is also the length of the current word. This offset, r, is added to pointer a at Line 17, plus one to skip over the space. The assumption here is that words are separated by exactly one space, a program flaw, but moving on.

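The loop repeats as long as variable r (the offset) is less than the length of string a: while(r<strlen(a)). Because a has already been advanced at Line 17, this test compares the length of the word just skipped with the length of the text that remains. It works for the sample string, but it’s fragile: for a string such as "a bcd", the final pass moves a beyond the string’s terminating null character, after which strlen() reads memory it shouldn’t. Also, because the loop exits before outputting the last remainder, an extra printf() statement is required at Line 20 to output the final word in the string.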

Here is a sample run:

It was a dark and stormy night
was a dark and stormy night
a dark and stormy night
dark and stormy night
and stormy night
stormy night
night

The code isn’t perfect, of course. As I mentioned earlier, it doesn’t account for contractions (the apostrophe in a word) or for more than a single space between words. And as the output shows, it doesn’t truly parse the words; it merely steps through the string word by word.

You could improve the code, but I’d like to move on to a larger project presented in a series of Lessons: parsing and counting words, with the eventual goal of extracting unique words from a chunk of text. This process begins with next week’s Lesson.
