Slicing Words from a String

The fall-through effect of the switch-case structure can be extremely useful, as demonstrated in last week’s Lesson. For slicing words from a string, you can easily match multiple word separators, saving you from coding complex statements or weirdo if-else structures. But a problem arises when several word separators appear in a row.

The issue is knowing when the separators stop and a new word starts. To resolve the problem efficiently, you can use a ctype function, isalpha(), which is what I did in the following code.

#include <stdio.h>
#include <ctype.h>

int main()
{
    char string[] = "Hello there strange, little planet\n";
    int x = 0;

    while( string[x] )
    {
        switch( string[x] )
        {
            case ' ':
            case '.':
            case ',':
            case '!':
            case '?':
            case ';':
            case ':':
            case '\n':
                putchar('\n');
                if( string[x+1] == '\0' )
                    break;
                if( !isalpha(string[x+1]) )
                    x++;
                break;
            default:
                putchar( string[x] );
        }
        x++;
    }

    return(0);
}

After the list of case separator character comparisons, the set of statements starting at Line 21 handles the separator and examines the next character in the string.

First, the newline is output at Line 21.

Second, the code checks the next character in the string, string[x+1], to confirm that it’s not the null character, marking the end of the string. If it is, the break statement exits the switch-case structure.

Finally, the isalpha() test at Line 24 checks the next character. If the character isn’t a letter of the alphabet, variable x is incremented. This solution skips over two separator characters in a row, such as ", " (comma space) or ". " (period space).

Here’s a sample run:

Hello
there
strange
little
planet

And, naturally, this code isn’t without its problems.

If you change the string to "Hello there, strange... little planet\n", here’s the output:

Hello
there
strange

little
planet

The code cannot process more than two word separators in a row; it’s stuck looking at only the next character, string[x+1]. A fix isn’t that difficult, as the following modification demonstrates.

#include <stdio.h>
#include <ctype.h>

int main()
{
    char string[] = "Hello there, strange... little planet\n";
    int x,y;

    x = 0;
    while( string[x] )
    {
        switch( string[x] )
        {
            case ' ':
            case '.':
            case ',':
            case '!':
            case '?':
            case ';':
            case ':':
            case '\n':
                putchar('\n');
                y = 1;
                while(string[x+y])
                {
                    if( !isalpha(string[x+y]) )
                        y++;
                    else
                        break;
                }
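                x+=y;
                continue;   /* resume the outer loop at the new position; avoids stepping past the end of the string */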
            default:
                putchar( string[x] );
        }
        x++;
    }

    return(0);
}

A new variable, y, is needed. It’s initialized to 1 at Line 23, then used in the while loop starting at Line 24.

Within the while loop, subsequent characters in the string are applied to the isalpha() test:

if( !isalpha(string[x+y]) )

If this test passes, meaning the next character in the string isn’t a letter of the alphabet, variable y is incremented and the loop repeats. Otherwise, the else condition takes over and the loop breaks.

When an alphabetic letter is found, or the end of the string is reached, the value of variable y is added to the value of variable x at Line 31. The rest of the string is then processed.

Here’s sample output:

Hello
there
strange
little
planet

Hang on to the second example from this Lesson because it’s used as the base for this month’s Exercise.
