Using scanf() to Build a String – Part IV

I refer to the process of using special strings to stand in for characters as tokenizing. A token is a character or string that acts as a code. The program translates this code into something else, which allows it to deal with complex items in a simple manner.

Because the scanf() function’s %s conversion treats whitespace characters (space, tab, newline) as separators and discards them, these are the three characters I plan to represent with tokens. To update the code from last week’s Lesson, I add a token() function to convert the special strings (tokens), starting with the word END to terminate input.

The token() function accepts the string read by the scanf() function and compares it with END. Originally I had the function return a single char value. The problem was the overhead required to handle single characters in the main() function. So instead of a single character, the token() function returns a string (char pointer). For the END token, the pointer returned is NULL.

2023_07_29-Lesson-a.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 16

char *token(char *s)
{
    /* test for special strings */
    if( strcmp(s,"END")==0 )
        return(NULL);

    return(s);
}

int main()
{
    char *b,*s;

    /* allocate/initialize buffers */
    b = malloc( SIZE * sizeof(char) );    /* input */
    s = malloc( sizeof(char) );            /* string */
    if( b==NULL || s==NULL )
    {
        fprintf(stderr,"Memory allocation error\n");
        exit(1);
    }
    /* initialize string storage */
    *b = *s = '\0';

    /* fetch input */
    printf("Type some Text: ");
    while(1)
    {
        scanf("%s",b);
        b = token(b);
        if( !b )    /* NULL */
            break;
        /* copy the word */
            /* add two: space and null char */
        s = realloc(s,strlen(s) + strlen(b) + 2);
        if( s==NULL )
        {
            fprintf(stderr,"Reallocation error\n");
            exit(1);
        }
        strcat(s,b);
        strcat(s," ");
    }

    /* output results */
    puts(s);

    return(0);
}

In the main() function, the scanf() function reads a word into buffer b, which is immediately sent to the token() function. Pointer b is then reassigned to the value token() returns, replacing the original: b = token(b);

The token() function itself looks only for the text END. If found, NULL is returned. Otherwise, the input string’s address (held in the parameter s) is passed back to the calling function unchanged. Remember, this update to the code is merely the first step, which is to create the token() function.

After the token() function call, an if statement tests whether NULL is returned: if( !b ), which is TRUE when b is NULL. If so, the loop ends and the string is output.

The program runs the same as the previous version, which is great. The next step is to update it to scan for tokens SP for space, NL for newline, and TB for tab. This update requires revising both the token() and main() functions.

2023_07_29-Lesson-b.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 16

char *token(char *s)
{
    static char space[] = " ";
    static char newline[] = "\n";
    static char tab[] = "\t";

    /* test for special strings */
    if( strcmp(s,"END")==0 )
        return(NULL);
    if( strcmp(s,"SP")==0 )
        return(space);
    if( strcmp(s,"NL")==0 )
        return(newline);
    if( strcmp(s,"TB")==0 )
        return(tab);

    return(s);
}

int main()
{
    char *b,*s;

    /* allocate/initialize buffers */
    b = malloc( SIZE * sizeof(char) );    /* input */
    s = malloc( sizeof(char) );            /* string */
    if( b==NULL || s==NULL )
    {
        fprintf(stderr,"Memory allocation error\n");
        exit(1);
    }
    /* initialize string storage */
    *b = *s = '\0';

    /* fetch input */
    printf("Type some Text: ");
    while(1)
    {
        scanf("%s",b);
        b = token(b);
        if( !b )    /* NULL */
            break;
        /* copy the word */
            /* add one: null char */
        s = realloc(s,strlen(s) + strlen(b) + 1);
        if( s==NULL )
        {
            fprintf(stderr,"Reallocation error\n");
            exit(1);
        }
        strcat(s,b);
    }

    /* output results */
    puts(s);

    return(0);
}

The token() function contains three static char arrays representing the replacement strings for the space, newline, and tab characters. They’re declared static so that their storage persists after the function returns; returning the address of an ordinary local array would leave the caller with a pointer to memory that no longer exists. As with the scan for END, when token SP, NL, or TB is encountered, the matching string is returned.

In the main() function, I’ve made two changes. First, the reallocation of string s no longer adds room for a space, only for the null character. The new size is the length of the existing string (s), plus the length of the returned string (b), plus one:

s = realloc(s,strlen(s) + strlen(b) + 1);

Second, I eliminated the strcat() function that appended a space after each word.

Here’s a sample run:

Type some Text: Hello, SP world! END
Hello, world!

Here’s a more aggressive run:

Type some Text: This SP is SP a SP test NL Hello, SP world! END
This isSPaSPtestHello,NLSPworld!

Ugh. Yes, this botched output means I messed up a pointer thing. I cover what’s going wrong and how to fix it in next week’s Lesson.
