Parsing and Converting

The goal stated in last week’s Lesson is to convert a date formatted in a filename string into a time_t value. The filename string must be scanned for the expected year, month, and day values. This process involves a custom function, convert(), as well as the strtol() function to translate strings of digits into long int values.

The convert() function references a string and a length. It pulls the given number of characters from the string, creating a substring for further manipulation:

char *convert( char *s, int size )

Argument s references the string; int variable size sets the substring’s length. Characters are copied one at a time from s to a static char buffer[]. As static storage, the buffer’s contents aren’t discarded when the function terminates, which is why the function can safely return the buffer’s address.

Within the convert() function, a for loop processes each character in the string according to the size value. An if test checks for a period (separating the filename from the extension) and the null character. If either is encountered, the program terminates, as the filename string is most likely improperly formed. Otherwise, the character is set into the buffer:

buffer[x] = c;

After the for loop ends, the buffer[] string is capped with a null character terminator, buffer[x] = '\0'. The address of buffer[] is returned and used in the strtol() function to generate a long int value in the main() function. For example:

month = strtol(convert(filename+4,2),NULL,10);

Above, a substring two characters long is extracted starting at the fifth character of filename (filename+4) and returned as its own string. The new string is used immediately in the strtol() function to obtain a long int value. This value is stored in the month variable.

Here is the full code:

2021_02_06-Lesson.c

#include <stdio.h>
#include <stdlib.h>

/* copy and convert the digits */
char *convert( char *s, int size )
{
    int x;
    static char buffer[5];
    char c;

    /* avoid buffer overflow */
    if( size > 4 )
    {
        fprintf(stderr,"Buffer overflow: %d\n",size);
        exit(1);
    }

    /* process the given number of characters */
    for( x=0; x<size; x++ )
    {
        c = *(s+x);
        if( c=='.' || c=='\0' )
        {
            fprintf(stderr,"Malformed filename\n");
            exit(2);
        }
        buffer[x] = c;
    }
    buffer[x] = '\0';

    return(buffer);
}

int main(int argc, char *argv[])
{
    char *filename;
    int year, month, day;
    
    /* check for filename argument */
    if( argc<2 )
    {
        /* output error message to standard error */
        fprintf(stderr,"Filename option required\n\n");
        /* leave with exit code 1*/
        exit(1);
    }
    /* assign to pointer for convenience */
    filename = argv[1];

    /* code to confirm that the file exists goes here */
    /* ... */
    
    /* extract integers. */
    year = strtol(convert(filename+0,4),NULL,10);
    month = strtol(convert(filename+4,2),NULL,10);
    day = strtol(convert(filename+6,2),NULL,10);

    /* output results */
    printf("%4d %2d %2d\n",year,month,day);

    return(0);
}

This code assumes the filename to be in the proper format; no checking is done beyond confirming that an argument is present (the argc test at the top of the main() function). Here is the output generated when using the filename 20210115.txt:

2021  1 15

And for the filename 2021okay.txt:

2021  0  0

The convert() function doesn’t validate its input beyond rejecting a period or a premature end of the string. When the strtol() function finds no leading digits, as with the substrings "ok" and "ay", it returns zero, as shown in the above output. This condition could be tested for later in the code, though my feeling is that zero is still a valid number.

In next week’s Lesson, I add the time functions required to convert the year, month, and day int values into a time_t value.

2 thoughts on “Parsing and Converting”

  1. I think it’s possible to validate and extract values from the filename string using Regex. I have never used Regex in C and I’m not sure how to go about it but there must be a library somewhere.

  2. I see that GNU has a library. Often in these situations, it’s a question of whether you want to use a library (which means it’s a dependency in the release) or just do a quick-and-dirty. I generally opt for the latter.
