It’s Parsing Time

A recent puzzle presented itself: I needed to extract a date from a file’s name. The date is part of the name, but my code required that I translate it into a time_t value. It’s an awesome programming puzzle that involves many different tricks.

In short, given the filename 20210130a.txt, the code must generate the time_t value 1611993600. This value represents the number of seconds ticked since the Unix epoch, January 1, 1970. It corresponds to midnight on January 30, 2021 in a UTC-8 time zone, such as Pacific Standard Time, so the result differs in other time zones. To accomplish this feat, the date values (2021, 01, 30) must be extracted from the filename string, converted to integers, then stuffed into a tm structure for final conversion into a time_t value. At least, this approach is what I first considered.

The process of pulling out specific tidbits from a string is called parsing. Especially when the information is well-formatted, parsing helps you cull through the data to find the tidbit you want.

The C library features the strtok() function, which can parse strings into separate chunks. For this Lesson, however, the data bits aren’t delineated by characters but rather by their positions. This consistency is what allows the dates to be cleanly extracted:

yyyymmdd[a].txt

The first four characters yyyy represent a year, a positive value.

The next two characters mm represent a month, 01 through 12.

Following the month are two characters dd that represent a day of the month, 01 up to 31.

These eight digits (always at the start of the filename and always in the same order) can be followed by an optional letter in the range of a through z.

As usual, I build such programs in steps. The first step is to write code that confirms a filename argument is present:

2021_01_30-Lesson-a.c

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char *filename;
    
    /* check for filename argument */
    if( argc<2 )
    {
        /* output error message to standard error */
        fprintf(stderr,"Filename option required\n\n");
        /* leave with exit code 1*/
        exit(1);
    }
    /* assign to pointer for convenience */
    filename = argv[1];

    /* code to confirm that the file exists goes here */

    printf("Filename '%s' specified\n",filename);

    return(0);
}

If the argument count is less than 2 at Line 9, an error message is output to standard error (stderr) and the program quits. Otherwise, a command line option is available. Its address is stored in the argv[1] array element and assigned to the filename pointer. The string is output at Line 21.

The next step is to extract the digits from the string and convert them into numbers. For this task I use two functions, convert() and strtol().

The convert() function extracts a string of a given length (a substring) from another string. This process may seem similar to what the strstr() function does, but convert() extracts a string without searching. Plus, it validates characters to some extent.

The strtol() function is a C library function that converts a string value into a long int.

In next week’s Lesson, I show the code update that adds the convert() function as well as three statements to extract integer values from the filename argument presented.
