Filename Extractor – Solution

When I see a problem such as finding a filename in a pathname, one of the first things I think of is a regular expression. For this month’s Exercise, however, that’s not the solution I coded.

A regular expression is a text-matching string. It’s a melange of various special characters, such as the * and ? you’re probably familiar with as filename wildcards (though they behave differently inside a regular expression). The variety of characters and the complexity of how they’re used drive computer nerds nuts. Books and cheat sheets abound on regular expressions, and you could argue that some computer programming languages are simply bizarre extensions of regular expressions.

Due to the complexity of regular expressions, I didn’t even bother with that approach for my solution. Instead, I decided to work backwards through a pathname. Here’s my logic:

  • If the string is empty, no filename (or pathname) exists.
  • Anything to the right of the final / (path separator) is a filename.
  • If the / isn’t found, the pathname is the filename.
  • If the / is the last character of the pathname, no filename is available.

To keep my solution consistent, I define the pathname separator character as SEPARATOR. For Unix:

#define SEPARATOR '/'

And DOS, er, Windows:

#define SEPARATOR '\\'

Remember, two backslashes must be specified because \ is the escape character in C; the pair \\ represents a single backslash.
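
If you’d rather not edit the definition by hand, one option is to let the preprocessor choose. This is only a sketch: it assumes your compiler predefines the _WIN32 macro when building for Windows, as the common Windows compilers do, and it isn’t part of my solution.

```c
/* Assumption: the compiler predefines _WIN32 on Windows targets */
#ifdef _WIN32
#define SEPARATOR '\\'   /* backslash, written with its escape */
#else
#define SEPARATOR '/'    /* everything else is treated as Unix-like */
#endif
```

With this in place, the rest of the code refers only to SEPARATOR and never needs to know which system it was built for.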

Here is my solution for the filename() function:

char *filename(char *path)
{
    char *p;
    int len;

/* find the end of the string */
    len = strlen(path);

/* if empty string, return empty string */
    if( len == 0 )
        return("[Empty String]");

/* find the terminating separator */
    p = path+len;
    while( *p != SEPARATOR)
    {
        /* don't go too far! test before stepping back so that
           a separator in the first position is still checked */
        if( p == path )
            return(path);
        p--;
    }
    p++;    /* increment beyond the separator */

/* catch no filename condition */
    if( *p == '\0')
        return("[No filename]");
    else
        return(p);
}

The strlen() function locates the end of the string path. Immediately an if statement compares that length with zero, and if true, the text [Empty String] is returned.

The statement p = path+len sets pointer p to the end of the string. Next, a while loop backtracks through string path until the SEPARATOR character is found. If p backs up to the start of the string without finding a separator, condition p == path, the pathname is the filename, and it’s returned.

After the separator is located, the while loop stops and p is incremented to point at the start of the filename (after the separator character). At this point, if *p=='\0', the SEPARATOR terminates the path, meaning no file is present and the text [No filename] is returned. Otherwise, the location in p is returned, which is the filename string.
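
To see how the four rules play out, here is a small sketch that pairs a copy of the function with a checking helper. The check() function is hypothetical, the Unix separator is assumed, and const has been added to the signature so that string literals can be passed in safely. The copy tests for the start of the string before stepping back, so a separator in the very first position is still examined.

```c
#include <string.h>

#define SEPARATOR '/'    /* Unix separator assumed for this sketch */

/* Sketch of the backwards scan described above */
const char *filename(const char *path)
{
    const char *p;
    size_t len;

    len = strlen(path);
    if( len == 0 )
        return("[Empty String]");

    /* start at the terminating null and walk backwards */
    p = path+len;
    while( *p != SEPARATOR )
    {
        /* reached the start without a separator */
        if( p == path )
            return(path);
        p--;
    }
    p++;    /* step past the separator */

    if( *p == '\0' )
        return("[No filename]");
    return(p);
}

/* Hypothetical helper: returns 1 when filename(path) matches expected */
int check(const char *path, const char *expected)
{
    return( strcmp(filename(path), expected) == 0 );
}
```

Under these assumptions, /home/user/file.txt yields file.txt, a bare file.txt comes back unchanged, /home/user/ yields [No filename], and an empty string yields [Empty String].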

I can think of a few other ways to solve this Exercise, and my solution isn’t the best or most perfect. If your solution works and extracts the filenames properly, great!
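
One of those other ways might lean on the standard library’s strrchr() function, which returns a pointer to the last occurrence of a character in a string, or NULL when the character isn’t found. The following is a sketch rather than my solution: the name filename_strrchr() is made up for illustration, the Unix separator is assumed, and the bracketed return strings match the ones used above.

```c
#include <string.h>

#define SEPARATOR '/'    /* Unix separator assumed for this sketch */

/* Hypothetical alternative: let strrchr() find the final separator */
const char *filename_strrchr(const char *path)
{
    const char *p;

    /* empty string: no filename or pathname */
    if( *path == '\0' )
        return("[Empty String]");

    p = strrchr(path, SEPARATOR);   /* NULL when no separator exists */
    if( p == NULL )
        return(path);               /* the pathname is the filename */

    p++;                            /* step past the separator */
    if( *p == '\0' )
        return("[No filename]");    /* separator ends the pathname */
    return(p);
}
```

The logic follows the same four rules; the library call simply replaces the hand-written backwards loop.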

4 thoughts on “Filename Extractor – Solution”

  1. I have some vague idea there is some way of retrieving (either at runtime or compile time) the separator character for the system the code is compiled/running on.

    Your code could easily be extended to retrieve the filename extension – I think you could probably do it within the same single while loop but I have a bad cold and my brain isn’t working properly at the moment so I might be wrong 🙁

    I hate regexes – whoever said C was a write-only language has obviously never seen even a simple regex. I was once sent a job description for a company that received stock exchange data in various formats and converted it into a common format for the London Stock Exchange. The job was basically writing regexes all day. The Job From Hell!

    Off-topic, but I am glad you don’t use any of the styles that put an opening or closing curly bracket on the same line as something else. To me the curly brackets provide an at-a-glance overview of the structure of a function, but to do so the bracket pairs need to be vertically in line, with each level of indentation one tab to the right.

  2. Here’s a secret: You can open files and work with pathnames the same on both Unix and Windows. Internally, Windows (and even DOS) recognize the slash as a path separator. Still, I define it in my code one way or the other.

    Originally, I wrote C similar to the way they write PHP or JavaScript, with curly bracket, tab, code. Then, when I updated my books a while back, I referred to the original K&R manual and how clear it was. So I’ve used that coding method ever since. It adds whitespace, but also readability.

  3. “You can open files and work with pathnames the same on both Unix and Windows.”

    I’d forgotten that. However, if you are writing a function to accept a path string you need to allow for \ and /.

    You could write general-purpose code like this

    #define NIXSEPARATOR '/'
    #define MSSEPARATOR '\\'
    .
    .
    .
    while( *p != NIXSEPARATOR && *p != MSSEPARATOR)
    .
    .
    .
