Recursively Plowing a Directory Tree

The code to recursively plow a directory tree presented in last week’s Lesson could be improved upon. Primarily, it relies upon the .. shortcut to jump back to the parent directory. This method works only some of the time.

A problem appears in file structures where symbolic links map directories. In such a configuration, .. refers to the parent of the link's target, not to the directory the program came from, so the program can get lost or find .. reported as invalid. The solution is to obtain the full pathname, not only for the current directory, but for its parent as well.

To get a directory’s full pathname, use the getcwd() function:

char *getcwd(char *buf,size_t size);

buf is the buffer where the function stores the full pathname of the current directory. size is the size of the buffer in bytes.

The getcwd() function returns a pointer to the buffer. If the function fails, NULL is returned. The function is declared in the unistd.h header file.
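Here's a minimal sketch of calling getcwd() and testing its return value. The current_directory() name is a hypothetical helper made up for illustration; it isn't part of the Lesson's code.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: store the current directory's full pathname
   in buf. Returns 0 on success, -1 if getcwd() fails */
int current_directory(char *buf, size_t size)
{
    if( getcwd(buf,size)==NULL )
    {
        /* for example, the buffer is too small for the pathname */
        perror("getcwd");
        return(-1);
    }
    return(0);
}
```

A caller might declare char path[PATH_MAX]; and then call current_directory(path,PATH_MAX) before using the string.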

Below I’ve used the main() function from the code presented in last week’s Lesson as a base. It now obtains the pathname of the current directory, or directory supplied as a command line argument, plus the parent directory name. It then calls the dir() function with those two arguments.

int main(int argc, char *argv[])
{
    char current[PATH_MAX],parent[PATH_MAX];

    puts("Finding Directories");

    /* Change to or set current directory */
    if(argc==2)
    {
        /* copy no more than the buffer holds */
        strncpy(current,argv[1],PATH_MAX-1);
        current[PATH_MAX-1]='\0';
        chdir(current);
    }
    else
    {
        getcwd(current,PATH_MAX);
    }

    /* fetch parent directory */
    chdir("..");
    getcwd(parent,PATH_MAX);
    /* and change back */
    chdir(current);
    /* go! */
    dir(current,parent);

    return(0);
}

The PATH_MAX constant is defined by including the limits.h header file. Even so, I added some preprocessor directives to ensure that PATH_MAX has a value:

#ifndef PATH_MAX
#define PATH_MAX 1024
#endif

Above, if PATH_MAX isn’t already defined, the #define directive creates it and assigns the value 1024.

The dir() function is modified in this code to help track the current and parent directory names: dir(char *directory,char *parent)

This new function is nearly identical to the original dir() function, minus the depth argument. The only change is that getcwd() obtains the current directory’s full pathname, which is output instead of the short name and depth indent used in last week’s code.
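To give a feel for how such a function might look, here's a rough sketch. It is not the Lesson's actual dir() function: the void return type, the use of lstat() to skip symbolic links (so that a link pointing back up the tree can't cause endless recursion), and the error handling are all assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <unistd.h>
#include <limits.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 1024
#endif

/* Sketch only, not the Lesson's code: output the full pathname of
   directory and of every directory below it */
void dir(char *directory, char *parent)
{
    DIR *dp;
    struct dirent *entry;
    struct stat fs;
    char here[PATH_MAX];

    /* move into the directory; give up quietly if that fails */
    if( chdir(directory)==-1 )
        return;

    /* obtain and output the full pathname */
    if( getcwd(here,PATH_MAX)!=NULL )
    {
        puts(here);
        dp = opendir(".");
        if( dp!=NULL )
        {
            while( (entry=readdir(dp))!=NULL )
            {
                /* skip the . and .. entries */
                if( strcmp(entry->d_name,".")==0 ||
                    strcmp(entry->d_name,"..")==0 )
                    continue;
                /* assumption: lstat() is used so links aren't followed */
                if( lstat(entry->d_name,&fs)==0 && S_ISDIR(fs.st_mode) )
                    dir(entry->d_name,here);
            }
            closedir(dp);
        }
    }

    /* return by full pathname instead of relying on .. */
    if( chdir(parent)==-1 )
        perror(parent);
}
```

Each recursive call passes the full pathname in here as the parent argument, so the return trip never depends on the .. shortcut.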

The entire code is available separately, as it’s a bit long to list in this post. Here’s sample output:

Finding Directories
/Users/dang/prog/c/blog
/Users/dang/prog/c/blog/sto

This new code still has a few issues.

First, if you try to run it on the root directory, the program can crash because the root has no parent. Second, if you use the .. shortcut as a command line argument, the program gets lost and displays more information than you intended. Both of these issues can be addressed in the code with a few tests, which I’ll leave up to you to add.
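As a starting point, here's one possible sketch of those tests. The set_directories() function is a hypothetical helper, not something from the Lesson's code, and it assumes a POSIX-style system where the root directory is named / and can be treated as its own parent. It changes to the argument first and then calls getcwd(), so a shortcut such as .. is expanded into a full pathname before anything else happens.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fill current and parent with full pathnames.
   arg is the command line argument, or NULL to use the working
   directory. Both buffers must hold size bytes.
   Returns 0 on success, -1 on failure */
int set_directories(const char *arg, char *current, char *parent, size_t size)
{
    /* changing first lets getcwd() expand . and .. */
    if( arg!=NULL && chdir(arg)==-1 )
        return(-1);
    if( getcwd(current,size)==NULL )
        return(-1);

    /* assumption: the root is "/" and serves as its own parent */
    if( strcmp(current,"/")==0 )
    {
        strcpy(parent,current);
        return(0);
    }

    /* fetch the parent, then change back by full pathname */
    if( chdir("..")==-1 || getcwd(parent,size)==NULL )
        return(-1);
    if( chdir(current)==-1 )
        return(-1);

    return(0);
}
```

In main(), a call such as set_directories(argc==2 ? argv[1] : NULL,current,parent,PATH_MAX) could replace the argument handling, with the program quitting when -1 is returned.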
