Directory Spelunking

Exploring a folder tree — I mean directory tree — is a procedure found in many file and media utilities. From an original directory, you scan the list of files looking for a subdirectory. When it’s found, you open it and recursively continue the scan.

For example, in Unix, you can use the du command to recursively scan directories and discover how many blocks they consume. (A block is traditionally 512 bytes, though some implementations, such as GNU du, default to 1,024-byte units.) In Windows, both the cmd (command prompt) and PowerShell terminals use the TREE command to list directories and subdirectories.

To plumb these depths, you need a recursive function, which goes like this:

  1. Change to a given directory, which is passed to the recursive function as an argument.
  2. Open the current directory for reading.
  3. Scan for any subdirectory entries, ignoring the . and .. entries.
  4. If a subdirectory entry is found, recurse: call the function again with that subdirectory, starting at Step 1.
  5. Change back to the parent directory and continue the scan at Step 3.

Not covered in last week’s Lesson is the chdir() function. Like its shell counterpart, the chdir() function changes to the named directory:

int chdir(const char *path);

The path is the name of a directory, and the . and .. abbreviations can be used. The function returns zero on success and -1 on failure, in which case the errno variable can be checked to determine the specific reason chdir() failed.

The chdir() function is prototyped in the unistd.h header file.
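
To illustrate the return value and errno check described above, here is a minimal sketch. The change_dir() wrapper is a hypothetical name made up for this example; only chdir(), strerror(), and errno come from the standard library.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of the Lesson's code): change to `path`
   and explain the failure, if any. Returns 0 on success, -1 on failure,
   the same convention chdir() itself uses. */
int change_dir(const char *path)
{
    if( chdir(path) == -1 )
    {
        /* chdir() set errno; strerror() converts it to readable text */
        fprintf(stderr,"Unable to change to %s: %s\n",path,strerror(errno));
        return(-1);
    }
    return(0);
}
```

Calling change_dir("..") moves up one level, just as cd .. does at the shell prompt, while a call naming a directory that doesn't exist prints the reason and returns -1.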

Using the code presented in last week’s Lesson as a base, I wrote the dir() function. It requires two arguments, a directory (or pathname) and an integer depth value:

void dir(char *directory,int depth);

This function changes to the named directory and reads its listing, scanning for subdirectories. If found, the function calls itself (recurses). The depth argument helps indent the directories in the output.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/stat.h>

void dir(char *directory,int depth);

int main(int argc, char *argv[])
{
    puts("Finding Directories");

    if(argc==2)
        dir(argv[1],0);
    else
        dir(".",0);

    return(0);
}

void dir(char *directory,int depth)
{
    DIR *folder;
    struct dirent *entry;
    struct stat filestat;

    /* Change to the named directory */
    if(chdir(directory))
    {
        fprintf(stderr,"Error changing to %s\n",directory);
        exit(1);
    }

    /* open the directory */
    folder = opendir(".");
    if(folder == NULL)
    {
        fprintf(stderr,"Unable to read directory %s\n",directory);
        exit(1);
    }

    printf("%*s%s\n",depth*2," ",directory);
    /* Look for a subdirectory */
    while( (entry=readdir(folder)) )
    {
        if( stat(entry->d_name,&filestat)==-1 ) continue;  /* skip entries that can't be examined */
        /* look for only directories */
        if( S_ISDIR(filestat.st_mode) )
        {
            /* skip the . and .. entries */
            if(strcmp(entry->d_name,".")==0 || strcmp(entry->d_name,"..")==0)
                continue;
            /* recurse to the found directory */
            dir(entry->d_name,depth+1);
        }
    }

    chdir("..");
    closedir(folder);
}

In the main() function, you can specify a directory to plumb at the command line; otherwise, the current directory is used by default.

At the start of the dir() function, chdir() changes to the named directory, which is opened at Line 36.

The directory name is printed at Line 43. The depth value is used to indent the name by a given number of spaces. (See this Lesson.)
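
The indentation trick can be tried in isolation. This is a small sketch, with format_indented() being a hypothetical helper written for this example; it uses snprintf() to build the string so the result is easy to inspect. In the %*s placeholder, the * takes the field width from the argument list, and the string is right-justified in that width, padded with spaces.

```c
#include <stdio.h>

/* Hypothetical helper: write `name` into `buf`, indented two spaces
   for each level of depth. Returns whatever snprintf() returns. */
int format_indented(char *buf,size_t size,const char *name,int depth)
{
    /* depth*2 supplies the width for %*s; the empty string is padded
       out to that width with spaces, then the name follows */
    return( snprintf(buf,size,"%*s%s",depth*2,"",name) );
}
```

With a depth of 2, the name src comes out preceded by four spaces. Note that the listing passes " " rather than "", so its output always starts with at least one space, even at depth zero, because the width is only a minimum.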

The while loop scans the directory entries, with help from the if statement at Line 49. At Line 52, the . and .. entries are skipped. Whatever remains is a subdirectory entry, which is explored recursively at Line 55, along with an incremented value for depth.
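
The directory test can also be pulled out into a function of its own. The following is a sketch only; is_directory() is a hypothetical name, and it checks the return value from stat() so that a failed call is never mistaken for a directory.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `name` is a directory, 0 otherwise.
   A failed stat() call (for example, on a dangling symbolic link)
   also returns 0. */
int is_directory(const char *name)
{
    struct stat filestat;

    if( stat(name,&filestat) == -1 )
        return(0);
    /* S_ISDIR() tests the file type bits in st_mode */
    return( S_ISDIR(filestat.st_mode) ? 1 : 0 );
}
```

Inside the while loop, a call such as is_directory(entry->d_name) could stand in for the stat() call and S_ISDIR test.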

As the recursive function rewinds, the chdir() function at Line 59 returns to the parent directory.

This code does have some flaws. For example, it can fail when reading the root directory, which has no real parent for .. to refer to. On some filesystems, it loses track of the parent directory after following a symbolic link, because .. then leads to the parent of the link's target rather than back to the directory being scanned. And if it encounters a directory it lacks permission to open, the program halts.
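
As a rough illustration of how these flaws might be softened, here is one possible sketch. It is not necessarily the approach next week's Lesson takes. The dir_safe() function is a hypothetical variant: it assumes the current directory can be opened for reading, remembers it with open() and returns with fchdir() instead of relying on .., skips symbolic links by using lstat(), and skips a directory it can't enter rather than halting. It returns a count of the directories visited so the result can be checked.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical variant of dir(): returns the number of directories
   visited, or -1 if `directory` could not be scanned at all. */
int dir_safe(const char *directory,int depth)
{
    DIR *folder;
    struct dirent *entry;
    struct stat filestat;
    int parent,count,sub;

    /* remember the starting point so ".." is never needed */
    parent = open(".",O_RDONLY);
    if( parent == -1 )
        return(-1);

    if( chdir(directory) == -1 )
    {
        close(parent);
        return(-1);             /* can't enter: skip, don't halt */
    }

    folder = opendir(".");
    if( folder == NULL )
    {
        /* entered but can't read: go back and report failure */
        if( fchdir(parent) == -1 )
            perror("fchdir");
        close(parent);
        return(-1);
    }

    printf("%*s%s\n",depth*2,"",directory);
    count = 1;
    while( (entry=readdir(folder)) )
    {
        if( strcmp(entry->d_name,".")==0 || strcmp(entry->d_name,"..")==0 )
            continue;
        /* lstat() doesn't follow symbolic links, so links are skipped */
        if( lstat(entry->d_name,&filestat)==-1 || !S_ISDIR(filestat.st_mode) )
            continue;
        sub = dir_safe(entry->d_name,depth+1);
        if( sub > 0 )
            count += sub;
    }
    closedir(folder);

    /* return to the saved starting directory */
    if( fchdir(parent) == -1 )
    {
        close(parent);
        return(-1);
    }
    close(parent);
    return(count);
}
```

Each level of recursion holds a file descriptor open in this sketch, so a very deep tree could exhaust the available descriptors; that trade-off is accepted here to keep the example short.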

I address the symbolic link issue in next week’s Lesson, which improves the code to better track parent directories.
