Understanding the Glob

From the history of the Unix operating system, glob is the term used for wildcard matching in filenames. It’s short for global, which to me means that two extra bytes of storage (for 'a' and 'l') were important back in the day.

The two common glob characters are * to match any run of characters in a filename (including none), and ? to match exactly one character. So *.c matches all C language source code files, but only when your code properly interprets the glob character input. Otherwise, as shown in last week’s Lesson, only the first matching filename in a directory is returned. (An explanation of this effect is forthcoming.)

Wildcards are always active in Windows. In Linux/Unix, the glob feature must be active for the wildcards to match filename characters. In bash and other shells, the feature is on by default. Glob functionality is disabled by activating the noglob setting. To do so, use the set command:

set -o noglob

No feedback is generated, but after issuing the above command any wildcards are interpreted literally. For example:

$ ls *.c
ls: *.c: No such file or directory

To reactivate glob, use this command:

set +o noglob

(You can review all shell settings by typing the set -o command.)

In your C code, the glob() function helps evaluate filename input that includes the global wildcard characters. Here is the man page format for the glob() function, which requires that the glob.h header file be included:

int glob(const char * restrict pattern, int flags, int (*errfunc)(const char *epath, int errno), glob_t * restrict pglob);

The four arguments are:

pattern, which is the pathname/wildcard pattern to match, such as *.c.

flags, which is a set of options to modify the function’s behavior. Defined constants set the options, which can be combined with the bitwise OR operator (|). Plenty of options are available as documented on the man page.

errfunc is an error-handling function that glob() calls when it can't open or read a directory it needs to search. It can be set to NULL if this concept overwhelms you.

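The final argument pglob is a pointer to a glob_t structure, which glob() fills with useful information about the matching files: member gl_pathc holds the number of matches, and gl_pathv points to a NULL-terminated array of the matching pathnames (an array of strings, not a true linked list).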

The glob() function returns zero upon success. Otherwise it returns an error code, which I recommend testing against a slate of defined constants, such as GLOB_NOMATCH when no matching files are found. Here is sample code:

2021_06_05-Lesson.c

#include <stdio.h>
#include <stdlib.h>
#include <glob.h>

int main()
{
    char **found;
    glob_t gstruct;
    int r;

    r = glob("*.c", GLOB_ERR , NULL, &gstruct);
    /* check for errors */
    if( r!=0 )
    {
        if( r==GLOB_NOMATCH )
            fprintf(stderr,"No matches\n");
        else
            fprintf(stderr,"Some kinda glob error\n");
        exit(1);
    }
    
    /* success, output found filenames */
    printf("Found %zu filename matches\n",gstruct.gl_pathc);
    found = gstruct.gl_pathv;
    while(*found)
    {
        printf("%s\n",*found);
        found++;
    }

    /* release the memory glob() allocated */
    globfree(&gstruct);

    return(0);
}

The glob() function is called at Line 11 with the wildcard argument *.c. Errors are handled at Line 13. When an error occurs (r!=0), a second check is done at Line 15 with the defined constant GLOB_NOMATCH. This condition reflects when no files match the wildcard given, and an appropriate message is output.

Upon success, the number of matching files held in the gl_pathc member of structure gstruct is output at Line 23. Double-pointer found is assigned the address in gstruct.gl_pathv, which is the base of a NULL-terminated array of strings (it's walked like a linked list, but it's really an array of pointers). A while loop processes and outputs the names. Here is sample output:

Found 4 filename matches
cowsbulls.c
fun.c
logs.c
mem_binary.c

In next week’s Lesson, I return to the original question of how to handle wildcards in command line input.

2 thoughts on “Understanding the Glob”

  1. Can you use a regex as a pattern? (Not that I’m wildly enthusiastic to use regexes of course!)

    Why is it called global? To me global implies variable scope, as opposed to local or block scope and doesn’t seem to have anything to do with wildcards.

    I’m surprised you get the results back as a linked list. A post on sorting it by, for example, modification date would be interesting. If you’d asked me 25 years ago I would probably have known how to sort a linked list.

    Your new code format looks nice. Orange on black reminds me of the ICL terminal I used several decades ago. Is it custom WordPress CSS? If so then if you ever change your WordPress theme then you’ll lose it which is REALLY ANNOYING. Shouting capitals justified IMHO 🙂

  2. In a later Lesson I explore the fnmatch() function, which uses the * and ? wildcards like a regular expression. To my knowledge, however, you must add a regex library to go full nuts. I do agree that regular expressions are bonkers.

    My guess is that “glob” came about well before regular expressions were a thing. It might have been the inspiration. Remember that the ‘grep’ utility is an acronym for “global regular expression print.”

    Dear lord! Sorting a linked list? Possible, but messy. I’ll probably do it soon…

    My terminal window is orange on black, which I stole from an ancient Amdek monitor I once owned. It was amber – beautiful! I miss it. The change was just two lines in the style CSS. They allow you to add your own styles, which is what I did. But you’re right, mess with their stuff and you lose everything. It’s scary.
