Roll Your Own String Search Function

How do you search for one string within another string? Brute force! That means, you size up each character one at a time until you find the perfect match.

I’m certain other search methods and algorithms exist beyond the brute force method. In fact, it may not be brute force as much as it’s simply searching for something the way humans do: One letter at a time.

For example, to find yst in the word haystack you start by comparing y to h. No match.

Compare y to a. No match.

Compare y to y. Match! So compare the next two letters: s to s. Match! And you keep on going until the entire word is found — or until the end of the search string.

The following code demonstrates this process. The search string is pointer variable haystack. The string to find is pointer variable needle:

#include <stdio.h>

int main()
{
    char *haystack = "Was this the face that launch'd a thousand ships";
    char *needle = "face";
    char *base,*found;
    int i,offset;

    printf("Finding '%s' in:\n%s\n\n",needle,haystack);

    /* retain original pointer locations */
    base = haystack;
    found = NULL;       /* initialize */

    /* find the string */
    while(*base)
    {
        if(*base == *needle)
        {
            i = 0;
            while(*(base+i) == *(needle+i))
            {
                i++;
                if(*(needle+i) == '\0')
                    found = base;
            }
        }
        base++;
        if(found != NULL)
            break;
    }

    if( found == NULL)
        puts("String not found");
    else
    {
        offset = (int)found - (int)haystack + 1;
        printf("Found at offset %d.\n",offset);
    }

    return(0);
}

The central part of this code is a simple while loop, which starts at Line 17. Stripped of the text-searching statements, here’s the loop:

while(*base)
{
    base++;
}

So as long as the value *base isn’t the \0 character (the end of the string), the loop keeps spinning, processing the string one character at a time.

On the loop’s journey, the if condition at Line 19 tests whether two characters of both strings match. If so, a second search begins. Otherwise, the base pointer (the string to search) is incremented, and the loop compares the next two characters.

When two characters match, the actual comparison begins. Index variable i is initialized at Line 21.

A nested while loop at Line 22 compares subsequent characters by incrementing though both strings (without changing the base address). Characters are compared. The loop stops at a mismatch due to the while statement’s condition.

The if condition at Line 25 determines whether the full match was found: If the null terminator at the end of string needle has been reached, the found pointer is set equal to the location of the matching text.

The while loop at Line 17 has a second terminating condition, which is triggered when the found pointer isn’t NULL, i.e., when the matching text has been found.

By the way, you could re-write Line 30 to read:

if(found)
    break;

And line 34 to read:

if(!found)

Both statements read better that way, but the concepts may not be as clearly presented as shown in the code.

Here’s the output:

Finding 'face' in:
Was this the face that launch'd a thousand ships

Found at offset 14.

Test it a few times by replacing the haystack and needle strings with text of your own concoction.