Non-Identical Yet Very Similar

Computers are notoriously obedient and serious. They’re exact. If you give a computer inaccurate instructions or bad data, it does its job without question. But the real world isn’t binary, and it’s often necessary to add some forgiveness to your code.

As an example, computers are good at performing searches and comparisons. The desired results appear only when the matches are perfect. The problem is that items in the real world can often be close enough that a human would match them where the diligent computer finds no match.

As a case in point, consider the following two integer arrays:

int target[COUNT] = { 24, 28, 32, 45, 50, 66, 67, 70, 80, 95 };
int sample[COUNT] = { 26, 26, 30, 42, 50, 61, 67, 75, 85, 99 };

At first glance, you see that the two arrays contain different sets of values. Additional examination shows that the values in each array never decrease from one element to the next and, further, they’re all positive integers less than 100. Other than that, there’s nothing interesting about the two sets of values, unless you graph them both, as shown in Figure 1.

Figure 1. The two data sets graphed as lines.

When the data is shown visually, you can quickly see that the two arrays, target[] and sample[], are pretty much identical. If you were conducting a survey or comparing test results, you would say that there’s really little difference between the two; they both graph what is essentially the same line.
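To put a number on "pretty much identical," here is a minimal sketch that finds the largest element-by-element gap between two arrays. The function name max_gap() is hypothetical, made up for this illustration; it isn't part of the sample program that follows.

```c
#include <stdlib.h>     /* abs() */

/* hypothetical helper: return the largest gap between
   corresponding elements of arrays a[] and b[] */
int max_gap(const int *a, const int *b, int count)
{
    int x,gap,largest;

    largest = 0;
    for(x=0; x<count; x++)
    {
        gap = abs( a[x] - b[x] );   /* distance between the pair */
        if( gap > largest )
            largest = gap;
    }

    return(largest);
}
```

Calling max_gap(target,sample,COUNT) on the two arrays shown earlier returns 5: no pair of corresponding values is more than five apart, which is why the graphed lines nearly overlap.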

So how do you convince the computer that they’re similar?

The answer is what I call fuzzy matching. You code a program so that it doesn’t just compare values one-to-one. Instead, you factor in some slop, plus you provide forgiveness for values way outside the range. The amount of fuzz to add is up to you, but before showing how that works, the following sample program compares the two arrays in a traditional, discrete, and unsympathetic way:

#include <stdio.h>

#define COUNT 10
#define TRUE 1
#define FALSE 0

int main()
{
    int target[COUNT] = { 24, 28, 32, 45, 50,
                          66, 67, 70, 80, 95 };
    int sample[COUNT] = { 26, 26, 30, 42, 50,
                          61, 67, 75, 85, 99 };
    int x,match;

    match = TRUE;       /* initialize match */

    /* compare arrays */
    for(x=0; x<COUNT; x++)
    {
        if( target[x] != sample[x] )
        {
            match = FALSE;
            break;
        }
    }

    /* display results */
    if(match)
        printf("The arrays are identical\n");
    else
        printf("The arrays do not match\n");

    return(0);
}

Arrays target[] and sample[] are declared at Lines 9 and 11.

The for loop at Line 18 compares the arrays element by element. When the if condition at Line 20 is true, meaning the two elements differ, variable match is set to FALSE at Line 22 and the loop is broken (Line 23).

The if-else structure at Line 28 displays the comparison results based on the value of variable match.

Here is a sample run:

The arrays do not match

The program is correct: The arrays do not match. If the code is plotting a spacecraft’s course to Jupiter, such results would be a good thing. Yet, if it’s comparing the two lines shown in Figure 1, the results aren’t that helpful.

To get the results you want, you must add some fuzz to the comparison formula. I’ll demonstrate how to do so in next week’s Lesson.
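As a rough preview, here is a minimal sketch of one way such fuzz might be coded. The function name fuzzy_match() and its tolerance argument are assumptions made for illustration, and not necessarily the approach next week's Lesson takes. The sketch handles only the "slop" part of the idea; it makes no allowance for the occasional value way outside the range.

```c
#include <stdlib.h>     /* abs() */

#define TRUE 1
#define FALSE 0

/* hypothetical function: the arrays "match" when every pair of
   corresponding elements is no more than tolerance apart */
int fuzzy_match(const int *a, const int *b, int count, int tolerance)
{
    int x;

    for(x=0; x<count; x++)
    {
        /* one pair too far apart spoils the match */
        if( abs( a[x] - b[x] ) > tolerance )
            return(FALSE);
    }

    return(TRUE);
}
```

With the arrays from this Lesson, fuzzy_match(target,sample,COUNT,5) returns TRUE because the widest gap between any pair is 5. Tighten the tolerance to 4 and the function returns FALSE. A tolerance of 0 makes it behave like the strict comparison in the sample program.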
