Fuzzy Matching Tolerance

Comparing values is a discrete process; an if comparison is absolute. The result is either true or false. Never is the result perhaps or somewhat — unless you add some fudge to the comparison.

What is fudge? Well, it’s delicious. When it comes to coding, however, fudge allows a smidgen of wiggle room just in case two values don’t compare exactly.

The issue, of course, is how much fudge to add.

For programming purposes, I refer to the fudge value as variation. It can be calculated discretely or as a percentage. Choosing one over the other depends on the type of data you’re comparing.

A discrete amount of variation is a specific value. For example, you could add a tolerance of 5 points to meter readings to do a fuzzy match. Here’s how that might look:

if( abs( a - b ) > 5 )
    match = FALSE;
else
    match = TRUE;

The abs() function (declared in stdlib.h) returns the absolute value of an integer, which is never negative. So the expression abs( a - b ) yields the distance between a and b, no matter which one is larger. If that difference is greater than 5 (the fudge, or variation), the fuzzy match fails. Otherwise, the items match, more or less.

The following code modifies the example from last week’s Lesson. A new variable, variation, is added. It’s set to 5 to allow for a tolerance of 5 points difference between the two array elements.

#include <stdio.h>
#include <stdlib.h>

#define COUNT 10
#define TRUE 1
#define FALSE 0

int main()
{
    int target[COUNT] = { 24, 28, 32, 45, 50,
                          66, 67, 70, 80, 95 };
    int sample[COUNT] = { 26, 26, 30, 42, 50,
                          61, 67, 75, 85, 99 };
    int x,match,variation;

    match = TRUE;       /* initialize match */
    variation = 5;      /* set tolerance */

    /* compare arrays */
    for(x=0; x<COUNT; x++)
    {
        if( abs(target[x]-sample[x]) > variation )
        {
            match = FALSE;
            break;
        }
    }

    /* display results */
    if(match)
        printf("The arrays match\n");
    else
        printf("The arrays do not match\n");

    return(0);
}

This code is pretty much identical (in a fuzzy way) to the original presented last week. The difference is the if comparison at Line 22, inside the for loop, where the variation value allows forgiveness in the comparison. Here’s a sample run:

The arrays match

You can adjust the value of variable variation up or down to loosen or tighten the match, but 5 is the minimum for the sample data: several element pairs differ by exactly 5, such as 66 and 61. When I modified the code to set variation to 4, the arrays didn’t match.

Another way to calculate the fudge is to use a percentage instead of a discrete value. This type of fuzzy matching works better for values that vary dramatically in size, as the sample data does (from 24 up to 99). I’ll demonstrate that approach in next week’s Lesson.
