{"id":2124,"date":"2016-09-17T00:01:05","date_gmt":"2016-09-17T07:01:05","guid":{"rendered":"http:\/\/c-for-dummies.com\/blog\/?p=2124"},"modified":"2016-09-24T08:22:51","modified_gmt":"2016-09-24T15:22:51","slug":"fuzzy-matching-with-percentage-variation","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=2124","title":{"rendered":"Fuzzy Matching with Percentage Variation"},"content":{"rendered":"<p>A fuzzy match that uses a discrete amount of fudge might not yield a match, especially when the values cover a wide range. To make the match work better, set a percentage variation instead.<br \/>\n<!--more--><br \/>\nFor example, suppose the variation is fixed at 2.0. For the values 1.2 and 1.7, the fuzzy match works. But if the values are 209.7 and 206.2, the match fails because they differ by 3.5. If both sets of numbers are in the same series, then you might want to try a percentage variation instead of a fixed, or discrete, value.<\/p>\n<p>In my code, I base the tolerance on the value fetched from the first array. For example, if element <code>target[0]<\/code> is 24, a 10 percent tolerance is 2.4. As long as the comparison value, <code>sample[0]<\/code>, is within the range of 24 &plusmn;2.4, the values match.<\/p>\n<p>When the values get larger, the percentage stays the same but the tolerance grows. If <code>target[9]<\/code> is set to 95, then its 10 percent tolerance is 9.5, for a range of 95 &plusmn;9.5. 
The value of <code>sample[9]<\/code> can be anywhere from 85.5 to 104.5 (as an <em>int<\/em>, 86 through 104) and the fuzzy match passes.<\/p>\n<p>Here is sample code, based on the code from <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=2104\">last week&#8217;s Lesson<\/a>:<\/p>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n#include &lt;stdlib.h&gt;\r\n\r\n#define COUNT 10\r\n#define TRUE 1\r\n#define FALSE 0\r\n\r\nint main()\r\n{\r\n    int target[COUNT] = { 24, 28, 32, 45, 50,\r\n                          66, 67, 70, 80, 95 };\r\n    int sample[COUNT] = { 26, 26, 30, 42, 50,\r\n                          61, 67, 75, 85, 99 };\r\n    int x,match,variation;\r\n    float tolerance;\r\n\r\n    match = TRUE;       <span class=\"comments\">\/* initialize match *\/<\/span>\r\n    tolerance = 0.10;   <span class=\"comments\">\/* percentage tolerance *\/<\/span>\r\n\r\n    <span class=\"comments\">\/* compare arrays *\/<\/span>\r\n    for(x=0; x&lt;COUNT; x++)\r\n    {\r\n        variation = abs(target[x]-sample[x]);\r\n        if( (float)variation &gt; (float)target[x]*tolerance)\r\n        {\r\n            match = FALSE;\r\n            break;\r\n        }\r\n    }\r\n\r\n    <span class=\"comments\">\/* display results *\/<\/span>\r\n    if(match)\r\n        printf(\"The arrays are similar, within %.f%% tolerance\\n\",\r\n                tolerance*100);\r\n    else\r\n        printf(\"The arrays are not similar, within %.f%% tolerance\\n\",\r\n                tolerance*100);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>The <em>float<\/em> variable <code>tolerance<\/code> is set to the fuzz factor, the percentage by which the compared values can differ. It&#8217;s set to <code>0.10<\/code> in Line 18, which is 10 percent.<\/p>\n<blockquote><p>Remember that percentages are decimal values: 50 percent is 0.50, not 50.0.<\/p><\/blockquote>\n<p>The <em>for<\/em> loop at Line 21 plows through each of the two arrays&#8217; elements. 
At Line 23 in the loop, the difference between the paired elements&#8217; values is calculated. The <em>abs()<\/em> function ensures that the difference is positive.<\/p>\n<p>The <em>if<\/em> comparison at Line 24 multiplies the original array element&#8217;s value by the <code>tolerance<\/code> percentage. When the value of <code>variation<\/code> is greater than the result, the match fails. The values of <code>variation<\/code> and <code>target[x]<\/code> are typecast to <em>float<\/em> so that the result is calculated accurately.<\/p>\n<p>Lines 33 and 36 both display the result along with the percentage value used for the comparison.<\/p>\n<p>Here is a sample run:<\/p>\n<pre><code>The arrays are similar, within 10% tolerance<\/code><\/pre>\n<p>You can set the value of variable <code>tolerance<\/code> to another percentage in Line 18. So if you want to use 5 percent, set the value to <code>0.05<\/code>.<\/p>\n<p>In this sample code, however, a tolerance much below 10 percent fails the match. That&#8217;s because the smaller values in the samples require a higher percentage tolerance than the larger values: the first pair, 24 and 26, differs by 2, which is about 8.3 percent of 24. Discrete values make more sense for this type of comparison. Still, it&#8217;s puzzling when you look at Figure 1 and see how similar the lines are that a tolerance of, say, 5 percent doesn&#8217;t result in a positive fuzzy match.<\/p>\n<div id=\"attachment_2100\" style=\"width: 560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2100\" src=\"http:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2016\/08\/0903-figure1.png\" alt=\"Figure 1. 
The two data sets graphed as lines.\" width=\"550\" height=\"367\" class=\"size-full wp-image-2100\" srcset=\"https:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2016\/08\/0903-figure1.png 550w, https:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2016\/08\/0903-figure1-300x200.png 300w, https:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2016\/08\/0903-figure1-450x300.png 450w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><p id=\"caption-attachment-2100\" class=\"wp-caption-text\">Figure 1. The two data sets graphed as lines.<\/p><\/div>\n<p>The solution is to add another level of forgiveness. I cover that fuzzy feature in <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=2133\">next week&#8217;s Lesson<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A better way to calculate the <em>fuzz<\/em> in a match might be to use a percentage instead of a discrete value. <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=2124\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-2124","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2124","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2124"}],"version-history":[{"count":4,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_ro
ute=\/wp\/v2\/posts\/2124\/revisions"}],"predecessor-version":[{"id":2158,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2124\/revisions\/2158"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2124"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2124"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}