{"id":901,"date":"2014-08-30T00:01:34","date_gmt":"2014-08-30T07:01:34","guid":{"rendered":"http:\/\/c-for-dummies.com\/blog\/?p=901"},"modified":"2014-08-23T08:24:09","modified_gmt":"2014-08-23T15:24:09","slug":"roll-your-own-string-search-function","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=901","title":{"rendered":"Roll Your Own String Search Function"},"content":{"rendered":"<p>How do you search for one string within another string? Brute force! That means, you size up each character one at a time until you find the perfect match.<br \/>\n<!--more--><br \/>\nI&#8217;m certain other search methods and algorithms exist beyond the brute force method. In fact, it may not be <em>brute<\/em> force as much as it&#8217;s simply searching for something the way humans do: One letter at a time.<\/p>\n<p>For example, to find <em>yst<\/em> in the word <em>haystack<\/em> you start by comparing <em>y<\/em> to <em>h<\/em>. No match.<\/p>\n<p>Compare <em>y<\/em> to <em>a<\/em>. No match.<\/p>\n<p>Compare <em>y<\/em> to <em>y<\/em>. Match! So compare the next two letters: <em>s<\/em> to <em>s<\/em>. Match! And you keep on going until the entire word is found &#8212; or until the end of the search string.<\/p>\n<p>The following code demonstrates this process. The search string is pointer variable <code>haystack<\/code>. The string to find is pointer variable <code>needle<\/code>:<\/p>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n\r\nint main()\r\n{\r\n    char *haystack = \"Was this the face that launch'd a thousand ships\";\r\n    char *needle = \"face\";\r\n    char *base,*found;\r\n    int i,offset;\r\n\r\n    printf(\"Finding '%s' in:\\n%s\\n\\n\",needle,haystack);\r\n\r\n    <span class=\"comments\">\/* retain original pointer locations *\/<\/span>\r\n    base = haystack;\r\n    found = NULL;       <span class=\"comments\">\/* initialize *\/<\/span>\r\n\r\n    <span class=\"comments\">\/* find the string *\/<\/span>\r\n    while(*base)\r\n    {\r\n        if(*base == *needle)\r\n        {\r\n            i = 0;\r\n            while(*(base+i) == *(needle+i))\r\n            {\r\n                i++;\r\n                if(*(needle+i) == '\\0')\r\n                    found = base;\r\n            }\r\n        }\r\n        base++;\r\n        if(found != NULL)\r\n            break;\r\n    }\r\n\r\n    if( found == NULL)\r\n        puts(\"String not found\");\r\n    else\r\n    {\r\n        offset = (int)found - (int)haystack + 1;\r\n        printf(\"Found at offset %d.\\n\",offset);\r\n    }\r\n\r\n    return(0);\r\n}<\/pre>\n<p>The central part of this code is a simple <em>while<\/em> loop, which starts at Line 17. Stripped of the text-searching statements, here&#8217;s the loop:<\/p>\n<pre><code>while(*base)\r\n{\r\n    base++;\r\n}<\/code><\/pre>\n<p>So as long as the value <code>*base<\/code> isn&#8217;t the <code>\\0<\/code> character (the end of the string), the loop keeps spinning, processing the string one character at a time.<\/p>\n<p>On the loop&#8217;s journey, the <em>if<\/em> condition at Line 19 tests whether two characters of both strings match. If so, a second search begins. Otherwise, the <code>base<\/code> pointer (the string to search) is incremented, and the loop compares the next two characters.<\/p>\n<p>When two characters match, the actual comparison begins. Index variable <code>i<\/code> is initialized at Line 21.<\/p>\n<p>A nested <em>while<\/em> loop at Line 22 compares subsequent characters by incrementing though both strings (without changing the <code>base<\/code> address). Characters are compared. The loop stops at a mismatch due to the <em>while<\/em> statement&#8217;s condition.<\/p>\n<p>The <em>if<\/em> condition at Line 25 determines whether the full match was found: If the null terminator at the end of string <code>needle<\/code> has been reached, the <code>found<\/code> pointer is set equal to the location of the matching text.<\/p>\n<p>The <em>while<\/em> loop at Line 17 has a second terminating condition, which is triggered when the <code>found<\/code> pointer isn&#8217;t <code>NULL<\/code>, i.e., when the matching text has been found.<\/p>\n<p>By the way, you could re-write Line 30 to read:<\/p>\n<pre><code>if(found)\r\n    break;<\/code><\/pre>\n<p>And line 34 to read:<\/p>\n<pre><code>if(!found)<\/code><\/pre>\n<p>Both statements read better that way, but the concepts may not be as clearly presented as shown in the code.<\/p>\n<p>Here&#8217;s the output:<\/p>\n<pre><code>Finding 'face' in:\r\nWas this the face that launch'd a thousand ships\r\n\r\nFound at offset 14.<\/code><\/pre>\n<p>Test it a few times by replacing the <code>haystack<\/code> and <code>needle<\/code> strings with text of your own concoction.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The C library has its string-searching functions, but you learn more when you do it yoursef. <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=901\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-901","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/901","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=901"}],"version-history":[{"count":10,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/901\/revisions"}],"predecessor-version":[{"id":946,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/901\/revisions\/946"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=901"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=901"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=901"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}