{"id":6098,"date":"2023-11-11T00:01:06","date_gmt":"2023-11-11T08:01:06","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=6098"},"modified":"2023-11-18T09:17:25","modified_gmt":"2023-11-18T17:17:25","slug":"finding-four-letter-words","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=6098","title":{"rendered":"Finding Four-Letter Words"},"content":{"rendered":"<p>Not all the nasty words are four letters long, but a good chunk of them are. If you ran the program from <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6082\">last week&#8217;s Lesson<\/a>, you can quickly check the computer&#8217;s dictionary for the words you once couldn&#8217;t say on TV, gleefully typing them in and confirming that they exist in the dictionary. But how many four-letter words are there?<br \/>\n<!--more--><br \/>\nI was going to make this Lesson&#8217;s code a monthly challenge because it took me some time to work out a solution. I assumed it would be easy, but a few things tripped me up.<\/p>\n<p>Rather than use the <em>strlen()<\/em> function, I decided to check the input buffer, array <code>word[]<\/code>, for a newline at element 4: <code>if( word[4]=='\\n' )<\/code> Such a test yields a four-letter word: four characters plus the newline.<\/p>\n<p>Upon testing the program, however, I found a bunch of contractions and possessives included, such as <code>it's<\/code>. To remove them, I added another test: <code>if( word[2]=='\\'' )<\/code> If true, the word is skipped over.<\/p>\n<p>But the biggest problem I had was re-using the <code>word[]<\/code> buffer when scanning the dictionary. This buffer is filled with each word in the dictionary, but it&#8217;s not erased or re-initialized. The effect is that characters linger in the buffer, which can lead to false positives.<\/p>\n<p>For example, a four-letter word followed by a two-letter word means the newline at offset <code>word[4]<\/code> is still present. 
The program would spit out the two-letter word as a match when it isn&#8217;t. To remedy this situation, I use the <em>memset()<\/em> function to clear the <code>word[]<\/code> buffer for each iteration of the <em>while<\/em> loop.<\/p>\n<p>Here is the full code:<\/p>\n<h3><a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2023_11_11-Lesson.c\" rel=\"noopener\" target=\"_blank\">2023_11_11-Lesson.c<\/a><\/h3>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n#include &lt;stdlib.h&gt;\r\n#include &lt;string.h&gt;\r\n\r\n<span class=\"comments\">\/* this code assumes the following path is valid *\/<\/span>\r\n#define DICTIONARY \"\/usr\/share\/dict\/words\"\r\n#define SIZE 32\r\n\r\nint main()\r\n{\r\n    FILE *dict;\r\n    int wc;\r\n    char word[SIZE],*r;\r\n\r\n    <span class=\"comments\">\/* open the dictionary *\/<\/span>\r\n    dict = fopen(DICTIONARY,\"r\");\r\n    if( dict==NULL )\r\n    {\r\n        fprintf(stderr,\"Unable to open %s\\n\",DICTIONARY);\r\n        exit(1);\r\n    }\r\n\r\n    <span class=\"comments\">\/* scan for four-letter words *\/<\/span>\r\n    wc = 0;\r\n    while( !feof(dict) )\r\n    {\r\n        memset(word,'\\0',SIZE);        <span class=\"comments\">\/* clear buffer *\/<\/span>\r\n        r = fgets(word,SIZE,dict);    <span class=\"comments\">\/* read a word *\/<\/span>\r\n        if( r==NULL )\r\n            break;\r\n        if( word[2]=='\\'' )            <span class=\"comments\">\/* skip possessives *\/<\/span>\r\n            continue;\r\n        if( word[4]=='\\n' )            <span class=\"comments\">\/* four-letter word *\/<\/span>\r\n        {\r\n            printf(\"%s\",word);\r\n            wc++;\r\n        }\r\n    }\r\n    printf(\"I found %d four-letter words!\\n\",wc);\r\n\r\n    <span class=\"comments\">\/* close *\/<\/span>\r\n    fclose(dict);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>The code borrows from examples shown in previous Lessons. 
The updated <em>while<\/em> loop includes calling <em>memset()<\/em> to clear the input buffer, fetching a new word, then culling out four-letter words that lack an apostrophe at the third character.<\/p>\n<p>The output shows all of the four-letter words in the digital dictionary. This list includes abbreviations, plurals, and, yes, dirty words. Here&#8217;s a snippet of the output on my computer:<\/p>\n<p><code>ABCs<br \/>\nABMs<br \/>\nACLU<br \/>\nACTH<br \/>\n...<br \/>\nzone<br \/>\nzoom<br \/>\nzoos<br \/>\nI found 3376 four-letter words!<\/code><\/p>\n<p>The <em>memset()<\/em> function was the key to making this program run properly. I&#8217;ve <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5242\">written about this function before<\/a> and its known weakness. The problem is optimization: The compiler can remove a <em>memset()<\/em> call when it determines that the cleared memory is never read again. The C23 update doesn&#8217;t deprecate <em>memset()<\/em>, but it adds <em>memset_explicit()<\/em> as an alternative that is intended not to be optimized away. For this example, <em>memset()<\/em>&#8217;s flaw doesn&#8217;t affect the output because the buffer is read after each call.<\/p>\n<p>Upon reflection, I did make one improvement to the code: The word length value can be set as a constant. For example:<\/p>\n<p><code>#define LENGTH 4<\/code><\/p>\n<p>The code can be updated to reflect a flexible word size:<\/p>\n<p><code>if( word[LENGTH-2]=='\\'' )<\/code><\/p>\n<p>and:<\/p>\n<p><code>if( word[LENGTH]=='\\n' )<\/code><\/p>\n<p>These changes come in handy for <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6104\">next week&#8217;s Lesson<\/a>, which continues my exploration of the digital dictionary file.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yes, the computer dictionary contains a lot of four letter words! 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6098\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-6098","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6098","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6098"}],"version-history":[{"count":4,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6098\/revisions"}],"predecessor-version":[{"id":6118,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6098\/revisions\/6118"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6098"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6098"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6098"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}