{"id":6070,"date":"2023-10-21T00:01:57","date_gmt":"2023-10-21T07:01:57","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=6070"},"modified":"2023-10-28T08:32:44","modified_gmt":"2023-10-28T15:32:44","slug":"finding-the-long-words","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=6070","title":{"rendered":"Finding the Long Words"},"content":{"rendered":"<p>Beyond knowing how many words are in the computer&#8217;s dictionary, another good measure to know is how many characters are in the longest word. Together, these two values give you a profile for the complete word matrix.<br \/>\n<!--more--><br \/>\nContinuing the exploration of the Linux\/Unix\/macOS dictionary from <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6065\">last week&#8217;s Lesson<\/a>, the task at hand is to find the longest word stored in the dictionary file, <code>\/usr\/share\/dict\/words<\/code>. In last week&#8217;s code, defined constant <code>SIZE<\/code> is set to 32 and used in the <em>fgets()<\/em> function to scoop out lines of text from the file and output the words.<\/p>\n<p>The value 32 is a guess. When a word is longer, the <em>fgets()<\/em> function truncates it, which is okay, but the total word count would be off. That&#8217;s because the file position indicator remains on the same line as the truncated word, so the next <em>fgets()<\/em> call reads the remainder as though it were another word, which inflates the total word count. I know this situation didn&#8217;t occur because the second program (from last week&#8217;s Lesson) just counted newlines and the results are the same, at least for the dictionary I installed.<\/p>\n<p>Knowing the maximum size of a word in the dictionary is important if you want to manipulate the stored words. For this week&#8217;s dictionary-reading program, I set a larger <code>SIZE<\/code> value to read all the words in the dictionary file. The word length is monitored and successively longer words are output. 
The result shows the maximum word size.<\/p>\n<h3><a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2023_10_21-Lesson.c\" rel=\"noopener\" target=\"_blank\">2023_10_21-Lesson.c<\/a><\/h3>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n#include &lt;stdlib.h&gt;\r\n#include &lt;string.h&gt;\r\n\r\n<span class=\"comments\">\/* this code assumes the following path is valid *\/<\/span>\r\n#define DICTIONARY \"\/usr\/share\/dict\/words\"\r\n#define SIZE 1024\r\n\r\nint main()\r\n{\r\n    FILE *dict;\r\n    int maxlen;\r\n    char word[SIZE],*r;\r\n\r\n    <span class=\"comments\">\/* open the dictionary *\/<\/span>\r\n    dict = fopen(DICTIONARY,\"r\");\r\n    if( dict==NULL )\r\n    {\r\n        fprintf(stderr,\"Unable to open %s\\n\",DICTIONARY);\r\n        exit(1);\r\n    }\r\n\r\n    <span class=\"comments\">\/* find the longest word *\/<\/span>\r\n    maxlen = 0;\r\n    while( !feof(dict) )\r\n    {\r\n        r = fgets(word,SIZE,dict);    <span class=\"comments\">\/* read a word *\/<\/span>\r\n        if( r==NULL )\r\n            break;\r\n        <span class=\"comments\">\/* strlen() counts the newline retained by fgets() *\/<\/span>\r\n        if( strlen(word) &gt; maxlen )\r\n        {\r\n            printf(\"%s\",word);\r\n            maxlen = strlen(word);\r\n        }\r\n    }\r\n\r\n    <span class=\"comments\">\/* results *\/<\/span>\r\n    printf(\"The longest word is %d characters long\\n\",maxlen);\r\n\r\n    <span class=\"comments\">\/* close *\/<\/span>\r\n    fclose(dict);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>Defined constant <code>SIZE<\/code> is set to 1024 (1K), which should be adequate for any word in my mother tongue.<\/p>\n<p>After the dictionary file is opened, variable <code>maxlen<\/code> is initialized to zero: <code>maxlen = 0;<\/code><\/p>\n<p>A <em>while<\/em> loop scans the dictionary file, reading words just like last week&#8217;s Lesson. The <em>strlen()<\/em> function returns the word&#8217;s length, which is compared with the value stored in <code>maxlen<\/code>: <code>if( strlen(word) &gt; maxlen 
)<\/code> When the value is greater, the word is output and a new value for <code>maxlen<\/code> is set.<\/p>\n<p>The program ends with a <em>printf()<\/em> statement that outputs the character count for the longest word.<\/p>\n<p>Here&#8217;s a sample run:<\/p>\n<p><code>A<br \/>\nAA<br \/>\nAAA<br \/>\nAA's<br \/>\nABC's<br \/>\nACLU's<br \/>\nANZUS's<br \/>\nAachen's<br \/>\nAaliyah's<br \/>\nAberdeen's<br \/>\nAbernathy's<br \/>\nAbyssinian's<br \/>\nAdirondacks's<br \/>\nAfrocentrism's<br \/>\nAmericanization<br \/>\nAmericanization's<br \/>\nAndrianampoinimerina<br \/>\nAndrianampoinimerina's<br \/>\nelectroencephalograph's<br \/>\nThe longest word is 24 characters long<\/code><\/p>\n<p>Pretty!<\/p>\n<p>My initial guess from last week&#8217;s Lesson was close, seeing how the longest word in my system&#8217;s dictionary is reported as 24 characters and the original <code>SIZE<\/code> value was set to 32. The reported count includes the newline that <em>fgets()<\/em> retains, so the word itself is 23 characters long. Different dictionaries yield different results, with the obnoxiously huge dictionary files containing scientific and technical words that may greatly exceed 24 characters.<\/p>\n<p>The point of this exercise isn&#8217;t just to know the type of matrix in which the words are stored (total word count and word size), but to avoid potential overflow. It&#8217;s too easy to guess at a shorter buffer, which not only can crop the output but can lead to misreading the words and messing up the results.<\/p>\n<p>I have more fun with the dictionary in <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6078\">next week&#8217;s Lesson<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To properly manipulate the word dictionary, you must know both the number of words and the size of the longest word. 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6070\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-6070","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6070","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6070"}],"version-history":[{"count":4,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6070\/revisions"}],"predecessor-version":[{"id":6089,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6070\/revisions\/6089"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6070"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6070"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6070"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}