{"id":5130,"date":"2022-01-15T00:01:44","date_gmt":"2022-01-15T08:01:44","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=5130"},"modified":"2022-01-08T10:15:04","modified_gmt":"2022-01-08T18:15:04","slug":"a-tally-of-unique-words-part-vi","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=5130","title":{"rendered":"A Tally of Unique Words, Part VI"},"content":{"rendered":"<p>Any mortal programmer would have stopped with <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5121\">last week&#8217;s Lesson<\/a>, where a tally of unique and duplicate words is output. This is the desired result, right? Yes, but it&#8217;s a disorderly list.<br \/>\n<!--more--><br \/>\nTo separate the unique and duplicate words, each must be stored in a new list. A new list is something I wanted to avoid early on in the program&#8217;s development. Yet, creating a new storage structure is necessary to associate words with their repeat count.<\/p>\n<p>My solution involves creating a structure:<\/p>\n<p><code>struct u {<br \/>\n    char *w;<br \/>\n    int d;<br \/>\n} *unique;<\/code><\/p>\n<p>The <code>u<\/code> structure contains two members: <code>w<\/code>, a pointer to the word inside the <code>**list<\/code> buffer; and <code>d<\/code>, the repeat count (1 for a word that appears only once). The declaration also creates <code>unique<\/code>, a pointer to this structure type.<\/p>\n<p>After the word list is sorted, I added the following statements to the code. Storage is allocated for the <code>unique<\/code> structures, with the number of structures allocated matching the word count as stored in variable <code>count<\/code>. This value is the maximum size required, assuming that each word is unique:<\/p>\n<pre class=\"screen\">\r\nunique = malloc( sizeof(struct u) * count );\r\nif( unique==NULL )\r\n{\r\n    fprintf(stderr,\"Unable to allocate structures\\n\");\r\n    exit(1);\r\n}<\/pre>\n<p>Immediately following this allocation, the <em>for<\/em> loop that scans the list for duplicates is updated. 
Two statements are added, shown at the end of the loop block. The loop conditions are also adjusted so that the final word in the list is processed and the comparison never reads past the end of the list:<\/p>\n<pre class=\"screen\">\r\ndup = 1;\r\nfor( x=0,index=0; x&lt;count; x+=dup,index++ )\r\n{\r\n    dup = 1;\r\n    <span class=\"comments\">\/* count repeats without reading past the end of the list *\/<\/span>\r\n    while( x+dup&lt;count &amp;&amp; strcasecmp(*(list+x),*(list+x+dup))==0 )\r\n    {\r\n        dup++;\r\n    }\r\n    (unique+index)-&gt;w = *(list+x);\r\n    (unique+index)-&gt;d = dup;\r\n}<\/pre>\n<p>Variables <code>index<\/code> and <code>x<\/code> are initialized together in the <em>for<\/em> loop. (See <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=3122\">this Lesson<\/a> for more details on setting multiple expressions in a <em>for<\/em> loop.)<\/p>\n<p>The <code>index<\/code> variable is used in the final two statements in the block. It references the offset within the list of structures where the word and its count are stored. Each structure is updated with the word&#8217;s address from the <code>**list<\/code> buffer, and the <code>dup<\/code> repeat count.<\/p>\n<p>After the <em>for<\/em> loop builds the <code>unique<\/code> structure list, variable <code>index<\/code> holds the number of structures stored, so valid offsets run from zero through <code>index-1<\/code>. This code is followed by two new loops that output each group, unique words and duplicates:<\/p>\n<pre class=\"screen\">\r\n<span class=\"comments\">\/* unique words... 
*\/<\/span>\r\nprintf(\"Unique words:\\n\");\r\nfor( x=0; x&lt;index; x++ )\r\n{\r\n    if( (unique+x)-&gt;d == 1 )\r\n        printf(\"%s \",\r\n                (unique+x)-&gt;w\r\n              );\r\n}\r\nprintf(\"\\n\\n\");\r\n\r\n<span class=\"comments\">\/* duplicates *\/<\/span>\r\nprintf(\"Words appearing more than once:\\n\");\r\nfor( x=0; x&lt;index; x++ )\r\n{\r\n    if( (unique+x)-&gt;d &gt; 1 )\r\n        printf(\"%s (%d) \",\r\n                (unique+x)-&gt;w,\r\n                (unique+x)-&gt;d\r\n              );\r\n}\r\nprintf(\"\\n\\n\");<\/pre>\n<p>Here&#8217;s the final program&#8217;s output:<\/p>\n<p><code>Unique words:<br \/>\nall art as brag breathe buds But By chance changing compare complexion course darling date day death declines dimmed do every eye eyes fade from gives gold grow'st hath heaven hot I is lease life lines lives lose lovely May men nature\u2019s not often ow\u2019st possession Rough see shade shake shines short summer temperate that thy Time untrimmed wand'rest <\/p>\n<p>Words appearing more than once:<br \/>\na (2) And (5) can (2) eternal (2) fair (3) his (2) in (2) long (2) more (2) Nor (2) of (3) or (2) Shall (3) So (2) sometime (2) summer\u2019s (2) the (2) thee (2) this (2) thou (4) to (3) too (2) <\/code><\/p>\n<p>Alas, some words are split across lines on the screen in the output. I thought about tracking the horizontal character output to avoid splitting words. But, nah; I didn&#8217;t want to write a Part VII for this series.<\/p>\n<p>You can obtain the full code from my <a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2022_01_15-Lesson.c\" rel=\"noopener\" target=\"_blank\">GitHub page<\/a>. I ran the program on another, larger text file and it worked. The Macintosh had some issues reading longer text files, but this problem didn&#8217;t occur on other platforms.<\/p>\n<p>This code demonstrates that computers don&#8217;t mind performing a task that would drive a human nuts. 
As a programmer, your job is to write the code.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Not content with the results desired, I take the code one step further to store the unique and duplicate words. <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5130\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-5130","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5130","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5130"}],"version-history":[{"count":6,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5130\/revisions"}],"predecessor-version":[{"id":5163,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5130\/revisions\/5163"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5130"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5130"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5130"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}