{"id":5121,"date":"2022-01-08T00:01:35","date_gmt":"2022-01-08T08:01:35","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=5121"},"modified":"2022-01-01T10:17:16","modified_gmt":"2022-01-01T18:17:16","slug":"a-tally-of-unique-words-part-v","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=5121","title":{"rendered":"A Tally of Unique Words, Part V"},"content":{"rendered":"<p>The next step in the unique words program is to tally the count of each word. From <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5113\">last week&#8217;s Lesson<\/a>, the word list is sorted, which makes the task of counting duplicates easy.<br \/>\n<!--more--><br \/>\nAs a quick review, the program&#8217;s code has completed these steps:<\/p>\n<ol>\n<li>A text file is opened and read, its contents stored in a dynamic <em>char<\/em> buffer creatively named <code>buffer<\/code>.<\/li>\n<li>The text in <code>buffer<\/code> is parsed, with double-pointer monster <code>**list<\/code> referencing each word.<\/li>\n<li>The word list is sorted in a case-insensitive manner.<\/li>\n<li>The sorted list is output.<\/li>\n<\/ol>\n<p>To count the duplicates (and, in the process, the unique words), the code scans the list of sorted words, comparing each word in the list with the next word. When the neighbors match, an inner loop keeps working through the list, counting the matches.<\/p>\n<p>To accomplish this task, the <em>for<\/em> loop (from Step 4 above) already in the code is updated from this:<\/p>\n<pre class=\"screen\">\r\nfor( x=0; x&lt;count; x++ )\r\n{\r\n    printf(\"%3d:%s\\n\",x+1,*(list+x));\r\n}<\/pre>\n<p>To this:<\/p>\n<pre class=\"screen\">\r\ndup = 1;\r\nfor( x=0; x&lt;count; x+=dup )\r\n{\r\n    dup = 1;\r\n    \/* the x+dup&lt;count test keeps the comparison inside the list *\/\r\n    while( x+dup&lt;count &amp;&amp; strcasecmp(*(list+x),*(list+x+dup))==0 )\r\n    {\r\n        dup++;\r\n    }\r\n    printf(\"%s (%d)\\n\",*(list+x),dup);\r\n}<\/pre>\n<p>Variable <code>dup<\/code> counts the duplicates. 
It also moves the outer loop past any repeated words: The value of <code>dup<\/code> is used in the incrementing expression, <code>x+=dup<\/code>. When the inner loop finds no match, <code>dup<\/code> stays at 1: the word is unique, and the outer loop moves on to the next word in the list.<\/p>\n<p>The inner <em>while<\/em> loop compares the current word in the list, <code>*(list+x)<\/code>, with the word <code>dup<\/code> positions ahead of it, <code>*(list+x+dup)<\/code>. When the words match (ignoring case), the value of <code>dup<\/code> is incremented, tallying the repeat count, and the <em>while<\/em> loop continues.<\/p>\n<p>After all repeating words are found, a <em>printf()<\/em> statement outputs the word and its repeat value.<\/p>\n<p>Here is the first part of the output:<\/p>\n<p><code>a (2)<br \/>\nall (1)<br \/>\nAnd (5)<br \/>\nart (1)<br \/>\nas (1)<br \/>\nbrag (1)<br \/>\nbreathe (1)<br \/>\nbuds (1)<br \/>\nBut (1)<br \/>\n...<\/code><\/p>\n<p>The full code is available on my <a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2022_01_08-Lesson.c\" rel=\"noopener\" target=\"_blank\">GitHub page<\/a>.<\/p>\n<p>The final task is to output the unique and duplicate words. This job could be done now: Flag those words with a <code>dup<\/code> count of one as unique, and output the rest as duplicates along with their repeat value. 
Here is an <em>if-else<\/em> structure to replace the <em>printf()<\/em> statement in the above code snippet:<\/p>\n<pre class=\"screen\">\r\nif( dup==1 )\r\n    printf(\"Unique: %s\\n\",*(list+x));\r\nelse\r\n    printf(\"Duplicate: %s (%d)\\n\",*(list+x),dup);<\/pre>\n<p>The output now looks like this:<\/p>\n<p><code>Duplicate: a (2)<br \/>\nUnique: all<br \/>\nDuplicate: And (5)<br \/>\nUnique: art<br \/>\nUnique: as<br \/>\nUnique: brag<br \/>\nUnique: breathe<br \/>\nUnique: buds<br \/>\nUnique: But<\/code><\/p>\n<p>This result may be fine for the purpose of accomplishing the task, but I want to list each group separately, unique and then duplicates along with their count. This process continues with next week&#8217;s Lesson, the final in this series.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Before you can identify unique and duplicate words, you must count them. <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5121\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-5121","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5121","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5121"}],"version-history":[{"count":9,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\
/v2\/posts\/5121\/revisions"}],"predecessor-version":[{"id":5150,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5121\/revisions\/5150"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5121"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5121"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5121"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}