{"id":5113,"date":"2022-01-01T00:01:08","date_gmt":"2022-01-01T08:01:08","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=5113"},"modified":"2022-01-15T08:47:15","modified_gmt":"2022-01-15T16:47:15","slug":"a-tally-of-unique-words-part-iv","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=5113","title":{"rendered":"A Tally of Unique Words, Part IV"},"content":{"rendered":"<p>In <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5105\">our last episode<\/a>, the unique words code is able to parse and list individual words in the buffer. To find unique and duplicate words, the next step is to sort the list.<br \/>\n<!--more--><br \/>\nThe word list is held in a dynamically allocated array of pointers, <code>**list<\/code>. To sort this list, I use my old pal <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=1287\">the <em>qsort()<\/em> function<\/a>. The function has this horrible <em>man<\/em> page format:<\/p>\n<p><code>void qsort(void *base, size_t nel, size_t width, int (*compar)(const void *, const void *));<\/code><\/p>\n<p>The first argument is the base of the list to sort. For my word list, it&#8217;s the name of the <code>**list<\/code> pointer: <code>list<\/code>. That&#8217;s it, no asteriskses.<\/p>\n<p>The second argument is the number of items to sort. In the code, this quantity is stored in the <code>count<\/code> variable. So far so good.<\/p>\n<p>The third argument is the size of each item in the list. Because the list holds <em>char<\/em> pointers, this value is <code>sizeof(char *)<\/code>.<\/p>\n<p>The fourth argument is the comparison function, which is pretty much boilerplate except that the items to be sorted are in a double-pointer list. Fret not! 
I&#8217;ve already covered this topic in <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=1293\">an older blog post<\/a>.<\/p>\n<p>To update the code, the <em>compare()<\/em> function is added before the <em>main()<\/em> function:<\/p>\n<pre class=\"screen\">\r\nint compare(const void *a, const void *b)\r\n{\r\n    return( strcmp( *(const char **)a, *(const char **)b ));\r\n}<\/pre>\n<p>Again, read my previous blog post if you want to assure yourself that I didn&#8217;t just randomly type asterisks in that <em>return<\/em> statement.<\/p>\n<p>The other change to the code is to add the <em>qsort()<\/em> function between the final <em>while<\/em> and <em>for<\/em> loops:<\/p>\n<p><code>qsort(list,count,sizeof(char *),compare);<\/code><\/p>\n<p>That&#8217;s it. The list is sorted. Here is some of the program&#8217;s output:<\/p>\n<p><code>&nbsp;&nbsp;1:And<br \/>\n&nbsp;&nbsp;2:And<br \/>\n&nbsp;&nbsp;3:And<br \/>\n&nbsp;&nbsp;4:But<br \/>\n&nbsp;&nbsp;5:By<br \/>\n&nbsp;&nbsp;6:I<br \/>\n...<br \/>\n107:to<br \/>\n108:to<br \/>\n109:to<br \/>\n110:too<br \/>\n111:too<br \/>\n112:untrimmed<br \/>\n113:wand'rest<br \/>\n114:winds<\/code><\/p>\n<p>The list is sorted and the duplicates are easy to spy. But one problem is apparent right away if you peruse the entire list: The sorting method doesn&#8217;t catch any case differences. The list shows three uppercase words <em>And<\/em> and two lowercase words <em>and<\/em> separately.<\/p>\n<p>The fix for this problem is rather sneaky: Replace the <em>strcmp()<\/em> function in the <em>compare()<\/em> function&#8217;s <em>return<\/em> statement with <em>strcasecmp()<\/em>, which compares strings without regard to letter case.<\/p>\n<p>The <em>strcasecmp()<\/em> function isn&#8217;t part of the standard C library. To use it, you must include the <code>strings.h<\/code> header file. 
You could use <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=4657\">my own version of the function<\/a>, but it has a flaw: It&#8217;s unable to distinguish between matching words of different lengths, such as &#8220;to&#8221; and &#8220;too.&#8221; I have an update to my own <em>strcasecmp()<\/em> function coming in <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5160\">a future Lesson<\/a>.<\/p>\n<p><a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2022_01_01-Lesson.c\" rel=\"noopener\" target=\"_blank\">Click here<\/a> to view all the modifications to the code on GitHub. In this update, the <em>strcmp()<\/em> function is replaced with <em>strcasecmp()<\/em> in the <em>compare()<\/em> function. Oh, and the <code>strings.h<\/code> header file is included.<\/p>\n<p>Here is the updated output:<\/p>\n<p><code>&nbsp;&nbsp;1:a<br \/>\n&nbsp;&nbsp;2:a<br \/>\n&nbsp;&nbsp;3:all<br \/>\n&nbsp;&nbsp;4:And<br \/>\n&nbsp;&nbsp;5:And<br \/>\n&nbsp;&nbsp;6:and<br \/>\n...<br \/>\n107:to<br \/>\n108:to<br \/>\n109:too<br \/>\n110:too<br \/>\n111:untrimmed<br \/>\n112:wand'rest<br \/>\n113:When<br \/>\n114:winds<\/code><\/p>\n<p>You can see all the duplicates right away, regardless of case. The next step is to add code to find the unique words and the duplicates. This update is covered in <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5121\">next week&#8217;s Lesson<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To find unique words in a list, you start by sorting the list. 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5113\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-5113","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5113","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5113"}],"version-history":[{"count":9,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5113\/revisions"}],"predecessor-version":[{"id":5166,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5113\/revisions\/5166"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5113"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5113"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5113"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}