{"id":5099,"date":"2021-12-18T00:01:52","date_gmt":"2021-12-18T08:01:52","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=5099"},"modified":"2021-12-25T07:50:59","modified_gmt":"2021-12-25T15:50:59","slug":"a-tally-of-unique-words-part-ii","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=5099","title":{"rendered":"A Tally of Unique Words, Part II"},"content":{"rendered":"<p>Continuing with my Unique Words project from <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5090\">last week&#8217;s Lesson<\/a>: Once the buffer contains text, the next step is to parse the words: to split the long string of text stored in memory into separate word chunks. For this task, I turn to my old pal, the <em>strtok()<\/em> function.<br \/>\n<!--more--><br \/>\nThe process starts with the text file <a href=\"https:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2021\/12\/sonnet18.txt\">sonnet18.txt<\/a> opened and stored in a buffer. No matter how many lines of text are in the file, it&#8217;s stored as one long string, which is perfect for <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=1758\">the <em>strtok()<\/em> function<\/a> to slice through. 
But first some additions must be made to the code.<\/p>\n<p>Three new variables are declared:<\/p>\n<p><code>const char separators[]<\/code> lists the characters that the <em>strtok()<\/em> function treats as word separators when parsing the buffer.<\/p>\n<p><code>char *word<\/code> references words found in the buffer, retaining the address\/offset.<\/p>\n<p><code>int count<\/code> counts the words found.<\/p>\n<p>Here is how the updated variable declaration statements for the <em>main()<\/em> function appear when added to the source code file presented last week:<\/p>\n<pre class=\"screen\">\r\nconst char filename[] = \"sonnet18.txt\";\r\nchar *buffer;\r\nconst char separators[] = \",.:;!?\\n \";\r\nFILE *fp;\r\nchar *word;\r\nint offset,ch,count;<\/pre>\n<p>The <em>printf()<\/em> statement at the end of the <em>main()<\/em> function is removed. A <em>while<\/em> loop is added in its place to parse the buffer and to count and output the words. These are the new statements added to the code:<\/p>\n<pre class=\"screen\">\r\ncount = 0;\r\nword = strtok(buffer,separators);\r\nwhile( word )\r\n{\r\n    printf(\"%3d:%s\\n\",count+1,word);\r\n    word = strtok(NULL,separators);\r\n    count++;\r\n}<\/pre>\n<p>The <em>strtok()<\/em> function appears twice in the code. The initial call at Line 54 (that&#8217;s Line 54 from the full source code file) identifies the buffer and the characters stored in the <code>separators<\/code> string. The value returned is a pointer to the first word in the string, saved in <em>char<\/em> pointer <code>word<\/code>. When the value returned is NULL, <em>strtok()<\/em> has exhausted the search string.<\/p>\n<p>The <em>while<\/em> loop spins as long as new words are parsed from the buffer. The <em>strtok()<\/em> function&#8217;s first argument is replaced with NULL at Line 58 to keep scanning the same string. 
The output generated consists of a long list of words in the buffer:<\/p>\n<p><code>&nbsp;&nbsp;1:Shall<br \/>\n&nbsp;&nbsp;2:I<br \/>\n&nbsp;&nbsp;3:compare<br \/>\n&nbsp;&nbsp;4:thee<br \/>\n&nbsp;&nbsp;5:to<br \/>\n&nbsp;&nbsp;6:a<br \/>\n&nbsp;&nbsp;7:summer\u2019s<br \/>\n&nbsp;&nbsp;8:day<br \/>\n&nbsp;&nbsp;9:Thou<br \/>\n&nbsp;10:art<br \/>\n...<br \/>\n109:and<br \/>\n110:this<br \/>\n111:gives<br \/>\n112:life<br \/>\n113:to<br \/>\n114:thee<\/code><\/p>\n<p><a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2021_12_18-Lesson.c\" rel=\"noopener\" target=\"_blank\">Click here<\/a> to view the full source code in my GitHub repository.<\/p>\n<p>At this point, each memory address saved in the <code>word<\/code> pointer is lost, overwritten by the next call to <em>strtok()<\/em>. But this problem is okay for now! The code confirms that words in the buffer can be counted and parsed, which is another step toward finding unique words and those words that repeat.<\/p>\n<p>Oh, and I don&#8217;t free pointer <code>buffer<\/code> because all allocated memory is released when the program quits. Pointer <code>word<\/code> was never allocated: it only references locations inside <code>buffer<\/code>, and it&#8217;s NULL once the loop ends, so it needs no freeing. If it makes you feel good, you can add this statement to the end of the <em>main()<\/em> function, before the <em>return<\/em> statement:<\/p>\n<p><code>free(buffer);<\/code><\/p>\n<p>Freeing a buffer is necessary when it&#8217;s allocated for temporary storage in a function; otherwise, the memory leaks each time the function is called.<\/p>\n<p>In <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5105\">next week&#8217;s Lesson<\/a>, I continue the improvement process by retaining the word pointers allocated.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The next step in locating unique words is to parse the buffer, slicing into separate word chunks. 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5099\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-5099","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5099","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5099"}],"version-history":[{"count":5,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5099\/revisions"}],"predecessor-version":[{"id":5126,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5099\/revisions\/5126"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5099"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5099"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5099"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}