{"id":6065,"date":"2023-10-14T00:01:59","date_gmt":"2023-10-14T07:01:59","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=6065"},"modified":"2023-10-21T08:57:29","modified_gmt":"2023-10-21T15:57:29","slug":"reading-the-dictionary","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=6065","title":{"rendered":"Reading the Dictionary"},"content":{"rendered":"<p>I admit it: I&#8217;m a nerd and I read the dictionary. I know it&#8217;s a reference, not a work of fiction. The plot is weak. But I found it enjoyable as a kid to discover new words and their meanings. Alas, the Unix dictionary file lists only words and not definitions. But how many words are in there?<br \/>\n<!--more--><br \/>\nA word-count program is one of the first useful programs introduced in the original K&#038;R, <em>The C Programming Language<\/em>. It&#8217;s simple code, but you don&#8217;t even need it to count the words in the Unix dictionary file: Each word is kept on a line by itself. Essentially, all you need is to count the lines in the file. Or, if you want to be more exact, count the number of newline characters, <code>\\n<\/code>.<\/p>\n<p>The point of counting the words is to know the limit, for example, so that a program can pluck out a random word without reading beyond the end of the file. Even then, accessing and counting the words (or lines) in a file is a good exercise.<\/p>\n<p>The following code outputs the contents of the dictionary file, aliased to <code>\/usr\/share\/dict\/words<\/code> on Unix\/Linux\/macOS systems. (Though the path may not be consistent on all systems.) I cover accessing the file in <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6054\">last week&#8217;s Lesson<\/a>. 
This code opens the file, reads in and outputs each line, increments a counting variable, and reports the results.<\/p>\n<h3><a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2023_10_14-Lesson-a.c\" rel=\"noopener\" target=\"_blank\">2023_10_14-Lesson-a.c<\/a><\/h3>\n<pre class=\"screen\">\r\n<span class=\"comments\">\/* Look up the dictionary *\/<\/span>\r\n#include &lt;stdio.h&gt;\r\n#include &lt;stdlib.h&gt;\r\n\r\n<span class=\"comments\">\/* this code assumes the following path is valid *\/<\/span>\r\n#define DICTIONARY \"\/usr\/share\/dict\/words\"\r\n#define SIZE 32\r\n\r\nint main()\r\n{\r\n    FILE *dict;\r\n    int wc;\r\n    char word[SIZE],*r;\r\n\r\n    <span class=\"comments\">\/* open the dictionary *\/<\/span>\r\n    dict = fopen(DICTIONARY,\"r\");\r\n    if( dict==NULL )\r\n    {\r\n        fprintf(stderr,\"Unable to open %s\\n\",DICTIONARY);\r\n        exit(1);\r\n    }\r\n\r\n    <span class=\"comments\">\/* read and tally the words *\/<\/span>\r\n    wc = 0;\r\n    while( !feof(dict) )\r\n    {\r\n        r = fgets(word,SIZE,dict);    <span class=\"comments\">\/* read a word *\/<\/span>\r\n        if( r==NULL )\r\n            break;\r\n        printf(\"%s\",word);    <span class=\"comments\">\/* words are \\n terminated *\/<\/span>\r\n        wc++;\r\n    }\r\n\r\n    <span class=\"comments\">\/* results *\/<\/span>\r\n    printf(\"The dictionary file contains %d words\\n\",wc);\r\n\r\n    <span class=\"comments\">\/* close *\/<\/span>\r\n    fclose(dict);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>The dictionary file name is held in defined constant <code>DICTIONARY<\/code>. Variable <code>wc<\/code> counts the words, or lines in the file. The <em>fgets()<\/em> statement reads in each line, where the string length is set to the value defined by <code>SIZE<\/code>, or 32 characters (31 characters plus the null character).<\/p>\n<p>A <em>printf()<\/em> statement outputs each word. 
If you eliminate this statement, the code runs faster, though it&#8217;s pretty fast either way.<\/p>\n<p>Here is the output from my system:<\/p>\n<p><code>The dictionary file contains 104334 words<\/code><\/p>\n<p>To read only the newlines in the file, which yields the same result, replace the <code>word[]<\/code> array with single-character variable <code>ch<\/code>. Modify the <em>while<\/em> loop as well:<\/p>\n<pre class=\"screen\">\r\n    <span class=\"comments\">\/* read and tally the words *\/<\/span>\r\n    wc = 0;\r\n    while( !feof(dict) )\r\n    {\r\n        ch = fgetc(dict);\r\n        if( ch=='\\n' )\r\n            wc++;\r\n    }<\/pre>\n<p>The output is the same, though this update makes the code run slower, probably because of the overhead of calling <em>fgetc()<\/em> once for every character.<\/p>\n<p>You can obtain this updated code <a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2023_10_14-Lesson-b.c\" rel=\"noopener\" target=\"_blank\">on GitHub<\/a>.<\/p>\n<p>I continue my exploration of the dictionary file in <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6070\">next week&#8217;s Lesson<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you are to code a dictionary program, it helps to know how many words it contains. 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=6065\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-6065","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6065","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6065"}],"version-history":[{"count":4,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6065\/revisions"}],"predecessor-version":[{"id":6081,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/6065\/revisions\/6081"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6065"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6065"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6065"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}