{"id":5072,"date":"2021-12-04T00:01:50","date_gmt":"2021-12-04T08:01:50","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=5072"},"modified":"2021-11-27T10:07:11","modified_gmt":"2021-11-27T18:07:11","slug":"parsing-words-with-the-strspn-function","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=5072","title":{"rendered":"Parsing Words with the <em>strspn()<\/em> Function"},"content":{"rendered":"<p>I&#8217;ve dabbled in the topic of parsing words from a string several times on this blog: <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=3612\">Slicing Words from a String<\/a>, <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=3616\">Parse and Count Words in a String<\/a>, and more. I just can&#8217;t get enough! In fact, this Lesson picks up the topic again, continuing my discussion of the <em>strspn()<\/em> and <em>strcspn()<\/em> functions from <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5068\">last week&#8217;s Lesson<\/a>.<br \/>\n<!--more--><br \/>\nYes, it&#8217;s possible to use the <em>strspn()<\/em> function to slice words from a string. This function, which is probably pronounced &#8220;string span,&#8221; scans one string from its start for as long as each character it finds also appears in another string. 
If the second string contains letters of the alphabet, both upper- and lowercase (and forgetting about contractions), the <em>strspn()<\/em> function handily returns the offset of the first word boundary in the first string.<\/p>\n<h3><a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2021_12_04-Lesson.c\" rel=\"noopener\" target=\"_blank\">2021_12_04-Lesson.c<\/a><\/h3>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n#include &lt;string.h&gt;\r\n\r\nint main()\r\n{\r\n    const char *a = \"It was a dark and stormy night\";\r\n    const char *b =\r\n        \"ABCDEFGHIJKLMNOPQRSTUVWXYZ\"\r\n        \"abcdefghijklmnopqrstuvwxyz\"\r\n        ;\r\n    size_t r = 0;\r\n\r\n    do\r\n    {\r\n        printf(\"%s\\n\",a);\r\n        r = strspn(a,b);\r\n        a += r+1;\r\n    }\r\n    while(r&lt;strlen(a));\r\n    printf(\"%s\\n\",a);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>At Lines 6 and 7, I declare <em>const char<\/em> <code>*a<\/code> and <code>*b<\/code> as pointer strings. If you use this construction, remember the <em>const<\/em> qualifier. It tells the compiler that the string must remain unaltered, which is important because these pointers reference string literals. Without it, you can get into trouble by attempting to modify the string.<\/p>\n<p>String <code>*b<\/code> is declared across two lines as two adjacent string literals, which the compiler joins into a single string. This is a valid construction in C. Refer to <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5020\">this Lesson<\/a> if confusion overwhelms you.<\/p>\n<p>The <em>strspn()<\/em> function appears in a <em>do-while<\/em> loop starting at Line 13. First, at Line 15, the entire string is output. Next, at Line 16, the <em>strspn()<\/em> function returns the offset of the first non-alpha character in string <code>*a<\/code>. This offset, <code>r<\/code>, is added to pointer <code>a<\/code> at Line 17, plus one to skip over the space. 
The assumption here is that words are separated by only one space, which is a program flaw, but moving on:<\/p>\n<p>The loop repeats as long as variable <code>r<\/code> (the offset) is less than the length of the string remaining at <code>a<\/code>: <code>while(r&lt;strlen(a))<\/code>. This approach works for the sample string, though an extra <em>printf()<\/em> statement is required at Line 20 to output the final word in the string. It isn&#8217;t robust, however: with other strings, such as <code>\"a dark\"<\/code>, adding <code>r+1<\/code> can push the pointer past the string&#8217;s terminating null character.<\/p>\n<p>Here is a sample run:<\/p>\n<p><code>It was a dark and stormy night<br \/>\nwas a dark and stormy night<br \/>\na dark and stormy night<br \/>\ndark and stormy night<br \/>\nand stormy night<br \/>\nstormy night<br \/>\nnight<\/code><\/p>\n<p>The code isn&#8217;t perfect, of course. As I mentioned earlier, it doesn&#8217;t account for contractions (the apostrophe in a word) or more than a single space between words. And as the output shows, it doesn&#8217;t extract the individual words but merely progresses word-by-word through the string.<\/p>\n<p>You could improve the code, but I&#8217;d like to move on to a larger project presented in a series of Lessons: parsing and counting words with the eventual goal of extracting unique words from a chunk of text. This process begins with next week&#8217;s Lesson.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Put the <em>strspn()<\/em> function to work plowing through a string to pluck out words. 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5072\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-5072","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5072","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5072"}],"version-history":[{"count":5,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5072\/revisions"}],"predecessor-version":[{"id":5088,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5072\/revisions\/5088"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5072"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5072"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5072"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}