{"id":5963,"date":"2023-07-29T00:01:13","date_gmt":"2023-07-29T07:01:13","guid":{"rendered":"https:\/\/c-for-dummies.com\/blog\/?p=5963"},"modified":"2023-07-22T10:32:26","modified_gmt":"2023-07-22T17:32:26","slug":"using-scanf-to-build-a-string-part-iv","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=5963","title":{"rendered":"Using <em>scanf()<\/em> to Build a String &#8211; Part IV"},"content":{"rendered":"<p>I refer to the process of converting special characters into strings as tokenizing. The token is a character or string that serves as a code. This code is translated into something else, which allows the program to deal with complex items in a simple manner.<br \/>\n<!--more--><br \/>\nBecause the <em>scanf()<\/em> function&#8217;s <code>%s<\/code> conversion skips whitespace characters (space, tab, newline) and never stores them, these are the three characters I plan to represent with tokens. To update the code from <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5957\">last week&#8217;s Lesson<\/a>, I add a <em>token()<\/em> function to convert the special strings (tokens), starting with the word END to terminate the string.<\/p>\n<p>The <em>token()<\/em> function accepts the string generated from the <em>scanf()<\/em> function and compares it with END. Originally I had the function return a single <em>char<\/em> value. The problem was the overhead of handling single characters in the <em>main()<\/em> function. So instead of a single character, the <em>token()<\/em> function returns a string (<em>char<\/em> pointer). 
For the END token, the string (pointer) returned is <code>NULL<\/code>.<\/p>\n<h3><a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2023_07_29-Lesson-a.c\" rel=\"noopener\" target=\"_blank\">2023_07_29-Lesson-a.c<\/a><\/h3>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n#include &lt;stdlib.h&gt;\r\n#include &lt;string.h&gt;\r\n\r\n#define SIZE 16\r\n\r\nchar *token(char *s)\r\n{\r\n    <span class=\"comments\">\/* test for special strings *\/<\/span>\r\n    if( strcmp(s,\"END\")==0 )\r\n        return(NULL);\r\n\r\n    return(s);\r\n}\r\n\r\nint main()\r\n{\r\n    char *b,*s;\r\n\r\n    <span class=\"comments\">\/* allocate\/initialize buffers *\/<\/span>\r\n    b = malloc( SIZE * sizeof(char) );    <span class=\"comments\">\/* input *\/<\/span>\r\n    s = malloc( sizeof(char) );            <span class=\"comments\">\/* string *\/<\/span>\r\n    if( b==NULL || s==NULL )\r\n    {\r\n        fprintf(stderr,\"Memory allocation error\\n\");\r\n        exit(1);\r\n    }\r\n    <span class=\"comments\">\/* initialize string storage *\/<\/span>\r\n    *b = *s = '\\0';\r\n\r\n    <span class=\"comments\">\/* fetch input *\/<\/span>\r\n    printf(\"Type some Text: \");\r\n    while(1)\r\n    {\r\n        scanf(\"%s\",b);\r\n        b = token(b);\r\n        if( !b )    <span class=\"comments\">\/* NULL *\/<\/span>\r\n            break;\r\n        <span class=\"comments\">\/* copy the word *\/<\/span>\r\n            <span class=\"comments\">\/* add two: space and null char *\/<\/span>\r\n        s = realloc(s,strlen(s) + strlen(b) + 2);\r\n        if( s==NULL )\r\n        {\r\n            fprintf(stderr,\"Reallocation error\\n\");\r\n            exit(1);\r\n        }\r\n        strcat(s,b);\r\n        strcat(s,\" \");\r\n    }\r\n\r\n    <span class=\"comments\">\/* output results *\/<\/span>\r\n    puts(s);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>In the <em>main()<\/em> function, the <em>scanf()<\/em> function fetches string <code>b<\/code>, 
which is immediately sent to the <em>token()<\/em> function. The string is reassigned when the function returns, replacing the original: <code>b = token(b);<\/code><\/p>\n<p>The <em>token()<\/em> function itself looks only for the text END. If found, <code>NULL<\/code> is returned. Otherwise, the string input (its address stored in <code>s<\/code>) is passed back to the calling function. Remember, this update to the code is merely the first step, which is to create the <em>token()<\/em> function.<\/p>\n<p>After the <em>token()<\/em> function call, an <em>if<\/em> statement tests whether <code>NULL<\/code> is returned: <code>if( !b )<\/code>, which is TRUE when <code>b<\/code> is <code>NULL<\/code>. If so, the loop ends and the string is output.<\/p>\n<p>The program runs the same as the previous version, which is great. The next step is to update it to scan for tokens SP for space, NL for newline, and TB for tab. This update requires revising both the <em>token()<\/em> and <em>main()<\/em> functions.<\/p>\n<h3><a href=\"https:\/\/github.com\/dangookin\/C-For-Dummies-Blog\/blob\/master\/2023_07_29-Lesson-b.c\" rel=\"noopener\" target=\"_blank\">2023_07_29-Lesson-b.c<\/a><\/h3>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n#include &lt;stdlib.h&gt;\r\n#include &lt;string.h&gt;\r\n\r\n#define SIZE 16\r\n\r\nchar *token(char *s)\r\n{\r\n    static char space[] = \" \";\r\n    static char newline[] = \"\\n\";\r\n    static char tab[] = \"\\t\";\r\n\r\n    <span class=\"comments\">\/* test for special strings *\/<\/span>\r\n    if( strcmp(s,\"END\")==0 )\r\n        return(NULL);\r\n    if( strcmp(s,\"SP\")==0 )\r\n        return(space);\r\n    if( strcmp(s,\"NL\")==0 )\r\n        return(newline);\r\n    if( strcmp(s,\"TB\")==0 )\r\n        return(tab);\r\n\r\n    return(s);\r\n}\r\n\r\nint main()\r\n{\r\n    char *b,*s;\r\n\r\n    <span class=\"comments\">\/* allocate\/initialize buffers *\/<\/span>\r\n    b = malloc( SIZE * sizeof(char) );    <span 
class=\"comments\">\/* input *\/<\/span>\r\n    s = malloc( sizeof(char) );            <span class=\"comments\">\/* string *\/<\/span>\r\n    if( b==NULL || s==NULL )\r\n    {\r\n        fprintf(stderr,\"Memory allocation error\\n\");\r\n        exit(1);\r\n    }\r\n    <span class=\"comments\">\/* initialize string storage *\/<\/span>\r\n    *b = *s = '\\0';\r\n\r\n    <span class=\"comments\">\/* fetch input *\/<\/span>\r\n    printf(\"Type some Text: \");\r\n    while(1)\r\n    {\r\n        scanf(\"%s\",b);\r\n        b = token(b);\r\n        if( !b )    <span class=\"comments\">\/* NULL *\/<\/span>\r\n            break;\r\n        <span class=\"comments\">\/* copy the word *\/<\/span>\r\n            <span class=\"comments\">\/* add one: null char *\/<\/span>\r\n        s = realloc(s,strlen(s) + strlen(b) + 1);\r\n        if( s==NULL )\r\n        {\r\n            fprintf(stderr,\"Reallocation error\\n\");\r\n            exit(1);\r\n        }\r\n        strcat(s,b);\r\n    }\r\n\r\n    <span class=\"comments\">\/* output results *\/<\/span>\r\n    puts(s);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>The <em>token()<\/em> function contains three <em>static char<\/em> arrays representing the replacement strings for the space, newline, and tab characters. As with the scan for END, when tokens SP, NL, or TB are encountered, their string equivalents are returned.<\/p>\n<p>In the <em>main()<\/em> function, I&#8217;ve made two changes. First, the reallocation of string <code>s<\/code> no longer needs storage for the added space character. Only the lengths of the existing string (<code>s<\/code>) and the returned string (<code>b<\/code>), plus one byte for the null character, are required:<\/p>\n<p><code>s = realloc(s,strlen(s) + strlen(b) + 1);<\/code><\/p>\n<p>Second, I eliminated the <em>strcat()<\/em> call that appended a space after each word.<\/p>\n<p>Here&#8217;s a sample run:<\/p>\n<p><code>Type some Text: Hello, SP world! END<br \/>\nHello, world!<\/code><\/p>\n<p>Another more aggressive run:<\/p>\n<p><code>Type some Text: This SP is SP a SP test NL Hello, SP world! 
END<br \/>\nThis isSPaSPtestHello,NLSPworld!<\/code><\/p>\n<p>Ugh. Yes, this botched output means I messed up a pointer thing. I cover what&#8217;s going wrong and how to fix it in next week&#8217;s Lesson.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Time to get fancy with scanning input, allowing for special characters. <a href=\"https:\/\/c-for-dummies.com\/blog\/?p=5963\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-5963","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5963","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5963"}],"version-history":[{"count":4,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5963\/revisions"}],"predecessor-version":[{"id":5977,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/5963\/revisions\/5977"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5963"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5963"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags
&post=5963"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}