{"id":282,"date":"2013-09-14T00:01:05","date_gmt":"2013-09-14T08:01:05","guid":{"rendered":"http:\/\/c-for-dummies.com\/blog\/?p=282"},"modified":"2013-09-07T07:19:36","modified_gmt":"2013-09-07T15:19:36","slug":"conversion-character-abuse","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=282","title":{"rendered":"Conversion Character Abuse"},"content":{"rendered":"<p>The <em>printf()<\/em> function is most concerned with getting the number of conversion characters &#8212; the <code>%<\/code> placeholders &#8212; to match the number of variables specified. Beyond that, it&#8217;s rather ambivalent as to whether the types match properly.<br \/>\n<!--more--><br \/>\nLast week&#8217;s <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=209\">blog lesson<\/a> discussed the virtues of <em>signed<\/em> and <em>unsigned<\/em> variable types, specifically with regards to how information is stored in memory. The compiler treats both types differently, but the information in memory is unaffected by how the compiler &#8212; and the running program &#8212; see the value.<\/p>\n<p>Yeah, this is weird, but as long as you mind your <em>signed<\/em> and <em>unsigned<\/em> variable types in the code, you&#8217;ll be okay.<\/p>\n<p>What gets even weirder is when the <em>printf()<\/em> conversion characters come into play.<\/p>\n<p>If you&#8217;ve read my books or worked some of the examples on this web site, you&#8217;ve seen me occasionally use the <code>%d<\/code> placeholder in a <em>printf()<\/em> statement to display a <em>char<\/em> value. You&#8217;d expect to use the <code>%c<\/code> placeholder, which displays the character represented by the stored value. 
When you use <code>%d<\/code>, however, you see the character&#8217;s code value, which is closer to the information actually stored in memory.<\/p>\n<p>The conversion characters are merely interpreters.<\/p>\n<p>Consider the following code:<\/p>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n\r\nint main()\r\n{\r\n    char c = 0x40;\r\n\r\n    printf(\"%c\\n\",c);\r\n    printf(\"%d\\n\",c);\r\n    printf(\"%u\\n\",c);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>In the code, the <em>char<\/em> variable <code>c<\/code> is displayed by using three different conversion characters: <code>%c<\/code> shows the result as a character, <code>%d<\/code> shows the result as a <em>signed int<\/em> value, and <code>%u<\/code> shows the result as an <em>unsigned int<\/em> value. The code compiles without any warnings or errors.<\/p>\n<p>Of course, it&#8217;s not a conversion character free-for-all in the <em>printf()<\/em> function. If you add another line to the code:<\/p>\n<p><code>printf(\"%s\\n\",c);<\/code><\/p>\n<p>You see a compiler warning displayed about mismatched types, specifically that variable <code>c<\/code> is not a pointer. The compiler is very concerned about pointer (and therefore string) errors because they deal with memory locations. The operating system jealously protects memory. 
Running the code after such a warning most likely generates a segmentation fault or other hideous error.<\/p>\n<p>Beyond pointers, the use of the proper conversion character in <em>printf()<\/em> is pretty much up to interpretation, which brings me to the puzzle presented in last week&#8217;s lesson:<\/p>\n<pre><code>140\t4294967180\r\n141\t4294967181\r\n142\t4294967182\r\n143\t4294967183<\/code><\/pre>\n<p>When you use the <code>%u<\/code> placeholder on a <em>signed<\/em> variable, the results are not open to interpretation: They&#8217;re wrong.<\/p>\n<p>Here&#8217;s the code that generated the above output:<\/p>\n<pre class=\"screen\">\r\n#include &lt;stdio.h&gt;\r\n\r\nint main()\r\n{\r\n    unsigned char a;\r\n    signed char b;\r\n    int x;\r\n\r\n    a = b = 0;\r\n    for(x=0;x&lt;400;x++)\r\n    {\r\n        printf(\"%3d\\t%3u\\n\",a,b);\r\n        a++; b++;\r\n    }\r\n    return(0);\r\n}<\/pre>\n<p>Variable <code>b<\/code> is declared as a <em>signed<\/em> value. It can be positive or negative. When the <code>%u<\/code> (or in the code, <code>%3u<\/code>) placeholder is used, the <em>signed<\/em> value stored is interpreted as <em>unsigned<\/em>. The same value is stored in variable <code>a<\/code>, an <em>unsigned char<\/em> variable. So how does the computer make 4294967183 out of 143?<\/p>\n<p>Before unraveling that riddle, remember that the range of a <em>signed char<\/em> variable is from -128 through 127. Because <code>b<\/code> is declared as <em>signed<\/em>, any value over 127 placed into that storage container is interpreted as negative. Variable <code>a<\/code> sees the value as positive. So when <em>unsigned<\/em> <code>a<\/code> is 128, <em>signed<\/em> <code>b<\/code> is -128. Likewise, when <em>unsigned<\/em> <code>a<\/code> is 255, <em>signed<\/em> <code>b<\/code> is -1. Then both values &#8220;roll over&#8221; to zero. 
(Review the output from <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=209\">last week&#8217;s blog post<\/a> for a visual example.)<\/p>\n<p>If you can fathom that concept, then it would seem sensible for the <code>%u<\/code> placeholder merely to convert the value -128 to 128 for both variables <code>a<\/code> and <code>b<\/code>. That&#8217;s not what happens, of course.<\/p>\n<p>The reason for output going to 4294967183 is that a <em>char<\/em> value passed to <em>printf()<\/em> is promoted to an <em>int<\/em>, which is four bytes on my compiler. The promotion preserves the sign, so when <code>a<\/code> is 143, the <em>signed char<\/em> value in <code>b<\/code>, -113, becomes the <em>int<\/em> value -113. When the improper <code>%u<\/code> conversion character is used, the <em>printf()<\/em> function reads all four of those bytes as an <em>unsigned int<\/em>, not as a <em>char<\/em> variable. The result is 4294967296 minus 113, or 4294967183, and the output looks screwy.<\/p>\n<p>Bottom line: Although you can get away with substituting conversion characters for some variable types, and such substitution can be put to good use, be careful! When you get weird output, double-check that the types match.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The C language&#8217;s rather carefree way of dealing with variables can bring up some interesting puzzles regarding the <em>printf()<\/em> function&#8217;s use of conversion characters. 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=282\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-282","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/282","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=282"}],"version-history":[{"count":5,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/282\/revisions"}],"predecessor-version":[{"id":287,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/282\/revisions\/287"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=282"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=282"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=282"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}