{"id":2681,"date":"2017-09-08T00:01:43","date_gmt":"2017-09-08T07:01:43","guid":{"rendered":"http:\/\/c-for-dummies.com\/blog\/?p=2681"},"modified":"2017-09-02T08:39:05","modified_gmt":"2017-09-02T15:39:05","slug":"the-url-decoding-filter-solution","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=2681","title":{"rendered":"The URL Decoding Filter &#8211; Solution"},"content":{"rendered":"<p>Unwinding percent-encoding involves three steps:<\/p>\n<ol>\n<li>Pass through the unchanged characters.<\/li>\n<li>Change <code>+<\/code> back into a space.<\/li>\n<li>Decode the percent strings, which is the most involved process.<\/li>\n<\/ol>\n<p><!--more--><br \/>\nFor <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=2672\">this month&#8217;s Exercise<\/a>, my solution is a simple I\/O filter. In fact, for most characters, the input shoots straight back out of the filter as output. Here is the <em>main()<\/em> function in my solution:<\/p>\n<pre class=\"screen\">\r\nint main()\r\n{\r\n    int i;\r\n\r\n    while( (i=getchar()) != EOF )\r\n    {\r\n        if( i=='%' )\r\n            pdecode();\r\n        else if( i=='+' )\r\n            putchar(' ');\r\n        else\r\n            putchar(i);\r\n    }\r\n\r\n    return(0);\r\n}<\/pre>\n<p>The <em>while<\/em> loop keeps spinning until <em>getchar()<\/em> returns <code>EOF<\/code> (end of file). The <em>getchar()<\/em> function reads input; the <em>putchar()<\/em> function writes output. The <em>if-else-if-else<\/em> structure handles the three steps for decoding percent-encoding. As you can see in the <em>main()<\/em> function, I handle the steps backwards:<\/p>\n<p>Third, when the <code>%<\/code> character is encountered, execution shuffles off to the <em>pdecode()<\/em> function. 
I could do the translation in the <em>main()<\/em> function, but by using <em>pdecode()<\/em> I keep <em>main()<\/em> short and readable.<\/p>\n<p>Second, if the input character is a <code>+<\/code>, the space character is output.<\/p>\n<p>First, anything left over is output directly. This final condition catches all alphanumeric text as well as the four exceptions in HTML5: <code>- . _ *<\/code><\/p>\n<p>The <em>pdecode()<\/em> function grabs the next two characters of input; the <code>%<\/code> has already been consumed and is not output.<\/p>\n<pre class=\"screen\">\r\nvoid pdecode(void)\r\n{\r\n    int a;\r\n\r\n    a = char2hex() * 0x10;\r\n    a += char2hex();\r\n    putchar(a);\r\n}<\/pre>\n<p>Variable <code>a<\/code> builds the character code translated from the next two characters of input. The <em>char2hex()<\/em> function reads and tests input. The value returned is 0x0 through 0xF (15), translated from characters 0 through 9 and A through F.<\/p>\n<p>The value of the first character read is multiplied by 0x10 (16) and stored in variable <code>a<\/code>; the value of the next character is added to that result. So, for example, string A1 is translated into <code>0xA * 0x10 + 0x1<\/code>, which becomes 0xA1. That value is output by <code>putchar(a)<\/code>.<\/p>\n<p>The <em>char2hex()<\/em> function&#8217;s job is to read a character of input, confirm that it&#8217;s a valid hexadecimal digit, and return that digit&#8217;s value. On an error, the function exits the program:<\/p>\n<pre class=\"screen\">\r\nint char2hex(void)\r\n{\r\n    int d;\r\n\r\n    d = getchar();\r\n    if( d==EOF ) exit(1);   <span class=\"comments\">\/* quit on EOF *\/<\/span>\r\n    if( isdigit(d) )\r\n        return( d - '0' );\r\n    d = toupper(d);\r\n    if( d&gt;='A' && d&lt;='F' )\r\n        return( d - 'A' + 0x0A );\r\n    else\r\n        exit(1);\r\n}<\/pre>\n<p>The <em>getchar()<\/em> function fetches the next character from the input stream. 
If the value returned is <code>EOF<\/code>, the function immediately exits the program. This, and all exit conditions in this function, indicate a poorly formed percent-encoded string, so bailing out is a valid move.<\/p>\n<p>The next <em>if<\/em> test checks for a digit, 0 to 9. If so, that digit&#8217;s value is returned, 0 to 9.<\/p>\n<p>For non-digit characters, the <em>toupper()<\/em> function translates input to uppercase, and another <em>if<\/em> test confirms whether the character is in the range <code>'A'<\/code> to <code>'F'<\/code>. If so, the character&#8217;s value, 10 through 15, is returned.<\/p>\n<p>The final <em>else<\/em> catches any non-hexadecimal character and terminates the program, as the percent-encoded string is malformed or invalid.<\/p>\n<p><a href=\"http:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2017\/08\/09exercise.c\">Click here<\/a> to view my entire solution. Your solution may vary, of course, but if it handles the translation, it&#8217;s good.<\/p>\n<p>To test the program, take the encoded output from <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=2626\">last month&#8217;s Exercise<\/a> and run it through the solution for this month&#8217;s Exercise. If the final string matches the initial string, everything works. 
For example:<\/p>\n<pre class=\"screen\">\r\n$ cat 09solution.txt\r\nhttp:\/\/c-for-dummies.com\/blog\/?p=2681\r\n$ cat 09solution.txt | .\/percent-encode\r\nhttp%3A%2F%2Fc-for-dummies.com%2Fblog%2F%3Fp%3D2681%0A$\r\n$ cat 09solution.txt | .\/percent-encode | .\/percent-decode \r\nhttp:\/\/c-for-dummies.com\/blog\/?p=2681<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Unwinding percent-encoding involves three steps: Pass through the unchanged characters. Change + back into a space. 
Decode the percent strings, which is the most involved process.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-2681","post","type-post","status-publish","format-standard","hentry","category-solution"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2681","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2681"}],"version-history":[{"count":5,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2681\/revisions"}],"predecessor-version":[{"id":2700,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2681\/revisions\/2700"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2681"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2681"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2681"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}