{"id":2578,"date":"2017-07-01T00:01:16","date_gmt":"2017-07-01T07:01:16","guid":{"rendered":"http:\/\/c-for-dummies.com\/blog\/?p=2578"},"modified":"2017-07-08T07:41:50","modified_gmt":"2017-07-08T14:41:50","slug":"wide-characters-and-unicode-part-ii","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=2578","title":{"rendered":"Wide Characters and Unicode, Part II"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2017\/06\/wide-characters.png\" alt=\"\" width=\"250\" height=\"66\" class=\"alignnone size-full wp-image-2575\" \/><\/p>\n<p>After you set the necessary locale for your program, you&#8217;re free to use the wide-character functions defined in the <code>wchar.h<\/code> header file. For some reason, this process is poorly documented on the Internet, which is probably why you&#8217;re here.<br \/>\n<!--more--><br \/>\nIn <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=2568\">last week&#8217;s Lesson<\/a>, you learned about the <em>setlocale()<\/em> function, which is key to creating a program environment capable of outputting wide characters. The <em>setlocale()<\/em> function takes two arguments: a category constant and a locale string.<\/p>\n<p>You&#8217;ll find several category constants, each of which relates to a specific locale element, such as number formatting, date and time, monetary symbols, and so on. The two most often used when playing with wide characters are <code>LC_ALL<\/code> and <code>LC_CTYPE<\/code>.<\/p>\n<p>The <code>LC_ALL<\/code> constant sets the program&#8217;s entire locale. <code>LC_CTYPE<\/code> covers only character handling (classification and conversion), so it&#8217;s the one I use.<\/p>\n<p>For the locale string, you must specify the proper character set. What you&#8217;re after for wide characters\/Unicode is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/UTF-8\" target=\"_blank\">UTF-8 encoding<\/a>. 
The string to use is &#8220;UTF-8&#8221; and the full statement is:<\/p>\n<p><code>setlocale(LC_CTYPE,\"UTF-8\");<\/code><\/p>\n<p>Also acceptable is the specific language tag for your region, such as:<\/p>\n<p><code>setlocale(LC_CTYPE,\"en_US.UTF-8\");<\/code><\/p>\n<p>Here, <code>en_US<\/code> is American English. If <em>setlocale()<\/em> returns <code>NULL<\/code>, the system doesn&#8217;t recognize the string you gave it; the bare &#8220;UTF-8&#8221; name isn&#8217;t available everywhere, so try the full form.<\/p>\n<p>Once the locale is set, you can employ the wide-character functions defined in <code>wchar.h<\/code> in your code. For example, the <em>putwchar()<\/em> function is the wide-character counterpart to the <em>putchar()<\/em> function. Its argument is a <em>wchar_t<\/em> value, a Unicode character.<\/p>\n<p>The following code displays several Unicode characters to standard output.<\/p>\n<pre class=\"screen\">\r\n#include &lt;locale.h&gt;\r\n#include &lt;wchar.h&gt;\r\n\r\nint main()\r\n{\r\n    wchar_t hello[7] = {\r\n        0x41f, 0x440, 0x438, 0x432, 0x435, 0x442, 0x021\r\n    };\r\n    int x;\r\n\r\n    setlocale(LC_CTYPE,\"UTF-8\");\r\n    for(x=0;x&lt;7;x++)\r\n        putwchar(hello[x]);\r\n    putwchar(L'\\n');\r\n\r\n    return(0);\r\n}<\/pre>\n<p>Only the <code>locale.h<\/code> and <code>wchar.h<\/code> headers are required: <em>setlocale()<\/em> is declared in the first, <em>wchar_t<\/em> and <em>putwchar()<\/em> in the second. On many systems the <code>wchar.h<\/code> header also includes <code>stdio.h<\/code>, but this inclusion may not hold for every C compiler, so if you get an error about a standard I\/O name such as <code>stdout<\/code>, include the <code>stdio.h<\/code> header.<\/p>\n<p>An array of <em>wchar_t<\/em> characters is defined at Line 6. It&#8217;s not a string, because it doesn&#8217;t end with a null character.<\/p>\n<p>Line 11 sets the locale to output UTF-8 characters.<\/p>\n<p>The <em>for<\/em> loop at Line 12 sends the <code>hello[]<\/code> array&#8217;s Unicode\/<em>wchar_t<\/em> characters to standard output courtesy of the <em>putwchar()<\/em> function.<\/p>\n<p>The <em>putwchar()<\/em> call at Line 14 adds a newline, which could have been part of the <code>hello[]<\/code> array. It&#8217;s tempting to use <em>putchar()<\/em> here, but the C standard doesn&#8217;t permit byte-output functions on a stream that&#8217;s already wide-oriented, and mixing the two can misbehave on some systems. 
(The exclamation point, ASCII code 0x21, is part of the array.)<\/p>\n<p>Here&#8217;s the output:<\/p>\n<pre><code>&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;&#33;<\/code><\/pre>\n<p>To create more string-like output, you can modify the <code>hello[]<\/code> array to include a null character and, instead of looping through the elements, use the <em>fputws()<\/em> output function:<\/p>\n<pre class=\"screen\">\r\n#include &lt;locale.h&gt;\r\n#include &lt;wchar.h&gt;\r\n\r\nint main()\r\n{\r\n    wchar_t hello[] = {\r\n        0x41f, 0x440, 0x438, 0x432, 0x435,\r\n        0x442, '!', '\\n', '\\0'\r\n    };\r\n\r\n    setlocale(LC_CTYPE,\"UTF-8\");\r\n    fputws(hello,stdout);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>The <code>hello[]<\/code> array now includes the ASCII characters !, newline, and null. The <em>fputws()<\/em> function at Line 12 sends that wide-character string to standard output. (Standard C offers no <em>putws()<\/em> counterpart to <em>puts()<\/em>.)<\/p>\n<p>As you might guess, a wide-character version of the <em>printf()<\/em> function is available, <em>wprintf()<\/em>, along with its various sisters for different types of formatted wide-character output. It&#8217;s not a straightforward version of <em>printf()<\/em>, as you&#8217;re dealing with wide (long) characters. I&#8217;ll explore the quirks in <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=2593\">next week&#8217;s Lesson<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Configure the proper locale details, and the wide assortment of wide-character output functions awaits your pleasure. 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=2578\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-2578","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2578","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2578"}],"version-history":[{"count":8,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2578\/revisions"}],"predecessor-version":[{"id":2613,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2578\/revisions\/2613"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2578"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2578"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2578"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}