{"id":2605,"date":"2017-07-15T00:01:34","date_gmt":"2017-07-15T07:01:34","guid":{"rendered":"http:\/\/c-for-dummies.com\/blog\/?p=2605"},"modified":"2019-05-27T11:12:29","modified_gmt":"2019-05-27T18:12:29","slug":"wide-characters-and-unicode-part-iv","status":"publish","type":"post","link":"https:\/\/c-for-dummies.com\/blog\/?p=2605","title":{"rendered":"Wide Characters and Unicode, Part IV"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2017\/06\/wide-characters.png\" alt=\"\" width=\"250\" height=\"66\" class=\"alignnone size-full wp-image-2575\" \/><\/p>\n<p>String input is a weird thing when it comes to wide characters, mostly because how the heck do you type wide characters in a terminal window beyond copy-and-paste?<br \/>\n<!--more--><br \/>\nFor single-character input, the <code>wchar.h<\/code> header defines three functions: <em>fgetwc()<\/em>, <em>getwc()<\/em>, and <em>getwchar()<\/em>.<\/p>\n<p>The <em>fgetwc()<\/em> and <em>getwc()<\/em> functions both read one <em>wint_t<\/em> character (wide-character <em>int<\/em> type) from a named file stream. The <em>getwchar()<\/em> function (actually a macro) reads wide characters from standard input. This is the function I prefer.<\/p>\n<blockquote><p>The <em>wint_t<\/em> wide-character integer type can hold every value of the <em>wchar_t<\/em> &#8220;character&#8221; type plus at least one more. That extra value is the end-of-file marker, <code>WEOF<\/code>, which lies outside the range of valid wide characters. The two types aren&#8217;t necessarily different sizes. You can use plain old <em>wchar_t<\/em> variables with the functions, but when you&#8217;re hunting for the end-of-file marker with <em>fgetwc()<\/em>, <em>getwc()<\/em>, or <em>getwchar()<\/em>, use <em>wint_t<\/em> instead.<\/p><\/blockquote>\n<p>The following code uses the <em>getwchar()<\/em> function to read wide characters one at a time from standard input. 
The <em>getwstring()<\/em> function reads up to <code>count-1<\/code> characters, stopping early at the newline (<code>'\\n'<\/code>) or at end-of-file, and stores the result in the <em>wchar_t<\/em> buffer passed to it, which is <code>input<\/code> in the <em>main()<\/em> function:<\/p>\n<pre class=\"screen\">\r\n#include &lt;locale.h&gt;\r\n#include &lt;wchar.h&gt;\r\n\r\nvoid getwstring(wchar_t *ws,int count)\r\n{\r\n    int x = 0;\r\n    wchar_t *a;\r\n    wint_t wch;             \/* wint_t can hold WEOF *\/\r\n    a = ws;\r\n    while(x&lt;count-1)        \/* leave room for the terminator *\/\r\n    {\r\n        wch = getwchar();\r\n        if( wch==WEOF || wch==L'\\n')\r\n            break;\r\n        *a = (wchar_t)wch;\r\n        a++;\r\n        x++;\r\n    }\r\n    *a = L'\\0';\r\n}\r\n\r\nint main()\r\n{\r\n    wchar_t input[10];\r\n\r\n    setlocale(LC_CTYPE,\"UTF-8\");\r\n\r\n    wprintf(L\"Type some fancy text: \");\r\n    getwstring(input,10);\r\n    wprintf(L\"You typed: %ls!\\n\",input);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>The <em>getwstring()<\/em> function uses the <em>getwchar()<\/em> macro at Line 12 to read wide characters from standard input. The characters are read until a newline or end-of-file is encountered or the buffer is full. 
Then the string is capped with a null character at Line 19.<\/p>\n<p>For the sample run, I copied and pasted text from a web page:<\/p>\n<pre><code>Type some fancy text: \u4f60\u597d\uff0c\u4e16\u754c\r\nYou typed: \u4f60\u597d\uff0c\u4e16\u754c!<\/code><\/pre>\n<p>In the Mac Terminal, you can press Control+Command+Space to see the Emoji and Symbols palette, which you can use to pluck out one character at a time, as shown in Figure 1.<\/p>\n<div id=\"attachment_2607\" style=\"width: 560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2607\" src=\"http:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2017\/07\/0715-figure1.png\" alt=\"\" width=\"550\" height=\"389\" class=\"size-full wp-image-2607\" srcset=\"https:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2017\/07\/0715-figure1.png 550w, https:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2017\/07\/0715-figure1-300x212.png 300w, https:\/\/c-for-dummies.com\/blog\/wp-content\/uploads\/2017\/07\/0715-figure1-424x300.png 424w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><p id=\"caption-attachment-2607\" class=\"wp-caption-text\">Figure 1. The weirdo character palette in the OS X Terminal program.<\/p><\/div>\n<p>Instead of writing your own <em>getwstring()<\/em> function to read a wide-character string, you can use the <em>fgetws()<\/em> function. This function is the wide-character equivalent to my old pal <em>fgets()<\/em>. Here&#8217;s the updated code:<\/p>\n<pre class=\"screen\">\r\n#include &lt;locale.h&gt;\r\n#include &lt;wchar.h&gt;\r\n\r\nint main()\r\n{\r\n    wchar_t input[10];\r\n\r\n    setlocale(LC_CTYPE,\"UTF-8\");\r\n\r\n    wprintf(L\"Type some fancy text: \");\r\n    fgetws(input,10,stdin);\r\n    wprintf(L\"You typed: %ls!\\n\",input);\r\n\r\n    return(0);\r\n}<\/pre>\n<p>As with <em>fgets()<\/em>, the arguments for <em>fgetws()<\/em> are a buffer, the buffer&#8217;s size (counted in wide characters, not bytes), and the file stream. 
For standard input, <em>stdin<\/em> is used, as shown in the code.<\/p>\n<p>The code&#8217;s output is almost the same:<\/p>\n<pre><code>Type some fancy text: \u4f60\u597d\uff0c\u4e16\u754c\r\nYou typed: \u4f60\u597d\uff0c\u4e16\u754c\r\n!<\/code><\/pre>\n<p>The <em>fgetws()<\/em> function, like its <em>fgets()<\/em> twin, reads and retains the newline character, which is why the exclamation point lands on a line by itself. You can view my quickie solution for this effect <a href=\"http:\/\/c-for-dummies.com\/blog\/?p=24\">here<\/a>, if you want to eliminate the newline. I&#8217;ll present another solution in a future Lesson.<\/p>\n<p>One more note: The terminal window assumes that every character displayed has the same width. Historically, terminals dealt only with ASCII text. In a GUI, the terminal assumes a monospaced font. A few Unicode characters, especially emojis, are wider than a single character position in the terminal window. The result is overlap, which you can kind of see in Figure 1.<\/p>\n<p>The <em>wcwidth()<\/em> function returns the number of column positions for a wide character, though its answer depends on the C library&#8217;s character tables and isn&#8217;t always reliable. For example, the pizza slice shown in Figure 1 is more than 1 column wide, yet the <em>wcwidth()<\/em> function returned a character width value of 1, which isn&#8217;t correct. My advice is to be cautious when showing such characters. If you plan on using a specific character, then you can eyeball the width and plan accordingly.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Wide character string input is almost barely possible. 
<a href=\"https:\/\/c-for-dummies.com\/blog\/?p=2605\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-2605","post","type-post","status-publish","format-standard","hentry","category-main"],"_links":{"self":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2605","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2605"}],"version-history":[{"count":6,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2605\/revisions"}],"predecessor-version":[{"id":3633,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2605\/revisions\/3633"}],"wp:attachment":[{"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2605"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2605"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c-for-dummies.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2605"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}