After you set the necessary locale for your program, you’re free to use the wide character functions defined in the wchar.h header file. For some reason, this process is poorly documented on the Internet, which is probably why you’re here.
From last week’s Lesson, you learned of the setlocale() function, which is key to creating a program environment capable of outputting wide characters. The setlocale() function takes two arguments: a category constant and a locale string.
You’ll find several category constants, each of which relates to a specific locale element, such as number formatting, date and time, monetary symbols, and so on. The two often used when playing with wide characters are LC_ALL and LC_CTYPE.
The LC_ALL constant sets the program’s entire locale. LC_CTYPE is specific to text, so it’s the one I use.
For the locale string, you must specify the proper character set. What you’re after for wide characters/Unicode is the UTF-8 encoding. The string to use is “UTF-8” and the full statement is:
setlocale(LC_CTYPE,"UTF-8");
Also acceptable, and necessary on systems where the bare “UTF-8” string isn’t recognized, is the specific language tag for your region, such as:
setlocale(LC_CTYPE,"en_US.UTF-8");
Where en_US is American English. Locale names can be case-sensitive, so en_US is the safer spelling.
Once the locale is set, you can employ the wide character functions defined in wchar.h in your code. For example, the putwchar() function is the wide-character counterpart to the putchar() function. Its argument is a wchar_t value, a Unicode character.
The following code displays several Unicode characters to standard output.
#include <locale.h>
#include <wchar.h>

int main()
{
    wchar_t hello[7] = {
        0x41f, 0x440, 0x438, 0x432, 0x435, 0x442, 0x021
    };
    int x;

    setlocale(LC_CTYPE,"UTF-8");
    for(x=0;x<7;x++)
        putwchar(hello[x]);
    putchar('\n');

    return(0);
}
Only the locale.h and wchar.h headers are required; the wchar.h header includes stdio.h, so you don’t need to specify it again. (This inclusion may not hold for every C compiler, so if you get an error, include the stdio.h header.)
An array of wchar_t characters is defined at Line 6. It’s not a string. It doesn’t end with a null character.
Line 11 sets the locale to output UTF-8 characters.
The for loop at Line 12 sends the hello[] array’s Unicode/wchar_t characters to standard output courtesy of the putwchar() function.
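The putchar() function at Line 14 adds a newline, which could have been part of the hello[] array, but I wanted to try mixing text-output methods. (ASCII code 0x21, the exclamation point, is already part of the array.) Be aware that this mix isn’t portable: the C standard doesn’t allow byte output functions such as putchar() on a stream that has already handled wide output, so on some systems the newline never appears. If that happens, use putwchar(L'\n') instead.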
Here’s the output:
Привет!
To create more string-like output, you can modify the hello[] array to include a null character and, instead of looping through the elements, use the fputws() output function:
#include <locale.h>
#include <wchar.h>

int main()
{
    wchar_t hello[] = {
        0x41f, 0x440, 0x438, 0x432, 0x435, 0x442,
        '!', '\n', '\0'
    };

    setlocale(LC_CTYPE,"UTF-8");
    fputws(hello,stdout);

    return(0);
}
The hello[] array now includes the ASCII characters !, newline, and null. The fputws() function at Line 12 sends that wide-character string to standard output. (An equivalent putws() macro doesn’t exist.)
As you might guess, a wide-character version of the printf() function is available, wprintf(), along with its various sisters for different types of formatted wide-character output. It’s not a straightforward version of printf(), as you’re dealing with wide (long) characters. I’ll explore the quirks in next week’s Lesson.
With: $ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
(1) I had to specify "en_US.UTF-8" for the correct Unicode output. With only "UTF-8" each character was output as ‘?’. (2) putchar('\n') did not work but putwchar('\n') did work. (3) Use of fputws() required #include <stdio.h>, otherwise gcc did not recognize ‘stdout’.
Interesting observations, and thanks for sharing because I’m sure a lot of others will have similar issues. Again, a lot of this stuff is compiler-dependent, such as the header file issues. So I’m glad you shared your solutions. Thanks!