Wide Characters and Unicode, Part II

After you set the necessary locale for your program, you’re free to use the wide character functions defined in the wchar.h header file. For some reason, this process is poorly documented on the Internet, which is probably why you’re here.

In last week’s Lesson, you learned of the setlocale() function, which is key to creating a program environment capable of outputting wide characters. The setlocale() function takes two arguments: a category constant and a locale string.

You’ll find several category constants, each of which relates to a specific locale element, such as number formatting, date and time, monetary symbols, and so on. The two most often used when playing with wide characters are LC_ALL and LC_CTYPE.

The LC_ALL constant sets the program’s entire locale. LC_CTYPE affects only character classification and conversion, which is all that wide-character output needs, so it’s the one I use.

For the locale string, you must specify the proper character set. What you’re after for wide characters/Unicode is the UTF-8 encoding. The bare string “UTF-8” works on some systems, such as macOS; where it does, the full statement is:

setlocale(LC_CTYPE,"UTF-8");

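Also acceptable, and required on some systems (Linux with glibc, for example), is a full locale name that includes the language and region, such as:

setlocale(LC_CTYPE,"en_US.UTF-8");

Where en_US is American English. The name is case-sensitive on many systems, so type the region in uppercase.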
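Because the accepted locale names differ from system to system, it can help to check the value setlocale() returns, which is NULL when the request can’t be honored. The following is a minimal sketch, not a definitive method: the function name first_working_locale() is my own invention, and the list of names you pass to it is an assumption about what your system has installed.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try each locale name in order for LC_CTYPE.
   Returns the index of the first name setlocale() accepts, or -1 if
   every name is rejected. */
int first_working_locale(const char *const names[], size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        /* setlocale() returns NULL when the request can't be honored */
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return (int)i;
    }
    return -1;
}
```

You might call it at the top of main() with a list such as "en_US.UTF-8", "C.UTF-8", and "UTF-8", and print a warning when it returns -1. Which of these names exist depends on the system, so treat that list as a starting point only.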

Once the locale is set, you can employ the wide character functions defined in wchar.h in your code. For example, the putwchar() function is the wide-character counterpart to the putchar() function. Its argument is a wchar_t value, a Unicode character.

The following code displays several Unicode characters to standard output.

#include <locale.h>
#include <wchar.h>

int main()
{
    wchar_t hello[7] = {
        0x41f, 0x440, 0x438, 0x432, 0x435, 0x442, 0x021
    };
    int x;

    setlocale(LC_CTYPE,"UTF-8");
    for(x=0;x<7;x++)
        putwchar(hello[x]);
    putchar('\n');

    return(0);
}

Only the locale.h and wchar.h headers are required on my system, where wchar.h pulls in stdio.h, so you don’t need to specify it again. The C standard doesn’t guarantee this inclusion, so if the compiler complains about putchar() or stdout, include the stdio.h header as well.

An array of wchar_t characters is defined at Line 6. It’s not a string because it doesn’t end with a null character.

Line 11 sets the locale to output UTF-8 characters.

The for loop at Line 12 sends the hello[] array’s Unicode/wchar_t characters to standard output courtesy of the putwchar() function.

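The putchar() function at Line 14 adds a newline, which could have been part of the hello[] array, but I wanted to mix text-output methods. (ASCII code 0x21, the exclamation point, is part of the array.) This mixing worked for me, but it isn’t portable: the C standard gives a stream a byte or wide orientation at its first I/O operation, so calling putchar() on a stdout that putwchar() has already made wide-oriented may output nothing. If the newline goes missing, use putwchar(L'\n') instead.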

Here’s the output:

Привет!
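If mixing putchar() with putwchar() misbehaves on your system, one option is to stay with wide functions for everything sent to the stream. Here is a hedged sketch of that idea. The helper name put_wide_line() and its parameters are hypothetical, not part of any library; it uses putwc(), the stream-specific sibling of putwchar().

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write n wide characters plus a newline to a
   stream using wide-character functions only, so byte and wide output
   are never mixed on that stream.
   Returns 0 on success, -1 if any putwc() call fails. */
int put_wide_line(FILE *out, const wchar_t *chars, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (putwc(chars[i], out) == WEOF)
            return -1;
    }
    /* L'\n' is a wide character constant, the wchar_t form of '\n' */
    if (putwc(L'\n', out) == WEOF)
        return -1;
    return 0;
}
```

In the program above, the for loop and the putchar() call could then be replaced by put_wide_line(stdout,hello,7). A return value of -1 usually means the locale wasn’t set to one that can encode the characters.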

To create more string-like output, you can modify the hello[] array to include a null character and, instead of looping through the elements, use the fputws() output function:

#include <locale.h>
#include <wchar.h>

int main()
{
    wchar_t hello[] = {
        0x41f, 0x440, 0x438, 0x432, 0x435,
        0x442, '!', '\n', '\0'
    };

    setlocale(LC_CTYPE,"UTF-8");
    fputws(hello,stdout);

    return(0);
}

The hello[] array now includes the ASCII characters !, newline, and null. The fputws() function at Line 12 sends that wide-character string to standard output. (The standard library has no putws() function to serve as a counterpart to puts().)
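The same string can also be written as a wide string literal, which saves typing the null character yourself. This sketch assumes a C99 compiler, because it relies on \u universal character names; the names greeting and print_greeting() are my own, used only for illustration. As before, the locale must be set before the function is called.

```c
#include <stdio.h>
#include <wchar.h>

/* The greeting as a wide string literal. Each \uXXXX is a C99
   universal character name; the compiler appends the null character. */
static const wchar_t greeting[] =
    L"\u041f\u0440\u0438\u0432\u0435\u0442!\n";

/* Hypothetical helper: write the greeting to a stream.
   Returns 0 on success, -1 if fputws() reports an error. */
int print_greeting(FILE *out)
{
    return fputws(greeting, out) < 0 ? -1 : 0;
}
```

Calling print_greeting(stdout) after setlocale() takes the place of the fputws() statement in the program above. The L prefix is what makes the literal an array of wchar_t rather than char.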

As you might guess, a wide-character version of the printf() function is available, wprintf(), along with its various sisters for different types of formatted wide-character output. It’s not a straightforward version of printf(), as you’re dealing with wide (long) characters. I’ll explore the quirks in next week’s Lesson.

2 thoughts on “Wide Characters and Unicode, Part II”

  1. With: $ gcc --version
    gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
    (1) I had to specify "en_US.UTF-8" for the correct Unicode output. With only "UTF-8" each character was output as '?'. (2) putchar('\n') did not work but putwchar('\n') did work. (3) Use of fputws() required #include <stdio.h>, otherwise gcc did not recognize 'stdout'.

  2. Interesting observations, and thanks for sharing because I’m sure a lot of others will have similar issues. Again, a lot of this stuff is compiler-dependent, such as the header file issues. So I’m glad you shared your solutions. Thanks!
