Adding Unicode support to a C program

Created: Tue Mar 19 03:26:48 CET 2019

Last modified: Tue Mar 19 03:52:36 CET 2019


I don’t know how it works, though I kinda know it does ;( This post may not be THAT interesting since I wrote it in 10 minutes.

Okay so this will be a very short post which should apply to a range of small programs.

This post has a (short) prereq! It’s my three-day-old post, Programming new features for file2c(1).

Basically, the goal is to get this:

echo せ | file2c -c
# L'せ',L'\012'

Instead of this:

echo せ | file2c -c
# 'ã','','','\012'

Just follow the steps:

Headers

wchar.h contains a definition of the wchar_t type and declares getwc(); locale.h declares setlocale().

#include <wchar.h>
#include <locale.h>

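Now change the type of c from int to wchar_t. (Strictly speaking, getwc() returns a wint_t, so that is the type to use if c gets compared against WEOF.)

Also, replace occurrences of getchar() with getwc(stdin).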

Calling setlocale()

Like this, at the beginning of main():

(void)setlocale(LC_ALL, "");

In the later switch statement

We use %lc instead of %c. Also, don’t forget to put an L prefix inside the format strings, so that the program emits wide character literals such as L'せ'.

linepos += printf("L'%lc'", c);

Those characters can then be printed back using printf:

printf("%lc", L'せ');

For reference, it works with the -ansi -pedantic options.

I don’t know how portable it is. I will try to do some research and compile what I find in a blog post or a wiki article.

source code