Why is my application not displaying Unicode characters correctly?


I decided to convert my Win32 C++ application to a Unicode build, but when I run it I get unreadable characters for Arabic, Chinese, and Japanese text.


If I don't use Unicode, the Arabic text shows up correctly in the edit box:

HWND hEdit = CreateWindowEx(WS_EX_CLIENTEDGE, "Edit", "ا ب ت ث ج ح خ د ذ", WS_CHILD | WS_VISIBLE | WS_BORDER | ES_MULTILINE, 10, 10, 300, 200, hWnd, (HMENU)100, GetModuleHandle(NULL), NULL);

SetWindowText(hEdit, "صباح الخير");

The output looks OK and works fine (no Unicode).

  • With Unicode:

I added before including the header:

#define UNICODE
#include <windows.h>

Now, in the window procedure:

case WM_CREATE:{
    HWND hEdit = CreateWindowExW(WS_EX_CLIENTEDGE, L"Edit", L"ا ب ت ث ج ح خ د ذ", WS_CHILD | WS_VISIBLE | WS_BORDER | ES_MULTILINE, 10, 10, 300, 200, hWnd, (HMENU)100, GetModuleHandle(NULL), NULL);

    // Even when I send a message to change the text, I get unreadable characters!
    SendDlgItemMessageW(hWnd, 100, WM_SETTEXT, 0, (LPARAM)L"السلام عليكم"); // Also shows unreadable characters
}
break;

As you can see, with Unicode the control does not display the Arabic characters correctly.

  • Important: after the control is created, if I delete its content manually with Backspace and type the Arabic text by hand, it is displayed correctly! So why does it fail when I use a function like SetWindowTextW()?

Please help. Thanks.

Mark Tolonen

Make sure you save the source file as UTF-16 or as UTF-8 with a BOM. Without a BOM, many Windows applications, including the compiler, assume ANSI encoding (the default localized Windows code page). You can also check your compiler's switches for one that forces UTF-8 for source files. For example, the MS Visual Studio 2015 compiler has a /utf-8 switch, so saving with a BOM isn't required.
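As a rough illustration of what "UTF-8 with a BOM" means at the byte level, here is a minimal Python sketch. The helper names `has_utf8_bom` and `with_utf8_bom` and the sample source line are made up for this example; in practice your editor's "Save with encoding" option does the same job.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def with_utf8_bom(data: bytes) -> bytes:
    """Return the same UTF-8 bytes, prefixed with a BOM if one is missing."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data

# Hypothetical source line, encoded as plain UTF-8 (no BOM).
source = 'MessageBoxW(NULL, L"صباح الخير", L"中文", MB_OK);'.encode("utf-8")
fixed = with_utf8_bom(source)

print(has_utf8_bom(source))  # False
print(has_utf8_bom(fixed))   # True
print(fixed[:3].hex())       # efbbbf
```

The text itself is unchanged; only three marker bytes are added at the start, and those are what the compiler looks for when deciding how to decode the file.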

Here's a simple example, saved first as UTF-8 without a BOM and then as UTF-8 with a BOM, and compiled with the Microsoft Visual Studio compiler. Note that if you hard-code the W version of the API and use L"" for wide strings, you don't need to define UNICODE:

#include <windows.h>

int main()
{
    MessageBoxW(NULL, L"ا ب ت ث ج ح خ د ذ", L"中文", MB_OK);
}

Result (UTF-8 without BOM). The compiler assumed ANSI encoding (Windows-1252) and decoded the wide strings incorrectly.

[Image: corrupted output]

Result (UTF-8 with BOM). The compiler detects the BOM and decodes the source code using UTF-8, producing the correct data for wide strings.

[Image: correct output]

Some Python code demonstrating the decoding error:

>>> s='中文,ا ب ت ث ج ح خ د ذ'
>>> print(s.encode('utf8').decode('Windows-1252'))
ä¸­æ–‡,Ø§ Ø¨ Øª Ø« Ø¬ Ø­ Ø® Ø¯ Ø°
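To tie this back to the BOM, the following sketch imitates the choice the compiler makes when it reads the source file. It is a simplification: `decode_like_compiler` is a hypothetical helper, and the Windows-1252 fallback is an assumption matching the example above (the real fallback depends on the system code page).

```python
import codecs

def decode_like_compiler(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 if a BOM is present, otherwise with the fallback code page."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode(fallback)

text = "السلام عليكم"
no_bom = text.encode("utf-8")          # file saved as UTF-8 without BOM
with_bom = codecs.BOM_UTF8 + no_bom    # same file saved as UTF-8 with BOM

wrong = decode_like_compiler(no_bom)
right = decode_like_compiler(with_bom)

print(right == text)  # True
print(wrong == text)  # False

# Each UTF-8 byte was turned into its own character, so the bad string
# has as many characters as the file has bytes.
print(len(wrong) == len(no_bom))  # True
```

This is also why typing into the edit control works while SetWindowTextW() with a literal does not: the control handles Unicode fine, but the wide string literal was already wrong by the time it was compiled into the program.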


