
about WM_CHAR UTF-16 conversion

Started by Rainstorm, February 29, 2008, 03:38:15 AM


Rainstorm

in the SDK it says
QuoteThe WM_CHAR message uses Unicode Transformation Format (UTF)-16.
so do i have to convert to ASCII while using it? (that would just involve putting the 16-bit UTF-16 code into a byte, since the values are the same... is that right?)
can someone confirm all this please, since elsewhere it says the Unicode values only get sent if the Unicode version of the RegisterClass function was used to register the window class.
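to show what i mean, here's a minimal C sketch (the WndProc and names are just for illustration, not code from the SDK):

#include <windows.h>

LRESULT CALLBACK WndProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    switch (uMsg)
    {
    case WM_CHAR:
    {
        WCHAR wc = (WCHAR)wParam;   /* UTF-16 code unit when the class was registered with RegisterClassW */
        if (wc < 128)               /* ASCII range: the byte value and the UTF-16 value are the same */
        {
            char c = (char)wc;      /* narrowing to a byte is safe here */
            (void)c;                /* ... use c ... */
        }
        /* above 127 the ANSI byte and the UTF-16 value may differ, so keep the WCHAR */
        return 0;
    }
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hWnd, uMsg, wParam, lParam);
}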

MichaelW

Now that you point that out, I have no idea what it means. With a US keyboard the WM_CHAR message wParam values look to me like normal character codes. Also, in the WM_CHAR documentation:

"The WM_CHAR message is posted to the window with the keyboard focus when a WM_KEYDOWN message is translated by the TranslateMessage function."

And in the TranslateMessage documentation:

"TranslateMessage produces WM_CHAR messages only for keys that are mapped to ASCII characters by the keyboard driver. "

eschew obfuscation

ToutEnMasm

Hello,
The Unicode format is the normal format for the system (Unicode = two bytes per character).
ASCII codes are bytes, and they are not the same as the value of the key.
For example, when key 114 is pressed, this will be changed to the value of the ASCII character 'A'; the WM_CHAR message gives you the value of 'A'. This translation is done following
Quote
The Unicode Transformation Format (UTF)-16
The format is just a convention that says: in this country key 114 is 'A', and for another country it's 'Q'.
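To illustrate in C, the key-to-character translation is done by the active keyboard layout, and ToUnicode is the API that performs it - a small sketch (the output depends on which layout is active, so take it only as an illustration):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    BYTE  state[256] = {0};   /* key state: no Shift, no CapsLock */
    WCHAR buf[4];
    UINT  scan = MapVirtualKeyW('A', MAPVK_VK_TO_VSC);

    /* ToUnicode asks the active keyboard layout which character this key produces */
    int n = ToUnicode('A', scan, state, buf, 4, 0);
    if (n == 1)
        printf("VK 'A' -> U+%04X under the current layout\n", (unsigned)buf[0]);
    return 0;
}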


u

Tout, I think UTF16 has nothing to do with the keyboard-layout.
The first 128 UTF-16 values are plain ASCII, and the first 256 match Latin-1 (which the Western ANSI codepage mostly follows).

I just made a test-app to print the wParam of WM_CHAR, and typed in English, Bulgarian and Japanese. My default language in Regional Options is English, yet characters in Bulgarian did give their ANSI values (around 233). But no JP input method generated anything but the char "?". To sum up, no value over 255 was presented in the wParam of WM_CHAR, but...
maybe it's got something to do with the fact that I used CreateWindowExA and so on, instead of the W versions of the API.
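The test was basically along these lines (a rough C reconstruction of that kind of test, not the actual code):

#include <windows.h>

LRESULT CALLBACK TestProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    if (uMsg == WM_CHAR)
    {
        char buf[64];
        wsprintfA(buf, "WM_CHAR wParam = %u (0x%04X)\n", (UINT)wParam, (UINT)wParam);
        OutputDebugStringA(buf);          /* view the output in a debugger or DebugView */
    }
    if (uMsg == WM_DESTROY)
    {
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProcA(hWnd, uMsg, wParam, lParam);   /* "A" window -> ANSI character codes in wParam */
}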
Please use a smaller graphic in your signature.

Rainstorm

Michael,

this quote is from the SDK, in About Keyboard Input --> Nonsystem Character Messages, on that page.
QuoteThe value of the character code depends on the window class of the window receiving the message. If the Unicode version of the RegisterClass function was used to register the window class, the system provides Unicode characters to all windows of that class. Otherwise, the system provides ASCII character codes.

ToutEnMasm wrote
QuoteThe Unicode format is the normal format for the system (Unicode = two bytes per character).
ASCII codes are bytes, and they are not the same as the value of the key.
Toutenasm, I was not referring to the key; I meant that the ASCII values of characters (in the WM_CHAR msg), which are bytes, are the same as the UTF-16 values, which are word-sized, for those characters, as long as they are ASCII. - I think the values between the two can differ above 127 (U+007F), depending on the mappings of the chars above 128 in the ANSI codepage.
..just trying to make sense of why they make a distinction by saying that the character codes provided in the WM_CHAR message are either ASCII or Unicode depending on what's specified in the RegisterClass function.
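in code terms the distinction seems to come down to which registration call is used - a hedged C sketch (the class names and the placeholder proc are made up):

#include <windows.h>

static LRESULT CALLBACK MyWndProc(HWND h, UINT m, WPARAM w, LPARAM l)
{
    /* IsWindowUnicode tells you which flavour of character data this window receives */
    return IsWindowUnicode(h) ? DefWindowProcW(h, m, w, l)
                              : DefWindowProcA(h, m, w, l);
}

void RegisterBothWays(HINSTANCE hInst)
{
    WNDCLASSA ca = {0};                    /* ANSI registration    -> ASCII/ANSI codes in WM_CHAR   */
    ca.lpfnWndProc   = MyWndProc;
    ca.hInstance     = hInst;
    ca.lpszClassName = "AnsiClass";
    RegisterClassA(&ca);

    WNDCLASSW cw = {0};                    /* Unicode registration -> UTF-16 code units in WM_CHAR  */
    cw.lpfnWndProc   = MyWndProc;
    cw.hInstance     = hInst;
    cw.lpszClassName = L"UnicodeClass";
    RegisterClassW(&cw);
}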

One more question: when keystroke messages are translated using TranslateMessage, it creates WM_CHAR msgs & posts them to the message queue (from where they reach the WindowProc). - Are those keystroke messages (like WM_KEYDOWN etc.) for keystrokes that have been translated still sent to the WindowProc?

Many thanks everyone for the assistance..
-

u

iirc, the flow is like this:
keyboard keycodes get mapped according to qwerty/azerty/.. into those VK_UP/VK_ENTER/VK_V values (there's no distinguishing of lowercase and uppercase letters at this stage). Your app and the IME get the WM_KEYDOWN msg. The IME then translates this keydown according to the current state (CapsLock, Shift, previously typed letter) and decides when/whether to send the WM_CHAR msg to your app. So, (count_of_WMKeyDown >= count_of_WMChar).
For the best test of how you should handle input, install the Japanese IME (simply add Japanese to your keyboard input modes). Choose one of the three alphabets, and type in Latin letters (for example "koneko"). The word is accepted only after you press <enter> (chars get generated only then). <space> selects between different meanings and ways of writing the word. (it's wicked)
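If you want to see that count_of_WMKeyDown >= count_of_WMChar relation yourself, something like this would do (a hypothetical C sketch, names made up):

#include <windows.h>
#include <stdio.h>

static unsigned g_keydown, g_char;

LRESULT CALLBACK CountProc(HWND h, UINT m, WPARAM w, LPARAM l)
{
    if (m == WM_KEYDOWN) g_keydown++;     /* one per key press (plus autorepeat) */
    if (m == WM_CHAR)    g_char++;        /* only when TranslateMessage/the IME decides to emit a character */
    if (m == WM_DESTROY)
    {
        printf("WM_KEYDOWN: %u  WM_CHAR: %u\n", g_keydown, g_char);
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProcW(h, m, w, l);
}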
Please use a smaller graphic in your signature.

MichaelW

QuoteAre those keystroke messages (like WM_KEYDOWN etc.) for keystrokes that have been translated still sent to the WindowProc?
At least in my test, WM_KEYDOWN and WM_KEYUP are.
eschew obfuscation

u

Btw, Rainstorm, simply use VKDebug to see which message gets sent when :) . Or even easier - the app Spy++ (if you have a copy of Visual Studio)
Please use a smaller graphic in your signature.

Rainstorm

michael wrote...
QuoteAt least in my test, WM_KEYDOWN and WM_KEYUP are.
thanks for the feedback & confirming that  :thumbu

Ultrano thanks for the tip, will try it out.

Rainstorm

I guess for ASCII input what i asked is not gonna make a diff, but for other character processing beyond chr127 & chr255 in the UTF-16 spectrum it would. - all characters in the ASCII set would be transparent, having the same values (just word sizes instead of bytes), so I still don't understand why in the SDK they make all that distinction between the two.
am just looking at displaying keyboard input currently.. so maybe I'll post about this later again sometime.

thanks for the assistance

u

o_O the distinction is quite needed. You can map Latin characters OK, but for anything else 16-bit is necessary. It becomes obvious when you start typing with Latin, Cyrillic, accented Latin, Greek, Japanese, Chinese - all in the same window. And when you throw in the different rules of uppercase/lowercase conversion in some languages and dialects, it all goes nuts. And finally, when you want to fit it into ANSI, Windows looks at what your default language is and tries to fit it in such a way that Win98 could display it (i.e. for me it would use the upper half of the 256 symbols for my Cyrillic alphabet, for others the Greek or the French/German accented letters, and for Japanese it'll try to unroll whole words into the selected few symbols - pretty hard when JP text merges all words and uses no spaces).
It's hell, which Windows fortunately has solved in maybe the most optimized way.
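That squeezing into ANSI (and the '?' you get when a character has no slot) is easy to reproduce - a minimal C sketch, with codepage 1252 picked only as a stand-in for a Western default:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    WCHAR hira = 0x3053;            /* HIRAGANA LETTER KO - no slot in a Western ANSI codepage */
    char  out = 0;
    BOOL  usedDefault = FALSE;

    WideCharToMultiByte(1252, 0, &hira, 1, &out, 1, NULL, &usedDefault);
    printf("U+3053 -> 0x%02X, default char used: %d\n", (unsigned char)out, usedDefault);
    /* prints 0x3F ('?') with the default-char flag set */
    return 0;
}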
Please use a smaller graphic in your signature.

Rainstorm

ultrano wrote. . .
QuoteAnd finally, when you want to fit it into ANSI, Windows looks at what your default language is and tries to fit it in such a way that Win98 could display it (i.e. for me it would use the upper half of the 256 symbols for my Cyrillic alphabet, for others
..don't know much about this stuff, but I thought ANSI was a standard where the symbols 0-127 (decimal) were fixed & agreed upon, & the symbols from 128 up to 255 could be changed, forming diff code pages depending upon location/language etc. - I've never really experimented with what you were writing about, but if i needed to use more symbols beyond 127 i'd use Unicode & like you said that would make it simpler.

http://en.wikipedia.org/wiki/Windows-1251

that's a link to the Cyrillic codepage & the mappings from 0-127 seem to be the same as the English ones.. the ones from 128 & up are diff though.
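that's easy to check programmatically too - a small C sketch using codepage 1251 (the byte 0xE9 is 233 decimal, the kind of value mentioned earlier in the thread):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    char  src[2] = { 0x41, (char)0xE9 };   /* 'A' and a Cyrillic byte in Windows-1251 */
    WCHAR dst[2];

    MultiByteToWideChar(1251, 0, src, 2, dst, 2);
    printf("0x41 -> U+%04X   0xE9 -> U+%04X\n", (unsigned)dst[0], (unsigned)dst[1]);
    /* expected: 0x41 -> U+0041 (same value), 0xE9 -> U+0439 (different) */
    return 0;
}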

thanks!
-