WriteConsoleW

Started by jj2007, March 29, 2010, 04:19:41 PM

jj2007

I can't convince WriteConsoleW to display any 'real' Unicode glyphs in the console window:

Quote
include \masm32\MasmBasic\MasmBasic.inc

.code

start:   ; normally 1252, user must set console properties to Lucida Console:
   invoke SetConsoleOutputCP, 1256   ; arabic; CP_UTF8 aka 65001 doesn't work
   push esi   ; oh well the ABI ;-)

   Let esi=wRes$(123)   ; get a Unicode string from the resource stringtable

;   pushad
;   invoke MessageBoxW, 0, esi, wChr$("Unicode is great:"), MB_OK   ; works fine, displays arabic text
;   popad

   push edx   ; create the slot (the value pushed is irrelevant)
   mov edx, esp   ; save the slot address (taken after the push, so it does not alias the saved esi)
   push 0   ; reserved para
   push edx   ; slot address for the chars written variable
   push wLen(esi)   ; number of chars to write (WriteConsoleW expects chars, not bytes)
   push esi   ; ptr to the wide string
   invoke GetStdHandle, STD_OUTPUT_HANDLE   ; get handle to console output
   push eax
   call WriteConsoleW   ; invoke WriteConsoleW, eax, 7, edx, 0   ; eax=ptr TheChar, edx=ptr TheCount
   pop edx

   pop esi
   getkey
   Exit

end start

Rsrc
STRINGTABLE
BEGIN
   123,   "An arabic text: ١- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام."
END
Rsrc
Results from copying from the console and pasting here:
Quote
With raster font:
CP_UTF8:   An arabic text: Ù¡- أذكر حدثين Ù...Ù† اÙ,,حرب كان Ù,,Ù‡Ù...ا
1256:   An arabic text: 1- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام.

With Lucida console:
1256:   An arabic text: 1- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام.
CP_UTF8:   An arabic text: ١- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام.

Unfortunately, none of the four versions displays any Arabic on my console, just strange characters or brackets. Has anybody seen Chinese, Japanese or Arabic glyphs displayed in the console window? If yes, what did the trick?
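
For anyone reading along: the raster-font CP_UTF8 line above has the typical look of UTF-8 bytes being shown one byte at a time through a single-byte code page. Here is a minimal Python sketch of that effect. It assumes cp1252 as the misreading code page (the "normal" console code page mentioned in the source comment); `misread_utf8_as` is a made-up helper name, and nothing here was checked against the original machine.

```python
# Sketch only: reproduce the garbled look by decoding UTF-8 bytes
# with a single-byte code page (cp1252 is an assumption).
def misread_utf8_as(text: str, wrong_codepage: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong code page."""
    # errors="replace" because a few byte values are undefined in cp1252
    return text.encode("utf-8").decode(wrong_codepage, errors="replace")

word = "من"                    # two Arabic letters, U+0645 and U+0646
raw = word.encode("utf-8")     # four bytes: D9 85 D9 86
garbled = misread_utf8_as(word)

print(raw.hex(" "))            # the bytes as they sit in the UTF-8 string
print(garbled)                 # each byte shown as one cp1252 character
print(misread_utf8_as("١"))    # the Arabic-Indic digit from the resource string
```

Each Arabic letter turns into two Latin-looking characters, the first of which is Ù or Ø, which is the pattern visible in the pasted output.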

dedndave

do you have the arabic font package installed ?
also - for the command prompt, there is something called code pages

redskull

SetConsoleOutputCP() is the appropriate function; as for the installed code pages, they're somewhere in the registry (current control set?), I can't recall exactly where. You get the ID and then pass it as the argument.

-r

*edit* - never mind, I see you already have that.  From MS knowledge base: 821083

Quote
Arabic and Hebrew (and for that matter Thai, Hindi, and other complex scripts) are not supported on Windows 2000 and Windows XP console.
Strange women, lying in ponds, distributing swords, is no basis for a system of government

jj2007

Quote from: redskull on March 29, 2010, 04:29:14 PM
From MS knowledge base: 821083

Quote
Arabic and Hebrew (and for that matter Thai, Hindi, and other complex scripts) are not supported on Windows 2000 and Windows XP console.

EDIT:
Mystery resolved, thanks.
The story goes on - Arabic text can be displayed, as shown in attachment, WriteConsoleW_AL.png

But why? Taking account of the following observations...:

- it works fine for Russian,
- it works for Arabic only when launched via F6 (assemble & link & run) from RichMasm,
- it fails when launching the exe directly from the commandline (WriteConsoleW_AL.png),
- but Russian still displays correctly in that case,
- all that with identical console properties (Lucida console, CPs),

...the only logical explanation is that it depends on the mood of the OS.

Now I sincerely hope that somebody here in the Forum is right now sufficiently close to the Ballmer Peak to explain all this.

FORTRANS

Hi,

   Do things change between full screen and windowed?

Regards,

Steve

jj2007

Quote from: FORTRANS on March 29, 2010, 09:59:11 PM
   Do things change between full screen and windowed?

Yes indeed, Sherlock Steve Holmes...! When I ...

- double-click the exe from Explorer (or 2x), the initial display for Arabic is ???
- but when I resize the window, the Arabic appears

- when I launch the exe via command.com, the initial display for Arabic is correct and stays correct
- except if I switch to full screen mode with Ctrl Return

- finally, if I launch the exe via cmd, the initial display for Arabic is ??? and no tricks can convince it to display Arabic

Now I am sure this is all documented somewhere on MSDN, right?
:boohoo:

redskull

After some thought and wine, I reverse my previous post: if you are using Unicode functions, then it shouldn't matter what code page you are using at all.  The output code page should only matter for the ANSI version of WriteConsole (WriteConsoleA).  Since those characters are only available for a TrueType font, then the full screen behavior makes sense (full screen is raster only, right?).  The "official" supported Unicode ranges for Lucida Console font are:
Basic Latin
Latin-1 Supplement
Latin Extended-A
Latin Extended-B
Greek
Cyrillic
General Punctuation
Box Drawing
Block Elements

which doesn't include Arabic.  As far as the command vs cmd, as the tree said to the lumberjack, I'm stumped.
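
To make the range argument concrete, here is a rough Python sketch that checks which characters of a string fall outside the blocks listed above. The block boundaries are the ones from the Unicode standard; the names `LISTED_BLOCKS` and `uncovered` are made up for illustration, and a font declaring a block does not guarantee a glyph for every code point in it, so treat this as a first approximation only.

```python
# Sketch: which characters lie outside the Unicode blocks listed for the font?
# Block boundaries follow the Unicode standard; the block list is the one quoted above.
LISTED_BLOCKS = {
    "Basic Latin":         (0x0000, 0x007F),
    "Latin-1 Supplement":  (0x0080, 0x00FF),
    "Latin Extended-A":    (0x0100, 0x017F),
    "Latin Extended-B":    (0x0180, 0x024F),
    "Greek":               (0x0370, 0x03FF),
    "Cyrillic":            (0x0400, 0x04FF),
    "General Punctuation": (0x2000, 0x206F),
    "Box Drawing":         (0x2500, 0x257F),
    "Block Elements":      (0x2580, 0x259F),
}

def uncovered(text):
    """Return the characters of text that are in none of the listed blocks."""
    return [ch for ch in text
            if not any(lo <= ord(ch) <= hi for lo, hi in LISTED_BLOCKS.values())]

print(uncovered("Привет"))   # Russian: every letter is in the Cyrillic block
print(uncovered("حرب"))      # Arabic: U+0600..U+06FF is not in the list
```

That matches the observation that Russian always works while Arabic depends on something other than the selected console font.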

-r

jj2007

Quote from: redskull on March 29, 2010, 11:58:02 PM
As far as the command vs cmd, as the tree said to the lumberjack, I'm stumped.

Apparently it's a crappy WM_PAINT handler. When you launch the exe via double-click, and drag a small floating window over the arabic text, the text is redrawn in a rather, ehm, fancy way. Example:

Here is the resource file version (displays fine in FF and MSIE):
An arabic text: ١- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام. END

Here I tried to move the floating window only over the lower line of text, so the upper one should be the correct display (I don't speak Arabic and have no idea what it actually means, so apologies if it says something wrong).

The Russian text is not affected by this problem, it always works fine. Note that in all cases I used both WriteConsoleW and the workaround via WideCharToMultiByte(CP_UTF8, ...)+ANSI print proposed by some sources. Results are identical.
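
For reference, the two routes differ only in encoding and in how the length is counted. A small Python sketch of the conversion follows; it assumes the wide string is UTF-16LE in memory, as wide strings are on Windows, and the variable names are purely illustrative.

```python
# Sketch: the same text as a wide (UTF-16LE) string and as UTF-8 bytes.
text = "Привет, мир"

utf16 = text.encode("utf-16-le")   # what a wide string holds in memory (no BOM)
utf8 = text.encode("utf-8")        # roughly what WideCharToMultiByte(CP_UTF8, ...) yields

chars_for_wide_call = len(utf16) // 2   # WriteConsoleW counts UTF-16 code units
bytes_for_ansi_call = len(utf8)         # a byte-oriented print counts bytes

print(chars_for_wide_call, bytes_for_ansi_call)

# Both encodings decode back to the same text, so no information is lost either way.
print(utf8.decode("utf-8") == utf16.decode("utf-16-le"))
```

Since both routes carry the same characters, identical results on screen are what one would expect; the remaining variable is how the console renders them.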

redskull

Sorry to revive such a dead topic, but this thread has been bugging me ever since.  After a lot of experimenting, the weak link in the chain seems to be the font you use for the console, and not the code page/console itself.  There are numerous restrictions on the fonts available for the console (they must be fixed-size, among other things), and they can only be added via a registry hack.  The venerable "Lucida Console" font apparently doesn't contain Arabic and Hebrew characters, and other fonts which can be used in the console are few and far between.  I tried a "DejaVu Sans Mono" font, which displayed the Arabic characters no problem (ANSI versions through code page 1256).  Hebrew (1255) was still a dud.

The bottom line seems to be: for ANSI output, you need the right code page installed (both the registry entry *and* the c_XXXX.nls file) and a font with the right characters programmed into it.  For Unicode, all you need is the right font.  These probably vary enormously between different versions sold in different countries.  I was completely unable to reproduce JJ's results (the repaint bug) on either Vista or XP; all I ever got was blocks with the Lucida Console font.
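
A quick way to see what a code page can and cannot carry is to try encoding the text with it. The Python sketch below uses Python's own cp1256/cp1255 codecs as stand-ins for the Windows code pages; `fits_codepage` is a hypothetical helper. Note that Python's codecs are strict and refuse characters they lack, whereas Windows apparently substitutes a look-alike, judging from the plain "1" in the 1256 output earlier in the thread.

```python
# Sketch: can a single-byte code page represent a given string at all?
def fits_codepage(text, codepage):
    """True if every character of text has a byte in the given code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

arabic = "أذكر حدثين من الحرب"

print(fits_codepage(arabic, "cp1256"))   # Arabic letters exist in 1256
print(fits_codepage("١", "cp1256"))      # the Arabic-Indic digit does not
print(fits_codepage(arabic, "cp1255"))   # the Hebrew code page has no Arabic

# One byte per character in the code page, two per character as UTF-16
print(len(arabic.encode("cp1256")), len(arabic.encode("utf-16-le")))
```

With the Unicode route there is no such table lookup, which is why only the font matters there.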

JJ - what font were you using to generate your results, and what version?

-r


jj2007

Quote from: redskull on May 06, 2010, 12:14:41 PM

JJ - what font were you using to generate your results, and what version?


Lucida Console under cmd.exe, lucon.ttf created 04 August 2004, 14:00:00

My impression is that Windows substitutes the missing characters by printing them "over" the console DC using some Windows non-console font.

redskull

It's my general understanding that if a particular TT font doesn't have the appropriate character, it will draw the "notdef" character (normally the box) instead.  For example, if I switch my console from Lucida to DejaVu, the previous output will change from boxes to arabic (and back again).  If I copy and paste the boxes from the Lucida console into Notepad++ (with the right encoding set), they paste in correctly (with the notepad font as courier new).  The previous KB article I linked to seems to be incorrect; the CONSOLE supports any character you can come up with; it's the default console font that doesn't include the needed character sets.

-r

oex

Cite Precedence
Quote from: redskull on May 06, 2010, 12:14:41 PM
Sorry to revive such a dead topic

I have only 2 fonts available for Console.... How do I add more?

Available:
    Lucida Console
    Raster Fonts

I want to print out UTF-8....

      invoke SetConsoleOutputCP, 65001 ; UTF-8
      push eax
      invoke WriteConsoleW, sdwStdOutput, oexString, oexStringLength, esp, 0
      pop eax


Quote from: redskull on May 06, 2010, 02:08:08 PM
It's my general understanding that if a particular TT font doesn't have the appropriate character, it will draw the "notdef" character (normally the box) instead.

Yes this is my problem

Quote from: redskull on May 06, 2010, 02:08:08 PM
For example, if I switch my console from Lucida to DejaVu

This is where I maybe have missed the blatantly obvious

I am using Windows XP SP3
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

jj2007

Peter,

Attached a full example for use with MasmBasic. Just open in RichMasm.exe and hit F6. You may insert int 3 to see its beauty in Olly - the OPT_ions are arranged for symbols.

hth,
Jochen

redskull

Quote from: oex on June 03, 2011, 10:36:42 PM
This is where I maybe have missed the blatantly obvious

KB 247815 is the official way to do it, and lists the necessary steps (editing the registry): http://support.microsoft.com/kb/247815

The font I referred to (and still use) is "DejaVu Sans Mono".  It has a fairly complete set of foreign-language glyphs, but being a native English speaker, I don't delve into the more esoteric Unicode points, so YMMV.  Plus I prefer the way it looks: http://dejavu-fonts.org/wiki/Main_Page

-r

jj2007

One more example showing what happens to redirected output - see redirect= in attachment.
For redirect=0, PrintWide.exe >ucUtf8.txt results in a UTF-8 file being created. A Unicode editor will show it correctly.
For redirect=1, PrintWide.exe >ucFull.txt results in a Unicode file being created.

The "real" Unicode version can be displayed correctly by Notepad, RichMasm etc, while hutch's new Unicode editor inserts a blank after the first char (it doesn't like BOMs - but most editors, including Notepad, add a BOM when saving a Unicode or UTF-8 file...).
The UTF-8 version can be displayed correctly only by Notepad. The next version of RichMasm will display UTF-8 files, too - beta attached. Extract with "use folder names", then drag test_UTF-8.asm or test_Unicode.asm over RichMasm.exe to see a plain text file with a Chinese comment. Press F6 to assemble & link.

Edit: New version looks for a UTF-8 BOM, and will handle *.asm files in UTF-8 format, i.e. you can add Chinese or Russian or Arabic comments and still flawlessly assemble your code. Same for "real" Unicode, but remember it bloats your source by a factor of 2 - not so for little brother UTF-8.
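
For the curious, BOM sniffing and the size difference can be sketched in a few lines of Python. This is an illustration of the general idea only, not RichMasm's actual logic; `sniff_bom` is a hypothetical name, the sample source line is invented, and UTF-32 is deliberately ignored.

```python
import codecs

def sniff_bom(data: bytes):
    """Return a Python codec name based on the BOM, or None if there is no BOM."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8-sig"                      # this codec strips the BOM on decode
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"                         # this codec reads the BOM for byte order
    return None                                 # no BOM: the caller has to guess

# An invented source line: mostly ASCII, with a short Chinese comment
source = "mov eax, 1   ; 注释\r\n"

as_utf8 = codecs.BOM_UTF8 + source.encode("utf-8")
as_utf16 = codecs.BOM_UTF16_LE + source.encode("utf-16-le")

for blob in (as_utf8, as_utf16):
    codec = sniff_bom(blob)
    print(codec, len(blob), blob.decode(codec) == source)
```

For mostly-ASCII assembler source the UTF-16 file comes out close to twice the size of the UTF-8 one, while text that is predominantly Chinese would narrow or even reverse that gap.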