I can't convince WriteConsoleW to display any 'real' Unicode glyphs in the console window:
Quote
include \masm32\MasmBasic\MasmBasic.inc
.code
start: ; normally 1252, user must set console properties to Lucida Console:
invoke SetConsoleOutputCP, 1256 ; arabic; CP_UTF8 aka 65001 doesn't work
push esi ; oh well the ABI ;-)
Let esi=wRes$(123) ; get a Unicode string from the resource stringtable
; pushad
; invoke MessageBoxW, 0, esi, wChr$("Unicode is great:"), MB_OK ; works fine, displays arabic text
; popad
push edx ; create the slot for the chars written variable
mov edx, esp ; save the slot address (taken after the push, so it points at the slot)
push 0 ; reserved para
push edx ; slot address for the chars written variable
push wLen(esi) ; number of wide characters (not bytes)
push esi ; ptr to the wide string
invoke GetStdHandle, STD_OUTPUT_HANDLE ; get handle to console output
push eax
call WriteConsoleW ; invoke WriteConsoleW, eax, 7, edx, 0 ; eax=ptr TheChar, edx=ptr TheCount
pop edx
pop esi
getkey
Exit
end start
Rsrc
STRINGTABLE
BEGIN
123, "An arabic text: ١- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام."
END
Rsrc
Results from copying from the console and pasting here:
Quote
With raster font:
CP_UTF8: An arabic text: Ù¡- أذكر Øدثين Ù...Ù† اÙ,,Øرب كان Ù,,Ù‡Ù...ا
1256: An arabic text: 1- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام.
With Lucida console:
1256: An arabic text: 1- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام.
CP_UTF8: An arabic text: ١- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام.
Unfortunately, none of the four versions displays any Arabic on my console - just strange characters or brackets. Has anybody seen Chinese, Japanese or Arabic glyphs displayed in the console window? If yes, what did the trick?
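A side note on the garbled CP_UTF8 line in the raster-font results: it looks like UTF-8 bytes shown through a Western single-byte code page. The Python sketch below is not part of the MasmBasic code (Python is used only because it runs anywhere); it reproduces the first two garbled characters under the assumption that the bytes were decoded as Windows-1252. The function names are made up for illustration.

```python
# Sketch: reproduce the mojibake at the start of the raster-font CP_UTF8 line.
# Assumption: the UTF-8 bytes were displayed as if they were Windows-1252.

def mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong code page."""
    return text.encode("utf-8").decode(wrong_codec, errors="replace")

def undo_mojibake(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Reverse the mistake; only works if no byte was lost on the way."""
    return garbled.encode(wrong_codec).decode("utf-8")

if __name__ == "__main__":
    digit_one = "\u0661"             # ARABIC-INDIC DIGIT ONE, first char after "text: "
    garbled = mojibake(digit_one)    # bytes D9 A1 read as two Latin-1 range characters
    print(ascii(garbled))            # escaped form, safe on any terminal
    print(undo_mojibake(garbled) == digit_one)
```

If that assumption holds, the data reached the console intact and only the interpretation of the bytes went wrong.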
do you have the arabic font package installed ?
also - for the command prompt, there is something called code pages
SetConsoleOutputCP() is the appropriate function; as for the installed code pages, they're listed somewhere in the registry (current control set?), I can't recall exactly where. You get the ID, and then pass it as the argument
-r
*edit* - nevermind, i see you already have that. From MS knowledge base: 821083
Quote
Arabic and Hebrew (and for that matter Thai, Hindi, and other complex scripts) are not supported on Windows 2000 and Windows XP console.
Quote from: redskull on March 29, 2010, 04:29:14 PM
From MS knowledge base: 821083
Quote
Arabic and Hebrew (and for that matter Thai, Hindi, and other complex scripts) are not supported on Windows 2000 and Windows XP console.
EDIT: Mystery resolved, thanks. The story goes on - Arabic text can be displayed, as shown in attachment, WriteConsoleW_AL.png
But why? Taking account of the following observations...:
- it works fine for Russian,
- it works for Arabic only when launched via F6 (assemble & link & run) from RichMasm,
- it fails when launching the exe directly from the commandline (WriteConsoleW_AL.png),
- but not for Russian,
- all that with identical console properties (Lucida console, CPs),
...the only logical explanation is that it depends on the mood of the OS.
Now I sincerely hope that somebody here in the Forum is right now sufficiently close to the Ballmer Peak (http://www.urbandictionary.com/define.php?term=Ballmer%20Peak) to explain all this.
Hi,
Do things change between full screen and windowed?
Regards,
Steve
Quote from: FORTRANS on March 29, 2010, 09:59:11 PM
Do things change between full screen and windowed?
Yes indeed, Sherlock Steve Holmes...! When I ...
- double-click the exe from Explorer (or 2x), the initial display for Arabic is ???
- but when I resize the window, the Arabic appears
- when I launch the exe via command.com, the initial display for Arabic is correct and stays correct
- except if I switch to full screen mode with Ctrl Return
- finally, if I launch the exe via cmd, the initial display for Arabic is ??? and no tricks can convince it to display Arabic
Now I am sure this is all documented somewhere on MSDN, right?
:boohoo:
After some thought and wine, I reverse my previous post: if you are using Unicode functions, then it shouldn't matter what code page you are using at all. The output code page should only matter for the ANSI version of WriteConsole (WriteConsoleA). Since those characters are only available in a TrueType font, the full screen behavior makes sense (full screen is raster only, right?). The "official" supported Unicode ranges for the Lucida Console font are:
Basic Latin
Latin-1 Supplement
Latin Extended-A
Latin Extended-B
Greek
Cyrillic
General Punctuation
Box Drawing
Block Elements
which doesn't include the arabic. As far as the command vs cmd, as the tree said to the lumberjack, I'm stumped.
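To check a given string against that list, here is a small Python sketch (not from the original posts). The block boundaries are the standard Unicode ones; the table and function names are made up for illustration. Being inside a listed block does not guarantee that the font really carries a glyph for every code point in it, so treat a clean result as "plausible", not "proven".

```python
# Sketch: find characters that fall outside the Unicode blocks listed above.
# Assumption: the list of blocks is complete; block ranges are the standard ones.

LUCIDA_CONSOLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x2500, 0x257F, "Box Drawing"),
    (0x2580, 0x259F, "Block Elements"),
]

def block_of(ch: str):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for first, last, name in LUCIDA_CONSOLE_BLOCKS:
        if first <= cp <= last:
            return name
    return None

def uncovered(text: str) -> list:
    """Return the sorted, de-duplicated characters that are in no listed block."""
    return sorted({ch for ch in text if block_of(ch) is None})

if __name__ == "__main__":
    sample = "An arabic text: \u0661- \u0623\u0630\u0643\u0631"
    print([f"U+{ord(ch):04X}" for ch in uncovered(sample)])
```

Run on the Russian test text, the same check reports nothing missing, which would fit the observation that Russian always works.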
-r
Quote from: redskull on March 29, 2010, 11:58:02 PM
As far as the command vs cmd, as the tree said to the lumberjack, I'm stumped.
Apparently it's a crappy WM_PAINT handler. When you launch the exe via double-click, and drag a small floating window over the arabic text, the text is redrawn in a rather, ehm, fancy way. Example:
(http://www.webalice.it/jj2006/pics/CrappyConsoleW.png)
Here is the resource file version (displays fine in FF and MSIE):
An arabic text: ١- أذكر حدثين من الحرب كان لهما تأثير على الرأي العام. END
Here I tried to move the floating window only over the lower line of text, so the upper one should be the correct display (I don't speak Arabic and have no idea what it actually means, so apologies if it says something wrong).
The Russian text is not affected by this problem, it always works fine. Note that in all cases I used both WriteConsoleW and the workaround via WideCharToMultiByte(CP_UTF8, ...)+ANSI print proposed by some sources. Results are identical.
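One way to see why both routes give identical results: the conversion changes the bytes, not the characters. The Python sketch below uses Python's codecs as a stand-in for WideCharToMultiByte(CP_UTF8, ...); it is only an illustration with hypothetical helper names, not the code from the attachment.

```python
# Sketch: UTF-16LE (input of the W functions) versus UTF-8 (input of the
# workaround). Python codecs stand in for the Win32 conversion call.

def wide_to_utf8(wide: bytes) -> bytes:
    """Convert UTF-16LE bytes to UTF-8 bytes."""
    return wide.decode("utf-16-le").encode("utf-8")

def code_points(text: str) -> list:
    """List the code points of a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

if __name__ == "__main__":
    word = "\u0623\u0630\u0643\u0631"   # first Arabic word of the test string
    wide = word.encode("utf-16-le")     # what the WriteConsoleW route passes
    utf8 = wide_to_utf8(wide)           # what the UTF-8 + ANSI print route passes
    print(wide.hex(), "->", utf8.hex())
    # Both byte sequences decode to the same code points
    print(code_points(wide.decode("utf-16-le")) == code_points(utf8.decode("utf-8")))
```

Since the console ends up with the same code points either way, a difference would have to come from a later stage such as font lookup or painting.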
Sorry to revive such a dead topic, but this thread has been bugging me ever since. After a lot of experimenting, the weak link in the chain seems to be the font you use for the console, and not the code page/console itself. There are numerous restrictions on the fonts available for the console (they must be fixed-size, among other things), and fonts can only be added via a registry hack. The venerable "Lucida Console" font apparently doesn't contain Arabic and Hebrew characters, and other fonts which can be used in the console are few and far between. I tried a "DejaVu Sans Mono" font, which displayed the Arabic characters no problem (ASCII versions through code page 1256). Hebrew (1255) was still a dud.
The bottom line seems to be: for ASCII, you need the right code page installed (both the registry entry *and* the c_XXXX.nls file), and a font with the right characters programmed into it. For Unicode, all you need is the right font. These probably vary enormously between different versions sold in different countries. I was completely incapable of generating JJ's results (the repaint bug), on either Vista or XP; all I ever got was blocks with the Lucida Console font.
JJ - what font were you using to generate your results, and what version?
-r
Quote from: redskull on May 06, 2010, 12:14:41 PM
JJ - what font were you using to generate your results, and what version?
Lucida Console under cmd.exe, lucon.ttf created 04 August 2004, 14:00:00
My impression is that Windows substitutes the missing characters by printing them "over" the console DC using some Windows non-console font.
It's my general understanding that if a particular TT font doesn't have the appropriate character, it will draw the "notdef" character (normally the box) instead. For example, if I switch my console from Lucida to DejaVu, the previous output will change from boxes to arabic (and back again). If I copy and paste the boxes from the Lucida console into Notepad++ (with the right encoding set), they paste in correctly (with the notepad font as courier new). The previous KB article I linked to seems to be incorrect; the CONSOLE supports any character you can come up with; it's the default console font that doesn't include the needed character sets.
-r
Quote from: redskull on May 06, 2010, 12:14:41 PM
Sorry to revive such a dead topic
I have only 2 fonts available for Console.... How do I add more?
Available:
Lucida Console
Raster Fonts
I want to print out UTF-8....
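invoke SetConsoleOutputCP, 65001 ; UTF-8
push eax ; create a slot for the chars written count
mov edx, esp ; take the slot address now: "esp" as an invoke argument is read after the 0 has been pushed
invoke WriteConsoleW, sdwStdOutput, oexString, oexStringLength, edx, 0
pop eax ; eax = number of characters written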
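A note on the length argument: WriteConsoleW takes UTF-16 text and a count of characters (UTF-16 code units), so a UTF-8 byte count would be the wrong number to pass. The Python sketch below is my own illustration with made-up helper names, not part of the code above; it shows how the two counts differ.

```python
# Sketch: UTF-16 code units (what WriteConsoleW counts) versus UTF-8 bytes.

def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, i.e. the length WriteConsoleW expects."""
    return len(text.encode("utf-16-le")) // 2

def utf8_bytes(text: str) -> int:
    """Number of bytes the same text occupies in UTF-8."""
    return len(text.encode("utf-8"))

if __name__ == "__main__":
    samples = ["abc", "\u0661-", "\U0001F600"]
    for s in samples:
        # ascii() keeps the output printable on any console
        print(ascii(s), utf16_units(s), utf8_bytes(s))
```

Characters outside the Basic Multilingual Plane take two code units (a surrogate pair), so even the character count is not always the number of visible glyphs.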
Quote from: redskull on May 06, 2010, 02:08:08 PM
It's my general understanding that if a particular TT font doesn't have the appropriate character, it will draw the "notdef" character (normally the box) instead.
Yes this is my problem
Quote from: redskull on May 06, 2010, 02:08:08 PM
For example, if I switch my console from Lucida to DejaVu
This is where I maybe have missed the blatantly obvious
I am using Windows XP SP3
Peter,
Attached a full example for use with MasmBasic. Just open in RichMasm.exe and hit F6. You may insert int 3 to see its beauty in Olly - the OPT_ions are arranged for symbols.
hth,
Jochen
Quote from: oex on June 03, 2011, 10:36:42 PM
This is where I maybe have missed the blatantly obvious
KB 247815 is the official way to do it, and lists the necessary steps (editing the registry): http://support.microsoft.com/kb/247815
The font I referred to (and still use) is "DejaVu Sans Mono". It has a fairly complete set of foreign language glyphs, but being a native English speaker, I don't delve into the more esoteric Unicode points, so YMMV. Plus I prefer the way it looks: http://dejavu-fonts.org/wiki/Main_Page
-r
One more example showing what happens to redirected output - see redirect= in attachment.
For redirect=0, PrintWide.exe >ucUtf8.txt results in a UTF-8 file being created. A Unicode editor will show it correctly.
For redirect=1, PrintWide.exe >ucFull.txt results in a Unicode file being created.
The "real" Unicode version can be displayed correctly by Notepad, RichMasm etc, while hutch's new Unicode editor inserts a blank after the first char (it doesn't like BOMs - but most editors, including Notepad, add a BOM when saving a Unicode or UTF-8 file...).
The UTF-8 version can be displayed correctly only by Notepad. The next version of RichMasm will display UTF-8 files, too - beta attached. Extract with "use folder names", then drag test_UTF-8.asm or test_Unicode.asm over RichMasm.exe to see a plain text file with a Chinese comment. Press F6 to assemble & link.
Edit: New version looks for a UTF-8 BOM, and will handle *.asm files in UTF-8 format, i.e. you can add Chinese or Russian or Arabic comments and still flawlessly assemble your code. Same for "real" Unicode, but remember it bloats your source by a factor 2 - not so for little brother UTF-8.
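For readers who want to see what "looks for a BOM" and "factor 2" mean in practice, here is a Python sketch. It is an illustration only and not how RichMasm does it internally; the function name and the sample source line are made up.

```python
# Sketch: detect a byte order mark and compare encoded sizes.
# Note: a UTF-32 LE file also starts with FF FE and is ignored here.
import codecs

def sniff_bom(data: bytes) -> str:
    """Guess the encoding of a file from its first bytes."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    return "unknown"

if __name__ == "__main__":
    line = "mov eax, 1 ; \u4e2d\u6587"          # asm line with a Chinese comment
    as_utf8 = codecs.BOM_UTF8 + line.encode("utf-8")
    as_utf16 = codecs.BOM_UTF16_LE + line.encode("utf-16-le")
    print(sniff_bom(as_utf8), len(as_utf8))
    print(sniff_bom(as_utf16), len(as_utf16))
```

For mostly-ASCII source the UTF-16 copy is close to twice the size, while UTF-8 only pays extra for the non-ASCII comment characters.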
Another font you can use for the console is Consolas in Windows Vista and Seven. I have it set up for my consoles per KB 247815. It's monospace, has slashed zeroes and supports Unicode.
Quote from: redskull on May 06, 2010, 12:14:41 PM
For Unicode, all you need is the right font. These probably vary enormously between different versions sold in different countries. I was completely incapable of generating JJ's results (the repaint bug), on either Vista or XP; all I ever got was blocks with the Lucida Console font.
I am still playing around with this problem, and to say the least, Windows misbehaves. Example output:
(http://www.webalice.it/jj2006/pics/Unicode2Console1.png)
Same console window but after clicking up & down into the scrollbar:
(http://www.webalice.it/jj2006/pics/Unicode2Console2.png)
Directly after the launch, items 802+1203 display boxes instead of text. Note these are the only lines that get wrapped. By clicking into the scrollbar, the text gets displayed properly.
You need MB version 23.06.2011 (http://www.masm32.com/board/index.php?topic=12460.0) to assemble the attachment. Under the hood, when Recall detects a Unicode textfile, it launches a routine that sets the console environment using
- SetConsoleOutputCP, CP_UTF8
- GetCurrentConsoleFont
.if the index is lower than 10 ; then it must be a raster font, so...
- GetNumberOfConsoleFonts
- SetConsoleFont to last index ; ... use the last one, which is usually a Lucida Console example
.endif
- SetConsoleTextAttribute to gray on black
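The font decision in the middle of that list boils down to a few lines. The Python sketch below models only the selection logic, with a hypothetical function name; it calls no Windows API. The threshold of 10 is the heuristic stated above and not a documented rule, and as far as I know GetNumberOfConsoleFonts and SetConsoleFont are not officially documented, so take it with a grain of salt.

```python
# Sketch: the "raster font or not" decision from the list above, as pure logic.
# Assumption: indexes below 10 are raster fonts and the last index is a
# Lucida Console entry, as described in the post.

RASTER_INDEX_LIMIT = 10

def pick_console_font(current_index: int, font_count: int) -> int:
    """Return the font index to use for Unicode output."""
    if font_count <= 0:
        raise ValueError("font_count must be positive")
    if current_index < RASTER_INDEX_LIMIT:
        return font_count - 1      # raster font: switch to the last entry
    return current_index           # already a TrueType entry: keep it

if __name__ == "__main__":
    print(pick_console_font(3, 14))
    print(pick_console_font(12, 14))
```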
The snippet below works fine, provided the languages are supported on one's PC, wrapped lines are avoided, and the user has at least once used a Lucida console in the current session. Overall, the handling of Unicode in consoles looks like amateur software, and it may be proof that Microsoft does not pay much attention to console mode users in China or Arabic countries...
Quote
include \masm32\MasmBasic\MasmBasic.inc ; Download (http://www.masm32.com/board/index.php?topic=12460.0)
Init
Recall "Unicode2Console.rc", L$() ; that file is Unicode, as the name says ;-)
xchg eax, ebx ; # of strings read into ebx
; ConsoleColor cGray ; grey on black is the default
For_ n=0 To ebx-1
Print L$(n), CrLf$ ; print the strings
Next
ConsoleColor cYellow ; yellow on black
For_ n=0 To ebx-1
Print L$(n), CrLf$ ; print the strings again
Next
Inkey CrLf$, "Check the wrapped text, and see what happens if you use the scrollbar"
Exit
end start
Quote from: jj2007 on June 23, 2011, 10:46:56 AM
Note these are the only lines that get wrapped. By clicking into the scrollbar, the text gets displayed properly.
I don't think it's just Unicode, as I'm inclined to believe that the console is strictly a Unicode program, i.e. everything gets converted before it gets displayed. (A cursory once-over of the language "code page" files suggests they just map ASCII characters to Unicode ones.) It's when a character doesn't fit into the fixed-size box that the console seems to be stumped. As to why it works sometimes but not others, my only guess is that they put some "idiot proofing" into it to keep overhanging characters from painting pictures, but were not exactly consistent about where they tried to save you from yourself and where they left you to your own madness. An interesting test I would like to see is erasing some of those characters and seeing what gets left behind.
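To illustrate the "code page files just map bytes to Unicode" idea, here is a Python sketch using Python's own copies of the Windows code page tables (an assumption on my part that they match the .nls files closely enough for this purpose; the function name is made up). The same byte lands on a different Unicode character depending on the code page.

```python
# Sketch: one byte, three code pages, three different Unicode characters.
# Bytes that a code page leaves undefined would raise UnicodeDecodeError here.

def byte_in_code_pages(value: int, code_pages=("cp1252", "cp1251", "cp1256")) -> dict:
    """Map a single byte to its U+XXXX code point in each code page."""
    raw = bytes([value])
    return {cp: f"U+{ord(raw.decode(cp)):04X}" for cp in code_pages}

if __name__ == "__main__":
    print(byte_in_code_pages(0x41))   # plain ASCII: identical everywhere
    print(byte_in_code_pages(0xC7))   # upper half: depends on the code page
```

Byte 0xC7 comes out as a Latin, a Cyrillic or an Arabic letter, which would fit the picture of the console converting everything to Unicode first and only then looking for glyphs.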
When I run the test on my machine, I just get the extended ASCII set without fancy letters, but rendering and scrolling works as it should.
What I find more interesting is the regular letters that got cut off (The "nese" in Chinese), especially the half-an-n that's left over.
-r
Quote from: redskull on June 23, 2011, 11:23:44 PM
What I find more interesting is the regular letters that got cut off (The "nese" in Chinese), especially the half-an-n that's left over.
I couldn't find any solution for the missing 'nese but at least I solved the painting problem by inserting an InvalidateRect just before the Inkey. Now I am tempted to close the case for good, but still I would be curious to see what Windows 7 users with enabled Arabic & Chinese see when they run the test above.
Note the attachment above still lacks the InvalidateRect, for testing purposes. The current MasmBasic inserts it automatically if Unicode or UTF8 was detected and when it hits an Inkey statement.