Problems with characters and fonts

Started by RuiLoureiro, May 24, 2005, 06:41:39 PM

RuiLoureiro

Hi

   I have a problem with characters like «ç», «õ», «é».
In the MASM editor, I write words like «definições» correctly. After compilation, when the program prints that word (using the WriteConsole function), the «ç» and «õ» are replaced by other characters.

When I use my string editor (the one in the file cons29.zip in my topic Console In-Out), I can type «definições» and it is displayed correctly. The string editor also uses the WriteConsole function to print.
So the character codes are not the same in the two cases. What is going on? I have been reading the documentation about Fonts, but I did not find anything.

1. How can we control the fonts and the characters in them? Do you have an example, please? (Maybe my friend Michael can help me! He knows a lot about these things.)

2. Can we use the functions described in the topic Fonts in Win32.hlp in console applications? In which cases?

Mark Jones,
      When I use your codepage utility with codepages 437 or 860, I get: «ç» is char 231 = þ and «õ» is char 21 = char 245 = §.


P.S. «definições» means «definitions».
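
One way to read the symptom reported above: the same byte value stands for different characters depending on the code page used to interpret it. The sketch below is in Python only because its standard codecs make the tables easy to inspect. It assumes (this is not confirmed in the thread) that the editor stored the source as Windows-1252 and that the console was using an OEM code page such as 850 or 437.

```python
# Bytes that a Windows-1252 ("ANSI") editor would store for the two letters.
ansi_bytes = "çõ".encode("cp1252")        # values 231 and 245

# How the same two bytes look under different code pages.
# The choice of code pages here is an assumption for illustration.
views = {cp: ansi_bytes.decode(cp) for cp in ("cp1252", "cp850", "cp437")}

for cp, text in views.items():
    print(cp, list(ansi_bytes), text)
```

Under code page 850, bytes 231 and 245 come out as þ and §, which matches the characters reported in the post.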

jorgon

Hi RuiLoureiro

Depending on the exact nature of your problem, it's possible you might get some help from my "Writing Unicode Programs" help file here.

This paper discusses, to an extent, the interaction between fonts and codepages.

It's the case that WriteConsole will not reproduce Unicode characters properly in the console unless the console font is set to Lucida Console.  You can change this through console "properties".  This might be your problem.
Author of the "Go" tools (GoAsm, GoLink, GoRC, GoBug)

Jeff

hi rui,
perhaps i could be of help.  ;)  your problem with the text is the fact that some of those characters are not part of the (extended) ASCII character set (or whatever it's called).  here are the tables of the standard and extended ASCII.  i believe the characters you are trying to print are actually unicode.  so, if you wanted to print them, i suppose you would need to use WriteConsoleW.  since that expects wide character strings, you must use the WORD type instead of BYTE for your string.  although, i'm sure using unicode is a lot more complicated than that so i can't really help you more with that.
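
To make the BYTE versus WORD point concrete, here is a small Python sketch (used only for illustration, not as part of any MASM build) of the memory layout a wide-character function such as WriteConsoleW expects: for text like this, one 16-bit little-endian value per character. The variable names are made up for the example.

```python
import struct

text = "definições"

# UTF-16LE is the layout the Windows "W" functions expect.
wide = text.encode("utf-16-le")

# Unpack as 16-bit little-endian words, i.e. what a series of DW values would hold.
words = struct.unpack("<%dH" % len(text), wide)

print(len(text), "characters ->", len(wide), "bytes")
print([hex(w) for w in words])
```

Note that the length passed to WriteConsoleW is a count of characters, not of bytes.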

jorgon

Quote: Although, i'm sure using unicode is a lot more complicated than that so i can't really help you more with that.
It's not complicated at all, and much easier than using codepages.

RuiLoureiro

Hi Jeremy, Hi Jeff

     Thank you so much for your help.
      I was just at your site (Jeremy), which is very, very good, reading what you have there.
      I will work on this tomorrow. That is all for now.

      Thank you
      Best regards
      Rui Loureiro

Jeff

Quote from: jorgon on May 24, 2005, 09:14:59 PM
Quote: Although, i'm sure using unicode is a lot more complicated than that so i can't really help you more with that.
It's not complicated at all, and much easier than using codepages.
I was referring to the information I put up in my post about using unicode (and the fact that I don't know anything else when it comes to using it).  ;)

RuiLoureiro

Hi, Jeremy

      I tried your HelloUnicode1 example (I am using its basic procs in the code below) and it doesn't work because:

      1. VersionInfo         gives 2  ( OK )
      2. GetModuleHandleW    gives 0  ( wrong )

      3. I tried WriteConsoleW: it assembles, but it doesn't work correctly. It prints ???? and other chars when it is used to print _StringUnicode (which is defined as bytes, so it is not Unicode!).

   So point 2 suggests that Kernel32.dll doesn't exist (is it not loaded?) and point 3 means that WriteConsoleW exists (it runs, but gives wrong output). To me, the problem here seems to be with the Unicode chars. But I still don't know the cause of the problem I had before.

   How can we define strings as Unicode in MASM? (I think the strings we define in .data are assembled into the .OBJ as ANSI by MASM.)

Hi Jeff,
            Thank you.
...................................................................................................................................

.486
.model flat,stdcall
option casemap:none
;««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
;                                       INCLUDES
;
include \masm32\include\windows.inc
;******************************************************************************
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
include \masm32\include\masm32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib
includelib \masm32\lib\masm32.lib
;---------------------------------------------------------------------------
.data
align 4
                       dd 26
_StringUnicode         db "acções já às duas íngremes"
;....................................................
_VersionInfo           dd 148                            ;OSVERSIONINFO <>
                       db 144 dup (?)
                       db 144 dup (?)
                       
                       dd 12
_Kernel32_dll          db "Kernel32.dll"
                       db 00h

_hKernel32_dll         dd ?

                       dd 13
_WriteConsoleW         db "WriteConsoleW"
                       db 00h

_pWriteConsoleW        dd ?
;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
.data?

_hScreenBuffer         dd ?
_hInputBuffer          dd ?
;------------------------------------------------------------------------------
.code
;««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:           invoke   GetStdHandle, STD_INPUT_HANDLE     
                 cmp      eax, INVALID_HANDLE_VALUE
                 je       _eStart

                 mov      _hInputBuffer, eax               
                 ;
                 invoke   GetStdHandle, STD_OUTPUT_HANDLE   
                 cmp      eax, INVALID_HANDLE_VALUE
                 je       _ibStart

                 mov       _hScreenBuffer, eax

                 invoke    GetVersionExA, ADDR _VersionInfo
                 mov       edi, offset _VersionInfo
                 mov       eax, dword ptr [edi + 10h]             ; GIVES 2
                 cmp       eax, 1
                 jna       _sbStart
                 ;
                 invoke    GetModuleHandleW, ADDR _Kernel32_dll
                 or        eax, eax                               ; GIVES 0
                 jz        _sbStart
                 
                 mov       _hKernel32_dll, eax

                 invoke    GetProcAddress, eax, ADDR _WriteConsoleW
                 or        eax, eax
                 jz        _sbStart

                 mov       _pWriteConsoleW, eax
                 ; ------------
                 ; Print string
                 ; ------------
                 mov      ebx, offset _StringUnicode
                 ;
                 ; close all
                 ; ---------
_sbStart:        invoke   CloseHandle, _hScreenBuffer
_ibStart:        invoke   CloseHandle, _hInputBuffer                           
_eStart:         invoke   ExitProcess, 0
;-------------------------------------------------------------------------------
BufferPrintW        proc
                    LOCAL   wNum:DWORD
                    pushad
                   
                    or       ecx, ecx
                    jz       short @F
                   
                    lea      ebx, wNum
                    invoke   WriteConsoleW, _hScreenBuffer,   ; handle
                                            esi,              ; print buffer
                                            ecx,              ; write ECX
                                            ebx,              ; Number written
                                            0
                    cmp     eax, 0
                    jne     short @F
                    ;
                    stc
                    popad
                    ret

@@:                 clc
                    popad
                    ret
BufferPrintW        endp
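
A possible explanation for point 2, offered as a reading of the listing rather than something confirmed in the thread: GetModuleHandleW expects a UTF-16 string, but _Kernel32_dll is declared with db, so its bytes would be taken two at a time as unrelated characters and no module with that name would be found. The Python sketch below only illustrates that misreading.

```python
# The bytes that `db "Kernel32.dll"` places in memory (one byte per character).
ansi_name = b"Kernel32.dll"

# What a "W" function sees when it reads those bytes as 16-bit code units.
misread = ansi_name.decode("utf-16-le")

# What the function would need to receive instead.
wide_name = "Kernel32.dll".encode("utf-16-le")

print(len(misread), [hex(ord(c)) for c in misread])
print(len(wide_name), "bytes in the wide form")
```

If that reading is right, either calling GetModuleHandleA with the existing string or declaring the name as 16-bit values should avoid the zero result. The same would apply to _StringUnicode when it is passed to WriteConsoleW. GetProcAddress takes an ANSI name, so _WriteConsoleW can stay as db.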


Phoenix

Quote: How can we define strings as unicode in MASM ?

Have a look at \masm32\macros\ucmacros.asm (UNICODE support macros for MASM32):

Quote:
comment * -----------------------------------------------
      macro to declare UNICODE string data in the .DATA
      section.
      SYNTAX:
      WSTR MyString,"This is a test"
      string length limit = 118 characters
      control characters like < > etc .. cannot be used
      in the string.
      ------------------------------------------------- *

WSTR    szwFont,"Arial"

Hope it helps...


Regards, Phoenix

RuiLoureiro

Hi,
   I am in serious doubt! Must I use Unicode? Is it absolutely necessary?
I have more than 400 fixed-length strings, so in Unicode they would waste more than 400 bytes.
But that is not the only problem.

1.  The string

.data
      db 23
WSTR  _UniString, "é já às íngremes acções"


Char:    é   sp  j   á   sp  à   s   sp  í   ... ç   õ   e   s
Code:    E9  20  6A  E1  20  E0  73  20  ED  ... E7  F5  65  73   [hex]

is printed correctly using WriteConsoleW.

2.   But the string

         db 5
_varU2  dw 180
         dw 181
         dw 182
         dw 183
         dw 184

is not printed correctly, because codes 180 to 218 are not the same chars as before (as in Jeff's table). Those codes are used to draw the window frames.
With this problem, I cannot use Unicode for all strings. I would need procs for ANSI and procs for Unicode.

Is there any help?
Where can I see the Unicode codes for the chars?
Thanks
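
The codes 180 to 218 draw frames only in the OEM code pages. In Unicode those numbers are Latin-1 symbols, and the frame characters live in the Box Drawing block (U+2500 to U+257F). A short Python sketch, assuming the table in use is code page 437, shows the Unicode values that correspond to bytes 180 to 184.

```python
oem_bytes = bytes([180, 181, 182, 183, 184])

# Translate the OEM bytes to Unicode (code page 437 is an assumption here).
as_unicode = oem_bytes.decode("cp437")
code_points = ["%04X" % ord(ch) for ch in as_unicode]

# What the numbers 180..184 mean if they are used directly as Unicode values.
direct = "".join(chr(n) for n in oem_bytes)

print(code_points)   # candidate values for the DW entries
print(as_unicode, "versus", direct)
```

So a single set of Unicode procs could still draw frames, provided the DW values come from the Box Drawing block rather than from the OEM table.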

tenkey

A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

GregL

Could this be a simple ASCII vs. ANSI problem?

The Windows Console uses ASCII and most Windows GUI programs use ANSI.


RuiLoureiro

Thanks
1. If I am right, ASCII was defined by ANSI. So, Greg, what is the difference
   between ANSI in the GUI and ASCII in the console?

2. I found out:     Unicode range    Block
                        0000 – 007F    Basic Latin
                        0080 – 00FF    Latin-1 Supplement
                        2500 – 257F    Box Drawing

3. If I am not wrong, fonts contain some blocks, not all (is there one that has all the defined blocks?)

4. I suppose a console app starts with a Unicode font with chars 0000-00FF. So I
   need to install a font that has the Box Drawing block.

5. As jorgon said, this can be Lucida Console (I think Windows comes with it).
   «You can change this through console "properties".»
    How can I get to the console "properties"? What is their structure?

GregL

Quote: So, Greg, whats the difference between ANSI in GUI and ASCII in Console ?

     1. For characters above 127, they are completely different.

Here are the tables:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/his_2004Main/htm/ref_character_tables.asp
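
In practice the difference means that a string stored in the ANSI code page has to be converted before it is written to a console that uses an OEM code page (the Win32 function CharToOem performs this kind of conversion). The Python sketch below shows the byte-level effect. Code pages 1252 and 850 are assumptions chosen for illustration, not values confirmed in the thread.

```python
word = "definições"

ansi = word.encode("cp1252")   # bytes as a Windows-1252 editor stores them
oem = word.encode("cp850")     # bytes a code page 850 console needs

# Only the characters above 127 differ between the two encodings.
diff = [(word[i], ansi[i], oem[i]) for i in range(len(word)) if ansi[i] != oem[i]]

print(diff)
```

The plain letters keep the same byte values in both code pages; only «ç» and «õ» need different bytes.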


MichaelW

Thanks for the pointer Greg.

So it looks like the solution is to use the ANSI character set in the console. For the following test the console font must be the Lucida Console TrueType font (right click the console title bar and select properties, then the Font tab). AFAIK this will work only for Windows 2000 and later.

http://support.microsoft.com/default.aspx?scid=kb;en-us;Q99795

MSDN: SetConsoleOutputCP


; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .486                       ; create 32 bit code
    .model flat, stdcall       ; 32 bit memory model
    option casemap :none       ; case sensitive

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\user32.inc
    include \masm32\include\kernel32.inc
    include \masm32\include\advapi32.inc
    include \masm32\include\msvcrt.inc

    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\user32.lib
    includelib \masm32\lib\kernel32.lib
    includelib \masm32\lib\advapi32.lib
    includelib \masm32\lib\msvcrt.lib

    include \masm32\macros\macros.asm
    include \masm32\macros\ucmacros.asm

    NAME_SIZE EQU 1024

    SUBKEY EQU "SYSTEM\CurrentControlSet\Control\Nls\CodePage"

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      hKey          dd 0
      cbValueName   dd NAME_SIZE
      cbValueData   dd NAME_SIZE
      valueName     db NAME_SIZE dup(0)
      valueData     db NAME_SIZE dup(0)
      subKey        db SUBKEY,0

      ;"é já às íngremes acções"

      wstr dw 0E9h,' ','j',0E1h,' ',0E0h,'s',' ',0EDh,'n','g','r'
           dw 'e','m','e','s',' ','a','c',0E7h,0F5h,'e','s',0
      astr db 0E9h,' ','j',0E1h,' ',0E0h,'s',' ',0EDh,'n','g','r'
           db 'e','m','e','s',' ','a','c',0E7h,0F5h,'e','s',0
    .code
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    ; -----------------------------------------------
    ; Display the value names and the data for the
    ; values that are not assigned a null string.
    ; -----------------------------------------------

    ; KEY_READ is enough to enumerate values and also works without admin rights.
    invoke RegOpenKeyEx,HKEY_LOCAL_MACHINE,ADDR subKey,0,
                        KEY_READ,ADDR hKey
    xor   ebx,ebx
    .REPEAT
        invoke RegEnumValue,hKey,ebx,ADDR valueName,ADDR cbValueName,
                            NULL,NULL,ADDR valueData,ADDR cbValueData
        Switch eax
          Case ERROR_NO_MORE_ITEMS
            .BREAK
          Case ERROR_MORE_DATA
            print chr$("name buffer too small",13,10)
            .BREAK
          Case ERROR_SUCCESS
            .IF valueData != 0
                print ADDR valueName,32,32
                print ADDR valueData,13,10
            .ENDIF           
          Default
            print chr$("unexpected error",13,10)
            .BREAK
        EndSw
        mov   cbValueName,NAME_SIZE
        mov   cbValueData,NAME_SIZE
        inc   ebx
    .UNTIL 0

    print chr$(13,10,13,10)

    ; -------------------------------------
    ; Show the unicode and ASCII strings.
    ; -------------------------------------

    invoke crt__putws, ADDR wstr
    print ADDR astr,13,10,13,10

    ; -----------------------------
    ; Show the current code page.
    ; -----------------------------

    invoke GetConsoleOutputCP
    print ustr$(eax),13,10

    ; ------------------------------------
    ; Change to the WinLatin1 code page.
    ; ------------------------------------

    invoke SetConsoleOutputCP,1252
    print ustr$(eax),13,10

    ; -----------------------------
    ; Show the current code page.
    ; -----------------------------

    invoke GetConsoleOutputCP
    print ustr$(eax),13,10,13,10

    ; -------------------------------------
    ; Show the unicode and ASCII strings.
    ; -------------------------------------

    invoke crt__putws, ADDR wstr
    print ADDR astr,13,10,13,10

    ; --------------------------------
    ; Show the ASCII string as ANSI.
    ; --------------------------------

    MsgBox 0,ADDR astr,"ANSI",MB_OK

    mov   eax, input(13,10,"Press enter to exit...")
    exit
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


EDIT:

Corrected the code so it displays only the values that have something other than a null string assigned, presumably indicating that the code page is actually available.


eschew obfuscation

RuiLoureiro

Hi Greg,
           thanks. Good help. :U

Michael,
           you are a very good person! Many thanks, as ever.

I am off to work on it.
Stay well.

P.S. I was working with the IBM Extended Character Set, I think.