The MASM Forum Archive 2004 to 2012

Project Support Forums => MASM32 => Topic started by: hutch-- on May 09, 2011, 03:14:56 AM

Title: New toy, ANSI to UNICODE text converter.
Post by: hutch-- on May 09, 2011, 03:14:56 AM
I needed this one as I am working on a Unicode project at the moment. Paste in normal ANSI text or open an ANSI text file, click the button, and it will convert the text to DW notation Unicode.

The tool is not exhaustively tested but appears to be working OK on normal ANSI plain text.


This is a test of ANSI to UNICODE.

becomes

.data
align 4
[rename me] \
dw "T","h","i","s"," ","i","s"," ","a"," ","t","e","s","t"," ","o","f"," ","A","N","S","I"," ","t"
dw "o"," ","U","N","I","C","O","D","E",".",0,0
.code


The only non-quoted characters are ANSI 9, 10 and 13 and the trailing 0,0 terminator.
Title: Re: New toy, ANSI to UNICODE text converter.
Post by: donkey on May 09, 2011, 05:05:47 AM
Looks like it does ASCII to UNICODE and not ANSI; there is a difference. It does not properly handle ANSI characters 128 - 159. For example, ANSI character 133 (…), which is Unicode 0x2026, is displayed as a black box, as are the other extended ANSI characters I tried.
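
The difference described above can be checked outside the assembler. The following is a minimal sketch in Python rather than MASM, and it assumes that "ANSI" here means Windows code page 1252; the names `zero_extended`, `mapped` and `differs_from_zero_extension` are made up for illustration. It compares plain zero-extension of a byte with a code-page-aware conversion.

```python
# Assumption: "ANSI" is taken to be Windows code page 1252 (Western European).
ansi_byte = bytes([133])

# Zero-extension: the byte value is reused unchanged as the code point.
# Decoding as Latin-1 has exactly that effect.
zero_extended = ansi_byte.decode("latin-1")

# Code-page-aware conversion: byte 133 maps to the horizontal ellipsis.
mapped = ansi_byte.decode("cp1252")

print(hex(ord(zero_extended)))  # 0x85
print(hex(ord(mapped)))         # 0x2026


def differs_from_zero_extension(byte_value: int) -> bool:
    """True if cp1252 does not map this byte to the same-numbered code point."""
    try:
        return bytes([byte_value]).decode("cp1252") != chr(byte_value)
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes (129, 141, 143, 144, 157) undefined.
        return True


# Only the 128 - 159 range is affected; 160 - 255 match zero-extension.
affected = [b for b in range(128, 256) if differs_from_zero_extension(b)]
print(affected[0], affected[-1], len(affected))
```

Under that assumption, bytes 160 to 255 come out the same either way, which is consistent with accented Western European letters surviving a simple byte-to-word copy.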
Title: Re: New toy, ANSI to UNICODE text converter.
Post by: hutch-- on May 09, 2011, 05:15:10 AM
I would expect that result for high ASCII/ANSI (128 to 255), and it absolutely will not handle anything higher than 255. I convert unicode to BYTE data if it is the source. The tool is aimed at the normal printable range, 32 to 127, and it does most of those OK.
Title: Re: New toy, ANSI to UNICODE text converter.
Post by: hutch-- on May 09, 2011, 04:25:53 PM
Thought I should run this test on the high ASCII set and it is as I expected: font dependent. Using the tool with FIXEDSYS showed a number of blocks, but paste the result into an editor that uses a TrueType font and the complete set (apart from control characters) is displayed. What the tool does is write whatever the character is between quotes and place a comma between them.


.data
align 4
[rename me] \
dw "","","","","","","","","","","","","","","","","","","","","","","",""
dw "",""," ","!",""","#","$","%","&","'","(",")","*","+",",","-",".","/","0","1","2","3","4","5"
dw "6","7","8","9",":",";","<","=",">","?","@","A","B","C","D","E","F","G","H","I","J","K","L","M"
dw "N","O","P","Q","R","S","T","U","V","W","X","Y","Z","[","\","]","^","_","`","a","b","c","d","e"
dw "f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","{","|","}"
dw "~","","€","?","‚","ƒ","„","…","†","‡","ˆ","‰","Š","‹","Œ","?","Ž","?","?","‘","’","“","”","•"
dw "–","—","˜","™","š","›","œ","?","ž","Ÿ"," ","¡","¢","£","¤","¥","¦","§","¨","©","ª","«","¬","­"
dw "®","¯","°","±","²","³","´","µ","¶","·","¸","¹","º","»","¼","½","¾","¿","À","Á","Â","Ã","Ä","Å"
dw "Æ","Ç","È","É","Ê","Ë","Ì","Í","Î","Ï","Ð","Ñ","Ò","Ó","Ô","Õ","Ö","×","Ø","Ù","Ú","Û","Ü","Ý"
dw "Þ","ß","à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ò","ó","ô","õ"
dw "ö","÷","ø","ù","ú","û","ü","ý","þ",0,0
.code
Title: Re: New toy, ANSI to UNICODE text converter.
Post by: jj2007 on May 09, 2011, 05:20:39 PM
I wonder where these problems come from. The stuff below is copied/extracted from ordinary text written in a RichEdit control, and translated with a simple lodsb/stosw combi, except for the Arabic and Chinese part that comes from resource strings and is displayed only to demonstrate that the console writes Unicode. The snippet uses German umlauts and looks OK, both for console output and wide MessageBox...
One observation is that the RichEdit control says ä = 228 and ü = 252, while you need to use Alt 132 and Alt 129 to enter them if you don't have a German keyboard.

(http://www.masm32.com/board/index.php?action=dlattach;topic=16643.0;id=9256)
include \masm32\MasmBasic\MasmBasic.inc

Init
Dim My$(9)
wLet My$(3)=wChr$("Hällö cöder")+wCrLf$+wChr$("Höw äre yöü?")
ConsoleColor cYellow
wPrint My$(3), wCrLf$
wPrint wRes$(803), wCrLf$
wPrint wRes$(1203), wCrLf$
wMsgBox 0, My$(3), "Hällo", MB_OK
Exit
end start


(Do not download the attachment, it is the image above disguised as ZIP)
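
For readers following along without MasmBasic, here is a rough Python sketch of what a lodsb/stosw style widening does and why the umlauts come out right. It is an illustration only, not the MasmBasic implementation: `widen` is a hypothetical helper, and the code pages (cp1252 for the RichEdit values, cp437 for the Alt codes) are assumptions about the setup described above.

```python
def widen(ansi: bytes) -> bytes:
    """Copy each byte into a 16-bit little-endian word with a zero high byte,
    which is the effect of a lodsb/stosw loop with AH cleared."""
    out = bytearray()
    for b in ansi:
        out += bytes((b, 0))  # low byte first, then the zero high byte
    return bytes(out)


# Assumption: the source text is stored as code page 1252.
source = "Hällö cöder".encode("cp1252")
wide = widen(source)

# The umlauts survive because cp1252 and Unicode agree in the 160 - 255 range.
print(wide.decode("utf-16-le") == "Hällö cöder")  # True

# The same letters have different numbers in different code pages:
print(list("äü".encode("cp1252")))  # [228, 252]  (the values RichEdit reports)
print(list("äü".encode("cp437")))   # [132, 129]  (the Alt 132 / Alt 129 values)
```

If the Alt codes are typed without a leading zero they are normally looked up in the OEM code page rather than the ANSI one, which would account for the two different sets of numbers.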
Title: Re: New toy, ANSI to UNICODE text converter.
Post by: hutch-- on May 10, 2011, 12:34:25 AM
Unicode is the easy part. If I need it in the data section I do a simple BYTE conversion to get the exact characters without the need for remapping.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc
    include \masm32\macros\ucmacros.asm

    .data?
      value dd ?

    .data
    konnichiwa_jj_san \
      db 83,48,147,48,107,48,97,48,143,48,195,48,106,0,85,48
      db 147,48,2,48,0,0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main

    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    invoke MessageBoxW,0,OFFSET konnichiwa_jj_san,uni$("Hi JJ"),MB_OK

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
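
The BYTE conversion used for the data above can be sketched in Python as follows. This is a guess at the approach, not the actual tool: `to_db_lines` is a hypothetical helper, and the 16 values per line simply follow the layout of the posted data. The text is encoded as UTF-16 little-endian, a two byte zero terminator is appended, and every byte is written as a decimal value.

```python
def to_db_lines(text: str, per_line: int = 16) -> list:
    """Format text as MASM 'db' lines holding UTF-16LE bytes plus a 0,0 terminator."""
    data = text.encode("utf-16-le") + b"\x00\x00"
    return [
        "db " + ",".join(str(b) for b in data[i:i + per_line])
        for i in range(0, len(data), per_line)
    ]


# The bytes posted above, copied as-is.
posted = bytes([83, 48, 147, 48, 107, 48, 97, 48, 143, 48, 195, 48, 106, 0, 85, 48,
                147, 48, 2, 48, 0, 0])

# Drop the terminator and decode the bytes back to text.
text = posted[:-2].decode("utf-16-le")
print([hex(ord(ch)) for ch in text])

# Round trip: the recovered text formats back to the same two db lines.
for line in to_db_lines(text):
    print(line)
```

Because the bytes already are the UTF-16 encoding, no code page remapping is involved when the string is passed to MessageBoxW.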


The tool above does exactly what the output shows: it places each character within double quotes in a DW block, comma separates them, replaces ASCII 9, 10 and 13 with an integer, and double zero terminates the result.
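
The behaviour just described can be approximated in a few lines of Python. This is a sketch reconstructed from the posted output, not the tool's source: the function name `to_dw_block` is hypothetical, the 24 items per line follow the posted output, and writing the double quote character in single quotes is an assumption of this sketch rather than something the tool is shown to do.

```python
def to_dw_block(text: str, per_line: int = 24) -> str:
    """Emit text as a MASM DW block, one quoted character per item."""
    items = []
    for ch in text:
        if ch in "\t\n\r":
            items.append(str(ord(ch)))    # ASCII 9, 10 and 13 become integers
        elif ch == '"':
            items.append("'\"'")          # assumption: quote the quote with single quotes
        else:
            items.append('"' + ch + '"')  # everything else goes between double quotes
    items += ["0", "0"]                   # trailing 0,0 terminator

    lines = [".data", "align 4", "[rename me] \\"]
    for i in range(0, len(items), per_line):
        lines.append("dw " + ",".join(items[i:i + per_line]))
    lines.append(".code")
    return "\n".join(lines)


print(to_dw_block("This is a test of ANSI to UNICODE."))
```

Run on the sample sentence from the first post, this should reproduce the two dw lines shown there. Whether characters above 127 end up as the right Unicode values still depends on how the assembler interprets the quoted character, as discussed earlier in the thread.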