
New toy, ANSI to UNICODE text converter.

Started by hutch--, May 09, 2011, 03:14:56 AM


hutch--

I needed this one as I am working on a Unicode project at the moment. Paste in normal ANSI text or open an ANSI text file, click the button, and it will convert the text to DW-notation Unicode.

The tool is not exhaustively tested but appears to be working OK on normal ANSI plain text.


This is a test of ANSI to UNICODE.

becomes

.data
align 4
[rename me] \
dw "T","h","i","s"," ","i","s"," ","a"," ","t","e","s","t"," ","o","f"," ","A","N","S","I"," ","t"
dw "o"," ","U","N","I","C","O","D","E",".",0,0
.code


The only non-quoted characters are ANSI 9, 10 and 13 and the trailing 0,0 terminator.
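
For readers who want to reproduce the format, here is a minimal Python sketch of the output layout described above. It is not the tool's source: the function name `to_dw_block` and the 24-items-per-line wrap are assumptions inferred from the sample output.

```python
def to_dw_block(text, per_line=24):
    """Format text as a MASM DW block in the style of the sample above.

    Hypothetical helper: the name and the per_line=24 wrap are guesses
    based on the posted output, not taken from the tool itself.
    """
    items = []
    for ch in text:
        code = ord(ch)
        if code in (9, 10, 13):
            items.append(str(code))        # tab, LF and CR go out as integers
        else:
            items.append('"' + ch + '"')   # every other character is quoted as-is
    items += ["0", "0"]                    # trailing 0,0 terminator

    lines = [".data", "align 4", "[rename me] \\"]
    for i in range(0, len(items), per_line):
        lines.append("dw " + ",".join(items[i:i + per_line]))
    lines.append(".code")
    return "\n".join(lines)


print(to_dw_block("This is a test of ANSI to UNICODE."))
```

Like the posted output, this sketch does not escape a literal double quote, so a `"` in the input would come out as `"""` and need fixing by hand.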
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

donkey

Looks like it does ASCII to UNICODE and not ANSI; there is a difference. It does not properly handle ANSI characters 128 - 159. For example, ANSI character 133 (...), which is Unicode 0x2026, is displayed as a black box, as are the other extended ANSI characters I tried.
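
The difference can be checked outside the tool. The sketch below assumes "ANSI" means Windows-1252 and uses Python's `cp1252` codec as a stand-in for the system code page; it compares plain zero-extension of a byte with the code point the code page assigns.

```python
raw = bytes([133])  # ANSI (Windows-1252) byte for the horizontal ellipsis

zero_extended = raw[0]               # byte widened to a word: 0x0085
decoded = ord(raw.decode("cp1252"))  # code point assigned by the code page: 0x2026

print(hex(zero_extended), hex(decoded))

# Collect the bytes in 128..159 whose code point differs from the byte value.
differs = []
for b in range(128, 160):
    try:
        if ord(bytes([b]).decode("cp1252")) != b:
            differs.append(b)
    except UnicodeDecodeError:
        # 129, 141, 143, 144 and 157 are unassigned in Python's cp1252 codec
        pass

print(differs)
```

Under that assumption, every assigned byte in 128 to 159 lands on a code point other than its own value, so a converter that only widens bytes would get that whole range wrong.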
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

hutch--

I would expect that result for high ASCII/ANSI (128 to 255), and it absolutely will not handle anything higher than 255. If the source is Unicode, I convert it to BYTE data instead. The tool is aimed at the normal printable range, 32 to 127, and it does most of those OK.

hutch--

Thought I should run this test on the high ASCII set, and it is as I expected: font dependent. Using the tool with FIXEDSYS showed a number of blocks; paste the result into an editor that uses a TrueType font and the complete set (apart from control characters) is displayed. What the tool does is write whatever the character is between quotes and place a comma between them.


.data
align 4
[rename me] \
dw "","","","","","","","","","","","","","","","","","","","","","","",""
dw "",""," ","!",""","#","$","%","&","'","(",")","*","+",",","-",".","/","0","1","2","3","4","5"
dw "6","7","8","9",":",";","<","=",">","?","@","A","B","C","D","E","F","G","H","I","J","K","L","M"
dw "N","O","P","Q","R","S","T","U","V","W","X","Y","Z","[","\","]","^","_","`","a","b","c","d","e"
dw "f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","{","|","}"
dw "~","","€","?",",","ƒ",",,","...","†","‡","ˆ","‰","Š","‹","Œ","?","Ž","?","?","'","'",""",""","•"
dw "–","—","˜","™","š","›","œ","?","ž","Ÿ"," ","¡","¢","£","¤","¥","¦","§","¨","©","ª","«","¬","­"
dw "®","¯","°","±","²","³","´","µ","¶","·","¸","¹","º","»","¼","½","¾","¿","À","Á","Â","Ã","Ä","Å"
dw "Æ","Ç","È","É","Ê","Ë","Ì","Í","Î","Ï","Ð","Ñ","Ò","Ó","Ô","Õ","Ö","×","Ø","Ù","Ú","Û","Ü","Ý"
dw "Þ","ß","à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ò","ó","ô","õ"
dw "ö","÷","ø","ù","ú","û","ü","ý","þ",0,0
.code

jj2007

I wonder where these problems come from. The stuff below is copied/extracted from ordinary text written in a RichEdit control and translated with a simple lodsb/stosw combination, except for the Arabic and Chinese part, which comes from resource strings and is displayed only to demonstrate that the console writes Unicode. The snippet uses German umlauts and looks OK, both for console output and wide MessageBox...
One observation is that the RichEdit control says ä = 228 and ü = 252, while you need to use Alt 132 and Alt 129 to enter them if you don't have a German keyboard.
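
The two sets of numbers are consistent with two different code pages. The sketch below assumes the RichEdit values are Windows-1252 (ANSI) and that Alt+nnn without a leading zero goes through the OEM code page, taken here to be `cp437`. Both are assumptions about the setup, not something stated in the post.

```python
codes = {}
for ch in "äü":
    ansi = ch.encode("cp1252")[0]  # ANSI byte, assumed to be what RichEdit reports
    oem = ch.encode("cp437")[0]    # OEM byte, assumed to be what Alt+nnn expects
    codes[ch] = (ansi, oem, ord(ch))
    print(ch, "ANSI", ansi, "OEM", oem, "Unicode", ord(ch))
```

For both umlauts the ANSI byte equals the Unicode code point, which would explain why a plain byte-to-word widening displays them correctly. That holds for 160 to 255 in Windows-1252 but not for 128 to 159.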


include \masm32\MasmBasic\MasmBasic.inc

Init
Dim My$(9)
wLet My$(3)=wChr$("Hällö cöder")+wCrLf$+wChr$("Höw äre yöü?")
ConsoleColor cYellow
wPrint My$(3), wCrLf$
wPrint wRes$(803), wCrLf$
wPrint wRes$(1203), wCrLf$
wMsgBox 0, My$(3), "Hällo", MB_OK
Exit
end start


(Do not download the attachment, it is the image above disguised as ZIP)

hutch--

Unicode is the easy part. If I need it in the data section, I do a simple BYTE conversion to get the exact characters without the need for remapping.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc
    include \masm32\macros\ucmacros.asm

    .data?
      value dd ?

    .data
    konnichiwa_jj_san \
      db 83,48,147,48,107,48,97,48,143,48,195,48,106,0,85,48
      db 147,48,2,48,0,0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main

    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    invoke MessageBoxW,0,OFFSET konnichiwa_jj_san,uni$("Hi JJ"),MB_OK

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
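
As a cross-check on the `db` data in the listing above, the byte pairs can be read back as little-endian UTF-16 code units. This is a minimal Python sketch; the variable names are illustrative and the byte values are copied from the listing.

```python
# Bytes copied from the konnichiwa_jj_san db lines, including the 0,0 terminator.
data = bytes([83, 48, 147, 48, 107, 48, 97, 48, 143, 48, 195, 48, 106, 0, 85, 48,
              147, 48, 2, 48, 0, 0])

text = data[:-2].decode("utf-16-le")          # drop the terminator, then decode
code_units = [hex(ord(c)) for c in text]      # one 16-bit unit per character here

print(text)
print(code_units)
```

Each low byte comes first and the high byte second, so `83,48` is the word 0x3053. This is why the data can be declared as plain bytes and still be passed to `MessageBoxW`.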


The tool above does exactly what the output shows: it places each character within double quotes in a DW block, separates them with commas, replaces ASCII 9, 10 and 13 with integers, and terminates the result with a double zero.