I needed this one as I am working on a Unicode project at the moment. Paste in normal ANSI text or open an ANSI text file, click the button, and it will convert the text to DW notation Unicode.
The tool is not exhaustively tested but appears to be working OK on normal ANSI plain text.
This is a test of ANSI to UNICODE.
becomes
.data
align 4
[rename me] \
dw "T","h","i","s"," ","i","s"," ","a"," ","t","e","s","t"," ","o","f"," ","A","N","S","I"," ","t"
dw "o"," ","U","N","I","C","O","D","E",".",0,0
.code
The only non-quoted characters are ANSI 9, 10 and 13 and the trailing 0,0 terminator.
It looks like it does ASCII to Unicode and not ANSI; there is a difference. It does not properly handle ANSI characters 128 - 159. For example, ANSI character 133 (the ellipsis, Unicode 0x2026) is displayed as a black box, as are the other extended ANSI characters I tried.
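To make the ASCII versus ANSI point concrete, here is a minimal Python sketch (Python is used purely for illustration, it is not what the tool is written in). It assumes "ANSI" means Windows-1252, and the helper names zero_extend and decode_ansi are made up for this example. Zero extension is what a plain byte-to-word copy does; a real code-page decode remaps the 128 - 159 range.

```python
# Illustrative only: compare zero extension (what a byte-to-word copy does)
# with a Windows-1252 decode. "ANSI" is assumed to mean code page 1252.

def zero_extend(data: bytes) -> str:
    """Treat each byte value as a Unicode code point (hypothetical helper)."""
    return "".join(chr(b) for b in data)

def decode_ansi(data: bytes) -> str:
    """Decode as Windows-1252; undefined bytes become U+FFFD (hypothetical helper)."""
    return data.decode("cp1252", errors="replace")

ellipsis_byte = bytes([133])
print(hex(ord(zero_extend(ellipsis_byte))))  # 0x85, a C1 control code
print(hex(ord(decode_ansi(ellipsis_byte))))  # 0x2026, the ellipsis

# From 160 upward both approaches agree, e.g. byte 228 is U+00E4 either way.
print(zero_extend(bytes([228])) == decode_ansi(bytes([228])))  # True
```

Only the 128 - 159 block differs between the two, which would fit the black boxes being limited to that range.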
I would expect that result for high ASCII/ANSI (128 to 255), and it absolutely will not handle anything higher than 255. I convert Unicode to BYTE data if it is the source. The tool is aimed at the normal printable range, 32 to 127, and it does most of those OK.
I thought I should run this test on the high ASCII set and it is as I expect: font dependent. Using the tool with FIXEDSYS showed a number of blocks, but paste the result into an editor that uses a TrueType font and the complete set (apart from control characters) is displayed. What the tool does is write whatever the character is between quotes and place a comma between them.
.data
align 4
[rename me] \
dw "","","","","","","","","","","","","","","","","","","","","","","",""
dw "",""," ","!",""","#","$","%","&","'","(",")","*","+",",","-",".","/","0","1","2","3","4","5"
dw "6","7","8","9",":",";","<","=",">","?","@","A","B","C","D","E","F","G","H","I","J","K","L","M"
dw "N","O","P","Q","R","S","T","U","V","W","X","Y","Z","[","\","]","^","_","`","a","b","c","d","e"
dw "f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","{","|","}"
dw "~","","€","?",",","ƒ",",,","...","†","‡","ˆ","‰","Š","‹","Œ","?","Ž","?","?","'","'",""",""","•"
dw "–","—","˜","™","š","›","œ","?","ž","Ÿ"," ","¡","¢","£","¤","¥","¦","§","¨","©","ª","«","¬",""
dw "®","¯","°","±","²","³","´","µ","¶","·","¸","¹","º","»","¼","½","¾","¿","À","Á","Â","Ã","Ä","Å"
dw "Æ","Ç","È","É","Ê","Ë","Ì","Í","Î","Ï","Ð","Ñ","Ò","Ó","Ô","Õ","Ö","×","Ø","Ù","Ú","Û","Ü","Ý"
dw "Þ","ß","à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ò","ó","ô","õ"
dw "ö","÷","ø","ù","ú","û","ü","ý","þ",0,0
.code
I wonder where these problems come from. The stuff below is copied/extracted from ordinary text written in a RichEdit control, and translated with a simple lodsb/stosw combi, except for the Arabic and Chinese part that comes from resource strings and is displayed only to demonstrate that the console writes Unicode. The snippet uses German umlauts and looks OK, both for console output and wide MessageBox...
One observation is that the RichEdit control says ä = 228 and ü = 252, while you need to use Alt 132 and Alt 129 to enter them if you don't have a German keyboard.
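A possible explanation for the two sets of numbers, sketched in Python: three-digit Alt codes are interpreted in the OEM code page, while the RichEdit values are ANSI. The sketch assumes cp437 as the OEM page and cp1252 as the ANSI page; both are assumptions about the machine in question, and same_character is a hypothetical helper.

```python
# Assumption: OEM code page is cp437 (cp850 puts ä and ü at the same positions)
# and the ANSI code page is cp1252.

def same_character(oem_code: int, ansi_code: int) -> bool:
    """True if the OEM byte and the ANSI byte decode to the same character."""
    oem_char = bytes([oem_code]).decode("cp437")
    ansi_char = bytes([ansi_code]).decode("cp1252")
    return oem_char == ansi_char

# (Alt code, RichEdit value) pairs from the post above
for alt_code, ansi_code in ((132, 228), (129, 252)):
    code_point = ord(bytes([ansi_code]).decode("cp1252"))
    print(alt_code, ansi_code, same_character(alt_code, ansi_code), hex(code_point))
```

Under those assumptions both pairs name the same character, and the ANSI value equals the Unicode code point, so a zero-extending copy gives the right result for the umlauts.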
(http://www.masm32.com/board/index.php?action=dlattach;topic=16643.0;id=9256)
include \masm32\MasmBasic\MasmBasic.inc
Init
Dim My$(9)
wLet My$(3)=wChr$("Hällö cöder")+wCrLf$+wChr$("Höw äre yöü?")
ConsoleColor cYellow
wPrint My$(3), wCrLf$
wPrint wRes$(803), wCrLf$
wPrint wRes$(1203), wCrLf$
wMsgBox 0, My$(3), "Hällo", MB_OK
Exit
end start
(Do not download the attachment, it is the image above disguised as ZIP)
Unicode is the easy part. If I need it in the data section, I do a simple BYTE conversion to get the exact characters without the need for remapping.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
include \masm32\macros\ucmacros.asm
.data?
value dd ?
.data
konnichiwa_jj_san \
db 83,48,147,48,107,48,97,48,143,48,195,48,106,0,85,48
db 147,48,2,48,0,0
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
invoke MessageBoxW,0,OFFSET konnichiwa_jj_san,uni$("Hi JJ"),MB_OK
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
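The BYTE conversion used for the db block in the listing above can be sketched in Python as follows. This is not the converter that produced the listing; to_db_lines is a hypothetical helper, and the 16 values per line is only inferred from the posted data. It writes each UTF-16 code unit low byte first and appends a 0,0 terminator.

```python
def to_db_lines(text: str, per_line: int = 16) -> list:
    """Emit UTF-16LE bytes of text as db lines, with a trailing 0,0 terminator."""
    data = text.encode("utf-16-le") + b"\x00\x00"
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("db " + ",".join(str(b) for b in chunk))
    return lines

# Code points read back from the db values in the listing (low byte first)
text = "\u3053\u3093\u306b\u3061\u308f\u30c3j\u3055\u3093\u3002"
for line in to_db_lines(text):
    print(line)
```

Fed the code points recovered from the listing, the sketch reproduces the same two db lines.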
The tool above does exactly what the output shows: it places any character within double quotes in a DW block, comma-separates them, replaces ASCII 9, 10 and 13 with an integer, and double-zero-terminates the result.
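For anyone who wants the same behaviour outside the tool, the rules just described can be sketched in a few lines of Python. This is an approximation, not the tool's source: to_dw_lines is a hypothetical name, the 24 items per line is inferred from the posted output, and like that output it does no escaping of the double quote character.

```python
def to_dw_lines(text: str, per_line: int = 24) -> list:
    """Quote each character, emit 9/10/13 as integers, end with 0,0."""
    items = []
    for ch in text:
        if ord(ch) in (9, 10, 13):
            items.append(str(ord(ch)))    # tab, LF, CR as plain integers
        else:
            items.append('"' + ch + '"')  # everything else goes between quotes
    items += ["0", "0"]                   # double zero terminator
    return ["dw " + ",".join(items[i:i + per_line])
            for i in range(0, len(items), per_line)]

for line in to_dw_lines("This is a test of ANSI to UNICODE."):
    print(line)
```

With the test sentence from the first post it yields the same two dw lines shown there.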