
Optimized UNICODE Conversion

Started by bozo, November 03, 2007, 06:47:47 AM


bozo

I did a quick search through the forums for Unicode conversion routines and didn't find anything suitable for what I'm working on.
I'm wondering if anyone has written an optimized ANSI->UNICODE conversion routine that also counts the string's length.
Currently, the code is pretty simple:

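store_string:                        ; assumes ecx = 0, esi = source, edi = destination
   movzx eax,byte ptr[esi+ecx]       ; load one ANSI byte, zero-extended
   mov word ptr[edi+2*ecx],ax        ; store it as a 16-bit character
   test al,al                        ; was that the terminator?
   lea ecx,[ecx+1]                   ; advance the index without touching the flags
   jne store_string                  ; loop until the terminator has been copied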


So, has anyone got optimised code for this purpose, for example using SSE or MMX?
Thank you.

asmfan

For a simple 1-byte to 2-byte conversion you need just a few instructions:

pxor xmm0,xmm0          ; xmm0 = 0
punpcklbw xmm1,xmm0     ; interleave the low 8 bytes of xmm1 with zeros -> 8 words

This assumes xmm1 already holds 8 preloaded bytes in the low half of the 16-byte register; after punpcklbw it holds 8 zero-extended words. (All you need is an SSE2-capable CPU.)
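
The two instructions above can be extended into a complete routine. What follows is only a sketch under stated assumptions, not code from the thread: the name AnsiToWide is hypothetical, the input is taken to be 7-bit, and the source buffer is assumed to be readable up to 7 bytes past its terminator, because each load fetches 8 bytes at once. It converts 8 characters per iteration, falls back to a byte loop for the chunk that contains the terminator, and returns the character count in eax.

```asm
.686
.xmm
.model flat, stdcall
option casemap:none

.code

; AnsiToWide (hypothetical name): zero-extends a NUL-terminated 7-bit string
; to 16-bit characters. Returns the length in eax, excluding the terminator.
; Assumption: pSrc may be read up to 7 bytes beyond the terminator.
AnsiToWide proc uses esi edi pSrc:DWORD, pDst:DWORD
    mov     esi, pSrc
    mov     edi, pDst
    xor     ecx, ecx                    ; ecx = character index
    pxor    xmm0, xmm0                  ; xmm0 = 0 for compare and unpack
convert_chunk:
    movq    xmm1, qword ptr [esi+ecx]   ; load 8 bytes, upper half zeroed
    movdqa  xmm2, xmm1
    pcmpeqb xmm2, xmm0                  ; 0FFh in each byte that is zero
    pmovmskb eax, xmm2                  ; bit n set if byte n is zero
    test    al, al                      ; only the low 8 bits are real data
    jnz     convert_tail                ; terminator is inside this chunk
    punpcklbw xmm1, xmm0                ; 8 bytes -> 8 words
    movdqu  [edi+2*ecx], xmm1           ; unaligned 16-byte store
    add     ecx, 8
    jmp     convert_chunk
convert_tail:
    movzx   eax, byte ptr [esi+ecx]     ; finish the last chunk bytewise
    mov     word ptr [edi+2*ecx], ax
    test    al, al
    jz      convert_done                ; terminator has been stored
    inc     ecx
    jmp     convert_tail
convert_done:
    mov     eax, ecx                    ; length without the terminator
    ret
AnsiToWide endp

end
```

The full 16-byte store only happens for chunks without a terminator, so the destination needs no more than (length + 1) * 2 bytes. Aligning the source first would allow movdqa loads of 16 bytes at a time, at the cost of a longer prologue.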

bozo

Thanks asmfan, that's just what I need!
Nice avatar, btw :bg

zooba

Just keep in mind that there's more to Unicode conversion than simply widening each character. If you're dealing with 7-bit ASCII you're fine, but Windows ANSI code pages include 8-bit and multi-byte character sets, and their characters above 7Fh don't generally convert to UTF-16 by zero-extension.

The MultiByteToWideChar function will do exactly what you want including returning the length of the new string, though it may not have the performance required. Just remember that it is possible (and really quite easy) to generate a string that won't convert to UTF-16 simply by zero-extending each character.
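
As a rough illustration of the API route, here is a minimal sketch that assumes the MASM32 SDK include files are installed at their default paths; the buffer names and the sample string are made up for the example. Passing -1 as the input length tells MultiByteToWideChar that the source is NUL-terminated, and in that case the count returned in eax includes the terminator.

```asm
.686
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
szAnsi  db "hello", 0               ; sample input (hypothetical)

.data?
wszOut  dw 64 dup(?)                ; room for 64 UTF-16 code units

.code
start:
    ; CP_ACP = current ANSI code page, -1 = source is NUL-terminated
    invoke  MultiByteToWideChar, CP_ACP, 0, addr szAnsi, -1, addr wszOut, 64
    ; eax = UTF-16 code units written, terminator included; 0 on failure
    invoke  ExitProcess, eax
end start
```

Calling the function with a destination size of 0 returns the required buffer size without writing anything, which helps when the output length isn't known in advance.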

Cheers,

Zooba :U

bozo

hey zooba

Yes, this is something I was going to mention in the original message, but as it stands, only 7-bit ANSI characters are converted to Unicode.
Using the Windows API would seriously reduce the performance of the code.
I did find a thread on UTF-16 conversion, which will be useful should I want to add full support for all character sets.
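
Since the fast path is only valid for 7-bit input, a cheap guard could reject anything else before converting. This is a sketch, not code from the thread: the name IsAscii8 is hypothetical, and it assumes that 8 bytes at pSrc are readable. pmovmskb gathers the top bit of every byte, so a non-zero mask means at least one character is above 7Fh and needs a real code page conversion.

```asm
.686
.xmm
.model flat, stdcall
option casemap:none

.code

; IsAscii8 (hypothetical name): returns eax = 1 if none of the 8 bytes
; at pSrc has bit 7 set, otherwise eax = 0.
IsAscii8 proc pSrc:DWORD
    mov     edx, pSrc
    movq    xmm0, qword ptr [edx]   ; load 8 bytes, upper half zeroed
    pmovmskb eax, xmm0              ; bit n of eax = bit 7 of byte n
    test    eax, eax                ; bits 8..15 are 0 because of the movq
    setz    al                      ; al = 1 when no high bit was found
    movzx   eax, al
    ret
IsAscii8 endp

end
```

The same pmovmskb test can be folded into a conversion loop that already has the chunk in a register, so the check costs only a couple of extra instructions per 8 characters.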

BR