The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: bozo on November 03, 2007, 06:47:47 AM

Title: Optimized UNICODE Conversion
Post by: bozo on November 03, 2007, 06:47:47 AM
i did a quick search through the forums for unicode conversion routines, and didn't find anything suitable for what i'm working on.
i'm wondering if anyone has written an optimized ANSI->UNICODE conversion routine that also counts the length of the string.
currently, the code is pretty simple:

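   xor ecx,ecx                  ; assumed setup: character index starts at 0
store_string:
   movzx eax,byte ptr[esi+ecx]  ; load one ANSI byte, zero-extended
   mov word ptr[edi+2*ecx],ax   ; store it as a 16-bit character
   test al,al                   ; was that the terminating zero?
   lea ecx,[ecx+1]              ; advance the index without changing the flags
   jne store_string             ; loop until the terminator has been stored
   ; on exit ecx = length + 1 (the terminator is counted)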


so, has anyone got optimised code for this purpose, e.g. using SSE or MMX?
thank you
Title: Re: Optimized UNICODE Conversion
Post by: asmfan on November 03, 2007, 10:55:11 AM
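For a simple conversion from 1 to 2 bytes you need just a few instructions:

pxor xmm0,xmm0        ; xmm0 = all zeros
punpcklbw xmm1,xmm0   ; interleave the low 8 bytes of xmm1 with zero bytes

This assumes xmm1 already holds 8 preloaded bytes in the low half of the 16-byte register; after the unpack it holds 8 zero-extended words. (All you need is an SSE2 CPU.)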
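
To show how the two instructions above might fit into a complete routine that also returns the length, here is a minimal sketch in C using SSE2 intrinsics, each of which corresponds to one of the instructions named in the comments. The function name widen_ascii is hypothetical, and the sketch makes some simplifying assumptions: the input is 7-bit ASCII, the length is found with strlen first (so nothing is read past the terminator), and a scalar loop handles the last few characters. It falls back to the scalar loop alone when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical helper: widens a NUL-terminated 7-bit ASCII string to 16-bit
   units and returns the character count, excluding the terminator.
   dst must have room for strlen(src) + 1 elements. */
static size_t widen_ascii(uint16_t *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);   /* first pass: find the length */
    size_t i = 0;

#ifdef HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();             /* pxor xmm0,xmm0 */
    for (; i + 8 <= len; i += 8) {
        /* movq: load 8 source bytes into the low half of the register */
        __m128i bytes = _mm_loadl_epi64((const __m128i *)(src + i));
        /* punpcklbw: interleave with zeros, giving 8 zero-extended words */
        __m128i wide = _mm_unpacklo_epi8(bytes, zero);
        /* movdqu: store 16 bytes, no alignment required */
        _mm_storeu_si128((__m128i *)(dst + i), wide);
    }
#endif

    /* Scalar tail: remaining characters plus the terminator. */
    for (; i <= len; i++)
        dst[i] = (unsigned char)src[i];

    return len;
}
```

Calling widen_ascii(buf, "abc") would fill buf with 'a', 'b', 'c', 0 and return 3. A single-pass version that detects the terminator inside the SIMD loop (for example with pcmpeqb and pmovmskb on aligned loads) is possible but needs more care around page boundaries, so it is left out of this sketch.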
Title: Re: Optimized UNICODE Conversion
Post by: bozo on November 03, 2007, 11:07:22 PM
thanks asmfan, thats just what i need!
nice avatar btw :bg
Title: Re: Optimized UNICODE Conversion
Post by: zooba on November 04, 2007, 12:41:29 AM
Just keep in mind that there's more to Unicode conversions than simply extending the size of a character. If you're dealing with 7-bit ANSI you're probably fine, but Windows actually uses multi-byte character sets, which don't convert to UTF-16 simply by extending the character.

The MultiByteToWideChar (http://msdn2.microsoft.com/en-us/library/ms776413.aspx) function will do exactly what you want including returning the length of the new string, though it may not have the performance required. Just remember that it is possible (and really quite easy) to generate a string that won't convert to UTF-16 simply by zero-extending each character.

Cheers,

Zooba :U
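
As a small illustration of the caveat above, the following C sketch shows one way to check whether a string is safe to convert by zero-extension, together with a few Windows-1252 bytes whose Unicode code points differ from the byte value. The names is_7bit_ascii and cp1252_samples are hypothetical and only for illustration; for strings that fail the check, a code-page-aware routine such as MultiByteToWideChar would be needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if every byte before the terminator is
   below 0x80 (so zero-extension gives the right UTF-16 unit), else 0. */
static int is_7bit_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s; s++) {
        if ((unsigned char)*s & 0x80)
            return 0;
    }
    return 1;
}

/* A few Windows-1252 bytes for which zero-extension gives the wrong result. */
struct cp1252_sample {
    uint8_t  byte;        /* value in the 8-bit string */
    uint16_t code_point;  /* correct UTF-16 code unit  */
};

static const struct cp1252_sample cp1252_samples[] = {
    { 0x80, 0x20AC },  /* EURO SIGN           */
    { 0x85, 0x2026 },  /* HORIZONTAL ELLIPSIS */
    { 0x99, 0x2122 },  /* TRADE MARK SIGN     */
};
```

With this check, a fast zero-extending loop could be used when is_7bit_ascii returns 1, falling back to the API only for strings that contain bytes of 0x80 or above.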
Title: Re: Optimized UNICODE Conversion
Post by: bozo on November 05, 2007, 07:03:43 AM
hey zooba

yes, this is something i was going to mention in the original message; as it stands, only 7-bit ANSI characters are converted to UNICODE.
using the windows API would seriously reduce the performance of the code.
i did find a thread on UTF-16 conversion, which will be useful should i want to add full support for all character sets.

BR