Converting C to MASM (newbie stuff)

Started by axtens, April 14, 2007, 03:18:23 AM

axtens

G'day everyone,

Well it seems I'm not as intelligent as I once thought. I really need help.

The code below is from the Unicode Consortium and converts UTF32 to UTF16. This is one of 6 similar routines. I figure if I can get help with the first, I should be able to wriggle through the rest without losing too much skin.

sourceStart is a pointer to a pointer to an array of unsigned 32 bit values.
sourceEnd is a pointer to the end of the same array.
targetStart is a pointer to a pointer to an array of unsigned 16 bit values.
targetEnd is a pointer to the end of the same array.
flags is an enum, strictConversion or lenientConversion.
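
To make the parameter layout above concrete, here is a rough C sketch of how a caller might drive a routine with this shape. copy_units is a hypothetical stand-in, not the Unicode Consortium code: it only truncates each value to 16 bits. It also assumes, judging from the loop condition in the real routine, that the two "end" pointers point one past the last element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same pointer layout as the converter.
   It copies each 32-bit value, truncated to 16 bits, until either
   buffer runs out. The "end" pointers are assumed to be one past the
   last element. */
void copy_units(const uint32_t **sourceStart, const uint32_t *sourceEnd,
                uint16_t **targetStart, uint16_t *targetEnd)
{
    const uint32_t *source = *sourceStart; /* local copy of the caller's cursor */
    uint16_t *target = *targetStart;

    while (source < sourceEnd && target < targetEnd) {
        *target++ = (uint16_t)*source++;
    }

    *sourceStart = source; /* tell the caller how far we read */
    *targetStart = target; /* tell the caller how far we wrote */
}
```

A caller would keep its own cursors, for example const uint32_t *src = in; and uint16_t *dst = out;, then pass &src and &dst. After the call, src - in is the number of values consumed and dst - out is the number written, which is presumably why the start pointers are passed by address.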

This stuff is from the header file and the main source:

typedef unsigned long UTF32; /* at least 32 bits */
typedef unsigned short UTF16; /* at least 16 bits */
typedef unsigned char UTF8; /* typically 8 bits */
typedef unsigned char Boolean; /* 0 or 1 */

/* Some fundamental constants */
#define UNI_REPLACEMENT_CHAR (UTF32)0x0000FFFD
#define UNI_MAX_BMP (UTF32)0x0000FFFF
#define UNI_MAX_UTF16 (UTF32)0x0010FFFF
#define UNI_MAX_UTF32 (UTF32)0x7FFFFFFF
#define UNI_MAX_LEGAL_UTF32 (UTF32)0x0010FFFF

typedef enum {
conversionOK, /* conversion successful */
sourceExhausted, /* partial character in source, but hit end */
targetExhausted, /* insuff. room in target for conversion */
sourceIllegal /* source sequence is illegal/malformed */
} ConversionResult;

typedef enum {
strictConversion = 0,
lenientConversion
} ConversionFlags;

static const int halfShift  = 10; /* used for shifting by 10 bits */

static const UTF32 halfBase = 0x0010000UL;
static const UTF32 halfMask = 0x3FFUL;

#define UNI_SUR_HIGH_START  (UTF32)0xD800
#define UNI_SUR_HIGH_END    (UTF32)0xDBFF
#define UNI_SUR_LOW_START   (UTF32)0xDC00
#define UNI_SUR_LOW_END     (UTF32)0xDFFF
#define false    0
#define true     1

My biggest hassle at the moment is how to handle the incoming pointer to pointer to array parameters.

Any ideas?

Kind regards,
Bruce.

actual routine

ConversionResult ConvertUTF32toUTF16 (
        const UTF32** sourceStart, const UTF32* sourceEnd,
        UTF16** targetStart, UTF16* targetEnd, ConversionFlags flags) {
    ConversionResult result = conversionOK;
    const UTF32* source = *sourceStart;
    UTF16* target = *targetStart;
    while (source < sourceEnd) {
        UTF32 ch;
        if (target >= targetEnd) {
            result = targetExhausted; break;
        }
        ch = *source++;
        if (ch <= UNI_MAX_BMP) { /* Target is a character <= 0xFFFF */
            /* UTF-16 surrogate values are illegal in UTF-32; 0xffff or 0xfffe are both reserved values */
            if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END) {
                if (flags == strictConversion) {
                    --source; /* return to the illegal value itself */
                    result = sourceIllegal;
                    break;
                } else {
                    *target++ = UNI_REPLACEMENT_CHAR;
                }
            } else {
                *target++ = (UTF16)ch; /* normal case */
            }
        } else if (ch > UNI_MAX_LEGAL_UTF32) {
            if (flags == strictConversion) {
                result = sourceIllegal;
            } else {
                *target++ = UNI_REPLACEMENT_CHAR;
            }
        } else {
            /* target is a character in range 0xFFFF - 0x10FFFF. */
            if (target + 1 >= targetEnd) {
                --source; /* Back up source pointer! */
                result = targetExhausted; break;
            }
            ch -= halfBase;
            *target++ = (UTF16)((ch >> halfShift) + UNI_SUR_HIGH_START);
            *target++ = (UTF16)((ch & halfMask) + UNI_SUR_LOW_START);
        }
    }
    *sourceStart = source;
    *targetStart = target;
    return result;
}
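
The surrogate arithmetic in the last branch of the routine can be tried in isolation. The sketch below pulls out just those three lines. encode_surrogate_pair is a hypothetical name, and the function assumes the caller has already checked that ch lies between 0x10000 and 0x10FFFF.

```c
#include <stdint.h>

/* Hypothetical helper: split a code point above 0xFFFF into a UTF-16
   surrogate pair. Assumes 0x10000 <= ch <= 0x10FFFF; no range check
   is done here. */
void encode_surrogate_pair(uint32_t ch, uint16_t *high, uint16_t *low)
{
    ch -= 0x10000UL;                           /* halfBase: leaves a 20-bit value   */
    *high = (uint16_t)((ch >> 10) + 0xD800);   /* top 10 bits + UNI_SUR_HIGH_START  */
    *low  = (uint16_t)((ch & 0x3FF) + 0xDC00); /* low 10 bits + UNI_SUR_LOW_START   */
}
```

For example, U+1F600 minus 0x10000 is 0xF600. The top 10 bits are 0x3D and the low 10 bits are 0x200, so the pair comes out as 0xD83D, 0xDE00.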

Tedd

UTF32** sourceStart
mov eax,sourceStart    ; sourceStart
mov ecx,[eax]          ; *sourceStart
mov eax,[ecx]          ; **sourceStart


You could do it all through eax, but with the array pointer kept in ecx, to access the next array element all you need to do is:

add ecx,4
mov eax,[ecx]


...and so on for each element of the array.
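
For comparison with the C source, the same steps can be written out one dereference at a time. This is only a sketch: read_and_advance is a hypothetical helper, and the register names in the comments assume the instructions shown above, with 32-bit array elements.

```c
#include <stdint.h>

/* Hypothetical helper: read the element that *sourceStart points at,
   then move the caller's pointer on by one element. */
uint32_t read_and_advance(const uint32_t **sourceStart)
{
    const uint32_t *p = *sourceStart; /* mov ecx,[eax] : load the array pointer    */
    uint32_t value = *p;              /* mov eax,[ecx] : load the element itself   */
    p = p + 1;                        /* add ecx,4     : step one 32-bit element   */
    *sourceStart = p;                 /* store the new pointer back for the caller */
    return value;
}
```

Note that p + 1 in C moves the address by 4 bytes here, because the compiler scales by the element size. In assembly you do that scaling yourself, hence the add ecx,4 (it would be add ecx,2 for the 16 bit target array).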
No snowflake in an avalanche feels responsible.

axtens

That simple??!! Wow. So much for that panic-fest.

Thanks.

Bruce.

Tedd

It's okay. Pointers seem to cause people so many problems, but when you get down to the low level they're pretty simple ;)