Odd DLL calling behavior

Started by Merrick, June 03, 2005, 03:58:02 PM

Merrick

This hasn't gotten much traction over in The Laboratory, so I thought I'd try it here...

This is really a VB question, but I thought I'd drop it here to see if anyone had any ideas...

I have large image data sets (typically 16 bits/pixel, 200x200 pixel images, 128 images in pairs of image and background data) that are usually 20MB raw (or multiples of 20MB).

Sadly, this data is gathered and saved by (evil, vile, contemptible) LabView, and is therefore Big Endian. To display an image the following operations are performed:

retrieve data into a 20MB array of integers (2 bytes each)
convert Big Endian integers into Little Endian longs (4 bytes, required since the 16 bit integers are unsigned)
calculate contrast image values (real), defined as:
    (image(x pixel, y pixel, image1) - image(x pixel, y pixel, background1)) / image(x pixel, y pixel, background1)
scale real values in each contrast image for display
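
As a cross-check for the assembly, the conversion and contrast steps listed above can be sketched in plain C. This is only a minimal sketch under assumptions: the function names `be16_to_u32` and `contrast` are hypothetical, the raw file is assumed to be available as a byte buffer, and a background pixel of 0 is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read each big-endian 16-bit sample byte by byte and zero-extend it
   into a 32-bit value, so that 0..65535 stays non-negative. */
void be16_to_u32(const uint8_t *raw, uint32_t *out, size_t count)
{
    assert(count == 0 || (raw != NULL && out != NULL));
    for (size_t i = 0; i < count; i++) {
        out[i] = ((uint32_t)raw[2 * i] << 8) | raw[2 * i + 1];
    }
}

/* Contrast for one image: (image - background) / background per pixel.
   Assumption: no background pixel is 0 (division by zero is not checked). */
void contrast(const uint32_t *image, const uint32_t *background,
              double *out, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        out[i] = ((double)image[i] - (double)background[i])
                 / (double)background[i];
    }
}
```

Because the high byte is shifted into place explicitly, the result does not depend on the byte order of the machine running the code, and no sign correction is needed afterwards.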

So, I:
call the API to read the file into an array
wrote an assembly routine to convert the ints to longs
wrote an assembly routine to calculate the contrast images
wrote an assembly routine to find the max and min values in each contrast image for scaling

The first assembly routine is performed on the whole array in one call since the array can be treated as linear at this point.
I got a huge headache trying to work out the multidimensional arrays, so the following two steps are performed one image at a time and are looped for all images in VB.

All of this works perfectly... however...

When you get this all working, you'd naturally move it all into a subroutine, which I did. And here's the part I'm having a hard time with. It all works perfectly in the main routine. When I move it to a subroutine it crashes (literally). But upon further examination, it turns out that I can move everything EXCEPT the max/min routine to the subroutine and everything works perfectly.

Any ideas on why this behaves this way?

Thanks,
Merrick

Here's the routine for converting Big Endian ints to Little Endian longs:


IntToLongArray proc arrayCount:dword, intInput:dword, longOutput:dword
mov ecx, arrayCount
test ecx, ecx
je doneRotate
nextRotate:
mov edx, intInput
mov ax, word ptr[edx+2*ecx-2]
rol ax, 8
cwde
cmp eax, 0
jge noAdd
add eax, 65536
noAdd:
mov edx, longOutput
mov dword ptr[edx+4*ecx-4], eax
dec ecx
test ecx, ecx
je doneRotate
jmp nextRotate
doneRotate:
ret
IntToLongArray    ENDP


And the code for calculating the contrasts:


LongToContrast2 proc longArray:dword, doubleArray:dword
finit
mov ecx, 40000 ;40,000 pixels per image
nextPixel:
mov eax, ecx ;load current pixel offset
shl eax, 2 ;4 bytes/pixel
mov edx, longArray
fild dword ptr[edx+eax-4] ;load image pixel
add eax, 160000 ;background pixel is always 160,000 bytes (40,000 pixels)
;higher in the array than its corresponding image pixel
fidiv dword ptr[edx+eax-4] ;divide by background pixel
fld1 ;subtract 1 --> shift to contrast frame
fsub
mov eax, ecx ;load current pixel offset
shl eax, 3 ;8 bytes/pixel
mov edx, doubleArray
fstp qword ptr[edx+eax-8] ;load contrast into array
dec ecx ;decrement to next pixel
test ecx, ecx
je lastPixel ;if pixel 0 --> quit
jmp nextPixel ;if pixel n --> repeat
lastPixel:
ret
LongToContrast2 ENDP



And the code for calculating the max/min values:


MaxMin proc doubleArray:dword, minValue:dword, maxValue:dword, dummy:dword
finit
mov edx, doubleArray
fld qword ptr[edx] ;load first value into st[1] (min) and
fld qword ptr[edx] ;st[2] (max) to initialize state
mov ecx, 40000 ;40,000 (200x200) pixels per image
nextPixel:
mov eax, ecx ;load current pixel offset
shl eax, 3 ;8 bytes/pixel
fld qword ptr[edx+eax-8] ;load current pixel value into st[0]
fcom st[1] ;compare st[0] to st[1] (min location)
fstsw ax
fwait
sahf
ja greaterPixel ;if st[0] > st[1] -> do second comparison
mov eax, minValue
fst qword ptr[eax] ;if st[0]<=st[1] store st[0] in minValue
fxch st[1] ;place st[0] in st[1] (new minimum)
jmp doneCompare
greaterPixel:
fcom st[2] ;compare st[0] to st[2] (max location)
fstsw ax
fwait
sahf
jb doneCompare ;if st[0] < st[2] -> done
mov eax, maxValue
fst qword ptr[eax] ;if st[0]>=st[2] store st[0] in maxValue
fxch st[2] ;place st[0] in st[2] (new maximum)
doneCompare:
mov eax, dummy
fstp qword ptr[dummy] ;pop old value off stack
dec ecx ;decrement to next pixel
test ecx, ecx
je lastPixel ;if pixel 0 --> quit
jmp nextPixel ;if pixel n --> repeat
lastPixel:
ret
MaxMin  ENDP


Sorry, but I can't get the comments to line up right. The browser seems to understand and correctly align tab characters to the left of the code, but the tab characters between the code and the comments are ignored. Any helpful suggestions (about my problem, or the formatting)?

roticv

I thought to convert between the Endians, all you need is bswap?
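
One thing to keep in mind with bswap is that it reverses all four bytes of a 32-bit register, so a 16-bit sample sitting in the low half ends up in the high half and needs a shift afterwards. A minimal C sketch of that behaviour, assuming hypothetical helper names `bswap32` and `swap16_via_bswap` (the first is a portable stand-in for the instruction, not a library call):

```c
#include <stdint.h>

/* Portable equivalent of what BSWAP does to a 32-bit register:
   byte 0 <-> byte 3 and byte 1 <-> byte 2. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* A 16-bit sample zero-extended into 32 bits: after the full swap the
   two interesting bytes are in the top half, so shift them back down. */
uint32_t swap16_via_bswap(uint16_t sample)
{
    return bswap32(sample) >> 16;
}
```

The logical shift leaves the upper 16 bits zero, so the value comes out as an unsigned 0..65535 quantity without a separate correction step.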

MichaelW

Hi Merrick,

You are trying to move all of the code into a single procedure? If it works as individual procedures why not leave it as individual procedures? Assuming each of the procedures is being called only once per array the call overhead should be less than negligible. And it looks to me like it's already hard enough to understand  :bg
eschew obfuscation

Merrick

Haven't tried bswap. I'll give it a try.
I tried xchg and the timing was identical. I'd planned to modify this so that I load one value into ax and another into bx and then do the swap, to take advantage of pipelining and cut the time in half. But the whole thing actually loads a 20 MB file and performs all math operations in about 1.2 seconds on a 2.6 GHz P4 laptop, and that routine is the fastest of the lot, so I was more concerned about working out the actual "bugs" first.

Why do I want to move it into one procedure? Neatness. In the main code it's the difference between:

fred=loadBinary(filename)

and

fred=loadBinary(filename)
for i = 1 to numberOfImages
    call maxmin(fred, max(i),min(i))
next i

(sorry for the VB code)

Thanks for your suggestions.

Mirno

In the example you've posted below, you're not passing in pointers to minValue/maxValue, you're passing them directly.
You've also got a dummy variable in your assembly that isn't present in your VB code.

I don't know VB, so here's the C:

void maxmin(double*, double*, double*, double*);

int main()
{
  double *fred;
  double my_min, my_max, dummy;

  fred = loadBinary();

  for (int i = 0; i < numberOfImages; i++)
  {
    maxmin(fred, &my_min, &my_max, &dummy); // <- & means pass the address of a variable
  }
}


You need to declare the max & min values before you pass them in.

A slightly optimised (I hope) version.

MaxMin proc doubleArray:dword, minValue:dword, maxValue:dword
  finit
  mov   edx, doubleArray

  fld   REAL8 ptr[edx]        ; load first value into st[1] (min) and
  fld   REAL8 ptr[edx]        ; st[2] (max) to initialize state
  mov   ecx, 200 * 200

nextPixel:
  mov   eax, ecx              ; load current pixel offset
  shl   eax, 3                ; 8 bytes/pixel
  fld   REAL8 ptr[edx+eax-8]  ; load current pixel value into st[0]
  fcomi st[0], st[1]          ; compare st[0] to st[1] (min location)
  jnc   greaterPixel
  fxch  st[1]                 ; place st[0] in st[1] (new minimum)
  jmp   doneCompare

greaterPixel:
  fcomi st[0], st[2]          ; compare st[0] to st[2] (max location)
  jc    doneCompare
  fxch  st[2]                 ; place st[0] in st[2] (new maximum)

doneCompare:
  fstp st[0]
  sub  ecx, 1                 ; decrement to next pixel
  jnz  nextPixel              ; if pixel != 0 --> carry on

lastPixel:
  mov  eax, minValue
  fstp REAL8 ptr [eax]

  mov  eax, maxValue
  fstp REAL8 ptr [eax]

  ret
MaxMin ENDP
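
For checking the assembly output against a known-good result, the same scan can be written in plain C. This is a minimal sketch rather than a drop-in replacement: the name `max_min_ref` and the explicit `count` parameter are assumptions made for illustration (the assembly hard-codes 200 * 200), and the results are written through caller-supplied pointers as in the prototype above.

```c
#include <assert.h>
#include <stddef.h>

/* Scan `count` doubles and write the smallest and largest values through
   the caller-supplied pointers. `count` must be at least 1. */
void max_min_ref(const double *values, size_t count,
                 double *min_out, double *max_out)
{
    assert(values != NULL && count > 0);
    assert(min_out != NULL && max_out != NULL);

    double lo = values[0];
    double hi = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] < lo) {
            lo = values[i];          /* new minimum */
        } else if (values[i] > hi) {
            hi = values[i];          /* new maximum */
        }
    }
    *min_out = lo;
    *max_out = hi;
}
```

Running both versions over the same contrast image and comparing the two pairs of results is a quick way to confirm that the FPU stack handling in the assembly is doing what the comments say.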


Mirno

Merrick

Mirno,

Thanks for the helpful suggestions. You mention declaring the variable after the C code example and before the assembly example. I'm guessing you mean declared in the VB code? It is. Also, the code I showed is a free-typed fragment to give a rough idea of what the code actually is. I'm pretty sure I'm passing by reference, but will check. Thanks for the suggestion. From memory, I'm doing exactly the same thing when I call the conversion from long to contrast (in fact, I was doing them in the same loop and it was crashing on the second call). But I'll go double check.
Thanks for your optimization suggestions!

Merrick