Odd DLL calling behavior

Merrick · June 03, 2005, 03:58:02 PM

This hasn't gotten much traction over in The Laboratory, so I thought I'd try it here...

This is really a VB question, but I thought I'd drop it here to see if anyone had any ideas...

I have large image data sets (typically 16 bits/pixel, 200x200-pixel images, 128 images in pairs of image and background data) that are usually 20MB raw (or multiples of 20MB).

Sadly, this data is gathered and saved by (evil, vile, contemptible) LabView, and is therefore Big Endian. To display an image the following operations are performed:

retrieve data into a 20MB array of integers (2 bytes)
convert Big Endian integers into Little Endian longs (4 bytes; required since the 16-bit values are unsigned and VB's Integer type is signed)
calculate contrast image values (real), defined as:
(image(x pixel, y pixel, image1) - image(x pixel, y pixel, background1)) / image(x pixel, y pixel, background1)
scale real values in each contrast image for display
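The last step, scaling, is a linear map from each image's contrast range onto the display's gray levels. A minimal C sketch, assuming an 8-bit grayscale display (0 to 255) and the per-image minimum and maximum found by the max/min routine; `scale_for_display` is a hypothetical name, not part of the original code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map contrast values in [min, max] linearly onto
   0..255 for an (assumed) 8-bit grayscale display. */
void scale_for_display(const double *contrast, uint8_t *out, size_t n,
                       double min, double max)
{
    double range = max - min;
    for (size_t i = 0; i < n; i++) {
        /* A flat image (max == min) has no contrast to show; map it to 0. */
        double t = (range > 0.0) ? (contrast[i] - min) / range : 0.0;
        out[i] = (uint8_t)(t * 255.0 + 0.5);   /* round to nearest level */
    }
}
```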

So, I:
called the API to read the file into an array
wrote an assembly routine to convert the ints to longs
wrote an assembly routine to calculate the contrast images
wrote an assembly routine to find the max and min values in each contrast image for scaling

The first assembly routine is performed on the whole array in one call since the array can be treated as linear at this point.
I got a huge headache trying to work out the multidimensional arrays, so the following two steps are performed one image at a time and are looped for all images in VB.
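Treating the data as one flat array mostly comes down to computing offsets. The sketch below assumes the layout implied by the comment in the contrast routine (each background image stored directly after its image, 40,000 pixels later) and, as a further assumption, that x varies fastest within an image; the function names are hypothetical:

```c
#include <stddef.h>

#define WIDTH  200
#define HEIGHT 200
#define PIXELS (WIDTH * HEIGHT)          /* 40,000 pixels per image */

/* Assumed layout: pair k = [image k][background k], each PIXELS long,
   with x varying fastest inside an image. */
size_t image_index(size_t pair, size_t x, size_t y)
{
    return pair * 2 * PIXELS + y * WIDTH + x;
}

size_t background_index(size_t pair, size_t x, size_t y)
{
    return image_index(pair, x, y) + PIXELS;   /* 40,000 elements later */
}
```

With 4-byte longs, the 40,000-element gap between the two indices is the 160,000 bytes hard-coded in `LongToContrast2`.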

All of this works perfectly... however...

When you get this all working, you'd naturally move it all into a subroutine, which I did. And here's the part I'm having a hard time with. It all works perfectly in the main routine. When I move it to a subroutine it crashes (literally). But on further examination, it turns out that I can move everything EXCEPT the max/min routine to the subroutine and everything works perfectly.

Any ideas on why this behaves this way?

Thanks,
Merrick

Here's the routine for converting Big Endian ints to Little Endian longs:


IntToLongArray proc arrayCount:dword, intInput:dword, longOutput:dword
	mov ecx, arrayCount
	test ecx, ecx
	je doneRotate
	nextRotate:
		mov edx, intInput
		mov ax, word ptr[edx+2*ecx-2]
		rol ax, 8
		cwde
		cmp eax, 0
		jge noAdd
		add eax, 65536
	noAdd:
		mov edx, longOutput
		mov dword ptr[edx+4*ecx-4], eax
		dec ecx
		test ecx, ecx
		je doneRotate
		jmp nextRotate
	doneRotate:
		ret
IntToLongArray    ENDP
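For comparison, a C sketch of what `IntToLongArray` computes (the function name and parameter order are my own). The `rol ax, 8` swaps the two bytes, and the `cwde` / `add eax, 65536` pair amounts to a zero extension, so each output is simply the byte-swapped word read as an unsigned value. The assembly walks the array from the end; since every element is independent, the order does not change the result:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-swap each 16-bit big-endian value (like rol ax, 8) and
   zero-extend it to 32 bits (what cwde + add 65536 achieves). */
void int_to_long_array(size_t count, const uint16_t *in, uint32_t *out)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t v = in[i];
        out[i] = (uint32_t)(uint16_t)((v << 8) | (v >> 8));
    }
}
```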

And the code for calculating the contrasts:


LongToContrast2 proc longArray:dword, doubleArray:dword
	finit
	mov ecx, 40000	;40,000 pixels per image
	nextPixel:
		mov eax, ecx					;load current pixel offset
		shl eax, 2						;4 bytes/pixel
		mov edx, longArray
		fild dword ptr[edx+eax-4]		;load image pixel
		add eax, 160000					;background pixel is always 160,000 bytes (40,000 pixels)
										;higher in the array than its corresponding image pixel
		fidiv dword ptr[edx+eax-4]		;divide by background pixel
		fld1							;subtract 1 --> shift to contrast frame
		fsub
		mov eax, ecx					;load current pixel offset
		shl eax, 3						;8 bytes/pixel
		mov edx, doubleArray
		fstp qword ptr[edx+eax-8]		;load contrast into array
		dec ecx							;decrement to next pixel
		test ecx, ecx
		je lastPixel					;if pixel 0 --> quit
		jmp nextPixel					;if pixel n --> repeat
	lastPixel:
		ret
LongToContrast2 ENDP
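The same per-image calculation in C, as a hedged sketch: `img` points at one image, the matching background is assumed to start 40,000 pixels later (as the comment in the assembly says), and `img / bg - 1` is algebraically the `(image - background) / background` definition given earlier. A zero background pixel yields an infinity or a NaN here, just as the masked x87 divide would:

```c
#include <stddef.h>
#include <stdint.h>

#define PIXELS 40000   /* 200 x 200 pixels per image, as in the assembly */

/* Sketch of LongToContrast2: the background starts PIXELS elements after
   the image (160,000 bytes for 4-byte longs).
   contrast = img / bg - 1, the same value as (img - bg) / bg. */
void long_to_contrast(const int32_t *img, double *contrast)
{
    const int32_t *bg = img + PIXELS;
    for (size_t p = 0; p < PIXELS; p++)
        contrast[p] = (double)img[p] / (double)bg[p] - 1.0;
}
```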

And the code for calculating the max/min values:


MaxMin proc doubleArray:dword, minValue:dword, maxValue:dword, dummy:dword
	finit
	mov edx, doubleArray
	fld qword ptr[edx]					;load first value into st[1] (min) and
	fld qword ptr[edx]					;st[2] (max) to initialize state
	mov ecx, 40000						;40,000 (200x200) pixels per image
	nextPixel:
		mov eax, ecx					;load current pixel offset
		shl eax, 3						;8 bytes/pixel
		fld qword ptr[edx+eax-8]		;load current pixel value into st[0]
		fcom st[1]						;compare st[0] to st[1] (min location)
		fstsw ax
		fwait
		sahf
		ja greaterPixel					;if st[0] > st[1] -> do second comparison
		mov eax, minValue
		fst qword ptr[eax]				;if st[0]<=st[1] store st[0] in minValue
		fxch st[1]							;place st[0] in st[1] (new minimum)
		jmp doneCompare
	greaterPixel:
		fcom st[2]						;compare st[0] to st[2] (max location)
		fstsw ax
		fwait
		sahf
		jb doneCompare					;if st[0] < st[2] -> done
		mov eax, maxValue
		fst qword ptr[eax]				;if st[0]>=st[2] store st[0] in maxValue
		fxch st[2]							;place st[0] in st[2] (new maximum)
	doneCompare:
		fstp st[0]						;pop old value off stack
		dec ecx							;decrement to next pixel
		test ecx, ecx
		je lastPixel					;if pixel 0 --> quit
		jmp nextPixel					;if pixel n --> repeat
	lastPixel:
	ret
MaxMin  ENDP
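Stripped of the FPU stack juggling, `MaxMin` is a single pass that tracks the running minimum and maximum. A C sketch of that logic, with a hypothetical signature that takes the pixel count explicitly and writes each result once, through a pointer, at the end:

```c
#include <stddef.h>

/* Single pass over one image; n must be at least 1.
   Results are written through the caller's pointers. */
void max_min(const double *values, size_t n, double *min_out, double *max_out)
{
    double lo = values[0], hi = values[0];   /* seed with the first pixel */
    for (size_t i = 1; i < n; i++) {
        if (values[i] < lo)
            lo = values[i];
        else if (values[i] > hi)   /* lo <= hi, so a new minimum can't be a new maximum */
            hi = values[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```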

Sorry, but I can't get the comments to line up right. The browser seems to understand and correctly align tab characters to the left of the code, but the tab characters between the code and the comments are ignored. Any helpful suggestions (about my problem, or the formatting)?
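On the formatting question: expanding tabs to spaces before posting keeps the alignment no matter how the page treats tab characters. A small C sketch of that conversion; `detab` is a hypothetical helper, and the tab width has to match whatever the editor used (4 and 8 are common):

```c
#include <stddef.h>

/* Expand tabs to spaces up to the next multiple of tab_width (must be > 0).
   Writes at most out_size - 1 chars plus a terminating NUL and returns the
   number of chars written. Columns restart after each newline. */
size_t detab(const char *in, char *out, size_t out_size, size_t tab_width)
{
    size_t col = 0, len = 0;
    if (out_size == 0) return 0;
    for (; *in != '\0'; in++) {
        if (*in == '\t') {
            size_t spaces = tab_width - (col % tab_width);
            while (spaces-- > 0 && len + 1 < out_size) {
                out[len++] = ' ';
                col++;
            }
        } else {
            if (len + 1 >= out_size) break;
            out[len++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[len] = '\0';
    return len;
}
```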

roticv · June 03, 2005, 05:16:23 PM

I thought all you need to convert between endiannesses is bswap?
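`bswap` reverses all four bytes of a 32-bit register, so on a zero-extended 16-bit value the swapped word lands in the upper half; one `shr eax, 16` brings it back down, already zero-extended, which also covers the unsigned-to-long step. A C model of that `movzx` / `bswap` / `shr` sequence, offered as a sketch with hypothetical function names:

```c
#include <stdint.h>

/* Model of the x86 bswap instruction: reverse the 4 bytes of a dword. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* movzx eax, word ptr [...] ; bswap eax ; shr eax, 16
   -> big-endian 16-bit value, byte-swapped and zero-extended. */
uint32_t be16_to_long(uint16_t raw)
{
    return bswap32(raw) >> 16;
}
```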

MichaelW · June 03, 2005, 06:35:55 PM

Hi Merrick,

You are trying to move all of the code into a single procedure? If it works as individual procedures why not leave it as individual procedures? Assuming each of the procedures is being called only once per array the call overhead should be less than negligible. And it looks to me like it's already hard enough to understand.

Merrick · June 06, 2005, 03:27:14 PM

Haven't tried bswap yet. I'll give it a try.
I tried xchg and the timing was identical. I'd planned to modify this so that I load one value into ax and another into bx and then do the swaps, taking advantage of pipelining to cut the time in half. But the whole thing actually loads a 20 MB file and performs all the math operations in about 1.2 seconds on a 2.6 GHz P4 laptop, and that routine is the fastest of the lot, so I was more concerned with working out the actual "bugs" first.
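A related way to handle two pixels per step, swapped in here rather than taken from the thread, is to load two 16-bit pixels with a single 32-bit read and swap the bytes inside each half with two masks. A C sketch, assuming a little-endian host and an even pixel count (the function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-swap each 16-bit half of a dword independently:
   bytes [b0 b1 b2 b3] -> [b1 b0 b3 b2]. */
uint32_t swap_halves(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
}

/* Convert an even number of big-endian 16-bit pixels, two per 32-bit load.
   Assumes a little-endian host, so pixel i sits in the low half. */
void be16_pairs_to_long(const uint8_t *raw, size_t pixels, uint32_t *out)
{
    for (size_t i = 0; i + 1 < pixels; i += 2) {
        uint32_t x;
        memcpy(&x, raw + 2 * i, sizeof x);   /* one 4-byte load */
        x = swap_halves(x);
        out[i]     = x & 0xFFFFu;            /* low half: pixel i */
        out[i + 1] = x >> 16;                /* high half: pixel i + 1 */
    }
}
```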

Why do I want to move it into one procedure? Neatness. In the main code it's the difference between:

fred=loadBinary(filename)

and

fred=loadBinary(filename)
for i = 1 to numberOfImages
call maxmin(fred, max(i),min(i))
next i

(sorry for the VB code)

Thanks for your suggestions.

Mirno · June 06, 2005, 05:22:47 PM

In the example you've posted above, you're not passing in pointers to minValue/maxValue, you're passing them directly.
You've also got a dummy variable in your assembly that isn't present in your VB code.

I don't know VB, so here's the C:


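/* Assumed to be defined elsewhere (the DLL routine and the file loader). */
void maxmin(double*, double*, double*, double*);
double *loadBinary(void);
extern int numberOfImages;

int main(void)
{
  double *fred;
  double my_min, my_max, dummy;

  fred = loadBinary();

  for (int i = 0; i < numberOfImages; i++)
  {
    maxmin(fred, &my_min, &my_max, &dummy); // <- & means pass the address of a variable
  }
  return 0;
}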

You need to declare the max & min values before you pass them in.
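The same point as a fuller C sketch: the caller owns the storage for every result, and each call receives the start of its own image plus the addresses of its own slots. It assumes the contrast images sit back to back, 40,000 doubles each, and `maxmin` here is a hypothetical C stand-in for the assembly routine, not the routine itself:

```c
#include <stddef.h>

#define PIXELS 40000   /* 200 x 200 */

/* Hypothetical stand-in for the assembly MaxMin: results go out
   through pointers supplied by the caller. */
static void maxmin(const double *img, double *min_out, double *max_out)
{
    double lo = img[0], hi = img[0];
    for (size_t p = 1; p < PIXELS; p++) {
        if (img[p] < lo) lo = img[p];
        if (img[p] > hi) hi = img[p];
    }
    *min_out = lo;
    *max_out = hi;
}

/* The caller declares mins/maxs; each call gets its own image and slots. */
void minmax_all(const double *contrast, size_t n_images,
                double *mins, double *maxs)
{
    for (size_t i = 0; i < n_images; i++)
        maxmin(contrast + i * PIXELS, &mins[i], &maxs[i]);
}
```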

A slightly optimised (I hope) version.


MaxMin proc doubleArray:dword, minValue:dword, maxValue:dword
  finit
  mov   edx, doubleArray

  fld   REAL8 ptr[edx]        ; load first value into st[1] (min) and
  fld   REAL8 ptr[edx]        ; st[2] (max) to initialize state
  mov   ecx, 200 * 200

nextPixel:
  mov   eax, ecx              ; load current pixel offset
  shl   eax, 3                ; 8 bytes/pixel
  fld   REAL8 ptr[edx+eax-8]  ; load current pixel value into st[0]
  fcomi st[0], st[1]          ; compare st[0] to st[1] (min location)
  jnc   greaterPixel
  fxch  st[1]                 ; place st[0] in st[1] (new minimum)
  jmp   doneCompare

greaterPixel:
  fcomi st[0], st[2]          ; compare st[0] to st[2] (max location)
  jc    doneCompare
  fxch  st[2]                 ; place st[0] in st[2] (new maximum)

doneCompare:
  fstp st[0]
  sub  ecx, 1                 ; decrement to next pixel
  jnz  nextPixel              ; if pixel != 0 --> carry on

lastPixel:
  mov  eax, minValue
  fstp REAL8 ptr [eax]

  mov  eax, maxValue
  fstp REAL8 ptr [eax]

  ret
MaxMin ENDP
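The control flow this loop is aiming for (the comment on the `sub ecx, 1` line says to carry on while the counter is non-zero) looks like this in C. It is a sketch with a hypothetical name; one difference is that NaNs are ignored here, whereas `fcomi` reports an unordered compare with CF set, so the assembly would treat a NaN as a new minimum:

```c
#include <stddef.h>

/* Counter runs from n down to 1, the element is at index counter - 1
   (like [edx+eax-8] with eax = 8*ecx), and the only loop branch is the
   test at the bottom. n must be at least 1. */
void maxmin_countdown(const double *a, size_t n, double *min_out, double *max_out)
{
    double lo = a[0], hi = a[0];
    size_t c = n;
    do {
        double v = a[c - 1];
        if (v < lo)          /* fcomi vs. st[1] sets CF: new minimum */
            lo = v;
        else if (v >= hi)    /* CF clear vs. st[2]: new (or equal) maximum */
            hi = v;
    } while (--c != 0);
    *min_out = lo;
    *max_out = hi;
}
```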

Mirno

Merrick · June 07, 2005, 06:51:27 PM

Mirno,

Thanks for the helpful suggestions. You mention declaring the variable after the C code example and before the assembly example. I'm guessing you mean declared in the VB code? It is. Also, the code I showed is a free-typed fragment to give a rough idea of what the code actually is. I'm pretty sure I'm passing by reference, but will check. Thanks for the suggestion. From memory, I'm doing exactly the same thing when I call the conversion from long to contrast (in fact, I was doing them in the same loop and it was crashing on the second call). But I'll go and double-check.
Thanks for your optimization suggestions!

Merrick
