loop problem that I cannot find!

Started by a_h, September 07, 2005, 04:18:08 PM


a_h

Hi folks!

I'm currently trying to replace some inline assembly with external MASM files in order to, among other things, remove unnecessary overhead such as


for (int comp=0; comp<count; comp++)
{
    Block_Ptr = block[comp];

    __asm
    {
        mov    eax, [Block_Ptr];
        pxor   xmm0, xmm0;
        movdqa [eax+0  ], xmm0;
        movdqa [eax+16 ], xmm0;
        movdqa [eax+32 ], xmm0;
        movdqa [eax+48 ], xmm0;
        movdqa [eax+64 ], xmm0;
        movdqa [eax+80 ], xmm0;
        movdqa [eax+96 ], xmm0;
        movdqa [eax+112], xmm0;
    }
}


My fastcall replacement looks like this (int count is in ecx, block is defined as short *block[8], and its base address &block[0] is in edx):


    sub    ecx, 1
    pxor   xmm0, xmm0;
    .repeat
        mov    eax, [edx+4*ecx]
        movdqa [eax+0  ], xmm0;
        movdqa [eax+16 ], xmm0;
        movdqa [eax+32 ], xmm0;
        movdqa [eax+48 ], xmm0;
        movdqa [eax+64 ], xmm0;
        movdqa [eax+80 ], xmm0;
        movdqa [eax+96 ], xmm0;
        movdqa [eax+112], xmm0;
    .untilcxz

The problem is that my replacement does not do the same as the for loop. Of course it counts down instead of up, but that cannot be the problem, can it? The for loop never reaches count, so I subtract 1 from it (first line) and count down from count-1 to 0 inclusive. The for loop does the same (just ascending) but yields the correct frame, which mine does not.

Changing 4*ecx into 2*ecx (it's a short pointer) does not change anything.

How can I access the elements of short *block[8]?

Unfortunately I'm not able to debug that DLL, so I hope somebody can help me out! Thanks a lot!

Cheers, Hannes

a_h

Hey, that's my assembly present for today... just use WORD PTR [edx+4*ecx] to indicate that a short sits behind the address!

It works with my test procedure, so let's go for the real thing!

Cheers, Hannes

a_h

Err, no. Doesn't work...

Maybe somebody has a clue?

Thanks! Cheers, Hannes

tenkey

Is that a typo, or did you really put the subtraction in front of the loop instead of within it?
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

raymond

If your block contains WORDS which you want to load into EAX using a counter as a displacement into the block, try the following:

movzx eax,word ptr[edx+ecx*2]

Raymond
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

a_h

Thanks for your tips!

@tenkey: that's intentional. The counter passed in is 1 too large, and .repeat/.untilcxz decrements the ecx register by itself.
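To make the loop behaviour concrete, here is a minimal C++ model. It assumes that .repeat/.untilcxz assembles to the x86 LOOP instruction (body first, then decrement ECX, then jump back while ECX is non-zero). The function name ecx_values_seen and the max_iterations cap are made up for this sketch and are not part of the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Models a .repeat/.untilcxz loop, assuming it is emitted as LOOP:
// the body runs, ECX is decremented, and the jump back is taken only
// while ECX is still non-zero.
// max_iterations is a hypothetical safety cap for this model only. On real
// hardware a start value of 0 wraps to 0xFFFFFFFF and keeps looping.
std::vector<std::uint32_t> ecx_values_seen(std::uint32_t start,
                                           std::size_t max_iterations = 16) {
    std::vector<std::uint32_t> seen;
    std::uint32_t ecx = start;
    do {
        seen.push_back(ecx);  // value of ECX that the loop body works with
        --ecx;                // LOOP decrements ECX first ...
    } while (ecx != 0 && seen.size() < max_iterations);  // ... then tests it
    return seen;
}
```

Calling ecx_values_seen(7) for count = 8 returns the values 7 down to 1, so the body never runs with ECX equal to 0 and block[0] is skipped. Starting at 0 does not give a single pass either, because ECX wraps around after the first decrement.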

@raymond: the pointers themselves take 4 bytes of storage each, right? Only the variable behind each pointer is a short, so I have to use 4*ecx?
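The scale-factor question can be checked from the C++ side. This is a small sketch with made-up function names that measures the byte distance between neighbouring elements. On the 32-bit x86 target discussed here a short * is 4 bytes, which matches the 4*ecx scale; on a 64-bit build the first function would report 8 instead.

```cpp
#include <cstddef>

// Byte distance between two neighbouring slots of an array of pointers.
// This is the factor the index has to be scaled by when walking block[].
std::size_t pointer_array_stride() {
    short *block[8] = {};
    const char *first  = reinterpret_cast<const char *>(&block[0]);
    const char *second = reinterpret_cast<const char *>(&block[1]);
    return static_cast<std::size_t>(second - first);
}

// Byte distance between two neighbouring shorts inside one block.
// This factor only matters after the pointer has been loaded.
std::size_t short_array_stride() {
    short data[64] = {};
    const char *first  = reinterpret_cast<const char *>(&data[0]);
    const char *second = reinterpret_cast<const char *>(&data[1]);
    return static_cast<std::size_t>(second - first);
}
```

The element type behind the pointer (short) has no influence on how far apart the pointers themselves are stored.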

However, the main problem is this: if I pass block[comp] (doing the loop in C++ and calling the asm procedure for every array element) and remove the repeat loop, everything works.

Only when I want to do the loop within the asm procedure, so that I have to pass the base address &block[0] (in edx), nothing works. I can't even access the first element correctly (I set the counter to 0, so I should get the first element only):

    mov    ecx, 0
    .repeat
        movzx  eax, WORD PTR [edx+4*ecx]
        movdqa [eax+0  ], xmm0;
        movdqa [eax+16 ], xmm0;
        movdqa [eax+32 ], xmm0;
        movdqa [eax+48 ], xmm0;
        movdqa [eax+64 ], xmm0;
        movdqa [eax+80 ], xmm0;
        movdqa [eax+96 ], xmm0;
        movdqa [eax+112], xmm0;
    .untilcxz


I have to dereference with [] once more, since the procedure is procedure(..., short **block) instead of procedure(..., short *block). Doing the same load as a DWORD doesn't help. However, this should be equivalent to


       procedure(...,block[0])

with

    pxor   xmm0, xmm0;
    movdqa [edx+0  ], xmm0;
    movdqa [edx+16 ], xmm0;
    movdqa [edx+32 ], xmm0;
    movdqa [edx+48 ], xmm0;
    movdqa [edx+64 ], xmm0;
    movdqa [edx+80 ], xmm0;
    movdqa [edx+96 ], xmm0;
    movdqa [edx+112], xmm0;


but it isn't.

Makes me crazy!

Thanks for your help! Hannes

a_h

Phew, that one nearly drove me crazy.

The solution is so simple that it's a shame it took so long.


    .repeat
        mov    eax, DWORD PTR [edx+4*ecx-4]
        movdqa [eax+0  ], xmm0;
        movdqa [eax+16 ], xmm0;
        movdqa [eax+32 ], xmm0;
        ...


@tenkey: yeah, the repeat loop did have a problem... it never ran with ecx = 0, so -4 has to be added to the address (and the sub ecx, 1 left out).

@raymond: don't ask me why, but the code above works. WORD PTR and/or 2*ecx just gives garbage (maybe because the pointers themselves are 4 bytes, and only the data at [eax+...] is of type short).
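A likely reason for the garbage can be shown with plain integers. This sketch assumes 32-bit pointers stored little-endian, as on x86, and uses an arbitrary example address. The function names are invented for illustration.

```cpp
#include <cstdint>

// Models "movzx eax, WORD PTR [mem]" on a slot that holds a 32-bit pointer:
// only the low 16 bits are loaded and the upper half of EAX is zeroed.
std::uint32_t load_word_zero_extended(std::uint32_t stored_pointer) {
    return static_cast<std::uint16_t>(stored_pointer);
}

// Models "mov eax, DWORD PTR [mem]": all 32 bits of the pointer arrive intact.
std::uint32_t load_dword(std::uint32_t stored_pointer) {
    return stored_pointer;
}
```

For a stored pointer of 0x0012FF40, the WORD load yields 0x0000FF40, which is a different address, while the DWORD load returns 0x0012FF40 unchanged. The operand size has to match the size of the pointer being fetched, not the size of the data it points to.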

Thanks for your help! Cheers and have a nice evening, Hannes
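For comparison, the working loop can be written out in C++ roughly as follows. This is a sketch, not the original procedure: clear_blocks is a made-up name, and std::memset stands in for the eight 16-byte movdqa stores (so, unlike movdqa, it has no 16-byte alignment requirement).

```cpp
#include <cstddef>
#include <cstring>

// Mirrors the final assembly loop: ECX starts at count and each pass
// clears block[ecx - 1], so indices count-1 down to 0 are all visited.
void clear_blocks(short **block, std::size_t count) {
    for (std::size_t ecx = count; ecx != 0; --ecx) {
        short *eax = block[ecx - 1];  // mov eax, DWORD PTR [edx+4*ecx-4]
        std::memset(eax, 0, 128);     // 8 stores x 16 bytes = 128 bytes (64 shorts)
    }
}
```

One difference is worth noting: this C++ loop does nothing for count = 0, whereas a LOOP-based assembly loop entered with ECX = 0 would still run its body and wrap around, so the assembly version relies on count being at least 1.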