Hi folks!
Currently I'm trying to replace some inline assembly with external MASM files in order to, among other things, get rid of unnecessary overhead like this:
for (int comp=0; comp<count; comp++)
{
Block_Ptr = block[comp];
__asm
{
mov eax, [Block_Ptr];
pxor xmm0, xmm0;
movdqa [eax+0 ], xmm0;
movdqa [eax+16 ], xmm0;
movdqa [eax+32 ], xmm0;
movdqa [eax+48 ], xmm0;
movdqa [eax+64 ], xmm0;
movdqa [eax+80 ], xmm0;
movdqa [eax+96 ], xmm0;
movdqa [eax+112], xmm0;
}
}
My fastcall replacement looks like this (int count is in ecx; block is declared as short *block[8] and its base address &block[0] is in edx):
sub ecx, 1
pxor xmm0, xmm0;
.repeat
mov eax, [edx+4*ecx]
movdqa [eax+0 ], xmm0;
movdqa [eax+16 ], xmm0;
movdqa [eax+32 ], xmm0;
movdqa [eax+48 ], xmm0;
movdqa [eax+64 ], xmm0;
movdqa [eax+80 ], xmm0;
movdqa [eax+96 ], xmm0;
movdqa [eax+112], xmm0;
.untilcxz
But my replacement does not do the same thing as the for loop. Of course it counts down instead of up, but that can't be the problem, can it? The for loop never reaches count, so I subtract 1 from it (first line) and count down from count-1 to 0 inclusive. The for loop does the same (just ascending), but it yields the correct frame and mine does not.
Changing 4*ecx to 2*ecx (it's a pointer to short) does not change anything.
How can I access the elements of short *block[8]?
Unfortunately I'm not able to debug that DLL, so I hope somebody can help me out! Thanks a lot!
Cheers, Hannes
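For reference, here is a minimal C++ sketch of what the asm routine has to do to walk short *block[8]. It assumes each block holds 64 shorts (128 bytes, which is what the eight 16-byte stores cover); the names kBlockBytes, slot_offset and clear_blocks are made up for illustration. The point is that there are two steps: first load the pointer from the array slot, then write through it. The slot index is scaled by the pointer size (4 in a 32-bit build), not by sizeof(short).

```cpp
#include <cstddef>
#include <cstring>

// Assumption: one block is 8 stores * 16 bytes = 128 bytes = 64 shorts.
constexpr std::size_t kBlockBytes = 8 * 16;

// Byte offset of slot i inside the pointer array. The stride is the size
// of a pointer (4 on 32-bit x86), independent of the pointee type.
constexpr std::size_t slot_offset(std::size_t i) { return i * sizeof(short*); }

// Plain C++ equivalent of the original for loop.
void clear_blocks(short** block, int count) {
    for (int comp = 0; comp < count; ++comp) {
        short* p = block[comp];          // step 1: fetch the pointer from the slot
        std::memset(p, 0, kBlockBytes);  // step 2: zero the 128 bytes it points to
    }
}
```

Unlike movdqa, memset does not require the blocks to be 16-byte aligned, so this only mirrors the addressing, not the exact instructions.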
Hey, that's my assembly present for today... just use WORD PTR [edx+4*ecx] to indicate that a short sits behind the address!
Works with my test procedure, let's go for the real thing!
Cheers, Hannes
Err, no. Doesn't work...
Maybe somebody has a clue?
Thanks! Cheers, Hannes
Is that a typo, or did you really put the subtraction in front of the loop instead of within it?
If your block contains WORDS which you want to load into EAX using a counter as a displacement into the block, try the following:
movzx eax,word ptr[edx+ecx*2]
Raymond
Thanks for your tips!
@tenkey: that's correct. The counter variable that is passed in is 1 too large; .repeat/.untilcxz decrements the ecx register by itself.
@raymond: the pointers themselves take up 4 bytes of storage, right? Only the variable behind the pointer is a short, so I have to use 4*ecx?
However, the main problem is this: if I pass block[comp] (doing the loop in C++ and calling the asm procedure for every array element) and remove the repeat loop, everything works.
Only when I want to do the loop inside the asm procedure, so that I have to pass the base address &block[0] (in edx), nothing works. I can't even access the first element correctly (I set the counter to 0, so I should get only the first element):
mov ecx, 0
.repeat
movzx eax, WORD PTR [edx+4*ecx]
movdqa [eax+0 ], xmm0;
movdqa [eax+16 ], xmm0;
movdqa [eax+32 ], xmm0;
movdqa [eax+48 ], xmm0;
movdqa [eax+64 ], xmm0;
movdqa [eax+80 ], xmm0;
movdqa [eax+96 ], xmm0;
movdqa [eax+112], xmm0;
.untilcxz
I have to dereference ([]) once more, since the procedure is procedure(..., short **block) instead of procedure(..., short *block). Loading it as a DWORD doesn't help either. However, this should be equivalent to
procedure(...,block[0])
with
pxor xmm0, xmm0;
movdqa [edx+0 ], xmm0;
movdqa [edx+16 ], xmm0;
movdqa [edx+32 ], xmm0;
movdqa [edx+48 ], xmm0;
movdqa [edx+64 ], xmm0;
movdqa [edx+80 ], xmm0;
movdqa [edx+96 ], xmm0;
movdqa [edx+112], xmm0;
but it isn't.
Makes me crazy!
Thanks for your help! Hannes
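A small C++ model may help show why the WORD PTR load cannot produce a usable address here. It mimics a 4-byte array slot as 32-bit x86 stores it (little-endian) and compares a full DWORD load with what movzx eax, WORD PTR [...] leaves in eax. The names Slot, store_dword, mov_dword and movzx_word, and the sample address 0x0012F7C0, are invented for this sketch.

```cpp
#include <cstdint>

// One 4-byte slot of the pointer array, byte by byte (little-endian).
struct Slot { std::uint8_t b[4]; };

// Store a 32-bit pointer value into a slot, lowest byte first.
Slot store_dword(std::uint32_t v) {
    Slot s;
    s.b[0] = static_cast<std::uint8_t>(v & 0xFF);
    s.b[1] = static_cast<std::uint8_t>((v >> 8) & 0xFF);
    s.b[2] = static_cast<std::uint8_t>((v >> 16) & 0xFF);
    s.b[3] = static_cast<std::uint8_t>((v >> 24) & 0xFF);
    return s;
}

// Models: mov eax, DWORD PTR [slot]  (all four bytes)
std::uint32_t mov_dword(const Slot& s) {
    return static_cast<std::uint32_t>(s.b[0])
         | (static_cast<std::uint32_t>(s.b[1]) << 8)
         | (static_cast<std::uint32_t>(s.b[2]) << 16)
         | (static_cast<std::uint32_t>(s.b[3]) << 24);
}

// Models: movzx eax, WORD PTR [slot]  (low two bytes, upper half zeroed)
std::uint32_t movzx_word(const Slot& s) {
    return static_cast<std::uint32_t>(s.b[0])
         | (static_cast<std::uint32_t>(s.b[1]) << 8);
}
```

With the sample address, mov_dword gives back 0x0012F7C0, while movzx_word yields 0x0000F7C0, so the following movdqa stores would go to an unrelated address.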
Boah, that one nearly made me crazy.
The solution is so simple that it's a shame it took so long.
.repeat
mov eax, DWORD PTR [edx+4*ecx-4]
movdqa [eax+0 ], xmm0;
movdqa [eax+16 ], xmm0;
movdqa [eax+32 ], xmm0;
...
@tenkey: yeah, the repeat did have a problem... it never ran the body with ecx = 0, so the -4 has to be added (and the sub ecx, 1 left out).
@raymond: don't ask me why, but the code above works. WORD PTR and/or 2*ecx just gives garbage (probably because the pointers themselves are 4 bytes; only the data at [eax+...] is of type short).
Thanks for your help! Cheers and have a nice evening, Hannes
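To see why the -4 variant works and the sub ecx, 1 variant doesn't, here is a rough C++ model of the loop control. It relies on .untilcxz assembling to the LOOP instruction, which decrements ecx after the body and jumps back only while ecx is nonzero. The function name visited_indices, the bias parameter and the max_iters safety cap are assumptions of this sketch, not anything from MASM.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the array indices the loop body touches.
//   ecx  : initial counter value
//   bias : constant added to ecx to form the index
//          (0 for [edx+4*ecx], -1 for [edx+4*ecx-4])
//   max_iters : cap so a wrapped counter cannot run forever in this model
std::vector<std::int64_t> visited_indices(std::uint32_t ecx, int bias,
                                          std::size_t max_iters = 16) {
    std::vector<std::int64_t> out;
    do {
        out.push_back(static_cast<std::int64_t>(ecx) + bias);  // loop body
        --ecx;  // LOOP: decrement; an initial 0 wraps to 0xFFFFFFFF
    } while (ecx != 0 && out.size() < max_iters);
    return out;
}
```

For count = 3, visited_indices(3 - 1, 0) touches only indices 2 and 1, so block[0] is never cleared, whereas visited_indices(3, -1) touches 2, 1 and 0. Starting with ecx = 0, as in the mov ecx, 0 experiment, the counter wraps and the model only stops at the cap.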