
short replacement for c shift does not work

Started by a_h, July 09, 2005, 03:37:35 PM


a_h

Hi!

Being new to the forum, I first want to say hi, folks! It's great that you give us beginners such a nice place to ask questions.

On to my current problem: I want to write fast (masm32) MMX inline assembly for the following C code (within Visual C++):

-----------------------
(CurrentBfr << (32 - BitsLeft)) >> (32 - N);
-----------------------

(all variables are unsigned ints)

So I thought using psrlq and psllq would be sufficient for a start:

----------------------------------------------------------------------
unsigned int CBfr=CurrentBfr, BLeft=BitsLeft;
__asm{
      pxor   mm0, mm0
      mov   eax, BLeft
      movd   mm0, CBfr
      mov   ebx, 32
      sub   ebx, eax
      mov   eax, 32
      mov   ecx, N
      sub   eax, ecx
      psllq   mm0, bl
      psrlq   mm0, al
      movd   CBfr, mm0
}__asm emms;
------------------------------------------------------------

This doesn't work at all. Maybe you can see immediately what's wrong here, but I have no idea (except that different truncation may be the root of some problems).

However, maybe you can help me with some additional questions too:

-As you can see, I copy the global variables CurrentBfr and BitsLeft to local ones, since I get errors when accessing the globals from within the function's inline assembly. Is there a different possibility (I'm using Visual Studio with the Processor Pack)?

-When do I need to pxor/xor a register? A mov eax, 1 writes the 1 into eax in every case, doesn't it?

-If I use edx (instead of reusing eax) for the second 32, why do I get an access violation?

I hope somebody takes the time to help me out a bit here. Thanks a lot in advance!

Cheers, Hannes
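
On the truncation suspicion in the post above, here is a rough plain-C model of how a 64-bit psllq/psrlq pair can differ from the 32-bit C expression. It is only a sketch, not the poster's code: the function names and the 64-bit emulation are assumptions made for illustration, and it assumes 1 <= BitsLeft <= 32 and 1 <= N <= 32.

```c
#include <stdint.h>

/* The original C expression. Every intermediate result is 32 bits wide,
   so bits shifted past bit 31 by the left shift are discarded.
   Assumes 1 <= bits_left <= 32 and 1 <= n <= 32 (shift counts 0..31). */
static uint32_t extract_c(uint32_t buf, unsigned bits_left, unsigned n)
{
    return (buf << (32 - bits_left)) >> (32 - n);
}

/* Hypothetical model of movd + psllq + psrlq + movd: the value sits in
   the low half of a 64-bit register and both shifts are 64 bits wide. */
static uint32_t extract_q_model(uint32_t buf, unsigned bits_left, unsigned n)
{
    uint64_t q = buf;          /* movd zero-extends the dword to 64 bits  */
    q <<= (32 - bits_left);    /* psllq: bits above bit 31 are NOT lost   */
    q >>= (32 - n);            /* psrlq: those bits slide back down       */
    return (uint32_t)q;        /* movd stores only the low 32 bits        */
}
```

With buf = 0xABCD1234, bits_left = 16 and n = 8, extract_c yields 0x12 while extract_q_model yields 0x00ABCD12, because the upper bits survive the 64-bit left shift. Masking to 32 bits after the left shift, or using the dword shifts pslld/psrld instead, would be one way to make the two agree.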

roticv

You can just use normal instructions; you don't even need MMX for your code. To maximise efficiency and apply MMX correctly, you should be processing 8 bytes / 4 words / 2 dwords at a time.

If not, stick to normal instructions.
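
As a sketch of the "normal instructions" route, the expression can stay in plain C and be left to the compiler, which would typically emit ordinary shl/shr for it. The helper name take_bits and the guards are assumptions added here for illustration. The guards matter because a shift by 32 on a 32-bit value is undefined in C, and on x86 shl/shr mask the count to 5 bits, so a count of 32 would leave the value unchanged rather than clear it.

```c
#include <stdint.h>

/* Hypothetical helper: return the top n bits of the bits_left valid
   low-order bits in buf, right-aligned.
   Assumes bits_left <= 32 and n <= 32. */
static uint32_t take_bits(uint32_t buf, unsigned bits_left, unsigned n)
{
    /* A count of 32 would be undefined behaviour in C; in both cases
       every bit would be shifted out, so the result is defined as 0. */
    if (bits_left == 0 || n == 0)
        return 0;

    /* Both counts are now in the range 0..31. */
    return (buf << (32 - bits_left)) >> (32 - n);
}
```

For example, take_bits(0x5, 3, 1) returns 1 (the top bit of the three valid bits 101), and take_bits(0x5, 3, 3) returns 5.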

a_h

Thanks for your reply!

I know shr/shl, but I want to optimize for the P4, so I want to use the MMX versions, which are said to be much faster.

Cheers, Hannes

roticv

MMX instructions are meant to be used on parallel data; that is where their power lies.

For example, you can read 4 pixels and process them in one go using MMX if each pixel takes up 1 byte. In your example, such parallelism does not exist, so I think it is useless and wasteful to use MMX in such situations.
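
To illustrate the packed-data idea in the post above without any MMX at all, here is a minimal plain-C sketch that halves eight one-byte pixels per step inside a single 64-bit integer. The function names and the choice of "halve the brightness" as the operation are assumptions for illustration only; real MMX code would use packed instructions on mm registers instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Halve eight packed 8-bit values at once. The shift moves each byte's
   low bit into its neighbour's top bit; the mask clears those strays. */
static uint64_t halve8(uint64_t px)
{
    return (px >> 1) & 0x7F7F7F7F7F7F7F7FULL;
}

/* Halve every pixel in a buffer, eight per iteration where possible. */
static void halve_pixels(uint8_t *p, size_t count)
{
    size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, 8);      /* load 8 pixels (alignment-safe)  */
        w = halve8(w);
        memcpy(p + i, &w, 8);      /* store 8 pixels                  */
    }
    for (; i < count; ++i)         /* leftover pixels, one at a time  */
        p[i] >>= 1;
}
```

Because every byte gets the same treatment, the result does not depend on byte order. A single shift of one 32-bit value, as in the original question, offers nothing to pack in this way.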

a_h

Yeah, I know that MMX is mainly used for packed data. However, I was told that shifting via MMX was faster than the usual shift. I just tried it; it isn't.

Besides, in the code above the shift instructions psllq/psrlq need MMX registers (instead of al, bl) for the shift count; that was the bug that prevented the whole thing from working.

Cheers, Hannes

roticv

Btw, even if you are using shl/shr, you have to use cl for the shift count, not al or bl. Sorry for not looking closely enough at your code.