The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: a_h on July 09, 2005, 03:37:35 PM

Title: short replacement for c shift does not work
Post by: a_h on July 09, 2005, 03:37:35 PM
Hi!

Being new to the forum, I first want to say hi folks! Nice that you provide us beginners such a nice place to ask questions!

Now to my current problem: I want to write fast (masm32) MMX inline assembly for the following C code (within Visual C++):

-----------------------
(CurrentBfr << (32 - BitsLeft)) >> (32 - N);
-----------------------

(all variables are unsigned ints)
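
For reference, the expression can be read as "take the top N of the BitsLeft valid low bits of CurrentBfr". The sketch below wraps it in a hypothetical helper, peek_bits (the name and the asserted ranges are assumptions, not part of the original post). Note that in C a shift of a 32-bit value by 32 is undefined, so the sketch assumes both BitsLeft and N lie in 1..32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name and range checks are assumptions):
   same expression as in the post, on a 32-bit unsigned value. */
static uint32_t peek_bits(uint32_t current_bfr, unsigned bits_left, unsigned n)
{
    /* Keep both shift counts in 0..31; shifting a 32-bit value by 32 is undefined in C. */
    assert(bits_left >= 1 && bits_left <= 32);
    assert(n >= 1 && n <= 32);

    /* The left shift discards everything above the bits_left valid bits,
       the right shift then leaves the top n of them in the low bits. */
    return (current_bfr << (32 - bits_left)) >> (32 - n);
}
```

With BitsLeft = 8 and N = 4, a buffer of 0xFFFFFFAB yields 0xA: the left shift throws away the stale upper 24 bits before the right shift brings the top nibble down.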

So I thought using psrlq and psllq would be sufficient for a start:

----------------------------------------------------------------------
unsigned int CBfr=CurrentBfr, BLeft=BitsLeft;
__asm{
      pxor   mm0, mm0
      mov   eax, BLeft
      movd   mm0, CBfr
      mov   ebx, 32
      sub   ebx, eax
      mov   eax, 32
      mov   ecx, N
      sub   eax, ecx
      psllq   mm0, bl
      psrlq   mm0, al
      movd   CBfr, mm0
}__asm emms;
------------------------------------------------------------

This doesn't work at all. Maybe you can see immediately what's wrong here, but I have no idea (except that different truncation may be the root of some problems).

However, maybe you can help me with some additional questions too:

-As you can see, I map the global variables CurrentBfr and BitsLeft to local ones, since I get errors when accessing the globals from within the function's inline assembly. Is there another way (I'm using Visual Studio with the Processor Pack)?

-When do I need to pxor/xor a register? A mov eax, 1 always writes 1 into eax, doesn't it?

-If I use edx (instead of reusing eax) for the second 32, why do I get an access violation?

I hope somebody takes the time to help me out a bit here - thanks a lot in advance!

Cheers, Hannes
Title: Re: short replacement for c shift does not work
Post by: roticv on July 09, 2005, 05:41:17 PM
You can just use normal instructions; you don't even need MMX for your code. To maximise efficiency and apply MMX correctly, you should be processing 8 bytes / 4 words / 2 dwords at a time.

If not, stick to normal instructions.
Title: Re: short replacement for c shift does not work
Post by: a_h on July 09, 2005, 06:06:10 PM
Thanks for your reply!

I know shr/shl, but I want to optimize for the P4, so I want to use the MMX versions, which are said to be way faster.

Cheers, Hannes
Title: Re: short replacement for c shift does not work
Post by: roticv on July 10, 2005, 01:51:58 AM
MMX instructions need parallelism for their power to be exploited.

For example, you can read 4 pixels and process them in one go using MMX if each pixel takes up 1 byte. In your example, such parallelism does not exist, hence I think it is useless and wasteful to use MMX in such situations.
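
To make the "4 pixels in one go" idea concrete without requiring MMX hardware, here is a minimal portable C sketch that adds four 8-bit pixels packed into one 32-bit word, byte by byte with wraparound. This is roughly what a packed byte add such as paddb does for eight bytes at once. The function names add_pixels4 and add_pixels4_scalar are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: adds four packed 8-bit pixels at once,
   each byte wrapping around independently (no carry into its neighbour). */
static uint32_t add_pixels4(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every byte; these sums cannot carry across a byte boundary. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold in the top bit of every byte with XOR, which drops the carry out of that byte. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Reference version: the same result computed one pixel at a time. */
static uint32_t add_pixels4_scalar(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t pa = (a >> (8 * i)) & 0xFFu;
        uint32_t pb = (b >> (8 * i)) & 0xFFu;
        out |= ((pa + pb) & 0xFFu) << (8 * i);
    }
    return out;
}
```

The packed version does the work of the four-iteration loop in a handful of operations. That gain only exists when there are several independent values to process, which is not the case for a single shifted dword.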
Title: Re: short replacement for c shift does not work
Post by: a_h on July 10, 2005, 01:07:59 PM
Yeah, I know that MMX is mainly used for packed data. However, I was told that shifting via MMX was faster than the usual shift. I just tried it, and it isn't.

Besides: in the code above, the shift instructions psllq/psrlq need an MMX register (instead of al, bl) for the shift count; that was the bug that prevented the whole thing from working.
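
One more thing to watch once the counts are in MMX registers: psllq/psrlq shift the whole 64-bit register, so bits pushed past bit 31 are not discarded the way they are in the 32-bit C expression. The portable C model below is only a sketch of that behaviour (the function names are made up, and the instruction sequences in the comments are an assumed rewrite, not tested code from this thread). It suggests that the dword forms pslld/psrld match the C expression more closely.

```c
#include <stdint.h>

/* Models: movd mm0, CBfr / movd mm1, ebx / psllq mm0, mm1 /
           movd mm2, eax  / psrlq mm0, mm2 / movd CBfr, mm0
   Assumes left and right are in 0..63. */
static uint32_t shift_pair_q(uint32_t x, unsigned left, unsigned right)
{
    uint64_t lane = x;      /* movd zero-extends the dword into the 64-bit register */
    lane <<= left;          /* psllq: bits above bit 31 are kept, not truncated */
    lane >>= right;         /* psrlq: those kept bits move back down */
    return (uint32_t)lane;  /* movd stores only the low 32 bits */
}

/* Models the same sequence with pslld/psrld, which shift each 32-bit lane.
   Assumes left and right are in 0..31. */
static uint32_t shift_pair_d(uint32_t x, unsigned left, unsigned right)
{
    return (x << left) >> right;
}
```

For CurrentBfr = 0xFFFFFFAB, BitsLeft = 8 and N = 4 (counts 24 and 28), the quadword model returns 0x0FFFFFFA while the dword model returns 0xA. The two agree only when the bits above BitsLeft are already zero.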

Cheers, Hannes
Title: Re: short replacement for c shift does not work
Post by: roticv on July 10, 2005, 02:01:53 PM
Btw, even if you are using shl/shr, you have to use cl for the shift count and not al or bl. Sorry for not looking hard enough at your code btw.  :toothy
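
As a rough illustration of the scalar route, the C model below mirrors a shl/shr sequence with the count in cl. The instruction sequence in the comments is an assumed rewrite, and the helper names are hypothetical. For 32-bit operands the CPU uses only the low 5 bits of cl, which the model reproduces with a mask.

```c
#include <stdint.h>

/* Model of `shl r32, cl` / `shr r32, cl`: only the low 5 bits of cl are used. */
static uint32_t shl32_cl(uint32_t value, uint8_t cl) { return value << (cl & 31u); }
static uint32_t shr32_cl(uint32_t value, uint8_t cl) { return value >> (cl & 31u); }

/* Hypothetical scalar version of the expression from the first post. */
static uint32_t peek_bits_scalar(uint32_t cbfr, uint32_t bleft, uint32_t n)
{
    uint32_t eax = cbfr;                  /* mov eax, CBfr               */
    uint8_t cl = (uint8_t)(32u - bleft);  /* mov ecx, 32 / sub ecx, BLeft */
    eax = shl32_cl(eax, cl);              /* shl eax, cl                 */
    cl = (uint8_t)(32u - n);              /* mov ecx, 32 / sub ecx, N    */
    eax = shr32_cl(eax, cl);              /* shr eax, cl                 */
    return eax;
}
```

Because of the 5-bit mask, a count of 32 (BitsLeft or N equal to 0) leaves the register unchanged instead of clearing it, so that edge case needs separate handling if it can occur.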
Title: Re: short replacement for c shift does not work
Post by: a_h on July 13, 2005, 03:49:30 PM
Thanks for your help!
Title: Re: short replacement for c shift does not work
Post by: roticv on July 13, 2005, 03:53:53 PM
You are welcome  :U