The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: Greenhorn__ on August 23, 2011, 07:31:52 PM

Title: A mathematical question
Post by: Greenhorn__ on August 23, 2011, 07:31:52 PM
Hi,

I have the following expression in C code: (var & ~0x80000000)
OK, in assembler I would write:

mov eax, dword ptr var
test eax, not 80000000h

or

mov eax, dword ptr var
test eax, 7FFFFFFFh


But the Disassembly of the C-code shows me

mov eax, dword ptr var
btr eax, 1Fh


Please, could somebody explain to me why btr is used and where the value 1Fh comes from?


Regards
Greenhorn
Title: Re: A mathematical question
Post by: FORTRANS on August 23, 2011, 08:15:25 PM
Hi,

   BTR is Bit Test and Reset.  The 1F is the bit number to test.
So, test bit 31 (put it in the carry flag that you can test with JC)
and reset the bit.

Regards,

Steve
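
To make the description above concrete, here is a minimal C sketch of what "btr eax, 1Fh" does to a 32-bit value. It assumes the constant in the original C expression was meant as hex (0x80000000), which is bit 31, i.e. 1Fh. The function names btr32 and clear_top_bit are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Model of "btr reg, n" for a 32-bit register operand.
   Returns the old value of bit n (what the CPU places in the carry flag)
   and clears that bit in *reg. */
static int btr32(uint32_t *reg, unsigned n)
{
    uint32_t mask = (uint32_t)1 << (n & 31);  /* register form takes n modulo 32 */
    int carry = (*reg & mask) != 0;           /* "test": copy the bit to CF */
    *reg &= ~mask;                            /* "reset": clear the bit */
    return carry;
}

/* The C expression from the question, assuming a hex constant. */
static uint32_t clear_top_bit(uint32_t var)
{
    return var & ~UINT32_C(0x80000000);       /* same as var & 0x7FFFFFFF */
}
```

After btr32(&x, 0x1F), x holds the same value clear_top_bit would return for the original x, which is why a compiler can use btr for the masking.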
Title: Re: A mathematical question
Post by: jj2007 on August 23, 2011, 08:21:27 PM
test leaves eax intact, btr doesn't.

Another reason might be that btr is 0.17 cycles faster, and one byte shorter :wink
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
146     cycles for 100*test
129     cycles for 100*btr
Title: Re: A mathematical question
Post by: qWord on August 23, 2011, 08:41:32 PM
Quote from: Greenhorn__ on August 23, 2011, 07:31:52 PM
OK, in assembler I would write:

mov eax, dword ptr var
test eax, 7FFFFFFFh
If var is only used for testing, it is probably better to write:
test dword ptr var, 7FFFFFFFh
Title: Re: A mathematical question
Post by: jj2007 on August 23, 2011, 09:05:56 PM
Quote from: qWord on August 23, 2011, 08:41:32 PM
If var is only used for testing, it is probably better to write:
test dword ptr var, 7FFFFFFFh

Half a cycle faster. It is a 10-byte instruction, by the way, but the mov eax, var / test eax combination is also 2*5 bytes. If you care about size, use bt var, 1Fh (8 bytes) or test byte ptr var+3, 7Fh (7 bytes).
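
When comparing these variants, it may help to see which bits each one inspects, since they are not interchangeable. The following is a rough C sketch, assuming a 32-bit var stored little-endian as on x86; the function names are hypothetical.

```c
#include <stdint.h>

/* test dword ptr var, 7FFFFFFFh : ZF = 1 when bits 0..30 are all clear */
static int zf_test_dword(uint32_t var)
{
    return (var & UINT32_C(0x7FFFFFFF)) == 0;
}

/* bt var, 1Fh : CF = bit 31 of var */
static int cf_bt_31(uint32_t var)
{
    return (int)(var >> 31);
}

/* test byte ptr var+3, 7Fh : ZF = 1 when bits 24..30 are all clear
   (on little-endian x86, the byte at var+3 is the most significant byte) */
static int zf_test_top_byte(uint32_t var)
{
    uint8_t top = (uint8_t)(var >> 24);
    return (top & 0x7F) == 0;
}
```

So the dword test looks at the low 31 bits, bt looks only at bit 31, and the byte test looks only at bits 24 to 30. Which one fits depends on what the surrounding code actually needs to know about var.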
Title: Re: A mathematical question
Post by: FORTRANS on August 23, 2011, 09:20:19 PM
Quote from: jj2007 on August 23, 2011, 08:21:27 PM
test leaves eax intact, btr doesn't.

Hi,

   But BTR is acting on EAX so var is left intact.  A bit strange
nonetheless.

Regards,

Steve N.
Title: Re: A mathematical question
Post by: clive on August 23, 2011, 09:36:51 PM
Edit: Never mind, EAX is changed, var is not, assuming it's not held in the register.

SHL EAX,1 and checking the zero flag would also work, and may be faster on some older architectures.

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
210     cycles for 100*test
874     cycles for 100*btr

210     cycles for 100*test
871     cycles for 100*btr


Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
211     cycles for 100*test
873     cycles for 100*btr
206     cycles for 100*shl

211     cycles for 100*test
871     cycles for 100*btr
203     cycles for 100*shl
Title: Re: A mathematical question
Post by: dedndave on August 23, 2011, 11:04:28 PM
the compiler probably uses BTR so that successive operations may act on the result of the preceding one

all-in-all, it's a good optimization...
smaller than the code you might have written   :P
reasonably fast
allows chained operations such as those you might expect to see in compiled code
Title: Re: A mathematical question
Post by: jj2007 on August 24, 2011, 05:41:54 AM
Quote from: clive on August 23, 2011, 09:36:51 PM
SHL EAX,1 and checking the zero flag would also work, and may be faster on some older architectures.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
146     cycles for 100*test
129     cycles for 100*btr
129     cycles for 100*bt
97      cycles for 100*test var
97      cycles for 100*bt var
97      cycles for 100*test byte var
103     cycles for 100*shl eax


Clive,
Pretty fast indeed. I am surprised how much slower bt is on the P4...!
By the way: Zero flag? opcodes.hlp: "The Carry Flag contains the last bit shifted out"
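
On the zero flag versus carry flag point, both flags carry information after SHL EAX,1: the carry flag receives bit 31, while the zero flag reflects whether the remaining 31 bits were all zero. A minimal C sketch of that behaviour follows; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Result and the two flags of interest after "shl eax, 1". */
struct shl_flags {
    uint32_t result;
    int cf;   /* carry flag: the bit shifted out (old bit 31) */
    int zf;   /* zero flag: set when the shifted result is zero */
};

static struct shl_flags shl1(uint32_t eax)
{
    struct shl_flags f;
    f.cf = (int)(eax >> 31);   /* bit 31 leaves the register and lands in CF */
    f.result = eax << 1;       /* remaining 31 bits move up by one position */
    f.zf = (f.result == 0);    /* same condition as (eax & 0x7FFFFFFF) == 0 */
    return f;
}
```

In other words, checking ZF after the shift answers the same question as test eax, 7FFFFFFFh, whereas checking CF answers the same question as bt eax, 1Fh. Unlike test and bt, the shift also changes EAX.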
Title: Re: A mathematical question
Post by: MichaelW on August 24, 2011, 06:02:16 AM
Running on a P3:

pre-P4 (SSE1)
147     cycles for 100*test
130     cycles for 100*btr
131     cycles for 100*bt
97      cycles for 100*test var
98      cycles for 100*bt var
97      cycles for 100*test byte var
97      cycles for 100*shl eax

146     cycles for 100*test
131     cycles for 100*btr
131     cycles for 100*bt
97      cycles for 100*test var
99      cycles for 100*bt var
97      cycles for 100*test byte var
97      cycles for 100*shl eax
Title: Re: A mathematical question
Post by: FORTRANS on August 24, 2011, 02:32:37 PM
Hi,

   P-III, Windows 2000.  Celeron M, Windows XP.  P MMX, Windows 98.

Regards,

Steve N.

G:\WORK\TEMP>testorbt
pre-P4 (SSE1)
147     cycles for 100*test
131     cycles for 100*btr
131     cycles for 100*bt
98      cycles for 100*test var
99      cycles for 100*bt var
98      cycles for 100*test byte var
98      cycles for 100*shl eax

147     cycles for 100*test
131     cycles for 100*btr
131     cycles for 100*bt
98      cycles for 100*test var
99      cycles for 100*bt var
98      cycles for 100*test byte var
99      cycles for 100*shl eax


--- ok ---

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
147 cycles for 100*test
130 cycles for 100*btr
129 cycles for 100*bt
96 cycles for 100*test var
97 cycles for 100*bt var
98 cycles for 100*test byte var
104 cycles for 100*shl eax

147 cycles for 100*test
131 cycles for 100*btr
129 cycles for 100*bt
99 cycles for 100*test var
98 cycles for 100*bt var
97 cycles for 100*test byte var
104 cycles for 100*shl eax


--- ok ---

pre-P4
209 cycles for 100*test
928 cycles for 100*btr
617 cycles for 100*bt
307 cycles for 100*test var
519 cycles for 100*bt var
305 cycles for 100*test byte var
308 cycles for 100*shl eax

210 cycles for 100*test
928 cycles for 100*btr
615 cycles for 100*bt
305 cycles for 100*test var
512 cycles for 100*bt var
311 cycles for 100*test byte var
311 cycles for 100*shl eax


--- ok ---
Title: Re: A mathematical question
Post by: Greenhorn__ on August 24, 2011, 04:45:54 PM
Quote from: FORTRANS on August 23, 2011, 08:15:25 PM
Hi,

   BTR is Bit Test and Reset.  The 1F is the bit number to test.
So, test bit 31 (put it in the carry flag that you can test with JC)
and reset the bit.

Regards,

Steve

Ah, OK. Now I understand.
Thank you all very much, Ladies and Gentlemen.

I've read the AMD manual about BTR but misunderstood its explanation. I thought that the carry flag was being reset ...  ::)
Now I don't know whether my math skills are worse than my English skills or vice versa.
But surely my assembler skills are worse than both.  :toothy

Oh, and yes, originally it's for an assignment and not for evaluation, sorry, my fault. But now I know the advantages in both cases, thanks.
Also thanks for the hint to SHL. ;)


Regards
Greenhorn

AMD Phenom(tm) II X4 970 Processor (SSE3)
95 cycles for 100*test
128 cycles for 100*btr

94 cycles for 100*test
128 cycles for 100*btr


--- ok ---