Hi,
I have the following expression in C code: (var & ~0x80000000)
OK, in assembler I would write:
mov eax, dword ptr var
test eax, not 80000000h
or
mov eax, dword ptr var
test eax, 7FFFFFFFh
But the disassembly of the C code shows:
mov eax, dword ptr var
btr eax, 1Fh
Could somebody please explain why btr is used and where the value 1Fh comes from?
Regards
Greenhorn
Hi,
BTR is Bit Test and Reset. 1Fh is the number of the bit to test (1Fh = 31).
So it tests bit 31 (copying it into the carry flag, which you can test with JC)
and then resets (clears) that bit.
Regards,
Steve
test leaves eax intact, btr doesn't.
Another reason might be that btr is 0.17 cycles faster, and one byte shorter :wink
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
146 cycles for 100*test
129 cycles for 100*btr
Quote from: Greenhorn__ on August 23, 2011, 07:31:52 PM
OK, in assembler I would write:
mov eax, dword ptr var
test eax, 7FFFFFFFh
If var is only used for testing, it is probably better to write:
test dword ptr var, 7FFFFFFFh
Quote from: qWord on August 23, 2011, 08:41:32 PM
If var is only used for testing, it is probably better to write:
test dword ptr var, 7FFFFFFFh
Half a cycle faster. It is a 10-byte instruction, by the way, but the
mov eax, var / test eax combination is also 2*5 bytes. If you care about size, use
bt var, 1Fh (8 bytes), or
test byte ptr var+3, 7Fh (7 bytes).
Quote from: jj2007 on August 23, 2011, 08:21:27 PM
test leaves eax intact, btr doesn't.
Hi,
But BTR acts on EAX, so var is left intact. A bit strange
nonetheless.
Regards,
Steve N.
Edit: Never mind, EAX is changed but var is not, assuming var isn't held in a register.
SHL EAX,1 and checking the zero flag would also work, and may be faster on some older architectures.
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
210 cycles for 100*test
874 cycles for 100*btr
210 cycles for 100*test
871 cycles for 100*btr
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
211 cycles for 100*test
873 cycles for 100*btr
206 cycles for 100*shl
211 cycles for 100*test
871 cycles for 100*btr
203 cycles for 100*shl
The compiler probably uses BTR so that successive operations can act on the result of the preceding one.
All in all, it's a good optimization:
smaller than the code you might have written :P
reasonably fast
allows chained operations such as those you might expect to see in compiled code
Quote from: clive on August 23, 2011, 09:36:51 PM
SHL EAX,1 and checking the zero flag would also work, and may be faster on some older architectures.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
146 cycles for 100*test
129 cycles for 100*btr
129 cycles for 100*bt
97 cycles for 100*test var
97 cycles for 100*bt var
97 cycles for 100*test byte var
103 cycles for 100*shl eax
Clive,
Pretty fast indeed. I am surprised how much slower bt is on the P4!
By the way: Zero flag? opcodes.hlp: "The Carry Flag contains the last bit shifted out"
Running on a P3:
pre-P4 (SSE1)
147 cycles for 100*test
130 cycles for 100*btr
131 cycles for 100*bt
97 cycles for 100*test var
98 cycles for 100*bt var
97 cycles for 100*test byte var
97 cycles for 100*shl eax
146 cycles for 100*test
131 cycles for 100*btr
131 cycles for 100*bt
97 cycles for 100*test var
99 cycles for 100*bt var
97 cycles for 100*test byte var
97 cycles for 100*shl eax
Hi,
P-III, Windows 2000. Celeron M, Windows XP. P MMX, Windows 98.
Regards,
Steve N.
G:\WORK\TEMP>testorbt
pre-P4 (SSE1)
147 cycles for 100*test
131 cycles for 100*btr
131 cycles for 100*bt
98 cycles for 100*test var
99 cycles for 100*bt var
98 cycles for 100*test byte var
98 cycles for 100*shl eax
147 cycles for 100*test
131 cycles for 100*btr
131 cycles for 100*bt
98 cycles for 100*test var
99 cycles for 100*bt var
98 cycles for 100*test byte var
99 cycles for 100*shl eax
--- ok ---
Mobile Intel(R) Celeron(R) processor 600MHz (SSE2)
147 cycles for 100*test
130 cycles for 100*btr
129 cycles for 100*bt
96 cycles for 100*test var
97 cycles for 100*bt var
98 cycles for 100*test byte var
104 cycles for 100*shl eax
147 cycles for 100*test
131 cycles for 100*btr
129 cycles for 100*bt
99 cycles for 100*test var
98 cycles for 100*bt var
97 cycles for 100*test byte var
104 cycles for 100*shl eax
--- ok ---
pre-P4
209 cycles for 100*test
928 cycles for 100*btr
617 cycles for 100*bt
307 cycles for 100*test var
519 cycles for 100*bt var
305 cycles for 100*test byte var
308 cycles for 100*shl eax
210 cycles for 100*test
928 cycles for 100*btr
615 cycles for 100*bt
305 cycles for 100*test var
512 cycles for 100*bt var
311 cycles for 100*test byte var
311 cycles for 100*shl eax
--- ok ---
Quote from: FORTRANS on August 23, 2011, 08:15:25 PM
Hi,
BTR is Bit Test and Reset. The 1F is the bit number to test.
So, test bit 31 (put it in the carry flag that you can test with JC)
and reset the bit.
Regards,
Steve
Ah, OK. Now I understand.
Thank you all very much, ladies and gentlemen.
I had read the AMD manual about BTR but misunderstood the explanation. I thought the carry flag was being reset ... ::)
Now I don't know whether my math skills are worse than my English skills or vice versa.
But surely my assembler skills are worse than both. :toothy
Oh, and yes, originally it's for an assignment, not for an evaluation, sorry, my fault. But now I know the advantages in both cases, thanks.
Also thanks for the hint about SHL. ;)
Regards
Greenhorn
AMD Phenom(tm) II X4 970 Processor (SSE3)
95 cycles for 100*test
128 cycles for 100*btr
94 cycles for 100*test
128 cycles for 100*btr
--- ok ---