A mathematical question

Started by Greenhorn__, August 23, 2011, 07:31:52 PM

Greenhorn__

Hi,

I have the following expression in C code: (var & ~0x80000000)
OK, in assembler I would write:

mov eax, dword ptr var
test eax, not 80000000h

or

mov eax, dword ptr var
test eax, 7FFFFFFFh


But the Disassembly of the C-code shows me

mov eax, dword ptr var
btr eax, 1Fh


Please, could somebody explain to me why btr is used and where the value 1Fh comes from?


Regards
Greenhorn
You can fool some of the people all of the time, and all of the people some of the time, but you can not fool all of the people all of the time.
(Abraham Lincoln)

FORTRANS

Hi,

   BTR is Bit Test and Reset.  The 1F is the bit number to test.
So, test bit 31 (put it in the carry flag that you can test with JC)
and reset the bit.

Regards,

Steve
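
A minimal C sketch of the behaviour described above may help connect the disassembly to the C expression. The helper names btr32 and clear_top_bit are hypothetical, chosen only for illustration; the sketch models the register form of BTR, where the bit index is taken modulo 32.

```c
#include <stdint.h>

/* Hypothetical helper modelling "btr reg32, bit".
   Returns the old value of the selected bit (what BTR leaves in CF)
   and clears that bit in *reg. */
static int btr32(uint32_t *reg, unsigned bit)
{
    uint32_t mask = UINT32_C(1) << (bit & 31u); /* index is taken modulo 32 */
    int carry = (*reg & mask) != 0;             /* old bit value goes to CF */
    *reg &= ~mask;                              /* the bit itself is reset  */
    return carry;
}

/* The C expression from the question, with the constant written in hex. */
static uint32_t clear_top_bit(uint32_t var)
{
    return var & ~UINT32_C(0x80000000);
}
```

With bit = 1Fh (31 decimal), btr32 leaves the same value in the register as clear_top_bit returns, which is why a compiler can use a single BTR for the masking.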

jj2007

test leaves eax intact, btr doesn't.

Another reason might be that btr is 0.17 cycles faster, and one byte shorter :wink
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
146     cycles for 100*test
129     cycles for 100*btr

qWord

Quote from: Greenhorn__ on August 23, 2011, 07:31:52 PM
OK, in assembler I would write:

mov eax, dword ptr var
test eax, 7FFFFFFFh
If var is only used for testing, it is probably better to write:
test dword ptr var, 7FFFFFFFh
FPU in a trice: SmplMath
It's that simple!

jj2007

Quote from: qWord on August 23, 2011, 08:41:32 PM
If var is only used for testing, it is probably better to write:
test dword ptr var, 7FFFFFFFh

Half a cycle faster. It is a 10-byte instruction, by the way, but the mov eax, var / test eax pair is also 2*5 bytes. If you care about size, use bt var, 1Fh (8 bytes) or test byte ptr var+3, 7Fh (7 bytes).
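
Note that the three memory forms look at different bits of var. The following C sketch models which bits each one examines, assuming the little-endian layout used on x86. All function names are hypothetical, and the byte access is modelled arithmetically so the sketch does not depend on the host machine.

```c
#include <stdint.h>

/* Byte found at address var+offset on a little-endian machine such as x86. */
static uint8_t le_byte(uint32_t var, unsigned offset)
{
    return (uint8_t)(var >> (8u * (offset & 3u)));
}

/* ZF after "test dword ptr var, 7FFFFFFFh": set when bits 0..30 are all clear. */
static int zf_test_dword(uint32_t var)
{
    return (var & UINT32_C(0x7FFFFFFF)) == 0;
}

/* ZF after "test byte ptr var+3, 7Fh": set when bits 24..30 are all clear. */
static int zf_test_byte3(uint32_t var)
{
    return (le_byte(var, 3) & 0x7Fu) == 0;
}

/* CF after "bt var, 1Fh": a copy of bit 31. */
static int cf_bt_31(uint32_t var)
{
    return (int)(var >> 31);
}
```

For example, with var = 1 the dword test clears ZF while the byte test sets it, so the shorter encodings are only interchangeable when the bits of interest sit in the top byte.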

FORTRANS

Quote from: jj2007 on August 23, 2011, 08:21:27 PM
test leaves eax intact, btr doesn't.

Hi,

   But BTR is acting on EAX, so var is left intact.  A bit strange
nonetheless.

Regards,

Steve N.

clive

Edit: Never mind, EAX is changed, var is not, assuming it's not held in the register.

SHL EAX,1 and checking the zero flag would also work, and may be faster on some older architectures.

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
210     cycles for 100*test
874     cycles for 100*btr

210     cycles for 100*test
871     cycles for 100*btr


Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
211     cycles for 100*test
873     cycles for 100*btr
206     cycles for 100*shl

211     cycles for 100*test
871     cycles for 100*btr
203     cycles for 100*shl
It could be a random act of randomness. Those happen a lot as well.

dedndave

the compiler probably uses BTR so that successive operations may act on the result of the preceding one

all-in-all, it's a good optimization...
smaller than the code you might have written   :P
reasonably fast
allows chained operations such as those you might expect to see in compiled code

jj2007

Quote from: clive on August 23, 2011, 09:36:51 PM
SHL EAX,1 and checking the zero flag would also work, and may be faster on some older architectures.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
146     cycles for 100*test
129     cycles for 100*btr
129     cycles for 100*bt
97      cycles for 100*test var
97      cycles for 100*bt var
97      cycles for 100*test byte var
103     cycles for 100*shl eax


Clive,
Pretty fast indeed. I am surprised how much slower bt is on the P4...!
By the way: Zero flag? opcodes.hlp: "The Carry Flag contains the last bit shifted out"
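
Both flags carry information after the shift. A small C model, using a hypothetical struct shl_out and function shl1 that are not taken from any library, shows that CF receives the old bit 31 while ZF reports whether the remaining 31 bits were all zero.

```c
#include <stdint.h>

/* Hypothetical record of what "shl eax, 1" leaves behind. */
typedef struct {
    uint32_t result; /* new value of EAX                   */
    int cf;          /* carry flag: the bit shifted out    */
    int zf;          /* zero flag: set when result is zero */
} shl_out;

/* Model of "shl eax, 1" for a 32-bit register. */
static shl_out shl1(uint32_t eax)
{
    shl_out o;
    o.cf = (int)(eax >> 31);           /* old bit 31 goes to CF           */
    o.result = (uint32_t)(eax << 1);   /* low 31 bits move up by one      */
    o.zf = (o.result == 0);            /* same as (eax & 7FFFFFFFh) == 0  */
    return o;
}
```

So checking ZF after the shift answers the same question as test eax, 7FFFFFFFh, and checking CF answers the same question as bt eax, 1Fh. Unlike test and bt, the shift modifies EAX.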

MichaelW

Running on a P3:

pre-P4 (SSE1)
147     cycles for 100*test
130     cycles for 100*btr
131     cycles for 100*bt
97      cycles for 100*test var
98      cycles for 100*bt var
97      cycles for 100*test byte var
97      cycles for 100*shl eax

146     cycles for 100*test
131     cycles for 100*btr
131     cycles for 100*bt
97      cycles for 100*test var
99      cycles for 100*bt var
97      cycles for 100*test byte var
97      cycles for 100*shl eax
eschew obfuscation

FORTRANS

Hi,

   P-III, Windows 2000.  Celeron M, Windows XP.  P MMX, Windows 98.

Regards,

Steve N.

G:\WORK\TEMP>testorbt
pre-P4 (SSE1)
147     cycles for 100*test
131     cycles for 100*btr
131     cycles for 100*bt
98      cycles for 100*test var
99      cycles for 100*bt var
98      cycles for 100*test byte var
98      cycles for 100*shl eax

147     cycles for 100*test
131     cycles for 100*btr
131     cycles for 100*bt
98      cycles for 100*test var
99      cycles for 100*bt var
98      cycles for 100*test byte var
99      cycles for 100*shl eax


--- ok ---

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
147 cycles for 100*test
130 cycles for 100*btr
129 cycles for 100*bt
96 cycles for 100*test var
97 cycles for 100*bt var
98 cycles for 100*test byte var
104 cycles for 100*shl eax

147 cycles for 100*test
131 cycles for 100*btr
129 cycles for 100*bt
99 cycles for 100*test var
98 cycles for 100*bt var
97 cycles for 100*test byte var
104 cycles for 100*shl eax


--- ok ---

pre-P4
209 cycles for 100*test
928 cycles for 100*btr
617 cycles for 100*bt
307 cycles for 100*test var
519 cycles for 100*bt var
305 cycles for 100*test byte var
308 cycles for 100*shl eax

210 cycles for 100*test
928 cycles for 100*btr
615 cycles for 100*bt
305 cycles for 100*test var
512 cycles for 100*bt var
311 cycles for 100*test byte var
311 cycles for 100*shl eax


--- ok ---

Greenhorn__

Quote from: FORTRANS on August 23, 2011, 08:15:25 PM
Hi,

   BTR is Bit Test and Reset.  The 1F is the bit number to test.
So, test bit 31 (put it in the carry flag that you can test with JC)
and reset the bit.

Regards,

Steve

Ah, OK. Now I understand.
Thank you all very much, Ladies and Gentlemen.

I've read the AMD manual about BTR but misunderstood its explanation. I thought that the carry flag was being reset ...  ::)
Now I don't know whether my math skills are worse than my English skills or vice versa.
But surely my assembler skills are worse than both.  :toothy

Oh, and yes, originally it's for an assignment and not for evaluation, sorry, my fault. But now I know the advantages in both cases, thanks.
Also thanks for the hint to SHL. ;)


Regards
Greenhorn

AMD Phenom(tm) II X4 970 Processor (SSE3)
95 cycles for 100*test
128 cycles for 100*btr

94 cycles for 100*test
128 cycles for 100*btr


--- ok ---