Test eax,eax

Started by Farabi, August 05, 2008, 09:25:46 AM

hutch--

 :bg

I can safely say I never use the "uses" notation, it adds code where I don't want it. Re using TEST: as everyone has said in here, TEST is an AND operation where the result is not written to the register, only to the affected flags, and that is what makes it a TEST rather than an AND. In the middle of an intensive algo it does make a difference, not only in being faster but in not having to reset the affected register if it needs to be retained.

Outside of intensive algorithms it hardly matters,


.if eax == 0
  ; do something
.endif


In high level code like API grinding this does the job fine but at an algorithm level where timing is critical, it's trash.
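
A minimal sketch of the TEST versus AND difference described above. It is not from the original post; it assumes the MASM32 SDK is installed at its default path, and the labels and the 0FFh / 80h values are purely illustrative.

```asm
include \masm32\include\masm32rt.inc   ; assumption: MASM32 SDK in its default location

.code
start:
    mov eax, 0FFh           ; value we want to keep around

    test eax, 80h           ; flags come from (eax AND 80h); eax itself is not written
    jz skip1                ; taken only if bit 7 is clear
    ; eax still holds 0FFh here, no reload needed
skip1:

    and eax, 80h            ; same flags as the TEST above, but eax is overwritten
    jz skip2
    ; eax now holds only the masked bit (80h); the original value is gone
    ; unless it was saved somewhere first
skip2:

    invoke ExitProcess, 0
end start
```

With AND the register has to be reloaded or saved beforehand if its full value is still needed, which is the extra work TEST avoids.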
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: hutch-- on August 06, 2008, 02:30:33 PM
I can safely say I never use the "uses" notation, it adds code where I don't want it.
Agreed :bg
Quote

.if eax == 0
  ; do something
.endif


In high level code like API grinding this does the job fine but at an algorithm level where timing is critical, it's trash.

Trash? I am deeply disappointed that you talk so badly about high level constructs. OK, I have to admit that .if eax==0, as shown in the disassembly below, ruthlessly destroys the valuable contents of eax by overwriting it with its current value, but that is, as far as I can see, the only difference from the low level construct. So who volunteers to demonstrate that or eax, eax, apart from being cruel and destructive, is also slower than test eax, eax?

test eax, eax
jne @F
push eax
pop eax
@@: nop

.if eax==0
push eax
pop eax
.endif
nop

Address   Hex dump   Command
00401006  85C0       test eax, eax
00401008  75 02      jne short Test_And.0040100C
0040100A  50         push eax
0040100B  58         pop eax
0040100C  90         nop

0040100D  0BC0       or eax, eax
0040100F  75 02      jne short Test_And.00401013
00401011  50         push eax
00401012  58         pop eax
00401013  90         nop

PBrennick

JJ,
Sometimes I think you start an argument to no purpose except to post. As has already been stated here (listen closely): TEST does a compare without a write; OR ALWAYS does a write, so obviously, since it is doing another operation, it HAS to be slower.

BTW, in this case, argument is the same as discussion and both are nondestructive  even though a lot of writing gets done.  :bg

-- Paul
The GeneSys Project is available from:
The Repository or My crappy website

jj2007

Quote from: PBrennick on August 06, 2008, 05:05:05 PM
JJ,
Sometimes I think you start an argument to no purpose except to post. As has already been stated here (listen closely): TEST does a compare without a write; OR ALWAYS does a write, so obviously, since it is doing another operation, it HAS to be slower.

BTW, in this case, argument is the same as discussion and both are nondestructive  even though a lot of writing gets done.  :bg

-- Paul


Paul, all I have done is to ask whether it does matter in terms of speed. I cannot convince Michael W's macro to yield anything other than ZERO cycles, for both variants.

0 cycles for test eax, eax
0 cycles for or eax, eax

[attachment deleted by admin]

PBrennick

JJ,
It was nothing deeply personal, just an observation. About your test revealing 0 cycles for both instructions: obviously that is NOT the correct result for EITHER instruction, so probably his macro is masking the true result because you do not have enough instructions to reveal it. EVERY instruction uses a certain number of CPU cycles. Don't you realize this? Even a NOP uses cycles.

Paul

PBrennick

JJ,

Quote
OR - Inclusive Logical OR
        Usage:  OR      dest,src
        Modifies flags: CF OF PF SF ZF (AF undefined)
        Logical inclusive OR of the two operands returning the result in
        the destination.  Any bit set in either operand will be set in the
        destination.

                                 Clocks                 Size
        Operands         808x  286   386   486          Bytes
        reg,reg           3     2     2     1             2
        mem,reg         16+EA   7     7     3            2-4  (W88=24+EA)
        reg,mem          9+EA   7     6     2            2-4  (W88=13+EA)
        reg,immed         4     3     2     1            3-4
        mem8,immed8     17+EA   7     7     3            3-6
        mem16,immed16   25+EA   7     7     3            3-6
        accum,immed       4     3     2     1            2-3

        Opcode     Instruction         Description
        0C ib      OR AL,imm8          AL OR imm8
        0D iw      OR AX,imm16         AX OR imm16
        0D id      OR EAX,imm32        EAX OR imm32
        80 /1 ib   OR r/m8,imm8        r/m8 OR imm8
        81 /1 iw   OR r/m16,imm16      r/m16 OR imm16
        81 /1 id   OR r/m32,imm32      r/m32 OR imm32
        83 /1 ib   OR r/m16,imm8       r/m16 OR imm8 (sign-extended)
        83 /1 ib   OR r/m32,imm8       r/m32 OR imm8 (sign-extended)
        08 /r      OR r/m8,r8          r/m8 OR r8
        09 /r      OR r/m16,r16        r/m16 OR r16
        09 /r      OR r/m32,r32        r/m32 OR r32
        0A /r      OR r8,r/m8          r8 OR r/m8
        0B /r      OR r16,r/m16        r16 OR r/m16
        0B /r      OR r32,r/m32        r32 OR r/m32

TEST - Test For Bit Pattern
        Usage:  TEST    dest,src
        Modifies flags: CF OF PF SF ZF (AF undefined)
        Performs a logical AND of the two operands updating the flags
        register without saving the result.

                                 Clocks                 Size
        Operands         808x  286   386   486          Bytes
        reg,reg           3     2     1     1             2
        reg,mem          9+EA   6     5     1            2-4  (W88=13+EA)
        mem,reg          9+EA   6     5     2            2-4  (W88=13+EA)
        reg,immed         5     3     2     1            3-4
        mem,immed       11+EA   6     5     2            3-6
        accum,immed       4     3     2     1            2-3

        Opcode     Instruction         Description
        A8 ib      TEST AL,imm8        AND imm8 with AL; set SF, ZF, PF according to result
        A9 iw      TEST AX,imm16       AND imm16 with AX; set SF, ZF, PF according to result
        A9 id      TEST EAX,imm32      AND imm32 with EAX; set SF, ZF, PF according to result
        F6 /0 ib   TEST r/m8,imm8      AND imm8 with r/m8; set SF, ZF, PF according to result
        F7 /0 iw   TEST r/m16,imm16    AND imm16 with r/m16; set SF, ZF, PF according to result
        F7 /0 id   TEST r/m32,imm32    AND imm32 with r/m32; set SF, ZF, PF according to result
        84 /r      TEST r/m8,r8        AND r8 with r/m8; set SF, ZF, PF according to result
        85 /r      TEST r/m16,r16      AND r16 with r/m16; set SF, ZF, PF according to result
        85 /r      TEST r/m32,r32      AND r32 with r/m32; set SF, ZF, PF according to result

So, you can see when it matters and when it does not. Whenever you have a question like this, my advice is not to be too dependent on software, as it can AFFECT the problem. Go to the reference material and live with the yielded knowledge.

Paul

jj2007

Quote from: PBrennick on August 06, 2008, 07:46:01 PM
About your test revealing 0 cycles for both instructions: obviously that is NOT the correct result for EITHER instruction, so probably his macro is masking the true result because you do not have enough instructions to reveal it. EVERY instruction uses a certain number of CPU cycles.
Of course, you are right. My only question is: So what? Does it matter, if an algo that was hammered to death by Michael, Lingo, Hutch and other champions runs at 155.5 instead of 156 cycles? At the expense that speed-hungry newbies start losing their precious time with Jxx? Re your last post: Do you really believe I did not study opcodes.hlp? By the way, it says 1 cycle for test and or, for reg/reg on a 486, which is probably outdated. In practice, or eax, eax is indeed a tiny little bit slower.

EDIT: And I cannot even confirm the last sentence...
204 ms for test eax, eax
192 ms for or eax, eax

... for a lousy 1,000,000,000 loops.

Code is attached.

[attachment deleted by admin]
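
Since the attachment is gone, here is a minimal sketch of what such a timing loop might look like. This is not jj2007's code: it assumes the MASM32 SDK at its default path and a console build, and LOOPS and the register choices are illustrative. GetTickCount typically has a resolution of only 10 to 16 ms, and the dec/jnz loop overhead is included in both figures, so small differences should be read with caution.

```asm
include \masm32\include\masm32rt.inc   ; assumption: MASM32 SDK in its default location

LOOPS equ 1000000000                    ; same loop count as quoted in the post

.code
start:
    ; --- time test eax, eax ---
    invoke GetTickCount
    mov esi, eax                        ; start time in ms (esi survives API calls)
    mov ecx, LOOPS
    xor eax, eax
  @@:
    test eax, eax                       ; instruction under test
    dec ecx
    jnz @B
    invoke GetTickCount
    sub eax, esi                        ; elapsed ms
    print str$(eax), " ms for test", 13, 10

    ; --- time or eax, eax ---
    invoke GetTickCount
    mov esi, eax
    mov ecx, LOOPS
    xor eax, eax
  @@:
    or eax, eax                         ; instruction under test
    dec ecx
    jnz @B
    invoke GetTickCount
    sub eax, esi
    print str$(eax), " ms for or", 13, 10

    invoke ExitProcess, 0
end start
```

Results from a loop like this vary with the CPU, loop alignment and whatever else the machine is doing, which may explain why the numbers posted in this thread disagree.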

Bill Cravener

Quote
Does it matter, if an algo that was hammered to death by Michael, Lingo, Hutch and other champions runs at 155.5 instead of 156 cycles?

Not really, but it's sure fun to watch!
My MASM32 Examples.

"Prejudice does not arise from low intelligence it arises from conservative ideals to which people of low intelligence are drawn." ~ Isaidthat

Mark_Larson

  I don't use destructive forms either. 

  I also use ESI and other registers without saving them for speed.

  I also CHEAT and use MMX registers to temporarily save register values.  I don't ever do MMX programming.  I always use SSE/SSE2.  So I can actually save it in the beginning of the program if need be.


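movd  MM0,ESP        ; save ESP in MM0 instead of on the stack

;do code here        ; note: x87 FPU instructions would clobber MM0, which aliases the FPU registers

movd  ESP,MM0        ; restore ESP from MM0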

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
htttp://www.website.masmforum.com/mark/index.htm

hutch--

Mark,

Have you been known to use GOTO as well.  :bg

PBrennick

Mark,
You know, you can be around programming and programmers for years and find that there is always 'something new under the sun.'

I hardly ever use mmx instructions myself (the Calculator being the only exception), but it absolutely never, as far as I can recall, occurred to me to make use of the unused mmx registers. I must play with that.

JJ,
Your question of whether a few cycles matter is the doom of all optimizers who do not consider how many occurrences there may be of such an instruction. Also, having taught programming, I must say that it is just plain poor programming practice. Code like that is a piece of cake for an optimizer on a board like this. Consider if that instruction was in a message loop?

-- Paul

Mark Jones

Well, while most of us will never reach Lingo's level of optimization skill, :wink, every little "trick" shaves off cycles. And for most code, optimization is pointless nowadays. But, instances always arise when the fastest or smallest code is necessary, and these various optimizations come into play. As assembly programmers, we all know this, and use the various "tricks" we've learned over the years to write exceptional code.

Why on Earth we're arguing about something so trivial is beyond me... :lol

Quote from: AMD x2 x64 dual-core AM2 4000+ (WinXP x32)
151 ms for test eax, eax
177 ms for or eax, eax

TEST is definitely faster on AMD. Was faster on an AMD Athlon XP 2500+ also.
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

PBrennick

Mark,
I agree with the triviality of it all and tried to reach him about it, but I think he loves these little discussions. I even posted from the INTEL specifications, but it was to no avail.

At this point it is fun, but tomorrow and another topic...

-- Paul

Mark_Larson

Quote from: hutch-- on August 07, 2008, 02:01:51 PM
Mark,

Have you been known to use GOTO as well.  :bg

hehee.  Yes I have.  I have no qualms about doing anything to get the fastest code  :bg

Mark_Larson

Quote from: PBrennick on August 07, 2008, 03:11:21 PM
Mark,
You know, you can be around programming and programmers for years and find that there is always 'something new under the sun.'

I hardly ever use mmx instructions myself (the Calculator being the only exception), but it absolutely never, as far as I can recall, occurred to me to make use of the unused mmx registers. I must play with that.

Yepper. That is why I still read the boards even though I might not post due to being too  busy.  I picked up a trick from JJ the other day.  Using ffree st(7)  in place of finit. 
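
A minimal sketch of the ffree st(7) idea, for readers who have not seen it. It is not from the original posts; it assumes the MASM32 SDK at its default path, and the variable name val is hypothetical. Presumably the point is that ffree only marks one register as empty so the next load cannot overflow the FPU stack, whereas finit also resets the control and status words.

```asm
include \masm32\include\masm32rt.inc   ; assumption: MASM32 SDK in its default location

.data
  val REAL8 1.5                         ; hypothetical value, only for illustration

.code
start:
    ffree st(7)         ; tag st(7) as empty; control word and other registers untouched
    fld val             ; the push lands in the register that was st(7), now known to be empty
    fstp st(0)          ; pop the value again so the stack is left as we found it

    invoke ExitProcess, 0
end start
```

Unlike finit, this leaves any values already on the FPU stack in st(0) to st(6) where they were before the load.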

I have the tip I gave you above plus 60 others on my optimization website.  The trick I just told you is actually trick #1 on the webpage.

   http://www.mark.masmcode.com/
