Test Or Compare?

Started by Neil, May 04, 2009, 01:39:05 PM

Neil

I have a piece of code :-

                     cmp eax,9
                     ja @F

if I change it to :-

                     test eax,9
                     ja @F

I get totally unpredictable results, am I missing something here?

dedndave

TEST sets the flags as if you had used AND, but does not modify the destination register
CMP sets the flags as if you had used SUB, but does not modify the destination register

TEST is logical, CMP is arithmetic

TEST eax,9
JA @F

it is important to note that logical instructions always clear the CARRY FLAG (CF)
that includes AND, OR, XOR, TEST (the NOT instruction alters no flags)
the JA instruction branches if the ZERO FLAG (ZF) is clear (i.e. not zero) and the CF is clear
normally, after a TEST instruction, a JZ or JNZ instruction is used
we know the CF is clear, so use a branch that does not look at it
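
A minimal sketch of the difference, assuming the masm32rt.inc runtime and the print/inkey/exit macros used elsewhere in this thread. The label names and the value 16 are only for illustration, they are not from Neil's code.

```asm
; Sketch only: cmp_above, cmp_done, bits_set and test_done are illustrative labels.
include \masm32\include\masm32rt.inc

.code
start:
    mov eax, 16                 ; 16 is above 9, but 16 AND 9 = 0

    cmp eax, 9                  ; flags as for eax - 9
    ja  cmp_above               ; taken when CF = 0 and ZF = 0 (eax > 9, unsigned)
    print "cmp:  eax is not above 9", 13, 10
    jmp cmp_done
cmp_above:
    print "cmp:  eax is above 9", 13, 10
cmp_done:

    mov eax, 16                 ; reload, the print macro does not preserve eax
    test eax, 9                 ; flags as for eax AND 9; CF and OF are cleared
    jnz bits_set                ; taken when bit 0 or bit 3 of eax is set
    print "test: bits 0 and 3 are both clear", 13, 10
    jmp test_done
bits_set:
    print "test: bit 0 or bit 3 is set", 13, 10
test_done:

    inkey "--- ok ---"
    exit
end start
```

With eax = 16 the CMP branch is taken and the TEST branch is not; with eax = 1 it is the other way round. Since CF is always 0 after TEST, a JA placed there behaves like JNZ, which is why the two versions of the original code disagree.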

Neil

Thanks dedndave, my brain wasn't in gear :dazzled: It's the JA instruction that's causing the problem. I thought that TEST might be quicker than CMP (hope this doesn't start another war :P)

UtillMasm

What do these mean?

cmp eax,0
jl far 75550000

test eax,eax
jl far 75550000

dedndave

lol - those aren't really new wars - they have been going on since the forum existed, i think
i think TEST and CMP are the same, time-wise
choose the one that most aptly suits the application
is the CMP eax,9 an arithmetic test or a logical test?
if it is arithmetic, use CMP
this helps make code more legible

Neil

Just had a look at opcodes.chm in the help file & here's what it says:-

CMP reg,mem           486  2 clock cycles &  2 to 4 bytes size

TEST reg,rmem         486  1 clock cycle   &  2 to 4 bytes size

So it appears that TEST REG,MEM is twice as fast as CMP REG,MEM.

I have a feeling that someone will prove me wrong :toothy

dedndave

lol - i dunno UtillMasm

there is no "far", that i know of

JL is a signed comparison
it branches if the OVERFLOW FLAG (OF) is not equal to the SIGN FLAG (SF)
TEST always clears the OF too
so, after a TEST instruction, JL is the same as JS - never see it used that way
after TEST eax,0, it would never branch; after TEST eax,eax (your second snippet) it branches when eax is negative, the same as after CMP eax,0
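
A rough sketch of those three cases, assuming the masm32rt.inc macros used elsewhere in this thread. The labels and the value -5 are made up for the example.

```asm
; Sketch only: all labels and the test value -5 are illustrative.
include \masm32\include\masm32rt.inc

.code
start:
    mov eax, -5
    cmp eax, 0                  ; flags as for eax - 0: SF = 1, OF = 0
    jl  cmp_neg                 ; taken, because SF <> OF
    print "cmp eax,0 / jl:    not taken", 13, 10
    jmp check_test
cmp_neg:
    print "cmp eax,0 / jl:    taken", 13, 10

check_test:
    mov eax, -5                 ; reload, the print macro does not preserve eax
    test eax, eax               ; eax AND eax: SF = sign bit of eax, OF = 0
    jl  test_neg                ; with OF = 0 this is the same condition as JS
    print "test eax,eax / jl: not taken", 13, 10
    jmp check_mask
test_neg:
    print "test eax,eax / jl: taken", 13, 10

check_mask:
    mov eax, -5                 ; reload again
    test eax, 0                 ; eax AND 0 = 0: SF = 0, OF = 0
    jl  mask_neg                ; never taken, whatever eax holds
    print "test eax,0 / jl:   not taken", 13, 10
    jmp done
mask_neg:
    print "test eax,0 / jl:   taken", 13, 10

done:
    inkey "--- ok ---"
    exit
end start
```

For a negative eax the first two branches are taken and the third is not, so CMP eax,0 and TEST eax,eax agree for JL, while TEST eax,0 is a different test altogether.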

dedndave

hmmmm - good to know, Neil
i am new to 32-bit code, so i have to re-memorize all the dang timings
they were so much simpler for the 8088

oops - you grabbed different forms of the instructions ?
reg,mem
reg,rmem

you are comparing a register to an immediate value
reg,immed

jj2007

Quote from: Neil on May 04, 2009, 02:17:38 PM
Just had a look at opcodes.chm in the help file & here's what it says:-

CMP reg,mem           486  2 clock cycles &  2 to 4 bytes size

TEST reg,rmem         486  1 clock cycle   &  2 to 4 bytes size

So it appears that TEST REG,MEM is twice as fast as CMP REG,MEM.

I have a feeling that someone will prove me wrong :toothy

Nobody in this forum would do such a horrible thing :naughty:

But test yourself... :bg

.nolist
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm

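LOOP_COUNT = 1000000

.code
start:
; run the whole benchmark three times to check that the counts are stable
REPEAT 3
; time 40 back-to-back CMP reg, immediate instructions
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 10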
cmp eax, 123
cmp ecx, 123
cmp edx, 123
cmp edi, 123
ENDM
counter_end
print str$(eax), 9, "cycles for 40*cmp", 13, 10

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 10
test eax, 123
test ecx, 123
test edx, 123
test edi, 123
ENDM
counter_end
print str$(eax), 9, "cycles for 40*test ----", 13, 10

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 100
cmp eax, 123
cmp ecx, 123
cmp edx, 123
cmp edi, 123
ENDM
counter_end
print str$(eax), 9, "cycles for 400*cmp", 13, 10

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 100
test eax, 123
test ecx, 123
test edx, 123
test edi, 123
ENDM
counter_end
print str$(eax), 9, "cycles for 400*test", 13, 10, 10
ENDM

inkey "--- ok ---"
exit
end start


EDIT: Results for a Celeron M:
21      cycles for 40*cmp
15      cycles for 40*test reg, 123
21      cycles for 40*test reg, reg

262     cycles for 400*cmp
195     cycles for 400*test reg, 123
262     cycles for 400*test reg, reg


21      cycles for 40*cmp
15      cycles for 40*test reg, 123
21      cycles for 40*test reg, reg

262     cycles for 400*cmp
195     cycles for 400*test reg, 123
262     cycles for 400*test reg, reg


21      cycles for 40*cmp
15      cycles for 40*test reg, 123
21      cycles for 40*test reg, reg

262     cycles for 400*cmp
195     cycles for 400*test reg, 123
262     cycles for 400*test reg, reg

hutch--

Normally a test for zero is faster with the mnemonic TEST than CMP REG, 0 but from memory this varies with hardware. Most Intel processors are faster with TEST and Intel recommend using TEST for this purpose.

mitchi

It's pretty much equally fast here on a recent Intel Processor (Intel E8500).

Neil

Thanks jj, all the tests have identical cycle counts, so I won't bother changing any of my code.

dedndave

UtillMasm is giving me a headache, Hutch
I want him censored

dedndave

for 0, i always used OR (or TEST) reg,reg
OR EAX,EAX - is this not faster than immediate ?
Neil is using 9, but inquiring minds want to know
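
For reference, a small sketch of the three zero tests side by side, assuming the masm32rt.inc macros used elsewhere in this thread (the labels are illustrative). It only shows that they set ZF the same way and how many bytes each takes; it says nothing about speed.

```asm
; Sketch only: not_zero and done are illustrative labels.
include \masm32\include\masm32rt.inc

.code
start:
    xor eax, eax                ; eax = 0

    cmp eax, 0                  ; 3 bytes (83 F8 00), eax is only read
    jnz not_zero
    test eax, eax               ; 2 bytes (85 C0), eax is only read
    jnz not_zero
    or  eax, eax                ; 2 bytes, writes eax back unchanged
    jnz not_zero

    print "all three forms report zero", 13, 10
    jmp done
not_zero:
    print "at least one form reported non-zero", 13, 10
done:
    inkey "--- ok ---"
    exit
end start
```

The reg,reg forms save the immediate byte. OR also writes its destination, TEST does not, which is one reason TEST reg,reg is usually the form suggested for a zero check.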

FORTRANS

Quote from: hutch-- on May 04, 2009, 02:31:26 PM
Normally a test for zero is faster with the mnemonic TEST than CMP REG, 0 but from memory this varies with hardware. Most Intel processors are faster with TEST and Intel recommend using TEST for this purpose.

  For those that have fun debugging their code after seemingly
minor edits, note that:

       TEST    AX,AX
       JZ      @F

       CMP     AX,AX
       JZ      @F


have different results.  Almost as much fun as mixing up signed
and unsigned conditional jumps.
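
A quick sketch of why the two differ, assuming the masm32rt.inc macros used earlier in this thread. The labels and the value 5 are only for illustration.

```asm
; Sketch only: all labels and the test value 5 are illustrative.
include \masm32\include\masm32rt.inc

.code
start:
    mov ax, 5
    test ax, ax                 ; ax AND ax = 5, so ZF = 0
    jz  test_zero               ; taken only when ax = 0
    print "test ax,ax / jz: not taken", 13, 10
    jmp check_cmp
test_zero:
    print "test ax,ax / jz: taken", 13, 10

check_cmp:
    mov ax, 5                   ; reload, the print macro does not preserve eax
    cmp ax, ax                  ; ax - ax = 0 for any ax, so ZF = 1
    jz  cmp_zero                ; always taken
    print "cmp ax,ax / jz:  not taken", 13, 10
    jmp done
cmp_zero:
    print "cmp ax,ax / jz:  taken", 13, 10

done:
    inkey "--- ok ---"
    exit
end start
```

TEST AX,AX asks "is AX zero?", while CMP AX,AX asks "is AX equal to itself?", so the JZ after the CMP is effectively an unconditional jump.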

Cheers,

Steve N.