What does test eax,eax mean? And what is it used for?
Hi Onan,
test eax,eax is used to check if eax is equal to 0. If this is the case then the ZERO flag is set to TRUE.
An example :
invoke VirtualAlloc,0,eax,MEM_COMMIT,PAGE_READWRITE ; allocate memory
test eax,eax
jnz @f
If VirtualAlloc fails, it returns 0. If the value of eax is not zero then the ZERO flag is set to FALSE. This means that the code will jump to the nearest forward label.
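For anyone who wants to try the pattern, here is a minimal sketch of a complete program, assuming the MASM32 runtime (masm32rt.inc). The 4096-byte size and the label names alloc_ok and done are made up for illustration; a named label is used where the snippet above uses @F with an anonymous @@ label.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    invoke VirtualAlloc, 0, 4096, MEM_COMMIT, PAGE_READWRITE
    test eax, eax               ; ZF = 1 only if eax is 0 (allocation failed)
    jnz alloc_ok                ; non-zero pointer: success
    print "VirtualAlloc failed",13,10
    jmp done
alloc_ok:
    mov esi, eax                ; keep the pointer, print trashes eax/ecx/edx
    print "memory allocated",13,10
    invoke VirtualFree, esi, 0, MEM_RELEASE
done:
    inkey
    exit
end start
```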
From \masm32\help\opcodes.hlp :
Quote
TEST - Test For Bit Pattern
Usage: TEST dest,src
Modifies flags: CF OF PF SF ZF (AF undefined)
Performs a logical AND of the two operands updating the flags
register without saving the result.
TEST is Only a little different from AND.
Both of you gentlemen make correct points but the most important point is that, while AND is a destructive test; Test is not. This means, if you use AND, the destination is modified and the flags set accordingly. If you use TEST, the flags are modified accordingly but the destination variable (and the source) are not modified. TEST does an AND of the 2 variables but does not change them, in other words.
-- Paul
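A minimal sketch of the difference described above, assuming the MASM32 runtime (masm32rt.inc). The value 0F0h and the mask 0Fh are arbitrary choices for illustration.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov ebx, 0F0h               ; value under test (ebx survives the print calls)
    test ebx, 0Fh               ; flags from (ebx AND 0Fh), ebx is not written
    print str$(ebx),13,10       ; prints 240: value unchanged
    and ebx, 0Fh                ; same flags, but the result is written back
    print str$(ebx),13,10       ; prints 0: original value is gone
    inkey
    exit
end start
```

Both instructions set the zero flag here, since 0F0h and 0Fh share no bits; only AND replaces the register contents.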
Quote from: PBrennick on August 05, 2008, 01:29:47 PM
while AND is a destructive test; Test is not
Another non-destructive test is
or eax, eax
Timings may differ on the various platforms, but OR EAX,EAX and TEST EAX,EAX are one byte shorter than CMP EAX,0 and generally at least as fast.
Likewise, if it is necessary to perform many tests, it may be even faster to use a look-up table. There should be many posts here describing how to use a lookup table (see the XLAT instruction.)
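As a rough sketch of the lookup-table idea, assuming the MASM32 runtime: the table name isvowel and the test letter are invented for this example. One XLATB replaces a chain of five compares against "a", "e", "i", "o" and "u".

```asm
include \masm32\include\masm32rt.inc

.data
; one entry per letter a..z: 1 = vowel, 0 = consonant
isvowel db 1,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0

.code
start:
    mov ebx, offset isvowel     ; xlatb expects the table base in ebx
    mov al, "e"                 ; lowercase letter to classify
    sub al, "a"                 ; convert to index 0..25
    xlatb                       ; al = byte at [ebx + al]
    movzx esi, al               ; keep the result, print trashes eax
    print str$(esi),13,10       ; prints 1
    inkey
    exit
end start
```

The sketch does no range checking; input outside a..z would index past the table.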
Thanks, I understand it now.
Quote from: jj2007 on August 05, 2008, 03:10:18 PM
Quote from: PBrennick on August 05, 2008, 01:29:47 PM
while AND is a destructive test; Test is not
Another non-destructive test is or eax, eax
"or eax,eax" is as much a destructive test as "and eax,eax".
They both entail writing the result to the register (even though there is no change in these cases) while test does not write to any general purpose register.
Also remember that TEST does not only set the zero flag; it sets the other result flags as well. For example, you can use it to find out whether a number is negative (sign flag), or, with a constant as the second operand, you can check the value of a bit in eax, e.g.:
test EAX, 1
jnz @F ; << jump if bit 0 is set
This is similar to the BT opcode (which copies the bit into the carry flag), except that TEST can check multiple bits at once. For example, TEST EAX, 6 sets the zero flag only if both bit 1 and bit 2 are clear. It is a very useful opcode and a relatively fast one.
Donkey
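A small sketch of testing several bits with one mask, assuming the MASM32 runtime. The label names and the sample value 2 are illustrative only. TEST alone answers "is any masked bit set?"; to ask "are all masked bits set?" the sketch uses AND on a copy followed by CMP.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov ebx, 2                  ; sample value: only bit 1 is set
    test ebx, 6                 ; mask = bits 1 and 2
    jz none_set                 ; ZF = 1 means both bits are clear
    print "at least one of bits 1 and 2 is set",13,10
    mov eax, ebx                ; work on a copy so ebx is preserved
    and eax, 6                  ; isolate the two bits
    cmp eax, 6                  ; equal only if both bits are set
    jne done
    print "both bits are set",13,10
    jmp done
none_set:
    print "neither bit is set",13,10
done:
    inkey
    exit
end start
```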
TEST is a very useful mnemonic: it tests any bit without changing the value in the register.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
mov edx, 00000000000000010000000000000000b ; set the 17th bit (bit 16 when counting from 0)
test edx, 00000000000000010000000000000000b ; test the 17th bit
jz notset
print "Bit 17 is set",13,10
jmp quit
notset:
print "Bit 17 is NOT set",13,10
quit:
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Quote from: raymond on August 06, 2008, 03:05:25 AM
"or eax,eax" is as much a destructive test as the "and eax,eax".
They both entail writing the result to the register (even though there is no change in these cases) while test does not write to any general purpose register.
What is your definition of "destructive"?
"test" does a virtual 'and' on the destination operand with the source operand, but does not write the result back.
"and" does an 'and' on the destination operand with the source operand, and writes the result back to the destination operand.
"or" does an 'or' on the destination operand with the source operand, and writes the result back to the destination operand.
(a related example is 'cmp' and 'sub' -- cmp does a virtual sub, but doesn't write the result back to the destination.)
If you "and" or "or" a value with itself, the result will be the original value (with only the flags being modified), but these instructions are 'destructive' in that they write the result back to the destination operand. They may write the exact same value back, but they still essentially destroy the previous value (it's modified, even if not changed.)
Farabi: "test eax,eax", "and eax,eax", "or eax,eax", "cmp eax,0" are all just ways of checking if eax == 0. They don't change the value in eax, but if the result of the operation is 0, then the zero-flag is set.
If you 'and' a number with itself, you will get the same number as the result, and the zero-flag will be set if the result of an operation (i.e. that number) is zero. And the same for if you 'or' a number with itself.
"cmp" also works the same -- if you subtract 0 from a number, the result will be the same, and the zero-flag will be set if that result is zero (which must mean the original number was zero: 0 - 0 = 0).
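To see that all four forms agree, here is a minimal sketch assuming the MASM32 runtime; the label names differ and done are made up. With ebx holding zero, none of the four jumps is taken.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    xor ebx, ebx                ; ebx = 0
    test ebx, ebx               ; virtual AND, nothing written back
    jnz differ
    and ebx, ebx                ; writes the same value back
    jnz differ
    or ebx, ebx                 ; writes the same value back
    jnz differ
    cmp ebx, 0                  ; virtual SUB, nothing written back
    jnz differ
    print "all four checks report zero",13,10
    jmp done
differ:
    print "a check disagreed",13,10
done:
    inkey
    exit
end start
```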
Ok, what you write is technically and philosophically correct, but let me put it more bluntly: Does it matter? Will the CPU police put me in jail if I destroy the valuable contents of eax with or eax, eax? Will the OS police hit me over the head if I insert a mov esi, esi in my callback procedure although I forgot uses esi?
Quote
Will the OS police hit me over the head if I insert a mov esi, esi in my callback procedure although I forgot uses esi?
This has nothing to do with the fact that "or" is destructive.
(BTW, I very often use the esi, edi, ebx (and even the ebp if I need it) registers without saving them IF they don't need to be saved. :clap:)
Quote from: jj2007 on August 06, 2008, 11:55:00 AM
Ok, what you write is technically and philosophically correct, but let me put it more bluntly: Does it matter?
Obviously, no - not really.
However, one could argue that it's more efficient, since writing back a value requires an extra micro-op that could really be skipped otherwise.
Quote
Will the CPU police put me in jail if I destroy the valuable contents of eax with or eax, eax? Will the OS police hit me over the head if I insert a mov esi, esi in my callback procedure although I forgot uses esi?
We can always hope, my friend :bdg
:bg
I can safely say I never use the "uses" notation, it adds code where I don't want it. RE using TEST, as everyone has said in here, TEST is an AND operation where the results are not written to the register, only to the affected flags, and that is what makes it a TEST rather than an AND. In the middle of an intensive algo it does make a difference, not only in being faster but in not having to reset the affected register if it needs to be retained.
Outside of intensive algorithms it hardly matters,
.if eax == 0
; do something
.endif
In high level code like API grinding this does the job fine but at an algorithm level where timing is critical, it's trash.
Quote from: hutch-- on August 06, 2008, 02:30:33 PM
I can safely say I never use the "uses" notation, it adds code where I don't want it.
Agreed :bg
Quote
.if eax == 0
; do something
.endif
In high level code like API grinding this does the job fine but at an algorithm level where timing is critical, it's trash.
Trash? I am deeply disappointed that you talk so badly about high level constructs. Ok, I have to admit that .if eax==0, as shown in the disassembly below, ruthlessly destroys the valuable contents of eax by overwriting it with its current value, but that is, as far as I can see, the only difference to the low level construct. So who volunteers to demonstrate that or eax, eax, apart from being cruel and destructive, is also slower than test eax, eax?
test eax, eax
jne @F
push eax
pop eax
@@: nop
.if eax==0
push eax
pop eax
.endif
nop
Address Hex dump Command
00401006 85C0 test eax, eax
00401008 75 02 jne short Test_And.0040100C
0040100A 50 push eax
0040100B 58 pop eax
0040100C 90 nop
0040100D 0BC0 or eax, eax
0040100F 75 02 jne short Test_And.00401013
00401011 50 push eax
00401012 58 pop eax
00401013 90 nop
JJ,
Sometimes I think you start an argument to no purpose except to post. As has already been stated here - listen closely, TEST does a compare without a write; or ALWAYS does a write so obviously, since it is doing another operation it HAS to be slower.
BTW, in this case, argument is the same as discussion and both are nondestructive even though a lot of writing gets done. :bg
-- Paul
Quote from: PBrennick on August 06, 2008, 05:05:05 PM
JJ,
Sometimes I think you start an argument to no purpose except to post. As has already been stated here - listen closely, TEST does a compare without a write; or ALWAYS does a write so obviously, since it is doing another operation it HAS to be slower.
BTW, in this case, argument is the same as discussion and both are nondestructive even though a lot of writing gets done. :bg
-- Paul
Paul, all I have done is to ask whether it does matter, in terms of speed. I cannot convince Michael W's macro to yield anything other than ZERO cycles - for both variants.
0 cycles for test eax, eax
0 cycles for or eax, eax
[attachment deleted by admin]
JJ,
It was nothing deeply personal, just an observation. About your test revealing 0 cycles for both instructions, obviously that is NOT the correct result for EITHER instruction so probably his macro is masking the true result because you do not have enough instructions to reveal it. EVERY instruction uses a certain number of CPU cycles. Don't you realize this? Even a NOP uses cycles.
Paul
JJ,
Quote
OR - Inclusive Logical OR
Usage: OR dest,src
Modifies flags: CF OF PF SF ZF (AF undefined)
Logical inclusive OR of the two operands returning the result in
the destination. Any bit set in either operand will be set in the
destination.
                         Clocks                 Size
Operands         808x    286   386   486        Bytes
reg,reg           3       2     2     1           2
mem,reg          16+EA    7     7     3          2-4  (W88=24+EA)
reg,mem           9+EA    7     6     2          2-4  (W88=13+EA)
reg,immed         4       3     2     1          3-4
mem8,immed8      17+EA    7     7     3          3-6
mem16,immed16    25+EA    7     7     3          3-6
accum,immed       4       3     2     1          2-3

0C ib      OR AL, imm8         AL OR imm8
0D iw      OR AX, imm16        AX OR imm16
0D id      OR EAX, imm32       EAX OR imm32
80 /1 ib   OR r/m8,imm8        r/m8 OR imm8
81 /1 iw   OR r/m16,imm16      r/m16 OR imm16
81 /1 id   OR r/m32,imm32      r/m32 OR imm32
83 /1 ib   OR r/m16,imm8       r/m16 OR imm8 (sign-extended)
83 /1 ib   OR r/m32,imm8       r/m32 OR imm8 (sign-extended)
08 /r      OR r/m8,r8          r/m8 OR r8
09 /r      OR r/m16,r16        r/m16 OR r16
09 /r      OR r/m32,r32        r/m32 OR r32
0A /r      OR r8,r/m8          r8 OR r/m8
0B /r      OR r16,r/m16        r16 OR r/m16
0B /r      OR r32,r/m32        r32 OR r/m32
TEST - Test For Bit Pattern
Usage: TEST dest,src
Modifies flags: CF OF PF SF ZF (AF undefined)
Performs a logical AND of the two operands updating the flags
register without saving the result.
                         Clocks                 Size
Operands         808x    286   386   486        Bytes
reg,reg           3       2     1     1           2
reg,mem           9+EA    6     5     1          2-4  (W88=13+EA)
mem,reg           9+EA    6     5     2          2-4  (W88=13+EA)
reg,immed         5       3     2     1          3-4
mem,immed        11+EA    6     5     2          3-6
accum,immed       4       3     2     1          2-3

A8 ib      TEST AL, imm8       AND imm8 with AL; set SF, ZF, PF according to result
A9 iw      TEST AX, imm16      AND imm16 with AX; set SF, ZF, PF according to result
A9 id      TEST EAX, imm32     AND imm32 with EAX; set SF, ZF, PF according to result
F6 /0 ib   TEST r/m8,imm8      AND imm8 with r/m8; set SF, ZF, PF according to result
F7 /0 iw   TEST r/m16,imm16    AND imm16 with r/m16; set SF, ZF, PF according to result
F7 /0 id   TEST r/m32,imm32    AND imm32 with r/m32; set SF, ZF, PF according to result
84 /r      TEST r/m8,r8        AND r8 with r/m8; set SF, ZF, PF according to result
85 /r      TEST r/m16,r16      AND r16 with r/m16; set SF, ZF, PF according to result
85 /r      TEST r/m32,r32      AND r32 with r/m32; set SF, ZF, PF according to result
So, you can see when it matters and when it does not. Whenever you have a question like this, my advice is not to be too dependent on software as it can AFFECT the problem. Go to the reference material and live with the yielded knowledge.
Paul
Quote from: PBrennick on August 06, 2008, 07:46:01 PM
About your test revealing 0 cycles for both instructions, obviously that is NOT the correct result for EITHER instruction so probably his macro is masking the true result because you do not have enough instructions to reveal it. EVERY instruction uses a certain number of CPU cycles.
Of course, you are right. My only question is: So what? Does it matter, if an algo that was hammered to death by Michael, Lingo, Hutch and other champions runs at 155.5 instead of 156 cycles? At the expense that speed-hungry newbies start losing their precious time with Jxx? Re your last post: Do you really believe I did not study opcodes.hlp? By the way, it says 1 cycle for test and or, for reg/reg on a 486, which is probably outdated. In practice, or eax, eax is indeed a tiny little bit slower.
EDIT: And I cannot even confirm the last sentence...
204 ms for test eax, eax
192 ms for or eax, eax
... for a lousy 1,000,000,000 loops.
Code is attached.
[attachment deleted by admin]
Quote
Does it matter, if an algo that was hammered to death by Michael, Lingo, Hutch and other champions runs at 155.5 instead of 156 cycles?
Not really, but it's sure fun to watch! :wink
I don't use destructive forms either.
I also use ESI and other registers without saving them for speed.
I also CHEAT and use MMX registers to temporarily save register values. I don't ever do MMX programming. I always use SSE/SSE2. So I can actually save it in the beginning of the program if need be.
movd MM0,ESP
;do code here
movd ESP,MM0
Mark,
Have you been known to use GOTO as well. :bg
Mark,
You know, you can be around programming and programmers for years and find that there is always 'something new under the sun.'
I hardly ever use mmx instructions, myself (the Calculator being the only exception), but it absolutely never, as far as I can recall, occurred to me to make use of the unused mmx registers. I must play with that.
JJ,
Your remark that a few cycles hardly matter is the doom of all optimizers who do not consider how many occurrences there may be of such an instruction. Also, having taught programming, I must insist that it is just plain, poor programming practice. It is code like that that makes an optimizer's job on a board like this a piece of cake. Consider if that instruction was in a message loop?
-- Paul
Well, while most of us will never reach Lingo's level of optimization skill, :wink, every little "trick" shaves off cycles. And for most code, optimization is pointless nowadays. But, instances always arise when the fastest or smallest code is necessary, and these various optimizations come into play. As assembly programmers, we all know this, and use the various "tricks" we've learned over the years to write exceptional code.
Why on Earth we're arguing about something so trivial is beyond me... :lol
Quote from: AMD x2 x64 dual-core AM2 4000+ (WinXP x32)
151 ms for test eax, eax
177 ms for or eax, eax
TEST is definitely faster on AMD. Was faster on an AMD Athlon XP 2500+ also.
Mark,
I agree with the triviality of it all and tried to reach him about it; but, I think he loves these little discussions. I even posted from the INTEL specifications but it was to no avail.
At this point it is fun, but tomorrow and another topic...
-- Paul
Quote from: hutch-- on August 07, 2008, 02:01:51 PM
Mark,
Have you been known to use GOTO as well. :bg
hehee. Yes I have. I have no qualms about doing anything to get the fastest code :bg
Quote from: PBrennick on August 07, 2008, 03:11:21 PM
Mark,
You know, you can be around programming and programmers for years and find that there is always 'something new under the sun.'
I hardly ever use mmx instructions, myself (the Calculator being the only exception), but it absolutely never, as far as I can recall, occurred to me to make use of the unused mmx registers. I must play with that.
Yepper. That is why I still read the boards even though I might not post due to being too busy. I picked up a trick from JJ the other day. Using ffree st(7) in place of finit.
I have the tip I gave you above plus 60 others on my optimization website. The trick I just told you is actually trick #1 on the webpage.
http://www.mark.masmcode.com/
Quote from: PBrennick on August 07, 2008, 03:11:21 PM
JJ,
Your remark that a few cycles hardly matter is the doom of all optimizers who do not consider how many occurrences there may be of such an instruction. Also, having taught programming, I must insist that it is just plain, poor programming practice.
The developers of MASM decided to use or eax, eax instead of test eax, eax for the .if eax==0 high level construct. Either they were plain, poor programmers, or they looked at their timings and decided it didn't really matter, or they wanted to test if they could destroy the CPU (attention: irony!!).
Quote from: PBrennick on August 07, 2008, 04:01:22 PM
Mark,
I agree with the triviality of it all and tried to reach him about it; but, I think he loves these little discussions. I even posted from the INTEL specifications but it was to no avail.
Thanks for posting the Intel specifications. You should read them carefully: They say that both or eax, eax and test eax, eax use 2 bytes of code and need one cycle on a 486.
I am just flabbergasted how such a simple question - can I use .if eax==0, or do I have to study the jxx stuff? - can turn into an ideological debate.
Quote from: jj2007 on August 07, 2008, 05:30:11 PM
Quote from: PBrennick on August 07, 2008, 03:11:21 PM
JJ,
Your remark that a few cycles hardly matter is the doom of all optimizers who do not consider how many occurrences there may be of such an instruction. Also, having taught programming, I must insist that it is just plain, poor programming practice.
The developers of MASM decided to use or eax, eax instead of test eax, eax for the .if eax==0 high level construct. Either they were plain, poor programmers, or they looked at their timings and decided it didn't really matter, or they wanted to test if they could destroy the CPU (attention: irony!!).
Quote from: PBrennick on August 07, 2008, 04:01:22 PM
Mark,
I agree with the triviality of it all and tried to reach him about it; but, I think he loves these little discussions. I even posted from the INTEL specifications but it was to no avail.
Thanks for posting the Intel specifications. You should read them carefully: They say that both or eax, eax and test eax, eax use 2 bytes of code and need one cycle on a 486.
I am just flabbergasted how such a simple question - can I use .if eax==0, or do I have to study the jxx stuff? - can turn into an ideological debate.
there are other bugs in MASM. Don't freak out. A while back Hutch-- was trying different ALIGN values and one of the ALIGNS actually modified the EFLAGS!
I recommend using Jxx
Quote from: Mark_Larson on August 07, 2008, 05:35:34 PM
there are other bugs in MASM. Don't freak out. A while back Hutch-- was trying different ALIGN values and one of the ALIGNS actually modified the EFLAGS!
Bugs are little beasts that make code fail. Translating
.if eax==0
to
or eax, eax
je somewhere
does not make code fail, as the result is identical with the test eax, eax variant. By the way, JWasm translates
.if eax
to
and eax, eax
je short somewhere
The same code under Masm translates to
or eax, eax
je short somewhere
Both variants are obviously "destructive" and therefore a case for the CPU police. Programmers to jail, hooray!
This is getting ridiculous, I'm outta here, this topic, I mean. JJ, do as you please. If you do not intend to listen, you should not ask. The bottleneck for the or instruction is in terms of memory --> register or register --> memory. Yes I did look and that is what I saw. That is why I said in an earlier post that if it is not a good idea at least on one occasion, then it should be avoided as it can become a habit and WILL be used when it DOES matter. I did not say it exactly like that but that is what I meant.
You know, JJ, there are times, when giving advice on this board, I say the wrong thing and when it is rightly pointed out to me, I have the good grace to accept it, wipe the egg off my face and move on. I do not like it and may even grumble; but, I accept the fact that I am wrong. You need to print out these words and post them on your wall. No one is perfect, but hopefully most of us know better than to beat a dead dog.
-- Paul
On my P3, in a contrived test that does not use the result, OR is faster, but in a more realistic test that does use the result there is no measurable difference.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
invoke Sleep, 4000
REPEAT 10
counter_begin 1000,HIGH_PRIORITY_CLASS
REPEAT 10
test eax, eax
test ebx, ebx
test ecx, ecx
test edx, edx
ENDM
counter_end
print ustr$(eax)," cycles, test",13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
REPEAT 10
or eax, eax
or ebx, ebx
or ecx, ecx
or edx, edx
ENDM
counter_end
print ustr$(eax)," cycles, or",13,10
ENDM
print chr$(13,10)
REPEAT 10
counter_begin 1000,HIGH_PRIORITY_CLASS
xor eax, eax
mov ebx, 1
REPEAT 10
test eax, eax
jnz @F
test ebx, ebx
jnz @F
@@:
ENDM
counter_end
print ustr$(eax)," cycles, test",13,10
counter_begin 1000,HIGH_PRIORITY_CLASS
xor eax, eax
mov ebx, 1
REPEAT 10
or eax, eax
jnz @F
or ebx, ebx
jnz @F
@@:
ENDM
counter_end
print ustr$(eax)," cycles, or",13,10
ENDM
print chr$(13,10)
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
22 cycles, test
15 cycles, or
22 cycles, test
15 cycles, or
22 cycles, test
15 cycles, or
22 cycles, test
15 cycles, or
22 cycles, test
15 cycles, or
22 cycles, test
15 cycles, or
22 cycles, test
22 cycles, or
22 cycles, test
15 cycles, or
22 cycles, test
15 cycles, or
22 cycles, test
15 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
21 cycles, test
21 cycles, or
Quote from: MichaelW on August 07, 2008, 08:18:15 PM
there is no measurable difference.
Thanxalot, Michael. This is one half of the story: The Masm developers didn't care because they knew there was no difference.
Quote from: jj2007 on August 07, 2008, 06:40:41 PM
Bugs are little beasts that make code fail. Translating
.if eax==0
to
or eax, eax
jne somewhere
does not make code fail
This is the other half of the story: I posted this some hours ago, and over 20 people including some bots have seen it. Nobody noticed my (involuntary) error. If we push newbies, with folkloristic stories about CPUs that "must" be slower for instructions that do "real" hard destructive work, towards using jxx instead of the high level constructs, then we reinstall the reputation of assembler being a secret code for grim-looking men carrying punchcards with them.
Michael,
None of those tests involve a memory to register, or register to memory, operation, which is where the bottleneck is, as I already said.
Also, Michael, something for you to think about in terms of your macros is the fact that they do not perform any cache loading so a test between instructions where one writes to the cache and another doesn't is not really valid.
And finally, back to the original point of all this: OR 'is' destructive.
-- Paul
when i used TASM where no invoke was available.
if a register returned either TRUE or FALSE for success, i'd use:
dec eax
js @fail
or if it was -1, like after calling CreateFile
inc eax
jz @fail
xchg eax,ecx
jecxz @fail
Michael, would first 2 examples be faster than OR/TEST? ..probably not.
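A minimal sketch of the inc/jz idiom from the post above, written for the MASM32 runtime rather than TASM. The -1 is loaded by hand to stand in for a failed CreateFile call (INVALID_HANDLE_VALUE), and the labels failed and done are illustrative. Note that, unlike TEST, both inc and dec change eax, so the success path has to undo the inc.

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov eax, -1                 ; pretend CreateFile returned INVALID_HANDLE_VALUE
    inc eax                     ; -1 + 1 = 0, so ZF is set on failure
    jz failed
    dec eax                     ; success path: restore the real handle value
    print "handle is valid",13,10
    jmp done
failed:
    print "call failed",13,10
done:
    inkey
    exit
end start
```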
Quote
I also CHEAT and use MMX registers to temporarily save register values.
Correct me if I'm wrong but the MMX registers are the same ones used by the FPU. For those who may not be aware of it already, numerous WinXP functions (notably MessageBox and similar) may trash the FPU/MMX registers (this was not happening with Win98).
So, using those registers to store data may be safe as long as those trashing APIs are not used. However, there is no information as to which ones are at fault. Beware at least of those which may need to perform some calculations (even if only 2+2) such as window sizing of any kind.
:boohoo:
I am still fascinated at the level of waffle surrounding this "debate". The difference between high and low level code is well understood, the relevance of techniques depend on WHERE it is being applied and the criterion of what matters and what does not resides in the eye of the beholder.
RE bashing low level code to death, we have a Laboratory just for doing that as enough assembler programmers are interested in chasing speed where it matters and the deciding factors of how and where the designs are applied are also in the eye of the beholder. Prescribing "preferred" code design based on personal arbitrary criteria is a waste of space, argue about why blondes are more fun than brunettes, whether you are a Republicrat or a Democan, whether you drink tea or coffee for breakfast but spare us stuff that is objectively tested in the LAB and subjectively applied by the beholder.
Spare a thought for the chip designers at Intel and why they bothered to distinguish between an AND and a TEST, what the logical difference is between a weak inclusive "OR", an exclusive or "XOR" and the logic of "NOT", "NEG" and various others. I doubt at the time of the design of the original x86 that it was done to waste chip space, they all have their place, just use what works to do the job.
Quote from: raymond on August 08, 2008, 12:56:10 AM
Quote
I also CHEAT and use MMX registers to temporarily save register values.
Correct me if I'm wrong but the MMX registers are the same ones used by the FPU. For those who may not be aware of it already, numerous WinXP functions (notably MessageBox and similar) may trash the FPU/MMX registers (this was not happening with Win98).
So, using those registers to store data may be safe as long as those trashing APIs are not used. However, there is no information as to which ones are at fault. Beware at least of those which may need to perform some calculations (even if only 2+2) such as window sizing of any kind.
Raymond, this is really interesting. Do you have any reference saying which API's misbehave? I just tried with an ordinary MessageBox, but MMX registers did not change.
(at bottom of ShowQwMMX proc posted above)
int 3
invoke MessageBox, 0, chr$("MMX registers trashed?"), chr$("Test:"), MB_OK
emms ; be nice to others: cleanup
ret
ShowQwMMX endp
Quote from: raymond on August 08, 2008, 12:56:10 AM
Quote
I also CHEAT and use MMX registers to temporarily save register values.
Correct me if I'm wrong but the MMX registers are the same ones used by the FPU. For those who may not be aware of it already, numerous WinXP functions (notably MessageBox and similar) may trash the FPU/MMX registers (this was not happening with Win98).
So, using those registers to store data may be safe as long as those trashing APIs are not used. However, there is no information as to which ones are at fault. Beware at least of those which may need to perform some calculations (even if only 2+2) such as window sizing of any kind.
VC++ starts using mm0 and moves UPWARDS: mm1, then mm2, etc. I start at the back (mm7) and move DOWNWARDS, so it makes it safer :) Since the FP and MMX registers are the same, the same thing applies for how VC++ uses the FP registers.
As JJ is hinting at, it would sure be useful to have such a list, unfortunately I doubt we will find one and looking at it API by API would be a nightmare. It is useful to know that the MessageBox is safe (thank you JJ). If anyone finds any that should be on a warn off list, perhaps generating a list of these APIs in either or both of our SDKs would be very helpful to programmers.
-- Paul
Quote from: PBrennick on August 08, 2008, 01:56:16 PM
It is useful to know that the MessageBox is safe
Paul, I made some quick tests, and observed no changes of MMX registers when emms was used on exit of my proc; however, I saw occasional changes when emms was not used. It is not easy to reproduce, though. Maybe Raymond has hands-on experience with the MessageBox...?
Googling for various mmx trash overwrite etc variants yields nothing; which leads to the suspicion that either very few people use mmx in their codes, or the problem is limited to a handful of exotic API's. We should ask Microsoft directly :wink
Quote from: PBrennick on Today at 09:56:16 AM
It is useful to know that the MessageBox is safe
As far as I'm concerned, MessageBox is NOT safe. I've been caught a few times with algos returning garbage when data was left temporarily in FPU registers while calling MessageBox to alert the user. You can check that out with Ollydebug and watch what is happening with the FPU registers after calling the function.
The MS programmers found a new toy with the MMX registers (at least while developing WinXP, I have no experience with ME nor 2000) and seem to have used them whenever a calculation is required (even if it involves only integers). That's why I would put all APIs displaying a window on the watch list. If you suspect that an API requires some calculation, it should also be put on the watch list.
Quote from: raymond on August 08, 2008, 03:41:58 PM
I've been caught a few times with algos returning garbage when data was left temporarily in FPU registers while calling MessageBox to alert the user. You can check that out with Ollydebug and watch what is happening with the FPU registers after calling the function.
This is what I did, and it surely affected FPU registers. MMX were changed if emms was not used, but in my limited number of tests I could not observe changes to mmx registers when displaying a message box and emms was used on exit proc. As we all know, FPU and MMX registers are physically the same, but emms might trigger something that saves future mmx content. Just guessing, sorry... I would still like to find some hard evidence from Microsoft & Co.
I have seen the effect before with a particular compiler I use from time to time and it has the ugly workaround of resetting the fp stack after every API call because it internally uses fp for integer maths. It is a mistake for an OS to act in an unpredictable manner but in the case of Mark's register usage the solution is simple, don't mix API code with code that uses FP registers. In the examples I have seen from Mark over time he uses this method within algorithms where there is no API code so it's a perfectly reasonable technique in that context.
You hardly need to save ESP into an MMX register for a MessageBoxA call and this is the case with most API calls.
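A minimal sketch of the technique as discussed in this thread, assuming the MASM32 runtime plus the .686 and .mmx directives: a register is parked in mm7 and restored with no API or FPU code in between, and emms is issued before any API call. The choice of esi and the value 12345 are arbitrary.

```asm
include \masm32\include\masm32rt.inc
.686
.mmx

.code
start:
    mov esi, 12345
    movd mm7, esi               ; park esi in an MMX register
    xor esi, esi                ; esi is free here; no API or FPU code in this stretch
    movd esi, mm7               ; restore the saved value
    emms                        ; leave MMX state before calling any API
    print str$(esi),13,10       ; prints 12345
    inkey
    exit
end start
```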
Quote from: hutch-- on August 09, 2008, 01:58:41 AM
It is a mistake for an OS to act in an unpredictable manner...
Oh crud, you mean we need to start safeguarding our applications against windows? :lol
:bg
Whaddya mean "start" ? :bdg
Quote from: hutch-- on August 09, 2008, 03:15:33 PM
Whaddya mean "start" ? :bdg
:cheekygreen:
P.S. Forum is back to normal speed, script kiddie has left..
Quote from: hutch-- on August 09, 2008, 01:58:41 AM
I have seen the effect before with a particular compiler I use from time to time and it has the ugly workaround of resetting the fp stack after every API call because it internally uses fp for integer maths. It is a mistake for an OS to act in an unpredictable manner but in the case of Mark's register usage the solution is simple, don't mix API code with code that uses FP registers. In the examples I have seen from Mark over time he uses this method within algorithms where there is no API code so it's a perfectly reasonable technique in that context.
You hardly need to save ESP into an MMX register for a MessageBoxA call and this is the case with most API calls.
He is correct a mundo :bg And I am tall :bg
Mark
JJ,
Quote
P.S. Forum is back to normal speed, script kiddie has left..
I use a dial-up and have to because of the living arrangement that I have. To make matters worse, it is one of those 'free' type accounts, so when the forum bogs down I usually get tossed more frequently than a frizzbee. I sure wish something could be done about clowns of that nature. They give jerks a bad name.
-- Paul
:bg
Paul,
> I usually get tossed more frequently than a frizzbee.
This would truly be interesting, I have fallen off ladders, motorbikes, pushbikes and landed anywhere from head to toe but I have never done a "frisby" yet, that would be a new experience. :P
Quote from: hutch-- on August 09, 2008, 11:16:25 PM
a "frisby" yet, that would be a new experience. :P
never turned around your computer to see what's wrong ? :lol
If a website is being bogged down by idiots, why in the world would I have to turn my computer around? Or am I missing something, landing on my head a lot may be having an effect.
BTW: we are witnessing 'the evolution of a topic.'
-- Paul
:bg
> never turned around your computer to see what's wrong ?
I was falling off things long before computers.
Besides, back then, it was a slipstick (never liked an abacus). :bg
-- Paul
Quote from: PBrennick on August 10, 2008, 03:21:24 AM
If a website is being bogged down by idiots, why in the world would I have to turn my computer around? Or am I missing something, landing on my head a lot may be having an effect.
maybe because it's the first thing i do ! :lol
NightWare,
This is interesting. Tell me, if you turn your computer around, does the website load faster? I just tried it and for some unfortunate reason, I am not having the same success as you.
-- Paul
Quote from: PBrennick on August 10, 2008, 10:12:31 PM
This is interesting. Tell me, if you turn your computer around, does the website load faster? I just tried it and for some unfortunate reason, I am not having the same success as you.
then you have a better (rtc) connection than mine, recent computer but old cables... :lol
Quote from: Mark_Larson on August 07, 2008, 01:39:19 PM
I also CHEAT and use MMX registers to temporarily save register values. I don't ever do MMX programming. I always use SSE/SSE2. So I can actually save it in the beginning of the program if need be.
You can also use the SSE2 registers the same way. And you don't have to worry about overlapping registers with the FP registers. However it is slower. So it's a balance.
I just checked my core 2 duo optimization book and the delays are the same. I know P4 is one cycle slower for the MOVD using the SSE2 registers.
movd xmm0,esp
;do code here
movd esp,xmm0