The MASM programmer's guide (quite old) states that using
OR eax, eax
jz my_label
is preferable to
CMP eax, 0
jz my_label
It states that OR produces smaller and faster code since it does not use an immediate number as an operand.
OR eax, eax probably has a shorter encoding. But by now I'm guessing that OR and CMP probably take no more than 1 cycle each, and that both can be parallelized equally well within the processor.
Other than the shorter encoding, is there any real benefit to using OR vs CMP to test for zero?
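For reference, here are the byte encodings behind the "shorter encoding" point, written as a small Python table rather than assembler. OR EAX, EAX (0B C0) is taken from the listing below. TEST EAX, EAX (85 C0) is the standard TEST r/m32, r32 form. CMP EAX, 0 is assumed to assemble to the 3-byte 83 F8 ib form, as MASM does for the other small immediates later in this thread. The name ZERO_TESTS is purely illustrative.

```python
# Encodings of three ways to test EAX for zero (32-bit code).
ZERO_TESTS = {
    "or eax, eax":   bytes([0x0B, 0xC0]),        # OR r32, r/m32; ModRM C0 = eax, eax
    "test eax, eax": bytes([0x85, 0xC0]),        # TEST r/m32, r32; ModRM C0 = eax, eax
    "cmp eax, 0":    bytes([0x83, 0xF8, 0x00]),  # CMP r/m32, imm8 (imm8 sign-extended)
}

for text, code in ZERO_TESTS.items():
    print(f"{text:<14} {code.hex(' ').upper():<9} {len(code)} bytes")
```

So OR and TEST save one byte over CMP with a zero immediate. Whether that byte matters for speed is exactly what the rest of the thread argues about.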
Also, why does MASM encode "CMP eax, 4294967295" as "83 F8 FF"?
It doesn't look like it's storing the immediate 32-bit value as a 32-bit value, but as a 16-bit value instead?
00000000 83 F8 FF cmp EAX, 4294967295
00000003 0B C0 or EAX, EAX
I used: "ml /c /Fl test2.asm" to generate the listing.
Yet when I use a different number, it gives me:
00000000 1 3D C45D94DF cmp EAX, 3294467295
Don't forget 'test eax,eax' :bg
Let the war begin!
Quote from: sinsi on July 23, 2010, 05:46:15 AM
Don't forget 'test eax,eax' :bg
Let the war begin!
Yeah :cheekygreen:
And remember: no prisoners - kill them register destructors right away!!
There's also the JECXZ if the register being tested for zero value is ECX :bg
:bg
> There's also the JECXZ if the register being tested for zero value is ECX <<< BANG
Intel preferred is TEST REG, REG.
What's wrong with JECXZ? I use it a lot.
Quote from: hutch-- on July 23, 2010, 08:49:47 AM
:bg
> There's also the JECXZ if the register being tested for zero value is ECX <<< BANG
Intel preferred is TEST REG, REG.
By serendipitous coincidence I just happened across this statement in an old Abrash book. He's talking about the Pentium, which long predates later architectures, but perhaps it gives some insight into why TEST REG, REG is preferred:
"Also, start using TEST reg, reg instead of AND reg, reg or OR reg, reg to test whether a register is zero. The reason is that TEST, unlike AND and OR, never modifies the target register. Although in this particular case AND and OR don't modify the target register either, the Pentium has no way of knowing that ahead of time, so if AND or OR goes through the U-pipe, the Pentium may have to shut down the V-pipe for a cycle to avoid potential dependencies on the result of the AND or OR. TEST suffers from no such potential dependencies."
That's a nice little insight into why TEST reg, reg was the preferred way on the Pentium... and it may very well still be the preferred method on later superscalar architectures for similar reasons.
I'll use TEST reg, reg.
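To make Abrash's point concrete, here is a toy Python model. The function zero_test is hypothetical and models only ZF and register writeback, not timing or pipes. For every input, all four idioms leave ZF in the same state. The only difference is that OR and AND formally write their (unchanged) result back to the register, while TEST and CMP discard it. That formal write is the potential dependency the Pentium could not rule out ahead of time.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF


def zero_test(op: str, value: int) -> Tuple[Optional[int], bool]:
    """Toy model of testing a 32-bit register for zero.

    Returns (value written back to the register, or None if the
    instruction writes no register; ZF after the instruction).
    Only ZF is modelled; the other flags are ignored.
    """
    value &= MASK32
    if op == "or":                        # OR reg, reg
        result = value | value
        return result, result == 0        # result written back (same bits)
    if op == "and":                       # AND reg, reg
        result = value & value
        return result, result == 0        # result written back (same bits)
    if op == "test":                      # TEST reg, reg: an AND with no writeback
        return None, (value & value) == 0
    if op == "cmp0":                      # CMP reg, 0: a SUB with no writeback
        return None, ((value - 0) & MASK32) == 0
    raise ValueError(f"unknown op: {op}")


for v in (0, 1, 0x80000000, MASK32):
    print(f"{v:08X}", [zero_test(op, v) for op in ("or", "and", "test", "cmp0")])
```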
:bg
Chris,
IIIIIItttttttt iiiisssssss ssssssooooooo sssssslllllloooooowwwwwwwwwwwww you can count it with your fingers in comparison.
Quote from: cork on July 23, 2010, 05:08:21 AM
But by now I'm guessing that OR and CMP probably take no more than 1 cycle each...
Just for future reference, the sooner you abandon the obsolete and outdated idea of "instructions taking cycles", the better off you'll be. The time an instruction takes to execute is determined not so much by the instruction itself as by all the other instructions surrounding it.
-r
Quote from: redskull on July 23, 2010, 11:49:34 AM
Quote from: cork on July 23, 2010, 05:08:21 AM
But by now I'm guessing that OR and CMP probably take no more than 1 cycle each...
Just for future reference, the sooner you abandon the obsolete and outdated idea of "instructions taking cycles", the better off you'll be. The time an instruction takes to execute is determined not so much by the instruction itself as by all the other instructions surrounding it.
-r
Right. I've been reading a little bit about how the multiple pipes in a core are fed and instruction pairings. Mostly out of curiosity - I don't know enough yet to make actual use of the information.
Quote from: cork on July 23, 2010, 12:03:30 PM
Right. I've been reading a little bit about how the multiple pipes in a core are fed and instruction pairings. Mostly out of curiosity - I don't know enough yet to make actual use of the information.
I assume you are reading the "Graphics Programming Black Book", which is certainly a tremendous read and one of the best computer books ever written (I still have the paper copy up on a shelf), but beware that even the Pentium material is wildly outdated; new CPUs don't really care about pairing all that much. Just be cautious, and take everything you read with a hefty grain of salt. Mispredictions and dependencies slow down code much, much more than the execution time of any individual instruction. But as long as you have the curiosity, that's all you really need.
-r
Thanks redskull. Yep, that's the book all right - Graphics Programming Black Book.
A fantastic book that I can't praise enough is "Inside the Machine" by Jon Stokes. Two more information-dense sources are "Inner Loops" by Rick Booth and "Pentium Processor Optimization Tools" by Michael L. Schmit. Combined with Abrash's Graphics Programming Black Book and Agner Fog's online PDFs (wow), I'm in over my head - and lovin' it. :dance:
Quote from: cork on July 23, 2010, 05:20:59 AM
Also, why does MASM encode "CMP eax, 4294967295" as "83 F8 FF"?
It doesn't look like it's storing the immediate 32-bit value as a 32-bit value, but as a 16-bit value instead?
00000000 83 F8 FF cmp EAX, 4294967295
It's a sign-extended BYTE, which MASM has optimized. It would encode it as a DWORD if the number didn't fit in a signed 8-bit value, or if it were a relocation target whose final value MASM doesn't know.
Immediates 0-7Fh encode as the bytes 00-7F, and FFFFFF80h-FFFFFFFFh as 80-FF.
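The rule above can be sketched in Python for the register-only case CMP EAX, imm. The function encode_cmp_eax is hypothetical, it does not model relocatable operands, and it returns raw bytes in memory order (as DUMPBIN prints them), not the value-style dword that the MASM listing shows. An immediate gets the short form exactly when it equals the sign extension of its own low byte.

```python
def encode_cmp_eax(imm: int) -> bytes:
    """Encode CMP EAX, imm32, choosing the short form when possible.

    Short form: 83 /7 ib (3 bytes), used when the 32-bit immediate equals
    the sign extension of its low byte (0..7Fh or FFFFFF80h..FFFFFFFFh).
    Long form: 3D id (5 bytes), the EAX-specific CMP opcode.
    """
    imm &= 0xFFFFFFFF                      # -1 and 0FFFFFFFFh are the same bits
    low = imm & 0xFF
    sign_extended = (low | 0xFFFFFF00) if (low & 0x80) else low
    if sign_extended == imm:
        return bytes([0x83, 0xF8, low])    # ModRM F8: mod=11, reg=/7 (CMP), rm=EAX
    return bytes([0x3D]) + imm.to_bytes(4, "little")


for value in (4294967295, 3294467295, 127, 128, -128):
    print(f"cmp eax, {value:<11} -> {encode_cmp_eax(value).hex(' ').upper()}")
```

This reproduces both cases from the original question: 4294967295 (FFFFFFFFh) fits the sign-extended byte FF, while 3294467295 (C45D94DFh) does not and needs the full dword.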
MASM 6.14 .LST
00000032 83 F8 12 cmp eax,012h
00000035 3D 00001234 cmp eax,01234h
0000003A 3D 12345678 cmp eax,012345678h
0000003F 3D 000000FF cmp eax,0FFh
00000044 3D 0000FFFF cmp eax,0FFFFh
00000049 83 F8 FF cmp eax,0FFFFFFFFh
0000004C 83 F8 FF cmp eax,-1
0000004F 83 F8 81 cmp eax,-127
00000052 83 F8 80 cmp eax,-128
00000055 83 F8 80 cmp eax,0FFFFFF80h
00000058 83 F8 7F cmp eax,127
0000005B 83 F8 7F cmp eax,07Fh
0000005E 3D 00000080 cmp eax,128
Be aware that some disassemblers (DUMPBIN 2.xx, 3.xx and 5.xx from MSVC 2.x, 4.x and 5.0, among others) decode these sign-extended forms incorrectly:
00000032: 83 F8 12 cmp eax,12h
00000035: 3D 34 12 00 00 cmp eax,1234h
0000003A: 3D 78 56 34 12 cmp eax,12345678h
0000003F: 3D FF 00 00 00 cmp eax,0FFh
00000044: 3D FF FF 00 00 cmp eax,0FFFFh
00000049: 83 F8 FF cmp eax,0FFh WRONG
0000004C: 83 F8 FF cmp eax,0FFh WRONG
0000004F: 83 F8 81 cmp eax,81h WRONG
00000052: 83 F8 80 cmp eax,80h WRONG
00000055: 83 F8 80 cmp eax,80h WRONG
00000058: 83 F8 7F cmp eax,7Fh
0000005B: 83 F8 7F cmp eax,7Fh
0000005E: 3D 80 00 00 00 cmp eax,80h
Fixed in DUMPBIN 6.xx+
00000032: 83 F8 12 cmp eax,12h
00000035: 3D 34 12 00 00 cmp eax,1234h
0000003A: 3D 78 56 34 12 cmp eax,12345678h
0000003F: 3D FF 00 00 00 cmp eax,0FFh
00000044: 3D FF FF 00 00 cmp eax,0FFFFh
00000049: 83 F8 FF cmp eax,0FFFFFFFFh
0000004C: 83 F8 FF cmp eax,0FFFFFFFFh
0000004F: 83 F8 81 cmp eax,0FFFFFF81h
00000052: 83 F8 80 cmp eax,0FFFFFF80h
00000055: 83 F8 80 cmp eax,0FFFFFF80h
00000058: 83 F8 7F cmp eax,7Fh
0000005B: 83 F8 7F cmp eax,7Fh
0000005E: 3D 80 00 00 00 cmp eax,80h
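The difference between the two DUMPBIN listings comes down to how the imm8 byte is widened back to 32 bits. Here is a minimal Python sketch of both behaviours. decode_imm8 is a hypothetical helper, and sign_extend=False simply reproduces what the older listing prints.

```python
def decode_imm8(byte: int, sign_extend: bool = True) -> int:
    """Widen the imm8 of an 83 /r instruction to the 32-bit operand.

    sign_extend=True matches what the CPU does (and what DUMPBIN 6.xx+ prints);
    sign_extend=False reproduces the output of the older DUMPBINs above.
    """
    byte &= 0xFF
    if sign_extend and byte & 0x80:
        return byte | 0xFFFFFF00           # copy bit 7 into bits 8..31
    return byte


# imm8 bytes taken from the 83 F8 xx lines in the listings above
for b in (0x12, 0xFF, 0x81, 0x80, 0x7F):
    print(f"83 F8 {b:02X}: correct {decode_imm8(b):X}h, "
          f"old DUMPBIN {decode_imm8(b, sign_extend=False):X}h")
```

Bytes 00-7F decode identically either way, which is why only the lines with imm8 bytes of 80-FF are marked WRONG.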
ADD, ADC, SUB and SBB also have 8-bit sign-extended forms.
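These forms share opcode 83h with CMP. The reg field of the ModRM byte (/0 to /7) selects the operation, and that group also contains OR, AND and XOR. The following Python sketch covers register operands only: no memory operands and no prefixes. The function encode_group1_imm8 and its tables are illustrative names, not part of any assembler.

```python
# Opcode 83h group: the ModRM reg field (/digit) picks the operation,
# and the 8-bit immediate is sign-extended to 32 bits.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]   # /0 .. /7
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]   # rm 0 .. 7


def encode_group1_imm8(mnemonic: str, reg: str, imm8: int) -> bytes:
    """Encode `mnemonic reg32, imm8` using opcode 83h (register operand only)."""
    if not -128 <= imm8 <= 127:
        raise ValueError("immediate does not fit in a signed byte")
    digit = GROUP1.index(mnemonic)
    modrm = 0xC0 | (digit << 3) | REG32.index(reg)   # mod=11 selects a register
    return bytes([0x83, modrm, imm8 & 0xFF])


for op in GROUP1:
    print(f"{op} eax, -1 -> {encode_group1_imm8(op, 'eax', -1).hex(' ').upper()}")
```

With CMP (/7) and EAX (rm 0) the ModRM byte is F8, matching the 83 F8 xx lines in the listings.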
Using the more compact opcode forms won't help much with decoding or execution speed, but it will improve code density and cache usage.