Title: OR eax, eax
Post by: cork on July 23, 2010, 05:08:21 AM

The MASM programmer's guide (quite old) states that using
OR eax, eax
jz my_label

is preferable to
CMP eax, 0
jz my_label

It states that OR produces smaller and faster code since it does not use an immediate number as an operand.

OR eax, eax probably has a shorter encoding. But by now I'm guessing that OR and CMP probably take no more than 1 cycle each, and that they probably can both be parallelized, equally well, within the processor.

Other than the shorter encoding, is there any real benefit to using OR vs CMP to test for zero?
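
For reference, here is a minimal Python sketch (not from the programmer's guide) that models how three common zero-test idioms set the zero flag and lists their encodings in 32-bit code. The OR bytes match the listing quoted later in this thread; the TEST and CMP bytes are the standard register and imm8 forms. The function names are made up for illustration, and only ZF is modeled, not the other flags.

```python
MASK32 = 0xFFFFFFFF

# Machine-code bytes for each idiom in 32-bit code.
ENCODINGS = {
    "or eax, eax":   bytes([0x0B, 0xC0]),
    "test eax, eax": bytes([0x85, 0xC0]),
    "cmp eax, 0":    bytes([0x83, 0xF8, 0x00]),
}

def zf_after_or(eax):
    """OR eax, eax: writes eax back (same value) and sets ZF from the result."""
    result = (eax | eax) & MASK32
    return result, result == 0

def zf_after_test(eax):
    """TEST eax, eax: computes eax AND eax, sets ZF, discards the result."""
    return eax, ((eax & eax) & MASK32) == 0

def zf_after_cmp_zero(eax):
    """CMP eax, 0: computes eax - 0, sets ZF, discards the result."""
    return eax, ((eax - 0) & MASK32) == 0

for value in (0, 1, 0x80000000, 0xFFFFFFFF):
    outcomes = {f(value) for f in (zf_after_or, zf_after_test, zf_after_cmp_zero)}
    assert len(outcomes) == 1  # same register value and same ZF from all three

for text, code in ENCODINGS.items():
    print(f"{text:<14} {code.hex(' ').upper()}  ({len(code)} bytes)")
```

All three leave EAX holding the same value and agree on ZF, so a following jz behaves identically; the visible difference in this model is one byte of encoding for the CMP form.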

Title: Re: OR eax, eax
Post by: cork on July 23, 2010, 05:20:59 AM

Also, why does MASM encode "CMP eax, 4294967295" as "83 F8 FF"?
It doesn't look like it's storing the immediate 32-bit value as a 32-bit value, but as a 16-bit value instead?

00000000 83 F8 FF        cmp EAX, 4294967295
00000003 0B C0        or EAX, EAX

I used: "ml /c /Fl test2.asm" to generate the listing.

Yet when I use a different number, it gives me:

00000000 1 3D C45D94DF     cmp EAX, 3294467295

Title: Re: OR eax, eax
Post by: sinsi on July 23, 2010, 05:46:15 AM

Don't forget 'test eax,eax' :bg

Let the war begin!

Title: Re: OR eax, eax
Post by: jj2007 on July 23, 2010, 06:01:07 AM

Quote from: sinsi on July 23, 2010, 05:46:15 AM
Don't forget 'test eax,eax' :bg

Let the war begin!

Yeah :cheekygreen:
And remember: no prisoners - kill them register destructors right away!!

Title: Re: OR eax, eax
Post by: gwapo on July 23, 2010, 06:11:51 AM

There's also the JECXZ if the register being tested for zero value is ECX :bg

Title: Re: OR eax, eax
Post by: hutch-- on July 23, 2010, 08:49:47 AM

:bg

> There's also the JECXZ if the register being tested for zero value is ECX <<< BANG

Intel preferred is TEST REG, REG.

Title: Re: OR eax, eax
Post by: gwapo on July 23, 2010, 09:46:38 AM

What's wrong with JECXZ? I use it a lot.

Title: Re: OR eax, eax
Post by: cork on July 23, 2010, 10:33:14 AM

Quote from: hutch-- on July 23, 2010, 08:49:47 AM
:bg

> There's also the JECXZ if the register being tested for zero value is ECX <<< BANG

Intel preferred is TEST REG, REG.

By coincidence I just came across this statement in an old Abrash book. He's talking about the Pentium, which predates the later architectures, but perhaps it gives some insight into why TEST REG, REG is preferred:

"Also, start using TEST reg, reg instead of AND reg, reg or OR reg, reg to test whether a register is zero. The reason is that TEST, unlike AND and OR, never modifies the target register. Although in this particular case AND and OR don't modify the target register either, the Pentium has no way of knowing that ahead of time, so if AND or OR goes through the U-pipe, the Pentium may have to shut down the V-pipe for a cycle to avoid potential dependencies on the result of the AND or OR. TEST suffers from no such potential dependencies."

That's a nice little insight into why TEST reg, reg was the preferred way with the Pentium... and it may well still be the preferred method with later superscalar architectures for similar reasons.

I'll use TEST reg, reg.
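
The dependency argument in the Abrash quote can be illustrated with a toy model. This is a deliberately simplified sketch, not the real Pentium pairing rules: it ignores the flags register and every other pairing restriction, and the names `can_pair` and `registers_written` are hypothetical.

```python
# Mnemonics that only set flags and leave their destination operand alone.
FLAGS_ONLY = {"test", "cmp"}

def registers_written(mnemonic, dst):
    """Registers an instruction is assumed to write, judged by mnemonic alone."""
    return set() if mnemonic in FLAGS_ONLY else {dst}

def can_pair(first, second):
    """Toy check: the second instruction may not read or write a register
    that the first one writes. Each instruction is (mnemonic, dst, src)."""
    written = registers_written(first[0], first[1])
    touched = {second[1], second[2]}
    return not (written & touched)

follower = ("mov", "ebx", "eax")  # reads eax right after the zero test

assert can_pair(("test", "eax", "eax"), follower)      # TEST never writes eax
assert not can_pair(("or", "eax", "eax"), follower)    # OR formally writes eax
assert not can_pair(("and", "eax", "eax"), follower)   # so does AND
```

The check looks only at the mnemonic, never at the operand values, which is the point of the quote: OR eax, eax leaves EAX unchanged in practice, but the hardware still has to treat it as a write.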

Title: Re: OR eax, eax
Post by: hutch-- on July 23, 2010, 10:56:18 AM

:bg

Chris,

IIIIIItttttttt iiiisssssss ssssssooooooo sssssslllllloooooowwwwwwwwwwwww you can count it with your fingers in comparison.

Title: Re: OR eax, eax
Post by: redskull on July 23, 2010, 11:49:34 AM

Quote from: cork on July 23, 2010, 05:08:21 AM
But by now I'm guessing that OR and CMP probably take no more than 1 cycle each...

Just for future reference, the sooner you abandon the obsolete and outdated idea of "instructions taking cycles", the better off you'll be. The time it takes instructions to execute is not so much determined by the instruction itself, but more so by all the other instructions surrounding it.

-r

Title: Re: OR eax, eax
Post by: cork on July 23, 2010, 12:03:30 PM

Quote from: redskull on July 23, 2010, 11:49:34 AM
Quote from: cork on July 23, 2010, 05:08:21 AM
But by now I'm guessing that OR and CMP probably take no more than 1 cycle each...

Just for future reference, the sooner you abandon the obsolete and outdated idea of "instructions taking cycles", the better off you'll be. The time it takes instructions to execute is not so much determined by the instruction itself, but more so by all the other instructions surrounding it.

-r

Right. I've been reading a little bit about how the multiple pipes in a core are fed and instruction pairings. Mostly out of curiosity - I don't know enough yet to make actual use of the information.

Title: Re: OR eax, eax
Post by: redskull on July 23, 2010, 12:38:57 PM

Quote from: cork on July 23, 2010, 12:03:30 PM
Right. I've been reading a little bit about how the multiple pipes in a core are fed and instruction pairings. Mostly out of curiosity - I don't know enough yet to make actual use of the information.

I assume you are reading the "graphics black book", which is certainly a tremendous read and one of the best computer books ever written (I still have the paper copy up on a shelf), but beware that even the Pentium material is wildly outdated; new CPUs don't really care about pairing all that much. Just be cautious, and take everything you read with a hefty grain of salt. Mispredictions and dependencies slow down code much, much more than any individual instruction's execution time. But as long as you have the curiosity, that's all you really need.

-r

Title: Re: OR eax, eax
Post by: cork on July 23, 2010, 01:27:37 PM

Thanks redskull. Yep, that's the book all right - Graphics Programming Black Book.

A fantastic book that I can't praise enough is "Inside the Machine" by Jon Stokes. Two more information-dense sources are "Inner Loops" by Rick Booth and "Pentium Processor Optimization Tools" by Michael L. Schmit. Combined with Abrash's Graphics Programming Black Book and Agner Fog's online PDFs (wow), I'm in over my head - and lovin' it. :dance:

Title: Re: OR eax, eax
Post by: clive on July 23, 2010, 02:28:24 PM

Quote from: cork on July 23, 2010, 05:20:59 AM
Also, why does MASM encode "CMP eax, 4294967295" as "83 F8 FF"?
It doesn't look like it's storing the immediate 32-bit value as a 32-bit value, but as a 16-bit value instead?

00000000 83 F8 FF cmp EAX, 4294967295

It's a sign-extended BYTE, an optimization MASM applies automatically. It would encode the immediate as a DWORD if the number didn't fit in a signed 8 bits, or if it were a relocation target whose final value MASM doesn't know.

0-7F will encode as 00-7F, FFFFFF80-FFFFFFFF as 80-FF

MASM 6.14 .LST

 00000032  83 F8 12		                cmp     eax,012h
 00000035  3D 00001234		                cmp     eax,01234h
 0000003A  3D 12345678		                cmp     eax,012345678h
 0000003F  3D 000000FF		                cmp     eax,0FFh
 00000044  3D 0000FFFF		                cmp     eax,0FFFFh
 00000049  83 F8 FF		                cmp     eax,0FFFFFFFFh
 0000004C  83 F8 FF		                cmp     eax,-1
 0000004F  83 F8 81		                cmp     eax,-127
 00000052  83 F8 80		                cmp     eax,-128
 00000055  83 F8 80		                cmp     eax,0FFFFFF80h
 00000058  83 F8 7F		                cmp     eax,127
 0000005B  83 F8 7F		                cmp     eax,07Fh
 0000005E  3D 00000080		                cmp     eax,128
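
The selection rule behind the listing above can be sketched in a few lines of Python. This is an illustrative model and not MASM's actual implementation: the function name `encode_cmp_eax_imm` is hypothetical, and the relocation case (where the assembler must fall back to a DWORD) is not modeled.

```python
import struct

def encode_cmp_eax_imm(value):
    """Encode CMP EAX, imm for 32-bit code, choosing the shortest form."""
    value &= 0xFFFFFFFF  # treat -1 and 0FFFFFFFFh as the same 32-bit pattern
    signed = value - (1 << 32) if value & 0x80000000 else value
    if -128 <= signed <= 127:
        # 83 F8 ib: the CPU sign-extends the single immediate byte to 32 bits
        return bytes([0x83, 0xF8, signed & 0xFF])
    # 3D id: EAX-specific form with a full 32-bit little-endian immediate
    return bytes([0x3D]) + struct.pack("<I", value)

for operand in (0x12, 0x1234, 0xFF, 0xFFFFFFFF, -1, -128, 127, 128):
    print(f"cmp eax,{operand:<12} {encode_cmp_eax_imm(operand).hex(' ').upper()}")
```

Note that the .LST file prints a DWORD immediate as one value (3D 00001234), while the bytes in memory are little-endian (3D 34 12 00 00), which is what the sketch and the DUMPBIN output show.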

Be aware that some disassemblers (DUMPBIN 2.xx, 3.xx, 5.xx from MSVC 2.x, 4.x & 5.0, and others) decode these incorrectly, showing the raw immediate byte without sign-extending it:

  00000032: 83 F8 12           cmp         eax,12h
  00000035: 3D 34 12 00 00     cmp         eax,1234h
  0000003A: 3D 78 56 34 12     cmp         eax,12345678h
  0000003F: 3D FF 00 00 00     cmp         eax,0FFh
  00000044: 3D FF FF 00 00     cmp         eax,0FFFFh
  00000049: 83 F8 FF           cmp         eax,0FFh   WRONG
  0000004C: 83 F8 FF           cmp         eax,0FFh   WRONG
  0000004F: 83 F8 81           cmp         eax,81h    WRONG
  00000052: 83 F8 80           cmp         eax,80h    WRONG
  00000055: 83 F8 80           cmp         eax,80h    WRONG
  00000058: 83 F8 7F           cmp         eax,7Fh
  0000005B: 83 F8 7F           cmp         eax,7Fh
  0000005E: 3D 80 00 00 00     cmp         eax,80h

Fixed in DUMPBIN 6.xx+

  00000032: 83 F8 12           cmp         eax,12h
  00000035: 3D 34 12 00 00     cmp         eax,1234h
  0000003A: 3D 78 56 34 12     cmp         eax,12345678h
  0000003F: 3D FF 00 00 00     cmp         eax,0FFh
  00000044: 3D FF FF 00 00     cmp         eax,0FFFFh
  00000049: 83 F8 FF           cmp         eax,0FFFFFFFFh
  0000004C: 83 F8 FF           cmp         eax,0FFFFFFFFh
  0000004F: 83 F8 81           cmp         eax,0FFFFFF81h
  00000052: 83 F8 80           cmp         eax,0FFFFFF80h
  00000055: 83 F8 80           cmp         eax,0FFFFFF80h
  00000058: 83 F8 7F           cmp         eax,7Fh
  0000005B: 83 F8 7F           cmp         eax,7Fh
  0000005E: 3D 80 00 00 00     cmp         eax,80h
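
The difference between the two DUMPBIN outputs comes down to whether the immediate byte is sign-extended when decoded. A minimal Python sketch of both behaviors follows; the helper names are hypothetical, and the assumption that the older tools simply zero-extended the byte is inferred from their output, not from their source.

```python
def decode_cmp_imm8(code, sign_extend=True):
    """Return the 32-bit immediate of an '83 F8 ib' CMP EAX, imm8 instruction."""
    if len(code) != 3 or code[0] != 0x83 or code[1] != 0xF8:
        raise ValueError("not a CMP EAX, imm8 encoding")
    imm8 = code[2]
    if sign_extend and imm8 & 0x80:
        return imm8 | 0xFFFFFF00  # copy bit 7 into bits 8..31
    return imm8  # zero-extended: matches the output marked WRONG above

def masm_hex(value):
    """Format a value MASM-style: hex digits, 'h' suffix, leading 0 if needed."""
    text = f"{value:X}h"
    return text if text[0].isdigit() else "0" + text

code = bytes.fromhex("83F881")
print("correct:", masm_hex(decode_cmp_imm8(code)))
print("buggy:  ", masm_hex(decode_cmp_imm8(code, sign_extend=False)))
```

Bytes from 00 to 7F decode the same either way, which is why only the rows with the top bit set differ between the two listings.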

Also ADD, ADC, SUB, SBB have 8-bit sign-extended formats.

Using more compact opcode forms won't help much for decoding/execution speed, but will improve code density and caching.

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: cork on July 23, 2010, 05:08:21 AM