
OR eax, eax

Started by cork, July 23, 2010, 05:08:21 AM


cork

The MASM programmer's guide (quite old) states that using
  OR eax, eax
  jz  my_label

is preferable to
  CMP eax, 0
  jz my_label

It states that OR produces smaller and faster code since it does not use an immediate number as an operand.

OR eax, eax probably has a shorter encoding. But by now I'm guessing that OR and CMP probably take no more than 1 cycle each, and that they probably can both be parallelized, equally well, within the processor.

Other than the shorter encoding, is there any real benefit to using OR vs CMP to test for zero?

cork

Also, why does MASM encode "CMP eax, 4294967295" as "83 F8 FF"?
It doesn't look like it's storing the immediate 32-bit value as a 32-bit value, but as a 16-bit value instead?

00000000  83 F8 FF          cmp EAX, 4294967295
00000003  0B C0          or EAX, EAX

I used: "ml /c /Fl test2.asm" to generate the listing.

Yet when I use a different number, it gives me:

00000000   1   3D C45D94DF            cmp EAX, 3294467295

sinsi

Don't forget 'test eax,eax'  :bg

Let the war begin!
Light travels faster than sound, that's why some people seem bright until you hear them.

jj2007

Quote from: sinsi on July 23, 2010, 05:46:15 AM
Don't forget 'test eax,eax'  :bg

Let the war begin!

Yeah :cheekygreen:
And remember: no prisoners - kill them register destructors right away!!

gwapo

There's also the JECXZ if the register being tested for zero value is ECX  :bg

hutch--

 :bg

> There's also the JECXZ if the register being tested for zero value is ECX  <<< BANG

Intel preferred is TEST REG, REG.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

gwapo

What's wrong with JECXZ? I use it a lot.

cork

Quote from: hutch-- on July 23, 2010, 08:49:47 AM
:bg

> There's also the JECXZ if the register being tested for zero value is ECX  <<< BANG

Intel preferred is TEST REG, REG.

By coincidence I just came across this statement in an old Abrash book. He's talking about the Pentium, which long predates current architectures, but perhaps it gives some insight into why TEST REG, REG is preferred:

"Also, start using TEST reg, reg instead of AND reg, reg or OR reg, reg to test whether a register is zero. The reason is that TEST, unlike AND and OR, never modifies the target register. Although in this particular case AND and OR don't modify the target register either, the Pentium has no way of knowing that ahead of time, so if AND or OR goes through the U-pipe, the Pentium may have to shut down the V-pipe for a cycle to avoid potential dependencies on the result of the AND or OR. TEST suffers from no such potential dependencies."

That's a nice little insight into why TEST reg, reg was the preferred way with the Pentium... and it may well still be the preferred method on later superscalar architectures for similar reasons.

I'll use TEST reg, reg.
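
A toy model of the point in the Abrash quote, as a minimal sketch rather than a description of any real CPU: all four idioms produce the same zero flag, but only OR and AND architecturally write the destination register, even though the value written is unchanged. The function name `zero_test` and the op labels are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def zero_test(op: str, reg: int):
    """Toy model (hypothetical helper): return (zf, writes_register)
    for `op reg, reg`, or for `cmp reg, 0` when op == "cmp0"."""
    reg &= MASK32
    if op == "or":
        result, writes = reg | reg, True      # result is stored back into reg
    elif op == "and":
        result, writes = reg & reg, True      # result is stored back into reg
    elif op == "test":
        result, writes = reg & reg, False     # AND for flags only, no write-back
    elif op == "cmp0":
        result, writes = (reg - 0) & MASK32, False  # subtract for flags only
    else:
        raise ValueError(op)
    return result == 0, writes

for op in ("or", "and", "test", "cmp0"):
    print(op, zero_test(op, 0), zero_test(op, 5))
```

The second element of the tuple is what the Pentium pairing logic in the quote cares about: a write to the register is a potential dependency for the next instruction, whether or not the value actually changes.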

hutch--

 :bg

Chris,

IIIIIItttttttt iiiisssssss ssssssooooooo sssssslllllloooooowwwwwwwwwwwww you can count it with your fingers in comparison.

redskull

Quote from: cork on July 23, 2010, 05:08:21 AM
But by now I'm guessing that OR and CMP probably take no more than 1 cycle each...

Just for future reference, the sooner you abandon the obsolete and outdated idea of "instructions taking cycles", the better off you'll be. The time it takes instructions to execute is not so much determined by the instruction itself, but more so by all the other instructions surrounding it.

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

cork

Quote from: redskull on July 23, 2010, 11:49:34 AM
Quote from: cork on July 23, 2010, 05:08:21 AM
But by now I'm guessing that OR and CMP probably take no more than 1 cycle each...

Just for future reference, the sooner you abandon the obsolete and outdated idea of "instructions taking cycles", the better off you'll be. The time it takes instructions to execute is not so much determined by the instruction itself, but more so by all the other instructions surrounding it.

-r

Right. I've been reading a little bit about how the multiple pipes in a core are fed and instruction pairings. Mostly out of curiosity - I don't know enough yet to make actual use of the information.

redskull

Quote from: cork on July 23, 2010, 12:03:30 PM
Right. I've been reading a little bit about how the multiple pipes in a core are fed and instruction pairings. Mostly out of curiosity - I don't know enough yet to make actual use of the information.


I assume you are reading the "graphics black book", which is certainly a tremendous read and one of the best computer books ever written (I still have the paper copy up on a shelf), but beware that even the Pentium material is wildly outdated; new CPUs don't really care about pairing all that much. Just be cautious, and take everything you read with a hefty grain of salt. Mispredictions and dependencies slow down code much, much more than any individual execution time. But as long as you have the curiosity, that's all you really need.

-r

cork

Thanks redskull. Yep, that's the book all right - Graphics Programming Black Book.

A fantastic book that I can't praise enough is "Inside the Machine" by Jon Stokes. Two more information-dense sources are Inner Loops by Rick Booth and Pentium Processor Optimization Tools by Michael L. Schmit. Combined with Abrash's Graphics Programming Black Book and Agner Fog's online PDFs (wow), I'm in over my head - and lovin' it.  :dance:

clive

Quote from: cork on July 23, 2010, 05:20:59 AM
Also, why does MASM encode "CMP eax, 4294967295" as "83 F8 FF"?
It doesn't look like it's storing the immediate 32-bit value as a 32-bit value, but as a 16-bit value instead?

00000000  83 F8 FF          cmp EAX, 4294967295

It's a sign-extended BYTE, an encoding MASM picks as an optimization. It falls back to a DWORD immediate if the number won't fit in a signed 8 bits, or if the operand is a relocation target and MASM doesn't know the final value.

0-7F will encode as 00-7F, FFFFFF80-FFFFFFFF as 80-FF

MASM 6.14 .LST

00000032  83 F8 12                 cmp     eax,012h
00000035  3D 00001234                 cmp     eax,01234h
0000003A  3D 12345678                 cmp     eax,012345678h
0000003F  3D 000000FF                 cmp     eax,0FFh
00000044  3D 0000FFFF                 cmp     eax,0FFFFh
00000049  83 F8 FF                 cmp     eax,0FFFFFFFFh
0000004C  83 F8 FF                 cmp     eax,-1
0000004F  83 F8 81                 cmp     eax,-127
00000052  83 F8 80                 cmp     eax,-128
00000055  83 F8 80                 cmp     eax,0FFFFFF80h
00000058  83 F8 7F                 cmp     eax,127
0000005B  83 F8 7F                 cmp     eax,07Fh
0000005E  3D 00000080                 cmp     eax,128
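
The selection rule visible in the listing above can be sketched in a few lines. This is only an illustration of the rule for constant operands, not MASM's actual implementation, and `encode_cmp_eax_imm` is a hypothetical name. Note that the .LST file prints the DWORD immediate as a value (3D 00001234), while the bytes in memory are little-endian (3D 34 12 00 00), which is what this sketch returns.

```python
import struct

def encode_cmp_eax_imm(value: int) -> bytes:
    """Hypothetical helper: bytes for CMP EAX, imm with a constant operand."""
    value &= 0xFFFFFFFF                       # treat the operand as a 32-bit pattern
    signed = value - 0x100000000 if value & 0x80000000 else value
    if -128 <= signed <= 127:
        # Short form: 83 F8 ib, where ib is sign-extended to 32 bits by the CPU
        return bytes([0x83, 0xF8, signed & 0xFF])
    # Long form: 3D id, with the immediate stored as a little-endian DWORD
    return b"\x3D" + struct.pack("<I", value)

for v in (0x12, 0x1234, 0xFF, 0xFFFFFFFF, -128, 127, 128):
    print(f"{v & 0xFFFFFFFF:08X}", encode_cmp_eax_imm(v).hex(" ").upper())
```

So 0FFh needs the five-byte form (as an 8-bit value it would sign-extend to 0FFFFFFFFh), while 0FFFFFFFFh itself fits in the three-byte form.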


Be aware that some disassemblers (DUMPBIN 2.xx, 3.xx, 5.xx from MSVC 2.x, 4.x & 5.0, and others) will decode these incorrectly:

  00000032: 83 F8 12           cmp         eax,12h
  00000035: 3D 34 12 00 00     cmp         eax,1234h
  0000003A: 3D 78 56 34 12     cmp         eax,12345678h
  0000003F: 3D FF 00 00 00     cmp         eax,0FFh
  00000044: 3D FF FF 00 00     cmp         eax,0FFFFh
  00000049: 83 F8 FF           cmp         eax,0FFh  WRONG
  0000004C: 83 F8 FF           cmp         eax,0FFh WRONG
  0000004F: 83 F8 81           cmp         eax,81h WRONG
  00000052: 83 F8 80           cmp         eax,80h WRONG
  00000055: 83 F8 80           cmp         eax,80h WRONG
  00000058: 83 F8 7F           cmp         eax,7Fh
  0000005B: 83 F8 7F           cmp         eax,7Fh
  0000005E: 3D 80 00 00 00     cmp         eax,80h


Fixed in DUMPBIN 6.xx+

  00000032: 83 F8 12           cmp         eax,12h
  00000035: 3D 34 12 00 00     cmp         eax,1234h
  0000003A: 3D 78 56 34 12     cmp         eax,12345678h
  0000003F: 3D FF 00 00 00     cmp         eax,0FFh
  00000044: 3D FF FF 00 00     cmp         eax,0FFFFh
  00000049: 83 F8 FF           cmp         eax,0FFFFFFFFh
  0000004C: 83 F8 FF           cmp         eax,0FFFFFFFFh
  0000004F: 83 F8 81           cmp         eax,0FFFFFF81h
  00000052: 83 F8 80           cmp         eax,0FFFFFF80h
  00000055: 83 F8 80           cmp         eax,0FFFFFF80h
  00000058: 83 F8 7F           cmp         eax,7Fh
  0000005B: 83 F8 7F           cmp         eax,7Fh
  0000005E: 3D 80 00 00 00     cmp         eax,80h
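
The difference between the two DUMPBIN outputs above is consistent with the older versions zero-extending the 8-bit immediate instead of sign-extending it. That is an inference from the listings, not something confirmed about DUMPBIN's internals. A minimal sketch, with `imm8_operand` as a made-up name:

```python
def imm8_operand(byte: int, sign_extend: bool = True) -> int:
    """Hypothetical helper: the 32-bit operand implied by the imm8 of an 83 /7 ib CMP."""
    byte &= 0xFF
    if sign_extend and byte & 0x80:
        return byte | 0xFFFFFF00    # copy bit 7 into bits 8..31
    return byte                     # zero-extension (assumed old behaviour) or positive value

for b in (0xFF, 0x81, 0x80, 0x7F):
    print(f"{b:02X}: correct {imm8_operand(b):08X}, "
          f"zero-extended {imm8_operand(b, sign_extend=False):08X}")
```

Values 00-7F come out the same either way, which is why only the 80-FF rows are marked WRONG.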


Also ADD, ADC, SUB, SBB have 8-bit sign extended formats.

Using more compact opcode forms won't help much for decoding/execution speed, but will improve code density and caching.
It could be a random act of randomness. Those happen a lot as well.