Discussion on instructions

PBrennick · August 12, 2008, 12:13:17 AM

Up here in New England, we have been decimated by rainfall this year. Is this happening in other parts of the world? Not much of a summer, just a handful of days in the 90s and the nights are quite chilly!

-- Paul

jdoe · August 12, 2008, 01:16:06 AM

Quote from: PBrennick on August 12, 2008, 12:13:17 AM
Up here in New England, we have been decimated by rainfall this year. Is this happening in other parts of the world? Not much of a summer, just a handful of days in the 90s and the nights are quite chilly!

It's the same here in New France (just a few miles from you I guess). Rain, rain and rain.
In the holiday season we are a little egocentric and we don't think about the positive effects of rain :green2

GregL · August 12, 2008, 01:56:41 AM

Here in the northwest US, our spring was very rainy, overcast and cool, much more so than normal. The garden got off to a really slow start. But now our summer has been sunny and dry for the most part, but not excessively warm.

japheth · August 12, 2008, 11:39:21 AM

Quote from: hutch-- on August 11, 2008, 01:56:55 PM
The common notion of "destructive" is one where a register is WRITTEN TO.

Code:
mov eax, eax

The register remains the same but it has had itself written back to itself.

No. If this were true, then the term "non-destructive write" would be nonsense. However, it isn't. A destructive write is a write which changes content. IMO that's a matter of course.

If "destructive" refers to cache content being modified, then this has to be said explicitly; it's surely not the "common notion".

sinsi · August 12, 2008, 12:01:12 PM

I seem to remember reading somewhere that a simple "or eax,eax" to test for eax=0 was a bad idea and to use "test eax,eax" (Agner maybe?), because of the stall.
I remember because typing "or" was easier than typing "test", but my habit now is to use "test".

Off topic, but I use "sub eax,eax" rather than "xor eax,eax" because it sounds better in my brain when I say it...

hutch-- · August 12, 2008, 12:55:27 PM

japheth,

> No. If this were true, then the term "non-destructive write" would be nonsense.

The term "non destructive write" IS nonsense. A write is a write is a write. Code like,

Code:
mov eax, eax

differs from code like,

Code:
mov eax, ecx

only in what is written to EAX.

Mark Jones · August 12, 2008, 02:30:20 PM

I agree with Sinsi here. Using an Athlon box, if I take "de-facto standard code" which uses instructions like ADD REG,1 and CMP REG,0 and replace those with INC REG and TEST REG,REG then the code generally runs much faster.

Of course there are inherent differences in the various processors and now this code will run like crap on an Intel CPU... ::)

Therefore, we must all use SSE5 to test for zero instead! :bdg :lol

jj2007 · August 12, 2008, 02:54:40 PM

Quote from: sinsi on August 12, 2008, 12:01:12 PM
I seem to remember reading somewhere that a simple "or eax,eax" to test for eax=0 was a bad idea and to use "test eax,eax" (Agner maybe?), because of the stall.

Agner, microarchitecture, page 45:
If the SUB ECX,EAX instruction in the first triplet is changed to CMP ECX,EAX then ECX is not written to, and we will get a stall

Hmmm... so we must use a destructive instruction to avoid a stall?

I don't want to appear stupid, but I am still waiting for an explanation, better: a URL, explaining which cache is affected by or eax, eax
I have tried Google with a highly specific destructive "test eax" "or eax" search, but it yields only 15 hits, most of them irrelevant; and this thread here is on top (yeah, they are fast at Google :U).

EDIT: Slowly going mad. I want to understand these things... I even downloaded the Intel® Architecture Optimization Reference Manual. 322 pages, 330 times "cache", 0 times "destructive".

MichaelW · August 12, 2008, 03:50:53 PM

The only relevant thing I could find in Agner Fog's recent optimization manuals was the statement that TEST can cause partial flags stalls when followed by LAHF or PUSHF(D), where under the same conditions AND and OR will not, but I think this is not really worth considering because of the unusual conditions.

From the IA-32 Intel Architecture Optimization Reference Manual (24896611.pdf):

Quote
Use test when comparing a value in a register with zero. Test essentially ands the operands together without writing to a destination register. Test is preferred over and because and produces an extra result register. Test is better than cmp ..., 0 because the instruction size is smaller.

Assembly/Compiler Coding Rule 50. (ML impact, M generality)
Use the test instruction instead of and when the result of the logical and is not used. This saves uops in execution. Use a test if a register with itself instead of a cmp of the register to zero, this saves the need to encode the zero and saves encoding space. Avoid comparing a constant to a memory operand. It is preferable to load the memory operand and compare the constant to a register.

I think the reasonable approach would be to assume that under normal conditions TEST may be faster. I tend to use TEST, same as I tend to use XOR reg, reg instead of MOV reg, 0, even though AFAIK the former had a speed advantage only on the 8086/88.
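
For reference, a rough sketch of the instructions being compared in this thread, with their usual 32-bit encodings. The byte values are the standard IA-32 forms; an assembler is free to pick the alternate opcode for the reg,reg cases, so treat the exact bytes as an assumption about your tool. Nothing here says anything about timing, only about size and about what gets written.

```asm
; Zero tests: all four set ZF = 1 exactly when EAX is zero
test eax, eax      ; 85 C0           2 bytes, EAX is not written
or   eax, eax      ; 0B C0           2 bytes, EAX is written back with the same value
and  eax, eax      ; 23 C0           2 bytes, EAX is written back with the same value
cmp  eax, 0        ; 83 F8 00        3 bytes, EAX is not written

; Zeroing a register
xor  eax, eax      ; 33 C0           2 bytes, flags are modified
sub  eax, eax      ; 2B C0           2 bytes, flags are modified
mov  eax, 0        ; B8 00 00 00 00  5 bytes, flags are left untouched
```

The size difference between TEST reg, reg and CMP reg, 0 is the extra immediate byte the quoted rule refers to.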

jj2007 · August 12, 2008, 04:03:15 PM

Quote from: MichaelW on August 12, 2008, 03:50:53 PM
The only relevant thing I could find in Agner Fog's recent optimization manuals was the statement that TEST can cause partial flags stalls when followed by LAHF or PUSHF(D), where under the same conditions AND and OR will not, but I think this is not really worth considering because of the unusual conditions.

Agreed.

Quote
From the IA-32 Intel Architecture Optimization Reference Manual (24896611.pdf):
Quote
Use test when comparing a value in a register with zero. Test essentially ands the operands together without writing to a destination register. Test is preferred over and because and produces an extra result register.

Which may indeed affect speed, although it's probably not the original physical register, due to register renaming. The accent is on "might", because setting flags is also a write operation, and who knows whether the two writes can be performed in parallel or not...

Quote
I think the reasonable approach would be to assume that under normal conditions TEST may be faster. I tend to use TEST, same as I tend to use XOR reg, reg instead of MOV reg, 0, even though AFAIK the former had a speed advantage only on the 8086/88.

The whole point is, and was, whether this hypothetical and as yet unproven gain is important enough to incite newbies to fumble with jxx instead of using .if eax==0... oh well. By the way, I also use test if for some reason I don't want the high level syntax.
Thanks for a sober post, Michael.
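
To make the two styles concrete, here is a minimal sketch of the same check written both ways. The procedure names, the label and the body (setting ECX to 1) are hypothetical, chosen only for illustration. Which instruction MASM actually emits for the .if comparison is best checked in a listing file rather than taken from this sketch.

```asm
.386
.model flat
.code

; High-level form: ECX = 1 if EAX is zero, else ECX = 0
IsZeroHL proc
    xor  ecx, ecx
    .if eax == 0
        inc ecx
    .endif
    ret
IsZeroHL endp

; Hand-written form: same result with an explicit flag test and jump
IsZeroJxx proc
    xor  ecx, ecx
    test eax, eax        ; ZF = 1 when EAX is zero
    jnz  not_zero        ; skip the body when EAX is nonzero
    inc  ecx
not_zero:
    ret
IsZeroJxx endp

end
```

Assembling with a listing (for example ml /c /Fl) shows the expansion of the .if block next to the hand-written version.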

Rockoon · August 26, 2008, 01:36:05 PM

Any instruction sequence with more uops than another "functionally equivalent" sequence could very well affect performance on Intel CPUs with a Trace Cache <-- a cache

We sometimes forget that we aren't programming for 386s anymore.

Why is there an argument about this?

An extra operation, even if it does nothing functional, is still an extra operation. That extra operation may not carry a performance penalty (today), but it's a silly argument to run headlong into your code with the idea that extra operations are "OK" if they don't affect performance.

Given two methods that perform equivalently but one of them actually does less work, there is no debate at all about which one is superior. Really. No debate at all.

Choosing the extra operation in equivalent performance situations is just being stubborn, and promoting it would be irresponsible.

As with all performance-related questions, first you should determine which methods are most performant in real world scenarios (which doesn't mean some tight loop around code fragments). Then you break ties using logic and common sense, considering relevant factors.

Relevant factors may include resource usage, but may also include other existing and future architectures (legacy & longevity) or even power usage. You might shrug off power usage for your particular application, but others may not be so comfortable in doing so. Maybe their code has a target of a trillion hours of execution time spread over a hundred million customers. That small per-iteration power saving really starts to look like something at these scales. What's relevant is very situational.

Quote from: jj2007 on August 12, 2008, 04:03:15 PM
The whole point is, and was, whether this hypothetical and as yet unproven gain is important enough to incite newbies to fumble with jxx instead of using .if eax==0... oh well. By the way, I also use test if for some reason I don't want the high level syntax.
Thanks for a sober post, Michael.

Fumble with the jxx instructions?

vs what? Thinking that .if is a real instruction?

Newbies, right?

Should newbies really be using macros that emit more than 1 instruction, ever?
Should newbies be shielded from the details of the flags register?

The flags register should be thrust upon newbies, because nearly every instruction executed within a program writes to it, and the best tweak-style optimizations leverage that fact.
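
A small sketch of the kind of flag reuse meant here, assuming a simple countdown loop (the labels and the count of 10 are made up for the example). The second loop drops the compare because SUB has already set ZF. No timing claim is attached to either version.

```asm
; Version with a redundant compare
    mov  ecx, 10
loop_a:
    ; ... loop body ...
    sub  ecx, 1
    cmp  ecx, 0          ; redundant: SUB already set ZF for this result
    jne  loop_a

; Version that branches on the flags SUB produced
    mov  ecx, 10
loop_b:
    ; ... loop body ...
    sub  ecx, 1          ; ZF = 1 when ECX reaches zero
    jnz  loop_b
```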

Mark_Larson · August 26, 2008, 04:34:49 PM

Well said, Rockoon.

Kram

jj2007 · August 26, 2008, 05:19:39 PM

Quote from: Rockoon on August 26, 2008, 01:36:05 PM
Why is there an argument about this?

Because "your" side produces no evidence (lab tests, timings), just hearsay.

Quote
You might shrug off power usage for your particular application

I never would. Show me how many % of electricity you can save by poking your code directly into memory, instead of fumbling with hl constructs.

Quote
Fumble with the jxx instructions?
vs what? Thinking that .if is a real instruction?

Microsoft programmers thought 10 years ago that .if was a real Masm instruction. It still works in Masm 9.0

It is always a pleasure to see your blood pressure rising, Rockoon :green

MichaelW · August 26, 2008, 05:31:41 PM

MASM 6.0 was released ~1991.

jj2007 · August 26, 2008, 05:55:13 PM

Quote from: MichaelW on August 26, 2008, 05:31:41 PM
MASM 6.0 was released ~1991.

So we can consider .if "evil legacy code" :toothy
