Discussion on instructions

Started by jj2007, August 10, 2008, 02:01:21 PM

PBrennick

Up here in New England, we have been decimated by rainfall this year. Is this happening in other parts of the world? Not much of a summer, just a handful of days in the 90s and the nights are quite chilly!

-- Paul
The GeneSys Project is available from:
The Repository or My crappy website

jdoe

Quote from: PBrennick on August 12, 2008, 12:13:17 AM
Up here in New England, we have been decimated by rainfall this year. Is this happening in other parts of the world? Not much of a summer, just a handful of days in the 90s and the nights are quite chilly!

It's the same here in New France (just a few miles from you I guess). Rain, rain and rain.
In the holiday season we are a little egocentric and we don't care about the positive effects of rain   :green2


GregL

Here in the northwest US, our spring was very rainy, overcast and cool, much more so than normal. The garden got off to a really slow start. But now our summer has been sunny and dry for the most part, but not excessively warm.


japheth

Quote from: hutch-- on August 11, 2008, 01:56:55 PM
The common notion of "destructive" is one where a register is WRITTEN TO.


mov eax, eax


The register remains the same but it has had itself written back to itself.

No. If this were true, then the term "non-destructive write" would be nonsense. However, it isn't. A destructive write is a write which changes content. IMO that's a matter of course.

If "destructive" refers to cache content being modified, then this has to be said explicitly; it's surely not the "common notion".

sinsi

I seem to remember reading somewhere that a simple "or eax,eax" to test for eax=0 was a bad idea and to use "test eax,eax" (Agner maybe?), because of the stall.
I remember because typing "or" was easier than typing "test", but my habit now is to use "test".

Off topic, but I use "sub eax,eax" rather than "xor eax,eax" because it sounds better in my brain when I say it...
Light travels faster than sound, that's why some people seem bright until you hear them.
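
For reference, a minimal sketch of the register-zeroing idioms mentioned in this post, next to the plain mov form. The byte counts in the comments are the usual 32-bit encodings; the program skeleton and the constant are only there to make the fragment assemble and are not taken from anyone's code in this thread.

```asm
.686
.model flat, stdcall
option casemap:none

.code
start:
    mov eax, 12345678h  ; put something non-zero in eax first

    xor eax, eax        ; 2 bytes, eax = 0, flags written (ZF = 1, CF = 0)
    sub eax, eax        ; 2 bytes, eax = 0, flags written (ZF = 1, CF = 0)
    mov eax, 0          ; 5 bytes, eax = 0, flags left untouched

    ret
end start
```

All three leave eax at zero. The xor and sub forms are shorter but overwrite the flags, which matters only if a later instruction still depends on flags set earlier.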

hutch--

japheth,

> No. If this were true, then the term "non-destructive write" would be nonsense.

The term "non destructive write" IS nonsense. A write is a write is a write. Code like,


mov eax, eax


differs from code like,


mov eax, ecx


only in what is written to EAX.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Mark Jones

I agree with Sinsi here. Using an Athlon box, if I take "de-facto standard code" which uses instructions like  ADD REG,1  and  CMP REG,0  and replace those with  INC REG  and  TEST REG,REG  then the code generally runs much faster.

Of course there are inherent differences in the various processors and now this code will run like crap on an Intel CPU... ::)

Therefore, we must all use SSE5 to test for zero instead! :bdg :lol
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

jj2007

Quote from: sinsi on August 12, 2008, 12:01:12 PM
I seem to remember reading somewhere that a simple "or eax,eax" to test for eax=0 was a bad idea and to use "test eax,eax" (Agner maybe?), because of the stall.

Agner, microarchitecture, page 45:
If the SUB ECX,EAX instruction in the first triplet is changed to CMP ECX,EAX then ECX is not written to, and we will get a stall

Hmmm... so we must use a destructive instruction to avoid a stall?

I don't want to appear stupid, but I am still waiting for an explanation, better: a URL, explaining which cache is affected by or eax, eax
I have tried Google with a highly specific destructive "test eax" "or eax" search, but it yields only 15 hits, most of them irrelevant; and this thread here is on top (yeah, they are fast at Google :U).

EDIT: Slowly going mad. I want to understand these things... I even downloaded the Intel® Architecture Optimization Reference Manual. 322 pages, 330 occurrences of "cache", 0 occurrences of "destructive".

MichaelW

The only relevant thing I could find in Agner Fog's recent optimization manuals was the statement that TEST can cause partial flags stalls when followed by LAHF or PUSHF(D), where under the same conditions AND and OR will not, but I think this is not really worth considering because of the unusual conditions.

From the IA-32 Intel Architecture Optimization Reference Manual (24896611.pdf):
Quote
Use test when comparing a value in a register with zero. Test essentially ands the operands together without writing to a destination register. Test is preferred over and because and produces an extra result register. Test is better than cmp ..., 0 because the instruction size is smaller.

Assembly/Compiler Coding Rule 50. (ML impact, M generality)
Use the test instruction instead of and when the result of the logical and is not used. This saves uops in execution. Use a test if a register with itself instead of a cmp of the register to zero, this saves the need to encode the zero and saves encoding space. Avoid comparing a constant to a memory operand. It is preferable to load the memory operand and compare the constant to a register.

I think the reasonable approach would be to assume that under normal conditions TEST may be faster. I tend to use TEST, same as I tend to use XOR reg, reg instead of MOV reg, 0, even though AFAIK the former had a speed advantage only on the 8086/88.
eschew obfuscation
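
The alternatives compared in the manual excerpt above can be sketched as follows. This is an illustration only: the label name is made up, the byte counts are the usual 32-bit encodings, and nothing here measures speed.

```asm
.686
.model flat, stdcall
option casemap:none

.code
start:
    xor  eax, eax       ; eax = 0 so every check below finds zero

    test eax, eax       ; 2 bytes: ANDs eax with itself, writes flags only
    jnz  not_zero

    or   eax, eax       ; 2 bytes: same flags, but eax is also a destination
    jnz  not_zero

    cmp  eax, 0         ; 3 bytes: subtracts 0, writes flags only
    jnz  not_zero

    ; all three checks agree that eax is zero, so execution falls through here
not_zero:
    ret
end start
```

For the zero check itself the three forms set ZF the same way; they differ in encoding size (cmp needs the immediate) and in whether the register is also written (or does, test and cmp do not).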

jj2007

Quote from: MichaelW on August 12, 2008, 03:50:53 PM
The only relevant thing I could find in Agner Fog's recent optimization manuals was the statement that TEST can cause partial flags stalls when followed by LAHF or PUSHF(D), where under the same conditions AND and OR will not, but I think this is not really worth considering because of the unusual conditions.
Agreed.
Quote
From the IA-32 Intel Architecture Optimization Reference Manual (24896611.pdf):
Quote
Use test when comparing a value in a register with zero. Test essentially ands the operands together without writing to a destination register. Test is preferred over and because and produces an extra result register.
Which may indeed affect speed, although it's probably not the original physical register, due to register renaming. The accent is on "might", because setting flags is also a write operation, and who knows whether the two writes can be performed in parallel or not...

Quote
I think the reasonable approach would be to assume that under normal conditions TEST may be faster. I tend to use TEST, same as I tend to use XOR reg, reg instead of MOV reg, 0, even though AFAIK the former had a speed advantage only on the 8086/88.
The whole point is and was whether this potential, hypothetical and as yet unproven gain is important enough to incite newbies to fumble with jxx instead of using .if eax==0... oh well. By the way, I also use test if for some reason I don't want the high level syntax.
Thanks for a sober post, Michael.

Rockoon

Any instruction sequence with more uops than another "functionally equivalent" sequence could very well affect performance on Intel CPUs with a Trace Cache  <-- a cache

We sometimes forget that we aren't programming for 386s anymore.

Why is there an argument about this?

An extra operation, even if it does nothing functional, is still an extra operation. That extra operation may not carry with it a performance penalty (today), but it's a silly argument to run headlong into your code with the idea that extra operations are "OK" if they don't affect performance.

Given two methods that perform equivalently but one of them actually does less work, there is no debate at all about which one is superior. Really. No debate at all.

Choosing the extra operation in equivalent performance situations is just being stubborn, and promoting it would be irresponsible.

As with all performance-related questions, first you should determine which methods are most performant in real world scenarios (which doesn't mean some tight loop around code fragments.) Then you break ties using logic and common sense, considering relevant factors.

Relevant factors may include resource usage, but may also include other existing and future architectures (legacy & longevity) or even power usage. You might shrug off power usage for your particular application, but others may not be so comfortable in doing so. Maybe their code has a target of a trillion hours of execution time spread over a hundred million customers. That small per-iteration power saving really starts to look like something at these scales. What's relevant is very situational.

Quote from: jj2007 on August 12, 2008, 04:03:15 PM
The whole point is and was whether this potential hypothetical but yet unproven gain is important enough to incite newbies to fumble with jxx instead of using .if eax==0... oh well. By the way, I also use test if for some reason I don't want the high level syntax.
Thanks for a sober post, Michael.

Fumble with the jxx instructions?

vs what? Thinking that .if is a real instruction?

Newbies, right?

Should newbies really be using macros that emit more than 1 instruction, ever?
Should newbies be shielded from the details of the flags register?

The flags register should be thrust upon newbies, because nearly every instruction executed within a program writes to it, and the best tweak-style optimizations leverage that fact.
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.
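
A small sketch of what leveraging the flags can look like in practice: the loop below branches on the flags left by the arithmetic instruction itself, so no separate compare is needed. The label name and the loop count are arbitrary choices for illustration.

```asm
.686
.model flat, stdcall
option casemap:none

.code
start:
    mov  ecx, 10        ; loop counter
    xor  eax, eax       ; running sum
sum_loop:
    add  eax, ecx
    sub  ecx, 1         ; sets ZF when ecx reaches 0
    jnz  sum_loop       ; reuses those flags, no separate cmp ecx, 0
    ; eax = 55 here (10 + 9 + ... + 1)
    ret
end start
```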

Mark_Larson

BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
http://www.website.masmforum.com/mark/index.htm

jj2007

Quote from: Rockoon on August 26, 2008, 01:36:05 PM
Why is there an argument about this?
Because "your" side produces no evidence (lab tests, timings), just hearsay.

Quote
You might shrug off power usage for your particular application
I never would. Show me what percentage of electricity you can save by poking your code directly into memory, instead of fumbling with high-level constructs.

Quote
Fumble with the jxx instructions?
vs what? Thinking that .if is a real instruction?

Microsoft programmers thought 10 years ago that .if was a real Masm instruction. It still works in Masm 9.0

It is always a pleasure to see your blood pressure rising, Rockoon  :green
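
For readers following the .if versus jxx exchange: .if is a directive that the assembler expands into a flag-setting instruction plus a conditional jump. The sketch below puts the high-level form next to a hand-written equivalent. The exact instructions MASM emits are an assumption here; the expansion has been reported as or eax, eax followed by jne, but check your own listing (the ml options /Fl and /Sg should include the generated code). The label name is made up.

```asm
.686
.model flat, stdcall
option casemap:none

.code
start:
    xor  eax, eax       ; eax = 0

    ; High-level form: expanded by the assembler at build time.
    .if eax == 0
        inc eax         ; taken, eax becomes 1
    .endif

    ; Hand-written form of the same test.
    test eax, eax       ; eax is 1 now, so ZF = 0
    jnz  skip           ; branch taken, the inc below is skipped
    inc  eax
skip:
    ret
end start
```

Either way the CPU only ever sees a flag-setting instruction and a jxx; the directive changes what the programmer types, not what executes.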

MichaelW

MASM 6.0 was released ~1991.
eschew obfuscation

jj2007

Quote from: MichaelW on August 26, 2008, 05:31:41 PM
MASM 6.0 was released ~1991.


So we can consider .if "evil legacy code"  :toothy