register usage performance

Started by jag, February 10, 2007, 07:12:12 PM

jag

I am wondering which would be faster:

mov eax, [eax]
mov eax, [eax]
mov edi, [eax]

or

mov edi, [edi]
mov edi, [edi]
mov edi, [edi]

Is it faster, performance-wise, to use the same register? Does this somehow let the CPU look at only one register and get the work done faster?

u

Tested, and it's exactly the same number of cycles on AMD CPUs (at least on a Sempron 3000+).

jag

Ya, I tested too after I found out about the code-timing macros. Thanks for the help Ultrano.

The results are not what I expected. I would have thought that using eax instead of edi would yield better performance, and that using the same register over and over again would boost performance too.

hutch--

jag,

There is more to speed than trying to isolate read/write access to one register. While some registers have special roles in some of the older instructions, none is intrinsically faster than another for general-purpose read/write tasks, but you can slow code down by trying to use a single register for too many tasks. If speed matters, it is often advisable to free up as many registers as possible (Mark Larson's optimisation page covers some of these techniques) and then distribute registers where you get the best speed gains.

You can routinely access 7 registers and if you are desperate you can also use ESP if you know what you are doing.

zooba

Register renaming should mean it makes no difference. However, putting other, unrelated instructions between the reads may let you do more at the same time with less penalty.
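
As a rough illustration of that point, here is the original load chain next to the same chain with unrelated work placed between the loads. The added instructions and the registers ecx, edx and ebx are only an assumption for the sketch; any work that does not read or write eax would do.

```asm
; Dependent chain: each load needs the address produced by the one before it,
; so the loads cannot overlap no matter which register names are used.
    mov eax, [eax]
    mov eax, [eax]
    mov edi, [eax]

; Same chain with independent work in the gaps (hypothetical filler).
; These adds do not touch eax, so an out-of-order CPU can run them
; while it waits for each load to complete.
    mov eax, [eax]
    add ecx, 1
    mov eax, [eax]
    add edx, ebx
    mov edi, [eax]
```

The chain itself is no shorter in the second version; the extra instructions simply fill time that would otherwise be spent waiting.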

Cheers,

Zooba :U

jag

Ya, I already know about putting other instructions in dependency chains. Thanks all.

Polizei

In theory, both sequences should take exactly the same number of clocks and exactly the same time to execute. As zooba said above, that is because of register renaming. I can only speak for Intel CPUs, since as far as I remember I have always used one. Everybody "knows" that the accumulator is the fast register for all operations. However, the Pentium Pro, Pentium II and Pentium III have 40 temporary registers for renaming, which speeds up execution of various code, and the Pentium 4 has 128. (The original Pentium and Pentium MMX execute in order and do not rename registers.) Also, the speed of a piece of code is not only the clocks it takes to complete, but the time it takes. For more information, read the "Pentium Optimization Tips" by Agner Fog. Sorry that I cannot provide a URL for the files.

Human

Well, to be honest, the first version should be faster, because mov edi,[edi] must use edi as an index and then write to it again, which causes a stall. But as your code isn't pairable at all, every instruction stalls, so both versions look the same. When you mix it with some other code to make it pairable, you should see a difference with rdtsc.
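
For anyone who wants to try this, below is a minimal rdtsc timing sketch, not a rigorous benchmark. It assumes the MASM32 SDK is installed in \masm32 and uses its masm32rt.inc include file with the print, str$ and exit macros. The variable cell and the iteration count are made up for the example. The cell points at itself so that the chain of loads can repeat safely.

```asm
include \masm32\include\masm32rt.inc
.686                            ; rdtsc and cpuid need .586 or later

.data
cell dd 0                       ; hypothetical test cell, set to point at itself

.code
start:
    mov cell, offset cell       ; [cell] = address of cell

    xor eax, eax
    cpuid                       ; serialize before reading the counter
    rdtsc
    mov esi, eax                ; keep the low 32 bits of the start count

    mov edi, offset cell
    mov ecx, 1000000            ; iteration count (arbitrary)
  @@:
    mov edi, [edi]              ; sequence under test; swap in the
    mov edi, [edi]              ; eax/eax/edi version to compare
    mov edi, [edi]
    sub ecx, 1
    jnz @B

    xor eax, eax
    cpuid                       ; serialize again (clobbers eax, ebx, ecx, edx)
    rdtsc
    sub eax, esi                ; elapsed cycles, low 32 bits only

    print str$(eax)
    print " cycles",13,10
    exit
end start
```

The count includes the loop and cpuid overhead, so only the difference between two variants is meaningful, and several runs are needed before trusting it.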

Polizei

I think the same: the first version may be faster, even though it uses another register in the third line. But I don't think anyone can test it and really see a difference in time.

Rockoon

There should be no register preference with regard to execution units, but decoding time may differ because some operations on rax/eax/ax have shorter opcodes.
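
To make the size point concrete, here are 32-bit encodings for the loads from the original question, plus one instruction that does have a short accumulator form. The add instructions and the immediate value are just an example, not something from the original code.

```asm
    mov eax, [eax]              ; 8B 00                2 bytes
    mov edi, [edi]              ; 8B 3F                2 bytes
    mov edi, [eax]              ; 8B 38                2 bytes

    add eax, 12345678h          ; 05 78 56 34 12       5 bytes (short eax form)
    add edi, 12345678h          ; 81 C7 78 56 34 12    6 bytes
```

So for the plain register-indirect loads in this thread the choice of eax or edi makes no difference in code size. The shorter forms only appear with certain instructions, such as arithmetic with a full-size immediate.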