... who refuses to push flags because it's so awfully slow. Here is a workaround :bg
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
2608 cycles for pushf/popf
306 cycles for lahf/sahf
893 cycles for lahf/sahf with push/pop eax
2607 cycles for pushf/popf
306 cycles for lahf/sahf
892 cycles for lahf/sahf with push/pop eax
2608 cycles for pushf/popf
306 cycles for lahf/sahf
882 cycles for lahf/sahf with push/pop eax
:bg
LAHF/PUSHFD aren't too bad - it's SAHF/POPFD that are slow - the difference is that those two write the flags
also - CLD/STD are slow and probably STI/CLI (never tried those - lol)
STC, CLC, and CMC seem to be ok :U
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
9693 cycles for pushf/popf
851 cycles for lahf/sahf
1405 cycles for lahf/sahf with push/pop eax
9726 cycles for pushf/popf
847 cycles for lahf/sahf
1405 cycles for lahf/sahf with push/pop eax
9685 cycles for pushf/popf
845 cycles for lahf/sahf
1404 cycles for lahf/sahf with push/pop eax
i am still waiting for the egg :bg
Quote from: dedndave on April 11, 2012, 11:13:10 PM
i am still waiting for the egg :bg
Hey, I gave you a factor 11 speed up for your prescott w/handbrakes on :bg
:P
Hi,
Another set of data points.
Cheers,
Steve
pre-P4 (SSE1)
2521 cycles for pushf/popf
301 cycles for lahf/sahf
1208 cycles for lahf/sahf with push/pop eax
2521 cycles for pushf/popf
301 cycles for lahf/sahf
1208 cycles for lahf/sahf with push/pop eax
2522 cycles for pushf/popf
301 cycles for lahf/sahf
1209 cycles for lahf/sahf with push/pop eax
--- ok ---
pre-P4
919 cycles for pushf/popf
613 cycles for lahf/sahf
715 cycles for lahf/sahf with push/pop eax
918 cycles for pushf/popf
614 cycles for lahf/sahf
718 cycles for lahf/sahf with push/pop eax
924 cycles for pushf/popf
614 cycles for lahf/sahf
715 cycles for lahf/sahf with push/pop eax
--- ok ---
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
2727 cycles for pushf/popf
309 cycles for lahf/sahf
696 cycles for lahf/sahf with push/pop eax
2728 cycles for pushf/popf
309 cycles for lahf/sahf
721 cycles for lahf/sahf with push/pop eax
2727 cycles for pushf/popf
309 cycles for lahf/sahf
680 cycles for lahf/sahf with push/pop eax
--- ok ---
In case somebody is interested in this...
AMD Athlon(tm) II X4 635 Processor (SSE3)
1495 cycles for pushf/popf
395 cycles for lahf/sahf
883 cycles for lahf/sahf with push/pop eax
1500 cycles for pushf/popf
395 cycles for lahf/sahf
883 cycles for lahf/sahf with push/pop eax
1501 cycles for pushf/popf
394 cycles for lahf/sahf
883 cycles for lahf/sahf with push/pop eax
--- ok ---
Thanks. So it seems that on every CPU tested so far a lahf/sahf combo is a faster way to save the flags, usually by a wide margin. Good to know :P
yah - if only they stored the overflow flag ::)
Isn't this technology something like trying to tune the last 2% of performance out of a T model Ford ?
Quote from: hutch-- on April 13, 2012, 06:13:59 AM
Isn't this technology something like trying to tune the last 2% of performance out of a T model Ford ?
It is true that there is rarely a reason to save and restore flags in a speed-critical loop. On the other hand, 3 instead of 15 cycles is a good argument. Unless you have a better technology to offer, of course.
:bg
OK, 3%. :P