
STD instruction

Started by dedndave, September 24, 2009, 12:36:21 PM


MichaelW


CPU 0: Fam 6 Mod 7 xFam 0 xMod 0 Type 0 Step 3 MMX SSE Cores: 1

...CLD
10      clock cycles
10      clock cycles
10      clock cycles

CLD...CLD
18      clock cycles
18      clock cycles
18      clock cycles

STD...CLD
18      clock cycles
18      clock cycles
18      clock cycles


It occurred to me that the problem could be due to an erratum in your processor that the BIOS is correcting by applying a microcode patch.

http://cseweb.ucsd.edu/~calder/papers/ICCD-06-HWPatch.pdf

dedndave

you may be right, Michael - that would make sense
i didn't see a specific mention of STD in that paper - that doesn't mean it isn't one of the cases he is speaking of
i was thinking it may be an overhead problem in the out-of-order operation scheme
the processor is looking through the code-stream for something else to chew on
and - STD causes that mechanism to operate more slowly due to the number of possible affected instructions
i am not a big fan of out-of-order instructions - lol
it kind of takes some of the fun out of programming in assembler, even if it does speed things up

EDIT - btw - that guy has several other interesting papers, as well - great site   :U

EDIT again - if i run the STD in a timer loop alone - i get 13 cycles (CLD after the timer loop)

dedndave

Hutch ? Astro ? Jochen ?
i thought you guys wanted to run this for me
i linked it again in case you missed that post
thanks

http://www.masm32.com/board/index.php?action=dlattach;topic=12368.0;id=6694

cobold

Dave,

I don't think this will help you, but I ran dftime on my machine (WIN XP SP2) with the following "funny" results:


CPU 0: AMD Athlon(TM) XP 2000+ MMX+ SSE 3DNow!+ Cores: 1

...CLD
1       clock cycles
0       clock cycles
0       clock cycles

CLD...CLD
2       clock cycles
2       clock cycles
2       clock cycles

STD...CLD
0       clock cycles
0       clock cycles
0       clock cycles

Press any key to continue ...


rgds
cobold

dedndave

actually, it does help - lol
i can see that it is fast in all three cases

jj2007

Quote from: dedndave on September 28, 2009, 06:00:50 PM
Hutch ? Astro ? Jochen ?
i thought you guys wanted to run this for me
i linked it again in case you missed that post
thanks

http://www.masm32.com/board/index.php?action=dlattach;topic=12368.0;id=6694

Here it is :bg

CPU 0: Intel(R) Celeron(R) M CPU        420  @ 1.60GHz MMX SSE3 Cores: 1

...CLD
10      clock cycles
10      clock cycles
10      clock cycles

CLD...CLD
19      clock cycles
19      clock cycles
19      clock cycles

STD...CLD
19      clock cycles
19      clock cycles
19      clock cycles

BlackVortex

CPU 0: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz MMX SSE4.1 Cores: 2

...CLD
1       clock cycles
1       clock cycles
1       clock cycles

CLD...CLD
4       clock cycles
4       clock cycles
4       clock cycles

STD...CLD
52      clock cycles
52      clock cycles
52      clock cycles

FORTRANS


CPU 0: Intel(R) Pentium(R) M processor 1.70GHz MMX SSE2 Cores: 1

...CLD
10   clock cycles
10   clock cycles
10   clock cycles

CLD...CLD
19   clock cycles
19   clock cycles
19   clock cycles

STD...CLD
19   clock cycles
19   clock cycles
19   clock cycles

Press any key to continue ...

CPU 0: Fam 6 Mod 8 xFam 0 xMod 0 Type 0 Step 3 MMX SSE Cores: 1

...CLD
10   clock cycles
10   clock cycles
10   clock cycles

CLD...CLD
18   clock cycles
18   clock cycles
18   clock cycles

STD...CLD
18   clock cycles
18   clock cycles
18   clock cycles

Press any key to continue ...


HTH,

Steve N.

dedndave

thanks all
i am surprised to see it act up on the core duo

sinsi

Added a couple of tests, just for completeness.

CPU 0: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz MMX SSSE3 Cores: 4

...STD
15      clock cycles
15      clock cycles
15      clock cycles

...CLD
2       clock cycles
2       clock cycles
2       clock cycles

CLD...CLD
6       clock cycles
6       clock cycles
6       clock cycles

STD...CLD
53      clock cycles
52      clock cycles
52      clock cycles

STD...STD
28      clock cycles
28      clock cycles
28      clock cycles

dedndave

thanks Sinsi
50 cycles is still longer than it should be - although not nearly as bad as my p4 prescott - lol

ThexDarksider

I ran that too.

CPU 0: Intel(R) Atom(TM) CPU N270   @ 1.60GHz MMX SSSE3 Cores: 2

...CLD
30      clock cycles
19      clock cycles
22      clock cycles

CLD...CLD
30      clock cycles
30      clock cycles
29      clock cycles

STD...CLD
89      clock cycles
87      clock cycles
90      clock cycles


My CPU usage is at 30% minimum, mostly around 45%. If that matters, lol.

What are STD and CLD for, anyway? Are they for hooking interrupts or something?

Magnum

STD sets the direction flag and CLD clears the direction flag.

They are commonly used when searching.

Andy

FORTRANS

Hi,

   The direction flag controls which way the string instructions step through memory.
The string instructions use DI (EDI) and SI (ESI) to access memory.


        MOVSB           ; Move String Byte.

Is equivalent to

        MOV     BYTE PTR [DI],[SI]      ; Move (copy) the byte from where DS:SI
                                        ; points to where ES:DI is pointing.
                                        ; Of course, that kind of move is normally illegal,
                                        ; which is why the MOVS is useful.
        INC     DI      ; If the direction flag is clear.
        INC     SI      ; If the flag is set these would be DECrements.
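
For anyone who would rather step through this in a debugger than on real hardware, here is a rough C model of the same behaviour. It is only a sketch: the StringRegs struct and the movsb, rep_movsb and shift_right names are made up for illustration, the "registers" are plain offsets into a byte buffer, and segments are ignored. It shows the classic reason to use STD: when an overlapping block is moved upward in memory, it has to be copied from the last byte backward so that source bytes are read before they are overwritten.

```c
/* Hypothetical model of the state MOVSB touches. Not real CPU state. */
typedef struct {
    unsigned char *mem; /* flat memory that SI and DI index into */
    long si;            /* source offset, stands in for DS:SI */
    long di;            /* destination offset, stands in for ES:DI */
    unsigned long cx;   /* repeat count, stands in for CX */
    int df;             /* direction flag: 0 after CLD, 1 after STD */
} StringRegs;

/* One MOVSB: copy a byte, then step both offsets up (DF=0) or down (DF=1). */
static void movsb(StringRegs *r)
{
    long step = r->df ? -1 : 1;
    r->mem[r->di] = r->mem[r->si];
    r->si += step;
    r->di += step;
}

/* REP MOVSB: repeat MOVSB until the count reaches zero. */
static void rep_movsb(StringRegs *r)
{
    while (r->cx != 0) {
        movsb(r);
        r->cx--;
    }
}

/* Move len bytes from offset 0 to offset shift inside the same buffer.
   The regions overlap, so the copy runs backward from the last byte.
   The caller must make sure buf holds at least len + shift bytes. */
static void shift_right(unsigned char *buf, unsigned long len, long shift)
{
    StringRegs r;
    r.mem = buf;
    r.df = 1;                       /* STD: step downward */
    r.si = (long)len - 1;           /* last source byte */
    r.di = (long)len - 1 + shift;   /* last destination byte */
    r.cx = len;
    rep_movsb(&r);
    r.df = 0;                       /* CLD: leave the flag clear afterwards */
}
```

Calling shift_right(buf, 6, 2) on a 16-byte zero-padded buffer that starts with "ABCDEF" leaves "ABABCDEF" in it. Running the same copy with the flag clear would overwrite C and D before they were read.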


HTH,

Steve N.

ThexDarksider

Oh I think I understand now. :bg