STD instruction

Started by dedndave, September 24, 2009, 12:36:21 PM

Astro

Sorry - chaos here as usual...

CPU 0: Intel(R) Core(TM)2 CPU          6700  @ 2.66GHz MMX SSSE3 Cores: 2

...CLD
2       clock cycles
2       clock cycles
2       clock cycles

CLD...CLD
6       clock cycles
6       clock cycles
6       clock cycles

STD...CLD
52      clock cycles
52      clock cycles
52      clock cycles


STD...CLD pair is a bit variable - between 52 and 56 (e.g. 56, 52, 52).

Best regards,
Astro.

dsouza123


CPU 0: Intel(R) Core(TM)2 Duo CPU     P9500  @ 2.53GHz MMX SSE4.1 Cores: 2

...CLD
1       clock cycles
1       clock cycles
1       clock cycles

CLD...CLD
4       clock cycles
4       clock cycles
4       clock cycles

STD...CLD
52      clock cycles
53      clock cycles
51      clock cycles


and


CPU 0: AMD Athlon(tm) Processor MMX+ 3DNow!+ Cores: 1

...CLD
1       clock cycles
1       clock cycles
1       clock cycles

CLD...CLD
2       clock cycles
2       clock cycles
2       clock cycles

STD...CLD
1       clock cycles
0       clock cycles
0       clock cycles

UtillMasm

and
C:\mytest>DFtime.exe
CPU 0: Genuine Intel(R) CPU           T2400  @ 1.83GHz MMX SSE3 Cores: 2

...CLD
10      clock cycles
10      clock cycles
10      clock cycles

CLD...CLD
20      clock cycles
20      clock cycles
20      clock cycles

STD...CLD
20      clock cycles
20      clock cycles
20      clock cycles

Press any key to continue ...

C:\mytest>

frktons

This is quite an old thread, but I have to say this is the first time a test
routine has reported my correct SSE version, so I am posting anyway.

Compliments, Dave, you got my CPU right  :U

CPU 0: Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz MMX SSSE3 Cores: 2

...CLD
2       clock cycles
2       clock cycles
2       clock cycles

CLD...CLD
6       clock cycles
6       clock cycles
6       clock cycles

STD...CLD
52      clock cycles
52      clock cycles
52      clock cycles

Press any key to continue ...


By the way, what happened to this preliminary version?
Is it still preliminary, or has it grown up since?

Frank
Mind is like a parachute. You know what to do in order to use it :-)

dedndave

if you use the forum search tool, you can find newer versions
however, i had to put the overall project on hold
a complete implementation requires that i learn KMDs (kernel-mode drivers) - it's on my list  :bg

mineiro


CPU 0: Intel(R) Pentium(R) Dual  CPU  E2160  @ 1.80GHz MMX SSSE3 Cores: 2

...CLD
2       clock cycles
2       clock cycles
2       clock cycles

CLD...CLD
6       clock cycles
6       clock cycles
6       clock cycles

STD...CLD
52      clock cycles
52      clock cycles
52      clock cycles

Press any key to continue ...

Rockoon

Things haven't changed on AMDs:

CPU 0: AMD Phenom(tm) II X6 1055T Processor MMX+ SSE4a 3DNow!+ Cores: 6

...CLD
-1      clock cycles
0       clock cycles
-1      clock cycles

CLD...CLD
-1      clock cycles
-1      clock cycles
-1      clock cycles

STD...CLD
-1      clock cycles
-1      clock cycles
-1      clock cycles

When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

KeepingRealBusy

Quote from: dedndave on September 28, 2009, 01:42:38 PM
ok guys - i got a test that shows the issue on my machine...

CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2

...CLD
49      clock cycles
49      clock cycles
49      clock cycles

CLD...CLD
104     clock cycles
104     clock cycles
104     clock cycles

STD...CLD
239     clock cycles
238     clock cycles
238     clock cycles

program and source attached...

Dave,

Here are my times on my P4:


CPU 0: Intel(R) Pentium(R) 4 CPU 3.20GHz MMX SSE2 Cores: 2

...CLD
45      clock cycles
45      clock cycles
45      clock cycles

CLD...CLD
93      clock cycles
93      clock cycles
93      clock cycles

STD...CLD
93      clock cycles
94      clock cycles
94      clock cycles

Press any key to continue ...


Dave

xandaz

   May I ask what you use to measure how many cycles are spent?
   Thanks and bye

dedndave

i used MichaelW's timing macros
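
For readers who have not used them, a measurement of this kind can be sketched roughly as below. This is only an assumed shape, not the DFtime source (the attachment is not reproduced in this thread). It assumes the MASM32 SDK is installed in \masm32 and that MichaelW's macros are available as \masm32\macros\timers.asm, with counter_begin taking a loop count and a priority class, and counter_end leaving the averaged cycle count in EAX.

```asm
; sketch only: assumes the MASM32 SDK and MichaelW's timers.asm in \masm32\macros
        include \masm32\include\masm32rt.inc
        .686                            ; the timing macros use CPUID and RDTSC
        include \masm32\macros\timers.asm

        .code
start:
        counter_begin 1000000, HIGH_PRIORITY_CLASS
            std                         ; code under test: set DF...
            cld                         ; ...then clear it again
        counter_end                     ; averaged cycle count comes back in EAX
        print   str$(eax)," clock cycles",13,10

        inkey                           ; wait for a key before the console closes
        exit
        end     start
```

As far as I know, the macros time an empty reference loop first and subtract that overhead, which would explain why very cheap sequences can be reported as 0 or even -1 cycles.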

i found a solution that works pretty well
in this example, i wanted a cleared DF
if you want a set DF, you still need STD   :tdown
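        pushfd
        pop     edx             ;EDX = EFLAGS, DF is bit 10 (400h) = bit 2 of DH
        test    dh,4            ;(40h in DH would test NT, bit 14, not DF)
        jz      @F              ;DF already clear, skip the slow CLD

        cld

@@:

;do string operations here (EDX must be preserved)

        test    dh,4
        jz      @F              ;DF was clear on entry, nothing to restore

        std

@@: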

the second test is only needed if you want to return the DF to its original state
if you just want to leave it cleared, omit that part
the net effect of the above code is the same as pushf/popf, but can be faster
PUSHFD is ok, but POPFD, CLD, and STD are slow
this code, when modified to 16-bit, solves the issue mentioned in the 16-bit sub-forum thread, as well
http://www.masm32.com/board/index.php?topic=14699.msg119416#msg119416
i still don't really have a handle on that problem yet - lol
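
A 16-bit version of the same idea might look like the sketch below. It is not taken from the linked thread; it assumes real-mode code in which DX is free as a scratch register and is not touched by the string operations. DF is bit 10 of FLAGS, so once the flags are in DX the mask to test in DH is 04h.

```asm
; 16-bit sketch (assumption: DX is free and survives the string operations)
        pushf                   ; push the 16-bit FLAGS register
        pop     dx              ; DX = FLAGS, DF is bit 10 = bit 2 of DH
        test    dh,4
        jz      @F              ; DF already clear, skip the CLD
        cld
@@:

;do string operations here

        test    dh,4
        jz      @F              ; DF was clear on entry, nothing to restore
        std
@@:
```

If the caller only needs DF left clear, the second test and the STD can be dropped, just as in the 32-bit version.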

KeepingRealBusy

dedndave,

Do you have any words of wisdom about the following two timings? Note that both are P4s. Is there any reason the last "STD...CLD" test should be so different? On your P4, each test roughly doubled in time, but on my P4, the last two tests took the same time.

Quote
Posted by: dedndave
ok guys - i got a test that shows the issue on my machine...

CPU 0: Intel(R) Pentium(R) 4 CPU 3.00GHz MMX SSE3 Cores: 2

...CLD
49      clock cycles
49      clock cycles
49      clock cycles

CLD...CLD
104     clock cycles
104     clock cycles
104     clock cycles

STD...CLD
239     clock cycles
238     clock cycles
238     clock cycles

program and source attached...

Dave,

Quote
Posted by: KeepingRealBusy
Here are my times on my P4:

Code:
CPU 0: Intel(R) Pentium(R) 4 CPU 3.20GHz MMX SSE2 Cores: 2

...CLD
45      clock cycles
45      clock cycles
45      clock cycles

CLD...CLD
93      clock cycles
93      clock cycles
93      clock cycles

STD...CLD
93      clock cycles
94      clock cycles
94      clock cycles

Press any key to continue ...

Dave

Dave.


dedndave

i don't know about any words of wisdom - lol

but - the last one is better - not worse
it is possible that the OS has something to do with it
i am not sure about the precise mechanics - maybe Clive or one of the other guys can shed some light
it may have something to do with the OS checking for privilege level

some time ago, MichaelW and i were playing with the IRET instruction
we wanted to use it as a serializing instruction to replace CPUID in the timing macros
i was surprised by how long it takes
but, i guess it is a similar case
the OS needs to verify that the target address of the IRET is allowable

this appears to be the same kind of issue
altering some of the flags requires lower (logically higher) privilege
so, when you change any flags directly, tests have to be made to ensure it is allowed
this could have been improved in newer CPUs, as they may only test the privilege-critical flags
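
For reference, the flag bits in question can be written as equates. The bit positions are the documented EFLAGS layout; the names are only illustrative and do not come from any MASM32 include file. Of these, IF and IOPL are the privilege-sensitive ones, while DF itself can be changed freely from user mode.

```asm
; illustrative equates (names are assumptions, bit positions are the EFLAGS layout)
EFLAGS_IF       EQU 00000200h   ; bit 9, interrupt enable: needs CPL <= IOPL to change
EFLAGS_DF       EQU 00000400h   ; bit 10, direction flag: not privileged
EFLAGS_IOPL     EQU 00003000h   ; bits 12-13, I/O privilege level: needs CPL 0 to change
EFLAGS_NT       EQU 00004000h   ; bit 14, nested task

DF_MASK_DH      EQU EFLAGS_DF SHR 8     ; = 04h, DF as seen in the high byte (DH)
```

With these, a check such as test dh,DF_MASK_DH makes it harder to pick the wrong bit by accident.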