Fast saving of FPU registers

Started by jj2007, February 09, 2012, 03:40:16 PM


FORTRANS

Hi,

  Here are some more.

Cheers,

Steve N.


{P-III}

pre-P4 (SSE1)
211   cycles for fsave/frstor
51   cycles for FpuSave 4
16   cycles for FpuSave 3
12   cycles for FpuSave 2
4   cycles for FpuSave 1

195   cycles for fsave/frstor
23   cycles for FpuSave 4
21   cycles for FpuSave 3
9   cycles for FpuSave 2
19   cycles for FpuSave 1

198   cycles for fsave/frstor
24   cycles for FpuSave 4
16   cycles for FpuSave 3
22   cycles for FpuSave 2
4   cycles for FpuSave 1

183   cycles for fsave/frstor
23   cycles for FpuSave 4
49   cycles for FpuSave 3
9   cycles for FpuSave 2
13   cycles for FpuSave 1

198   cycles for fsave/frstor
58   cycles for FpuSave 4
16   cycles for FpuSave 3
13   cycles for FpuSave 2
4   cycles for FpuSave 1


--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
227   cycles for fsave/frstor
32   cycles for FpuSave 4
15   cycles for FpuSave 3
14   cycles for FpuSave 2
3   cycles for FpuSave 1

227   cycles for fsave/frstor
20   cycles for FpuSave 4
15   cycles for FpuSave 3
8   cycles for FpuSave 2
18   cycles for FpuSave 1

230   cycles for fsave/frstor
20   cycles for FpuSave 4
15   cycles for FpuSave 3
14   cycles for FpuSave 2
3   cycles for FpuSave 1

212   cycles for fsave/frstor
20   cycles for FpuSave 4
31   cycles for FpuSave 3
8   cycles for FpuSave 2
10   cycles for FpuSave 1

212   cycles for fsave/frstor
33   cycles for FpuSave 4
15   cycles for FpuSave 3
15   cycles for FpuSave 2
3   cycles for FpuSave 1


--- ok ---

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
212   cycles for fsave/frstor
32   cycles for FpuSave 4
15   cycles for FpuSave 3
15   cycles for FpuSave 2
3   cycles for FpuSave 1

214   cycles for fsave/frstor
20   cycles for FpuSave 4
15   cycles for FpuSave 3
8   cycles for FpuSave 2
17   cycles for FpuSave 1

215   cycles for fsave/frstor
20   cycles for FpuSave 4
15   cycles for FpuSave 3
13   cycles for FpuSave 2
3   cycles for FpuSave 1

197   cycles for fsave/frstor
20   cycles for FpuSave 4
29   cycles for FpuSave 3
8   cycles for FpuSave 2
10   cycles for FpuSave 1

198   cycles for fsave/frstor
34   cycles for FpuSave 4
15   cycles for FpuSave 3
15   cycles for FpuSave 2
3   cycles for FpuSave 1


--- ok ---

{P-MMX}

pre-P4
181   cycles for fsave/frstor
86   cycles for FpuSave 4
34   cycles for FpuSave 3
42   cycles for FpuSave 2
13   cycles for FpuSave 1

181   cycles for fsave/frstor
44   cycles for FpuSave 4
61   cycles for FpuSave 3
24   cycles for FpuSave 2
22   cycles for FpuSave 1

180   cycles for fsave/frstor
80   cycles for FpuSave 4
33   cycles for FpuSave 3
41   cycles for FpuSave 2
13   cycles for FpuSave 1

179   cycles for fsave/frstor
43   cycles for FpuSave 4
61   cycles for FpuSave 3
23   cycles for FpuSave 2
22   cycles for FpuSave 1

180   cycles for fsave/frstor
80   cycles for FpuSave 4
33   cycles for FpuSave 3
42   cycles for FpuSave 2
13   cycles for FpuSave 1


--- ok ---

rags

AMD Athlon(tm) II X2 215 Processor (SSE3)
384     cycles for fsave/frstor
116     cycles for FpuSave 4
80      cycles for FpuSave 3
124     cycles for FpuSave 2
53      cycles for FpuSave 1

398     cycles for fsave/frstor
118     cycles for FpuSave 4
161     cycles for FpuSave 3
58      cycles for FpuSave 2
52      cycles for FpuSave 1

407     cycles for fsave/frstor
222     cycles for FpuSave 4
79      cycles for FpuSave 3
57      cycles for FpuSave 2
52      cycles for FpuSave 1

388     cycles for fsave/frstor
94      cycles for FpuSave 4
81      cycles for FpuSave 3
56      cycles for FpuSave 2
93      cycles for FpuSave 1

428     cycles for fsave/frstor
115     cycles for FpuSave 4
80      cycles for FpuSave 3
132     cycles for FpuSave 2
52      cycles for FpuSave 1

jj2007

Thanks to everybody :U

I think we can leave it "as is": it has been sufficiently shown that partial saving is a lot faster than fsave/frstor. On every CPU tested, each FpuSave variant stays well below the fsave/frstor timings. It will hardly find a useful application in speeding up an innermost loop, but OK, we learnt something ;-)