The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: jj2007 on February 09, 2012, 03:40:16 PM

Title: Fast saving of FPU registers
Post by: jj2007 on February 09, 2012, 03:40:16 PM
Sometimes you need to use the full FPU, and if you need to call the str$ macro or similar routines in between, you may discover that FPU regs ST5 ... ST7 are lost.
The fsave/frstor combination is designed to take care of that scenario, but it is slow. Since MasmBasic (http://www.masm32.com/board/index.php?topic=12460) Str$ does indeed trash two FPU regs (6+7), I needed a way to save them in specific cases:

FpuSave   ; no args means save 2 regs: ST(6) and ST(7)
Print Str$("This is a test with eax=%i", eax)   ; Str$ trashes ST(6) and ST(7)
FpuRestore   ; get ST(6) and ST(7) back, correct the stack
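
For readers without the attachment, here is a minimal sketch of how two registers can be saved and restored by hand. It is not the MasmBasic source: the 12-byte slot size, the FPU_SLOT name and the dummy work in the middle are assumptions made for illustration, and the MASM32 SDK is assumed for masm32rt.inc.

```
; Hypothetical sketch, not the MasmBasic macros: save ST(6) and ST(7) by hand.
include \masm32\include\masm32rt.inc    ; assumes the MASM32 SDK is installed

FPU_SLOT equ 12                         ; assumed: 10-byte real10 rounded up to a dword multiple

.code
start:
    finit
    mov ecx, 8                          ; fill all eight registers with 8, 7, ... 1
@@: push ecx
    fild dword ptr [esp]
    pop ecx
    dec ecx
    jnz @B

    ; --- save ---
    sub esp, 2*FPU_SLOT
    fdecstp                             ; rotate TOP: old ST(7) is now ST(0)
    fdecstp                             ; old ST(6) is now ST(0), old ST(7) is ST(1)
    fstp tbyte ptr [esp]                ; store old ST(6) and free its register
    fstp tbyte ptr [esp+FPU_SLOT]       ; store old ST(7); TOP is back where it started

    ; --- stand-in for Str$ or similar: uses two free registers, leaves the stack balanced ---
    fldz
    fld1
    faddp st(1), st(0)
    fstp st(0)

    ; --- restore ---
    fld tbyte ptr [esp+FPU_SLOT]        ; lands in the register that held old ST(7)
    fld tbyte ptr [esp]                 ; lands in the register that held old ST(6)
    fincstp
    fincstp                             ; rotate TOP back: ST(0) ... ST(7) are as before
    add esp, 2*FPU_SLOT

    invoke ExitProcess, 0
end start
```

Stepping through this in a debugger should show ST(6) and ST(7) holding 7 and 8 both before the save and after the restore. The two fstp instructions also mark the saved registers as empty, which is what gives the code in the middle its two free slots.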

For saving 4 FPU regs, the macros are roughly 12 to 15 times faster than the fsave/frstor combination:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
1193    cycles for fsave/frstor
81      cycles for FpuSave 4
60      cycles for FpuSave 3
41      cycles for FpuSave 2
32      cycles for FpuSave 1

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
298     cycles for fsave/frstor
25      cycles for FpuSave 4
19      cycles for FpuSave 3
16      cycles for FpuSave 2
9       cycles for FpuSave 1


Source attached (MB not required, it's plain Masm32).

EDIT: Attachment removed, see new version below qWord's warning.
Title: Re: Fast saving of FPU registers
Post by: oex on February 09, 2012, 05:46:08 PM

Intel(R) Core(TM) i3-2310M CPU @ 2.10GHz (SSE4)
376     cycles for fsave/frstor
63      cycles for FpuSave 4
43      cycles for FpuSave 3
25      cycles for FpuSave 2
6       cycles for FpuSave 1

435     cycles for fsave/frstor
65      cycles for FpuSave 4
44      cycles for FpuSave 3
25      cycles for FpuSave 2
8       cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: clive on February 09, 2012, 05:54:02 PM
AMD Phenom(tm) II X6 1055T Processor (SSE3)
286     cycles for fsave/frstor
63      cycles for FpuSave 4
47      cycles for FpuSave 3
35      cycles for FpuSave 2
30      cycles for FpuSave 1

285     cycles for fsave/frstor
56      cycles for FpuSave 4
43      cycles for FpuSave 3
41      cycles for FpuSave 2
30      cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: qWord on February 09, 2012, 06:58:00 PM
hi,
your macros misalign the stack for all odd values of numregs. It should be N*16 instead of N*10.
Title: Re: Fast saving of FPU registers
Post by: jj2007 on February 09, 2012, 07:23:54 PM
Quote from: qWord on February 09, 2012, 06:58:00 PM
hi,
your macros misalign the stack for all odd values of numregs. It should be N*16 instead of N*10.

Thanks, qWord. You are absolutely right. It could be N*12, too, but *10 is definitely a fat bug :red
New version attached. The MasmBasic library (http://www.masm32.com/board/index.php?topic=12460.0) has also been updated, as MasmBasic9February2012b.zip - "b" like "bug free" :bg
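
The arithmetic behind the bug can be checked in a few instructions. The sketch below is only an illustration (the register choice and the use of masm32rt.inc are assumptions): it takes the byte counts that would be reserved on the stack for three registers and masks off the low two bits, which must be zero for esp to stay on a dword boundary.

```
; Illustrative only: how far each slot size moves esp off a dword boundary for N=3.
include \masm32\include\masm32rt.inc    ; assumes the MASM32 SDK is installed

.code
start:
    mov eax, 3*10                       ; 30 bytes: the old slot size, three registers
    and eax, 3                          ; eax = 2, so esp would be off by two bytes
    mov ecx, 3*12                       ; 36 bytes
    and ecx, 3                          ; ecx = 0, dword-aligned
    mov edx, 3*16                       ; 48 bytes
    and edx, 3                          ; edx = 0, dword-aligned (and a multiple of 16)
    invoke ExitProcess, 0
end start
```

For even N, N*10 is already a multiple of 4 (20 and 40 bytes), which is why only the odd register counts were affected.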

By the way:
FpuSave MACRO numregs:=<2>
LOCAL ct
  FpuSaveRegs=numregs
  FpuSaveSize=10  ; <<<<<<<<<<<<<<<< bad size
...
.code
FillFPU
FpuSave 3
MsgBox 0, "You won't see this", "Hi", MB_OK
FpuRestore


At least on Win XP, that shows a crippled MessageBox, due to the misaligned stack. And if you check in Olly, you will see that a simple MessageBox trashes all FPU regs :snooty:
Title: Re: Fast saving of FPU registers
Post by: clive on February 09, 2012, 07:28:57 PM
Intel(R) Pentium(R) D CPU 3.00GHz (SSE3)
1099    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1104    cycles for fsave/frstor
79      cycles for FpuSave 4
111     cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1103    cycles for fsave/frstor
113     cycles for FpuSave 4
59      cycles for FpuSave 3
42      cycles for FpuSave 2
32      cycles for FpuSave 1

1007    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

999     cycles for fsave/frstor
79      cycles for FpuSave 4
61      cycles for FpuSave 3
106     cycles for FpuSave 2
32      cycles for FpuSave 1


AMD Phenom(tm) II X6 1055T Processor (SSE3)
285     cycles for fsave/frstor
34      cycles for FpuSave 4
23      cycles for FpuSave 3
37      cycles for FpuSave 2
16      cycles for FpuSave 1

285     cycles for fsave/frstor
35      cycles for FpuSave 4
49      cycles for FpuSave 3
17      cycles for FpuSave 2
15      cycles for FpuSave 1

286     cycles for fsave/frstor
71      cycles for FpuSave 4
24      cycles for FpuSave 3
17      cycles for FpuSave 2
15      cycles for FpuSave 1

286     cycles for fsave/frstor
34      cycles for FpuSave 4
24      cycles for FpuSave 3
16      cycles for FpuSave 2
27      cycles for FpuSave 1

285     cycles for fsave/frstor
34      cycles for FpuSave 4
24      cycles for FpuSave 3
39      cycles for FpuSave 2
15      cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: oex on February 09, 2012, 08:34:53 PM

Intel(R) Core(TM) i3-2310M CPU @ 2.10GHz (SSE4)
348     cycles for fsave/frstor
58      cycles for FpuSave 4
45      cycles for FpuSave 3
22      cycles for FpuSave 2
3       cycles for FpuSave 1

437     cycles for fsave/frstor
62      cycles for FpuSave 4
43      cycles for FpuSave 3
25      cycles for FpuSave 2
7       cycles for FpuSave 1

403     cycles for fsave/frstor
41      cycles for FpuSave 4
46      cycles for FpuSave 3
24      cycles for FpuSave 2
7       cycles for FpuSave 1

384     cycles for fsave/frstor
63      cycles for FpuSave 4
38      cycles for FpuSave 3
21      cycles for FpuSave 2
6       cycles for FpuSave 1

435     cycles for fsave/frstor
63      cycles for FpuSave 4
44      cycles for FpuSave 3
24      cycles for FpuSave 2
8       cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: DRX on February 10, 2012, 04:38:44 AM
AMD E-350 Processor (SSE4)
344     cycles for fsave/frstor
56      cycles for FpuSave 4
43      cycles for FpuSave 3
41      cycles for FpuSave 2
19      cycles for FpuSave 1

281     cycles for fsave/frstor
56      cycles for FpuSave 4
50      cycles for FpuSave 3
26      cycles for FpuSave 2
19      cycles for FpuSave 1

281     cycles for fsave/frstor
67      cycles for FpuSave 4
43      cycles for FpuSave 3
26      cycles for FpuSave 2
19      cycles for FpuSave 1

282     cycles for fsave/frstor
57      cycles for FpuSave 4
43      cycles for FpuSave 3
26      cycles for FpuSave 2
27      cycles for FpuSave 1

276     cycles for fsave/frstor
56      cycles for FpuSave 4
43      cycles for FpuSave 3
41      cycles for FpuSave 2
19      cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: dedndave on February 10, 2012, 04:48:12 AM
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
1108    cycles for fsave/frstor
113     cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1105    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
106     cycles for FpuSave 1

1104    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
40      cycles for FpuSave 2
32      cycles for FpuSave 1

1009    cycles for fsave/frstor
81      cycles for FpuSave 4
113     cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1016    cycles for fsave/frstor
114     cycles for FpuSave 4
59      cycles for FpuSave 3
41      cycles for FpuSave 2
32      cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: braincell on February 10, 2012, 11:22:14 PM
Thanks, this should come in very handy and speed up my debugging, especially since I'm almost always working with the FPU.  :U

Intel(R) Core(TM)2 Duo CPU     E6750  @ 2.66GHz (SSE4)
381     cycles for fsave/frstor
29      cycles for FpuSave 4
20      cycles for FpuSave 3
14      cycles for FpuSave 2
4       cycles for FpuSave 1

356     cycles for fsave/frstor
30      cycles for FpuSave 4
31      cycles for FpuSave 3
12      cycles for FpuSave 2
9       cycles for FpuSave 1

363     cycles for fsave/frstor
49      cycles for FpuSave 4
20      cycles for FpuSave 3
25      cycles for FpuSave 2
5       cycles for FpuSave 1

346     cycles for fsave/frstor
39      cycles for FpuSave 4
20      cycles for FpuSave 3
15      cycles for FpuSave 2
11      cycles for FpuSave 1

337     cycles for fsave/frstor
28      cycles for FpuSave 4
32      cycles for FpuSave 3
28      cycles for FpuSave 2
4       cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: clive on February 11, 2012, 03:40:19 AM
AMD A6-3400M APU with Radeon(tm) HD Graphics (SSE3)
233     cycles for fsave/frstor
60      cycles for FpuSave 4
40      cycles for FpuSave 3
64      cycles for FpuSave 2
27      cycles for FpuSave 1

228     cycles for fsave/frstor
35      cycles for FpuSave 4
85      cycles for FpuSave 3
12      cycles for FpuSave 2
10      cycles for FpuSave 1

190     cycles for fsave/frstor
41      cycles for FpuSave 4
27      cycles for FpuSave 3
29      cycles for FpuSave 2
30      cycles for FpuSave 1

230     cycles for fsave/frstor
59      cycles for FpuSave 4
39      cycles for FpuSave 3
29      cycles for FpuSave 2
51      cycles for FpuSave 1

226     cycles for fsave/frstor
61      cycles for FpuSave 4
41      cycles for FpuSave 3
66      cycles for FpuSave 2
30      cycles for FpuSave 1


Intel(R) Atom(TM) CPU N450   @ 1.66GHz (SSE4)
600     cycles for fsave/frstor
111     cycles for FpuSave 4
79      cycles for FpuSave 3
51      cycles for FpuSave 2
27      cycles for FpuSave 1

552     cycles for fsave/frstor
103     cycles for FpuSave 4
96      cycles for FpuSave 3
49      cycles for FpuSave 2
28      cycles for FpuSave 1

549     cycles for fsave/frstor
118     cycles for FpuSave 4
76      cycles for FpuSave 3
53      cycles for FpuSave 2
25      cycles for FpuSave 1

512     cycles for fsave/frstor
104     cycles for FpuSave 4
75      cycles for FpuSave 3
50      cycles for FpuSave 2
22      cycles for FpuSave 1

514     cycles for fsave/frstor
103     cycles for FpuSave 4
77      cycles for FpuSave 3
82      cycles for FpuSave 2
28      cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: jj2007 on February 11, 2012, 05:51:40 AM
Clive,
Are you running a lab??
:bg

Quote from: clive on February 11, 2012, 03:40:19 AM
Intel(R) Pentium(R) D CPU 3.00GHz (SSE3)
1099    cycles for fsave/frstor
79      cycles for FpuSave 4

AMD Phenom(tm) II X6 1055T Processor (SSE3)
285     cycles for fsave/frstor
34      cycles for FpuSave 4

AMD A6-3400M APU with Radeon(tm) HD Graphics (SSE3)
233     cycles for fsave/frstor
60      cycles for FpuSave 4

Intel(R) Atom(TM) CPU N450   @ 1.66GHz (SSE4)
600     cycles for fsave/frstor
111     cycles for FpuSave 4
Title: Re: Fast saving of FPU registers
Post by: vanjast on February 11, 2012, 06:27:28 AM
Intel(R) Pentium(R) D CPU 3.20GHz (SSE3)
1095    cycles for fsave/frstor
112     cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1095    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
109     cycles for FpuSave 1

1089    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

999     cycles for fsave/frstor
79      cycles for FpuSave 4
113     cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

999     cycles for fsave/frstor
118     cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1


--- ok ---
Title: Re: Fast saving of FPU registers
Post by: clive on February 11, 2012, 02:07:09 PM
Quote from: jj2007
Clive, Are you running a lab??

Actually it's more like a Bat Cave than a Lab.

I do have a diverse collection of hardware at my disposal, of which this is a subset. I ignored the ones which duplicated the results already provided, for example the AMD E-350 and C-50 systems are identical performers, as are the 4 and 6 core Phenom II's.

Timing across a wide spectrum was what you were after, right?

Title: Re: Fast saving of FPU registers
Post by: jj2007 on February 11, 2012, 04:18:18 PM
Quote from: clive on February 11, 2012, 02:07:09 PM
Timing across a wide spectrum was what you were after, right?

Yes, thanks :thumbu
Title: Re: Fast saving of FPU registers
Post by: FORTRANS on February 11, 2012, 11:27:00 PM
Hi,

  Here are some more.

Cheers,

Steve N.


{P-III}

pre-P4 (SSE1)
211   cycles for fsave/frstor
51   cycles for FpuSave 4
16   cycles for FpuSave 3
12   cycles for FpuSave 2
4   cycles for FpuSave 1

195   cycles for fsave/frstor
23   cycles for FpuSave 4
21   cycles for FpuSave 3
9   cycles for FpuSave 2
19   cycles for FpuSave 1

198   cycles for fsave/frstor
24   cycles for FpuSave 4
16   cycles for FpuSave 3
22   cycles for FpuSave 2
4   cycles for FpuSave 1

183   cycles for fsave/frstor
23   cycles for FpuSave 4
49   cycles for FpuSave 3
9   cycles for FpuSave 2
13   cycles for FpuSave 1

198   cycles for fsave/frstor
58   cycles for FpuSave 4
16   cycles for FpuSave 3
13   cycles for FpuSave 2
4   cycles for FpuSave 1


--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
227   cycles for fsave/frstor
32   cycles for FpuSave 4
15   cycles for FpuSave 3
14   cycles for FpuSave 2
3   cycles for FpuSave 1

227   cycles for fsave/frstor
20   cycles for FpuSave 4
15   cycles for FpuSave 3
8   cycles for FpuSave 2
18   cycles for FpuSave 1

230   cycles for fsave/frstor
20   cycles for FpuSave 4
15   cycles for FpuSave 3
14   cycles for FpuSave 2
3   cycles for FpuSave 1

212   cycles for fsave/frstor
20   cycles for FpuSave 4
31   cycles for FpuSave 3
8   cycles for FpuSave 2
10   cycles for FpuSave 1

212   cycles for fsave/frstor
33   cycles for FpuSave 4
15   cycles for FpuSave 3
15   cycles for FpuSave 2
3   cycles for FpuSave 1


--- ok ---

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
212   cycles for fsave/frstor
32   cycles for FpuSave 4
15   cycles for FpuSave 3
15   cycles for FpuSave 2
3   cycles for FpuSave 1

214   cycles for fsave/frstor
20   cycles for FpuSave 4
15   cycles for FpuSave 3
8   cycles for FpuSave 2
17   cycles for FpuSave 1

215   cycles for fsave/frstor
20   cycles for FpuSave 4
15   cycles for FpuSave 3
13   cycles for FpuSave 2
3   cycles for FpuSave 1

197   cycles for fsave/frstor
20   cycles for FpuSave 4
29   cycles for FpuSave 3
8   cycles for FpuSave 2
10   cycles for FpuSave 1

198   cycles for fsave/frstor
34   cycles for FpuSave 4
15   cycles for FpuSave 3
15   cycles for FpuSave 2
3   cycles for FpuSave 1


--- ok ---

{P-MMX}

pre-P4
181   cycles for fsave/frstor
86   cycles for FpuSave 4
34   cycles for FpuSave 3
42   cycles for FpuSave 2
13   cycles for FpuSave 1

181   cycles for fsave/frstor
44   cycles for FpuSave 4
61   cycles for FpuSave 3
24   cycles for FpuSave 2
22   cycles for FpuSave 1

180   cycles for fsave/frstor
80   cycles for FpuSave 4
33   cycles for FpuSave 3
41   cycles for FpuSave 2
13   cycles for FpuSave 1

179   cycles for fsave/frstor
43   cycles for FpuSave 4
61   cycles for FpuSave 3
23   cycles for FpuSave 2
22   cycles for FpuSave 1

180   cycles for fsave/frstor
80   cycles for FpuSave 4
33   cycles for FpuSave 3
42   cycles for FpuSave 2
13   cycles for FpuSave 1


--- ok ---
Title: Re: Fast saving of FPU registers
Post by: rags on February 12, 2012, 04:40:32 AM
AMD Athlon(tm) II X2 215 Processor (SSE3)
384     cycles for fsave/frstor
116     cycles for FpuSave 4
80      cycles for FpuSave 3
124     cycles for FpuSave 2
53      cycles for FpuSave 1

398     cycles for fsave/frstor
118     cycles for FpuSave 4
161     cycles for FpuSave 3
58      cycles for FpuSave 2
52      cycles for FpuSave 1

407     cycles for fsave/frstor
222     cycles for FpuSave 4
79      cycles for FpuSave 3
57      cycles for FpuSave 2
52      cycles for FpuSave 1

388     cycles for fsave/frstor
94      cycles for FpuSave 4
81      cycles for FpuSave 3
56      cycles for FpuSave 2
93      cycles for FpuSave 1

428     cycles for fsave/frstor
115     cycles for FpuSave 4
80      cycles for FpuSave 3
132     cycles for FpuSave 2
52      cycles for FpuSave 1
Title: Re: Fast saving of FPU registers
Post by: jj2007 on February 12, 2012, 06:55:20 AM
Thanks to everybody :U

I think we can leave it "as is" - it has been sufficiently proven that partial saving is a lot faster than fsave/frstor. It will hardly find a useful application in speeding up an innermost loop, but ok we learnt something ;-)