Fast saving of FPU registers

Started by jj2007, February 09, 2012, 03:40:16 PM

jj2007

Sometimes you need all eight FPU registers, and if you call the str$ macro or similar routines in between, you may discover that FPU regs ST(5) ... ST(7) are lost.
The fsave/frstor combination is designed to handle that scenario, but it is slow. Since MasmBasic Str$ does indeed trash two FPU regs, ST(6) and ST(7), I needed a way to save them in specific cases:

FpuSave   ; no args means save 2 regs: ST(6) and ST(7)
Print Str$("This is a test with eax=%i", eax)   ; Str$ trashes ST(6) and ST(7)
FpuRestore   ; get ST(6) and ST(7) back, correct the stack
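
For readers without the attachment, here is a minimal sketch of one way such a save/restore pair can work for two registers. It is an assumption for illustration, not the attached source: the macro names SaveTop2 and RestoreTop2 are hypothetical, and it presumes that the code between the two macros leaves esp and the FPU stack balanced and does not leave values in the two freed registers.

```masm
; Hypothetical sketch, NOT the attached macros: save and restore ST(6) and ST(7).
; Each 10-byte register gets a 12-byte slot so that esp stays dword-aligned.
SaveTop2 MACRO
  sub esp, 2*12               ; room for two REAL10 values
  fdecstp                     ; rotate the register stack ...
  fdecstp                     ; ... so old ST(6) is now ST(0) and old ST(7) is ST(1)
  fstp tbyte ptr [esp]        ; store old ST(6) and pop, which frees its register
  fstp tbyte ptr [esp+12]     ; store old ST(7) and pop; TOP is back where it started
ENDM

RestoreTop2 MACRO
  fld tbyte ptr [esp+12]      ; reload old ST(7) into the free register below TOP
  fld tbyte ptr [esp]         ; reload old ST(6)
  fincstp                     ; rotate back ...
  fincstp                     ; ... so both values sit at ST(6) and ST(7) again
  add esp, 2*12               ; release the stack space
ENDM
```

Because the two registers are marked empty after the save, a routine that pushes up to two values onto the FPU stack no longer overflows it, and ST(0) ... ST(5) are never touched.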

For saving 4 FPU regs, the macros are roughly 12 to 15 times faster than the fsave/frstor combination:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
1193    cycles for fsave/frstor
81      cycles for FpuSave 4
60      cycles for FpuSave 3
41      cycles for FpuSave 2
32      cycles for FpuSave 1

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
298     cycles for fsave/frstor
25      cycles for FpuSave 4
19      cycles for FpuSave 3
16      cycles for FpuSave 2
9       cycles for FpuSave 1


Source attached (MB not required, it's plain Masm32).

EDIT: Attachment removed, see new version below qWord's warning.

oex


Intel(R) Core(TM) i3-2310M CPU @ 2.10GHz (SSE4)
376     cycles for fsave/frstor
63      cycles for FpuSave 4
43      cycles for FpuSave 3
25      cycles for FpuSave 2
6       cycles for FpuSave 1

435     cycles for fsave/frstor
65      cycles for FpuSave 4
44      cycles for FpuSave 3
25      cycles for FpuSave 2
8       cycles for FpuSave 1
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

clive

AMD Phenom(tm) II X6 1055T Processor (SSE3)
286     cycles for fsave/frstor
63      cycles for FpuSave 4
47      cycles for FpuSave 3
35      cycles for FpuSave 2
30      cycles for FpuSave 1

285     cycles for fsave/frstor
56      cycles for FpuSave 4
43      cycles for FpuSave 3
41      cycles for FpuSave 2
30      cycles for FpuSave 1
It could be a random act of randomness. Those happen a lot as well.

qWord

hi,
your macros misalign the stack for all odd values of numregs. It should be N*16 instead of N*10.
FPU in a trice: SmplMath
It's that simple!

jj2007

Quote from: qWord on February 09, 2012, 06:58:00 PM
hi,
your macros misalign the stack for all odd values of numregs. It should be N*16 instead of N*10.

Thanks, qWord. You are absolutely right. It could be N*12, too, but *10 is definitely a fat bug :red
New version attached. The MasmBasic library has also been updated, as MasmBasic9February2012b.zip - "b" like "bug free" :bg

By the way:
FpuSave MACRO numregs:=<2>
LOCAL ct
  FpuSaveRegs=numregs
  FpuSaveSize=10  ; <<<<<<<<<<<<<<<< bad size
...
.code
FillFPU
FpuSave 3
MsgBox 0, "You won't see this", "Hi", MB_OK
FpuRestore


At least on Win XP, that shows a crippled MessageBox, due to the misaligned stack. And if you check in Olly, you will see that a simple MessageBox trashes all FPU regs :snooty:
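
The arithmetic behind the bug: with 10 bytes per register, N registers take 10*N bytes, which is a multiple of 4 only when N is even. For FpuSave 3 that is 30 bytes, so esp ends up 2 bytes off a dword boundary. With 12 or 16 bytes per slot, every N keeps esp dword-aligned. A small sketch of an assembly-time guard follows; it reuses the FpuSaveRegs and FpuSaveSize names from the snippet above, but the check itself is an assumption and not part of the posted macros.

```masm
FpuSaveRegs = 3
FpuSaveSize = 12                          ; 10 would give 30 bytes, and 30 AND 3 = 2
IF (FpuSaveRegs * FpuSaveSize) AND 3      ; non-zero means not a multiple of 4
  .err <FpuSave: stack adjustment is not a multiple of 4>
ENDIF
sub esp, FpuSaveRegs * FpuSaveSize        ; 36 bytes, esp stays dword-aligned
```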

clive

Intel(R) Pentium(R) D CPU 3.00GHz (SSE3)
1099    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1104    cycles for fsave/frstor
79      cycles for FpuSave 4
111     cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1103    cycles for fsave/frstor
113     cycles for FpuSave 4
59      cycles for FpuSave 3
42      cycles for FpuSave 2
32      cycles for FpuSave 1

1007    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

999     cycles for fsave/frstor
79      cycles for FpuSave 4
61      cycles for FpuSave 3
106     cycles for FpuSave 2
32      cycles for FpuSave 1


AMD Phenom(tm) II X6 1055T Processor (SSE3)
285     cycles for fsave/frstor
34      cycles for FpuSave 4
23      cycles for FpuSave 3
37      cycles for FpuSave 2
16      cycles for FpuSave 1

285     cycles for fsave/frstor
35      cycles for FpuSave 4
49      cycles for FpuSave 3
17      cycles for FpuSave 2
15      cycles for FpuSave 1

286     cycles for fsave/frstor
71      cycles for FpuSave 4
24      cycles for FpuSave 3
17      cycles for FpuSave 2
15      cycles for FpuSave 1

286     cycles for fsave/frstor
34      cycles for FpuSave 4
24      cycles for FpuSave 3
16      cycles for FpuSave 2
27      cycles for FpuSave 1

285     cycles for fsave/frstor
34      cycles for FpuSave 4
24      cycles for FpuSave 3
39      cycles for FpuSave 2
15      cycles for FpuSave 1

oex


Intel(R) Core(TM) i3-2310M CPU @ 2.10GHz (SSE4)
348     cycles for fsave/frstor
58      cycles for FpuSave 4
45      cycles for FpuSave 3
22      cycles for FpuSave 2
3       cycles for FpuSave 1

437     cycles for fsave/frstor
62      cycles for FpuSave 4
43      cycles for FpuSave 3
25      cycles for FpuSave 2
7       cycles for FpuSave 1

403     cycles for fsave/frstor
41      cycles for FpuSave 4
46      cycles for FpuSave 3
24      cycles for FpuSave 2
7       cycles for FpuSave 1

384     cycles for fsave/frstor
63      cycles for FpuSave 4
38      cycles for FpuSave 3
21      cycles for FpuSave 2
6       cycles for FpuSave 1

435     cycles for fsave/frstor
63      cycles for FpuSave 4
44      cycles for FpuSave 3
24      cycles for FpuSave 2
8       cycles for FpuSave 1

DRX

AMD E-350 Processor (SSE4)
344     cycles for fsave/frstor
56      cycles for FpuSave 4
43      cycles for FpuSave 3
41      cycles for FpuSave 2
19      cycles for FpuSave 1

281     cycles for fsave/frstor
56      cycles for FpuSave 4
50      cycles for FpuSave 3
26      cycles for FpuSave 2
19      cycles for FpuSave 1

281     cycles for fsave/frstor
67      cycles for FpuSave 4
43      cycles for FpuSave 3
26      cycles for FpuSave 2
19      cycles for FpuSave 1

282     cycles for fsave/frstor
57      cycles for FpuSave 4
43      cycles for FpuSave 3
26      cycles for FpuSave 2
27      cycles for FpuSave 1

276     cycles for fsave/frstor
56      cycles for FpuSave 4
43      cycles for FpuSave 3
41      cycles for FpuSave 2
19      cycles for FpuSave 1

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
1108    cycles for fsave/frstor
113     cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1105    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
106     cycles for FpuSave 1

1104    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
40      cycles for FpuSave 2
32      cycles for FpuSave 1

1009    cycles for fsave/frstor
81      cycles for FpuSave 4
113     cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1016    cycles for fsave/frstor
114     cycles for FpuSave 4
59      cycles for FpuSave 3
41      cycles for FpuSave 2
32      cycles for FpuSave 1

braincell

Thanks, this should come in very handy and speed up my debugging, especially since I'm almost always working with the FPU.  :U

Intel(R) Core(TM)2 Duo CPU     E6750  @ 2.66GHz (SSE4)
381     cycles for fsave/frstor
29      cycles for FpuSave 4
20      cycles for FpuSave 3
14      cycles for FpuSave 2
4       cycles for FpuSave 1

356     cycles for fsave/frstor
30      cycles for FpuSave 4
31      cycles for FpuSave 3
12      cycles for FpuSave 2
9       cycles for FpuSave 1

363     cycles for fsave/frstor
49      cycles for FpuSave 4
20      cycles for FpuSave 3
25      cycles for FpuSave 2
5       cycles for FpuSave 1

346     cycles for fsave/frstor
39      cycles for FpuSave 4
20      cycles for FpuSave 3
15      cycles for FpuSave 2
11      cycles for FpuSave 1

337     cycles for fsave/frstor
28      cycles for FpuSave 4
32      cycles for FpuSave 3
28      cycles for FpuSave 2
4       cycles for FpuSave 1

clive

AMD A6-3400M APU with Radeon(tm) HD Graphics (SSE3)
233     cycles for fsave/frstor
60      cycles for FpuSave 4
40      cycles for FpuSave 3
64      cycles for FpuSave 2
27      cycles for FpuSave 1

228     cycles for fsave/frstor
35      cycles for FpuSave 4
85      cycles for FpuSave 3
12      cycles for FpuSave 2
10      cycles for FpuSave 1

190     cycles for fsave/frstor
41      cycles for FpuSave 4
27      cycles for FpuSave 3
29      cycles for FpuSave 2
30      cycles for FpuSave 1

230     cycles for fsave/frstor
59      cycles for FpuSave 4
39      cycles for FpuSave 3
29      cycles for FpuSave 2
51      cycles for FpuSave 1

226     cycles for fsave/frstor
61      cycles for FpuSave 4
41      cycles for FpuSave 3
66      cycles for FpuSave 2
30      cycles for FpuSave 1


Intel(R) Atom(TM) CPU N450   @ 1.66GHz (SSE4)
600     cycles for fsave/frstor
111     cycles for FpuSave 4
79      cycles for FpuSave 3
51      cycles for FpuSave 2
27      cycles for FpuSave 1

552     cycles for fsave/frstor
103     cycles for FpuSave 4
96      cycles for FpuSave 3
49      cycles for FpuSave 2
28      cycles for FpuSave 1

549     cycles for fsave/frstor
118     cycles for FpuSave 4
76      cycles for FpuSave 3
53      cycles for FpuSave 2
25      cycles for FpuSave 1

512     cycles for fsave/frstor
104     cycles for FpuSave 4
75      cycles for FpuSave 3
50      cycles for FpuSave 2
22      cycles for FpuSave 1

514     cycles for fsave/frstor
103     cycles for FpuSave 4
77      cycles for FpuSave 3
82      cycles for FpuSave 2
28      cycles for FpuSave 1

jj2007

Clive,
Are you running a lab??
:bg

Quote from: clive on February 11, 2012, 03:40:19 AM
Intel(R) Pentium(R) D CPU 3.00GHz (SSE3)
1099    cycles for fsave/frstor
79      cycles for FpuSave 4

AMD Phenom(tm) II X6 1055T Processor (SSE3)
285     cycles for fsave/frstor
34      cycles for FpuSave 4

AMD A6-3400M APU with Radeon(tm) HD Graphics (SSE3)
233     cycles for fsave/frstor
60      cycles for FpuSave 4

Intel(R) Atom(TM) CPU N450   @ 1.66GHz (SSE4)
600     cycles for fsave/frstor
111     cycles for FpuSave 4

vanjast

Intel(R) Pentium(R) D CPU 3.20GHz (SSE3)
1095    cycles for fsave/frstor
112     cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

1095    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
109     cycles for FpuSave 1

1089    cycles for fsave/frstor
79      cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

999     cycles for fsave/frstor
79      cycles for FpuSave 4
113     cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1

999     cycles for fsave/frstor
118     cycles for FpuSave 4
59      cycles for FpuSave 3
39      cycles for FpuSave 2
32      cycles for FpuSave 1


--- ok ---

clive

Quote from: jj2007
Clive, Are you running a lab??

Actually it's more like a Bat Cave than a Lab.

I do have a diverse collection of hardware at my disposal, of which this is a subset. I ignored the machines that duplicated results already provided: for example, the AMD E-350 and C-50 systems are identical performers, as are the 4- and 6-core Phenom IIs.

Timing across a wide spectrum was what you were after, right?


jj2007

Quote from: clive on February 11, 2012, 02:07:09 PM
Timing across a wide spectrum was what you were after, right?

Yes, thanks :thumbu