The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: jj2007 on October 25, 2010, 01:45:53 PM

Title: Floating point comparisons
Post by: jj2007 on October 25, 2010, 01:45:53 PM
I am testing an algo that yields the max and min values in a given array of 128 REAL8 variables. One uses the FPU, the other SSE2.
Grateful for some timings and/or suggestions,
Jochen

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
3747    cycles for fCmpFpu
1794    cycles for fCmpXmm
1784    cycles for fCmpXmmNf
Title: Re: Floating point comparisons
Post by: Antariy on October 25, 2010, 01:52:57 PM
What if you use MAXPD and MINPD?



Alex
Title: Re: Floating point comparisons
Post by: oex on October 25, 2010, 02:08:31 PM

AMD Sempron(tm) Processor 3100+ (SSE3)
1520    cycles for fCmpFpu
1096    cycles for fCmpXmm
1089    cycles for fCmpXmmNf

1513    cycles for fCmpFpu
1099    cycles for fCmpXmm
1099    cycles for fCmpXmmNf

85       bytes for fCmpFpu
78       bytes for fCmpXmm
78       bytes for fCmpXmmNf


OMG.... What's with the Pentium timings? My AMD clocks at 1.8 GHz.... Now I know why my code runs as fast as yours :lol
Title: Re: Floating point comparisons
Post by: dedndave on October 25, 2010, 02:31:31 PM
is it safe to assume all the values are valid (i.e. no NaNs)?
well - had an idea - but don't think it's valid - lol

let's try another idea...

can't you just test the high order dword (as though they were integers - without using the FPU) ?
only if they are equal do you need to compare the remaining bits
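
Dave's integer-compare idea can be sketched roughly as follows. It is written in C rather than MASM only so that it can be compiled and checked anywhere; the names real8_key, cmp_real8_int and minmax_int are hypothetical and not from the thread. One assumption worth flagging: comparing the raw high dword as a signed integer gives the right order only when both values are non-negative. REAL8 values are sign-magnitude, so this sketch flips the magnitude bits of negative values first. NaNs are assumed absent, and -0.0 sorts below +0.0 here, unlike a real floating-point compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Split a REAL8 into a signed high dword and an unsigned low dword whose
   integer order matches the floating-point order (assumes no NaNs). */
static void real8_key(double d, int32_t *hi, uint32_t *lo)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);      /* raw IEEE 754 bit pattern */
    uint32_t h = (uint32_t)(bits >> 32);
    uint32_t l = (uint32_t)bits;
    if (h & 0x80000000u) {               /* negative value: sign-magnitude */
        h ^= 0x7FFFFFFFu;                /* flip magnitude bits, keep the sign bit */
        l = ~l;
    }
    memcpy(hi, &h, sizeof h);            /* reinterpret the high dword as signed */
    *lo = l;
}

/* Returns <0, 0 or >0, using integer compares only. */
static int cmp_real8_int(double a, double b)
{
    int32_t ha, hb;
    uint32_t la, lb;
    real8_key(a, &ha, &la);
    real8_key(b, &hb, &lb);
    if (ha != hb) return ha < hb ? -1 : 1;   /* high dwords decide in most cases */
    if (la != lb) return la < lb ? -1 : 1;   /* low dwords only when high dwords tie */
    return 0;
}

/* Min and max of an array without touching the FPU or SSE compare instructions. */
static void minmax_int(const double *a, size_t n, double *lo, double *hi)
{
    assert(n > 0);
    *lo = *hi = a[0];
    for (size_t i = 1; i < n; i++) {
        if (cmp_real8_int(a[i], *lo) < 0) *lo = a[i];
        if (cmp_real8_int(a[i], *hi) > 0) *hi = a[i];
    }
}
```

Whether this beats the FPU or SSE2 versions is not something the sketch can show; that would need timing on the CPUs listed in this thread.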
Title: Re: Floating point comparisons
Post by: jj2007 on October 25, 2010, 02:35:31 PM
Quote from: Antariy on October 25, 2010, 01:52:57 PM
What if you use MAXPD and MINPD?

Alex

Alex, you are a real friend :bg
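.Repeat
	minsd xmm2, REAL8 ptr [edx]	; xmm2 = running minimum
	maxsd xmm3, REAL8 ptr [edx]	; xmm3 = running maximum
	add edx, 8			; advance to the next REAL8
	dec ecx
.Until Sign?				; exit once ecx drops below zero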

Under 1000 cycles on the Pentium - thanxalot :U
Note that maxpd with a memory operand throws an exception if the address is not 16-byte aligned; maxsd has no such alignment requirement.

@Dave: > is it safe to assume all the values are valid (i.e. no NaNs)?
Yes, the array is composed of valid REAL8 numbers
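
For readers without an assembler at hand, the minsd/maxsd loop and the packed alternative Alex suggested can be sketched with the SSE2 intrinsics from emmintrin.h. This is a rough C sketch, not the code that was timed in this thread; the function names minmax_sd and minmax_pd are made up for illustration. The packed version uses unaligned loads (movupd) as one way around the 16-byte alignment requirement mentioned above, and it assumes an x86 target with SSE2 enabled (the default on x86-64; 32-bit GCC needs -msse2).

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Scalar version: one minsd/maxsd pair per element. */
static void minmax_sd(const double *a, size_t n, double *lo, double *hi)
{
    assert(n > 0);
    __m128d vmin = _mm_load_sd(a);        /* start with the first element */
    __m128d vmax = vmin;
    for (size_t i = 1; i < n; i++) {
        __m128d x = _mm_load_sd(a + i);
        vmin = _mm_min_sd(vmin, x);       /* minsd */
        vmax = _mm_max_sd(vmax, x);       /* maxsd */
    }
    _mm_store_sd(lo, vmin);
    _mm_store_sd(hi, vmax);
}

/* Packed version: two REAL8 values per minpd/maxpd. */
static void minmax_pd(const double *a, size_t n, double *lo, double *hi)
{
    assert(n >= 2);
    __m128d vmin = _mm_loadu_pd(a);       /* movupd: no alignment needed */
    __m128d vmax = vmin;
    size_t i = 2;
    for (; i + 2 <= n; i += 2) {
        __m128d x = _mm_loadu_pd(a + i);
        vmin = _mm_min_pd(vmin, x);       /* minpd */
        vmax = _mm_max_pd(vmax, x);       /* maxpd */
    }
    /* Fold the upper lane into the lower lane. */
    vmin = _mm_min_sd(vmin, _mm_unpackhi_pd(vmin, vmin));
    vmax = _mm_max_sd(vmax, _mm_unpackhi_pd(vmax, vmax));
    if (i < n) {                          /* odd element count: one value left */
        __m128d x = _mm_load_sd(a + i);
        vmin = _mm_min_sd(vmin, x);
        vmax = _mm_max_sd(vmax, x);
    }
    _mm_store_sd(lo, vmin);
    _mm_store_sd(hi, vmax);
}
```

On Dave's NaN question: minsd and maxsd return the second (source) operand when either operand is a NaN, so with the running value in the register and the array element as the source, a NaN in the array would silently replace the running min or max. With an array of valid REAL8 numbers that case does not arise.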
Title: Re: Floating point comparisons
Post by: brethren on October 25, 2010, 06:28:44 PM
AMD Turion(tm) 64 X2 Mobile Technology TL-52 (SSE3)
1517    cycles for fCmpFpu
1089    cycles for fCmpXmm
1094    cycles for fCmpXmmNf

1516    cycles for fCmpFpu
1088    cycles for fCmpXmm
1100    cycles for fCmpXmmNf

85       bytes for fCmpFpu
78       bytes for fCmpXmm
78       bytes for fCmpXmmNf

--- ok ---
Title: Re: Floating point comparisons
Post by: RuiLoureiro on October 25, 2010, 06:38:23 PM
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
3091    cycles for fCmpFpu
1670    cycles for fCmpXmm
1718    cycles for fCmpXmmNf

5766    cycles for fCmpFpu
1912    cycles for fCmpXmm
1812    cycles for fCmpXmmNf

85       bytes for fCmpFpu
78       bytes for fCmpXmm
78       bytes for fCmpXmmNf

--- ok ---
Title: Re: Floating point comparisons
Post by: dioxin on October 25, 2010, 08:22:26 PM
If doing it with the FPU, wouldn't it make more sense (and more speed) to use the FCOMI and FCMOV instructions to avoid the slow interaction with the flags via ax?

Paul.
Title: Re: Floating point comparisons
Post by: jj2007 on October 25, 2010, 10:29:22 PM
Quote from: dioxin on October 25, 2010, 08:22:26 PM
If doing it with the FPU, wouldn't it make more sense (and more speed) to use the FCOMI and FCMOV instructions to avoid the slow interaction with the flags via ax?

Paul.


Yes it does - thanks Paul :U
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
1151    cycles for fCmpFpu  (fcomi)
1727    cycles for fCmpFpu2 (fcom)
1162    cycles for fCmpXmm  (comisd)
725     cycles for fMinMax  (minsd)
721     cycles for fMinMax1 (minsd)

1326    cycles for fCmpFpu  (fcomi)
1727    cycles for fCmpFpu2 (fcom)
1158    cycles for fCmpXmm  (comisd)
731     cycles for fMinMax  (minsd)
743     cycles for fMinMax1 (minsd)

1326    cycles for fCmpFpu  (fcomi)
1727    cycles for fCmpFpu2 (fcom)
1163    cycles for fCmpXmm  (comisd)
725     cycles for fMinMax  (minsd)
721     cycles for fMinMax1 (minsd)

75       bytes for fCmpFpu
85       bytes for fCmpFpu2
77       bytes for fCmpXmm
60       bytes for fMinMax
66       bytes for fMinMax1


Note that the first loop is consistently faster, no idea why.
Title: Re: Floating point comparisons
Post by: oex on October 25, 2010, 10:51:27 PM
I get an:

error A2070: invalid instruction operands

on lines 236, 254 and 256.... Why would that happen? It is an SSE2 instruction and I have SSE2

   movsd xmm0, qword ptr fMinMaxHigh-4   ; about 1.79e308

   movsd REAL8 ptr [eax], xmm0

   movsd REAL8 ptr [eax], xmm1
Title: Re: Floating point comparisons
Post by: Antariy on October 25, 2010, 10:55:19 PM
Quote from: oex on October 25, 2010, 10:51:27 PM
I get an:

error A2070: invalid instruction operands

on lines 236, 254 and 256.... Why would that happen? It is an SSE2 instruction and I have SSE2

   movsd xmm0, qword ptr fMinMaxHigh-4   ; about 1.79e308

   movsd REAL8 ptr [eax], xmm0

   movsd REAL8 ptr [eax], xmm1

You are probably using ML 6.15? This is a bug in it: it confuses the integer (string) MOVSD with the SIMD MOVSD.
Just download ML 8 - it works.



Alex
Title: Re: Floating point comparisons
Post by: Antariy on October 25, 2010, 10:57:56 PM
Jochen, here are the results:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
2246    cycles for fCmpFpu  (fcomi)
5878    cycles for fCmpFpu2 (fcom)
1657    cycles for fCmpXmm  (comisd)
712     cycles for fMinMax  (minsd)
708     cycles for fMinMax1 (minsd)

2368    cycles for fCmpFpu  (fcomi)
5856    cycles for fCmpFpu2 (fcom)
1705    cycles for fCmpXmm  (comisd)
712     cycles for fMinMax  (minsd)
925     cycles for fMinMax1 (minsd)

2290    cycles for fCmpFpu  (fcomi)
5740    cycles for fCmpFpu2 (fcom)
1662    cycles for fCmpXmm  (comisd)
700     cycles for fMinMax  (minsd)
706     cycles for fMinMax1 (minsd)

75       bytes for fCmpFpu
85       bytes for fCmpFpu2
77       bytes for fCmpXmm
60       bytes for fMinMax
66       bytes for fMinMax1


Hardware is still faster  :bg



Alex
Title: Re: Floating point comparisons
Post by: oex on October 25, 2010, 10:58:26 PM
Ah kk yep used 6.15 ty Alex....

http://www.masm32.com/board/index.php?topic=12719.msg98468#msg98468


AMD Sempron(tm) Processor 3100+ (SSE3)
1263    cycles for fCmpFpu  (fcomi)
1667    cycles for fCmpFpu2 (fcom)
1144    cycles for fCmpXmm  (comisd)
408     cycles for fMinMax  (minsd)
413     cycles for fMinMax1 (minsd)

1321    cycles for fCmpFpu  (fcomi)
1670    cycles for fCmpFpu2 (fcom)
1128    cycles for fCmpXmm  (comisd)
409     cycles for fMinMax  (minsd)
413     cycles for fMinMax1 (minsd)

1333    cycles for fCmpFpu  (fcomi)
1677    cycles for fCmpFpu2 (fcom)
1126    cycles for fCmpXmm  (comisd)
408     cycles for fMinMax  (minsd)
409     cycles for fMinMax1 (minsd)

75       bytes for fCmpFpu
85       bytes for fCmpFpu2
77       bytes for fCmpXmm
60       bytes for fMinMax
66       bytes for fMinMax1
Title: Re: Floating point comparisons
Post by: clive on October 25, 2010, 11:49:08 PM
Intel(R) Atom(TM) CPU N270   @ 1.60GHz (SSE4)
4539    cycles for fCmpFpu  (fcomi)
5228    cycles for fCmpFpu2 (fcom)
3785    cycles for fCmpXmm  (comisd)
1257    cycles for fMinMax  (minsd)
1278    cycles for fMinMax1 (minsd)

4159    cycles for fCmpFpu  (fcomi)
5313    cycles for fCmpFpu2 (fcom)
3664    cycles for fCmpXmm  (comisd)
1265    cycles for fMinMax  (minsd)
1260    cycles for fMinMax1 (minsd)

4158    cycles for fCmpFpu  (fcomi)
5338    cycles for fCmpFpu2 (fcom)
3663    cycles for fCmpXmm  (comisd)
1257    cycles for fMinMax  (minsd)
1263    cycles for fMinMax1 (minsd)

75       bytes for fCmpFpu
85       bytes for fCmpFpu2
77       bytes for fCmpXmm
60       bytes for fMinMax
66       bytes for fMinMax1

--- ok ---