m32lib GetPercent need a correction

Started by ToutEnMasm, November 07, 2010, 06:54:01 AM

Antariy

Quote from: dedndave on December 01, 2010, 11:43:54 PM
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
24      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
29      cycles for GetPercent2nc
26      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

25      cycles for GetPercentSSE
47      cycles for GetPercent
30      cycles for GetPercent2c
30      cycles for GetPercent2nc
26      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

:U

Thanks, Dave! Tested under the testbed's rules, it doesn't look bad.

dedndave

i am beginning to think my machine gives the worst results - lol
that makes it good for testing, at least
it runs fast enough (with my tweaks)

Antariy

Quote from: dedndave on December 01, 2010, 11:49:48 PM
i am beginning to think my machine gives the worst results - lol
that makes it good for testing, at least
it runs fast enough (with my tweaks)

No, your results are excellent...
...because they are equal to my results :P

dedndave

it looks ok this time
except this one...
24      cycles for GetPercent2c
30      cycles for GetPercent2c
i think i have a way to fix that problem, though   :P

frktons

Quote from: Antariy on December 01, 2010, 11:13:54 PM
Quote from: hutch-- on December 01, 2010, 10:34:56 PM
Alex,

I get a GP fault out of your last zip on xp sp2.

That's apparently something in the ChkPrecision/FPU code. AxGetPercentInt has no instructions that can cause a GPF.

I have checked this and found the reason: Jochen intentionally made the FPU stack overflow in the macro:

Algo MACRO arg
finit
REPEAT 8
fldpi
ENDM


The ChkPrecision code uses simple FPU code as the reference ("etalone"), and this simple code does not handle the case where the FPU stack is full. This is why you get the exception.

I have now commented out the FLDPI line, and it should work properly.

As I said, that's not a bug in my code; Jochen just did not write documentation for his testing variant :bg

Hutch, and all, test this one, please.



Alex


It doesn't work on my PC either. GPF:


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
-2125943931 for 7FFF0000/1, case 0


It probably comes down to the fact that you are not using the right Testbed  :lol

Mind is like a parachute. You know what to do in order to use it :-)

Antariy

Quote from: frktons on December 02, 2010, 12:00:35 AM
It doesn't work on my PC either. GPF:

It probably comes down to the fact that you are not using the right Testbed  :lol

Of course, that's the influence of the old testbed.  :lol

Well, I have no desire to dig through all the code of the entire testbed to find the culprit.
It works on PIV cores, so maybe it is something in the FPU part somewhere that leads to inexact results etc. on other hardware.

frktons

Quote from: Antariy on December 02, 2010, 12:06:56 AM
Of course, that's the influence of the old testbed.  :lol

Well, I have no desire to dig through all the code of the entire testbed to find the culprit.
It works on PIV cores, so maybe it is something in the FPU part somewhere that leads to inexact results etc. on other hardware.

We have worked hard to create a new Testbed, compatible with old machines and MASM versions.
It is a pity we have to see these horrible interfaces again. And they have some bugs as well  ::)
:naughty: :naughty: :snooty: :snooty: :naughty: :naughty:
Mind is like a parachute. You know what to do in order to use it :-)

Antariy

Quote from: frktons on December 02, 2010, 12:10:20 AM
We have worked hard to create a new Testbed, compatible with old machines and MASM versions.
It is a pity we have to see these horrible interfaces again. And they have some bugs as well  ::)
:naughty: :naughty: :snooty: :snooty: :naughty: :naughty:

:green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2

jj2007

Quote from: Antariy on December 01, 2010, 11:13:54 PM
Quote from: hutch-- on December 01, 2010, 10:34:56 PM
Alex,

I get a GP fault out of your last zip on xp sp2.

That's apparently something in the ChkPrecision/FPU code. AxGetPercentInt has no instructions that can cause a GPF.

I have checked this and found the reason: Jochen intentionally made the FPU stack overflow in the macro:


Alex,

First, I don't make the FPU stack overflow - I just fill the FPU with valid numbers. From a general purpose algo, I would expect that it works even if other parts of the code use the FPU. That is what the ffree instruction is meant for.

Second, the GPF is caused by an int 3 in ChkPrecision.

Third, after commenting out the int 3, I see a series of incorrect results. Who wrote GetPercentEtalone?

Antariy

Quote from: jj2007 on December 02, 2010, 12:17:38 AM
Quote from: Antariy on December 01, 2010, 11:13:54 PM
Quote from: hutch-- on December 01, 2010, 10:34:56 PM
Alex,

I get a GP fault out of your last zip on xp sp2.

That's apparently something in the ChkPrecision/FPU code. AxGetPercentInt has no instructions that can cause a GPF.

I have checked this and found the reason: Jochen intentionally made the FPU stack overflow in the macro:


Alex,

First, I don't make the FPU stack overflow - I just fill the FPU with valid numbers. From a general purpose algo, I would expect that it works even if other parts of the code use the FPU. That is what the ffree instruction is meant for.

Second, the GPF is caused by an int 3 in ChkPrecision.

Third, after commenting out the int 3, I see a series of incorrect results. Who wrote GetPercentEtalone?

:P

Well, not an overflow, but you are setting it up for a possible overflow later :P  :lol

A GPF is not int 3 (int 3 is a debugging exception and has a different code). So we were misinformed :P

The Etalone proc was written because GetPercent does not work properly for 2^31 and above. The FPU loads integers only as signed, so when you load a DWORD the highest bit is treated as the sign. You can see wrong results from GetPercent as well.
"Etalone" was written to handle this, but probably too quickly and while too tired :P
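
To illustrate the point about the sign bit, here is a minimal C sketch. It assumes that loading a DWORD into the FPU behaves like a signed 32-bit to floating-point conversion; both function names are hypothetical and are not taken from the testbed sources.

```c
#include <stdint.h>

/* Hypothetical helper: models a signed 32-bit load, where bit 31 of
   the DWORD is interpreted as the sign (two's complement). */
double load_dword_signed(uint32_t bits)
{
    if (bits >= 0x80000000u) {
        /* Values with the top bit set come out negative. */
        return (double)bits - 4294967296.0;
    }
    return (double)bits;
}

/* Hypothetical helper: models the unsigned reading of the same bits.
   Every 32-bit value fits exactly in a double, so nothing is lost. */
double load_dword_unsigned(uint32_t bits)
{
    return (double)bits;
}
```

Under that assumption, 80000000h comes out as -2147483648 with the signed load and as 2147483648 with the unsigned reading, which is why an FPU-based percent routine can go wrong from 2^31 upward.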

Antariy

Quote from: jj2007 on December 02, 2010, 12:17:38 AM
From a general purpose algo, I would expect that it works even if other parts of the code use the FPU. That is what the ffree instruction is meant for.

In a big program, I would really not expect any code to free registers that may contain my variables... The FPU rules for general-purpose algos require you *not* to hold FP values in the registers when calling external code.

jj2007

Quote from: Antariy on December 02, 2010, 12:29:07 AM

In a big program, I would really not expect any code to free registers that may contain my variables... The FPU rules for general-purpose algos require you *not* to hold FP values in the registers when calling external code.


From Raymond Filiatreault's FpuLib help:
Unless a source parameter was specified as being in the TOP data register, the original Fpulib was designed to initialize the FPU to prevent any potential "stack overflow". This destroyed any data which may have been present in the other FPU registers. This was revised later to destroy only the data (if any) in the registers which were necessary to perform the function. This new version will not destroy any of the existing data, except possibly the data in the ST(7) register.

Re range of algos: Give me one reason why invoke &algo, -1000, 55 should not return -550.

Antariy

Quote from: jj2007 on December 02, 2010, 12:46:40 AM
Quote from: Antariy on December 02, 2010, 12:29:07 AM

In a big program, I would really not expect any code to free registers that may contain my variables... The FPU rules for general-purpose algos require you *not* to hold FP values in the registers when calling external code.


From Raymond Filiatreault's FpuLib help:
Unless a source parameter was specified as being in the TOP data register, the original Fpulib was designed to initialize the FPU to prevent any potential "stack overflow". This destroyed any data which may have been present in the other FPU registers. This was revised later to destroy only the data (if any) in the registers which were necessary to perform the function. This new version will not destroy any of the existing data, except possibly the data in the ST(7) register.

Re range of algos: Give me one reason why invoke &algo, -1000, 55 should not return -550.


But you know about this feature of FpuLib, right? So it should be documented, at least. And this applies to programming in ASM only, where you control the whole flow of the program. When I talk about "general purpose" algos, I mean algos which can be used in any environment, even with an HLL... To strictly follow API rules, you should not hold FPU data in the registers. This is disputable, of course, but only for ASM.

Well, if you want to (or are forced by the FPU to) treat the DWORD as signed, then you are right. But for unsigned that's wrong. The right result is 8CCCCAA6, and that is what the unsigned integer code returns.
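
The two readings of invoke &algo, -1000, 55 can be checked with a small C sketch. This is not the code of any algo in the thread: the function names are hypothetical, and it assumes a 64-bit intermediate product and truncating division.

```c
#include <stdint.h>

/* Signed reading: the DWORD is a two's-complement 32-bit integer. */
int32_t percent_signed(int32_t value, int32_t percent)
{
    /* Widen to 64 bits so the product cannot overflow. */
    return (int32_t)(((int64_t)value * percent) / 100);
}

/* Unsigned reading: the same 32 bits taken as 0..4294967295. */
uint32_t percent_unsigned(uint32_t value, uint32_t percent)
{
    return (uint32_t)(((uint64_t)value * percent) / 100u);
}
```

With these assumptions, percent_signed(-1000, 55) gives -550, while percent_unsigned(0xFFFFFC18, 55) gives 0x8CCCCAA6, where FFFFFC18h is the bit pattern of -1000. Both numbers from the discussion are therefore consistent; they only differ in how the input DWORD is interpreted.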



Alex

hutch--

Sorry Alex, no go here.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
-2125943931 for 7FFF0000/1, case 26750588


Press a key after this and you have a GP fault.

I am using XP SP3.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Antariy

Quote from: hutch-- on December 02, 2010, 01:35:52 AM
Sorry Alex, no go here.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
-2125943931 for 7FFF0000/1, case 26750588


Press a key after this and you have a GP fault.

I am using XP SP3.

Hutch, try commenting out

call ChkPrecision


in the sources, please. The code is precise enough for integer code, and I (and Dave, and Luce) have no problems with inexact results.
To sidestep any flaw in the checking code, comment out the call to the check. Then you will go straight to the timing tests.

Thank you!



Alex