The MASM Forum Archive 2004 to 2012

Project Support Forums => MASM32 => Topic started by: ToutEnMasm on November 07, 2010, 06:54:01 AM

Title: m32lib GetPercent need a correction
Post by: ToutEnMasm on November 07, 2010, 06:54:01 AM
Quote
GetPercent proc source:DWORD, percent:DWORD

    LOCAL var1:DWORD

    mov var1, 100   ; to divide by 100

    fild source     ; load source integer
    fild var1       ; load 100
    fdiv            ; divide source by 100
    fild percent    ; load required percentage
    fmul            ; multiply 1% by required percentage
    fistp var1      ; store result in variable
    mov eax, var1          ;FPU STACK is +2 HERE and can't be used as this
    FINIT    ;---------- <<<<<<<<<<<< Added correction needed file getpcnt.asm
    ret

GetPercent endp
Title: Re: m32lib GetPercent need a correction
Post by: MichaelW on November 07, 2010, 08:44:16 AM
I can't find any problem. The FDIV and FMUL are actually encoded as FDIVP and FMULP (or at least they are if I assemble with ML 6.15).
Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on November 07, 2010, 09:48:52 AM
These instructions aren't the problem, I agree. If you have read the comments, the problem is:
Quote
THE FPU STACK IS AT +2 AT THE END OF THE FUNCTION, NOT 0
The function can't be called again without an ERROR.
Title: Re: m32lib GetPercent need a correction
Post by: MichaelW on November 07, 2010, 10:28:46 AM
The FPU stack is empty after the FISTP. I can call the procedure repeatedly without problems.
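
The disagreement comes down to stack accounting, which a small model makes concrete. The following Python sketch is not from the thread. It assumes, as MichaelW reports for ML 6.15, that the operand-less FDIV and FMUL assemble to the popping FDIVP and FMULP forms; the `arith_pops` switch is a hypothetical addition that reproduces the +2 count from the first post.

```python
# Net change in x87 stack depth per instruction. The -1 for fdiv/fmul
# assumes the operand-less forms are assembled as FDIVP/FMULP.
STACK_EFFECT = {"fild": +1, "fdiv": -1, "fmul": -1, "fistp": -1}

# The FPU instructions of the original GetPercent, in order.
GET_PERCENT = ["fild", "fild", "fdiv", "fild", "fmul", "fistp"]


def final_depth(instructions, arith_pops=True):
    """Return the stack depth after running the sequence from an empty stack.

    arith_pops is a hypothetical switch: False models an assembler that
    would emit the non-popping forms of fdiv/fmul.
    """
    depth = 0
    for name in instructions:
        if name in ("fdiv", "fmul") and not arith_pops:
            continue  # non-popping arithmetic leaves the depth unchanged
        depth += STACK_EFFECT[name]
    return depth


print(final_depth(GET_PERCENT))                    # 0
print(final_depth(GET_PERCENT, arith_pops=False))  # 2
```

With the popping forms the procedure leaves the stack balanced; only the non-popping reading gives the +2 residue.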
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on November 07, 2010, 10:37:08 AM
Michael is right. The problem exists only in your imagination. However, GetPercent can certainly be optimised a little bit - 20 bytes instead of 36:
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

GetPercent proc source:DWORD, percent:DWORD
fild dword ptr [esp+4] ; load source integer
push 100
fidiv dword ptr [esp] ; divide source by 100
fimul dword ptr [esp+12] ; multiply 1% by required percentage
fistp dword ptr [esp] ; store result on stack
pop eax ; return in eax
ret 8
GetPercent endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on November 07, 2010, 01:44:11 PM
Perhaps you can make an effort: count on your fingers
Quote
  mov var1, 100   ; to divide by 100

    fild source     ; stack +1
    fild var1       ; stack +1
    fdiv            ;
    fild percent    ; stack +1
    fmul            ;
    fistp var1      ;stack -1
    mov eax, var1         
    FINIT    ;---------- <<<<<<<<<<<< Added correction needed file getpcnt.asm
    ret
3-1 =2       not 0   

If you want more proof, try a loop with 4 or 5 calls of the function




 
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on November 07, 2010, 02:15:13 PM
Quote from: ToutEnMasm on November 07, 2010, 01:44:11 PM
Perhaps you can make an effort: count on your fingers

Perhaps you can make an effort: read Michael's comments ("The FDIV and FMUL are actually encoded as FDIVP and FMULP"), or launch Olly to see for yourself.

And still, it could be optimised :bg

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
27      cycles for GetPercent
11      cycles for GetPercentJJ1
9       cycles for GetPercentJJ2

32      cycles for GetPercent
11      cycles for GetPercentJJ1
9       cycles for GetPercentJJ2

32      cycles for GetPercent
11      cycles for GetPercentJJ1
9       cycles for GetPercentJJ2

Code sizes:
36      for GetPercent
32      for GetPercentJJ1
32      for GetPercentJJ2
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on November 07, 2010, 02:48:22 PM
he may be using a different assembler   :P

might try the explicit...

fmul st0,st1
Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on November 07, 2010, 02:54:59 PM
Quote
The FDIV and FMUL are actually encoded as FDIVP and FMULP"), or launch Olly to see yourself.
And what do you do if you have to work with an old compiler?
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on November 07, 2010, 02:59:20 PM
Quote from: ToutEnMasm on November 07, 2010, 02:54:59 PM
Quote
The FDIV and FMUL are actually encoded as FDIVP and FMULP"), or launch Olly to see yourself.
And what do you do if you have to work with an old compiler?

ML 6.14, 6.15, 9.0 and JWasm all exhibit this behaviour. If you have a "compiler" older than 6.14, move it into the dustbin.
Title: Re: m32lib GetPercent need a correction
Post by: FORTRANS on November 07, 2010, 03:00:46 PM
Hi,

   CPU and FPU do not matter.  It is the assembler that
encodes the implied POP.  It is a matter of convention in
MASM from version 1.0 onwards.  Confuses some more
than others, and is only used with the one operand form.
IIRC.

Regards,

Steve N.
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on November 07, 2010, 05:40:56 PM
i wasn't thinking he may be using an older masm
but, maybe tasm or fasm or some other creature
not sure how GoAsm handles it - knowing Jeremy, it is probably masm-compatible
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on November 07, 2010, 06:45:25 PM
Just for fun, the non-FPU integer (JJ3) and SSE2 versions:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
27      cycles for GetPercent
10      cycles for GetPercentJJ1
9       cycles for GetPercentJJ2
35      cycles for GetPercentJJ3
11      cycles for GetPercentJJ4
7       cycles for GetPercentSSE

Code sizes:
32      for GetPercentJJ1
32      for GetPercentJJ2
17      for GetPercentJJ3
33      for GetPercentJJ4
39      for GetPercentSSE
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on November 07, 2010, 06:55:00 PM
prescott...
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
38      cycles for GetPercent
24      cycles for GetPercentJJ1
20      cycles for GetPercentJJ2
46      cycles for GetPercentJJ3
21      cycles for GetPercentJJ4
18      cycles for GetPercentSSE

43      cycles for GetPercent
21      cycles for GetPercentJJ1
18      cycles for GetPercentJJ2
42      cycles for GetPercentJJ3
22      cycles for GetPercentJJ4
21      cycles for GetPercentSSE
Title: Re: m32lib GetPercent need a correction
Post by: clive on November 07, 2010, 07:53:19 PM
Atom
Quote
Intel(R) Atom(TM) CPU N270   @ 1.60GHz (SSE4)
117     cycles for GetPercent
51      cycles for GetPercentJJ1
46      cycles for GetPercentJJ2
100     cycles for GetPercentJJ3
50      cycles for GetPercentJJ4
40      cycles for GetPercentSSE

125     cycles for GetPercent
48      cycles for GetPercentJJ1
46      cycles for GetPercentJJ2
87      cycles for GetPercentJJ3
51      cycles for GetPercentJJ4
43      cycles for GetPercentSSE

Code sizes:
32      for GetPercentJJ1
32      for GetPercentJJ2
17      for GetPercentJJ3
33      for GetPercentJJ4
39      for GetPercentSSE
Title: Re: m32lib GetPercent need a correction
Post by: hutch-- on November 07, 2010, 11:18:07 PM
I wrote this years ago to plug up a simple requirement, things like integer sizing for screen display, and it has never been a performance issue where it was normally used. Now I have no doubt that it can be optimised, but in the context of its use, it's a "who cares" issue.

Now I understand what Yves has said but the procedure has always handled high call counts with no problems at all. Here is a test piece that calls it in a loop 1 million times.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    push esi

    mov esi, 1000000

  lbl0:
    print str$(rv(GetPercent,esi,50)),13,10
    sub esi, 1
    jnz lbl0

    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: m32lib GetPercent need a correction
Post by: hutch-- on November 08, 2010, 12:14:55 AM
 :bg

JJ,


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
14      cycles for GetPercent
6       cycles for GetPercentJJ1
5       cycles for GetPercentJJ2

16      cycles for GetPercent
6       cycles for GetPercentJJ1
5       cycles for GetPercentJJ2

16      cycles for GetPercent
6       cycles for GetPercentJJ1
5       cycles for GetPercentJJ2

Code sizes:
32      for GetPercentJJ1
32      for GetPercentJJ2

--- ok ---


Removing the FDIV certainly improves the time.  :P
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on November 08, 2010, 12:55:49 AM
Quote from: hutch-- on November 08, 2010, 12:14:55 AM
Removing the FDIV certainly improves the time.  :P

Certainly. SSE speeds it up once more, but I ran into a problem with the mulsd instruction: It is fast if the xmm register is already in float format, but if not, it costs over 200 cycles! The first version below works fine but needs lots of conversions.

GetPercentSSE_s:
GPJ005 REAL8 0.01
GetPercentSSE proc source:DWORD, percent:DWORD
cvtsi2sd xmm0, dword ptr [esp+4] ; source
cvtsi2sd xmm1, dword ptr [esp+8] ; percent
mulsd xmm0, xmm1
mulsd xmm0, GPJ005 ; multiply with 0.01, i.e. divide source by 100
cvtsd2si eax, xmm0
ret 8 ; 11 cycles
GetPercentSSE endp
GetPercentSSE_e:

GetPercentSSEs_s:
GPJ005s REAL8 0.01
GetPercentSSEs proc source:DWORD, percent:DWORD
; xorps xmm0, xmm0
; movaps xmm1, xmm0 ; no effect
movd xmm0, dword ptr [esp+4] ; source
movd xmm1, dword ptr [esp+8] ; percent
; int 3 ; OPT_Olly 2
pmuludq xmm0, xmm1 ; source * percent, OK (pm xmm0, qword ptr [esp+8] possible but slow)
MakeSlow = 1
if MakeSlow ; first branch: result is correct but >200 cycles
mulsd xmm0, GPJ005s ; multiply with 0.01, i.e. divide source by 100 - SLOOOOOOW ###
movd eax, xmm0
else ; second branch: result correct, 27 cycles
cvtdq2pd xmm0, xmm0 ; convert integer to float (Convert  Packed Doubleword Integers to Packed Double-Precision Floating-Point Values)
mulsd xmm0, GPJ005s ; multiply with 0.01, i.e. divide source by 100 - fast
cvtsd2si eax, xmm0
endif
ret 8 ; 259 or 348 cycles
GetPercentSSEs endp
GetPercentSSEs_e:


New testbed attached, now with more realistic cycle counts (there is a REPEAT 64 ... ENDM followed by shr eax, 6).
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
30      cycles for GetPercent
14      cycles for GetPercentJJ2
11      cycles for GetPercentSSE
263     cycles for GetPercentSSEs

Code sizes:
36      bytes for GetPercent, result=6790123
32      bytes for GetPercentJJ2, result=6790123
39      bytes for GetPercentSSE, result=6790123
39      bytes for GetPercentSSEs, result=6790123
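
jj2007 does not say why the first branch of GetPercentSSEs is slow yet still correct. One plausible explanation, offered here as an assumption rather than something the thread confirms, is that the integer product left by pmuludq is a denormal when mulsd reads it as a double, and many CPUs handle denormal operands through a slow path. The Python sketch below reinterprets the bits the same way; the inputs 12345678 and 55 are inferred from the reported result=6790123 and are not stated in the post.

```python
import struct
import sys


def bits_as_double(n):
    """Reinterpret a 64-bit integer bit pattern as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", n))[0]


def double_as_bits(x):
    """Reinterpret a double as its 64-bit integer bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


source, percent = 12345678, 55       # assumed test inputs
product = source * percent           # the integer pmuludq leaves in xmm0

as_double = bits_as_double(product)  # how mulsd interprets those bits
print(as_double < sys.float_info.min)       # True: below the smallest normal

scaled = as_double * 0.01            # the mulsd with the 0.01 constant
print(double_as_bits(scaled) & 0xFFFFFFFF)  # 6790123, the low dword movd reads
```

Because denormals are evenly spaced, multiplying the bit pattern by 0.01 rounds to the same integer the converted version produces, which would account for the correct result despite the cost.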
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on November 08, 2010, 01:02:23 AM

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
44      cycles for GetPercent
21      cycles for GetPercentJJ2
22      cycles for GetPercentSSE
1104    cycles for GetPercentSSEs

Code sizes:
36      bytes for GetPercent, result=6790123
32      bytes for GetPercentJJ2, result=6790123
39      bytes for GetPercentSSE, result=6790123
39      bytes for GetPercentSSEs, result=6790123

Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on November 08, 2010, 01:04:27 AM
Thanks, Alex. So the ordinary FPU version is one cycle faster than the fast SSE2 version... and a whopping 1100 cycles for the bad SSE stuff ::)
Title: Re: m32lib GetPercent need a correction
Post by: hutch-- on November 08, 2010, 01:50:56 AM
JJ,

Inspired by your reciprocal multiply, this one has about 40% legs on the old version. I changed the calculation order as well.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

GetPercent2 proc source:DWORD, percent:DWORD

    fild DWORD PTR [esp+8]  ; load percent
    fld10 0.01              ; load reciprocal of 100
    fmul                    ; mul by reciprocal = div by 100
    fild DWORD PTR [esp+4]  ; load the source
    fmul                    ; multiply by previous result
    fistp DWORD PTR [esp+8] ; pop FP stack and store result in stack variable
    mov eax, [esp+8]        ; write result to EAX for return value
    ret 8

GetPercent2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
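
The change in calculation order can be illustrated with a short Python sketch. It is only an approximation: Python uses 64-bit doubles, whereas the x87 code works in extended precision, so it shows the two orders of operation rather than bit-exact FPU behaviour. The function names and the inputs 12345678 and 55 are assumptions chosen to match the result=6790123 reported by the testbed.

```python
def get_percent_div(source, percent):
    """Original order: divide source by 100, then multiply by percent."""
    return round(source / 100 * percent)


def get_percent_recip(source, percent):
    """GetPercent2 order: scale percent by 0.01, then multiply by source."""
    return round(percent * 0.01 * source)


print(get_percent_div(12345678, 55))    # 6790123
print(get_percent_recip(12345678, 55))  # 6790123
```

Python's `round` rounds halves to even, which matches the FPU's default rounding mode used by FISTP.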
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on November 08, 2010, 07:31:41 AM
Great, so now we are waiting for Lingo :bg
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
11      cycles for GetPercentSSE
36      cycles for GetPercent
13      cycles for GetPercent2c
14      cycles for GetPercent2nc
14      cycles for GetPercentJJ2

11      cycles for GetPercentSSE
36      cycles for GetPercent
14      cycles for GetPercent2c
14      cycles for GetPercent2nc
14      cycles for GetPercentJJ2

11      cycles for GetPercentSSE
36      cycles for GetPercent
13      cycles for GetPercent2c
14      cycles for GetPercent2nc
14      cycles for GetPercentJJ2

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=6790123
41      bytes for GetPercent2c, result=6790123
45      bytes for GetPercent2nc, result=6790123
32      bytes for GetPercentJJ2, result=6790123


P.S.: GetPercent2c and GetPercent2nc are two variants of Hutch's new algo. I like the JJ2 variant; it fits into 2 paras and is reasonably fast.

Edit: Prescott P4:
20      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
20      cycles for GetPercentJJ2
Title: Re: m32lib GetPercent need a correction
Post by: hutch-- on November 08, 2010, 10:47:01 AM

Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
14      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
10      cycles for GetPercentJJ2

13      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
10      cycles for GetPercentJJ2

13      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
10      cycles for GetPercentJJ2

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=6790123
41      bytes for GetPercent2c, result=6790123
45      bytes for GetPercent2nc, result=6790123
32      bytes for GetPercentJJ2, result=6790123

--- ok ---

Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on November 08, 2010, 01:32:02 PM
Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
21      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
21      cycles for GetPercentJJ2

21      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
20      cycles for GetPercentJJ2

21      cycles for GetPercentSSE
48      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
21      cycles for GetPercentJJ2

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=6790123
41      bytes for GetPercent2c, result=6790123
45      bytes for GetPercent2nc, result=6790123
32      bytes for GetPercentJJ2, result=6790123

--- ok ---
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 01, 2010, 02:04:30 PM
Here is my integer version of GetPercent. In my tests it is fast and small.


OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
AxGetPercentInt proc source:DWORD, percent:DWORD

mov eax,[esp+4]
imul edx,[esp+8],28F5C29h
jl @F

mul edx
mov eax,edx
shr edx,32-2
sub eax,edx
@@:
ret 8

AxGetPercentInt endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


The code is intended for calculating percentages in the range 0-100. If percent is greater than or equal to 100, the source value is returned.
The code's speed does not depend on the value of the source number or of the percent.
An unusual correction is used.

If the algo will always be used with percents less than 100, then the first 4 lines of the code can be replaced with:


imul eax,[esp+8],28F5C29h
jl @F

mul dword ptr [esp+4]


Then the timings are 2 clocks faster.
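
The arithmetic of AxGetPercentInt can be followed step by step in a small Python model. This is a sketch, not Alex's code: it assumes a non-negative percent, and the `sf_from_result` parameter is a hypothetical switch for whether the sign flag after IMUL mirrors the truncated product or is left clear. With the flag left clear the model returns the source value for percent 55, which would be consistent with the result=12345678 in hutch's later Core 2 run, though the thread does not confirm that cause.

```python
MAGIC = 0x28F5C29  # 42949673, i.e. 2**32 / 100 rounded up


def ax_get_percent_int(source, percent, sf_from_result=True):
    """Model AxGetPercentInt for an unsigned 32-bit source.

    sf_from_result is an assumption about the CPU: True means SF after
    IMUL is the sign bit of the truncated product, False means SF is 0.
    """
    full = percent * MAGIC                   # imul edx,[esp+8],28F5C29h
    edx = full & 0xFFFFFFFF                  # truncated 32-bit product
    overflow = not (-2**31 <= full < 2**31)  # OF: product does not fit
    sign = bool(edx >> 31) if sf_from_result else False
    if sign != overflow:                     # jl @F is taken when SF != OF
        return source                        # eax still holds source
    high = (source * edx) >> 32              # mul edx / mov eax,edx
    return high - (high >> 30)               # shr edx,32-2 / sub eax,edx


print(ax_get_percent_int(12345678, 55))                        # 6790122
print(ax_get_percent_int(12345678, 100))                       # 12345678
print(ax_get_percent_int(12345678, 55, sf_from_result=False))  # 12345678
```

The final subtraction of the top two bits is the "unusual correction": it offsets the slight overestimate caused by rounding the multiplier up.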


The latest version of Jochen's testbed is used for testing, but I have fixed a flaw in the ChkPrecision code, which now calls the right FPU calculation code.

Here are my timings:


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
24      cycles for GetPercentSSE
47      cycles for GetPercent
23      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
20      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

24      cycles for GetPercentSSE
46      cycles for GetPercent
23      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
20      cycles for GetPercentJJ2
15      cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
26      bytes for AxGetPercentInt, result=6790122


I am asking for testing. Thanks!



Alex
Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on December 01, 2010, 03:26:00 PM
Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
24      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

24      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
26      bytes for AxGetPercentInt, result=6790122

--- ok ---
Title: Re: m32lib GetPercent need a correction
Post by: hutch-- on December 01, 2010, 10:34:56 PM
Alex,

I get a GP fault out of your last zip on xp sp2.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
-2125943931 for 7FFF0000/1, case 26750588
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 01, 2010, 11:13:54 PM
Quote from: hutch-- on December 01, 2010, 10:34:56 PM
Alex,

I get a GP fault out of your last zip on xp sp2.

That's something in the ChkPrecision/FPU code, apparently. AxGetPercentInt has no instructions that can cause a GPF.

I have checked this and found the reason: Jochen intentionally made the FPU stack overflow in the MACRO:

Algo MACRO arg
finit
REPEAT 8
fldpi
ENDM


The ChkPrecision code uses simple FPU code as its reference (the "etalone"). That simple code does not handle the case where the FPU stack is full. This is the reason why you get the exception.

Now I have commented out the FLDPI line, and it should work properly.

As I said, that's not a bug in my code; Jochen just didn't write documentation for his testing variant :bg

Hutch, and all, test this one, please.



Alex
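
The effect of a pre-filled FPU stack on the original GetPercent can be sketched with a simplified Python model. It assumes exceptions are masked (the state after finit), that a load onto a full stack overwrites the oldest register with an indefinite value, and that FISTP stores 80000000h for an invalid operand. The class and function names are illustrative only. Under those assumptions it reproduces the result=-2147483648 that the listings above show for GetPercent.

```python
import math

INT_INDEFINITE = -0x80000000  # what FISTP stores for an invalid operand


class X87Stack:
    """Tiny model of the eight x87 data registers with exceptions masked."""

    def __init__(self, occupied=0):
        self.regs = [math.pi] * occupied  # mimic REPEAT n / fldpi

    def push(self, value):
        if len(self.regs) == 8:
            self.regs.pop(0)              # the old ST(7) is overwritten
            value = math.nan              # stack overflow yields indefinite
        self.regs.append(float(value))

    def arith_pop(self, op):
        st0 = self.regs.pop()
        st1 = self.regs.pop()
        self.regs.append(op(st1, st0))    # result lands in old ST(1)

    def fistp(self):
        value = self.regs.pop()
        return INT_INDEFINITE if math.isnan(value) else round(value)


def get_percent(source, percent, occupied=0):
    fpu = X87Stack(occupied)
    fpu.push(source)                      # fild source
    fpu.push(100)                         # fild var1
    fpu.arith_pop(lambda a, b: a / b)     # fdiv, assembled as FDIVP
    fpu.push(percent)                     # fild percent
    fpu.arith_pop(lambda a, b: a * b)     # fmul, assembled as FMULP
    return fpu.fistp()                    # fistp var1


print(get_percent(12345678, 55))              # 6790123
print(get_percent(12345678, 55, occupied=8))  # -2147483648
```

The original procedure needs two free registers at its peak, so in this model any call with fewer than two free registers returns the indefinite value.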
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 01, 2010, 11:20:12 PM
Quote from: ToutEnMasm on December 01, 2010, 03:26:00 PM

Thanks, Luce!
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on December 01, 2010, 11:43:54 PM
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
24      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
29      cycles for GetPercent2nc
26      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

25      cycles for GetPercentSSE
47      cycles for GetPercent
30      cycles for GetPercent2c
30      cycles for GetPercent2nc
26      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

:U
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 01, 2010, 11:48:08 PM
Quote from: dedndave on December 01, 2010, 11:43:54 PM

Thanks, Dave! Under the testbed's rules it doesn't look bad.
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on December 01, 2010, 11:49:48 PM
i am beginning to think my machine gives the worst results - lol
that makes it good for testing, at least
it runs fast enough (with my tweaks)
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 01, 2010, 11:56:47 PM
Quote from: dedndave on December 01, 2010, 11:49:48 PM
i am beginning to think my machine gives the worst results - lol
that makes it good for testing, at least
it runs fast enough (with my tweaks)

No, your results are excellent...
...Because they are equal to my results :P
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on December 01, 2010, 11:59:07 PM
it looks ok this time
except this one...
24      cycles for GetPercent2c
30      cycles for GetPercent2c
i think i have a way to fix that problem, though   :P
Title: Re: m32lib GetPercent need a correction
Post by: frktons on December 02, 2010, 12:00:35 AM
Quote from: Antariy on December 01, 2010, 11:13:54 PM

It doesn't work on my PC either. GPF:


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
-2125943931 for 7FFF0000/1, case 0


Probably it depends on the fact you are not using the right Testbed  :lol

Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 12:06:56 AM
Quote from: frktons on December 02, 2010, 12:00:35 AM
It doesn't work on my pc as well. GPF:

Probably it depends on the fact you are not using the right Testbed  :lol

Of course, influence of the old testbed.  :lol

Well, I have no desire to dig through all the code of the entire testbed to find the culprit.
On PIV cores it works, so maybe it is something in the FPU part somewhere that leads to inexact results etc. on other hardware.
Title: Re: m32lib GetPercent need a correction
Post by: frktons on December 02, 2010, 12:10:20 AM
Quote from: Antariy on December 02, 2010, 12:06:56 AM
Of course, influence of the old testbed.  :lol

Well, I have no desire to dig through all the code of the entire testbed to find the culprit.
On PIV cores it works, so maybe it is something in the FPU part somewhere that leads to inexact results etc. on other hardware.

We have worked hard to create a new Testbed, compatible with old machines and MASM versions.
It is a pity we have to see these horrible interfaces again. And they have some bugs as well  ::)
:naughty: :naughty: :snooty: :snooty: :naughty: :naughty:
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 12:13:30 AM
Quote from: frktons on December 02, 2010, 12:10:20 AM
We have worked hard to create a new Testbed, compatible with old machines and MASM versions.
It is a pity we have to see these horrible interfaces again. And they have some bugs as well  ::)
:naughty: :naughty: :snooty: :snooty: :naughty: :naughty:

:green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2 :green2
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on December 02, 2010, 12:17:38 AM
Quote from: Antariy on December 01, 2010, 11:13:54 PM
Quote from: hutch-- on December 01, 2010, 10:34:56 PM
Alex,

I get a GP fault out of your last zip on xp sp2.

That's something in the ChkPrecision/FPU code, apparently. AxGetPercentInt has no instructions that can cause a GPF.

I have checked this and found the reason: Jochen intentionally made the FPU stack overflow in the MACRO:


Alex,

First, I don't make the FPU stack overflow - I just fill the FPU with valid numbers. From a general purpose algo, I would expect that it works even if other parts of the code use the FPU. That is what the ffree instruction is meant for.

Second, the GPF is caused by an int 3 in ChkPrecision.

Third, after commenting out the int 3, I see a series of incorrect results. Who wrote GetPercentEtalone?
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 12:23:36 AM
Quote from: jj2007 on December 02, 2010, 12:17:38 AM

:P

Well, not an overflow, but you are preparing it for a possible later overflow :P  :lol

A GPF is not int 3 (int 3 is a debugging exception and has a different code). So, we were misinformed :P

The Etalone proc was written because GetPercent does not work properly for 2^31 and above. Since the FPU operates only on signed integers, when you load a DWORD the highest bit is treated as the sign. You can see the wrong results from GetPercent as well.
"Etalone" was written to handle this, but probably while too tired and in too little time :P
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 12:29:07 AM
Quote from: jj2007 on December 02, 2010, 12:17:38 AM
From a general purpose algo, I would expect that it works even if other parts of the code use the FPU. That is what the ffree instruction is meant for.

In a big program, I really would not expect any code to free registers that may contain my variables... The FPU rules for general-purpose algos require *not* holding FP values in the registers at the time of a call to external code.
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on December 02, 2010, 12:46:40 AM
Quote from: Antariy on December 02, 2010, 12:29:07 AM

In a big program, I really would not expect any code to free registers that may contain my variables... The FPU rules for general-purpose algos require *not* holding FP values in the registers at the time of a call to external code.


From Raymond Filiatreault's FpuLib help:
Unless a source parameter was specified as being in the TOP data register, the original Fpulib was designed to initialize the FPU to prevent any potential "stack overflow". This destroyed any data which may have been present in the other FPU registers. This was revised later to destroy only the data (if any) in the registers which were necessary to perform the function. This new version will not destroy any of the existing data, except possibly the data in the ST(7) register.

Re range of algos: Give me one reason why invoke &algo, -1000, 55 should not return -550.
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 12:54:10 AM
Quote from: jj2007 on December 02, 2010, 12:46:40 AM


But you know this feature of the FPU lib, right? So it should be documented, at least. And this applies to programming in ASM only, where you control the whole flow of the program. When I talk about "general purpose" algos, I mean algos which can be used in any environment, even with an HLL... To strictly follow the API rules, you should not hold FPU data in the registers. This is disputable, of course, but only for ASM.

Well, if you like to (or are forced by the FPU to) treat the DWORD as signed, then you are right. But for unsigned that's wrong. The right result would be 8CCCCAA6, and that is what the unsigned integer code returns.



Alex
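
The two answers for `invoke &algo, -1000, 55` come from reading the same 32 bits in two ways. The Python sketch below, with illustrative function names that are not from the thread, computes both: the signed reading that FILD-based code uses, and the unsigned reading that gives the 8CCCCAA6 quoted above.

```python
def as_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def percent_signed(dword, percent):
    """FPU-style result: the DWORD is loaded as a signed integer."""
    return round(as_signed32(dword) * percent / 100)


def percent_unsigned(dword, percent):
    """Integer-style result: the DWORD is treated as unsigned."""
    return (dword & 0xFFFFFFFF) * percent // 100


arg = -1000 & 0xFFFFFFFF                # 0xFFFFFC18, the bit pattern of -1000
print(percent_signed(arg, 55))          # -550
print(hex(percent_unsigned(arg, 55)))   # 0x8ccccaa6
```

Neither value is wrong in itself; which one is expected depends on whether the caller treats the argument as signed.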
Title: Re: m32lib GetPercent need a correction
Post by: hutch-- on December 02, 2010, 01:35:52 AM
Sorry Alex, no go here.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
-2125943931 for 7FFF0000/1, case 26750588


Press a key after this and you have a GP fault.

I am using XP SP3.
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 01:40:30 AM
Quote from: hutch-- on December 02, 2010, 01:35:52 AM

Hutch, please try commenting out

call ChkPrecision


in the sources. The code is precise enough for integer code, and I (and Dave, and Luce) have no problems with inexact results.
To rule out a flaw in the checking code, comment out the call to the check. Then you will go straight to the timing tests.

Thank you!



Alex
Title: Re: m32lib GetPercent need a correction
Post by: hutch-- on December 02, 2010, 01:45:31 AM
Yes, works fine with the call commented out.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
13      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
11      cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
21      cycles for AxGetPercentInt

13      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
11      cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
21      cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
26      bytes for AxGetPercentInt, result=12345678

--- ok ---
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 01:49:42 AM
Quote from: hutch-- on December 02, 2010, 01:45:31 AM
Yes, works fine with the call commented out.




Thank you, Hutch!

But the results look strange :eek
A very interesting thing :eek
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 02:01:30 AM
Quote from: hutch-- on December 02, 2010, 01:45:31 AM

26      bytes for AxGetPercentInt, result=12345678


It seems that PIV hardware has a different IMUL implementation.
I guess the culprit is the branch after the IMUL:

imul edx,[esp+8],28F5C29h
jl @F


So, if that piece is changed to:


mov edx,[esp+8]
cmp edx,99
ja @F
imul edx,28F5C29h


The code is then guaranteed to work.
But anyway, this is a really strange difference :eek
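
For readers who want to check the arithmetic, the fixed sequence can be modelled outside of assembly. The sketch below is a Python model, not the library code: the name `ax_get_percent_int` and the constants `MASK32` and `MAGIC` are made up for illustration, and it assumes the routine continues with the `mul edx` / `shr edx,32-2` / `sub eax,edx` steps shown in the full listing posted later in this thread.

```python
MASK32 = 0xFFFFFFFF
MAGIC = 0x28F5C29  # 42949673, i.e. 2**32 / 100 rounded up

def ax_get_percent_int(source, percent):
    """Model of the integer routine on unsigned 32-bit values."""
    source &= MASK32
    percent &= MASK32
    if percent > 99:                        # cmp edx,99 / ja @F
        return source                       # eax still holds source
    factor = (percent * MAGIC) & MASK32     # imul edx,28F5C29h (99 * MAGIC still fits in 32 bits)
    high = (source * factor) >> 32          # mul edx -> high dword lands in edx
    return (high - (high >> 30)) & MASK32   # shr edx,32-2 / sub eax,edx

print(ax_get_percent_int(12345678, 55))         # 6790122
print(hex(ax_get_percent_int(0xFFFFFC18, 55)))  # 0x8ccccaa6
```

The range check replaces the flag test entirely, so the result no longer depends on how a particular CPU sets SF after IMUL.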
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 02:05:28 AM
I have changed the code, and the timings went up by 1 clock.
It should work on any CPU. But it is a pity that the sign and overflow flags behave differently on other CPUs.

Hutch, test this new one, please!



Alex
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on December 02, 2010, 02:47:29 AM
Congrats, Alex, it works now, and it's very fast :U
However, for MasmBasic I will keep the old design that yields -550 for PerCent(-1000, 55); the need for an unsigned PerCent(4294966296, 55)=2362231462 is unclear to me.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
11      cycles for GetPercentSSE
37      cycles for GetPercent
14      cycles for GetPercent2c
15      cycles for GetPercent2nc
15      cycles for GetPercentJJ1
14      cycles for GetPercentJJ2
10      cycles for AxGetPercentInt

11      cycles for GetPercentSSE
37      cycles for GetPercent
14      cycles for GetPercent2c
15      cycles for GetPercent2nc
15      cycles for GetPercentJJ1
14      cycles for GetPercentJJ2
10      cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
31      bytes for AxGetPercentInt, result=6790122
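
The one-unit difference between 6790123 and 6790122 in the table looks like rounding versus truncation. A small sketch, assuming the test inputs were 12345678 and 55 (not stated in this post, but consistent with the reported results): the FPU versions store with FISTP, which rounds to nearest under the default control word, while the integer version simply drops the fraction.

```python
source, percent = 12345678, 55   # assumed test inputs, inferred from the results

scaled = source * percent          # 679012290, i.e. 6790122.90 before dividing by 100
truncated = scaled // 100          # the integer code drops the fraction
rounded = (scaled + 50) // 100     # round to nearest (tie handling aside)

print(truncated, rounded)          # 6790122 6790123
```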
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 02:55:45 AM
Quote from: jj2007 on December 02, 2010, 02:47:29 AM
Congrats, Alex, it works now, and it's very fast :U
However, for MasmBasic I will keep the old design that yields -550 for PerCent(-1000, 55); the need for an unsigned PerCent(4294966296, 55) is unclear to me.

Thanks!

Of course, you can use whichever algo you want; I am not imposing mine at all. I just had some spare time, which I spent on it.
I prefer to treat numbers as unsigned, so I tried to make a version that works with numbers of any size without speed loss, and that is all :bg
EDITED: Why I prefer unsigned: because in programming you usually need positive numbers rather than negative ones. For example, for calculating coordinates.



Alex
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on December 02, 2010, 03:08:52 AM
Quote from: Antariy on December 02, 2010, 02:55:45 AM
For example for calculation of some coordinates.

The Earth's circumference is about 40,000 km, that makes 40,000,000 metres or 40,000,000,000 millimetres
40,000,000,000/4294967296 = 9.3 millimetres

For GPS, that is a damn good resolution :bg
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on December 02, 2010, 03:20:00 AM
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
21      cycles for GetPercentSSE
53      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

21      cycles for GetPercentSSE
52      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
20      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt
Title: Re: m32lib GetPercent need a correction
Post by: oex on December 02, 2010, 05:11:13 AM
AMD Sempron(tm) Processor 3100+ (SSE3)
14      cycles for GetPercentSSE
30      cycles for GetPercent
13      cycles for GetPercent2c
15      cycles for GetPercent2nc
16      cycles for GetPercentJJ1
15      cycles for GetPercentJJ2
9       cycles for AxGetPercentInt

12      cycles for GetPercentSSE
29      cycles for GetPercent
13      cycles for GetPercent2c
14      cycles for GetPercent2nc
14      cycles for GetPercentJJ1
14      cycles for GetPercentJJ2
9       cycles for AxGetPercentInt
Title: Re: m32lib GetPercent need a correction
Post by: frktons on December 02, 2010, 06:45:47 AM
The last version is working:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
14      cycles for GetPercentSSE
36      cycles for GetPercent
9       cycles for GetPercent2c
9       cycles for GetPercent2nc
9       cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
7       cycles for AxGetPercentInt

13      cycles for GetPercentSSE
36      cycles for GetPercent
9       cycles for GetPercent2c
9       cycles for GetPercent2nc
9       cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
7       cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
31      bytes for AxGetPercentInt, result=6790122

--- ok ---

:U
Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on December 02, 2010, 07:32:07 AM

Things are becoming a little unclear to me.

The last version of AxGetPercentInt is this one:
Quote
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
PourCent proc source:DWORD, percent:DWORD

      mov eax,[esp+4]
      
      if 0
      imul edx,[esp+8],28F5C29h
      jl @F
      else
      mov edx,[esp+8]
      cmp edx,99
      ja @F
      imul edx,28F5C29h
      endif

      mul edx
      mov eax,edx
      shr edx,32-2
      sub eax,edx
      @@:
      ret 8         
            
PourCent endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
Am I correct ?

Title: Re: m32lib GetPercent need a correction
Post by: frktons on December 02, 2010, 08:09:11 AM
Quote from: ToutEnMasm on December 02, 2010, 07:32:07 AM

Things begin a little unclear for me.




The above zip file is the one you should download:
http://www.masm32.com/board/index.php?action=dlattach;topic=15263.0;id=8563

Frank
Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on December 02, 2010, 08:20:03 AM
Thanks,
I have extracted the good one
Title: Re: m32lib GetPercent need a correction
Post by: FORTRANS on December 02, 2010, 01:12:22 PM
Quote from: Antariy on December 02, 2010, 02:01:30 AM
It seems that PIV hardware have different design of IMUL implementation.
I guess, culprit is in branch after IMUL:

imul edx,[esp+8],28F5C29h
jl @F


So, if change that piece to:


mov edx,[esp+8]
cmp edx,99
ja @F
imul edx,28F5C29h


Code will work guaranteed.
But anyway, this is really strange difference :eek

Hi,

   According to some old documentation I am referring to, only
the carry and overflow flags are valid for the IMUL instruction.
And JL also uses the sign flag, so should not be used.  Do you
have differing documentation as to what are the valid flags
when using IMUL?

Regards,

Steve N.
Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on December 02, 2010, 02:10:45 PM

A very good question about the IMUL flags.
The Intel documentation says the SF, ZF, AF and PF flags are undefined (U). What does that mean?
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on December 02, 2010, 03:52:59 PM
undefined means it can be either true or false
Title: Re: m32lib GetPercent need a correction
Post by: ToutEnMasm on December 02, 2010, 06:16:43 PM
Quote
undefined means it can be either true or false

That is perhaps a bit short, because some comparisons seem to work on one CPU and not on another.
I suspect some CPU-specific secret here.
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on December 02, 2010, 07:52:36 PM
not really - lol
if it doesn't work the same way on all CPU's, you don't want to use it

i think it's NexGen CPU's where they divide 25 by 5 or something
the ZF is clear for all CPU's except theirs (something like that)
pretty hokie, if you ask me
some other manufacturer that isn't aware of their quirk (flaw) might violate the rule
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 11:19:50 PM
Quote from: jj2007 on December 02, 2010, 03:08:52 AM
Quote from: Antariy on December 02, 2010, 02:55:45 AM
For example for calculation of some coordinates.

The Earth's circumference is about 40,000 km, that makes 40,000,000 metres or 40,000,000,000 millimetres
40,000,000,000/4294967296 = 9.3 millimetres

For GPS, that is a damn good resolution :bg


Distance to the Sun: ~149,700,000,000,000 mm / DWORD range = 34,854.74 mm, i.e. about 34.9 meters. :bg
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 02, 2010, 11:26:02 PM
Quote from: FORTRANS on December 02, 2010, 01:12:22 PM
Hi,

   According to some old documentation I am referring to, only
the carry and overflow flags are valid for the IMUL instruction.
And JL also uses the sign flag, so should not be used.  Do you
have differing documentation as to what are the valid flags
when using IMUL?

Hi Steve!

Yes, your documentation is right.
The original code simply used an assumption that looked logical at first: if the result of the multiplication has its high bit set, IMUL sets CF and OF. But since the high bit is set, the value is negative as a signed number, so SF should be set too (in theory :lol). When a true overflow occurred, OF was set but SF was not (because of truncation). I checked this on my CPU, and the "rule" seemed to work, so I used it in the algo. But it does not seem to work on some other CPUs :green2
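
To make the flag question concrete, here is a small Python model of a 32-bit signed multiply that tracks only what the documentation quoted above defines: the truncated result and OF (CF is set identically). The helper names `to_signed32` and `imul32` are invented for this sketch. SF is deliberately not modelled, because it is documented as undefined.

```python
MAGIC = 0x28F5C29

def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def imul32(a, b):
    """Return (low 32 bits of the signed product, OF)."""
    full = to_signed32(a) * to_signed32(b)
    low = full & 0xFFFFFFFF
    overflow = to_signed32(low) != full   # OF = CF = product did not fit in 32 signed bits
    return low, overflow

for percent in (49, 55, 100):
    low, of = imul32(percent, MAGIC)
    print(percent, low, of)
```

JL is taken when SF differs from OF. For 55 percent OF is set, so the branch outcome rests entirely on SF: a CPU that happens to copy bit 31 of the result into SF falls through, while one that leaves SF clear jumps and returns the source unchanged. That would be consistent with the 12345678 result reported earlier in the thread, but it is an inference, not something the thread confirms.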

Steve, the old code is on page 2, here: "http://www.masm32.com/board/index.php?topic=15263.msg127043#msg127043". Please test it if you have spare time. But first you need to comment out the call to the SSE code.



Alex
Title: Re: m32lib GetPercent need a correction
Post by: redskull on December 03, 2010, 02:17:47 AM
Intel(R) Core(TM)2 Duo CPU     E4500  @ 2.20GHz (SSE4)
14      cycles for GetPercentSSE
47      cycles for GetPercent
9       cycles for GetPercent2c
9       cycles for GetPercent2nc
9       cycles for GetPercentJJ1
12      cycles for GetPercentJJ2
19      cycles for AxGetPercentInt

38      cycles for GetPercentSSE
37      cycles for GetPercent
22      cycles for GetPercent2c
9       cycles for GetPercent2nc
9       cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
10      cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
31      bytes for AxGetPercentInt, result=6790122

--- ok ---


:thumbu

-r
Title: Re: m32lib GetPercent need a correction
Post by: FORTRANS on December 03, 2010, 01:18:55 PM
Quote from: Antariy on December 02, 2010, 11:26:02 PM
Hi Steve!

Steve, at page 2, here: "http://www.masm32.com/board/index.php?topic=15263.msg127043#msg127043" contained old code. Test it please if you have spare time. But firstly needed to comment call SSE code.

Hi Alex,

   Doesn't seem to work: neither the included *.EXE nor a
rebuild with the SSE code commented out.  It seems to hang.
Results displayed (both cases):

G:\WORK\TEMP>2getperc
pre-P4 (SSE1)
-2125943931 for 7FFF0000/1, case 1



Regards,

Steve N.
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on December 03, 2010, 01:46:46 PM
Steve,
There is an int 3 in ChkPrecision:
            inkey str$(edi), 13, 10
            int 3
            invoke AxGetPercentInt, esi, ebx
Just comment the call to ChkPrecision out, but keep in mind it was there for a reason. AxGetPercentInt behaves differently for negative values.
Title: Re: m32lib GetPercent need a correction
Post by: FORTRANS on December 03, 2010, 02:07:26 PM
Hi jj2007,

   Thanks for pointing that out.  It now goes into an infinite loop.
But it does run.  (?)

-2061530988 for 7FFF2710/4, case 1243708

-2040056706 for 7FFF2710/5, case 1243708

-2018582425 for 7FFF2710/6, case 1243708

-1997108144 for 7FFF2710/7, case 1243708

-1975633863 for 7FFF2710/8, case 1243708

-1954159582 for 7FFF2710/9, case 1243708

-1932685301 for 7FFF2710/10, case 1243708


Regards,

Steve N.
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 03, 2010, 03:01:07 PM
Quote from: FORTRANS on December 03, 2010, 02:07:26 PM
Hi jj2007,

   Thanks for pointing that out.  It now goes into an infinite loop.
But it does run.  (?)


Hi Steve!

Thanks, that is right - the code from page 2, which I asked you to test, assumes that SF is set according to the result.
So it seems to work only on Prescotts :bg

Here you can get working version: "http://www.masm32.com/board/index.php?topic=15263.msg127085#msg127085".



Alex
Title: Re: m32lib GetPercent need a correction
Post by: FORTRANS on December 03, 2010, 04:51:37 PM
Hi Alex,

   After a couple of edits.

G:\WORK\TEMP> 2getperc
pre-P4 (SSE1)
52      cycles for GetPercent
19      cycles for GetPercent2c
21      cycles for GetPercent2nc
21      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
13      cycles for AxGetPercentInt

52      cycles for GetPercent
19      cycles for GetPercent2c
21      cycles for GetPercent2nc
21      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
13      cycles for AxGetPercentInt

Code sizes:
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
31      bytes for AxGetPercentInt, result=6790122

--- ok ---


Regards,

Steve N.
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 04, 2010, 01:18:38 AM
Quote from: FORTRANS on December 03, 2010, 04:51:37 PM
Hi Alex,

   After a couple of edits.

Hi Steve!

Thank you!

Yes, here we see drawbacks of the used testbed :bg



Alex
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on December 04, 2010, 03:03:55 AM
Quote from: Antariy on December 04, 2010, 01:18:38 AM
Yes, here we see drawbacks of the used testbed :bg

The testbed is transparent - anybody with basic search skills can search for "case" and find the "culprit", ChkPrecision was introduced because some results did not yield the expected results, and so I let it crash there for testing purposes. I didn't know that somebody would develop an algo that yields, as result for invoke GetPercent, -1000, 50 the number 2147483148 (and claims the result is correct...).

So stop blaming the testbed, and start explaining in which real coding situations an "unsigned" GetPercent algo is useful.
Title: Re: m32lib GetPercent need a correction
Post by: dedndave on December 04, 2010, 03:15:06 AM
well - i know of a simple example
displaying percent completion in a file transfer or other procedure
that is a case where a simple percent routine can be used; always positive, 0 to 100 %
it could even be a macro, but speed is not really critical in that case

we shouldn't spend too much time in the laboratory speeding up algos that are used like that
so - you can assume that this algo should be general purpose, and cover a wide range of input values
Title: Re: m32lib GetPercent need a correction
Post by: hutch-- on December 04, 2010, 06:12:07 AM
Humorously enough, the GetPercent() algo was designed to do something really simple: calculate screen percentages to the nearest pixel for sizing windows on the screen. I was happy enough to use JJ's technique that made it faster and smaller, but for what it was designed for, it ain't like you are going to tell the difference.
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on December 04, 2010, 07:32:31 AM
Quote from: dedndave on December 04, 2010, 03:15:06 AM
... and cover a wide range of input values

Indeed. Including negative ones like -1000 (which is what algos above do, except AxGetpercent)
:bg
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 05, 2010, 12:31:22 AM
Quote from: jj2007 on December 04, 2010, 03:03:55 AM
Quote from: Antariy on December 04, 2010, 01:18:38 AM
Yes, here we see drawbacks of the used testbed :bg

The testbed is transparent - anybody with basic search skills can search for "case" and find the "culprit", ChkPrecision was introduced because some results did not yield the expected results, and so I let it crash there for testing purposes. I didn't know that somebody would develop an algo that yields, as result for invoke GetPercent, -1000, 50 the number 2147483148 (and claims the result is correct...).

So stop blaming the testbed, and start explaining in which real coding situations an "unsigned" GetPercent algo is useful.

Well...

1. You can always use this algo with any (ANY) kind of number. If you want the percentage of a *negative* number, just NEG it before the call and NEG the result :P
2. Like Dave said, yes - if you copy a file of 3.21 GB, it would be nice if the routine said "Copied -808.96 MB" :P
3. With screen coordinates, as Hutch said, you don't need the FPU's precision and rounding; you can use short, fast integer code that runs even on an i386 SX :P

Stop blaming the integer code for its "unsignedness". You can use an unsigned algo with any kind of data, because it treats all bits as data bits. But FPU code working on a DWORD cannot handle unsigned numbers, because it treats only 31 bits as data. So unsigned code has no limitations; signed code does.
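
Point 1 can be sketched as a wrapper. This is an illustrative Python model only: `unsigned_percent` stands in for the unsigned routine discussed in this thread (modelled from the posted instruction sequence), and `signed_percent` is a hypothetical name for the NEG-before, NEG-after wrapper.

```python
MASK32 = 0xFFFFFFFF

def unsigned_percent(source, percent):
    """Model of the unsigned routine; percent above 99 returns source."""
    source &= MASK32
    if percent > 99:
        return source
    factor = (percent * 0x28F5C29) & MASK32
    high = (source * factor) >> 32
    return high - (high >> 30)

def signed_percent(source, percent):
    """Take the percentage of the magnitude, then restore the sign."""
    if source < 0:
        return -unsigned_percent(-source, percent)   # NEG, call, NEG
    return unsigned_percent(source, percent)

print(signed_percent(-1000, 55))   # -550
print(signed_percent(1000, 55))    # 550
```

The extra cost is the sign test and the two negations, which is the trade-off being argued about here.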



Alex
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 05, 2010, 12:36:49 AM
Quote from: jj2007 on December 04, 2010, 07:32:31 AM
Quote from: dedndave on December 04, 2010, 03:15:06 AM
... and cover a wide range of input values

Indeed. Including negative ones like -1000 (which is what algos above do, except AxGetpercent)
:bg

Some algos produce wrong results. That's negative results, though.
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 05, 2010, 01:40:31 AM
Quote from: jj2007 on December 04, 2010, 03:03:55 AM
So stop blaming the testbed, and start explaining in which real coding situations an "unsigned" GetPercent algo is useful.

First: I was talking about Steve having to comment out the SSE2 code. That is the point of "blaming" the testbed: the old testbed does not skip unsupported algos.
Second: you can use an unsigned algo with signed numbers. It costs ~1-2 clocks more, but it is more flexible than a signed algo, which does not work with unsigned numbers.
Third: this tweak of the algo is 2 clocks faster (for me):

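AxGetPercentInt proc source:DWORD, percent:DWORD
; relies on OPTION PROLOGUE:NONE / EPILOGUE:NONE, as in the earlier listing

mov edx,[esp+8]      ; edx = percent
mov eax,[esp+4]      ; eax = source
cmp edx,99
ja @F                ; percent > 99 (unsigned): return source unchanged
imul edx,28F5C29h    ; edx = percent * (2^32/100, rounded up)

mul edx              ; edx:eax = source * edx
mov eax,edx          ; keep the high dword
shr edx,32-2
sub eax,edx          ; small downward correction from the top 2 bits
@@:
ret 8

AxGetPercentInt endp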


16 bytes of the code are taken by loading, checking and exiting. As a macro, this code would be 15 bytes and a few cycles long.

It *works* with negative numbers, at a cost of just 2 clocks and 4 bytes more. And I claim that 50% of -1000 = 2147483148 is correct, since -1000 is FFFFFC18h (4294966296).
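
Both positions in this argument follow from how the same 32 bits are read. A short sketch with exact integer arithmetic (no library routine involved):

```python
bits = -1000 & 0xFFFFFFFF          # two's complement pattern of -1000 in a DWORD
print(hex(bits), bits)             # 0xfffffc18 4294966296

unsigned_half = bits * 50 // 100   # 50% when the DWORD is read as unsigned
signed_half = -1000 * 50 // 100    # 50% when it is read as signed
print(unsigned_half, signed_half)  # 2147483148 -500
```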



Alex
Title: Re: m32lib GetPercent need a correction
Post by: jj2007 on December 05, 2010, 07:39:30 AM
Ok, so we'll ask Hutch to put your fast unsigned AxGetPercent into Masm32.
Perhaps the documentation should then contain a little example:
Quote
In case you have to translate the y coordinate of a sine function into screen coordinates:
  call MySinus   ; get some value for the Y axis
  mov edx, ScreenFactorY   ; scale factor
  test eax, eax
  .if Sign?
     neg eax
     invoke AxGetPercent, eax, edx
     neg eax
  .else
     invoke AxGetPercent, eax, edx
  .endif

That is easy and elegant, and avoids starting a flame war here, right?
Title: Re: m32lib GetPercent need a correction
Post by: Antariy on December 06, 2010, 02:42:19 AM
Quote from: jj2007 on December 05, 2010, 07:39:30 AM
That is easy and elegant, and avoids starting a flame war here, right?

Where do you see a flame war? This is not the Soap Box :bg

You just did not understand what I meant by "bad old testbed". I meant that the old testbed does not skip unsupported algos, so Steve (FORTRANS) was forced to comment out the SSE2 code.
Everything after that is the answer about real coding situations.

No flame at all. It is just funny that such a small piece of code led to such a discussion :bg
Jochen, it would be interesting if you posted timings for the new tweak, because your clocks are always very different.



Alex
Title: Re: m32lib GetPercent need a correction
Post by: oex on December 06, 2010, 06:36:38 AM
Quote from: Antariy on December 06, 2010, 02:42:19 AM
It is just funny that such a small piece of code led to such a discussion :bg

Small is big on the MASM32 forum :lol