m32lib GetPercent need a correction - Page 2


m32lib GetPercent need a correction

Started by ToutEnMasm, November 07, 2010, 06:54:01 AM

hutch--

In Memoriam
Administrator
Member
Posts: 8,397
Masm32 SDK Creator
Location: Now at Peace

Re: m32lib GetPercent need a correction

#15

November 07, 2010, 11:18:07 PM

I wrote this years ago to fill a simple requirement, things like integer sizing for screen display, and it has never been a performance issue where it was normally used. I have no doubt that it can be optimised, but in the context of its use, it's a "who cares" issue.

Now, I understand what Yves has said, but the procedure has always handled high call counts with no problems at all. Here is a test piece that calls it in a loop 1 million times.

Code:


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    push esi

    mov esi, 1000000

  lbl0:
    print str$(rv(GetPercent,esi,50)),13,10
    sub esi, 1
    jnz lbl0

    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

Download site for MASM32 New MASM Forum
https://masm32.com https://masm32.com/board/index.php

hutch--

Re: m32lib GetPercent need a correction

#16

November 08, 2010, 12:14:55 AM

:bg

JJ,

Code:


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
14      cycles for GetPercent
6       cycles for GetPercentJJ1
5       cycles for GetPercentJJ2

16      cycles for GetPercent
6       cycles for GetPercentJJ1
5       cycles for GetPercentJJ2

16      cycles for GetPercent
6       cycles for GetPercentJJ1
5       cycles for GetPercentJJ2

Code sizes:
32      for GetPercentJJ1
32      for GetPercentJJ2

--- ok ---

Removing the FDIV certainly improves the time. :P

jj2007

Member
Posts: 5,393
Location: Italy

Re: m32lib GetPercent need a correction

#17

November 08, 2010, 12:55:49 AM

Quote from: hutch-- on November 08, 2010, 12:14:55 AM
Removing the FDIV certainly improves the time. :P

Certainly. SSE speeds it up once more, but I ran into a problem with the mulsd instruction: it is fast if the xmm register already holds a floating-point value, but if it still holds integer bits, it costs over 200 cycles! The first version below works fine but needs lots of conversions.

Code:

GetPercentSSE_s:
GPJ005	REAL8 0.01
GetPercentSSE proc source:DWORD, percent:DWORD
	cvtsi2sd xmm0, dword ptr [esp+4]		; source
	cvtsi2sd xmm1, dword ptr [esp+8]		; percent
	mulsd xmm0, xmm1
	mulsd xmm0, GPJ005		; multiply with 0.01, i.e. divide source by 100
	cvtsd2si eax, xmm0
	ret 8		; 11 cycles
GetPercentSSE endp
GetPercentSSE_e:

GetPercentSSEs_s:
GPJ005s	REAL8 0.01
GetPercentSSEs proc source:DWORD, percent:DWORD
;	xorps xmm0, xmm0
;	movaps xmm1, xmm0	; no effect
	movd xmm0, dword ptr [esp+4]	; source
	movd xmm1, dword ptr [esp+8]	; percent
;	int 3	; OPT_Olly 2
	pmuludq xmm0, xmm1		; source * percent, OK (pm xmm0, qword ptr [esp+8] possible but slow)
	MakeSlow = 1
	if MakeSlow			; first branch: result is correct but >200 cycles
		mulsd xmm0, GPJ005s	; multiply with 0.01, i.e. divide source by 100 - SLOOOOOOW ###
		movd eax, xmm0
	else				; second branch: result correct, 27 cycles
		cvtdq2pd xmm0, xmm0	; convert integer to float (Convert  Packed Doubleword Integers to Packed Double-Precision Floating-Point Values)
		mulsd xmm0, GPJ005s	; multiply with 0.01, i.e. divide source by 100 - fast
		cvtsd2si eax, xmm0
	endif
	ret 8		; 259 or 348 cycles
GetPercentSSEs endp
GetPercentSSEs_e:

New testbed attached, now with more realistic cycle counts (there is a REPEAT 64 ... ENDM followed by shr eax, 6, i.e. the measured count is divided by 64).

Code:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
30      cycles for GetPercent
14      cycles for GetPercentJJ2
11      cycles for GetPercentSSE
263     cycles for GetPercentSSEs

Code sizes:
36      bytes for GetPercent, result=6790123
32      bytes for GetPercentJJ2, result=6790123
39      bytes for GetPercentSSE, result=6790123
39      bytes for GetPercentSSEs, result=6790123

GetPercentTimings2.zip
4.53 KB

Masm32 Tips, Tricks and Traps

Antariy

Member
Posts: 893
Location: Earth Planet, Solar System, Milky Way Galaxy, Space

Re: m32lib GetPercent need a correction

#18

November 08, 2010, 01:02:23 AM

Code:


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
44      cycles for GetPercent
21      cycles for GetPercentJJ2
22      cycles for GetPercentSSE
1104    cycles for GetPercentSSEs

Code sizes:
36      bytes for GetPercent, result=6790123
32      bytes for GetPercentJJ2, result=6790123
39      bytes for GetPercentSSE, result=6790123
39      bytes for GetPercentSSEs, result=6790123

jj2007

Re: m32lib GetPercent need a correction

#19

November 08, 2010, 01:04:27 AM

Thanks, Alex. So the ordinary FPU version is one cycle faster than the fast SSE2 version... and a whopping 1100 cycles for the bad SSE stuff ::)

hutch--

Re: m32lib GetPercent need a correction

#20

November 08, 2010, 01:50:56 AM

JJ,

Inspired by your reciprocal multiply, this one is about 40% faster than the old version. I changed the calculation order as well.

Code:


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

GetPercent2 proc source:DWORD, percent:DWORD

    fild DWORD PTR [esp+8]  ; load percent
    fld10 0.01              ; load reciprocal of 100
    fmul                    ; mul by reciprocal = div by 100
    fild DWORD PTR [esp+4]  ; load the source
    fmul                    ; multiply by previous result
    fistp DWORD PTR [esp+8] ; pop FP stack and store result in stack variable
    mov eax, [esp+8]        ; write result to EAX for return value
    ret 8

GetPercent2 endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

jj2007

Re: m32lib GetPercent need a correction

#21

November 08, 2010, 07:31:41 AM Last Edit: November 08, 2010, 09:03:48 AM by jj2007

Great, so now we are waiting for Lingo :bg

Code:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
11      cycles for GetPercentSSE
36      cycles for GetPercent
13      cycles for GetPercent2c
14      cycles for GetPercent2nc
14      cycles for GetPercentJJ2

11      cycles for GetPercentSSE
36      cycles for GetPercent
14      cycles for GetPercent2c
14      cycles for GetPercent2nc
14      cycles for GetPercentJJ2

11      cycles for GetPercentSSE
36      cycles for GetPercent
13      cycles for GetPercent2c
14      cycles for GetPercent2nc
14      cycles for GetPercentJJ2

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=6790123
41      bytes for GetPercent2c, result=6790123
45      bytes for GetPercent2nc, result=6790123
32      bytes for GetPercentJJ2, result=6790123

P.S.: GetPercent2c and GetPercent2nc are two variants of Hutch's new algo. I like the JJ2 variant: it fits into two paragraphs (32 bytes) and is reasonably fast.

Edit: Prescott P4:

Code:

20      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
20      cycles for GetPercentJJ2

GetPercentTimings3.zip
6.79 KB

hutch--

Re: m32lib GetPercent need a correction

#22

November 08, 2010, 10:47:01 AM

Code:


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
14      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
10      cycles for GetPercentJJ2

13      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
10      cycles for GetPercentJJ2

13      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
10      cycles for GetPercentJJ2

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=6790123
41      bytes for GetPercent2c, result=6790123
45      bytes for GetPercent2nc, result=6790123
32      bytes for GetPercentJJ2, result=6790123

--- ok ---

ToutEnMasm

Member
Posts: 1,496
FA is a musical note to play with cl
Location: FRANCE

Re: m32lib GetPercent need a correction

#23

November 08, 2010, 01:32:02 PM

Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
21 cycles for GetPercentSSE
47 cycles for GetPercent
24 cycles for GetPercent2c
26 cycles for GetPercent2nc
21 cycles for GetPercentJJ2

21 cycles for GetPercentSSE
47 cycles for GetPercent
24 cycles for GetPercent2c
26 cycles for GetPercent2nc
20 cycles for GetPercentJJ2

21 cycles for GetPercentSSE
48 cycles for GetPercent
24 cycles for GetPercent2c
26 cycles for GetPercent2nc
21 cycles for GetPercentJJ2

Code sizes:
39 bytes for GetPercentSSE, result=6790123
36 bytes for GetPercent, result=6790123
41 bytes for GetPercent2c, result=6790123
45 bytes for GetPercent2nc, result=6790123
32 bytes for GetPercentJJ2, result=6790123

--- ok ---

My Code Site
http://perso.orange.fr/luce.yves/

Antariy

Re: m32lib GetPercent need a correction

#24

December 01, 2010, 02:04:30 PM

Here is my integer version of GetPercent. By my tests, it is fast and small.

Code:


OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
AxGetPercentInt proc source:DWORD, percent:DWORD

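		mov eax,[esp+4]			; EAX = source (also returned when percent >= 100)
		imul edx,[esp+8],28F5C29h	; EDX = percent * 28F5C29h, where 28F5C29h = 2^32/100 rounded up
		jl @F				; branch on the IMUL flags; meant to return source for percent >= 100

		mul edx				; EDX:EAX = source * EDX; EDX is roughly source*percent/100, slightly high
		mov eax,edx			; EAX = high dword of the product
		shr edx,32-2			; EDX = high dword >> 30, an estimate of the bias
		sub eax,edx			; remove the bias introduced by rounding the reciprocal up
		@@:
		ret 8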
				
AxGetPercentInt endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

The code is intended for calculating values from percentages in the range 0-100. If percent is greater than or equal to 100, the source value is returned.
Its speed does not depend on the value of the source number or of the percent.
An unusual correction is used (the final shr/sub pair).

If the algo will always be used with percents less than 100, the first four lines of the code can be replaced with:

Code:


		imul eax,[esp+8],28F5C29h
		jl @F

		mul dword ptr [esp+4]

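This makes it about 2 clocks faster. Note that with this variant the early exit no longer returns the source value, since EAX then holds the scaled percent.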

For testing I used Jochen's latest testbed, but I have fixed a flaw in the ChkPrecision code, which now calls the right FPU calculation code.

Here are my timings:

Code:


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
24      cycles for GetPercentSSE
47      cycles for GetPercent
23      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
20      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

24      cycles for GetPercentSSE
46      cycles for GetPercent
23      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
20      cycles for GetPercentJJ2
15      cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
26      bytes for AxGetPercentInt, result=6790122

Please test it. Thanks!

Alex

AxGetPercentInt.zip
8.09 KB

ToutEnMasm

Re: m32lib GetPercent need a correction

#25

December 01, 2010, 03:26:00 PM

Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
24 cycles for GetPercentSSE
47 cycles for GetPercent
24 cycles for GetPercent2c
26 cycles for GetPercent2nc
24 cycles for GetPercentJJ1
21 cycles for GetPercentJJ2
16 cycles for AxGetPercentInt

24 cycles for GetPercentSSE
47 cycles for GetPercent
24 cycles for GetPercent2c
26 cycles for GetPercent2nc
24 cycles for GetPercentJJ1
21 cycles for GetPercentJJ2
16 cycles for AxGetPercentInt

Code sizes:
39 bytes for GetPercentSSE, result=6790123
36 bytes for GetPercent, result=-2147483648
41 bytes for GetPercent2c, result=-2147483648
45 bytes for GetPercent2nc, result=6790123
37 bytes for GetPercentJJ1, result=6790123
32 bytes for GetPercentJJ2, result=6790123
26 bytes for AxGetPercentInt, result=6790122

--- ok ---

hutch--

Re: m32lib GetPercent need a correction

#26

December 01, 2010, 10:34:56 PM

Alex,

I get a GP fault out of your last zip on XP SP2.

Code:


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
-2125943931 for 7FFF0000/1, case 26750588

Antariy

Re: m32lib GetPercent need a correction

#27

December 01, 2010, 11:13:54 PM

Quote from: hutch-- on December 01, 2010, 10:34:56 PM
Alex,

I get a GP fault out of your last zip on xp sp2.

That is apparently something in the ChkPrecision/FPU code. AxGetPercentInt has no instructions that can cause a GPF.

I have checked this and found the reason: Jochen intentionally fills the FPU stack in the macro, so the next load overflows it:

Code:


Algo MACRO arg
	finit
	REPEAT 8
		fldpi
	ENDM

The ChkPrecision code uses simple FPU code as the reference, and this simple code does not handle the case where the FPU stack is full. That is why you get the exception.

I have now commented out the FLDPI line, and it should work properly.

As I said, that is not a bug in my code; Jochen just did not document his testing variant :bg

Hutch and all, please test this one.

Alex

AxGetPercentInt.zip
8.09 KB

Antariy

Re: m32lib GetPercent need a correction

#28

December 01, 2010, 11:20:12 PM

Quote from: ToutEnMasm on December 01, 2010, 03:26:00 PM
Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
24 cycles for GetPercentSSE
47 cycles for GetPercent
24 cycles for GetPercent2c
26 cycles for GetPercent2nc
24 cycles for GetPercentJJ1
21 cycles for GetPercentJJ2
16 cycles for AxGetPercentInt

24 cycles for GetPercentSSE
47 cycles for GetPercent
24 cycles for GetPercent2c
26 cycles for GetPercent2nc
24 cycles for GetPercentJJ1
21 cycles for GetPercentJJ2
16 cycles for AxGetPercentInt

Code sizes:
39 bytes for GetPercentSSE, result=6790123
36 bytes for GetPercent, result=-2147483648
41 bytes for GetPercent2c, result=-2147483648
45 bytes for GetPercent2nc, result=6790123
37 bytes for GetPercentJJ1, result=6790123
32 bytes for GetPercentJJ2, result=6790123
26 bytes for AxGetPercentInt, result=6790122

--- ok ---

Thanks, Luce!

dedndave

Member
Posts: 10,231

Re: m32lib GetPercent need a correction

#29

December 01, 2010, 11:43:54 PM

Prescott with HTT:

Code:

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
24      cycles for GetPercentSSE
47      cycles for GetPercent
24      cycles for GetPercent2c
29      cycles for GetPercent2nc
26      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

25      cycles for GetPercentSSE
47      cycles for GetPercent
30      cycles for GetPercent2c
30      cycles for GetPercent2nc
26      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

:U
