m32lib GetPercent need a correction

Started by ToutEnMasm, November 07, 2010, 06:54:01 AM

hutch--

Yes, works fine with the call commented out.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
13      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
11      cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
21      cycles for AxGetPercentInt

13      cycles for GetPercentSSE
21      cycles for GetPercent
8       cycles for GetPercent2c
9       cycles for GetPercent2nc
11      cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
21      cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
26      bytes for AxGetPercentInt, result=12345678

--- ok ---

Antariy

Quote from: hutch-- on December 02, 2010, 01:45:31 AM
Yes, works fine with the call commented out.




Thank you, Hutch!

But the results look strange  :eek
A very interesting thing :eek

Antariy

Quote from: hutch-- on December 02, 2010, 01:45:31 AM

26      bytes for AxGetPercentInt, result=12345678


It seems that PIV hardware has a different IMUL implementation.
I guess the culprit is the branch after IMUL:

imul edx,[esp+8],28F5C29h
jl @F


So, if that piece is changed to:


mov edx,[esp+8]
cmp edx,99
ja @F
imul edx,28F5C29h


the code is guaranteed to work.
But anyway, this is a really strange difference :eek

Antariy

I have changed the code, and the timings went up by 1 clock.
It should now work on any CPU. It is a pity, though, that the sign and overflow flags behave differently on that hardware.

Hutch, test this new one, please!



Alex

jj2007

Congrats, Alex, it works now, and it's very fast :U
However, for MasmBasic I will keep the old design that yields -550 for PerCent(-1000, 55); the need for an unsigned PerCent(4294966296, 55)=2362231462 is unclear to me.
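
The two readings of the same 32-bit pattern can be checked with a small Python sketch. The helper names below are made up for illustration; only the numbers (-1000, 55, 4294966296, 2362231462) come from the post, and plain integer division stands in for whatever the real routines do internally.

```python
MASK32 = 0xFFFFFFFF

def as_unsigned32(value):
    """Reinterpret an integer as an unsigned 32-bit value (two's complement)."""
    return value & MASK32

def percent_signed(value, percent):
    """Signed reading: a negative input gives a negative result."""
    return value * percent // 100   # floor division; exact for the inputs used here

def percent_unsigned(value, percent):
    """Unsigned reading: the same bit pattern is treated as a large positive number."""
    return as_unsigned32(value) * percent // 100

print(as_unsigned32(-1000))         # 4294966296
print(percent_signed(-1000, 55))    # -550
print(percent_unsigned(-1000, 55))  # 2362231462
```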

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
11      cycles for GetPercentSSE
37      cycles for GetPercent
14      cycles for GetPercent2c
15      cycles for GetPercent2nc
15      cycles for GetPercentJJ1
14      cycles for GetPercentJJ2
10      cycles for AxGetPercentInt

11      cycles for GetPercentSSE
37      cycles for GetPercent
14      cycles for GetPercent2c
15      cycles for GetPercent2nc
15      cycles for GetPercentJJ1
14      cycles for GetPercentJJ2
10      cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
31      bytes for AxGetPercentInt, result=6790122

Antariy

Quote from: jj2007 on December 02, 2010, 02:47:29 AM
Congrats, Alex, it works now, and it's very fast :U
However, for MasmBasic I will keep the old design that yields -550 for PerCent(-1000, 55); the need for an unsigned PerCent(4294966296, 55) is unclear to me.

Thanks!

Of course, you are free to use whichever algo you want; I am not imposing mine at all. I just had some spare time, which I spent on it.
I prefer to treat numbers as unsigned, so I tried to make a version that works with numbers of any size without speed loss, and that is all  :bg
EDITED: Why I prefer unsigned: because in programming you usually need positive numbers rather than negative ones, for example when calculating coordinates.



Alex

jj2007

Quote from: Antariy on December 02, 2010, 02:55:45 AM
For example for calculation of some coordinates.

The Earth's circumference is about 40,000 km, which makes 40,000,000 metres or 40,000,000,000 millimetres
40,000,000,000/4294967296 = 9.3 millimetres

For GPS, that is a damn good resolution :bg
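
The arithmetic above, written out as a short Python check. The constant names are invented here; the 40,000 km figure and the 2^32 step count are the ones used in the post.

```python
EARTH_LENGTH_KM = 40_000                 # the 40,000 km figure from the post
length_mm = EARTH_LENGTH_KM * 1000 * 1000
steps = 2 ** 32                          # distinct values of an unsigned DWORD

resolution_mm = length_mm / steps
print(round(resolution_mm, 1))           # 9.3
```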

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
21      cycles for GetPercentSSE
53      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
21      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

21      cycles for GetPercentSSE
52      cycles for GetPercent
24      cycles for GetPercent2c
26      cycles for GetPercent2nc
24      cycles for GetPercentJJ1
20      cycles for GetPercentJJ2
16      cycles for AxGetPercentInt

oex

AMD Sempron(tm) Processor 3100+ (SSE3)
14      cycles for GetPercentSSE
30      cycles for GetPercent
13      cycles for GetPercent2c
15      cycles for GetPercent2nc
16      cycles for GetPercentJJ1
15      cycles for GetPercentJJ2
9       cycles for AxGetPercentInt

12      cycles for GetPercentSSE
29      cycles for GetPercent
13      cycles for GetPercent2c
14      cycles for GetPercent2nc
14      cycles for GetPercentJJ1
14      cycles for GetPercentJJ2
9       cycles for AxGetPercentInt
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

frktons

The latest version works:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
14      cycles for GetPercentSSE
36      cycles for GetPercent
9       cycles for GetPercent2c
9       cycles for GetPercent2nc
9       cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
7       cycles for AxGetPercentInt

13      cycles for GetPercentSSE
36      cycles for GetPercent
9       cycles for GetPercent2c
9       cycles for GetPercent2nc
9       cycles for GetPercentJJ1
10      cycles for GetPercentJJ2
7       cycles for AxGetPercentInt

Code sizes:
39      bytes for GetPercentSSE, result=6790123
36      bytes for GetPercent, result=-2147483648
41      bytes for GetPercent2c, result=-2147483648
45      bytes for GetPercent2nc, result=6790123
37      bytes for GetPercentJJ1, result=6790123
32      bytes for GetPercentJJ2, result=6790123
31      bytes for AxGetPercentInt, result=6790122

--- ok ---

:U
Mind is like a parachute. You know what to do in order to use it :-)

ToutEnMasm


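Things are becoming a little unclear to me.

The latest version of AxGetPercentInt is this one:
Quote
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
PourCent proc source:DWORD, percent:DWORD

      mov eax,[esp+4]            ; eax = source

      if 0                       ; disabled: JL depends on SF, which IMUL leaves undefined
      imul edx,[esp+8],28F5C29h
      jl @F
      else
      mov edx,[esp+8]            ; edx = percent
      cmp edx,99
      ja @F                      ; percent > 99 (unsigned compare): return source as is
      imul edx,28F5C29h          ; edx = percent * 2^32/100 (28F5C29h is 2^32/100 rounded up)
      endif

      mul edx                    ; edx:eax = source * edx
      mov eax,edx                ; eax = high dword, roughly source*percent/100
      shr edx,32-2               ; edx = top two bits of that dword (0..3)
      sub eax,edx                ; subtract them as a small correction
      @@:
      ret 8                      ; pop the two DWORD arguments

PourCent endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
Am I correct?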
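
For readers who want to check the listing by hand, here is a rough Python model of its 32-bit arithmetic. It is only a sketch: the function name `ax_get_percent_int` and the constant name `MAGIC` are invented here, and the inputs 12345678 and 55 are inferred from the results posted in the thread (12345678 and 6790122), not stated in it.

```python
MAGIC = 0x28F5C29          # 42949673, i.e. 2**32 / 100 rounded up
MASK32 = 0xFFFFFFFF

def ax_get_percent_int(source, percent):
    """Model of the cmp/ja variant of the routine, using unsigned 32-bit values."""
    source &= MASK32
    percent &= MASK32
    if percent > 99:                    # cmp edx,99 / ja @F
        return source                   # eax still holds source
    edx = (percent * MAGIC) & MASK32    # imul edx,28F5C29h
    high = (source * edx) >> 32         # mul edx, keep the high dword
    return high - (high >> 30)          # shr edx,32-2 / sub eax,edx

print(ax_get_percent_int(12345678, 55))      # 6790122
print(ax_get_percent_int(4294966296, 55))    # 2362231462
print(ax_get_percent_int(12345678, 100))     # 12345678
```

The first two values agree with the figures reported earlier in the thread for the corrected version. The third shows that a percentage above 99 returns the source unchanged.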


frktons

Quote from: ToutEnMasm on December 02, 2010, 07:32:07 AM

Things are becoming a little unclear to me.




The above zip file is the one you should download:
http://www.masm32.com/board/index.php?action=dlattach;topic=15263.0;id=8563

Frank

ToutEnMasm


FORTRANS

Quote from: Antariy on December 02, 2010, 02:01:30 AM
It seems that PIV hardware has a different IMUL implementation.
I guess the culprit is the branch after IMUL:

imul edx,[esp+8],28F5C29h
jl @F


So, if that piece is changed to:


mov edx,[esp+8]
cmp edx,99
ja @F
imul edx,28F5C29h


the code is guaranteed to work.
But anyway, this is a really strange difference :eek

Hi,

   According to some old documentation I am referring to, only
the carry and overflow flags are valid for the IMUL instruction.
And JL also uses the sign flag, so it should not be used here.
Do you have differing documentation as to what are the valid
flags when using IMUL?

Regards,

Steve N.

ToutEnMasm


A very good question about the IMUL flags.
The Intel documentation says the SF, ZF, AF and PF flags are undefined (U). What does that mean?
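
One way to see why an undefined SF matters for the old `imul` / `jl` sequence is to model it. JL jumps when SF differs from OF, and IMUL sets OF when the signed product does not fit in 32 bits. The Python sketch below uses invented helper names, and its "SF = 0" column is purely hypothetical: it was picked because it would reproduce the result=12345678 output, and the thread does not establish what any particular CPU actually puts in SF.

```python
MAGIC = 0x28F5C29   # multiplier from the thread

def to_signed32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def imul32(a, b):
    """Model 32-bit IMUL: return (truncated result, OF)."""
    full = to_signed32(a) * to_signed32(b)
    low = full & 0xFFFFFFFF
    overflow = int(full != to_signed32(low))   # product did not fit in 32 signed bits
    return low, overflow

def jl_taken(sf, of):
    """JL jumps when SF != OF."""
    return sf != of

for percent in (10, 55, 100):
    low, of = imul32(percent, MAGIC)
    sign_bit = low >> 31
    # Second-to-last column: SF mirrors the result's sign bit. Last column: SF = 0 (hypothetical).
    print(percent, of, jl_taken(sign_bit, of), jl_taken(0, of))
```

For percent = 55 the multiply overflows, so OF is 1. If SF mirrors the sign bit of the result, JL is not taken and the routine goes on to compute the percentage. If SF were 0 instead, JL would be taken and the routine would return the source value untouched. The `cmp edx,99` / `ja` version avoids the question because CMP defines every flag that JA tests.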