2 masm32.lib updates.

Started by hutch--, November 09, 2010, 03:19:15 AM


hutch--

There are two modules in the zip file: one is a replacement for the BinSearch procedure, the other is a replacement for the GetPercent procedure.

The BinSearch proc had an error that on some occasions allowed it to scan past the end of the search buffer. This algorithm has been rewritten and is testing OK. It runs at about 1.4 gig/sec on my development Core2 Quad.
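
To illustrate the kind of overrun described above, here is a minimal C sketch of a byte-pattern search that never reads past the end of the buffer. It is not the rewritten BinSearch code; the name find_bytes and its parameters are made up for this example. The point is only the loop bound: the last start position tested is the one where the whole pattern still fits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the MASM32 BinSearch code.
   Returns the offset of the first match of pat in buf, or -1 if none. */
long find_bytes(const unsigned char *buf, size_t buf_len,
                const unsigned char *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > buf_len)
        return -1;                    /* pattern cannot fit at all */

    /* Last offset at which a full pattern still fits inside the buffer. */
    size_t last = buf_len - pat_len;

    for (size_t i = 0; i <= last; i++) {
        /* memcmp reads buf[i] .. buf[i + pat_len - 1], all inside buf */
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;
    }
    return -1;
}
```

A scan that runs its start position all the way to buf_len instead of buf_len - pat_len would compare bytes beyond the buffer, which is the class of error this bound avoids.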

The replacement for GetPercent is about 40% faster than the version it replaces.

To install the two procedures, copy them to the m32lib directory overwriting the existing ones and run the batch file MAKE.BAT to rebuild the MASM32 library.

Thanks to ramguru for finding the problem in BinSearch and the test condition to fix it, and to JJ for his reciprocal divide design, which made the GetPercent algo a lot faster.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave


hutch--

Urrrrgh, YES but I have to do some testing and a lot of checking first. More later.

dedndave

very cool, Hutch

thanks again for all your efforts   :U

jj2007

Hutch,

You might consider the GetPercentJJ2 variant: it is only 32 bytes long, equally fast on most CPUs and a few cycles faster on the P4, and it produces correct results even if the FPU is full of valid numbers.

R8ZeroPointZeroOne REAL8 0.01
; ^ ^ Double precision is enough, results are exact with 53 bits mantissa
GetPercentJJ2 proc source:DWORD, percent:DWORD
  push eax
  ffree st(7) ; prevent BAD number if st(7) is valid
  fild dword ptr [esp+8] ; load source integer
  fimul dword ptr [esp+12] ; multiply source by the required percentage
  fmul R8ZeroPointZeroOne ; multiply by 0.01, i.e. divide the product by 100
  fistp dword ptr [esp] ; store result on stack
  pop eax ; return in eax
  ret 8
GetPercentJJ2 endp


For comparison, your version:
GetPercent proc source:DWORD, percent:DWORD
    fild DWORD PTR [esp+8]  ; load percent
; ## fld10 is a macro declaring a REAL10 in .data ##
    fld10 0.01              ; load reciprocal of 100
    fmul                    ; mul by reciprocal = div by 100
    fild DWORD PTR [esp+4]  ; load the source
    fmul                    ; multiply by previous result
    fistp DWORD PTR [esp+8] ; pop FP stack and store result in stack variable
    mov eax, [esp+8]        ; write result to EAX for return value
    ret 8
GetPercent endp
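
For readers who do not think in FPU stack terms, the arithmetic in the two routines above can be sketched in C. This is only an analogue under stated assumptions: the function names are invented for the example, it follows the operand order of GetPercentJJ2 (source times percent, then times 0.01), it uses a double where the FPU works at extended precision, and it rounds halves away from zero, whereas fistp in the default rounding mode rounds halves to even. The integer version is included to show that a plain divide truncates where the FPU store rounds.

```c
/* Hypothetical C analogue of the reciprocal-multiply idea; names are
   illustrative and do not exist in masm32.lib. */
int percent_reciprocal(int source, int percent)
{
    /* multiply by 0.01 instead of dividing by 100 */
    double x = (double)source * (double)percent * 0.01;

    /* round to nearest (ties away from zero, unlike fistp's ties-to-even) */
    return (int)(x + (x >= 0.0 ? 0.5 : -0.5));
}

/* Plain integer version for comparison: the division truncates. */
int percent_divide(int source, int percent)
{
    return (int)(((long long)source * percent) / 100);
}
```

With source = 999 and percent = 10 the exact answer is 99.9, so the rounding version gives 100 while the truncating divide gives 99.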


Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
24      cycles for GetPercentSSE
27      cycles for GetPercent2c
22      cycles for GetPercentJJ2

21      cycles for GetPercentSSE
24      cycles for GetPercent2c
20      cycles for GetPercentJJ2

21      cycles for GetPercentSSE
23      cycles for GetPercent2c
21      cycles for GetPercentJJ2

21      cycles for GetPercentSSE
24      cycles for GetPercent2c
20      cycles for GetPercentJJ2

21      cycles for GetPercentSSE
23      cycles for GetPercent2c
21      cycles for GetPercentJJ2

Code sizes:
39      bytes for GetPercentSSE, result=6790123
41      bytes for GetPercent2c, result=-2147483648
32      bytes for GetPercentJJ2, result=6790123

hutch--

JJ,

I took the barbarian approach: I tried the range of variations and selected the fastest one. Strangely enough, the versions with fewer instructions were slower on the Core2 Quad I develop on, so I use the version with the higher instruction count.

ToutEnMasm


Quote
any hopes of updated windows.inc files ?
Why only hope? It is made in the same way as windows.inc.
This one needs only minimal changes to replace windows.inc (include translate.in first):
http://www.masm32.com/board/index.php?topic=11531.0
Still not enough translated headers?
http://www.masm32.com/board/index.php?topic=5428.0
and the last:
http://www.masm32.com/board/index.php?topic=11617.msg87347#msg87347



frktons

When I assemble the lib with make.bat I get these errors:

fptoa.asm(73) : error A2023:instruction operand must have size
Assembling: fptoa2.asm
fptoa2.asm(72) : error A2023:instruction operand must have size
Assembling: frame3d.asm


I use MASM v.10. Do I have  something to change?

Frank
Mind is like a parachute. You know what to do in order to use it :-)

dedndave

in both files,
        fbstp   [esp]
can be...
        fbstp tbyte ptr [esp]
that should work - if not, try real10 ptr   :bg

it seems odd that they are preceded by
        sub     esp,10
i would think
        sub     esp,12
would keep the stack dword aligned
of course, you would have to adjust the stack accordingly by changing
        add     esp,10
to
        add     esp,12

EDIT - of course, half the guys in here would say this is ok
        fbstp tbyte ptr [esp-10]
and forget both stack adjustments altogether

frktons

Quote from: dedndave on November 09, 2010, 03:35:10 PM
in both files,
        fbstp   [esp]
can be...
        fbstp tbyte ptr [esp]
that should work - if not, try real10 ptr - lol

it seems odd that they are preceded by
        sub     esp,10
i would think
        sub     esp,12
would keep the stack dword aligned
of course, you have to adjust the stack accordingly by changing
        add     esp,10
to
        add     esp,12

Hi Dave.

Are you answering my question?  ::) or what?
Why should those instructions be changed? Does it depend on
the MASM version? Are you talking about the program logic?
tbyte means 10 bytes if I recall it correctly.
Why then:
        sub     esp,12?

Frank


dedndave

to keep the stack 4-aligned
it may not make a difference, since the stack doesn't appear to be accessed otherwise   :P
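
for what it's worth, the 12 is just 10 rounded up to the next multiple of 4. A minimal C sketch of that rounding follows; align_up is a made-up helper name, not something from the library, and it assumes the alignment is a power of two.

```c
#include <stddef.h>

/* Hypothetical helper: round size up to the next multiple of align.
   Assumes align is a power of two (4 for a dword-aligned stack). */
size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

align_up(10, 4) gives 12, which is where the sub esp,12 / add esp,12 pair comes from; a size that is already a multiple of 4 comes back unchanged.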

frktons

I'm curious to see what Hutch suggests as well, and which version of MASM he
used to assemble the library. There are probably some differences between ML 6.1
and ML 10. For the time being I'm not sure what causes this error.

By the way, I'll try the changes you suggested, except for the "12", since
I don't really understand whether it is needed or useful at all.  :P

Frank

dedndave

FBSTP always writes 10 bytes, so size shouldn't really be needed
perhaps, in ML 10, they tightened the ropes a bit

frktons

Quote from: dedndave on November 09, 2010, 04:01:48 PM
FBSTP always writes 10 bytes, so size shouldn't really be needed
perhaps, in ML 10, they tightened the ropes a bit

Indeed they did.  :lol

After adding the tbyte ptr, they both assemble without errors.  :U

Hutch is probably not using the latest ML version.  :P

Thanks Dave.

Frank