There are two modules in the zip file: one replaces the BinSearch procedure, the other replaces the GetPercent procedure.
The BinSearch proc had a bug that, on some occasions, let it scan past the end of the search buffer. The algorithm has been rewritten and tests OK; it runs at about 1.4 GB/sec on my Core2 Quad development machine.
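A minimal C sketch of the kind of bounds check involved. It assumes a BinSearch-like signature (start offset, source buffer and length, pattern and length, returning -1 when there is no match); the name `bin_search` and its parameters are hypothetical, and this is neither the actual MASM32 code nor ramguru's exact test condition. The point is that the last offset worth testing is `srclen - patlen`, and that bound has to be validated before it is used, otherwise an unsigned subtraction wraps around and the scan runs off the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a BinSearch-style routine: find the first
   occurrence of pat[0..patlen) in src[0..srclen), starting at offset
   start. Returns the match offset, or -1 if there is no match. */
long bin_search(size_t start, const unsigned char *src, size_t srclen,
                const unsigned char *pat, size_t patlen)
{
    assert(patlen > 0);
    /* Check before subtracting: if the pattern cannot fit, srclen - patlen
       would wrap (size_t is unsigned) and the loop would overrun src. */
    if (patlen > srclen || start > srclen - patlen)
        return -1;
    size_t last = srclen - patlen; /* last offset where a full match fits */
    for (size_t i = start; i <= last; i++) {
        if (src[i] == pat[0] && memcmp(src + i, pat, patlen) == 0)
            return (long)i;
    }
    return -1;
}
```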
The replacement for GetPercent is about 40% faster than the version it replaces.
To install the two procedures, copy them to the m32lib directory, overwriting the existing ones, and run the batch file MAKE.BAT to rebuild the MASM32 library.
With thanks to ramguru, for finding the problem in BinSearch and the test condition that fixes it, and to JJ, for his design that replaces the divide with a multiply by the reciprocal, which made the GetPercent algo a lot faster.
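To illustrate the reciprocal trick, here is a C sketch (function names are hypothetical; non-negative inputs only) that compares an integer divide-by-100 reference with a multiply by 0.01 in double precision. Both round half to even, which is what FISTP does under the default x87 rounding mode. The sketch only checks agreement for moderate inputs, where source*percent stays well inside the 53-bit mantissa; it does not model the x87's internal precision-control settings.

```c
#include <assert.h>
#include <stdint.h>

/* Round half to even for a non-negative double, like FISTP with the
   default rounding mode. Written without <math.h> so no libm is needed. */
static int64_t round_half_even(double x)
{
    int64_t t = (int64_t)x;      /* truncate toward zero */
    double frac = x - (double)t; /* exact for x < 2^53 */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t++;
    return t;
}

/* Reference: divide by 100 in pure integer arithmetic, ties to even. */
int64_t percent_div(uint32_t source, uint32_t percent)
{
    uint64_t n = (uint64_t)source * percent;
    uint64_t q = n / 100, r = n % 100;
    if (r > 50 || (r == 50 && (q & 1)))
        q++;
    return (int64_t)q;
}

/* Same result by multiplying with the reciprocal 0.01 instead of dividing. */
int64_t percent_recip(uint32_t source, uint32_t percent)
{
    double n = (double)((uint64_t)source * percent); /* exact while n < 2^53 */
    return round_half_even(n * 0.01);
}
```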
any hopes of updated windows.inc files? :P
Urrrrgh, YES, but I have to do some testing and a lot of checking first. More later.
very cool, Hutch
thanks again for all your efforts :U
Hutch,
You might consider the GetPercentJJ2 variant: it is only 32 bytes long, equally fast on most CPUs and a few cycles faster on the P4, and it produces correct results even if the FPU is full of valid numbers.
.data
R8ZeroPointZeroOne REAL8 0.01
; Double precision is enough: results are exact with a 53-bit mantissa
.code
OPTION PROLOGUE:NONE          ; no stack frame: arguments are addressed via ESP
OPTION EPILOGUE:NONE
GetPercentJJ2 proc source:DWORD, percent:DWORD
    push eax                  ; reserve a DWORD on the stack for the result
    ffree st(7)               ; prevent a BAD number if st(7) holds a valid value (full FPU stack)
    fild dword ptr [esp+8]    ; load source integer (args sit 4 bytes higher after the push)
    fimul dword ptr [esp+12]  ; multiply source by the required percentage
    fmul R8ZeroPointZeroOne   ; multiply by 0.01, i.e. divide by 100
    fistp dword ptr [esp]     ; round, store result in the reserved slot, pop the FPU stack
    pop eax                   ; return in eax
    ret 8
GetPercentJJ2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
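A toy C model (not an emulator; all names are hypothetical) of why the `ffree st(7)` line matters. The x87 has eight registers; if all eight are in use, the next FILD overflows the stack. With the invalid-operation exception masked, as is usual, ST(0) then holds the "indefinite" value, and FISTP stores it as the integer indefinite 0x80000000, i.e. -2147483648, which is presumably why GetPercent2c reports that result in the benchmark output. FFREE ST(7) tags the register that the next push will use as empty, at the cost of discarding whatever it held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the x87 register stack: 8 physical registers, a TOP
   pointer and an "empty" tag per register. Not a real emulator. */
typedef struct {
    double reg[8];
    bool empty[8];
    bool indefinite[8]; /* register holds the QNaN "indefinite" value */
    int top;
} Fpu;

void fpu_init(Fpu *f)
{
    for (int i = 0; i < 8; i++) {
        f->reg[i] = 0.0;
        f->empty[i] = true;
        f->indefinite[i] = false;
    }
    f->top = 0;
}

/* Push, as FILD does. If the target register is not empty the stack
   overflows; with the exception masked, ST(0) becomes indefinite. */
void fpu_push(Fpu *f, double v)
{
    f->top = (f->top + 7) & 7;
    f->indefinite[f->top] = !f->empty[f->top];
    f->reg[f->top] = v;
    f->empty[f->top] = false;
}

/* FFREE ST(i): tag the register as empty without changing TOP. */
void fpu_ffree(Fpu *f, int i)
{
    f->empty[(f->top + i) & 7] = true;
}

/* ST(0) *= v, standing in for FIMUL/FMUL with a memory operand. */
void fpu_mul(Fpu *f, double v)
{
    f->reg[f->top] *= v;
}

/* Round half to even; non-negative values only in this sketch. */
static int64_t round_half_even(double x)
{
    int64_t t = (int64_t)x;
    double frac = x - (double)t;
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t++;
    return t;
}

/* FISTP to a DWORD: an indefinite operand is stored as 0x80000000. */
int32_t fpu_fistp(Fpu *f)
{
    int r = f->top;
    int32_t out = f->indefinite[r] ? INT32_MIN
                                   : (int32_t)round_half_even(f->reg[r]);
    f->empty[r] = true;
    f->indefinite[r] = false;
    f->top = (f->top + 1) & 7;
    return out;
}

/* Same sequence as GetPercentJJ2, optionally without the ffree st(7). */
int32_t get_percent_model(Fpu *f, int32_t source, int32_t percent, bool free_st7)
{
    if (free_st7)
        fpu_ffree(f, 7);
    fpu_push(f, source); /* fild  source  */
    fpu_mul(f, percent); /* fimul percent */
    fpu_mul(f, 0.01);    /* fmul  0.01    */
    return fpu_fistp(f); /* fistp result  */
}
```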
For comparison, your version:
GetPercent proc source:DWORD, percent:DWORD
    fild DWORD PTR [esp+8]    ; load percent
    ; ## fld10 is a macro declaring a REAL10 in .data ##
    fld10 0.01                ; load reciprocal of 100
    fmul                      ; mul by reciprocal = div by 100
    fild DWORD PTR [esp+4]    ; load the source
    fmul                      ; multiply by previous result
    fistp DWORD PTR [esp+8]   ; pop FP stack and store result in stack variable
    mov eax, [esp+8]          ; write result to EAX for return value
    ret 8
GetPercent endp
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
24 cycles for GetPercentSSE
27 cycles for GetPercent2c
22 cycles for GetPercentJJ2
21 cycles for GetPercentSSE
24 cycles for GetPercent2c
20 cycles for GetPercentJJ2
21 cycles for GetPercentSSE
23 cycles for GetPercent2c
21 cycles for GetPercentJJ2
21 cycles for GetPercentSSE
24 cycles for GetPercent2c
20 cycles for GetPercentJJ2
21 cycles for GetPercentSSE
23 cycles for GetPercent2c
21 cycles for GetPercentJJ2
Code sizes:
39 bytes for GetPercentSSE, result=6790123
41 bytes for GetPercent2c, result=-2147483648
32 bytes for GetPercentJJ2, result=6790123
JJ,
I took the barbarian approach: I tried the range of variations and selected the fastest one. Strangely enough, the versions with fewer instructions were slower on the Core2 Quad I develop on, so I use the version with the higher instruction count.
:clap:
Quote
any hopes of updated windows.inc files?
Why hopes?! It is made the same way as windows.inc, by:
This one needs only minimal changes to replace windows.inc (include translate.in first)
http://www.masm32.com/board/index.php?topic=11531.0
Never enough translated headers?
http://www.masm32.com/board/index.php?topic=5428.0
and the latest:
http://www.masm32.com/board/index.php?topic=11617.msg87347#msg87347
When I assemble the lib with make.bat, I get these errors:
fptoa.asm(73) : error A2023:instruction operand must have size
Assembling: fptoa2.asm
fptoa2.asm(72) : error A2023:instruction operand must have size
Assembling: frame3d.asm
I use MASM v10. Is there something I need to change?
Frank
in both files,
fbstp [esp]
can be...
fbstp tbyte ptr [esp]
that should work - if not, try real10 ptr :bg
it seems odd that they are preceded by
sub esp,10
i would think
sub esp,12
would keep the stack dword aligned
of course, you would have to adjust the stack accordingly by changing
add esp,10
to
add esp,12
EDIT - of course, half the guys in here would say this is ok
fbstp tbyte ptr [esp-10]
and forget both stack adjustments altogether
Quote from: dedndave on November 09, 2010, 03:35:10 PM
in both files,
fbstp [esp]
can be...
fbstp tbyte ptr [esp]
that should work - if not, try real10 ptr - lol
it seems odd that they are preceded by
sub esp,10
i would think
sub esp,12
would keep the stack dword aligned
of course, you have to adjust the stack accordingly by changing
add esp,10
to
add esp,12
Hi Dave.
Are you answering my question? ::) or what?
Why should those instructions be changed? Does it depend on the MASM version? Are you talking about the program logic?
tbyte means 10 bytes, if I recall correctly.
Why then:
sub esp,12
?
Frank
to keep the stack 4-aligned
it may not make a difference, since the stack doesn't appear to be accessed otherwise :P
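The arithmetic behind "sub esp,12" is just rounding the 10-byte FBSTP buffer up to the next multiple of 4. A small C sketch; `align_up` is a hypothetical helper name, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two),
   e.g. a 10-byte buffer becomes 12 bytes to keep ESP 4-aligned. */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```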
I'm curious to see what Hutch suggests as well, and which version of MASM he used to assemble the library. There are probably some differences between ML 6.1 and ML 10. I'm not sure what causes this error for the time being.
By the way, I'll try the changes you suggested, except for the "12", since I don't really see whether it is needed or useful at all. :P
Frank
FBSTP always writes 10 bytes, so size shouldn't really be needed
perhaps, in ML 10, they tightened the ropes a bit
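For reference, FBSTP stores ST(0) as an 80-bit packed BCD integer, which is why its operand is always 10 bytes; ML 10 apparently just insists on the size being spelled out when the operand is a bare [esp]. A C sketch of that layout, assuming the value has already been rounded to an integer of at most 18 digits (`to_packed_bcd` is a hypothetical name; the real instruction rounds according to the FPU control word): bytes 0 to 8 hold two decimal digits each, least significant first, and bit 7 of byte 9 is the sign.

```c
#include <assert.h>
#include <stdint.h>

/* Encode v in the 10-byte packed BCD layout FBSTP writes: bytes 0..8 hold
   18 digits, two per byte (low nibble = lower digit), least significant
   byte first; bit 7 of byte 9 is the sign, the other bits are left 0 here. */
void to_packed_bcd(int64_t v, uint8_t out[10])
{
    assert(v > -1000000000000000000LL && v < 1000000000000000000LL);
    uint64_t mag = v < 0 ? (uint64_t)(-v) : (uint64_t)v;
    for (int i = 0; i < 9; i++) {
        uint8_t lo = (uint8_t)(mag % 10);
        mag /= 10;
        uint8_t hi = (uint8_t)(mag % 10);
        mag /= 10;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    out[9] = v < 0 ? 0x80 : 0x00;
}
```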
Quote from: dedndave on November 09, 2010, 04:01:48 PM
FBSTP always writes 10 bytes, so size shouldn't really be needed
perhaps, in ML 10, they tightened the ropes a bit
Indeed they did. :lol
After adding tbyte ptr, both now assemble without errors. :U
Hutch is probably not using the latest ML version. :P
Thanks Dave.
Frank
i suppose he uses 6.14, as it is the version that may be distributed with the masm32 package
the fptoa problem and one other are corrected here
http://www.masm32.com/board/index.php?topic=10051.0
Quote from: ToutEnMasm on November 09, 2010, 06:30:20 PM
the fptoa problem and one other are corrected here
http://www.masm32.com/board/index.php?topic=10051.0
Thanks, my friend. :bg
Frank
Frank,
While I own every version of ML.EXE released since 1990, the only one I can distribute is ML 6.14, and the library builds with that version. If you make the mod that Dave mentioned, make sure it still builds with 6.14 as well.
Quote from: hutch-- on November 09, 2010, 11:45:07 PM
Frank,
While I own every version of ML.EXE released since 1990, the only one I can distribute is ML 6.14, and the library builds with that version. If you make the mod that Dave mentioned, make sure it still builds with 6.14 as well.
Well, I don't think I'm going to use ML 6.14, but if it's worth something, I'll try to build the lib with ML 6.14.
Frank
6.14 shouldn't burp on "tbyte ptr"
Frank,
Just add these two files to the masm32 library and run make.bat. I have tested the two files that Kenneth Zheng posted, and they build fine with all the versions of ML that I have here.