Tips: Basic Math

Started by Farabi, February 19, 2005, 04:40:59 AM

Farabi

FSINCOS is a useful instruction: it computes the sine and cosine of an angle in one go. But if you use it too often, it can make your program run slowly. The only way to make it fast is to compute the values once and keep them in memory.

So code like this:

fld angle              ; load the angle (in radians) into st(0)
fsincos                ; 365 clock cycles

can be replaced by code like this:


mov esi, sincostable         ; 1 clock cycle
mov ecx,degree              ; 1 clock cycle
mov eax,[esi+ecx*4]       ; 1 clock cycle (please tell me if I'm wrong)

It is much faster, and the result is the same for any angle that has an entry in the table.

This is the support function, which converts degrees to radians. deg is a value in the range 0-3600, in tenths of a degree (the code divides it by 10).


Deg2Rad proc deg:dword ; 104 clock cycles

fldpi ; load pi, 8 clock cycles
push 180 ; temporary integer 180 on the stack, 1 clock cycle
fidiv dword ptr[esp] ; st(0) = pi/180, 86 clock cycles
pop eax ; discard the temporary (eax is junk), 4 clock cycles

fild deg ; load deg (tenths of a degree)
push 10 ; temporary integer 10 on the stack, 1 clock cycle
fidiv dword ptr[esp] ; st(0) = deg/10, now a float in degrees
pop eax ; discard the temporary (eax is junk), 4 clock cycles

fmul st,st(1) ; st(0) = (deg/10) * (pi/180)

ret
Deg2Rad endp ; result in st(0); pi/180 is left in st(1)
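For comparison, the same arithmetic can be written as a short C sketch. The function name deg_tenths_to_rad is hypothetical; the formula (deg / 10) * (pi / 180) is the one in the procedure above.

```c
#include <assert.h>

/* Convert an angle in tenths of a degree (0..3600) to radians,
   mirroring (deg / 10) * (pi / 180) from Deg2Rad. */
float deg_tenths_to_rad(int deg_tenths)
{
    const float pi = 3.14159265358979323846f;

    assert(deg_tenths >= 0 && deg_tenths <= 3600);
    return ((float)deg_tenths / 10.0f) * (pi / 180.0f);
}
```

For example, deg_tenths_to_rad(1800) is 180 degrees, so it should come out very close to pi.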






Phase1 proc uses esi ; sin and cos tables need 28.8 KB, tan 14.4 KB more
LOCAL deg,sin,cos,tan:dword

invoke LocalAlloc,LMEM_DISCARDABLE,28800+14400 + 16*10000
invoke LocalLock,eax
mov SCTbl,eax

mov esi,eax ; 1 Clock cycle
xor ecx,ecx ; 1 Clock cycle
mov deg,ecx ; 1 Clock cycle

mov edx,3600 ; 1 Clock cycle
shl edx,2 ; 2 Clock cycle


@@:
finit ; 17 Clock cycle
pushad
invoke Deg2Rad,deg ; 104 Clock cycle
popad
fsincos ; 365 Clock cycle

fstp sin ; 8 Clock cycle
fstp cos ; 8 Clock cycle

mov eax,sin ; 1 Clock cycle
mov dword ptr[esi],eax ; 1 Clock cycle
mov eax,cos ; 1 Clock cycle
mov dword ptr[esi+edx],eax ; 1 Clock cycle

add deg,1 ; 3 Clock cycle
add esi,4 ; 1 Clock cycle
add ecx,4 ; 1 Clock cycle
cmp deg,3600 ; 2 Clock cycle
jl @b ; 3 Clock cycle
; 513 Clock cycle Each loop
; 1846800 Clock cycle total loop
; 1846806 Clock cycle
mov esi,SCTbl
add esi,28800
;Tan
mov edx,3600 ; 1 Clock cycle
shl edx,2 ; 2 Clock cycle

xor ecx,ecx
mov deg,ecx

@@:
finit ; 17 Clock cycle
pushad
invoke Deg2Rad,deg ; 104 Clock cycle
popad
fptan ; 273 Clock cycle
fstp tan

push tan
pop [esi]
add deg,1 ; 3 Clock cycle
add esi,4 ; 1 Clock cycle
add ecx,4 ; 1 Clock cycle
cmp deg,3600 ; 2 Clock cycle
jl @b ; 3 Clock cycle

mov esi,SCTbl
add esi,28800+14400
mov Line_Table,esi
mov eax,16*10000
mov nLLimit,eax
mov nLPtr,0

ret
Phase1 endp




Once the code above has run in your project, these are the functions to obtain sin, cos and tan.


GetSin proc uses esi deg:dword

mov esi,SCTbl ; 1 Clock cycle
mov ecx,deg ; 1 Clock cycle
mov eax,dword ptr[esi+ecx*4] ; 1 Clock cycle

ret
GetSin endp

GetCos proc uses esi deg:dword

mov esi,SCTbl ; 1 Clock cycle
add esi,14400 ; 1 Clock cycle
mov ecx,deg ; 1 Clock cycle
mov eax,dword ptr[esi+ecx*4] ; 1 Clock cycle

ret
GetCos endp

GetTan proc uses esi deg:dword

mov esi,SCTbl ; 1 Clock cycle
add esi,28800 ; 1 Clock cycle
mov ecx,deg ; 1 Clock cycle
mov eax,dword ptr[esi+ecx*4] ; 1 Clock cycle

ret
GetTan endp
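The three procedures read from one block of memory split into three sub-tables at byte offsets 0, 14400 and 28800. A C sketch of that layout, with hypothetical names (trig_block stands in for SCTbl) and assuming 4-byte floats, might look like this:

```c
#include <assert.h>
#include <stddef.h>

#define ENTRIES 3600                      /* 0.1-degree steps, as in the post */

/* Sub-table start positions, in elements. With 4-byte floats these are the
   byte offsets 0, 14400 and 28800 used by GetSin, GetCos and GetTan. */
enum { SIN_BASE = 0, COS_BASE = ENTRIES, TAN_BASE = 2 * ENTRIES };

static float trig_block[3 * ENTRIES];     /* hypothetical stand-in for SCTbl */

/* Address of one entry; the angle index is in tenths of a degree. */
static float *entry(int base, int tenths)
{
    assert(tenths >= 0 && tenths < ENTRIES);
    return &trig_block[base + tenths];
}

float get_sin(int tenths) { return *entry(SIN_BASE, tenths); }
float get_cos(int tenths) { return *entry(COS_BASE, tenths); }
float get_tan(int tenths) { return *entry(TAN_BASE, tenths); }

/* Byte offset of a sub-table from the start of the block. */
size_t byte_offset(int base) { return (size_t)base * sizeof(float); }
```

Keeping the offsets in one place makes it harder for the fill code and the lookup code to disagree about where each sub-table starts.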


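This function shows how to compute a position on a circle: delta is the radius and deg is the angle, and it returns delta*sin in eax and delta*cos in edx.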

UMGetPosRound proc uses esi delta:dword,deg:dword ; 138 Clock cycle
LOCAL r_sin,r_cos:dword

; mov esi,SCTbl ; 1 Clock cycle
; mov edx,3600 ; 1 Clock cycle
; shl edx,2 ; 2 Clock cycle
; shl deg,2 ; 4 Clock cycle

; add esi,edx ; 1 Clock cycle
; add esi,deg ; 2 Clock cycle
; fld dword ptr[esi] ; 3 Clock cycle
; mov esi,lpmem ; 1 Clock cycle
; add esi,deg ; 2 Clock cycle
; fld dword ptr[esi] ; 3 Clock cycle

; Until here 20 Clock cycle

invoke GetSin,deg ; 3 Clock cycle
push eax ; 1 Clock cycle
fld dword ptr[esp] ; 3 Clock cycle
pop eax ; 1 Clock cycle
fimul delta ; 24 Clock cycle
fistp r_sin ; 34 Clock cycle

invoke GetCos,deg ; 4 Clock cycle
push eax ; 1 Clock cycle
fld dword ptr[esp] ; 3 Clock cycle
pop eax ; 1 Clock cycle
fimul delta ; 24 Clock cycle
fistp r_cos ; 34 Clock cycle

mov edx,r_cos ; 1 Clock cycle
mov eax,r_sin ; 1 Clock cycle
; 135 Clock cycle

ret
UMGetPosRound endp


Here is a demo showing how to use all the functions.

http://www.geocities.com/realvampire2001/Grafik_Test2_17_2_2k5.zip
Those who had universe knowledges can control the world by a micro processor.
http://www.wix.com/farabio/firstpage

"Etos siperi elegi"

raymond

Your knowledge of clock cycles may be good but your knowledge of FPU instructions leaves a lot to be desired.
Quote
fsincos ; 365 Clock cycle

fstp sin ; 8 Clock cycle
fstp cos ; 8 Clock cycle

You would be filling the sin table with the cos values and the cos table with the sin values.

Quote
fptan ; 273 Clock cycle
fstp tan
You would be filling the tan table entirely with 1's and leaving the tan values on the FPU.

Are you sure you tested all that code?

Have a look at:
http://www.ray.masmcode.com/tutorial/index.html

Raymond

When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

Farabi

Quote from: raymond on February 20, 2005, 05:05:45 AM
...

I have tested all the code and it works, but I don't know about the tan function. Can I upload the source code here?

Mark_Larson

Quote from: Farabi on February 19, 2005, 04:40:59 AM
...

mov esi, sincostable         ; 1 clock cycle
mov ecx,degree              ; 1 clock cycle
mov eax,[esi+ecx*4]       ; 1 clock cycle (please tell me if I'm wrong)


Actually, raymond, his code timing isn't that accurate either. I wasn't going to say anything until you spoke up. Move instructions take 1 cycle on a P3 but 0.5 cycles on a P4. Memory accesses (you have 3), if the data is in the L1 data cache, take 2 cycles. So that adds to the time the instruction takes to execute. So the 3 moves would have an additional delay of 2 cycles if the data is in the L1 cache. The L1 cache delay also applies to P3s. I think the L1 cache delay for AMD is 3 cycles, but it's been a while since I looked at their manual. In addition, the 3rd move can stall the pipeline while waiting for the first 2 moves to complete (read-after-write dependency delay).

  The biggest problem with solutions like this is large tables in a big program. You get a lot of cache thrashing with all the table accesses, so a lot of them end up coming from memory, which is a LOT slower. Some people do a variation on this theme with a smaller table and interpolate between two values.

  You really should do some accurate timing of the code to see how fast it really is. Grab MichaelW's timing macros and time your stuff. Post what processor you are running it on when you post some results.
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
http://www.website.masmforum.com/mark/index.htm

raymond

Quote
I have tested all the code and it works, but I don't know about the tan function.

Are you maintaining that you have checked that the values in your sin table (and/or cos table) are correct? You do seem to confirm that you have never checked the values in your tan table.

Raymond

Farabi

Yes, I never checked the values. What function should I use to check a dword floating-point value?

AeroASM

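First convert it to a REAL8. (Copying the REAL4 into the upper half of the REAL8 and zeroing the lower half will not work, because the two formats have different exponent and mantissa widths; instead, load it with fld and store it back as a REAL8 with fstp.)

Then use FloatToStr (in the MASM32 runtime library).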
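Widening a REAL4 to a REAL8 is a value conversion, not a bit copy, because the two IEEE 754 formats use different exponent and mantissa widths. The C sketch below illustrates the difference; both function names are hypothetical, and it assumes float is a 32-bit and double a 64-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Correct: let the compiler (or the FPU) convert the value. */
double widen_by_value(float f)
{
    return (double)f;
}

/* Incorrect: put the 32 float bits into the upper half of a 64-bit
   pattern, zero the lower half, and reinterpret that as a double. */
double widen_by_bit_copy(float f)
{
    uint32_t bits32;
    uint64_t bits64;
    double   d;

    assert(sizeof(float) == 4 && sizeof(double) == 8);
    memcpy(&bits32, &f, sizeof bits32);
    bits64 = (uint64_t)bits32 << 32;
    memcpy(&d, &bits64, sizeof d);
    return d;
}
```

For 0.5f the value conversion gives 0.5, while the bit copy gives 2^-15 (about 0.0000305), because the float's exponent bits land in the wrong place in the double's wider exponent field.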

raymond

Quote from: Farabi on February 22, 2005, 12:17:54 PM
Yes, I never checked the values. What function should I use to check a dword floating-point value?

There are numerous ways you could do it. One of them is to run your app in a debugger and look at what values are being stored and check if those are what you would expect. Another way is to write a small app taking some of the stored values, such as sin(10 deg), and check it against the value you would get from a calculator.

Raymond

Farabi

Thanks, raymond. Your answer was very helpful. I have checked the tan values, and I now take the result from st(1). Now I know that fptan puts 1.0 in st(0) and tan(x) in st(1).

The mistake was mine, from my lack of knowledge.