FSINCOS is a useful instruction: it computes the sine and the cosine of an angle (in radians) at the same time. But if you use it too often, it can make your program run slowly. The only way to make it fast is to remember the results in memory, in a lookup table.
So code like this:
fld angle ; load the angle, in radians, into ST(0)
fsincos ; 365 Clock cycle
can be replaced by code like this:
mov esi, sincostable ; 1 clock cycle
mov ecx,degree ; 1 clock cycle
mov eax,[esi+ecx*4] ; 1 clock cycle (please tell me if I'm wrong)
The speed is very different, but the result is the same (for angles that fall exactly on a table entry, at REAL4 precision).
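The trade-off can be sketched in Python (used here only because it is easy to run; STEPS, SIN_TABLE, to_real4 and table_sin are hypothetical names, not part of the original code). It assumes the same indexing as the MASM tables in this post: tenths of a degree, 0 to 3599, one single-precision (REAL4) value per entry. All the sin() calls happen once at start-up; afterwards a lookup is a single index operation.

```python
import math
import struct

STEPS = 3600  # one entry per tenth of a degree (index 0..3599)


def to_real4(x: float) -> float:
    """Round x to IEEE single precision, i.e. what a dword (REAL4) can hold."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Slow part, done once at start-up: one sin() call per entry.
SIN_TABLE = [to_real4(math.sin(math.radians(i / 10))) for i in range(STEPS)]


def table_sin(tenths: int) -> float:
    """Fast part: sin of an angle given in tenths of a degree, by indexing."""
    return SIN_TABLE[tenths]


print(table_sin(300))  # sin(30.0 degrees)
```

The result only matches a direct computation for angles that land on a table entry, and only to single precision.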
This is the support function that converts degrees to radians. Deg is given in tenths of a degree, in the range 0-3600 (0.0 to 360.0 degrees).
Deg2Rad proc deg:dword ; 104 Clock cycle
fldpi ; Load Pi 8 Clock cycle
push 180 ; Push 180 degree 1 Clock cycle
fidiv dword ptr[esp] ; Div it by 180 86 Clock cycle
pop eax ; eax is junk 4 Clock cycle
fild deg ; Rot (tenths of a degree)
push 10 ; 1 Clock cycle
fidiv dword ptr[esp] ; Rot/10, we get degrees as a float here
pop eax ; eax is junk 4 Clock cycle
fmulp st(1),st ; Mul and pop, a = (rot/10) * (Pi/180)
ret
Deg2Rad endp ; Result in ST(0), nothing else left on the FPU stack
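To check the arithmetic of Deg2Rad outside the assembler, here is the same conversion as a small Python sketch (deg2rad is a hypothetical name). It follows the routine's steps: divide the input by 10 to get degrees, then multiply by pi/180.

```python
import math


def deg2rad(deg: int) -> float:
    """Convert an angle in tenths of a degree (0..3600) to radians.

    Same formula as Deg2Rad: (deg / 10) * (pi / 180).
    """
    return (deg / 10) * (math.pi / 180)


print(deg2rad(1800))  # 180.0 degrees, should be close to pi
```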
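Phase1 proc uses esi ; 28.8 KB for sin+cos, 14.4 KB for tan, 160000 bytes for Line_Table
LOCAL deg,sin,cos,tan:dword
invoke LocalAlloc,LMEM_FIXED,28800+14400 + 16*10000 ; the tables must never be discarded
invoke LocalLock,eax ; for a fixed block this returns the same pointer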
mov SCTbl,eax
mov esi,eax ; 1 Clock cycle
xor ecx,ecx ; 1 Clock cycle
mov deg,ecx ; 1 Clock cycle
mov edx,3600 ; 1 Clock cycle
shl edx,2 ; 2 Clock cycle
@@:
finit ; 17 Clock cycle
pushad
invoke Deg2Rad,deg ; 104 Clock cycle
popad
fsincos ; 365 Clock cycle
fstp sin ; 8 Clock cycle
fstp cos ; 8 Clock cycle
mov eax,sin ; 1 Clock cycle
mov dword ptr[esi],eax ; 1 Clock cycle
mov eax,cos ; 1 Clock cycle
mov dword ptr[esi+edx],eax ; 1 Clock cycle
add deg,1 ; 3 Clock cycle
add esi,4 ; 1 Clock cycle
add ecx,4 ; 1 Clock cycle
cmp deg,3600 ; 2 Clock cycle
jl @b ; 3 Clock cycle
; 513 Clock cycle Each loop
; 1846800 Clock cycle total loop
; 1846806 Clock cycle in total, including the setup
mov esi,SCTbl
add esi,28800
;Tan
mov edx,3600 ; 1 Clock cycle
shl edx,2 ; 2 Clock cycle
xor ecx,ecx
mov deg,ecx
@@:
finit ; 17 Clock cycle
pushad
invoke Deg2Rad,deg ; 104 Clock cycle
popad
fptan ; 273 Clock cycle
fstp tan
push tan
pop [esi]
add deg,1 ; 3 Clock cycle
add esi,4 ; 1 Clock cycle
add ecx,4 ; 1 Clock cycle
cmp deg,3600 ; 2 Clock cycle
jl @b ; 3 Clock cycle
mov esi,SCTbl
add esi,28800+14400
mov Line_Table,esi
mov eax,16*10000
mov nLLimit,eax
mov nLPtr,0
ret
Phase1 endp
Once the code above has run in your project, these are the functions to obtain sin, cos and tan. Each returns the raw REAL4 value in EAX.
GetSin proc uses esi deg:dword
mov esi,SCTbl ; 1 Clock cycle
mov ecx,deg ; 1 Clock cycle
mov eax,dword ptr[esi+ecx*4] ; 1 Clock cycle
ret
GetSin endp
GetCos proc uses esi deg:dword
mov esi,SCTbl ; 1 Clock cycle
add esi,14400 ; 1 Clock cycle
mov ecx,deg ; 1 Clock cycle
mov eax,dword ptr[esi+ecx*4] ; 1 Clock cycle
ret
GetCos endp
GetTan proc uses esi deg:dword
mov esi,SCTbl ; 1 Clock cycle
add esi,28800 ; 1 Clock cycle
mov ecx,deg ; 1 Clock cycle
mov eax,dword ptr[esi+ecx*4] ; 1 Clock cycle
ret
GetTan endp
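The single-buffer layout used by Phase1 and the three getters can be mirrored in Python with the standard struct module. This is a sketch under the assumption that the offsets are the ones used above (sin at 0, cos at 14400, tan at 28800 bytes); build_tables, get_sin, get_cos and get_tan are hypothetical Python names, and the extra 160000 bytes for Line_Table are left out. The sin and cos entries are stored in their proper order here.

```python
import math
import struct

STEPS = 3600             # entries per table, tenths of a degree
SIN_OFS = 0              # byte offset of the sin table (GetSin)
COS_OFS = STEPS * 4      # 14400, byte offset used by GetCos
TAN_OFS = 2 * STEPS * 4  # 28800, byte offset used by GetTan


def build_tables() -> bytearray:
    """Fill one buffer with REAL4 sin, cos and tan entries, Phase1-style."""
    buf = bytearray(3 * STEPS * 4)
    for i in range(STEPS):
        rad = math.radians(i / 10)
        struct.pack_into("<f", buf, SIN_OFS + i * 4, math.sin(rad))
        struct.pack_into("<f", buf, COS_OFS + i * 4, math.cos(rad))
        # tan is undefined at 90 and 270 degrees; those entries just get a huge value
        struct.pack_into("<f", buf, TAN_OFS + i * 4, math.tan(rad))
    return buf


def _load(buf: bytearray, base: int, deg: int) -> float:
    """Like mov eax,dword ptr[esi+ecx*4], then reading the dword as a REAL4."""
    return struct.unpack_from("<f", buf, base + deg * 4)[0]


def get_sin(buf: bytearray, deg: int) -> float:
    return _load(buf, SIN_OFS, deg)


def get_cos(buf: bytearray, deg: int) -> float:
    return _load(buf, COS_OFS, deg)


def get_tan(buf: bytearray, deg: int) -> float:
    return _load(buf, TAN_OFS, deg)


tables = build_tables()
print(get_sin(tables, 300), get_cos(tables, 300), get_tan(tables, 450))
```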
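This function shows how to compute a position on a circle: given a radius (delta) and an angle (deg, in tenths of a degree), it returns delta*sin in EAX and delta*cos in EDX, rounded to integers.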
UMGetPosRound proc uses esi delta:dword,deg:dword ; 138 Clock cycle
LOCAL r_sin,r_cos:dword
; mov esi,SCTbl ; 1 Clock cycle
; mov edx,3600 ; 1 Clock cycle
; shl edx,2 ; 2 Clock cycle
; shl deg,2 ; 4 Clock cycle
; add esi,edx ; 1 Clock cycle
; add esi,deg ; 2 Clock cycle
; fld dword ptr[esi] ; 3 Clock cycle
; mov esi,lpmem ; 1 Clock cycle
; add esi,deg ; 2 Clock cycle
; fld dword ptr[esi] ; 3 Clock cycle
; Until here 20 Clock cycle
invoke GetSin,deg ; 3 Clock cycle
push eax ; 1 Clock cycle
fld dword ptr[esp] ; 3 Clock cycle
pop eax ; 1 Clock cycle
fimul delta ; 24 Clock cycle
fistp r_sin ; 34 Clock cycle
invoke GetCos,deg ; 4 Clock cycle
push eax ; 1 Clock cycle
fld dword ptr[esp] ; 3 Clock cycle
pop eax ; 1 Clock cycle
fimul delta ; 24 Clock cycle
fistp r_cos ; 34 Clock cycle
mov edx,r_cos ; 1 Clock cycle
mov eax,r_sin ; 1 Clock cycle
; 135 Clock cycle
ret
UMGetPosRound endp
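The same calculation as UMGetPosRound, sketched in Python (get_pos_round and the octagon usage are hypothetical; sin and cos are computed directly and rounded to REAL4 instead of being read from the tables). Python's round() rounds halves to even, which is what FISTP does under the FPU's default round-to-nearest mode.

```python
import math
import struct
from typing import List, Tuple


def to_real4(x: float) -> float:
    """Round x to single precision, like a REAL4 table entry."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def get_pos_round(delta: int, deg: int) -> Tuple[int, int]:
    """Return (round(delta*sin), round(delta*cos)) for deg in tenths of a degree.

    These are the values UMGetPosRound leaves in EAX and EDX.
    """
    rad = math.radians(deg / 10)
    s = to_real4(math.sin(rad))  # stands in for GetSin
    c = to_real4(math.cos(rad))  # stands in for GetCos
    return round(delta * s), round(delta * c)


# Usage: the 8 corners of an octagon with radius 100 around (cx, cy)
cx, cy = 320, 240
points: List[Tuple[int, int]] = []
for d in range(0, 3600, 450):
    r_sin, r_cos = get_pos_round(100, d)
    points.append((cx + r_cos, cy + r_sin))
print(points)
```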
Here is a demo showing how to use all of these functions:
http://www.geocities.com/realvampire2001/Grafik_Test2_17_2_2k5.zip
Your knowledge of clock cycles may be good but your knowledge of FPU instructions leaves a lot to be desired.
Quote
fsincos ; 365 Clock cycle
fstp sin ; 8 Clock cycle
fstp cos ; 8 Clock cycle
You would be filling the sin table with the cos values and the cos table with the sin values.
Quote
fptan ; 273 Clock cycle
fstp tan
You would be filling the tan table entirely with 1's and leaving the tan values on the FPU.
Are you sure you tested all that code?
Have a look at:
http://www.ray.masmcode.com/tutorial/index.html
Raymond
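Raymond's two points follow from the order in which FSINCOS and FPTAN leave their results on the FPU stack. A toy Python model of the x87 register stack makes this visible (X87Stack, phase1_sincos, phase1_tan and fixed_tan are hypothetical names; the model ignores precision, stack overflow and range checks). FSINCOS leaves cos in ST(0) and sin in ST(1); FPTAN leaves 1.0 in ST(0) and tan in ST(1).

```python
import math


class X87Stack:
    """Toy model of the x87 register stack; self.st[0] plays the role of ST(0)."""

    def __init__(self):
        self.st = []

    def fld(self, x):
        self.st.insert(0, x)   # push

    def fstp(self):
        return self.st.pop(0)  # store ST(0) and pop

    def fsincos(self):
        x = self.st.pop(0)
        self.fld(math.sin(x))  # ST(0) replaced by sin ...
        self.fld(math.cos(x))  # ... then cos pushed: ST(0)=cos, ST(1)=sin

    def fptan(self):
        x = self.st.pop(0)
        self.fld(math.tan(x))  # ST(0) replaced by tan ...
        self.fld(1.0)          # ... then 1.0 pushed: ST(0)=1.0, ST(1)=tan


def phase1_sincos(angle):
    """fld / fsincos / fstp sin / fstp cos as in Phase1: returns (sin var, cos var)."""
    fpu = X87Stack()
    fpu.fld(angle)
    fpu.fsincos()
    sin_var = fpu.fstp()
    cos_var = fpu.fstp()
    return sin_var, cos_var


def phase1_tan(angle):
    """fld / fptan / fstp tan as in Phase1: returns (tan var, leftover stack)."""
    fpu = X87Stack()
    fpu.fld(angle)
    fpu.fptan()
    tan_var = fpu.fstp()
    return tan_var, list(fpu.st)


def fixed_tan(angle):
    """fld / fptan / fstp st(0) / fstp tan: drop the 1.0 first, then store tan."""
    fpu = X87Stack()
    fpu.fld(angle)
    fpu.fptan()
    fpu.fstp()                 # fstp st(0): discard the 1.0
    tan_var = fpu.fstp()
    return tan_var, list(fpu.st)


x = math.radians(30)
print(phase1_sincos(x), phase1_tan(x), fixed_tan(x))
```

In the model, the Phase1 "sin" variable receives cos and vice versa, and the "tan" variable receives 1.0 while tan stays on the stack. Storing in the opposite order (fstp cos, then fstp sin), and discarding the 1.0 before fstp tan, gives the intended values.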
Quote from: raymond on February 20, 2005, 05:05:45 AM
Are you sure you tested all that code?
I have tested all the code and it works, but I don't know about the tan function. Can I upload the source code here?
Quote from: Farabi on February 19, 2005, 04:40:59 AM
...
mov esi, sincostable ; 1 clock cycle
mov ecx,degree ; 1 clock cycle
mov eax,[esi+ecx*4] ; 1 clock cycle (please tell me if I'm wrong)
Actually, Raymond, his code timings aren't that accurate either. I wasn't going to say anything until you spoke up. Move instructions take 1 cycle on a P3 but 0.5 cycles on a P4. Memory accesses (you have 3) take 2 cycles if the data is in the L1 data cache, and that adds to the time the instruction takes to execute. So the 3 moves would have an additional delay of 2 cycles even when the data is in the L1 cache. The L1 cache latency also applies to P3s. I think the L1 cache latency on AMD is 3 cycles, but it's been a while since I looked at their manual. In addition, the 3rd move can stall the pipeline while it waits for the first 2 moves to complete (a read-after-write dependency).
The biggest problem with solutions like this is large tables in a big program: you get a lot of cache thrashing from all the table accesses, so many of them end up coming from main memory, which is a LOT slower. Some people use a variation on this theme: a smaller table, interpolating between two adjacent values.
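The smaller-table variation can be sketched in Python. Assumptions not taken from the thread: a 361-entry table at whole-degree steps (1444 bytes instead of 14400) and plain linear interpolation for the tenths in between; SMALL_SIN and interp_sin are hypothetical names. Each lookup costs a few extra arithmetic operations compared with a direct table read, in exchange for a table ten times smaller.

```python
import math
import struct


def to_real4(x: float) -> float:
    """Round x to single precision, like a REAL4 table entry."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 0..360 degrees inclusive, so the last interval also has an upper neighbour
SMALL_SIN = [to_real4(math.sin(math.radians(d))) for d in range(361)]


def interp_sin(tenths: int) -> float:
    """sin of an angle in tenths of a degree, by linear interpolation."""
    tenths %= 3600
    i, frac = divmod(tenths, 10)  # whole degree and remaining tenths
    lo, hi = SMALL_SIN[i], SMALL_SIN[i + 1]
    return lo + (hi - lo) * frac / 10


worst = max(abs(interp_sin(t) - math.sin(math.radians(t / 10))) for t in range(3600))
print("worst error:", worst)
```

With a step of h = pi/180 radians, the error of linear interpolation on sin is bounded by h^2/8, about 3.8e-5.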
You really should do some accurate timing of the code to see how fast it really is. Grab MichaelW's timing macros and time your code. Say which processor you are running it on when you post your results.
Quote
I have tested all the code and it works, but I don't know about the tan function.
Are you maintaining that you have checked that the values in your sin table (and/or cos table) are correct? You do seem to confirm that you have never checked the values in your tan table.
Raymond
Yes, I never checked the values. Which function should I use to check a dword (REAL4) floating-point value?
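First convert it to a REAL8. The easiest way is to let the FPU do it: load the REAL4 with FLD and store it as a REAL8 with FSTP. (Copying the REAL4 into the upper half of the REAL8 and zeroing the lower half does not work, because the two formats have different exponent widths and biases.)
Then use FloatToStr (in the MASM32 runtime library) to turn the REAL8 into a string.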
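A quick way to test the bit-copy idea is a short Python sketch using the standard struct module (real4_bits, naive_real8 and proper_real8 are hypothetical helper names). It places the 4 bytes of a REAL4 in the upper half of a REAL8 and compares that with converting the value, which is what FLD REAL4 followed by FSTP REAL8 does.

```python
import struct


def real4_bits(x: float) -> bytes:
    """The 4 bytes of x stored as a REAL4 (little-endian, as on x86)."""
    return struct.pack("<f", x)


def naive_real8(x: float) -> float:
    """REAL4 bytes in the upper half of a REAL8, lower half zeroed.

    REAL4 has an 8-bit exponent (bias 127), REAL8 an 11-bit exponent
    (bias 1023), so the fields do not line up and the value changes.
    """
    return struct.unpack("<d", b"\x00" * 4 + real4_bits(x))[0]


def proper_real8(x: float) -> float:
    """Convert the value, not the bits (Python floats are already REAL8)."""
    return struct.unpack("<f", real4_bits(x))[0]


print(naive_real8(1.0), proper_real8(1.0))
```

For 1.0 the bit copy produces 2^-7 (0.0078125) instead of 1.0.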
Quote from: Farabi on February 22, 2005, 12:17:54 PM
Yes, I never checked the values. Which function should I use to check a dword (REAL4) floating-point value?
There are numerous ways you could do it. One of them is to run your app in a debugger and look at what values are being stored and check if those are what you would expect. Another way is to write a small app taking some of the stored values, such as sin(10 deg), and check it against the value you would get from a calculator.
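The second approach (compare stored values with known results) can be automated. Below is a Python sketch, assuming the table has been dumped as a list of REAL4 values indexed in tenths of a degree; to_real4 and check_table are hypothetical names, and the "swapped" table deliberately holds cos values, to show that a check like this would catch the sin/cos mix-up discussed above.

```python
import math
import struct


def to_real4(x: float) -> float:
    """Round x to single precision, like a REAL4 (dword) table entry."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def check_table(table, ref, tol=1e-6):
    """Return the indices (tenths of a degree) whose entry disagrees with ref()."""
    bad = []
    for i, value in enumerate(table):
        expected = ref(math.radians(i / 10))
        if abs(value - expected) > tol:
            bad.append(i)
    return bad


good = [to_real4(math.sin(math.radians(i / 10))) for i in range(3600)]
swapped = [to_real4(math.cos(math.radians(i / 10))) for i in range(3600)]  # sin/cos mixed up

print("sin(10 deg) from table:", good[100])
print("bad entries in good table:", len(check_table(good, math.sin)))
print("bad entries in swapped table:", len(check_table(swapped, math.sin)))
```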
Raymond
Thanks, Raymond. Your answer was very helpful. I have checked the tan values, and I now take the result from ST(1): I now know that FPTAN puts 1.0 in ST(0) and tan(x) in ST(1).
The mistake was mine alone, from my lack of knowledge.