I'm messing around with different python versions and I'm wondering about asm speed in the following recursion-heavy algorithm :
def fib(n):
if n == 0 or n == 1:
return n
else:
return fib(n-1) + fib(n-2)
print fib(36)
Of course it's purposefully recursion-heavy, so that it takes some time. So if anyone in interested, bring on the real-deal asm contestant ! (Please provide a binary and print the result and a simple GetTickCount subtraction in the console)
you can try this - i dunno if it's correct :P
;---------------------------------------------------------------
fib proc
;Call With: EAX = n
; Returns: EAX = fib(n)
cmp eax,1
jbe fib000
dec eax
push eax
dec eax
call fib
push eax
mov eax,[esp+4]
call fib
add eax,[esp]
add esp,8
fib000: ret
fib endp
;---------------------------------------------------------------
results on a prescott...
14930352 result value
883953645 clock cycles
894733766 clock cycles
889448193 clock cycles
884905258 clock cycles
886262680 clock cycles
EDIT - according to this applet, it is correct
http://csc.colstate.edu/summers/notes/cs463/fibo.html
Fibonacci numbers up to F(47) would not exceed unsigned 32-bit numbers. For the smaller F(x), simple addition is the fastest way to arrive at the required answer.
You can go up to F(92) with signed 64-bit numbers. Although additions may (or may not) be the fastest for the lower range exceeding a DWORD, there is a closed formula which may be faster for the higher range. However, it requires the use of the FPU.
If you are looking to compute Fibonacci numbers larger than F(92), they would exceed signed 64-bit numbers and you would need somewhat of a bignum library depending on how high you want to go.
Which range are you interested in???
hi Ray - good to see ya
i don't think he was so much interested in a fib function, as he was in a time-waster routine for speed comparisons
i know the call overhead of recursion is a waste of time in this application
The easy loop for the lower range of Fibonacci numbers, lower than F(48), is as follows:
fibonacci proc n:DWORD
; returns with answer in EAX
.if n == 0
xor eax,eax
ret
.endif
mov eax,1
.if n < 3
ret
.endif
mov edx,eax
mov ecx,2
@@:
add edx,eax
xchg eax,edx
inc ecx
cmp ecx,n
jb @B
ret
fibonacci endp
I'll leave the timings to others.
Raymond, I'm just playing around, trying to get a feel for speed. It's intended to burn some cpu time :toothy
I copy/pasted dave's procedure and stripped the timings stuff, since I'm not interested in clock count, but in milliseconds, so I can compare to python. Here is my GoAsm adaptation :
#dynamiclinkfile kernel32.dll user32.dll
#include windows.h
DATA SECTION
sz_format_1 db "Result is : %lu",13,10,"Time elapsed in milliseconds : %lu",13,10,0
h_console_out dd 0
h_console_in dd 0
buffer db 128 dup 0
junk db 0
CODE SECTION
START:
invoke GetStdHandle, STD_OUTPUT_HANDLE
mov [h_console_out], eax
invoke GetStdHandle, STD_INPUT_HANDLE
mov [h_console_in], eax
invoke GetTickCount
push eax
mov eax, 36
call @fib
push eax
invoke GetTickCount
pop ebx
pop edx
sub eax,edx
invoke wsprintf, addr buffer, addr sz_format_1, ebx, eax
invoke WriteConsole, [h_console_out], addr buffer , eax , addr junk, NULL
invoke ExitProcess, NULL
@fib:
;Call With: EAX = n
; Returns: EAX = fib(n)
; Thanks Dave, I like this
cmp eax,1
jbe > .return
dec eax
push eax
dec eax
call @fib
push eax
mov eax,[esp+4]
call @fib
add eax,[esp]
add esp,8
.return: ret
On my slightly overclocked e8400 it takes about 240ms, psyco-optimized python script does about 600ms, not too shabby. Normal python does 9 seconds :tdown
on my machine, i got 297 mS
prescott @ 3 GHz, XP Pro SP2
no python binaries to compare it with :P
The GetTickCount is not accurate enough to measure the timing of the procedure I suggested because I'm quite sure it would be much less than 1 millisec.
Quotepsyco-optimized python script does about 600ms
There's no way you can expect any decent speed from scripted languages. That's like comparing snail-mail to e-mail. :boohoo:
Quote from: raymond on May 11, 2010, 04:45:42 AM
The GetTickCount is not accurate enough to measure the timing of the procedure I suggested because I'm quite sure it would be much less than 1 millisec.
Quotepsyco-optimized python script does about 600ms
There's no way you can expect any decent speed from scripted languages. That's like comparing snail-mail to e-mail. :boohoo:
That's why I used my own buffed up snail-algo ! :toothy
Also, I'm surprised by psyco's optimization, 3*asm time is very very good !
I forgot to use Dave's core-locking. I will try now.
Interesting, setting the core affinity to 1 actually makes a difference, a miniscule drop in speed, although the app is single-threaded.
Cool.
EDIT: I still don't get it.
i only do that to help RDTSC will work properly
Each core has its own TSC, so when windows starts each core they are out of step with each other.
Even if it's one thread it could run on any core that is free. If you do a cpuid you can see the core's APIC ID.