Fibonacci bencmarking proposal

BlackVortex · May 11, 2010, 02:26:44 AM

I'm messing around with different python versions and I'm wondering about asm speed in the following recursion-heavy algorithm :

Code Select

def fib(n):
if n == 0 or n == 1:
    return n
else:
    return fib(n-1) + fib(n-2)

print fib(36)

Of course it's purposefully recursion-heavy, so that it takes some time. So if anyone in interested, bring on the real-deal asm contestant ! (Please provide a binary and print the result and a simple GetTickCount subtraction in the console)

dedndave · May 11, 2010, 03:18:50 AM

you can try this - i dunno if it's correct :P

Code Select

;---------------------------------------------------------------

fib     proc

;Call With: EAX = n
;  Returns: EAX = fib(n)

        cmp     eax,1
        jbe     fib000

        dec     eax
        push    eax
        dec     eax
        call    fib
        push    eax
        mov     eax,[esp+4]
        call    fib
        add     eax,[esp]
        add     esp,8

fib000: ret

fib     endp

;---------------------------------------------------------------

results on a prescott...

Code Select

14930352        result value
883953645       clock cycles
894733766       clock cycles
889448193       clock cycles
884905258       clock cycles
886262680       clock cycles

EDIT - according to this applet, it is correct

http://csc.colstate.edu/summers/notes/cs463/fibo.html

raymond · May 11, 2010, 04:05:50 AM

Fibonacci numbers up to F(47) would not exceed unsigned 32-bit numbers. For the smaller F(x), simple addition is the fastest way to arrive at the required answer.

You can go up to F(92) with signed 64-bit numbers. Although additions may (or may not) be the fastest for the lower range exceeding a DWORD, there is a closed formula which may be faster for the higher range. However, it requires the use of the FPU.

If you are looking to compute Fibonacci numbers larger than F(92), they would exceed signed 64-bit numbers and you would need somewhat of a bignum library depending on how high you want to go.

Which range are you interested in???

dedndave · May 11, 2010, 04:10:35 AM

hi Ray - good to see ya

i don't think he was so much interested in a fib function, as he was in a time-waster routine for speed comparisons
i know the call overhead of recursion is a waste of time in this application

raymond · May 11, 2010, 04:25:21 AM

The easy loop for the lower range of Fibonacci numbers, lower than F(48), is as follows:

Code Select

fibonacci proc n:DWORD
; returns with answer in EAX
   .if  n == 0
       xor  eax,eax
       ret
   .endif 
   mov  eax,1
   .if  n < 3
       ret
   .endif
   mov  edx,eax
   mov  ecx,2
@@:
   add  edx,eax
   xchg eax,edx
   inc  ecx
   cmp  ecx,n
   jb   @B
   ret
fibonacci endp

I'll leave the timings to others.

BlackVortex · May 11, 2010, 04:33:50 AM

Raymond, I'm just playing around, trying to get a feel for speed. It's intended to burn some cpu time :toothy

I copy/pasted dave's procedure and stripped the timings stuff, since I'm not interested in clock count, but in milliseconds, so I can compare to python. Here is my GoAsm adaptation :

Code Select


#dynamiclinkfile kernel32.dll user32.dll
#include windows.h

DATA SECTION

sz_format_1        db "Result is : %lu",13,10,"Time elapsed in milliseconds : %lu",13,10,0
h_console_out      dd 0
h_console_in       dd 0
buffer             db 128 dup 0
junk               db 0

CODE SECTION
START:

invoke GetStdHandle, STD_OUTPUT_HANDLE
mov [h_console_out], eax
invoke GetStdHandle, STD_INPUT_HANDLE
mov [h_console_in], eax

invoke GetTickCount
push eax

mov eax, 36
call @fib

push eax
invoke GetTickCount
pop ebx
pop edx
sub eax,edx
invoke wsprintf, addr buffer, addr sz_format_1, ebx, eax
invoke WriteConsole, [h_console_out], addr buffer , eax , addr junk, NULL

invoke ExitProcess, NULL




@fib:
;Call With: EAX = n
;  Returns: EAX = fib(n)
; Thanks Dave, I like this

        cmp     eax,1
        jbe > .return

        dec     eax
        push    eax
        dec     eax
        call    @fib
        push    eax
        mov     eax,[esp+4]
        call    @fib
        add     eax,[esp]
        add     esp,8

.return: ret

On my slightly overclocked e8400 it takes about 240ms, psyco-optimized python script does about 600ms, not too shabby. Normal python does 9 seconds :tdown

dedndave · May 11, 2010, 04:38:42 AM

on my machine, i got 297 mS

prescott @ 3 GHz, XP Pro SP2

no python binaries to compare it with :P

raymond · May 11, 2010, 04:45:42 AM

The GetTickCount is not accurate enough to measure the timing of the procedure I suggested because I'm quite sure it would be much less than 1 millisec.

Quotepsyco-optimized python script does about 600ms

There's no way you can expect any decent speed from scripted languages. That's like comparing snail-mail to e-mail. :boohoo:

BlackVortex · May 11, 2010, 04:58:59 AM

Quote from: raymond on May 11, 2010, 04:45:42 AM
The GetTickCount is not accurate enough to measure the timing of the procedure I suggested because I'm quite sure it would be much less than 1 millisec.

Quotepsyco-optimized python script does about 600ms

There's no way you can expect any decent speed from scripted languages. That's like comparing snail-mail to e-mail. :boohoo:

That's why I used my own buffed up snail-algo ! :toothy

Also, I'm surprised by psyco's optimization, 3*asm time is very very good !

I forgot to use Dave's core-locking. I will try now.

BlackVortex · May 11, 2010, 05:05:04 AM

Interesting, setting the core affinity to 1 actually makes a difference, a miniscule drop in speed, although the app is single-threaded.

Cool.

EDIT: I still don't get it.

dedndave · May 11, 2010, 11:15:19 AM

i only do that to help RDTSC will work properly

sinsi · May 11, 2010, 11:23:44 AM

Each core has its own TSC, so when windows starts each core they are out of step with each other.
Even if it's one thread it could run on any core that is free. If you do a cpuid you can see the core's APIC ID.

News:

Fibonacci bencmarking proposal