News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Fibonacci bencmarking proposal

Started by BlackVortex, May 11, 2010, 02:26:44 AM

Previous topic - Next topic

BlackVortex

I'm messing around with different python versions and I'm wondering about asm speed in the following recursion-heavy algorithm :

def fib(n):
if n == 0 or n == 1:
    return n
else:
    return fib(n-1) + fib(n-2)

print fib(36)


Of course it's purposefully recursion-heavy, so that it takes some time. So if anyone in interested, bring on the real-deal asm contestant ! (Please provide a binary and print the result and a simple GetTickCount subtraction in the console)

dedndave

you can try this - i dunno if it's correct   :P
;---------------------------------------------------------------

fib     proc

;Call With: EAX = n
;  Returns: EAX = fib(n)

        cmp     eax,1
        jbe     fib000

        dec     eax
        push    eax
        dec     eax
        call    fib
        push    eax
        mov     eax,[esp+4]
        call    fib
        add     eax,[esp]
        add     esp,8

fib000: ret

fib     endp

;---------------------------------------------------------------

results on a prescott...
14930352        result value
883953645       clock cycles
894733766       clock cycles
889448193       clock cycles
884905258       clock cycles
886262680       clock cycles


EDIT - according to this applet, it is correct

http://csc.colstate.edu/summers/notes/cs463/fibo.html

raymond

Fibonacci numbers up to F(47) would not exceed unsigned 32-bit numbers. For the smaller F(x), simple addition is the fastest way to arrive at the required answer.

You can go up to F(92) with signed 64-bit numbers. Although additions may (or may not) be the fastest for the lower range exceeding a DWORD, there is a closed formula which may be faster for the higher range. However, it requires the use of the FPU.

If you are looking to compute Fibonacci numbers larger than F(92), they would exceed signed 64-bit numbers and you would need somewhat of a bignum library depending on how high you want to go.

Which range are you interested in???
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

dedndave

hi Ray - good to see ya

i don't think he was so much interested in a fib function, as he was in a time-waster routine for speed comparisons
i know the call overhead of recursion is a waste of time in this application

raymond

The easy loop for the lower range of Fibonacci numbers, lower than F(48), is as follows:

fibonacci proc n:DWORD
; returns with answer in EAX
   .if  n == 0
       xor  eax,eax
       ret
   .endif
   mov  eax,1
   .if  n < 3
       ret
   .endif
   mov  edx,eax
   mov  ecx,2
@@:
   add  edx,eax
   xchg eax,edx
   inc  ecx
   cmp  ecx,n
   jb   @B
   ret
fibonacci endp


I'll leave the timings to others.
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

BlackVortex

Raymond, I'm just playing around, trying to get a feel for speed. It's intended to burn some cpu time   :toothy

I copy/pasted dave's procedure and stripped the timings stuff, since I'm not interested in clock count, but in milliseconds, so I can compare to python. Here is my GoAsm adaptation :

#dynamiclinkfile kernel32.dll user32.dll
#include windows.h

DATA SECTION

sz_format_1        db "Result is : %lu",13,10,"Time elapsed in milliseconds : %lu",13,10,0
h_console_out      dd 0
h_console_in       dd 0
buffer             db 128 dup 0
junk               db 0

CODE SECTION
START:

invoke GetStdHandle, STD_OUTPUT_HANDLE
mov [h_console_out], eax
invoke GetStdHandle, STD_INPUT_HANDLE
mov [h_console_in], eax

invoke GetTickCount
push eax

mov eax, 36
call @fib

push eax
invoke GetTickCount
pop ebx
pop edx
sub eax,edx
invoke wsprintf, addr buffer, addr sz_format_1, ebx, eax
invoke WriteConsole, [h_console_out], addr buffer , eax , addr junk, NULL

invoke ExitProcess, NULL




@fib:
;Call With: EAX = n
;  Returns: EAX = fib(n)
; Thanks Dave, I like this

       cmp     eax,1
       jbe > .return

       dec     eax
       push    eax
       dec     eax
       call    @fib
       push    eax
       mov     eax,[esp+4]
       call    @fib
       add     eax,[esp]
       add     esp,8

.return: ret

On my slightly overclocked e8400 it takes about 240ms, psyco-optimized python script does about 600ms, not too shabby. Normal python does 9 seconds  :tdown

dedndave

on my machine, i got 297 mS

prescott @ 3 GHz, XP Pro SP2

no python binaries to compare it with   :P

raymond

The GetTickCount is not accurate enough to measure the timing of the procedure I suggested because I'm quite sure it would be much less than 1 millisec.

Quotepsyco-optimized python script does about 600ms

There's no way you can expect any decent speed from scripted languages. That's like comparing snail-mail to e-mail. :boohoo:
When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

BlackVortex

Quote from: raymond on May 11, 2010, 04:45:42 AM
The GetTickCount is not accurate enough to measure the timing of the procedure I suggested because I'm quite sure it would be much less than 1 millisec.

Quotepsyco-optimized python script does about 600ms

There's no way you can expect any decent speed from scripted languages. That's like comparing snail-mail to e-mail. :boohoo:
That's why I used my own buffed up snail-algo !  :toothy

Also, I'm surprised by psyco's optimization, 3*asm time is very very good !

I forgot to use Dave's core-locking. I will try now.

BlackVortex

#9
Interesting, setting the core affinity to 1 actually makes a difference, a miniscule drop in speed, although the app is single-threaded.

Cool.

EDIT: I still don't get it.

dedndave

i only do that to help RDTSC will work properly

sinsi

Each core has its own TSC, so when windows starts each core they are out of step with each other.
Even if it's one thread it could run on any core that is free. If you do a cpuid you can see the core's APIC ID.
Light travels faster than sound, that's why some people seem bright until you hear them.