The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: wizzra on November 13, 2006, 10:16:55 PM

Title: Couting Up - not so fast
Post by: wizzra on November 13, 2006, 10:16:55 PM
Hey guys,

I played - and got stuck coding an algorithm, that is, coding it can be easy. But optimizing it could be a bit problematic.

the Algorithm:
array of 5 bytes (for example, i need upto 11 bytes+-):
array db 0, 0, 0, 0, 0

1. Count up the first byte up to FFh, once over FFh, reset to 0 and add 1 to the second byte.
2. goto 1. than count the second byte upto FFh, once over FFh, reset to 0 and add 1 to the third byte.
and so on..

example:

loop1:
00 00 00 00 00
01 00 00 00 00
02 00 00 00 00
....
FF 00 00 00 00

loop2:
00 01 00 00 00
01 01 00 00 00
02 01 00 00 00
....
FF 01 00 00 00

loop 3:
00 02 00 00 00
01 02 00 00 00
02 02 00 00 00
....
FF 02 00 00 00

loop 4:
00 FF 00 00 00
01 FF 00 00 00
02 FF 00 00 00
....
FF FF 00 00 00

loop 5:
00 00 01 00 00
01 00 01 00 00
02 00 01 00 00
....
FF 00 01 00 00


in asm (simpe code) could be like this:

.data
arr db 0,0,0,0,0,0,0

.code
xor eax,eax
lea edi,arr

begin:
cmp al,0ffh
    mov byte ptr [edi],al
    jnz _end
    xor eax,eax
    dec eax
    add byte ptr [edi+1],1
    cmp byte ptr [edi+1],0ffh
    jnz _end
    add byte ptr [edi+2],1
    cmp byte ptr [edi+2],0ffh
    jnz _end
    add byte ptr [edi+3],1
    cmp byte ptr [edi+3],0ffh
    jnz _end
    add byte ptr [edi+4],1
    cmp byte ptr [edi+4],0ffh
    jnz _end
    jmp _exit
_end:
    inc eax
    jmp begin
_exit:


its slow as hell, i know!
i tried making it more registers-wise, but managed to count upto FFFF and way pass it seems to be more trouble:


.code

test2:
    xor edx,edx
    xor eax,eax
    xor ecx,ecx
    mov ecx,8
    mov ebx,01000000h
loopy:
    add edx,ebx
    cmp edx,0ffff0000h
   
jz sec_loop
    cmp edx,0ffffffffh
    jz _exit
    mov eax,edx
    and eax, 0ff000000h
    ror eax,24
    cmp eax,0ffh
    jz fix_add1
    jmp loopy
fix_add1:
    and edx,00ffffffh
    ror ebx,cl
    add edx,ebx
    rol
ebx,cl
    jmp loopy
sec_loop:
    and edx,0000ffffh
    mov ecx,16
    jmp loopy


hope you guys know about this, and if a fast algo already exists i would be glad to use it, if not, would be nice if we can find a good algorithm for it.

Best regards
Wizzra.
Title: Re: Couting Up - not so fast
Post by: ecube on November 13, 2006, 10:33:18 PM
you can do something like this for parsing strings in arrays


.586
.MODEL FLAT,STDCALL
OPTION CASEMAP:NONE


include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
include \masm32\include\user32.inc

includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\user32.lib

        mNextListEntry MACRO ML
          cld
          xor     eax, eax
          or      ecx, -1
          repnz scasb
          cmp     byte ptr[edi], 0
          jnz     ML
        ENDM
.data
mystring db "beer",0
            db "is",0
            db "good",0,0
.code
start:

mov     edi, offset mystring
@loopstr:

invoke MessageBox,0,edi,edi,MB_OK
mNextListEntry @loopstr
invoke ExitProcess,0
end start


Also is Donkeys string libray he has great array handling procedures you can add and remove strings to easy
Title: Re: Couting Up - not so fast
Post by: drizz on November 13, 2006, 11:02:13 PM
Hi Ben,

use add with carry, treat the array like a big number

xor eax,eax;; array == ecx::ebx::eax
xor ebx,ebx
xor ecx,ecx
xor edx,edx
mov edi,0FF000000h; [ 11 ] bytes
@@:
add eax,1
adc ebx,edx
adc ecx,edx
test ecx,edi
jz @B
Title: Re: Couting Up - not so fast
Post by: stanhebben on November 13, 2006, 11:02:58 PM
First, don't worry about your code - it's relatively fast.

You could use the following code:


mov esi, 0
@@:
add byte ptr [edi+esi], 1
jnc @F
add esi, 1
jmp @B
@@:


Just curious, what do you want to use it for? If you have 11 bytes, you can go until 2^88. If one increment takes 1 cycle, then my computer needs 5.10^12 years to get there.

If you need a large counter, you could perform 2 32-bit increments, for example:

add dword ptr [edi], 1
adc dword ptr [edi+4], 0


Edit: and of course, you could also use dwords instead of bytes, as drizz said above
Title: Re: Couting Up - not so fast
Post by: daydreamer on November 14, 2006, 09:26:04 AM
I had such code once, about the same as Stan's code, but it was ascii numbers counting up on an old 6502, perfect for scores, for example if you got 300 points, count up 3 times, much faster than a integer->ascii conversion each time and only the screen limited it to how many digits
Title: Re: Couting Up - not so fast
Post by: wizzra on November 14, 2006, 10:32:01 PM
Just curious, what do you want to use it for?
i was hoping to try some brute-force technique in my engine. but i guess there is no hope since even register wise it will take alot to reach to full 11bytes with FF using this algorithm.

the adc trick is also nice thnx drizz :) i'll see if i can manage to find other ideas from it.
Title: Re: Couting Up - not so fast
Post by: dsouza123 on November 14, 2006, 11:55:49 PM

.686
.model flat,stdcall
option casemap:none

include    \masm32\include\windows.inc
include    \masm32\include\kernel32.inc
include    \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

.data
align 16
   arr      db 16 dup (0)
   szMsg    db "Done",0
   szCap    db "Counter",0

.code
start:
;--------------------------------------------------------
   mov esi, offset arr           ; version using registers
   xor eax, eax
@@:
   add dword ptr [esi+0], 1
   jnc @B
   adc dword ptr [esi+4], eax
   jnc @B
   adc dword ptr [esi+8], eax
   jnc @B

;------------------- Alternative just using memory ---------------------
@@:                                 
   stc                           ; no register version
   adc dword ptr [arr+0], 0
   jnc @B
   adc dword ptr [arr+4], 0
   jnc @B
   adc dword ptr [arr+8], 0
   jnc @B

   invoke MessageBox,NULL,ADDR szMsg,ADDR szCap,MB_OK
   invoke ExitProcess,0
end start


Updating just the first 4 bytes from 0 to 2^32 - 1 takes 17 seconds on an Athlon 1171 Mhz.
The the first five instructions of the register version fit within a 16 byte paragraph.
Title: Re: Couting Up - not so fast
Post by: wizzra on November 15, 2006, 08:18:58 PM
Thanks guys for your help,
I guess there is no much escape from this solution. It will take alllot of and alllot of years until a cpu will exists which can compute such count in split seconds :) (at least as a single cpu that is - not talking about cpu farm)
such simple task tsk tsk :)

best i could some up from your posts.

eax:edx 0FFFFFFFFFFFFFFFFh range
   mov esi, offset arr           ; version using registers
   xor eax, eax
   xor edx,edx
first:
   adc eax,1
   jnc first
@@:
   adc edx,eax
   jnc first
   jmp _exit


thanks guys
Title: Re: Couting Up - not so fast
Post by: wizzra on November 15, 2006, 08:21:03 PM
btw, the above code of mine will count regulary, but u can always swap the order of the bytes to get the orginal idea.
Title: Re: Couting Up - not so fast
Post by: dsouza123 on November 16, 2006, 01:19:11 PM
Looking at your code if carry was already set then it goes from 0 to 2 at the start,
using add eax, 1 would solve it, also esi isn't needed, but the dual use of eax
for both the first dword and as a 0 for the register parameter for the adc
instead of using an immediate 0 or another register was insightful.

Here is a pure register version.

  xor eax, eax  ; only register version no memory or immediates
  xor ebx, ebx
  inc ebx
  xor ecx, ecx
  xor edx, edx
@@:
  add eax, ebx
  jnc @B          ; goes back to nearest @@:
  adc edx, ecx
  jnc @B          ; goes back to nearest @@:
  jmp _exit
Title: Re: Couting Up - not so fast
Post by: wizzra on November 16, 2006, 05:11:11 PM
yeah i noticed :) thanks,
though... its still damn slow!!! (relatively slow)
Title: Re: Couting Up - not so fast
Post by: stanhebben on November 28, 2006, 08:45:27 PM
Quote from: wizzra on November 16, 2006, 05:11:11 PM
yeah i noticed :) thanks,
though... its still damn slow!!! (relatively slow)

Maybe change the algorithm to something that doesn't have to count up at all. First develop a quick algo, then a quick implementation of it.
Title: Re: Couting Up - not so fast
Post by: Human on December 07, 2006, 11:48:17 PM
why everybody use jc, less jumps code is faster. for 11 bytes you can do
add eax,1
adc ebx,0
adc edx,0
and ecx is as counter how many times add
if you need just 8 bytes use
paddq mm0,mm1 and mm1 holds 1

dsouza using registers?
i dont know for what he needs that, but for 1 time update speed isnt necessery, for highscore he can change add eax,1 to add eax,points, also with even 1000 runs its faster to move all to regs, do additions and after all write all to memory, not use add [esi+0],1
all is about optimizing, sometimes even add [something+0],1 will be faster if we have complex code and need all registers, so with direct memory write we dont need to save and restore some reg, ofcourse my code is slow, because it cant be fast, every operation stalls pipeline and alus due next opcode is dependent on last, but he can mix it with other code to make it faster. but for just counting idea is useless like whole topic