News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Couting Up - not so fast

Started by wizzra, November 13, 2006, 10:16:55 PM

Previous topic - Next topic

wizzra

Hey guys,

I played - and got stuck coding an algorithm, that is, coding it can be easy. But optimizing it could be a bit problematic.

the Algorithm:
array of 5 bytes (for example, i need upto 11 bytes+-):
array db 0, 0, 0, 0, 0

1. Count up the first byte up to FFh, once over FFh, reset to 0 and add 1 to the second byte.
2. goto 1. than count the second byte upto FFh, once over FFh, reset to 0 and add 1 to the third byte.
and so on..

example:

loop1:
00 00 00 00 00
01 00 00 00 00
02 00 00 00 00
....
FF 00 00 00 00

loop2:
00 01 00 00 00
01 01 00 00 00
02 01 00 00 00
....
FF 01 00 00 00

loop 3:
00 02 00 00 00
01 02 00 00 00
02 02 00 00 00
....
FF 02 00 00 00

loop 4:
00 FF 00 00 00
01 FF 00 00 00
02 FF 00 00 00
....
FF FF 00 00 00

loop 5:
00 00 01 00 00
01 00 01 00 00
02 00 01 00 00
....
FF 00 01 00 00


in asm (simpe code) could be like this:

.data
arr db 0,0,0,0,0,0,0

.code
xor eax,eax
lea edi,arr

begin:
cmp al,0ffh
    mov byte ptr [edi],al
    jnz _end
    xor eax,eax
    dec eax
    add byte ptr [edi+1],1
    cmp byte ptr [edi+1],0ffh
    jnz _end
    add byte ptr [edi+2],1
    cmp byte ptr [edi+2],0ffh
    jnz _end
    add byte ptr [edi+3],1
    cmp byte ptr [edi+3],0ffh
    jnz _end
    add byte ptr [edi+4],1
    cmp byte ptr [edi+4],0ffh
    jnz _end
    jmp _exit
_end:
    inc eax
    jmp begin
_exit:


its slow as hell, i know!
i tried making it more registers-wise, but managed to count upto FFFF and way pass it seems to be more trouble:


.code

test2:
    xor edx,edx
    xor eax,eax
    xor ecx,ecx
    mov ecx,8
    mov ebx,01000000h
loopy:
    add edx,ebx
    cmp edx,0ffff0000h
   
jz sec_loop
    cmp edx,0ffffffffh
    jz _exit
    mov eax,edx
    and eax, 0ff000000h
    ror eax,24
    cmp eax,0ffh
    jz fix_add1
    jmp loopy
fix_add1:
    and edx,00ffffffh
    ror ebx,cl
    add edx,ebx
    rol
ebx,cl
    jmp loopy
sec_loop:
    and edx,0000ffffh
    mov ecx,16
    jmp loopy


hope you guys know about this, and if a fast algo already exists i would be glad to use it, if not, would be nice if we can find a good algorithm for it.

Best regards
Wizzra.

ecube

you can do something like this for parsing strings in arrays


.586
.MODEL FLAT,STDCALL
OPTION CASEMAP:NONE


include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
include \masm32\include\user32.inc

includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\user32.lib

        mNextListEntry MACRO ML
          cld
          xor     eax, eax
          or      ecx, -1
          repnz scasb
          cmp     byte ptr[edi], 0
          jnz     ML
        ENDM
.data
mystring db "beer",0
            db "is",0
            db "good",0,0
.code
start:

mov     edi, offset mystring
@loopstr:

invoke MessageBox,0,edi,edi,MB_OK
mNextListEntry @loopstr
invoke ExitProcess,0
end start


Also is Donkeys string libray he has great array handling procedures you can add and remove strings to easy

drizz

Hi Ben,

use add with carry, treat the array like a big number

xor eax,eax;; array == ecx::ebx::eax
xor ebx,ebx
xor ecx,ecx
xor edx,edx
mov edi,0FF000000h; [ 11 ] bytes
@@:
add eax,1
adc ebx,edx
adc ecx,edx
test ecx,edi
jz @B
The truth cannot be learned ... it can only be recognized.

stanhebben

First, don't worry about your code - it's relatively fast.

You could use the following code:


mov esi, 0
@@:
add byte ptr [edi+esi], 1
jnc @F
add esi, 1
jmp @B
@@:


Just curious, what do you want to use it for? If you have 11 bytes, you can go until 2^88. If one increment takes 1 cycle, then my computer needs 5.10^12 years to get there.

If you need a large counter, you could perform 2 32-bit increments, for example:

add dword ptr [edi], 1
adc dword ptr [edi+4], 0


Edit: and of course, you could also use dwords instead of bytes, as drizz said above

daydreamer

I had such code once, about the same as Stan's code, but it was ascii numbers counting up on an old 6502, perfect for scores, for example if you got 300 points, count up 3 times, much faster than a integer->ascii conversion each time and only the screen limited it to how many digits

wizzra

Just curious, what do you want to use it for?
i was hoping to try some brute-force technique in my engine. but i guess there is no hope since even register wise it will take alot to reach to full 11bytes with FF using this algorithm.

the adc trick is also nice thnx drizz :) i'll see if i can manage to find other ideas from it.

dsouza123


.686
.model flat,stdcall
option casemap:none

include    \masm32\include\windows.inc
include    \masm32\include\kernel32.inc
include    \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

.data
align 16
   arr      db 16 dup (0)
   szMsg    db "Done",0
   szCap    db "Counter",0

.code
start:
;--------------------------------------------------------
   mov esi, offset arr           ; version using registers
   xor eax, eax
@@:
   add dword ptr [esi+0], 1
   jnc @B
   adc dword ptr [esi+4], eax
   jnc @B
   adc dword ptr [esi+8], eax
   jnc @B

;------------------- Alternative just using memory ---------------------
@@:                                 
   stc                           ; no register version
   adc dword ptr [arr+0], 0
   jnc @B
   adc dword ptr [arr+4], 0
   jnc @B
   adc dword ptr [arr+8], 0
   jnc @B

   invoke MessageBox,NULL,ADDR szMsg,ADDR szCap,MB_OK
   invoke ExitProcess,0
end start


Updating just the first 4 bytes from 0 to 2^32 - 1 takes 17 seconds on an Athlon 1171 Mhz.
The the first five instructions of the register version fit within a 16 byte paragraph.

wizzra

Thanks guys for your help,
I guess there is no much escape from this solution. It will take alllot of and alllot of years until a cpu will exists which can compute such count in split seconds :) (at least as a single cpu that is - not talking about cpu farm)
such simple task tsk tsk :)

best i could some up from your posts.

eax:edx 0FFFFFFFFFFFFFFFFh range
   mov esi, offset arr           ; version using registers
   xor eax, eax
   xor edx,edx
first:
   adc eax,1
   jnc first
@@:
   adc edx,eax
   jnc first
   jmp _exit


thanks guys

wizzra

btw, the above code of mine will count regulary, but u can always swap the order of the bytes to get the orginal idea.

dsouza123

Looking at your code if carry was already set then it goes from 0 to 2 at the start,
using add eax, 1 would solve it, also esi isn't needed, but the dual use of eax
for both the first dword and as a 0 for the register parameter for the adc
instead of using an immediate 0 or another register was insightful.

Here is a pure register version.

  xor eax, eax  ; only register version no memory or immediates
  xor ebx, ebx
  inc ebx
  xor ecx, ecx
  xor edx, edx
@@:
  add eax, ebx
  jnc @B          ; goes back to nearest @@:
  adc edx, ecx
  jnc @B          ; goes back to nearest @@:
  jmp _exit

wizzra

yeah i noticed :) thanks,
though... its still damn slow!!! (relatively slow)

stanhebben

Quote from: wizzra on November 16, 2006, 05:11:11 PM
yeah i noticed :) thanks,
though... its still damn slow!!! (relatively slow)

Maybe change the algorithm to something that doesn't have to count up at all. First develop a quick algo, then a quick implementation of it.

Human

why everybody use jc, less jumps code is faster. for 11 bytes you can do
add eax,1
adc ebx,0
adc edx,0
and ecx is as counter how many times add
if you need just 8 bytes use
paddq mm0,mm1 and mm1 holds 1

dsouza using registers?
i dont know for what he needs that, but for 1 time update speed isnt necessery, for highscore he can change add eax,1 to add eax,points, also with even 1000 runs its faster to move all to regs, do additions and after all write all to memory, not use add [esi+0],1
all is about optimizing, sometimes even add [something+0],1 will be faster if we have complex code and need all registers, so with direct memory write we dont need to save and restore some reg, ofcourse my code is slow, because it cant be fast, every operation stalls pipeline and alus due next opcode is dependent on last, but he can mix it with other code to make it faster. but for just counting idea is useless like whole topic