Hey guys,
I played - and got stuck coding an algorithm, that is, coding it can be easy. But optimizing it could be a bit problematic.
the Algorithm:
array of 5 bytes (for example, i need upto 11 bytes+-):
array db 0, 0, 0, 0, 0
1. Count up the first byte up to FFh, once over FFh, reset to 0 and add 1 to the second byte.
2. goto 1. than count the second byte upto FFh, once over FFh, reset to 0 and add 1 to the third byte.
and so on..
example:
loop1:
00 00 00 00 00
01 00 00 00 00
02 00 00 00 00
....
FF 00 00 00 00
loop2:
00 01 00 00 00
01 01 00 00 00
02 01 00 00 00
....
FF 01 00 00 00
loop 3:
00 02 00 00 00
01 02 00 00 00
02 02 00 00 00
....
FF 02 00 00 00
loop 4:
00 FF 00 00 00
01 FF 00 00 00
02 FF 00 00 00
....
FF FF 00 00 00
loop 5:
00 00 01 00 00
01 00 01 00 00
02 00 01 00 00
....
FF 00 01 00 00
in asm (simpe code) could be like this:
.data
arr db 0,0,0,0,0,0,0
.code
xor eax,eax
lea edi,arr
begin:
cmp al,0ffh
mov byte ptr [edi],al
jnz _end
xor eax,eax
dec eax
add byte ptr [edi+1],1
cmp byte ptr [edi+1],0ffh
jnz _end
add byte ptr [edi+2],1
cmp byte ptr [edi+2],0ffh
jnz _end
add byte ptr [edi+3],1
cmp byte ptr [edi+3],0ffh
jnz _end
add byte ptr [edi+4],1
cmp byte ptr [edi+4],0ffh
jnz _end
jmp _exit
_end:
inc eax
jmp begin
_exit:
its slow as hell, i know!
i tried making it more registers-wise, but managed to count upto FFFF and way pass it seems to be more trouble:
.code
test2:
xor edx,edx
xor eax,eax
xor ecx,ecx
mov ecx,8
mov ebx,01000000h
loopy:
add edx,ebx
cmp edx,0ffff0000h
jz sec_loop
cmp edx,0ffffffffh
jz _exit
mov eax,edx
and eax, 0ff000000h
ror eax,24
cmp eax,0ffh
jz fix_add1
jmp loopy
fix_add1:
and edx,00ffffffh
ror ebx,cl
add edx,ebx
rol
ebx,cl
jmp loopy
sec_loop:
and edx,0000ffffh
mov ecx,16
jmp loopy
hope you guys know about this, and if a fast algo already exists i would be glad to use it, if not, would be nice if we can find a good algorithm for it.
Best regards
Wizzra.
you can do something like this for parsing strings in arrays
.586
.MODEL FLAT,STDCALL
OPTION CASEMAP:NONE
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\user32.lib
mNextListEntry MACRO ML
cld
xor eax, eax
or ecx, -1
repnz scasb
cmp byte ptr[edi], 0
jnz ML
ENDM
.data
mystring db "beer",0
db "is",0
db "good",0,0
.code
start:
mov edi, offset mystring
@loopstr:
invoke MessageBox,0,edi,edi,MB_OK
mNextListEntry @loopstr
invoke ExitProcess,0
end start
Also is Donkeys string libray he has great array handling procedures you can add and remove strings to easy
Hi Ben,
use add with carry, treat the array like a big number
xor eax,eax;; array == ecx::ebx::eax
xor ebx,ebx
xor ecx,ecx
xor edx,edx
mov edi,0FF000000h; [ 11 ] bytes
@@:
add eax,1
adc ebx,edx
adc ecx,edx
test ecx,edi
jz @B
First, don't worry about your code - it's relatively fast.
You could use the following code:
mov esi, 0
@@:
add byte ptr [edi+esi], 1
jnc @F
add esi, 1
jmp @B
@@:
Just curious, what do you want to use it for? If you have 11 bytes, you can go until 2^88. If one increment takes 1 cycle, then my computer needs 5.10^12 years to get there.
If you need a large counter, you could perform 2 32-bit increments, for example:
add dword ptr [edi], 1
adc dword ptr [edi+4], 0
Edit: and of course, you could also use dwords instead of bytes, as drizz said above
I had such code once, about the same as Stan's code, but it was ascii numbers counting up on an old 6502, perfect for scores, for example if you got 300 points, count up 3 times, much faster than a integer->ascii conversion each time and only the screen limited it to how many digits
Just curious, what do you want to use it for?
i was hoping to try some brute-force technique in my engine. but i guess there is no hope since even register wise it will take alot to reach to full 11bytes with FF using this algorithm.
the adc trick is also nice thnx drizz :) i'll see if i can manage to find other ideas from it.
.686
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib
.data
align 16
arr db 16 dup (0)
szMsg db "Done",0
szCap db "Counter",0
.code
start:
;--------------------------------------------------------
mov esi, offset arr ; version using registers
xor eax, eax
@@:
add dword ptr [esi+0], 1
jnc @B
adc dword ptr [esi+4], eax
jnc @B
adc dword ptr [esi+8], eax
jnc @B
;------------------- Alternative just using memory ---------------------
@@:
stc ; no register version
adc dword ptr [arr+0], 0
jnc @B
adc dword ptr [arr+4], 0
jnc @B
adc dword ptr [arr+8], 0
jnc @B
invoke MessageBox,NULL,ADDR szMsg,ADDR szCap,MB_OK
invoke ExitProcess,0
end start
Updating just the first 4 bytes from 0 to 2^32 - 1 takes 17 seconds on an Athlon 1171 Mhz.
The the first five instructions of the register version fit within a 16 byte paragraph.
Thanks guys for your help,
I guess there is no much escape from this solution. It will take alllot of and alllot of years until a cpu will exists which can compute such count in split seconds :) (at least as a single cpu that is - not talking about cpu farm)
such simple task tsk tsk :)
best i could some up from your posts.
eax:edx 0FFFFFFFFFFFFFFFFh range
mov esi, offset arr ; version using registers
xor eax, eax
xor edx,edx
first:
adc eax,1
jnc first
@@:
adc edx,eax
jnc first
jmp _exit
thanks guys
btw, the above code of mine will count regulary, but u can always swap the order of the bytes to get the orginal idea.
Looking at your code if carry was already set then it goes from 0 to 2 at the start,
using add eax, 1 would solve it, also esi isn't needed, but the dual use of eax
for both the first dword and as a 0 for the register parameter for the adc
instead of using an immediate 0 or another register was insightful.
Here is a pure register version.
xor eax, eax ; only register version no memory or immediates
xor ebx, ebx
inc ebx
xor ecx, ecx
xor edx, edx
@@:
add eax, ebx
jnc @B ; goes back to nearest @@:
adc edx, ecx
jnc @B ; goes back to nearest @@:
jmp _exit
yeah i noticed :) thanks,
though... its still damn slow!!! (relatively slow)
Quote from: wizzra on November 16, 2006, 05:11:11 PM
yeah i noticed :) thanks,
though... its still damn slow!!! (relatively slow)
Maybe change the algorithm to something that doesn't have to count up at all. First develop a quick algo, then a quick implementation of it.
why everybody use jc, less jumps code is faster. for 11 bytes you can do
add eax,1
adc ebx,0
adc edx,0
and ecx is as counter how many times add
if you need just 8 bytes use
paddq mm0,mm1 and mm1 holds 1
dsouza using registers?
i dont know for what he needs that, but for 1 time update speed isnt necessery, for highscore he can change add eax,1 to add eax,points, also with even 1000 runs its faster to move all to regs, do additions and after all write all to memory, not use add [esi+0],1
all is about optimizing, sometimes even add [something+0],1 will be faster if we have complex code and need all registers, so with direct memory write we dont need to save and restore some reg, ofcourse my code is slow, because it cant be fast, every operation stalls pipeline and alus due next opcode is dependent on last, but he can mix it with other code to make it faster. but for just counting idea is useless like whole topic