Hi all,
I have a routine that adds 2 multi-precision (or arbitrary precision) integers.
This means that the integers can be many dwords long.
For reasons of speed I add the integers dword by dword, four dwords per loop iteration (loop unrolling).
This is the basic loop. Note that:
- esi is negative and is incremented up to zero
- eax, ebx and ecx actually point to the END of the operands and the result (that's why esi must be negative initially)
Lab_Add1:
mov edx, [eax+esi] 'get dword of h1
adc edx, [ebx+esi] 'add dword of h2
mov [ecx+esi], edx 'store result in hr
mov edx, [eax+esi+4] 'get dword of h1
adc edx, [ebx+esi+4] 'add dword of h2
mov [ecx+esi+4], edx 'store result in hr
mov edx, [eax+esi+8] 'get dword of h1
adc edx, [ebx+esi+8] 'add dword of h2
mov [ecx+esi+8], edx 'store result in hr
mov edx, [eax+esi+12] 'get dword of h1
adc edx, [ebx+esi+12] 'add dword of h2
mov [ecx+esi+12], edx 'store result in hr
ADD esi, 16 'increment loop counter
jnz Lab_Add1
There's one little annoyance: 'ADD esi, 16' affects the carry flag.
I don't want that, since CF must be preserved from one loop iteration to the next.
Here is a (not so elegant/efficient) solution.
Does anyone know of a more efficient solution?
Lab_Add1:
mov edx, [eax+esi] 'get dword of h1
adc edx, [ebx+esi] 'add dword of h2
mov [ecx+esi], edx 'store result in hr
mov edx, [eax+esi+4] 'get dword of h1
adc edx, [ebx+esi+4] 'add dword of h2
mov [ecx+esi+4], edx 'store result in hr
mov edx, [eax+esi+8] 'get dword of h1
adc edx, [ebx+esi+8] 'add dword of h2
mov [ecx+esi+8], edx 'store result in hr
mov edx, [eax+esi+12] 'get dword of h1
adc edx, [ebx+esi+12] 'add dword of h2
mov [ecx+esi+12], edx 'store result in hr
rcl edi, 1 'save CF in bit 0 of edi, because the ADD overwrites it
ADD esi, 16 'increment loop counter (sets ZF for the jnz)
rcr edi, 1 'restore CF from bit 0 of edi; RCR leaves ZF untouched
jnz Lab_Add1
Kind regards
Eddy
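For readers who want to check the assembly against something simpler, here is a minimal C sketch of what the loops above compute. It is not Eddy's code: the function name `mp_add` and the least-significant-limb-first layout are assumptions made for illustration, and `carry` starting at 0 corresponds to CF being clear on entry to the loop, which the post does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not from the original post).
   Adds two n-dword numbers, least significant dword first, and
   returns the final carry (0 or 1). */
uint32_t mp_add(uint32_t *hr, const uint32_t *h1, const uint32_t *h2, size_t n)
{
    uint32_t carry = 0;               /* plays the role of CF; assumed clear on entry */
    for (size_t i = 0; i < n; i++) {
        uint32_t s  = h1[i] + carry;  /* can wrap only when carry == 1 */
        uint32_t c1 = s < carry;      /* 1 if adding the carry wrapped */
        s += h2[i];
        uint32_t c2 = s < h2[i];      /* 1 if adding h2 wrapped */
        hr[i] = s;
        carry = c1 | c2;              /* the two cannot both be 1 */
    }
    return carry;
}
```

Each trip through the C loop corresponds to one mov/adc/mov triple in the assembly. The `carry` variable has to survive from one limb to the next, which is exactly why an instruction that rewrites CF between unrolled blocks breaks the chain.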
Use lea instruction, it does not affect any flags:
lea esi, [esi + 16]
That was my first thought, but for 'jnz Lab_Add1' I need ZF to be updated, and lea doesn't touch it.
To get that I would have to use CMP, which again affects CF, and I have the same problem all over again. :(
Kind regards
Eddy
You can store the flag condition you want to preserve in a memory location. You would then react to the contents of that memory location as need be.
Paul
Paul,
That's pretty much what I'm doing in my second code snippet, except I'm using the EDI register instead of a memory location.
rcl edi, 1 'Store carry flag because it would be overwritten by the ADD
ADD esi, 16 'increment loop counter
rcr edi, 1 'Retrieve stored carry flag
But I was wondering if there was a way without having to store the state of the CF flag...
Too bad I can't use a scale factor of 16 in the addressing, otherwise I could do this:
...
mov edx, [eax+esi*16+4]
...
inc esi
jnz Lab_Add1
INC doesn't affect CF. But unfortunately the scale factor can only be 1, 2, 4 or 8. :(
Kind regards
Eddy
How about pushing the flags, doing the compare, popping the flags and then doing the jnz? It will react to the results of the compare and you get to keep the previous state of the flags.
Paul
Paul,
If I do pushf, cmp, popf, the flags set by the cmp are overwritten by the popf.
Besides, what I did above (rcl/rcr) is faster than pushf/popf.
Kind regards
Eddy
You can try using MMX instructions (paddq on MMX registers needs SSE2). You would have to modify the code below, as I just copied and pasted it.
movd mm0, [ecx]
movd mm1, [edx]
paddq mm1, mm0
movd [eax], mm1
psrlq mm1, 32
i = 4
REPT N/2 - 1
movd mm0, [ecx+i]
movd mm2, [edx+i]
paddq mm2, mm0
paddq mm2, mm1
movd [eax+i],mm2
psrlq mm2, 32
i = i + 4
movd mm0, [ecx+i]
movd mm1, [edx+i]
paddq mm1, mm0
paddq mm1, mm2
movd [eax+i], mm1
psrlq mm1, 32
i = i + 4
ENDM
movd mm0,[ecx+i]
movd mm2,[edx+i]
paddq mm2, mm0
paddq mm2, mm1
movq [eax+i], mm2
emms ; clear MMX state before any later x87 FPU code
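The idea behind the MMX snippet above is to do each 32-bit addition in a 64-bit register, so the carry lands in bit 32 and never touches the flags. A rough C sketch of the same idea follows. The name `mp_add_wide` and the least-significant-dword-first layout are assumptions for illustration, and this is a model of the technique, not a translation of the macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): multi-dword add where the
   carry travels in the upper half of a 64-bit accumulator. */
uint32_t mp_add_wide(uint32_t *hr, const uint32_t *h1, const uint32_t *h2, size_t n)
{
    uint64_t acc = 0;                    /* holds the carry (0 or 1) between limbs */
    for (size_t i = 0; i < n; i++) {
        acc += (uint64_t)h1[i] + h2[i];  /* like the paddq pair: both limbs plus carry */
        hr[i] = (uint32_t)acc;           /* like movd: store the low 32 bits */
        acc >>= 32;                      /* like psrlq 32: carry moves down to bit 0 */
    }
    return (uint32_t)acc;                /* final carry out */
}
```

Because no flag is involved, the loop counter can be updated with a plain ADD or CMP. Whether it beats the adc chain depends on the CPU, so it would need to be measured.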
This combines the two partial solutions (lea changes no flags at all, and inc sets ZF but leaves CF alone):
lea esi, [esi+15]
inc esi
jnz Lab_Add1
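To see why the lea/inc pair works where a plain ADD does not, here is a toy C model of just the two flags involved. The struct and function names are hypothetical, and the model covers only CF and ZF for these three instructions. It is a sketch of the documented flag behaviour, not an emulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the loop counter and the two flags that matter here.
   All names are made up for this illustration. */
typedef struct { uint32_t esi; bool cf; bool zf; } Cpu;

/* add esi, imm: writes both CF (unsigned wrap) and ZF */
void op_add(Cpu *c, uint32_t imm)
{
    uint32_t r = c->esi + imm;
    c->cf = r < imm;          /* old CF is lost */
    c->zf = (r == 0);
    c->esi = r;
}

/* lea esi, [esi+imm]: address arithmetic only, no flags written */
void op_lea(Cpu *c, uint32_t imm)
{
    c->esi += imm;
}

/* inc esi: writes ZF but leaves CF as it was */
void op_inc(Cpu *c)
{
    c->esi += 1;
    c->zf = (c->esi == 0);
}
```

Starting from esi = -32 with CF set, `op_add(&cpu, 16)` leaves CF clear, so the pending carry is gone. With `op_lea(&cpu, 15)` followed by `op_inc(&cpu)`, esi still advances by 16, ZF still becomes set when esi reaches zero, and CF keeps whatever the last adc produced.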
Roticv,
Yes, I really should start getting into that MMX thing, but I haven't so far..
Thanks for your suggestion!
Frank,
How clever! I hadn't thought of that. Thanks!!
Kind regards
Eddy