Preserve carry flag in loop

Started by Eddy, August 29, 2005, 08:33:14 AM

Eddy

Hi all,

I have a routine that adds two multi-precision (or arbitrary-precision) integers.
This means that the integers can be many dwords long.
For speed, I add the integers dword by dword in chunks of 4 dwords (loop unrolling).

This is the basic loop. Note that:
- esi is negative and is incremented to zero
- eax, ebx and ecx actually point to the END of the operands and the result (that's why esi must be negative initially)


    Lab_Add1:
        mov  edx, [eax+esi]       'get dword of h1
        adc  edx, [ebx+esi]       'add dword of h2
        mov  [ecx+esi], edx       'store result in hr

        mov  edx, [eax+esi+4]     'get dword of h1
        adc  edx, [ebx+esi+4]     'add dword of h2
        mov  [ecx+esi+4], edx     'store result in hr

        mov  edx, [eax+esi+8]     'get dword of h1
        adc  edx, [ebx+esi+8]     'add dword of h2
        mov  [ecx+esi+8], edx     'store result in hr

        mov  edx, [eax+esi+12]    'get dword of h1
        adc  edx, [ebx+esi+12]    'add dword of h2
        mov  [ecx+esi+12], edx    'store result in hr

        ADD esi, 16               'increment loop counter
       
    jnz Lab_Add1
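
For reference, here is a minimal C sketch of what the loop above is meant to compute. It is a portable model rather than the poster's code: the name `mp_add` and its parameters are assumptions made for illustration, the index counts dwords instead of bytes, and the `carry` variable stands in for CF. It shows why the carry has to survive from one dword to the next, including across loop iterations.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: adds two n-dword integers stored least
   significant dword first. h1_end, h2_end and hr_end point one past
   the last dword, like eax, ebx and ecx in the assembly loop; i plays
   the role of esi (counted in dwords here, not bytes).
   Returns the final carry (0 or 1). */
static uint32_t mp_add(const uint32_t *h1_end, const uint32_t *h2_end,
                       uint32_t *hr_end, size_t n)
{
    uint32_t carry = 0;                    /* models CF */
    for (ptrdiff_t i = -(ptrdiff_t)n; i != 0; ++i) {
        uint32_t a   = h1_end[i];
        uint32_t sum = a + h2_end[i];      /* wraps modulo 2^32 */
        uint32_t c1  = sum < a;            /* carry out of a + b */
        uint32_t res = sum + carry;        /* add the carry-in, like adc */
        uint32_t c2  = res < sum;          /* carry out of adding the carry-in */
        hr_end[i] = res;
        carry = c1 | c2;                   /* at most one of the two is set */
    }
    return carry;
}
```

If `carry` were reset at the end of each group of four dwords, which is the effect 'ADD esi, 16' has on CF in the assembly version, any carry crossing a chunk boundary would be lost.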


There's one little annoyance: 'ADD esi, 16' affects the carry flag.
And I don't want that, since the CF must be preserved between loop iterations.

Here is a (not so elegant/efficient) solution.
Does anyone know of a more efficient solution?


    Lab_Add1:
        mov  edx, [eax+esi]       'get dword of h1
        adc  edx, [ebx+esi]       'add dword of h2
        mov  [ecx+esi], edx       'store result in hr

        mov  edx, [eax+esi+4]     'get dword of h1
        adc  edx, [ebx+esi+4]     'add dword of h2
        mov  [ecx+esi+4], edx     'store result in hr

        mov  edx, [eax+esi+8]     'get dword of h1
        adc  edx, [ebx+esi+8]     'add dword of h2
        mov  [ecx+esi+8], edx     'store result in hr

        mov  edx, [eax+esi+12]    'get dword of h1
        adc  edx, [ebx+esi+12]    'add dword of h2
        mov  [ecx+esi+12], edx    'store result in hr

        rcl edi, 1                'Store carryflag because it can be overwritten by the ADD
        ADD esi, 16               'increment loop counter
        rcr edi, 1                'Retrieve stored carryflag
       
    jnz Lab_Add1
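
The save/restore trick above can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not an emulator: the `Regs` struct and the `op_rcl1` and `op_rcr1` helpers are hypothetical names, and only edi and CF are modelled (on real hardware these rotates also change OF, but not ZF, so the ZF produced by the ADD is still available to jnz).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by the rotates. */
typedef struct { uint32_t edi; bool cf; } Regs;

/* rcl edi, 1: old CF enters bit 0, old bit 31 becomes the new CF. */
static void op_rcl1(Regs *r)
{
    bool out = ((r->edi >> 31) & 1u) != 0;
    r->edi = (r->edi << 1) | (r->cf ? 1u : 0u);
    r->cf = out;
}

/* rcr edi, 1: old CF enters bit 31, old bit 0 becomes the new CF. */
static void op_rcr1(Regs *r)
{
    bool out = (r->edi & 1u) != 0;
    r->edi = (r->edi >> 1) | ((r->cf ? 1u : 0u) << 31);
    r->cf = out;
}
```

After `op_rcl1`, the saved carry sits in bit 0 of edi, so whatever the ADD does to CF in between, `op_rcr1` brings the original value back. Note that the top bit of edi ends up holding the CF left by the ADD, so edi is effectively a scratch register here.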


Kind regards
Eddy
Eddy
www.devotechs.com -- HIME : Huge Integer Math and Encryption library--

Petroizki

Use the lea instruction; it does not affect any flags:

lea esi, [esi + 16]

Eddy

That was my first thought, but for 'jnz Lab_Add1' I need to have the ZF affected.
To do that, I would have to use CMP... which again affects the CF... and I have the same problem again..:(

Kind regards
Eddy

PBrennick

You can store the flag condition you want to preserve in a memory location.  You would then react to the contents of that memory location as need be.

Paul
The GeneSys Project is available from:
The Repository or My crappy website

Eddy

Paul,

That's pretty much what I'm doing in my second code snippet, except I'm using the EDI register instead of a memory location.

      rcl edi, 1             'Store carryflag because it can be overwritten by the ADD
      ADD esi, 16            'increment loop counter
      rcr edi, 1             'Retrieve stored carryflag

But I was wondering if there was a way without having to store the state of the CF flag...

Too bad I can't use a scale factor of 16 in the addressing, otherwise I could do this:

...
mov  edx, [eax+esi*16+4]
...
inc esi
jnz Lab_Add1


INC doesn't affect CF. But unfortunately scale factor can be no higher than 8...:(

Kind regards
Eddy

PBrennick

How about pushing the flags, doing the compare, popping the flags and then doing the jnz?  It will react to the results of the compare and you get to keep the previous state of the flags.

Paul

Eddy

Paul,

When I pushf, cmp, popf, the results (flags) of the cmp will be overwritten by the pop.
Besides, what I did above (rcl/rcr) is faster than pushf/popf.

Kind regards
Eddy

roticv

You can try using MMX instructions (you would have to modify the code below, as I just copied and pasted it).


movd mm0, [ecx]
movd mm1, [edx]
paddq mm1, mm0
movd [eax], mm1
psrlq mm1, 32
i = 4
REPT N/2 - 1
movd mm0, [ecx+i]
movd mm2, [edx+i]
paddq mm2, mm0
paddq mm2, mm1
movd [eax+i],mm2
psrlq mm2, 32
i = i + 4
movd mm0, [ecx+i]
movd mm1, [edx+i]
paddq mm1, mm0
paddq mm1, mm2
movd [eax+i], mm1
psrlq mm1, 32
i = i + 4
ENDM
movd mm0,[ecx+i]
movd mm2,[edx+i]
paddq mm2, mm0
paddq mm2, mm1
movq [eax+i], mm2
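
The idea behind this listing is to keep the carry in the upper half of a 64-bit accumulator instead of in CF, so no flag has to be preserved at all. A minimal C sketch of the same pattern follows. The function name `mp_add_acc64` and its parameters are assumptions for illustration, and the code is a plain loop rather than the unrolled REPT form above. Note also that paddq on MMX registers was introduced with SSE2, so the assembly version needs a CPU with SSE2 support.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper mirroring the paddq/psrlq pattern: the 64-bit
   accumulator plays the role of mm1/mm2. Operands are n dwords long,
   least significant dword first. Returns the final carry (0 or 1). */
static uint32_t mp_add_acc64(const uint32_t *a, const uint32_t *b,
                             uint32_t *r, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (uint64_t)a[i] + b[i];   /* like the paddq steps: a + b + carry */
        r[i] = (uint32_t)acc;           /* like movd: store the low dword */
        acc >>= 32;                     /* like psrlq: keep only the carry */
    }
    return (uint32_t)acc;
}
```

The accumulator cannot overflow: two 32-bit operands plus a carry of at most 1 always fit in 33 bits.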

Frank

This combines two partial solutions:


lea esi, [esi+15]
inc esi
jnz Lab_Add1
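
Why this works can be checked with a small C model of the two flags involved. This is a simplified sketch, not an emulator: the `Cpu` struct and the `op_add`, `op_lea` and `op_inc` helpers are hypothetical names, and only CF and ZF are modelled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model: the loop counter plus the two flags of interest. */
typedef struct { uint32_t esi; bool cf; bool zf; } Cpu;

/* add esi, imm: updates both CF and ZF. */
static void op_add(Cpu *c, uint32_t imm)
{
    uint32_t old = c->esi;
    c->esi = old + imm;
    c->cf = c->esi < old;        /* unsigned wrap-around sets CF */
    c->zf = (c->esi == 0);
}

/* lea esi, [esi + disp]: pure address arithmetic, no flag changes. */
static void op_lea(Cpu *c, uint32_t disp)
{
    c->esi += disp;
}

/* inc esi: updates ZF but leaves CF untouched. */
static void op_inc(Cpu *c)
{
    c->esi += 1;
    c->zf = (c->esi == 0);
}
```

With esi at -16, `op_add(&c, 16)` wraps to zero and sets CF regardless of its previous value, whereas `op_lea(&c, 15)` followed by `op_inc(&c)` reaches zero with ZF set and CF unchanged, which is exactly what the jnz and the next adc need.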


Eddy

Roticv,
Yes, I really should start getting into that MMX thing, but I haven't so far..
Thanks for your suggestion!

Frank,
How clever! Hadn't thought of that. Thanks!!   :U

Kind regards
Eddy