Gigabit integer add optimization

Started by jnm2, February 15, 2012, 03:50:31 PM

qWord

My suggestion would be:
CHUNK_INFO struct
    pChunkA     PVOID   ? ; in , destination
    pChunkB     PVOID   ? ; in , source
    n           DWORD   ? ; in , number of QWORDs
    carry       DWORD   ? ; in/out  , 0 or 1
CHUNK_INFO ends


; thread procedure
BlockAdd proc pChunkInfo:ptr CHUNK_INFO
   
    mov rbx,pChunkInfo
    mov ecx,[rbx].CHUNK_INFO.n
    mov rdi,[rbx].CHUNK_INFO.pChunkA
    mov rsi,[rbx].CHUNK_INFO.pChunkB
    shr ecx,3               ; 8 QWORDs per iteration: n must be a nonzero multiple of 8
    mov edx,[rbx].CHUNK_INFO.carry
    align 16
@@:
    mov r8 ,[rsi+0*8]
    mov r9 ,[rsi+1*8]
    mov r10,[rsi+2*8]
    mov r11,[rsi+3*8]
    mov r12,[rsi+4*8]
    mov r13,[rsi+5*8]
    mov r14,[rsi+6*8]
    mov r15,[rsi+7*8]
    lea rsi,[rsi+8*8]
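    bt  edx,0               ; CF = incoming carry ("add r8,rdx" drops the carry when r8 is all ones)
    mov rdx,0               ; clear the carry accumulator (mov leaves CF untouched)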
    adc [rdi+0*8],r8
    adc [rdi+1*8],r9
    adc [rdi+2*8],r10
    adc [rdi+3*8],r11
    adc [rdi+4*8],r12
    adc [rdi+5*8],r13
    adc [rdi+6*8],r14
    adc [rdi+7*8],r15
    adc rdx,0
    lea rdi,[rdi+8*8]
   
    ; this loop could be unrolled several times ...
   
    dec ecx
    jnz @B
   
    mov [rbx].CHUNK_INFO.carry,edx
   
    mov rax,1
    ret
   
BlockAdd endp

The structure is filled with the needed addresses before calling CreateThread(), and the carry out of the chunk is returned through its carry field.
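The post stops at the per-chunk carry, so here is a rough C sketch of how a caller might combine the chunks afterwards. It is an assumption, not something taken from the code above: chunk_info is a hypothetical C mirror of CHUNK_INFO, block_add is a portable stand-in for BlockAdd, and propagate_carries and chunked_add are made-up helper names. The chunks are processed one after another here to keep the sketch short; in the threaded version each block_add call would run on its own thread, and the fix-up pass would run after all threads have finished. Words are stored least significant first, as in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of CHUNK_INFO (same roles as the fields in the post). */
typedef struct {
    uint64_t       *a;      /* in: destination chunk, a += b */
    const uint64_t *b;      /* in: source chunk */
    size_t          n;      /* in: number of QWORDs */
    unsigned        carry;  /* in/out: 0 or 1 */
} chunk_info;

/* Portable stand-in for BlockAdd: add with carry across one chunk. */
static void block_add(chunk_info *ci)
{
    unsigned c = ci->carry;
    assert(c <= 1);
    for (size_t i = 0; i < ci->n; i++) {
        uint64_t s  = ci->a[i] + ci->b[i];
        unsigned c1 = s < ci->b[i];   /* carry out of a + b */
        uint64_t r  = s + c;
        unsigned c2 = r < s;          /* carry out of adding the carry-in */
        ci->a[i] = r;
        c = c1 | c2;                  /* the two can never both be set */
    }
    ci->carry = c;
}

/* Assumed fix-up pass, run once every chunk is done: feed each chunk's
   carry-out into the next chunk. Returns the carry out of the whole number. */
static unsigned propagate_carries(chunk_info *chunks, size_t count)
{
    unsigned c = 0;
    for (size_t k = 0; k < count; k++) {
        /* Ripple the incoming carry; it stops at the first word that does not wrap. */
        for (size_t i = 0; c && i < chunks[k].n; i++) {
            chunks[k].a[i] += 1;
            c = (chunks[k].a[i] == 0);
        }
        /* A chunk that already carried out cannot wrap again, so OR is enough. */
        c |= chunks[k].carry;
    }
    return c;
}

enum { MAX_CHUNKS = 16 };  /* arbitrary limit for this sketch */

/* a += b over n QWORDs, split into chunks of chunk_len QWORDs. */
static unsigned chunked_add(uint64_t *a, const uint64_t *b, size_t n, size_t chunk_len)
{
    chunk_info chunks[MAX_CHUNKS];
    size_t count = 0;
    assert(chunk_len > 0 && (n + chunk_len - 1) / chunk_len <= MAX_CHUNKS);
    for (size_t off = 0; off < n; off += chunk_len) {
        size_t len = (n - off < chunk_len) ? n - off : chunk_len;
        chunks[count++] = (chunk_info){ a + off, b + off, len, 0 };
    }
    for (size_t k = 0; k < count; k++)
        block_add(&chunks[k]);   /* in the threaded version: one thread per chunk */
    return propagate_carries(chunks, count);
}
```

The fix-up pass is normally cheap because a carry into a chunk only keeps rippling while the words it touches are all ones. In the worst case it walks a whole chunk, so it should not be assumed to be free.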

EDIT: code changed

jj2007

Quote from: jnm2 on February 16, 2012, 01:29:28 AM
PADDQ doesn't do carries.

Valid argument. Now, if your data are predictable, you could branch into SSE code when no carry is expected (test bit 63?). The speed gain with SSE2 is usually considerable...
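One way to read the "test for bit 63" idea, sketched in C: if bit 63 is clear in every QWORD of both operands, no lane can carry out, so the lanes can be added independently the way PADDQ does. This is only an interpretation of the suggestion; no_carry_possible, add_lanes, add_with_carry and big_add are hypothetical names, and the SSE2 path uses the `_mm_add_epi64` intrinsic (the C spelling of PADDQ) where the compiler defines `__SSE2__`, with a scalar fallback otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Returns 1 when bit 63 is clear in every QWORD of both operands.
   Then a[i] + b[i] < 2^64 for all i, so no carry can occur anywhere. */
static int no_carry_possible(const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t bits = 0;
    for (size_t i = 0; i < n; i++)
        bits |= a[i] | b[i];
    return (bits >> 63) == 0;
}

/* Lane-wise a += b with no carries between QWORDs (PADDQ semantics). */
static void add_lanes(uint64_t *a, const uint64_t *b, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(a + i), _mm_add_epi64(va, vb));
    }
#endif
    for (; i < n; i++)      /* tail, or everything when SSE2 is unavailable */
        a[i] += b[i];
}

/* Scalar fallback: a += b with full carry propagation, returns the carry out. */
static unsigned add_with_carry(uint64_t *a, const uint64_t *b, size_t n)
{
    unsigned c = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];
        unsigned c1 = s < b[i];
        uint64_t r  = s + c;
        unsigned c2 = r < s;
        a[i] = r;
        c = c1 | c2;
    }
    return c;
}

/* Branch into the carry-free path only when it is provably safe. */
static unsigned big_add(uint64_t *a, const uint64_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    if (no_carry_possible(a, b, n)) {
        add_lanes(a, b, n);
        return 0;
    }
    return add_with_carry(a, b, n);
}
```

Scanning both operands first costs an extra pass over the data, so this only pays off if the "no carry" property is already known from how the data were produced, which is presumably what "predictable" means here. The bit 63 test is also conservative: it rejects inputs that would not actually carry.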