fiadd vs add

Started by ted, July 06, 2005, 12:10:22 AM

ted

I've been reading Visual C++ Optimization with Assembly Code by Yury Magda (I like it; do you have an opinion?) but I noticed that in Chapter 2 he shows an optimized version of an integer add (actually adding up the elements of an integer array) using assembly language with the VC++ __asm{}. He uses fiadd. The disassembly of the C++ uses an add. Is it often considered better to use an fp instruction and the fp stack rather than an integer instruction to do integer arithmetic? I'm surprised. The book doesn't address that issue other than to say that the asm version is better performing.

The asm code looks like
__asm{
   mov ECX, sf ; sf is the number of elements in the array
   dec ECX
   mov ESI, DWORD PTR piarray
   finit
   fild DWORD PTR [ESI]
next:
   add ESI, 4
   fiadd DWORD PTR [ESI]
   loop next
   fistp DWORD PTR isum
   fwait
}

while the C++ is (I know, pointer arithmetic!)

for (int cnt=0;cnt<sf;cnt++) {
   isum += *piarray;
   piarray++;
}

Tedd

(Why does it seem like I'm talking to myself here? :toothy)

Floating-point instructions are generally slow, and almost always considerably slower than integer operations.
So unless you specifically need to use floating point - don't :wink And if you do, then everyone will tell you to use SSE anyway ::)

That example code could be trying to demonstrate a point, but out of context, using (integer) add would be much better (and MMX would be better still :green)
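To make the comparison concrete, here is a minimal C++ sketch of the two approaches, assuming the total fits in 32 bits. The names sum_int and sum_via_double are made up for illustration, and the compiler decides which instructions it actually emits, so this shows the idea rather than the book's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Plain integer accumulation: the C++ counterpart of a loop around `add`.
// Assumes the total fits in 32 bits.
std::int32_t sum_int(const std::vector<std::int32_t>& v) {
    std::int32_t sum = 0;
    for (std::int32_t x : v) {
        sum += x;
    }
    return sum;
}

// Floating-point accumulation, roughly the shape of the fild/fiadd/fistp
// loop: convert each element to floating point, add it to a floating-point
// total, and convert the total back to an integer once at the end.
std::int32_t sum_via_double(const std::vector<std::int32_t>& v) {
    double sum = 0.0;
    for (std::int32_t x : v) {
        sum += static_cast<double>(x);
    }
    // The conversion back is only meaningful if the total fits in 32 bits.
    assert(sum >= INT32_MIN && sum <= INT32_MAX);
    return static_cast<std::int32_t>(sum);
}
```

Both functions return the same value for an input like this; the floating-point version just does a conversion per element on the way there.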
No snowflake in an avalanche feels responsible.

MichaelW

I doubt that the code is a demonstration of speed optimization. What point could it be trying to demonstrate?

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .586
    include timers.asm

    _fiadd PROTO :DWORD,:DWORD
    _add PROTO :DWORD,:DWORD
    __add PROTO :DWORD,:DWORD
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        array dd 1000 dup(1)
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    invoke _fiadd, ADDR array, 1000
    print ustr$(eax), 32
    invoke _add, ADDR array, 1000
    print ustr$(eax), 32
    invoke __add, ADDR array, 1000
    print ustr$(eax), 13,10

    LOOP_COUNT EQU 200000
    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      invoke _fiadd, ADDR array, 1000
    counter_end
    print ustr$(eax)," cycles", 13, 10

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      invoke _add, ADDR array, 1000
    counter_end
    print ustr$(eax)," cycles", 13, 10

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      invoke __add, ADDR array, 1000
    counter_end
    print ustr$(eax)," cycles", 13, 10

    mov   eax, input(13,10,"Press enter to exit...")
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4

_fiadd proc uses esi arr:DWORD,cnt:DWORD
    LOCAL iSum:DWORD
    mov   ecx, cnt
    dec   ecx
    mov   esi, arr
    finit
    fild  DWORD PTR[esi]
  @@:
    add   esi, 4
    fiadd DWORD PTR[esi]
    loop  @B
    fistp iSum
    fwait
    mov   eax, iSum
    ret
_fiadd endp

align 4

_add proc uses esi arr:DWORD,cnt:DWORD
    mov   ecx, cnt
    dec   ecx
    mov   esi, arr
    mov   eax, [esi]        ; start the sum with the first element
  @@:
    add   esi, 4            ; advance to the next element
    add   eax,[esi]
    loop  @B
    ret
_add endp

; The optimization technique used here was borrowed
; from the MASM32 arradd procedure.
align 4
__add proc arr:DWORD,cnt:DWORD
    mov   ecx, cnt
    mov   edx, arr
    add   ecx, ecx
    xor   eax, eax
    add   ecx, ecx
    add   edx, ecx
    neg   ecx
    jmp   @F
  align 16
  @@:
    add   eax,[edx+ecx]
    add   ecx, 4
    jnz   @B
    ret
__add endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Results on my P3

1000 1000 1000
9180 cycles
6317 cycles
2031 cycles
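
The indexing trick in the third routine (__add) may be easier to see in C++. This is only a sketch under the name sum_neg_index, which is made up here: it points at the end of the array and runs a negative index up to zero, so the increment doubles as the loop test. A compiler is free to generate something different from the hand-written loop.

```cpp
#include <cstddef>
#include <cstdint>

// Sum `count` 32-bit integers by indexing backwards from the end of the
// array. The index starts at -count and counts up to zero, so reaching
// zero ends the loop and no separate compare against a limit is needed.
std::int32_t sum_neg_index(const std::int32_t* arr, std::ptrdiff_t count) {
    const std::int32_t* end = arr + count;          // like `add edx, ecx`
    std::int32_t sum = 0;                           // like `xor eax, eax`
    for (std::ptrdiff_t i = -count; i != 0; ++i) {  // like `neg ecx` ... `jnz @B`
        sum += end[i];                              // like `add eax,[edx+ecx]`
    }
    return sum;
}
```

Unlike the assembly routine, which enters the loop body unconditionally, this version also handles a count of zero.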



eschew obfuscation

Eóin

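Not relevant to the above demonstration, but one advantage of the FPU route is that it'll work with 64-bit ints: fild and fistp accept 64-bit memory operands, although fiadd itself only takes 16- and 32-bit ones.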
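
For comparison, 32-bit integer code has to treat a 64-bit value as two halves joined by a carry, which is what an add/adc pair does. A rough C++ sketch of that, with add64_via_32 being a made-up name for illustration:

```cpp
#include <cstdint>

// Add two 64-bit values using only 32-bit operations, mimicking what an
// add/adc instruction pair does on a 32-bit x86 CPU.
std::uint64_t add64_via_32(std::uint64_t a, std::uint64_t b) {
    const std::uint32_t a_lo = static_cast<std::uint32_t>(a);
    const std::uint32_t a_hi = static_cast<std::uint32_t>(a >> 32);
    const std::uint32_t b_lo = static_cast<std::uint32_t>(b);
    const std::uint32_t b_hi = static_cast<std::uint32_t>(b >> 32);

    const std::uint32_t lo = a_lo + b_lo;               // low halves: `add`
    const std::uint32_t carry = (lo < a_lo) ? 1u : 0u;  // wrap-around means carry out
    const std::uint32_t hi = a_hi + b_hi + carry;       // high halves plus carry: `adc`

    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

The result wraps modulo 2^64, the same as a native 64-bit unsigned add.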