add$ returns wrong value ?

Started by Slugsnack, May 16, 2009, 01:09:33 PM

jj2007

Quote from: dedndave on May 17, 2009, 07:18:44 PM

Jochen - sounds like a new macro as opposed to changing the old one
hafta maintain backward compatibility, whenever possible


Dave, you are perfectly right. Here it is, a bit bloated but fully compatible, with an additional shorter and faster syntax (I hope you do not consider the fact that it preserves ecx and edx as a serious issue :bg)

Regs preserved:
addx$/szCatStr2:
add2$/szCatStr3: ecx edx

timings:
549 cycles for addx$
628 cycles for add2$, old syntax
252 cycles for add2$, short syntax

Code sizes:
szCatStr2    = 45
szCatStr3    = 91

[attachment deleted by admin]

hutch--

JJ,

Here is a quick tweak of the szCatStr algo. Changed it to use the Agner Fog StrLen algo, which dropped about 200 ms from its timing, then unrolled the main loop by 4, which improved its timing by about 6 or 7%. Tested the difference on this Prescott 3.2 gig PIV between TEST EAX, EAX and ADD EAX, EAX and the add was measurably faster.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 16
szCatStr2 proc lpszSource:DWORD, lpszAdd:DWORD

    invoke StrLen,[esp+4]           ; get source length
    mov edx, [esp+4]                ; load source address
    mov ecx, [esp+8]                ; load append string address
    add edx, eax                    ; set write starting position

    push edi
    or edi, -1

  @@:

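  REPEAT 3                          ; three unrolled copies; the fourth below closes the loop

    add edi, 1
    movzx eax, BYTE PTR [ecx+edi]   ; read byte from append string and zero extend it
    mov [edx+edi], al               ; write the byte to the end of source
    add eax, eax                    ; eax is 0..255, so the sum is zero only for the terminator
    jz @F                           ; terminator copied, exit

  ENDM

    add edi, 1
    movzx eax, BYTE PTR [ecx+edi]   ; read byte from append string and zero extend it
    mov [edx+edi], al               ; write the byte to the end of source
    add eax, eax                    ; same zero test as above
    jnz @B                          ; not the terminator, run another pass of four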

  @@:

    pop edi

    mov eax, [esp+4]                ; return start address of source for macro

    ret 8

szCatStr2 endp
szCatStr2_END:

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
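
For anyone who wants to check the logic without an assembler at hand, here is a rough C model of what the routine above appears to do: find the end of the source, copy the append string four bytes per pass (terminator included), and return the source address. The name sz_cat_str2_model is made up for this sketch and is not part of the masm32 library. Like the original, it assumes the caller has left enough room in the source buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of szCatStr2 (the name is an assumption for this
   sketch). Appends `add` to `src` in place and returns `src`.
   The caller must provide a buffer large enough for both strings. */
char *sz_cat_str2_model(char *src, const char *add)
{
    char *dst = src + strlen(src);   /* write position: the source terminator */
    size_t i = (size_t)-1;           /* mirrors "or edi, -1"; first i++ wraps to 0 */

    for (;;) {
        /* Four copy steps per pass, like REPEAT 3 plus the loop tail.
           Each step stores one byte first, then tests it, so the
           terminating zero is copied before the loop exits. */
        i++; if ((dst[i] = add[i]) == 0) break;
        i++; if ((dst[i] = add[i]) == 0) break;
        i++; if ((dst[i] = add[i]) == 0) break;
        i++; if ((dst[i] = add[i]) == 0) break;
    }
    return src;                      /* the macro expects the source address back */
}
```

With a 32 byte buffer holding "Hello", calling sz_cat_str2_model(buf, ", world") leaves "Hello, world" in buf and returns buf itself.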
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: hutch-- on May 18, 2009, 05:46:51 AM
JJ,

Here is a quick tweak of the szCatStr algo. Changed it to use the Agner Fog StrLen algo, which dropped about 200 ms from its timing, then unrolled the main loop by 4, which improved its timing by about 6 or 7%. Tested the difference on this Prescott 3.2 gig PIV between TEST EAX, EAX and ADD EAX, EAX and the add was measurably faster.


364 down from 555, that's quite a big improvement :U

364 cycles for addx$
626 cycles for add2$, old syntax
253 cycles for add2$, short syntax

As to add eax, eax, I will have to test it on my two machines. May I quote you in certain wars about register destruction etc.?
:wink

Edit: P4 timings, new version:

668 cycles for addx$
894 cycles for add3$, old syntax
862 cycles for add4$, old syntax
279 cycles for add3$, short syntax
243 cycles for add4$, short syntax

add eax, eax is a lot faster on the P4! (and with movzx eax, al there is no risk of ever getting a fake zero :wink)
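
The "fake zero" point can be checked with a small C model. This is only a sketch: the function names are invented here, and the zero flag is modelled as a plain comparison on 32-bit unsigned arithmetic. It suggests that after a zero extension the doubled value is zero exactly when the byte is zero, while a register with garbage in the upper bits (0x80000000, for instance) would double to zero even though it is not zero.

```c
#include <stdint.h>

/* Model of "add eax, eax" followed by jz: 1 when the 32-bit sum wraps to zero. */
int add_sets_zf(uint32_t eax)
{
    return (uint32_t)(eax + eax) == 0;
}

/* Model of "test eax, eax" followed by jz: 1 when eax itself is zero. */
int test_sets_zf(uint32_t eax)
{
    return (eax & eax) == 0;
}

/* Counts the byte values for which the two tests disagree once the byte
   has been zero extended, as movzx does. */
int count_mismatches_after_movzx(void)
{
    int mismatches = 0;
    for (uint32_t b = 0; b <= 0xFF; b++) {
        uint32_t eax = b;            /* upper 24 bits are clear */
        if (add_sets_zf(eax) != test_sets_zf(eax))
            mismatches++;
    }
    return mismatches;
}
```

In this model count_mismatches_after_movzx() returns 0, while add_sets_zf(0x80000000u) reports a zero that test_sets_zf does not. The add also destroys eax, which does no harm in the loop because the byte has already been stored.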

[attachment deleted by admin]

hutch--

 :bg

> As to add eax, eax, I will have to test it on my two machines. May I quote you in certain wars about register destruction etc.?

As long as you quote the following MOVZX.  :P

dedndave

Hutch - i thought "OR eax,eax" was in vogue, nowadays
are you going to make an update to the masm library with Agner's algo ?

hutch--

Dave,

In the context of the algo, a simple blind ADD does less work than a discarded AND so at least on the PIVs I run it is usually faster if it is within an intensive loop. The code I posted will become the replacement for the szCatStr algo in the masm32 library.

Mark Jones

Old drive died, corrupted sectors in the registry... very ugly. :(

Finally installed WinXP Pro x64 on a new drive... :bg

AMD Athlon x2 4000+ / WinXP Pro x64
376 cycles for addx$
722 cycles for add3$, old syntax
280 cycles for add3$, short syntax
262 cycles for add4$, short syntax

Code sizes:
szCatStr2    = 87
szCatStr3    = 95
szCatStr4    = 112
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08