Surely according to the definition, EAX at EIP should be the pointer 0x403020, ie. source buffer ? I noticed this a long time ago but never pointed it out it seems to return lpString instead of lpBuffer as the documentation appears to describe
please excuse my poor attempt at hungarian notation.. :dance:
(http://img200.imageshack.us/img200/8254/config.jpg)
The error is actually in szCatStr, which returns the address of the second string.
But szCatStr says it has no return values, so the error is in Add$ assuming szCatStr returns the address of the first string.
Even though the documentation for szCatStr says it has no return values, the comments imply it tries to return the address of the first string, but it is a coding error-
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
szCatStr proc lpszSource:DWORD, lpszAdd:DWORD
push edi
invoke szLen,[esp+8] ; get source length
mov edi, [esp+8]
mov ecx, [esp+12]
add edi, eax ; set write starting position
xor edx, edx ; zero index
@@:
movzx eax, BYTE PTR [ecx+edx] ; write append string to end of source
mov [edi+edx], al
add edx, 1
test eax, eax ; exit when terminator is written
jne @B
pop edi <------------------------- notice this, the source is no longer in [esp+8]
mov eax, [esp+8] ; return start address of source
ret 8
szCatStr endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
Whoever did this outsmarted themselves by trying to use PROLOGUE:NONE, but lost track of the stack.
So whether the error is in Add$ because it assumed a return from a proc that says it doesn't have a return, or in szCatStr that tried to document it's coding error away, is a debatable question.
we should be able to fix szCatStr.asm without causing problems - is that a good assumption ?
i don't see any reprocussions from fixing it - maybe i am missing something
It depends upon if you want to fix the documentation so it says returns the start of the second string, or remove the "mov eax, [esp+8] " instruction so it truly doesn't return anything and thereby matches the documetation and saves a few cycles. Hopefully there isn't too many programs that rely on the fact that it now returns the start of the second string.
I would think it would be better to just load the correct address in the Add$ macro.
Quote from: Jimg on May 16, 2009, 04:48:45 PM
It depends upon if you want to fix the documentation so it says returns the start of the second string
Even more useful would be to return the address of the zero delimiter after the added string, so that you can use it directly for repeated add$ calls.
I'm going to agree. I have my own printx and printxa macros I use for this, and I couldn't live without them now.
i moved the pop edi to after the mov eax,[esp+8]
the documentation is still valid - i.e. do not assume it returns anything
(seeing as how the return value is undefined per the docs)
you guys are right, of course, as to which is more useful, but this fixes the add$
i suppose i could make both right, but then i would have compatibility issues with the rest of the world
i am a n00b - trying to update the library without breaking anything :red
It looks like the problem first appeared somewhere between version 9r and version 9 sp2. I never noticed because I apparently never tried to use the return value or the add$ macro since the problem was introduced. The fix is fairly involved, but doable ;)
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.data
str0 db "my other ", 0, 20 dup(0)
str1 db "brother darryl",0
db 10 dup(0)
str2 db "my other ", 0, 20 dup(0)
str3 db "brother darryl",0
db 10 dup(0)
str4 db "my other ", 0, 20 dup(0)
str5 db "brother darryl",0
db 10 dup(0)
str6 db "my other ", 0, 20 dup(0)
str7 db "brother darryl",0
db 10 dup(0)
str8 db "my other ", 0, 20 dup(0)
str9 db "brother darryl",0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
_add$ MACRO lpSource,lpAppend
invoke _szCatStr,tstarg(lpSource),reparg(lpAppend)
EXITM <eax>
ENDM
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
_szCatStr proc lpszSource:DWORD, lpszAdd:DWORD
push edi
invoke szLen,[esp+8] ; get source length
mov edi, [esp+8]
mov ecx, [esp+12]
add edi, eax ; set write starting position
xor edx, edx ; zero index
@@:
movzx eax, BYTE PTR [ecx+edx] ; write append string to end of source
mov [edi+edx], al
add edx, 1
test eax, eax ; exit when terminator is written
jne @B
pop edi
mov eax, [esp+4] ; return start address of source
ret 8
_szCatStr endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
szCatStrOld proc lpszSource:DWORD, lpszAdd:DWORD
push edi
invoke szLen,lpszSource ; get source length
mov edi, lpszSource
mov ecx, lpszAdd
add edi, eax ; set write starting position
xor edx, edx ; zero index
xor eax, eax ; avoid stall with following AL reads and writes
@@:
mov al, [ecx+edx] ; write append string to end of source
mov [edi+edx], al
add edx, 1
test al, al ; exit when terminator is written
jne @B
pop edi
mov eax, lpszSource ; return start address of source
ret
szCatStrOld endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
print str$(len(ADDR str0)),9
print str$(len(ADDR str1)),13,10
invoke szCatStr, ADDR str0, ADDR str1
print eax,13,10
print str$(len(ADDR str0)),13,10,13,10
print str$(len(ADDR str2)),9
print str$(len(ADDR str3)),13,10
print add$(ADDR str2, ADDR str3),13,10
print str$(len(ADDR str2)),13,10,13,10
print str$(len(ADDR str4)),9
print str$(len(ADDR str5)),13,10
invoke _szCatStr, ADDR str4, ADDR str5
print eax,13,10
print str$(len(ADDR str4)),13,10,13,10
print str$(len(ADDR str6)),9
print str$(len(ADDR str7)),13,10
print _add$(ADDR str6, ADDR str7),13,10
print str$(len(ADDR str6)),13,10,13,10
print str$(len(ADDR str8)),9
print str$(len(ADDR str9)),13,10
invoke szCatStrOld, ADDR str8, ADDR str9
print eax,13,10
print str$(len(ADDR str8)),13,10,13,10
inkey "Press any key to exit..."
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
9 14
brother darryl
23
9 14
brother darryl
23
9 14
my other brother darryl
23
9 14
my other brother darryl
23
9 14
my other brother darryl
23
And BTW, szCatStr has been documented as not returning a value since at least 2003.
how many brothers ya got, there, Michael ?
"my other brother Darryl" is from an often-repeated line by William Sanderson's character Larry, from the series Newhart (http://en.wikipedia.org/wiki/Newhart), that for some reason stuck in my mind.
oh yah - i remember those guys - lol
totally type-casted that guy, too - i can't watch him in anything else without looking for his 2 brothers
The return value was added later so the macro add$() could be used in the function position. It is evidently in the wrong place and should have been written to EAX before EDI was popped which changes ESP. The other alternative is to write the stack address directly to EAX without the correction done for the first push but the first is tidier.
This minor rewrite seems to be testing up OK. Unless there is something major wrong with it, this is the replacement.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
szCatStr2 PROTO :DWORD, :DWORD
; ********************************
; BASIC string function emulation
; ********************************
addx$ MACRO lpSource,lpAppend
invoke szCatStr2,tstarg(lpSource),reparg(lpAppend)
EXITM <eax>
ENDM
.data?
value dd ?
.data
buffer db "text ",0," "
txt1 db "text 1 ",0
txt2 db "text 2",0
tstbuf db "test ",0," "
tst1 db "test 1 ",0
tst2 db "test 2 ",0
tst3 db "test 3 ",13,10,0
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
print str$(OFFSET buffer)," original address",13,10,13,10
invoke szCatStr2,ADDR buffer,ADDR txt1
print str$(eax)," 1st return address",13,10
print ADDR buffer,13,10
invoke szCatStr2,ADDR buffer,ADDR txt2
print str$(eax)," 2nd return address",13,10
print ADDR buffer,13,10
print "test the macro",13,10
print addx$(addx$(addx$(ADDR tstbuf,ADDR tst1),ADDR tst2),ADDR tst3)
ret
main endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
szCatStr2 proc lpszSource:DWORD, lpszAdd:DWORD
invoke szLen,[esp+4] ; get source length
mov edx, [esp+4] ; load source address
mov ecx, [esp+8] ; load append string address
add edx, eax ; set write starting position
push edi
or edi, -1
@@:
add edi, 1
movzx eax, BYTE PTR [ecx+edi] ; read byte from append string and zero extend it
mov [edx+edi], al ; write append string to end of source
add eax, eax
jnz @B
pop edi
mov eax, [esp+4] ; return start address of source for macro
ret 8
szCatStr2 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
PS: Thanks everyone for recognising the problem and identifying the error.
Nonetheless, the option with the end of the buffer seems interesting:
516 cycles for addx$
812 cycles for add2$ <- bad technology
263 cycles for add3$ <- good one
[attachment deleted by admin]
yes - nice catch Jim
Jochen - sounds like a new macro as opposed to changing the old one
hafta maintain backward compatibility, whenever possible
(unlike some company that starts with an "M" (http://l.yimg.com/us.yimg.com/i/mesg/emoticons7/49.gif))
Quote from: dedndave on May 17, 2009, 07:18:44 PM
Jochen - sounds like a new macro as opposed to changing the old one
hafta maintain backward compatibility, whenever possible
Dave, you are perfectly right. Here it is, a bit bloated but fully compatible, with an additional shorter and faster syntax (I hope you do not consider the fact that it preserves ecx and edx as a serious issue :bg)
Regs preserved:
addx$/szCatStr2:
add2$/szCatStr3: ecx edx
timings:
549 cycles for addx$
628 cycles for add2$, old syntax
252 cycles for add2$, short syntax
Code sizes:
szCatStr2 = 45
szCatStr3 = 91
[attachment deleted by admin]
JJ,
Here is a quick tweak of the szCatStr algo. Changed it to using the Agner Fog StrLen algo which dropped about 200 ms from its timing then unrolled the main loop by 4 which improved its timing by about 6 or 7%. Tested the difference on this Prescott 3.2 gig PIV between TEST EAX, EAX and ADD EAX, EAX and the add was measurably faster.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
szCatStr2 proc lpszSource:DWORD, lpszAdd:DWORD
invoke StrLen,[esp+4] ; get source length
mov edx, [esp+4] ; load source address
mov ecx, [esp+8] ; load append string address
add edx, eax ; set write starting position
push edi
or edi, -1
@@:
REPEAT 3
add edi, 1
movzx eax, BYTE PTR [ecx+edi] ; read byte from append string and zero extend it
mov [edx+edi], al ; write append string to end of source
add eax, eax
jz @F
ENDM
add edi, 1
movzx eax, BYTE PTR [ecx+edi] ; read byte from append string and zero extend it
mov [edx+edi], al ; write append string to end of source
add eax, eax
jnz @B
@@:
pop edi
mov eax, [esp+4] ; return start address of source for macro
ret 8
szCatStr2 endp
szCatStr2_END:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
Quote from: hutch-- on May 18, 2009, 05:46:51 AM
JJ,
Here is a quick tweak of the szCatStr algo. Changed it to using the Agner Fog StrLen algo which dropped about 200 ms from its timing then unrolled the main loop by 4 which improved its timing by about 6 or 7%. Tested the difference on this Prescott 3.2 gig PIV between TEST EAX, EAX and ADD EAX, EAX and the add was measurably faster.
364 down from 555, that's quite a big improvement :U
364 cycles for addx$
626 cycles for add2$, old syntax
253 cycles for add2$, short syntax
As to add eax, eax, I will have to test it on my two machines. May I quote you in certain wars about register destruction etc.?
:wink
Edit: P4 timings, new version:
668 cycles for addx$
894 cycles for add3$, old syntax
862 cycles for add4$, old syntax
279 cycles for add3$, short syntax
243 cycles for add4$, short syntax
add eax, eax is a lot faster on the P4! (and with movzx eax, al there is no risk of getting ever a fake zero :wink)
[attachment deleted by admin]
:bg
> As to add eax, eax, I will have to test it on my two machines. May I quote you in certain wars about register destruction etc.?
As long as you quote the following MOVZX. :P
Hutch - i thought "OR eax,eax" was in vogue, nowdays
are you going to make an update to the masm library with Agner's algo ?
Dave,
In the context of the algo, a simple blind ADD does less work that a discarded AND so at least on the PIVs I run it is usually faster if it is within an intensive loop. The code I posted will become the replacement for the szCatStr algo in the masm32 library.
Old drive died, corrupted sectors in the registry... very ugly. :(
Finally installed WinXP Pro x64 on a new drive... :bg
AMD Athlon x2 4000+ / WinXP Pro x64
376 cycles for addx$
722 cycles for add3$, old syntax
280 cycles for add3$, short syntax
262 cycles for add4$, short syntax
Code sizes:
szCatStr2 = 87
szCatStr3 = 95
szCatStr4 = 112