News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Reverse string

Started by lostcauz, October 01, 2005, 05:19:45 PM

Previous topic - Next topic

Ratch

BoR0,

Quote
This is my piece of code. Bit old though, but still works

    Not the code you submitted.  I had to correct the loading of ESI and EDI.  The source and destination were "reversed".  After making the corrections, it performed correctly in all cases.  I don't like the way it calls a string lenth subroutine internally, however.  I think the string length should be parameterized, so manual editing is not necessary to select a string length subroutine.  Since it transfers a byte at a time, it is probably slower that the dword movers.  And it would be better if it did not need a second buffer.  Neverless, it works!  Ratch

Ratch

lingo,
     You're spinning your wheels.  Both lostcauz32 and Lingo32 do NOT reverse the 'ABCDEFGHI' string correctly.  They should be fixed before timing is done.  Ratch

lostcauz

Hi Ratch,

I tested my method on every combination from string length of 1 to buffer size. Absolutely no problems, including the string you mentioned. Of course I test for correctness first and thought I was past that point. Did the zip I included work correctly for you? Or the original?

Ratch

lostcauz,
     OK, now that I know what I am looking for, I see that you and lingo compensate for the zero byte by a str1-1 term in SIZEOF.  This causes of string in my test to be truncated by one byte, because I add the zero byte to the string by a data statement on another line.  Therefore SIZEOF does not read that zero byte.  So both you and lingo's subroutines DO work correctly.

     Now about timing.  All of our programs basically use the same algorithm.  That is, pick up a DWORD or BYTE from one end of the string, BSWAP if it is a DWORD, and put it down on the other end of the string.  There is not much room for improvement in that loop where it really counts, if we use that algo.  All the other code is for setup and post loop follow up, which only gets executed once.  So unless you come up with a new kick-ass algorithm, all of our code is going to time about the same, with the variations explained by cache, CPU load, CPU brand, memory, and who knows whatever else. You should see a difference for a BYTE move vs a DWORD move, but that is about it.  So I guess the next challenge is to come up with a better algo and more features.  Like a single buffer maybe?  Ratch

NightWare

Quote from: Ratch on October 04, 2005, 01:38:55 AM
NightWare,
     Your routine still needs work.  Did you test it?  I tried it with 'ABCDEFGHI' and it did NOT reverse the characters.

? i don't understand... i've made test again, and it seams it works fine...

Quote from: Ratch on October 04, 2005, 01:38:55 AM
Also it cycled through the dword loop 9 times when it only needs to cycle once for this string.

? i'd like to know how it's possible...

and my code use a single buffer already...

Ratch

NightWare,
     Look 8 instructions past label2 in the code you submitted.  You should find a instruction cmp esi,[edi+8] .  Both ESI and EDI have source and destination addresses respectively.  You are comparing an address in ESI with the CONTENTS of memory [EDI+8] .  I can't see where that makes any sense.  I checked it with a debugger, and that is what is happening.  Maybe you want the comparison to be with the VALUE of EDI+8.   Are you sure you are using the same code you posted?  Better check.  Ratch

NightWare

Quote from: Ratch on October 05, 2005, 01:34:07 AM
NightWare,
     Look 8 instructions past label2 in the code you submitted.  You should find a instruction cmp esi,[edi+8] .  Both ESI and EDI have source and destination addresses respectively.  You are comparing an address in ESI with the CONTENTS of memory [EDI+8] .  I can't see where that makes any sense.  I checked it with a debugger, and that is what is happening.  Maybe you want the comparison to be with the VALUE of EDI+8.   Are you sure you are using the same code you posted?  Better check.  Ratch

:toothy, it make sense coz this code allow a compare instruction wich can exit with a differencie of 8, that's why it just make 1 loop instead of 9.... you should change of debugger...  :wink , and if you think a bit, the code cant' made the task correctly otherwise...  :dance:

to play with the others in the speed test, here the same code without the included szlen, (but it's still single buffer, coz i don't see why i should downgrade this code with extra lines...):


ALIGN 4
FastReverseString PROC StrAddr:DWORD,StrSize:DWORD
push esi
push edi

mov esi,StrAddr
mov eax,StrSize
lea edi,[esi+eax] ; if you use SIZEOF => lea edi,[esi+eax-1]
cmp eax,2 ; something to reverse ?
jb Label4
cmp eax,9 ; small string ?
jb Label3
Label2: sub edi,4 ; big string algo
mov edx,DWORD PTR [edi]
mov eax,DWORD PTR [esi]
bswap eax
bswap edx
mov DWORD PTR [esi],edx
mov DWORD PTR [edi],eax
add esi,4
cmp esi,[edi+8]
jb Label2
Label3: sub edi,1 ; small string algo
mov dh,BYTE PTR [edi]
mov dl,BYTE PTR [esi]
mov BYTE PTR [esi],dh
mov BYTE PTR [edi],dl
add esi,1
cmp esi,edi
jb Label3
Label4:
pop edi
pop esi
ret
FastReverseString ENDP


NightWare

 :snooty: hmm..., maybe i should explain why i've put the +8, coz it seams you don't understand...
the +8 i've added to the compare instruction is equal to a virtual loop (+4 and -4, so the distance between the left and the right of the string = -8)
in the big string algo, it's made to see if i have to made the loop again, coz i have to keep in mind that i have to pass by the small string algo after (in all cases)

voilà... i hope you understand now... coz i can't be more explicit...  :'(

Ratch

NightWare,

Quote
, it make sense coz this code allow a compare instruction wich can exit with a differencie of 8, that's why it just make 1 loop instead of 9.... you should change of debugger...   

     The question is what is it comparing, the address or the contents of the address.  I use OllyDbg, they don't come much better than that.

Quote
to play with the others in the speed test, here the same code without the included szlen, (but it's still single buffer, coz i don't see why i should downgrade this code with extra lines...):

     A single buffer is a virtue.  That should not be changed.

     OK I corrected your code so it works correctly now.  I enclose the corrected snippet.  Your code is the smallest submitted thus far.  I will have to try to beat its small size.  Ratch


Label2: sub edi,4 ; big string algo
mov edx,DWORD PTR [edi]
mov eax,DWORD PTR [esi]
bswap eax
bswap edx
mov DWORD PTR [esi],edx
mov DWORD PTR [edi],eax
      LEA EAX,[EDI+8]
add esi,4
; cmp esi,[edi+8]
      CMP ESI,EAX
; jb Label2
      JAE Label2
Label3: sub edi,1 ; small string algo

Ratch

To All,
     Here is my revised reverse string subroutine.  It's short, and only one loop.  Beat it do death everyone.  Ratch


;*******************************************************************************
; REVBSTR:   Reverses the bytes of an ASCII string.                            *
;                                                                              *
; Called by: PUSH <address of string>                                          *
;            PUSH <string length>                                              *
;            CALL REVBSTR                                                      *
;                                                                              *
; Returns:   Nothing                                                           *
;                                                                              *
; Notes:     This subroutine conforms with WIN32 conventions regarding         *
;            register preservation and stack balancing.                        *
;                                                                              *
; Coder:     Ratch                                                             *
;*******************************************************************************
REVBS$ STRUC
  RETURN    DWORD ?
RV1$ = $
  STRLEN    DWORD ?
  SADR      DWORD ?
RV2$ = $
REVBS$ ENDS

RVBS$ EQU ESP.REVBS$            ;save some typing

REVBSTR:                        ;it all begins here
PUSH ESI                       ;save register
PUSH EDI                       ;save register
MOV EAX,[RVBS$.STRLEN+2*DWORD] ;string length
MOV ESI,[RVBS$.SADR+2*DWORD]   ;string start address
SHRD EDX,EAX,3                 ;mod 8
PUSH EBX                       ;save register
LEA EDI,[ESI+EAX-DWORD]        ;string end address - DWORD
SHR EDX,32-3                   ;EDX = remainder
SHR EAX,3                      ;divide by 8

.IF EDX > 3
   INC EAX                      ;increase loop count
   XOR EDX,EDX                  ;remainder is set to zero
.ENDIF

.WHILE EAX                     ;8 bytes moved each time through loop
   MOV ECX,[ESI]                ;pick up 4 bytes
   MOV EBX,[EDI]                ;pick up 4 bytes
   BSWAP ECX
   BSWAP EBX
   MOV [EDI],ECX                ;put down 4 bytes
   MOV [ESI],EBX                ;put down 4 bytes
   ADD ESI,DWORD                ;move string end pointer
   SUB EDI,DWORD                ;move string beginning pointer
   DEC EAX                      ;loop counter
.ENDW

.IF EDX                        ;if remainder
   ADD EDI,DWORD-BYTE           ;adjust pointer
   MOV CL,BYTE PTR [ESI]        ;pick up byte
   MOV BL,BYTE PTR [EDI]        ;pick up byte
   MOV BYTE PTR [EDI],CL        ;put down byte
   MOV BYTE PTR [ESI],BL        ;put down byte
.ENDIF

POP ESI                        ;restore register
POP EDI                        ;restore register
POP EBX                        ;restore register
RET RV2$-RV1$                  ;return home

NightWare

ratch,

fisrt of all, you haven't corrected anything ! coz my code only make ONE LOOP ! if you want a proof, make a speed test (mine should be slower than your corrected version in this case... isn't it ?  :naughty: and it's not the case...). if you want another proof, the string CAN'T be reversed correctly, if i made those "extra" loops...  :bdg

to finish with a pleasant word, your revised code is quite interresting  :U , coz it avoid lot of useless work (that's not the case in mine, it made useless work in many case) so i've decided to make something similar to avoid the systematic pass by the small string algo... here the result :


AnotherStringReverse PROC StrAddr:DWORD,StrSize:DWORD
ALIGN 16
push ebx
push esi
push edi

mov esi,StrAddr
mov eax,StrSize
lea edi,[esi+eax] ; if you use SIZEOF => lea edi,[esi+eax-1]
mov ebx,eax
and eax,00000000000000000000000000000111b
shr eax,1 ; to eliminate impair values
and ebx,11111111111111111111111111111000b

; test ebx,ebx ; no dword ? (useless coz the previous and fix the zero flag)
jz Label2

Label1: sub edi,4 ; big string algo
mov edx,DWORD PTR [edi]
mov ecx,DWORD PTR [esi]
bswap ecx
bswap edx
mov DWORD PTR [esi],edx
mov DWORD PTR [edi],ecx
add esi,4
sub ebx,8
jnz Label1

Label2: test eax,eax ; no byte ?
jz Label4

Label3: sub edi,1 ; small string algo
mov dh,BYTE PTR [edi]
mov dl,BYTE PTR [esi]
mov BYTE PTR [esi],dh
mov BYTE PTR [edi],dl
add esi,1
sub eax,1
jnz Label3
Label4:
pop edi
pop esi
pop ebx
ret
AnotherStringReverse ENDP


Ratch

NightWare,

Quote
fisrt of all, you haven't corrected anything ! coz my code only make ONE LOOP ! if you want a proof, make a speed test (mine should be slower than your corrected version in this case... isn't it ?   and it's not the case...). if you want another proof, the string CAN'T be reversed correctly, if i made those "extra" loops... 

     One does not determine whether code is correct by how fast it executes.  Results are what count.  Would some neutral 3rd party test FastReverseString, preferably with a debugger, and settle this matter....Pleeezz?

     Your latest entry, AnotherStringReverse does work correctly.  It has two loops instead of one, and its main loop is one instruction longer compared to REVBSTR.  That is sure to make it just a bit slower.  But it wqrks!  Ratch

Larry Hammick

        mov edx,DWORD PTR [edi]
mov ecx,DWORD PTR [esi]
bswap ecx
bswap edx
mov DWORD PTR [esi],edx
mov DWORD PTR [edi],ecx

could instead be
        mov edx,[edi]
bswap edx
xchg edx,[esi]
bswap edx
mov [edi],edx


lingo

 :lol
Please terminate any high-priority tasks and press ENTER to begin.

Input string ->abcdefghijklmnopqrstuvwxyzabc

RevStr - lostcauz  : 34 clocks. Result: cbazyxwvutsrqponmlkjihgfedcba
RevStr - lingo     : 26 clocks. Result: cbazyxwvutsrqponmlkjihgfedcba
RevStr - Ratch     : 47 clocks. Result: cbazyxwvutsrqponmlkjihgfedcba
RevStr - NightWare : 49 clocks. Result: cbazyxwvutsrqponmlkjihgfedcba

Press ENTER to exit...



[attachment deleted by admin]

Ratch

#29
Larry Hammick,
    Your suggestion would cut on line of code from the loop, BUT the XCHG instruction is known for being "expensive" in terms of CPU time.  That is because a "lock" is implemented whenever a memory location is referenced with XCHG.  For that reason, most programmers shun that instruction.  Also the code causes 4 read/write stalls to occur within the loop.  That is, a register or memory location is changed in one instruction and read or written again in the next instruction.  The CPU has got to wait/stall until the register/memory location is updated before it can proceed to the next instruction. That is why a lot of programmers try to interleave their instructions. Ratch