News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

String Replace Procedure.

Started by hutch--, April 15, 2005, 06:54:59 AM

Previous topic - Next topic

hutch--

This algo is more or less hot off the press but its testing OK so far. It is a global replace of one substring with another substring in a source string with the results written to a user defined buffer. For replacements of the same length or shorter, memory allocation for the output buffer is simple but when the replacement is longer than the string to replace, a memory allocation strategy is required to handle the longer buffer.

What I have in mind is a word counter for the target word and once this count is done, its easy enough to work out how much extra memory is required for the buffer allocation.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4

replace proc src:DWORD,dst:DWORD,txt1:DWORD,txt2:DWORD

    LOCAL lsrc  :DWORD

    push ebx
    push esi
    push edi

    mov lsrc, len(src)          ; procedure call for src length
    sub lsrc, len(txt1)         ; procedure call for 1st text length

    mov esi, src
    add lsrc, esi               ; set exit condition
    mov ebx, txt1
    add lsrc, 1                 ; adjust to get last character
    mov edi, dst
    xor eax, eax                ; zero EAX
    sub esi, 1
    jmp rpst
  ; ----------------------------
  align 4
  pre:
    add esi, ecx                ; ecx = len of txt1, add it to ESI for next read
  align 4
  rpst:
    add esi, 1
    cmp lsrc, esi               ; test for exit condition
    jle rpout
    mov al, [esi]               ; load byte from source
    cmp al, [ebx]               ; test it agqainst 1st character in txt1
    je test_match
    mov [edi], al               ; write byte to destination
    add edi, 1
    jmp rpst
  ; ----------------------------
  align 4
  test_match:
    mov ecx, -1                 ; clear ECX to use as index
    mov edx, ebx                ; load txt1 address into EDX
  @@:
    add ecx, 1
    mov al, [edx]
    test al, al                 ; if you have got to the zero
    jz change_text              ; replace the text in the destination
    add edx, 1
    cmp [esi+ecx], al           ; keep testing character matches
    je @B
    mov al, [esi]               ; if text does not match
    mov [edi], al               ; write byte at ESI to destination
    add edi, 1
    jmp rpst
  ; ----------------------------
  align 4
  change_text:                  ; write txt2 to location of txt1 in destination
    mov edx, txt2
    sub ecx, 1
  @@:
    mov al, [edx]
    test al, al
    jz pre
    add edx, 1
    mov [edi], al
    add edi, 1
    jmp @B
  ; ----------------------------
  align 4
  rpout:                        ; write any last bytes and terminator
    mov ecx, -1
  @@:
    add ecx, 1
    mov al, [esi+ecx]
    mov [edi+ecx], al
    test al, al
    jnz @B

    pop edi
    pop esi
    pop ebx

    ret

replace endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

hutch--

Here is the matching word or text count algo. The idea is to count the number of instances of the text that will be replaced if the replacement is  longer than the word being replaced so you can calculate the extra memory required above the original source length. It seems to be working OK and handles text at both the beginning and end of the source.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

align 4

wcount proc src:DWORD,txt:DWORD

    push ebx
    push esi
    push edi

    xor edi, edi                ; zero EDI and use as counter
    mov edx, len(src)           ; procedure call for src length
    sub edx, len(txt)           ; procedure call for 1st text length

    mov esi, src                ; source in ESI
    add edx, esi                ; add src to exit position
    add edx, 1                  ; correct to get last word
    mov ebx, txt                ; text to count in EBX
    sub esi, 1
    jmp wcst

  pre:
    add edi, 1                  ; increment word count
  wcst:
    add esi, 1
    cmp edx, esi                ; test for exit condition
    jle wcout
    mov al, [esi]               ; load byte at ESI
    cmp al, [ebx]               ; test against 1st character in EBX
    jne wcst
    mov ecx, -1
  test_word:
    add ecx, 1
    cmp BYTE PTR [ebx+ecx], 0   ; if terminator is reached
    je pre                      ; jump back and increment counter
    mov al, [esi+ecx]
    cmp al, [ebx+ecx]
    jne wcst
    jmp test_word

  wcout:
    mov eax, edi

    pop edi
    pop esi
    pop ebx

    ret

wcount endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Jibz

Regarding wcount, but might apply to the other one as well:

Have you tried loading al with the first byte at ebx before the loop?

The mov ecx, -1 .. should that have been 0 instead? It looks like it might check the first byte twice.

hutch--

Jibz,

Thanks for the comment, changing mov ecx, -1 to xor ecx, ecx workes in this context because the terminator is tested first. When I have laid out the branch before to test the match I tested for zero after the character comparisons and it would fail on a single character.

here is the latest version slightly gutted from the first and with your mod.


align 4

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

wcount proc src:DWORD,txt:DWORD

    push ebx
    push esi
    push edi

    mov edx, len([esp+16])      ; procedure call for src length
    sub edx, len([esp+20])      ; procedure call for 1st text length

    mov eax, -1
    mov esi, [esp+16]           ; source in ESI
    add edx, esi                ; add src to exit position
    xor ebx, ebx                ; clear EBX to prevent stall with BL
    add edx, 1                  ; correct to get last word
    mov edi, [esp+20]           ; text to count in EDI
    sub esi, 1

  pre:
    add eax, 1                  ; increment word count return value
  align 4
  wcst:
    add esi, 1
    cmp edx, esi                ; test for exit condition
    jle wcout
    mov bl, [esi]               ; load byte at ESI
    cmp bl, [edi]               ; test against 1st character in EDI
    jne wcst
    xor ecx, ecx
  align 4
  test_word:
    add ecx, 1
    cmp BYTE PTR [edi+ecx], 0   ; if terminator is reached
    je pre                      ; jump back and increment counter
    mov bl, [esi+ecx]           ; load byte at ESI
    cmp bl, [edi+ecx]           ; test against 1st character in EDI
    jne wcst                    ; exit if mismatch
    jmp test_word               ; else loop back
  wcout:

    pop edi
    pop esi
    pop ebx

    ret 8

wcount endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

The_Grey_Beast

Try using inc or dec instead of add reg, 1 or sub reg, 1.. And you can also recheck your code and move some instructions around to avoid some hazards (I don't know if there are any, but at least it was a suggestion).

hutch--

The_Grey_Beast,

ADD and SUB are faster on late hardware without being particularly slower on older hardware.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Jimg

Hi Hutch-

I like the idea of these routines, so I started testing wcount for my own entertainment.

If the searched for text is zero length, the routine returns a 1.  I'm not sure if this is a problem, but thought I'd bring it to your attention.

I also ran some timing tests to see if I could eek out a cycle or two, again only for my own entertainment.

I could find no difference wether using a stack frame or not.  If anything, using the frame was very slightly faster???

For Athlon users only:    The first align should be an Align 16.  I found an eleven percent execution time difference in the worst spot the routine could land just using an Align 4.

Otherwise I could find no simple way to speed up the code.  Good job!

hutch--

Thanks Jim,

I will have a look at it a bit later before it gets released in the main library. Code alignment varies from box to box in hardware terms so I would not be surprised about variations here. I have seen instances on Intel hardware where a 16 byte aligned proc runs slower than a 4 so its no real joy to get a proc running at top speed on all hardware.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Jimg

QuoteCode alignment varies from box to box in hardware terms so I would not be surprised about variations here. I have seen instances on Intel hardware where a 16 byte aligned proc runs slower than a 4 so its no real joy to get a proc running at top speed on all hardware.
Exactly.  In cases like that, some internal part of the proc needed to be aligned on a different 4 bytes, so a simple align 16 didn't line up the critical part optimally, while the align 4 accidentally lined up the critical part. As I mentioned elsewhere, I've found for the best alignment one has to do an Align 16 followed by inserting from zero to 15 bytes to get the best alignment, but the possible 31 extra bytes may not be really worth it except in heavily used code.  Additional internal alignment as you have done also will usually be required.
And also, it seems to me that the amd chips are more sensitive to alignment than intel chips, based on what I've read in the forum.

hutch--

Its easy enough to align all of the labels in a procedure including its entry to 16 bytes but its not always the fastest I have found. Ideally you set it up so that the alignment code is not executed but just sits beween the last jump and the next label.

Some of these factors sem to be more sensitive to the code executed prior to the procedure than entry or internal alignment and I have seen unaligned code run faster than aligned code in some instances. This is consistent across the two PIVs I use and while I have a late model Sempron, I have never tested it with algorithms as I use it as a scanner and printer box.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php