Fast string append algo in PowerBASIC.

Started by hutch--, December 20, 2009, 05:43:42 PM

Previous topic - Next topic

hutch--

Basic is very flexible in terms of string manipulation but fundamental to the deisgn of the language specification is the method that it uses to append data onto the end of a string which works fine on small strings but becomes progressively slower as the string gets longer. Its easy enough to get around it at low level by allocating a big enough buffer to start with and writing to the end of it but when you are bashing around large amounts of high level code this can be a bit of a hassle.

Here is an algo that maintains most of the ease of basic string handling but with enough adjustment so you can trade off initial buffer size, realloc size and speed.


#IF 0  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    ---------------------------------------

    buffer$ = ""
    ' buffer$ = nul$(1024*1024*128)

    addin$ = "12345678901234567890123456789012345678901234567890"+_
             "12345678901234567890123456789012345678901234567890" ' 100 characters

    cloc = 0    ' start at beginning of buffer
    lpcn = 0    ' zero the loop counter

    Do
      cloc = repappend(buffer$,addon$,cloc,1024*1024)
      ! add lpcn, 1
    Loop while lpcn < 1000000               ' 1 million appends at 100 bytes each

    buffer$ = left$(buffer$,cloc)           ' trim off extra memory

    ---------------------------------------

    NOTES

      The algorithm addresses a fundamental problem with sequential string appends in
      basic, the reallocation and copy of memory each time the memory is reallocated.

      This algorithm is at it fastest when the original buffer to receive the appended
      data is large enough to hold it all without any reallocation of memory. It becomes
      progressively slower as the buffer size is decreased as more reallocations must be
      done. This is adjustable with the algorithm in two (2) ways.

      1. The larger the initial buffer is, the less times the memory is reallocated
      2. The larger the extra buffer size is the less times the memory is reallocated

      When the append string operation is completed, trim off any excess memory using
      the last return value from the function using the LEFT$() standard basic string
      function.

#ENDIF ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

FUNCTION repappend(src$,addit$,ByVal cloc as DWORD,ByVal iextra as DWORD) as DWORD

    #REGISTER NONE

  ' ------------------------------------------------------
  ' src$        the main buffer to append extra data to.
  ' addit$      the byte data to append to the main buffer
  ' cloc        current location pointer
  ' iextra      size of increase to buffer, min 1 meg
  ' ------------------------------------------------------

    LOCAL pstr as DWORD

    pstr = StrPtr(src$)
    ! mov esi, pstr

    pstr = StrPtr(addit$)
    ! mov edi, pstr

    ! mov ebx, [esi-4]          ' stored buffer length
    ! mov eax, cloc
    ! add eax, [edi-4]          ' write location counter AND addit$ length to EAX

    ! cmp ebx, eax              ' test if combined length greater than source buffer size
    ! jg nxt1

    ! cmp iextra, 1024*1024     ' if realloc, check buffer size
    ! jge nxt0
    ! mov iextra, 1024*1024     ' set a minimum extra buffer size

  nxt0:
      extra$ = nul$(iextra)     ' allocate a buffer of "iextra" size
      src$ = src$ + extra$      ' realloc source string size
      pstr = StrPtr(src$)
    ! mov esi, pstr

  nxt1:
    ! mov ecx, 1
    ! or eax, -1
    ! add esi, cloc             ' add current location pointer for next copy position

  ' -----------
  ' unroll by 2
  ' -----------
  lbl0:
    ! add eax, ecx
    ! movzx edx, BYTE PTR [edi+eax]
    ! mov [esi+eax], dl
    ! test edx, edx
    ! jz lbl1                   ' exit on written terminator

    ! add eax, ecx
    ! movzx ebx, BYTE PTR [edi+eax]
    ! mov [esi+eax], bl
    ! test ebx, ebx
    ! jnz lbl0                  ' exit on written terminator

  lbl1:
    ! add eax, cloc             ' location
    ! mov FUNCTION, eax         ' return updated offset pointer

END FUNCTION

' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php