News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Improved STRLEN

Started by Ratch, August 09, 2005, 03:36:57 AM

Previous topic - Next topic

Ratch

To the Ineffable All,
     I have tweaked my non-MMX/SSE version of STRLEN to speed it up.  It now has 5 instructions within its loop instead of 6 that it had before.  If an 8-bit ASCII count is not needed, a line of code can be removed so only 4 instructions are in the loop.  In addition, it aligns itself so that its primary loop reads the string on a DWORD boundary.  And finally, I believe it is "bullet proof" with regard to reading outside its designated storage limits, as long as the zero terminating character is within the limits.  Enjoy, Ratch


;-------------------------------------------------------------------------------
;*******************************************************************************
; STRLEN:    Calculates the length of a ASCII string terminated by a zero byte.*
;                                                                              *
; Called by: PUSH <address of string>                                          *
;            CALL STRLEN                                                       *
;                                                                              *
; Returns:   EAX=String Length--Length does NOT include zero byte terminator.  *
;                                                                              *
; Notes:     This subroutine conforms with WIN32 conventions regarding         *
;            register preservation and stack balancing.                        *
;                                                                              *
;            This subroutine has a switch called Bit8, which can be set to     *
;            FALSE.  This will speed up the subroutine, but then will only     *
;            count 7-bit ASCII characters.                                     *
;                                                                              *
; Coder:     Ratch                                                             *
;*******************************************************************************
Bit8 EQU TRUE               ;TRUE for 8-bit ASCII count, FALSE for 7-bit count

STRLEN:                     ;it all begins here
  MOV EAX,[ESP+DWORD]       ;address of string

  .WHILE TRUE               ;check DWORD alignment

   TEST AL,DWORD-1          ;is DWORD aligned
   .BREAK .IF ZERO?         ;yes, DWORD aligned

   TEST BYTE PTR [EAX],0FFH ;not aligned, check  for zero byte
   JZ @F                    ;jmp if end of string

   INC EAX                  ;prepare to check next byte
  .ENDW                     ;around the horn

  MOV ECX,080808080H        ;sieve mask

  .REPEAT                   ;searching string ....
    MOV EDX,[EAX]           ;next 4 byte gulp (DWORD)

    IF Bit8                 ;toggle 7-bit or 8-bit ASCII
    AND EDX,07F7F7F7FH      ;mask out bit 8 of byte
    ENDIF

    ADD EAX,DWORD           ;EAX=character pointer
    SUB EDX,01010101H       ;make those zero bytes shine
    AND EDX,ECX             ;sieve out zero bytes
  .UNTIL !ZERO?             ;check the next DWORD

                            ;we now know the current DWORD contains a zero byte
  SUB EAX,DWORD+BYTE        ;compensate for entry additions into loops

  .REPEAT                   ;searching current DWORD ...
    INC EAX                 ;increase byte count if no carry from shift
    ROR EDX,8               ;bring leftmost bit of each byte into carry flag
  .UNTIL CARRY?             ;if carry flag set, byte is zero
@@:
  SUB EAX,[ESP+DWORD]       ;subtract beginning of string to get length

  RET DWORD                 ;all done, and string length in EAX
                            ;return to sender and balance the stack
;-------------------------------------------------------------------------------

Codewarp

Ratch,

Isn't this a rehash of what we already covered this in the szLen thread, ad nausium?  Also back in that thread, I suggested a super optimization that you are almost doing in your routine above. 

The idea is this: search 7-bit ascii, since it's faster than 8-bit.  But when you find a "zero", check it for bit7=1: 1=>resume the search for 8-bit ascii, 0=>you are done.  In other words use 7-bit search as far as you can take it, then ride the rest of the way using 8-bit search as needed.  In most text, the 8-bit part will never be needed, but when required, it covers 8-bit as well--the best of both worlds...

Ratch

Codewarp,

Quote
Isn't this a rehash of what we already covered this in the szLen thread, ad nausium?

     No, it is an incremental improvement.  Sometimes small things become great through several small changes instead one big change. 

     If one has to reexamine the byte for eight bits, that is going to cost time, but perhaps it is the only way to fix one problem.  STRLEN cannot detect 080H.  I am working to find a solution.  Ratch

Ratch

     OK, here is my final version of STRLEN, unless someone finds a bug.  It can now detect the obscure 8-bit byte 080H.  It does this by checking the byte for 080H at the end of the subroutine, and returning to the beginning of the subroutine if that value is detected. If a lot of 080H bytes are present in the string, a performance penalty will be incurred.  Ratch


;-------------------------------------------------------------------------------
;*******************************************************************************
; STRLEN:    Calculates the length of a ASCII string terminated by a zero byte.*
;                                                                              *
; Called by: PUSH <address of string>                                          *
;            CALL STRLEN                                                       *
;                                                                              *
; Returns:   EAX=String Length--Length does NOT include zero byte terminator.  *
;                                                                              *
; Notes:     This subroutine conforms with WIN32 conventions regarding         *
;            register preservation and stack balancing.                        *
;                                                                              *
;            This subroutine has a switch called Bit8, which can be set to     *
;            FALSE.  This will speed up the subroutine, but then will only     *
;            count 7-bit ASCII characters correctly.                           *
;                                                                              *
; Coder:     Ratch                                                             *
;*******************************************************************************
Bit8 EQU TRUE                ;TRUE for 8-bit ASCII count, FALSE for 7-bit count

STRLEN:                      ;it all begins here
  MOV EAX,[ESP+DWORD]        ;address of string
  MOV ECX,080808080H         ;sieve mask

  .WHILE TRUE                ;check DWORD alignment
    TEST AL,DWORD-1          ;is DWORD aligned
    .BREAK .IF ZERO?         ;yes, DWORD aligned

    TEST BYTE PTR [EAX],0FFH ;not aligned, check for zero byte
    JZ @F                    ;jmp if end of string

STL$1:                       ;return here to count 08H byte
    INC EAX                  ;prepare to check next byte
  .ENDW                      ;around the horn

  .REPEAT                    ;searching string ....
    MOV EDX,[EAX]            ;next 4 byte gulp (DWORD)

    IF Bit8                  ;toggle 7-bit or 8-bit ASCII
     AND EDX,07F7F7F7FH      ;mask out bit 8 of byte
    ENDIF

    ADD EAX,DWORD            ;EAX=character pointer
    SUB EDX,01010101H        ;make those zero bytes shine
    AND EDX,ECX              ;sieve out zero bytes
  .UNTIL !ZERO?              ;check the next DWORD

                             ;we now know the current DWORD contains a zero byte
  SUB EAX,DWORD+BYTE         ;compensate for entry additions into loops

  .REPEAT                    ;searching current DWORD ...
    INC EAX                  ;increase byte count if no carry from shift
    ROR EDX,8                ;bring leftmost bit of each byte into carry flag
  .UNTIL CARRY?              ;if carry flag set, byte is zero

  IF Bit8                    ;toggle 7-bit or 8-bit ASCII
   TEST B[EAX],0FFH          ;test if 080H is masquerading as a zero byte
   JS STL$1                  ;return to beginning of program if it is
  ENDIF

@@:
  SUB EAX,[ESP+DWORD]        ;subtract beginning of string to get length

  RET DWORD                  ;all done, and string length in EAX
                             ;return to sender and balance the stack
;-------------------------------------------------------------------------------