To the Ineffable All,
I have tweaked my non-MMX/SSE version of STRLEN to speed it up. It now has 5 instructions within its loop instead of 6 that it had before. If an 8-bit ASCII count is not needed, a line of code can be removed so only 4 instructions are in the loop. In addition, it aligns itself so that its primary loop reads the string on a DWORD boundary. And finally, I believe it is "bullet proof" with regard to reading outside its designated storage limits, as long as the zero terminating character is within the limits. Enjoy, Ratch
;-------------------------------------------------------------------------------
;*******************************************************************************
; STRLEN: Calculates the length of a ASCII string terminated by a zero byte.*
; *
; Called by: PUSH <address of string> *
; CALL STRLEN *
; *
; Returns: EAX=String Length--Length does NOT include zero byte terminator. *
; *
; Notes: This subroutine conforms with WIN32 conventions regarding *
; register preservation and stack balancing. *
; *
; This subroutine has a switch called Bit8, which can be set to *
; FALSE. This will speed up the subroutine, but then will only *
; count 7-bit ASCII characters. *
; *
; Coder: Ratch *
;*******************************************************************************
Bit8 EQU TRUE ;TRUE for 8-bit ASCII count, FALSE for 7-bit count
STRLEN: ;it all begins here
MOV EAX,[ESP+DWORD] ;address of string
.WHILE TRUE ;check DWORD alignment
TEST AL,DWORD-1 ;is DWORD aligned
.BREAK .IF ZERO? ;yes, DWORD aligned
TEST BYTE PTR [EAX],0FFH ;not aligned, check for zero byte
JZ @F ;jmp if end of string
INC EAX ;prepare to check next byte
.ENDW ;around the horn
MOV ECX,080808080H ;sieve mask
.REPEAT ;searching string ....
MOV EDX,[EAX] ;next 4 byte gulp (DWORD)
IF Bit8 ;toggle 7-bit or 8-bit ASCII
AND EDX,07F7F7F7FH ;mask out bit 8 of byte
ENDIF
ADD EAX,DWORD ;EAX=character pointer
SUB EDX,01010101H ;make those zero bytes shine
AND EDX,ECX ;sieve out zero bytes
.UNTIL !ZERO? ;check the next DWORD
;we now know the current DWORD contains a zero byte
SUB EAX,DWORD+BYTE ;compensate for entry additions into loops
.REPEAT ;searching current DWORD ...
INC EAX ;increase byte count if no carry from shift
ROR EDX,8 ;bring leftmost bit of each byte into carry flag
.UNTIL CARRY? ;if carry flag set, byte is zero
@@:
SUB EAX,[ESP+DWORD] ;subtract beginning of string to get length
RET DWORD ;all done, and string length in EAX
;return to sender and balance the stack
;-------------------------------------------------------------------------------
Ratch,
Isn't this a rehash of what we already covered this in the szLen thread, ad nausium? Also back in that thread, I suggested a super optimization that you are almost doing in your routine above.
The idea is this: search 7-bit ascii, since it's faster than 8-bit. But when you find a "zero", check it for bit7=1: 1=>resume the search for 8-bit ascii, 0=>you are done. In other words use 7-bit search as far as you can take it, then ride the rest of the way using 8-bit search as needed. In most text, the 8-bit part will never be needed, but when required, it covers 8-bit as well--the best of both worlds...
Codewarp,
Quote
Isn't this a rehash of what we already covered this in the szLen thread, ad nausium?
No, it is an incremental improvement. Sometimes small things become great through several small changes instead one big change.
If one has to reexamine the byte for eight bits, that is going to cost time, but perhaps it is the only way to fix one problem. STRLEN cannot detect 080H. I am working to find a solution. Ratch
OK, here is my final version of STRLEN, unless someone finds a bug. It can now detect the obscure 8-bit byte 080H. It does this by checking the byte for 080H at the end of the subroutine, and returning to the beginning of the subroutine if that value is detected. If a lot of 080H bytes are present in the string, a performance penalty will be incurred. Ratch
;-------------------------------------------------------------------------------
;*******************************************************************************
; STRLEN: Calculates the length of a ASCII string terminated by a zero byte.*
; *
; Called by: PUSH <address of string> *
; CALL STRLEN *
; *
; Returns: EAX=String Length--Length does NOT include zero byte terminator. *
; *
; Notes: This subroutine conforms with WIN32 conventions regarding *
; register preservation and stack balancing. *
; *
; This subroutine has a switch called Bit8, which can be set to *
; FALSE. This will speed up the subroutine, but then will only *
; count 7-bit ASCII characters correctly. *
; *
; Coder: Ratch *
;*******************************************************************************
Bit8 EQU TRUE ;TRUE for 8-bit ASCII count, FALSE for 7-bit count
STRLEN: ;it all begins here
MOV EAX,[ESP+DWORD] ;address of string
MOV ECX,080808080H ;sieve mask
.WHILE TRUE ;check DWORD alignment
TEST AL,DWORD-1 ;is DWORD aligned
.BREAK .IF ZERO? ;yes, DWORD aligned
TEST BYTE PTR [EAX],0FFH ;not aligned, check for zero byte
JZ @F ;jmp if end of string
STL$1: ;return here to count 08H byte
INC EAX ;prepare to check next byte
.ENDW ;around the horn
.REPEAT ;searching string ....
MOV EDX,[EAX] ;next 4 byte gulp (DWORD)
IF Bit8 ;toggle 7-bit or 8-bit ASCII
AND EDX,07F7F7F7FH ;mask out bit 8 of byte
ENDIF
ADD EAX,DWORD ;EAX=character pointer
SUB EDX,01010101H ;make those zero bytes shine
AND EDX,ECX ;sieve out zero bytes
.UNTIL !ZERO? ;check the next DWORD
;we now know the current DWORD contains a zero byte
SUB EAX,DWORD+BYTE ;compensate for entry additions into loops
.REPEAT ;searching current DWORD ...
INC EAX ;increase byte count if no carry from shift
ROR EDX,8 ;bring leftmost bit of each byte into carry flag
.UNTIL CARRY? ;if carry flag set, byte is zero
IF Bit8 ;toggle 7-bit or 8-bit ASCII
TEST B[EAX],0FFH ;test if 080H is masquerading as a zero byte
JS STL$1 ;return to beginning of program if it is
ENDIF
@@:
SUB EAX,[ESP+DWORD] ;subtract beginning of string to get length
RET DWORD ;all done, and string length in EAX
;return to sender and balance the stack
;-------------------------------------------------------------------------------