The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: brethren on November 29, 2010, 03:30:39 PM

Title: Optimizing string procedures
Post by: brethren on November 29, 2010, 03:30:39 PM
I'm looking for a fast algorithm for searching a string for a single character, does anyone have any examples?

for example when searching a string for the null terminator you can search the string in dword increments and use an algorithm like this one.
After running the algo y contains 0 until a null is encountered (nulls are converted to 80h, perfect for using bsf)

    x= DWORD to examine
    y=(x & 0x7F7F7F7F) + 0x7F7F7F7F;
    y=~(y | x | 0x7F7F7F7F);


i'm looking for something similar but for arbitrary characters. Or some source examples for a fast StrChr routine would be great
Title: Re: Optimizing string procedures
Post by: dedndave on November 29, 2010, 07:08:29 PM
not sure that algo is all that fast  ;)

in any case, you are now looking for a variable, rather than a constant
and, you have to look for null, too - so you don't go past the end of the string
Title: Re: Optimizing string procedures
Post by: dedndave on November 29, 2010, 07:19:27 PM
looking again - i guess that algo does have some merit   :P

well - use the same algo
if you are looking for the first space, XOR EAX,20202020h - then all spaces are nulls  :bg
Title: Re: Optimizing string procedures
Post by: brethren on November 29, 2010, 07:44:59 PM
thanks dave :wink

Title: Re: Optimizing string procedures
Post by: brethren on November 29, 2010, 07:52:01 PM
yes using a 32 bit mask is the way to go

<SNIP! LOGIC ERRORS>
its just a quickly thought out algo are there any errors you can see?
Title: Re: Optimizing string procedures
Post by: Tight_Coder_Ex on November 29, 2010, 07:59:17 PM
Eventhough SCAS B or D use 15 & 13 cycles respectively combined with REPNZ, would these be a viable alternative as pointer adjustment, character comparision and counter evaluation are all done with one instruction.  This assumes of course the purpose of the algo is to return a pointer.
Title: Re: Optimizing string procedures
Post by: brethren on November 29, 2010, 08:12:01 PM
i want the algorith to work just like C's strchr (just faster :P) so a pointer is what i need for a return value or NULL pointer if the char is not found. using the string instructions you'd need to scan a byte at a time (i think?) and thats what slows these type of algos down
Title: Re: Optimizing string procedures
Post by: jj2007 on November 29, 2010, 09:03:13 PM
By far the fastest option is SSE2. Search the forum for pcmpeqb. Below the innermost loop of a stringlen algo.

@@: add edx, 16 ; next chunk, please
pcmpeqb xmm0, [edx] ; any nullbytes in there?
pmovmskb eax, xmm0 ; show them in eax
test eax, eax
jz @B

Title: Re: Optimizing string procedures
Post by: brethren on November 30, 2010, 12:40:58 AM
thanks for the hint jochen but i would rather stick with plain vanilla asm. that sse makes my eyes bleed :eek
Title: Re: Optimizing string procedures
Post by: jj2007 on November 30, 2010, 01:20:21 AM
Quote from: brethren on November 30, 2010, 12:40:58 AM
thanks for the hint jochen but i would rather stick with plain vanilla asm. that sse makes my eyes bleed :eek

I also needed some time to get used to it :bg
Title: Re: Optimizing string procedures
Post by: dedndave on November 30, 2010, 01:22:26 AM
you practically have it written in your original post   :bg
write that algo to test for null
copy/paste and stick XOR in front of it to test for the search char
Title: Re: Optimizing string procedures
Post by: hutch-- on November 30, 2010, 01:43:28 AM
Somewhere I have seen a variation of Agner Fog's StrLen algo that could be used to find characters other than ascii 0.
Title: Re: Optimizing string procedures
Post by: MichaelW on November 30, 2010, 08:01:08 AM
Quote from: hutch-- on November 30, 2010, 01:43:28 AM
Somewhere I have seen a variation of Agner Fog's StrLen algo that could be used to find characters other than ascii 0.

I didn't think to do a search of the forum before I did this quick modification of Agner Fog's code. I had no time to do a through test (and after I posted I tested a no-match and discovered that the code faults, but no time to fix it).

;==============================================================================
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
;==============================================================================
    .data
      str1  db "my other brother darryl x",0
      char  dd "x"
    .code
;==============================================================================

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

strchr PROC pstring:DWORD,dwchar:DWORD
    push ebx                    ; ebx must be saved

    push esi
    ;--------------------------------------
    ; Replicate search character into esi.
    ;--------------------------------------
    mov eax, 01010101h
    mov ecx, [esp+16]
    mul ecx
    mov esi, eax

    mov ecx, [esp+12]           ; get pointer to string
    mov eax, ecx                ; copy pointer
    and ecx, 3                  ; lower 2 bits, check alignment
    jz L2                       ; s is aligned by 4. Go to loop
    and eax, -4                 ; align pointer by 4
    mov ebx, [eax]              ; read from preceding boundary
    shl ecx, 3                  ; *8 = displacement in bits
    mov edx, -1
    shl edx, cl                 ; make byte mask
    not edx                     ; mask = 0FFH for false bytes
    or ebx, edx                 ; mask out false bytes
    ; check first four bytes for zero

    xor ebx, esi

    lea ecx, [ebx-01010101H]    ; subtract 1 from each byte
    not ebx                     ; invert all bytes
    and ecx, ebx                ; and these two
    and ecx, 80808080H          ; test all sign bits
    jnz L3                      ; zero-byte found
    ; Main loop, read 4 bytes aligned
    L1: add eax, 4              ; increment pointer
    L2: mov ebx, [eax]          ; read 4 bytes of string

    xor ebx, esi

    lea ecx, [ebx-01010101H]    ; subtract 1 from each byte
    not ebx                     ; invert all bytes
    and ecx, ebx                ; and these two
    and ecx, 80808080H          ; test all sign bits
    jz L1                       ; no zero bytes, continue loop
    L3: bsf ecx, ecx            ; find right-most 1-bit
    shr ecx, 3                  ; divide by 8 = byte index

    ;----------------------------------
    ; Return pointer instead of index.
    ;----------------------------------
    ;sub eax, [esp+12]          ; subtract start address

    add eax, ecx                ; add index to byte

    pop esi

    pop ebx                     ; restore ebx
    ret 8                       ; return value in eax
strchr ENDP

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

;==============================================================================
start:
;==============================================================================
    print str$(ADDR str1),13,10

    invoke crt_strchr, ADDR str1, char
    ;movzx edx, BYTE PTR [eax]
    ;print str$(edx),13,10
    print str$(eax),13,10

    invoke strchr, ADDR str1, char
    ;movzx edx, BYTE PTR [eax]
    ;print str$(edx),13,10
    print str$(eax),13,10,13,10

    invoke Sleep, 3000

    counter_begin 1000, HIGH_PRIORITY_CLASS
    counter_end
    print str$(eax),13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      invoke crt_strchr, ADDR str1, char
    counter_end
    print str$(eax),13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      invoke strchr, ADDR str1, char
    counter_end
    print str$(eax),13,10

    inkey "Press any key to exit..."
    exit
;==============================================================================
end start


Running on a P3:

0
87
58

Title: Re: Optimizing string procedures
Post by: brethren on November 30, 2010, 01:51:19 PM
thanks everybody :U

i'm thinking of use this algo to search for the character. its benefit is that i won't have to test individual character in the search loop
.data
buf BYTE "asm", 0

.code
main PROC
mov ebx, 73737373h ;mask to search for 's'
mov ecx, DWORD PTR [buf] ;move 4 chars into ECX

mov edi, 7EFEFEFFh
xor ecx, ebx
add edi, ecx
xor ecx, 0FFFFFFFFh
xor ecx, edi
and ecx, 81010100h
;ECX=0 if char is not found, not-zero if found (use jz, jnz)

exit
main ENDP
END main


ps hutch, thanks i'll have a look for the agner fog modified strlen algorithm
Title: Re: Optimizing string procedures
Post by: brethren on November 30, 2010, 03:02:41 PM
Quote from: dedndave on November 30, 2010, 01:22:26 AM
you practically have it written in your original post   :bg
write that algo to test for null
copy/paste and stick XOR in front of it to test for the search char

thank you dave :thumbu

i thought that algo was just for testing for null but after reading your post i realise it is a general purpose algo for finding any character
you just do what you said use xor with a mask for the character you want to find(if you don't use a mask it'll find the null). now i can use that same algo to find the char and the null :8)
Title: Re: Optimizing string procedures
Post by: dedndave on November 30, 2010, 03:32:37 PM
you could add another section to test for opposite case, too   :U
Title: Re: Optimizing string procedures
Post by: dedndave on March 23, 2011, 12:39:24 AM
later....
here is a variation on the theme
it may not be the fastest way, but it is ~60% faster than a rep scasb version
it uses all registers, preserves none, and passes the address in a register
it could easily be made into an INVOKE'able function, though
in my application, Form Feed is used to clear screen, so it was included
StrScan PROC

;scan a line of text and return the number of characters preceeding the first "special" character
;special characters include 0,9,10,12,13
;
;Call With: ESI = offset of string
;
;  Returns: ECX = bytes preceeding special char
;           EDX = special char identifier
;                 0 (12) form feed
;                 4 (0) null terminator
;                 5 (9) tab
;                 6 (13) carriage return
;                 7 (10) line feed
;           ZF  = set if ECX = 0, otherwise cleared

        or      ecx,-1
        mov     ebx,7F7F7F7Fh
        mov     edi,9090909h
        xor     ebp,ebp

sScan0: push    edi
        mov     eax,[esi]
        push    ecx
        dec     ebp
        push    5050505h
        mov     ecx,5
        jmp short sScan2

sScan1: pop     edx
        xor     eax,edi
        xor     edi,edx
        ror     ebp,1
        sub     edx,2020202h
        and     edi,7070707h
        push    edx

sScan2: mov     edx,eax
        and     edx,ebx
        add     edx,ebx
        or      edx,eax
        or      edx,ebx
        and     ebp,edx
        dec     ecx
        jnz     sScan1

        pop     edx
        pop     ecx
        add     esi,4
        inc     ecx
        inc     ebp
        pop     edi
        jz      sScan0

        bsf     eax,ebp
        shl     ecx,2
        lea     edx,[eax+1]
        shr     eax,3
        and     edx,7
        add     ecx,eax
        ret

StrScan ENDP
Title: Re: Optimizing string procedures
Post by: qWord on March 23, 2011, 01:43:52 AM
hi dedndave,
whats about using SSEx?
Quotewhat_ever proc psz: ptr CHAR
   
    .data
        align 16
        _9 db 16 dup (9)
        _10 db 16 dup (10)
        _12 db 16 dup (12)
        _13 db 16 dup (13)
    .code
   
    pxor xmm5,xmm5
    mov edx,psz
    xor ecx,ecx
@@:      
    movdqu xmm0,OWORD ptr [edx+ecx]
    movdqa xmm1,xmm0
    movdqa xmm2,xmm0
    movdqa xmm3,xmm0
    movdqa xmm4,xmm0
    pcmpeqb xmm0,OWORD ptr _9
    pcmpeqb xmm1,OWORD ptr _10
    pcmpeqb xmm2,OWORD ptr _12
    pcmpeqb xmm3,OWORD ptr _13
    pcmpeqb xmm4,xmm5
    por xmm0,xmm1
    por xmm2,xmm3
    por xmm0,xmm2
    por xmm0,xmm4
    pmovmskb eax,xmm0
    test eax,eax
    jnz @F
    lea ecx,[ecx+16]
    jz @B
@@:
    bsf eax,eax
    lea eax,[eax+ecx]
    ret
   
what_ever endp
Title: Re: Optimizing string procedures
Post by: dedndave on March 23, 2011, 02:13:31 AM
thanks qWord
but, these are un-aligned strings to be placed in a display buffer
the special characters are "stopping points", where the string is terminated, or the char is otherwise expanded
once an expanded char has been actioned upon, you pick up where you left off in the input string
there is no practical way to align them
Title: Re: Optimizing string procedures
Post by: jj2007 on March 23, 2011, 02:18:08 AM
Quote from: dedndave on March 23, 2011, 02:13:31 AM
thanks qWord
but, these are un-aligned strings to be placed in a display buffer

qWord's code does not require alignment.
Title: Re: Optimizing string procedures
Post by: dedndave on March 23, 2011, 03:59:34 AM
oh - cool
i will give it a try
maybe i can get my feet wet with SSE and try to understand it - lol
Title: Re: Optimizing string procedures
Post by: lingo on March 24, 2011, 12:26:22 AM
If someone needs a speed, he updates to faster CPU and AFTER that is OK to try MASM optimization...
Only the idiots optimized for archaic CPU.. :lol   
Title: Re: Optimizing string procedures
Post by: dedndave on March 24, 2011, 01:26:35 AM
yah - you got a new machine - now you are gonna harp
who gives a shit what you think, lingo...
....absolutely noone
Title: Re: Optimizing string procedures
Post by: lingo on March 24, 2011, 03:35:05 AM
The forum's idiots as a dedndave who say "no" or
"....absolutely noone" to be so kind to read and compare
times of the different CPUs of the same algos here (http://www.masm32.com/board/index.php?topic=14626.135)...

For example:

Prescott P4:
110 cycles for agner fog StrLen-masmlib

i7-2600K:
28 cycles for agner fog StrLen-masmlib

Is it possible to compensate the difference with MASM optimization?  :lol
Title: Re: Optimizing string procedures
Post by: hutch-- on March 24, 2011, 05:42:51 AM
Guys,

Try and keep the agro out of here, we have other subforums for "debates". I doubt there is any argument that a late i7 is faster than a Prescott PIV but that does not help anyone who owns or uses an old timer and speed is relative to the box running it, not what someone else owns. I use my i7 Win64 box to watch TV and movies, I would rather develop on my Core2 quad.  :P
Title: Re: Optimizing string procedures
Post by: dedndave on March 24, 2011, 12:21:35 PM
good point, Hutch
if you optimize for a brand new processor, your code will be fast on only a small portion of computers that are in use
whereas, if you optimize on a processor that is 5 to 10 years old, it will run well on the vast majority of machines
it will run well on newer machines, simply because they are fast anyways - lol

as for lingo - he shits on everything i do - which - i don't care
but - more than once, he has taken all the fun out of a thread with his assholeism
Title: Re: Optimizing string procedures
Post by: redskull on March 24, 2011, 02:13:56 PM
For what it's worth, optimization for a P4 is a whole different bag of tricks than almost every other processor.  In many cases, 'optimized' code actually runs slower on a P4 than non-optimized versions, and vice-versa.  Only an "idiot" would compare the same algorithm on a PIV to a non-PIV; they are apples and oranges.

-r
Title: Re: Optimizing string procedures
Post by: lingo on March 24, 2011, 02:16:04 PM
"but that does not help anyone who owns or uses an old timer and speed is relative to the box running it, "

I'm not so crazy to keep at home ten different configurations with archaic
CPUs just to create ten different optimized versions of the same algo... :lol
Title: Re: Optimizing string procedures
Post by: donkey on March 24, 2011, 02:25:17 PM
Quote from: lingo on March 24, 2011, 02:16:04 PM
"but that does not help anyone who owns or uses an old timer and speed is relative to the box running it, "

I'm not so crazy to keep at home ten different configurations with archaic
CPUs just to create ten different optimized versions of the same algo... :lol

Except to get a rise out of Hutch, I gave up optimizing code a long time ago. The work that you might spend 20 minutes or more on could save a couple of milliseconds of runtime and be negated by things like the ADD EAX,1 / INC EAX type changes from processor to processor. That said there are some things that will always give you a boost. Any way that you can find to process more data simultaneously will always help though I would have to see timings on short strings for the XMM version of lstrlen above it looks like it would do well with large documents so in that case its a practical exercise. For X64 programming, the instruction sizes can be huge depending on addressing modes and the FASTCALL convention is bloated to say the least so I have lately been looking at reducing code size in order to make up for it, not so much for speed but for overall compactness. Compactness doesn't make a lot of difference especially considering that my main PC has 8GB of memory and 1 TB of hard disk space but I like it so I do it anyway.
Title: Re: Optimizing string procedures
Post by: lingo on March 24, 2011, 02:38:34 PM
 "For what it's worth, optimization for a P4 is a whole different bag of tricks than almost every other processor.
In many cases, 'optimized' code actually runs slower on a P4 than non-optimized versions, and vice-versa.."


Many thanks to our guru of code optimization...you just open my eyes but I can't see your P4 Prescott code here (http://www.asmcommunity.net/board/index.php?topic=21565.msg162885#msg162885) (see my replay #11) :lol

Title: Re: Optimizing string procedures
Post by: dedndave on March 24, 2011, 03:20:51 PM
i don't agree with Red's statement - comparing them is how we learn what optimal is

at any rate - what we really need is a way to create a thread in here without any "advice" from lingo
he used to provide some value to the assembler community
now, he just repeats "i am the only person on the planet that knows how to write code",
in spite of the fact that his code often fails to work properly and/or is buggish
it's not as though we can simply go to a different forum - he seems intent on polluting all of them
Title: Re: Optimizing string procedures
Post by: lingo on March 24, 2011, 04:24:47 PM
"what we really need is"
You are not the Queen to say "WE"!

"he just repeats "i am the only person on the planet that knows how to write code"

May be you are the only person on the MASM forum that doesn't knows how to write asm code...

Example:
-   I'm here from the start of the forum (March 19, 2005) and I have just 575 posts. More of them are assembly code only!
-   You are here from January 24, 2009 without asm code in your 7840!!! posts...just blab, bla, bla, blah...ONLY!
               I understand that it is not so easy for you to live alone... but  so much garbage (7840 posts) and no shame...

"in spite of the fact that his code often fails to work properly and/or is buggish"

have you just one example?
Title: Re: Optimizing string procedures
Post by: donkey on March 24, 2011, 04:43:24 PM
Take it outside boys, flame wars are for the Colosseum.

(http://www.cordeoc.ca/EOC-images/flame-animation.gif)
Title: Re: Optimizing string procedures
Post by: dedndave on March 24, 2011, 07:14:38 PM
well - we were having a good time, until post #21 of this thread
then - the forum dies