I'm looking for a fast algorithm for searching a string for a single character, does anyone have any examples?
for example when searching a string for the null terminator you can search the string in dword increments and use an algorithm like this one.
After running the algo y contains 0 until a null is encountered (nulls are converted to 80h, perfect for using bsf)
x= DWORD to examine
y=(x & 0x7F7F7F7F) + 0x7F7F7F7F;
y=~(y | x | 0x7F7F7F7F);
i'm looking for something similar but for arbitrary characters. Or some source examples for a fast StrChr routine would be great
not sure that algo is all that fast ;)
in any case, you are now looking for a variable, rather than a constant
and, you have to look for null, too - so you don't go past the end of the string
looking again - i guess that algo does have some merit :P
well - use the same algo
if you are looking for the first space, XOR EAX,20202020h - then all spaces are nulls :bg
thanks dave :wink
yes using a 32 bit mask is the way to go
<SNIP! LOGIC ERRORS>
its just a quickly thought out algo are there any errors you can see?
Eventhough SCAS B or D use 15 & 13 cycles respectively combined with REPNZ, would these be a viable alternative as pointer adjustment, character comparision and counter evaluation are all done with one instruction. This assumes of course the purpose of the algo is to return a pointer.
i want the algorith to work just like C's strchr (just faster :P) so a pointer is what i need for a return value or NULL pointer if the char is not found. using the string instructions you'd need to scan a byte at a time (i think?) and thats what slows these type of algos down
By far the fastest option is SSE2. Search the forum for pcmpeqb. Below the innermost loop of a stringlen algo.
@@: add edx, 16 ; next chunk, please
pcmpeqb xmm0, [edx] ; any nullbytes in there?
pmovmskb eax, xmm0 ; show them in eax
test eax, eax
jz @B
thanks for the hint jochen but i would rather stick with plain vanilla asm. that sse makes my eyes bleed :eek
Quote from: brethren on November 30, 2010, 12:40:58 AM
thanks for the hint jochen but i would rather stick with plain vanilla asm. that sse makes my eyes bleed :eek
I also needed some time to get used to it :bg
you practically have it written in your original post :bg
write that algo to test for null
copy/paste and stick XOR in front of it to test for the search char
Somewhere I have seen a variation of Agner Fog's StrLen algo that could be used to find characters other than ascii 0.
Quote from: hutch-- on November 30, 2010, 01:43:28 AM
Somewhere I have seen a variation of Agner Fog's StrLen algo that could be used to find characters other than ascii 0.
I didn't think to do a search of the forum before I did this quick modification of Agner Fog's code. I had no time to do a through test (and after I posted I tested a no-match and discovered that the code faults, but no time to fix it).
;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
str1 db "my other brother darryl x",0
char dd "x"
.code
;==============================================================================
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
strchr PROC pstring:DWORD,dwchar:DWORD
push ebx ; ebx must be saved
push esi
;--------------------------------------
; Replicate search character into esi.
;--------------------------------------
mov eax, 01010101h
mov ecx, [esp+16]
mul ecx
mov esi, eax
mov ecx, [esp+12] ; get pointer to string
mov eax, ecx ; copy pointer
and ecx, 3 ; lower 2 bits, check alignment
jz L2 ; s is aligned by 4. Go to loop
and eax, -4 ; align pointer by 4
mov ebx, [eax] ; read from preceding boundary
shl ecx, 3 ; *8 = displacement in bits
mov edx, -1
shl edx, cl ; make byte mask
not edx ; mask = 0FFH for false bytes
or ebx, edx ; mask out false bytes
; check first four bytes for zero
xor ebx, esi
lea ecx, [ebx-01010101H] ; subtract 1 from each byte
not ebx ; invert all bytes
and ecx, ebx ; and these two
and ecx, 80808080H ; test all sign bits
jnz L3 ; zero-byte found
; Main loop, read 4 bytes aligned
L1: add eax, 4 ; increment pointer
L2: mov ebx, [eax] ; read 4 bytes of string
xor ebx, esi
lea ecx, [ebx-01010101H] ; subtract 1 from each byte
not ebx ; invert all bytes
and ecx, ebx ; and these two
and ecx, 80808080H ; test all sign bits
jz L1 ; no zero bytes, continue loop
L3: bsf ecx, ecx ; find right-most 1-bit
shr ecx, 3 ; divide by 8 = byte index
;----------------------------------
; Return pointer instead of index.
;----------------------------------
;sub eax, [esp+12] ; subtract start address
add eax, ecx ; add index to byte
pop esi
pop ebx ; restore ebx
ret 8 ; return value in eax
strchr ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;==============================================================================
start:
;==============================================================================
print str$(ADDR str1),13,10
invoke crt_strchr, ADDR str1, char
;movzx edx, BYTE PTR [eax]
;print str$(edx),13,10
print str$(eax),13,10
invoke strchr, ADDR str1, char
;movzx edx, BYTE PTR [eax]
;print str$(edx),13,10
print str$(eax),13,10,13,10
invoke Sleep, 3000
counter_begin 1000, HIGH_PRIORITY_CLASS
counter_end
print str$(eax),13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
invoke crt_strchr, ADDR str1, char
counter_end
print str$(eax),13,10
counter_begin 1000, HIGH_PRIORITY_CLASS
invoke strchr, ADDR str1, char
counter_end
print str$(eax),13,10
inkey "Press any key to exit..."
exit
;==============================================================================
end start
Running on a P3:
0
87
58
thanks everybody :U
i'm thinking of use this algo to search for the character. its benefit is that i won't have to test individual character in the search loop
.data
buf BYTE "asm", 0
.code
main PROC
mov ebx, 73737373h ;mask to search for 's'
mov ecx, DWORD PTR [buf] ;move 4 chars into ECX
mov edi, 7EFEFEFFh
xor ecx, ebx
add edi, ecx
xor ecx, 0FFFFFFFFh
xor ecx, edi
and ecx, 81010100h
;ECX=0 if char is not found, not-zero if found (use jz, jnz)
exit
main ENDP
END main
ps hutch, thanks i'll have a look for the agner fog modified strlen algorithm
Quote from: dedndave on November 30, 2010, 01:22:26 AM
you practically have it written in your original post :bg
write that algo to test for null
copy/paste and stick XOR in front of it to test for the search char
thank you dave :thumbu
i thought that algo was just for testing for null but after reading your post i realise it is a general purpose algo for finding any character
you just do what you said use xor with a mask for the character you want to find(if you don't use a mask it'll find the null). now i can use that same algo to find the char and the null :8)
you could add another section to test for opposite case, too :U
later....
here is a variation on the theme
it may not be the fastest way, but it is ~60% faster than a rep scasb version
it uses all registers, preserves none, and passes the address in a register
it could easily be made into an INVOKE'able function, though
in my application, Form Feed is used to clear screen, so it was included
StrScan PROC
;scan a line of text and return the number of characters preceeding the first "special" character
;special characters include 0,9,10,12,13
;
;Call With: ESI = offset of string
;
; Returns: ECX = bytes preceeding special char
; EDX = special char identifier
; 0 (12) form feed
; 4 (0) null terminator
; 5 (9) tab
; 6 (13) carriage return
; 7 (10) line feed
; ZF = set if ECX = 0, otherwise cleared
or ecx,-1
mov ebx,7F7F7F7Fh
mov edi,9090909h
xor ebp,ebp
sScan0: push edi
mov eax,[esi]
push ecx
dec ebp
push 5050505h
mov ecx,5
jmp short sScan2
sScan1: pop edx
xor eax,edi
xor edi,edx
ror ebp,1
sub edx,2020202h
and edi,7070707h
push edx
sScan2: mov edx,eax
and edx,ebx
add edx,ebx
or edx,eax
or edx,ebx
and ebp,edx
dec ecx
jnz sScan1
pop edx
pop ecx
add esi,4
inc ecx
inc ebp
pop edi
jz sScan0
bsf eax,ebp
shl ecx,2
lea edx,[eax+1]
shr eax,3
and edx,7
add ecx,eax
ret
StrScan ENDP
hi dedndave,
whats about using SSEx?
Quotewhat_ever proc psz: ptr CHAR
.data
align 16
_9 db 16 dup (9)
_10 db 16 dup (10)
_12 db 16 dup (12)
_13 db 16 dup (13)
.code
pxor xmm5,xmm5
mov edx,psz
xor ecx,ecx
@@:
movdqu xmm0,OWORD ptr [edx+ecx]
movdqa xmm1,xmm0
movdqa xmm2,xmm0
movdqa xmm3,xmm0
movdqa xmm4,xmm0
pcmpeqb xmm0,OWORD ptr _9
pcmpeqb xmm1,OWORD ptr _10
pcmpeqb xmm2,OWORD ptr _12
pcmpeqb xmm3,OWORD ptr _13
pcmpeqb xmm4,xmm5
por xmm0,xmm1
por xmm2,xmm3
por xmm0,xmm2
por xmm0,xmm4
pmovmskb eax,xmm0
test eax,eax
jnz @F
lea ecx,[ecx+16]
jz @B
@@:
bsf eax,eax
lea eax,[eax+ecx]
ret
what_ever endp
thanks qWord
but, these are un-aligned strings to be placed in a display buffer
the special characters are "stopping points", where the string is terminated, or the char is otherwise expanded
once an expanded char has been actioned upon, you pick up where you left off in the input string
there is no practical way to align them
Quote from: dedndave on March 23, 2011, 02:13:31 AM
thanks qWord
but, these are un-aligned strings to be placed in a display buffer
qWord's code does not require alignment.
oh - cool
i will give it a try
maybe i can get my feet wet with SSE and try to understand it - lol
If someone needs a speed, he updates to faster CPU and AFTER that is OK to try MASM optimization...
Only the idiots optimized for archaic CPU.. :lol
yah - you got a new machine - now you are gonna harp
who gives a shit what you think, lingo...
....absolutely noone
The forum's idiots as a dedndave who say "no" or
"....absolutely noone" to be so kind to read and compare
times of the different CPUs of the same algos here (http://www.masm32.com/board/index.php?topic=14626.135)...
For example:
Prescott P4:
110 cycles for agner fog StrLen-masmlib
i7-2600K:
28 cycles for agner fog StrLen-masmlib
Is it possible to compensate the difference with MASM optimization? :lol
Guys,
Try and keep the agro out of here, we have other subforums for "debates". I doubt there is any argument that a late i7 is faster than a Prescott PIV but that does not help anyone who owns or uses an old timer and speed is relative to the box running it, not what someone else owns. I use my i7 Win64 box to watch TV and movies, I would rather develop on my Core2 quad. :P
good point, Hutch
if you optimize for a brand new processor, your code will be fast on only a small portion of computers that are in use
whereas, if you optimize on a processor that is 5 to 10 years old, it will run well on the vast majority of machines
it will run well on newer machines, simply because they are fast anyways - lol
as for lingo - he shits on everything i do - which - i don't care
but - more than once, he has taken all the fun out of a thread with his assholeism
For what it's worth, optimization for a P4 is a whole different bag of tricks than almost every other processor. In many cases, 'optimized' code actually runs slower on a P4 than non-optimized versions, and vice-versa. Only an "idiot" would compare the same algorithm on a PIV to a non-PIV; they are apples and oranges.
-r
"but that does not help anyone who owns or uses an old timer and speed is relative to the box running it, "
I'm not so crazy to keep at home ten different configurations with archaic
CPUs just to create ten different optimized versions of the same algo... :lol
Quote from: lingo on March 24, 2011, 02:16:04 PM
"but that does not help anyone who owns or uses an old timer and speed is relative to the box running it, "
I'm not so crazy to keep at home ten different configurations with archaic
CPUs just to create ten different optimized versions of the same algo... :lol
Except to get a rise out of Hutch, I gave up optimizing code a long time ago. The work that you might spend 20 minutes or more on could save a couple of milliseconds of runtime and be negated by things like the ADD EAX,1 / INC EAX type changes from processor to processor. That said there are some things that will always give you a boost. Any way that you can find to process more data simultaneously will always help though I would have to see timings on short strings for the XMM version of lstrlen above it looks like it would do well with large documents so in that case its a practical exercise. For X64 programming, the instruction sizes can be huge depending on addressing modes and the FASTCALL convention is bloated to say the least so I have lately been looking at reducing code size in order to make up for it, not so much for speed but for overall compactness. Compactness doesn't make a lot of difference especially considering that my main PC has 8GB of memory and 1 TB of hard disk space but I like it so I do it anyway.
"For what it's worth, optimization for a P4 is a whole different bag of tricks than almost every other processor.
In many cases, 'optimized' code actually runs slower on a P4 than non-optimized versions, and vice-versa.."
Many thanks to our guru of code optimization...you just open my eyes but I can't see your P4 Prescott code here (http://www.asmcommunity.net/board/index.php?topic=21565.msg162885#msg162885) (see my replay #11) :lol
i don't agree with Red's statement - comparing them is how we learn what optimal is
at any rate - what we really need is a way to create a thread in here without any "advice" from lingo
he used to provide some value to the assembler community
now, he just repeats "i am the only person on the planet that knows how to write code",
in spite of the fact that his code often fails to work properly and/or is buggish
it's not as though we can simply go to a different forum - he seems intent on polluting all of them
"what we really need is"
You are not the Queen to say "WE"!
"he just repeats "i am the only person on the planet that knows how to write code"
May be you are the only person on the MASM forum that doesn't knows how to write asm code...
Example:
- I'm here from the start of the forum (March 19, 2005) and I have just 575 posts. More of them are assembly code only!
- You are here from January 24, 2009 without asm code in your 7840!!! posts...just blab, bla, bla, blah...ONLY!
I understand that it is not so easy for you to live alone... but so much garbage (7840 posts) and no shame...
"in spite of the fact that his code often fails to work properly and/or is buggish"
have you just one example?
Take it outside boys, flame wars are for the Colosseum.
(http://www.cordeoc.ca/EOC-images/flame-animation.gif)
well - we were having a good time, until post #21 of this thread
then - the forum dies