Hello!
I checked some examples and topics about it but none of them was showing what I exactly wanted. So I tried to write one by myself :)
Let's say there are 2 strings
1: "hi my name is deniz. did u like it?"
2: "deniz"
So what I am trying to achieve is:
hi my name is deniz. did u like it?
deniz
deniz
deniz
sliding like that until the match.
When I run the code below it always says "yes" :(
Btw, I've used cpu registers as much as possible for better performance (not even sure if the usages were proper thougjh). However I still can't be sure of my code about performance so if there is a faster way of comparing strings, please let me know :)
.386
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
include \masm32\include\msvcrt.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\msvcrt.lib
includelib \masm32\lib\masm32.lib
.data
msgyes db 'yes ',0
msgno db 'no ',0
.data?
str1 db 100 dup(?)
str2 db 100 dup(?)
.code
start:
;Get 2 strings
invoke StdIn,ADDR str1,100
invoke StdIn,ADDR str2,100
;Get the length of str1 and save it to bl
xor bl, bl ;Zero bl
lea esi, str1
repeatstr1:
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr1
;Get the length of str2 and save it to bh
xor bh, bh ;Zero bh
lea esi, str2
repeatstr2:
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr2
;Decrease bl and bh because there will be no need to compare '0'
dec bl
dec bh
;Set cl to know how many times to check(in my case bl will be always greater then bh)
mov cl, bl
sub cl, bh
;Set initials
lea esi, str1
lea edi, str2
lea edx, str1 ;will need a copy
mov ch, bh; another copy for len of str2
jmp Compare ;start comparing
;Prepare for next comparision also check if it is the last comparision
PrepareForNextComp:
dec cl
cmp cl, 0
je notfound
lea edi, str2
inc edx
mov esi, edx
mov bh, ch
Compare:
mov al, [esi]
mov ah, [edi]
cmp al, ah
jne PrepareForNextComp
dec bh
cmp bh, 0 ;if done str2's len times, than found
je found
inc esi
inc edi
jmp Compare
notfound:
invoke StdOut,ADDR msgno
jmp finish
found:
invoke StdOut,ADDR msgyes
jmp finish
;wait for input and end
finish:
invoke StdIn,ADDR str2,100
invoke ExitProcess,0
END start
First of all, this is a very nice program, written from scratch and well commented. Compliments :U
Can it be done faster? You have to set up a benchmark, and test it...
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
838 cycles for MyTestCode
438 cycles for InString
402 cycles for Instr_ (MasmBasic)
So, yes, it can be done a bit faster, but so much speed rarely matters. What I would suggest, though, is to reflect on how to set up a testing environment for this kind of exercise. For example, every time you launch your code you have to type in the strings - that can be avoided. See the attachment for a proposal that uses conditional assembly. It contains the timer.asm file taken from the Masm32 Laboratory (http://www.masm32.com/board/index.php?topic=770.0). Make sure you tick "use path names" when unzipping.
Keep up the good work, you are on a very good track :thumbu
you might start by searching for the first letter
then scan the rest of the string
The C standard library contains a function for exactly this purpose, called: strstr()
Here's an example of its use:
.486 ; use 486 instructions
.model flat, stdcall ; Win32 memory model
option casemap:none ; case sensitive
include ..\Include\windows.inc
include ..\Include\stdio.inc
include ..\Include\stdlib.inc
include ..\Include\conio.inc
includelib ..\Lib\crtdll.lib
Main proto
.data
StringA DB 'hi my name is deniz. did u like it?',0
StringB DB 'deniz',0
StringC DB 'result=',0
style1 DB '%s',0
style2 DB '%d',0
.data?
iLVAR dd ?
iRVAR dd ?
string1 dd ?
string2 dd ?
result dd ?
.code
start:
invoke Main
invoke exit, 0
Main proc
mov eax, offset StringA
mov string1, eax
mov eax, offset StringB
mov string2, eax
FINIT
invoke strlen, string1
mov iLVAR, eax
invoke strlen, string2
mov iRVAR, eax
;
FLDZ
mov eax, iLVAR
mov ebx, iRVAR
cmp eax, ebx
jl Label1
;
mov eax, [string1]
mov iLVAR, eax
invoke strstr, string1, string2
mov iRVAR, eax
.if iRVAR != NULL
FILD iRVAR
FILD iLVAR
FSUBP ST(1), ST
FLD1
FADDP ST(1), ST
.endif
Label1:
FISTP result ; store integer, pop stack
;
invoke printf, offset style1, offset StringC
invoke printf, offset style2, result
;
call _getch ; pause
ret ; return to windows
Main endp
;
end start
When run, the output is 15.
Hi Steve,
Where can I get the following:
include ..\Include\stdio.inc
include ..\Include\stdlib.inc
include ..\Include\conio.inc
These are not in the Masm package.
Thanks.
Quote from: abnuque on April 19, 2012, 04:53:35 AM
Where can I get the following:
include ..\Include\stdio.inc
include ..\Include\stdlib.inc
include ..\Include\conio.inc
These are not in the Masm package.
Use the standard masm32rt include and prepend
crt_ to all C functions, as shown below. However, C strstr is a bit slow - and apparently, there is not even a case-insensitive version:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
756 cycles for MyTestCode
624 cycles for InString
417 cycles for Instr_ (MasmBasic)
400 cycles for Instr_, case-insensitive, result=100
560 cycles for C strstr, result=100
include \masm32\include\masm32rt.inc
Main proto
...
invoke crt_strlen, string1
invoke crt_strlen, string2
invoke crt_strstr, string1, string2
invoke crt_printf, offset style1, offset StringC
invoke crt_printf, offset style2, result
call crt__getch ; pause
@Steve: using the FPU for showing the result is unorthodox :bg
Quote from: abnuque on April 19, 2012, 04:53:35 AM
Where can I get the following:
include ..\Include\stdio.inc
include ..\Include\stdlib.inc
include ..\Include\conio.inc
These are not in the Masm package.
Oops, my bad...
in my haste, I forgot to mention that my example was using
jwasm (http://japheth.de/JWasm.html), which includes those filenames.
But, JJ is correct, use his example: Masm32rt and preappend
crt_ as he illustrated.
edit:
fixed typo.
Quote from: SteveAsm on April 19, 2012, 06:46:16 AM
JJ is correct, use his Masm32rt
The owner is Hutch - unless he declares it common forum property :green
That was an editing typo. :boohoo:
Alternately,
the win API has a similar set of functions StrStr (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773436(v=vs.85).aspx) and StrStrI (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773439(v=vs.85).aspx).
Quote from: SteveAsm on April 19, 2012, 07:57:03 AM
Alternately,
the win API has a similar set of functions StrStr (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773436(v=vs.85).aspx) and StrStrI (http://msdn.microsoft.com/en-us/library/windows/desktop/bb773439(v=vs.85).aspx).
They do exist, indeed:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
754 cycles for MyTestCode
615 cycles for InString, result=100
410 cycles for Instr_ (MasmBasic)
400 cycles for Instr_, case-insensitive, result=100
555 cycles for C strstr, result=100
1586 cycles for StrStr, result=100
87280 cycles for StrStrI, result=100
Quote from: jj2007 on April 19, 2012, 12:34:25 AM
First of all, this is a very nice program, written from scratch and well commented. Compliments :U
Can it be done faster? You have to set up a benchmark, and test it...
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
838 cycles for MyTestCode
438 cycles for InString
402 cycles for Instr_ (MasmBasic)
So, yes, it can be done a bit faster, but so much speed rarely matters. What I would suggest, though, is to reflect on how to set up a testing environment for this kind of exercise. For example, every time you launch your code you have to type in the strings - that can be avoided. See the attachment for a proposal that uses conditional assembly. It contains the timer.asm file taken from the Masm32 Laboratory (http://www.masm32.com/board/index.php?topic=770.0). Make sure you tick "use path names" when unzipping.
Keep up the good work, you are on a very good track :thumbu
Thx alot! I'm imitating (ohhh my English... couldn't find a better fitting word but you got the idea :D) what I see in this forum :)
Looking to the attachment right now.
I will be checking like 100.000 strings on memory. Rather than that, for practical purposes I must learn and achieve the fastest.
Quote from: dedndave on April 19, 2012, 01:13:15 AM
you might start by searching for the first letter
then scan the rest of the string
Actually I am doing exactly what you are saying, continuously :) checking first letter if matched checking next or if it doesn't match I slide by 1 letter :)
Btw, I'm avoiding C functions or APIs because some of the comparisions will be done with strings that not actually string (not ending with '0') and I will not need the information where the other string is found. Also all of the strings will be lower case. These made me want to optimize it. The string lengths will not be greater than like 200. So I thought why would i let a C function use the whole register for holding string's length.
I will be checking the examples for sure in order to see a better algorithm or get a better idea.
Soo I still couldn't figure out why it
always give me the answer "yes" which means str1 contains str2 even it doesnt. Intrestingly, if I hardcode the length of strings that I will input(setting bl and bh registers which holds length of each string) I get proper results.
Thx in advance :)
uosunsetter,
Don't give up on writing your own, they are not that hard to write. Usually you search for the first character by scanning one byte at a time, if you find that character you branch and compare the rest of the search string to the current location in the main string, if it matches you return the offset in the string. You can use the C runtime versions if you like and there are plenty of others to use as well but writing your own is good value and you understand it better when you write it yourself.
Quote from: hutch-- on April 19, 2012, 01:57:23 PM
uosunsetter,
Don't give up on writing your own, they are not that hard to write. Usually you search for the first character by scanning one byte at a time, if you find that character you branch and compare the rest of the search string to the current location in the main string, if it matches you return the offset in the string. You can use the C runtime versions if you like and there are plenty of others to use as well but writing your own is good value and you understand it better when you write it yourself.
and
Quote from: dedndave on April 19, 2012, 01:13:15 AM
you might start by searching for the first letter
then scan the rest of the string
Now I get it! I'm writing the new one right now and will benchmark with jj2007's example. Thank you all!
Quote from: jj2007 on April 19, 2012, 08:27:29 AM
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
555 cycles for C strstr, result=100
1586 cycles for StrStr, result=100
87280 cycles for StrStrI, result=100
JJ,
I'm not too disappointed by the performance of the C functions, but, the Win-API functions seem extremely wak.
I don't understand it.
Could you maybe run them several times and average them ?
One of the reasons for the slowness of the API functions may be built-in buffer-overrun protections that the CRT functions lack. See the Security Note here:
http://msdn.microsoft.com/en-us/library/z9da80kz(v=vs.71).aspx
But this does not explain why StrStrI is so much slower than StrStr. I was able to throw together a replacement for StrStrI:
push esi
push edi
invoke crt__strdup, ADDR str1
mov esi, eax
invoke crt__strdup, ADDR str2
mov edi, eax
invoke crt__strlwr, esi
invoke crt__strlwr, edi
invoke StrStr, esi, edi
invoke crt_free, esi
invoke crt_free, edi
pop esi
pop edi
That running on my P3 is some 7 times faster than StrStrI.
Quote from: MichaelW on April 19, 2012, 06:49:50 PM
...See the Security Note here:
Yeh, scary stuff.
Quote
But this does not explain why StrStrI is so much slower than StrStr. I was able to throw together a replacement for StrStrI: ...
That running on my P3 is some 7 times faster than StrStrI.
7x is a significant improvement.
It's probably bad design. Even the Unicode version of MasmBasic Instr, wInstr, runs in 1600 cycles instead of 35000:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
852 cycles for MyTestCode (UoSunSetter)
430 cycles for InString, result=100
391 cycles for Instr_ (MasmBasic), result=100
379 cycles for Instr_, case-insensitive, result=100
1117 cycles for wInstr, case-sensitive, result=100
1564 cycles for wInstr, case-insensitive, result=100
383 cycles for C strstr, result=100
937 cycles for StrStr, result=100
35480 cycles for StrStrI, result=100
... and that one works fine with case-insensitive Russian Unicode strings, for example. Why this would need a check for buffer overrun is a mystery - we are talking zero-delimited strings here ::)
OMG! How about this? :D I hope I didn't make any mistake on benchmark... I put an attachment which includes benchmark. I set MasmBasic flag to 0 'cause couldn't wait to see the results :D
Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz (SSE4)
56 cycles for MyTestCode (UoSunSetter)
560 cycles for InString, result=100
524 cycles for C strstr, result=100
1163 cycles for StrStr, result=100
44536 cycles for StrStrI, result=100
56 cycles for MyTestCode (UoSunSetter)
571 cycles for InString, result=100
468 cycles for C strstr, result=100
1176 cycles for StrStr, result=100
43617 cycles for StrStrI, result=100
.386
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
include \masm32\include\msvcrt.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\msvcrt.lib
includelib \masm32\lib\masm32.lib
.data
msgyes db 'yes ',0
msgno db 'no ',0
.data?
str1 db 100 dup(?)
str2 db 100 dup(?)
.code
start:
;Get 2 strings
invoke StdIn,ADDR str1,100
invoke StdIn,ADDR str2,100
;Get the length of str1 and save it to bl
xor bl, bl ;Zero bl
lea esi, str1
repeatstr1: ;repeat until the end of str1
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr1
;Get the length of str2 and save it to bh
xor bh, bh ;Zero bh
lea esi, str2
repeatstr2: ;repeat until the end of str2
mov al, [esi]
inc esi
inc bh
cmp al, 0
jne repeatstr2
;Decrease bl and bh because there will be no need to compare '0'
dec bl
dec bh
;Set cl to know how many times to check(in my case bl will be always greater then bh)
mov cl, bl
sub cl, bh
inc cl
;Set initials
lea esi, str1
lea edi, str2
lea edx, str1 ;will need a copy
mov ch, bh; another copy for len of str2
jmp MatchFirstLetters ;start by finding first match
PrepareForNextMatchingCheck: ;Prepare registers for finding next first letter match also test if it is the last check
inc edx
mov esi, edx
dec cl
cmp cl, 0
je NotFound
MatchFirstLetters:
mov al, [esi]
mov ah, [edi]
cmp al, ah
jne PrepareForNextMatchingCheck ;no luck? try next
lea edi, str2 ;need to reset edi since it will manipulated by CompareTheRest
inc edi ;first letter already checked so start from second one
inc esi ;same as above
mov bh, ch ;reset the iterator used in CompareTheRest
CompareTheRest: ;Match the rest of the chars one by one
mov al, [esi]
mov ah, [edi]
cmp al, ah
jne PrepareForNextMatchingCheck
dec bh
cmp bh, 1
je Found ;if comparision is done str2's length-1 times and there is no mismatch then found
inc esi
inc edi
jmp CompareTheRest
NotFound:
mov eax, 0
jmp finish
Found:
mov eax, 1
jmp finish
;end
finish:
invoke ExitProcess,0
END start
Forgot to delete driver prefix from include lines in asm file included in benchmark.zip :(
Here is the attachment with fix...
In every test I ran this, the C strstr ran the fastest by about 8-9%.
Intel(R) Celeron(R) M processor 1.70GHz (SSE2)
856 cycles for MyTestCode (UoSunSetter)
430 cycles for InString, result=100
399 cycles for Instr_ (MasmBasic), result=100
404 cycles for Instr_, case-insensitive, result=100
1124 cycles for wInstr, case-sensitive, result=100
1606 cycles for wInstr, case-insensitive, result=100
369 cycles for C strstr, result=100
984 cycles for StrStr, result=100
35407 cycles for StrStrI, result=100
849 cycles for MyTestCode (UoSunSetter)
430 cycles for InString, result=100
403 cycles for Instr_ (MasmBasic), result=100
405 cycles for Instr_, case-insensitive, result=100
1262 cycles for wInstr, case-sensitive, result=100
1428 cycles for wInstr, case-insensitive, result=100
369 cycles for C strstr, result=100
930 cycles for StrStr, result=100
35429 cycles for StrStrI, result=100
MasmBasic was always a close second.
Can it be the old codes I've written? Did you download the latest attachment I posted?
Quote from: uosunsetter on April 19, 2012, 11:51:59 PM
Can it be the old codes I've written?
Yes, it is the old code.
Not a mistake, I was trying to figure why the API functions are so slow.
Quote
Did you download the latest attachment I posted?
Yes, I did download the new code.
I haven't had the chance to look at the source code yet.
But, the result is quite impressive, if it's right.
Quote from: uosunsetter on April 19, 2012, 10:39:49 PM
OMG! How about this? :D
Okay, here is your latest benchmark:
Intel(R) Celeron(R) M processor 1.70GHz (SSE2)
51 cycles for MyTestCode (UoSunSetter)
427 cycles for InString, result=100
369 cycles for C strstr, result=100
938 cycles for StrStr, result=100
35567 cycles for StrStrI, result=100
49 cycles for MyTestCode (UoSunSetter)
431 cycles for InString, result=100
393 cycles for C strstr, result=100
944 cycles for StrStr, result=100
35595 cycles for StrStrI, result=100
My concern, is that since MyTestCode does not output a value, we don't really know if it's working correctly.
I think it must be short circuiting somewhere.
Okay,
looking at your code for MyTestCode, I discovered a flaw in the execution.
When the benchmark executes, it does not wait for any input.
All the other programs use a static string to test against, MyTestCode does not.
As the code is written, it wants the user to enter str1 and str2. This never happens.
So, it proceeds to compare two empty strings and it fails.
For varification, I made two modifications to the code to test this.
Additionally, when I assemble and test the code, it works, but, fails the test as well.
Here are the two changes I made:
.data
msgyes db 'yes ',0
msgno db 'no ',0
msgFailed db 'procedure failed',0
msgSuccess db 'procedure suceeded',0
NotFound:
mov eax, 0
printstr msgFailed ; my own print routine
printstr lf
jmp finish
Found:
mov eax, 1
printstr msgSuccess ; my own print routine
printstr lf
jmp finish
;end
finish:
call _getch ; pause
invoke ExitProcess,0
END start
The output message is: "procedure failed"
EDIT:
I just compared the code you posted above with the code in "benchmark.asm" and I see a difference.
I was testing the code above.
The "benchmark.asm" code does use static strings for comparison, but, I believe it is still failing.
I would expect the CRT strstr function to be fairly fast on short strings. But my previous tests against code that utilizes a Boyer-Moore search (the FreeBASIC Instr function), and the tests below, suggest that strstr is nothing fancy, just simple compiler-optimized code.
;==============================================================================
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
;==============================================================================
.data
str1 db "The three algorithms are variations of a Boyer Moore design.",0
size1 equ $-str1
str2 db "design.",0
size2 equ $-str2
str3 db "The three algorithms are variations of a Boyer Moore design."
db "The original design BMBinSearch is the most flexible across "
db "different search conditions. The two variations BMHBinsearch "
db "and SBMBinSearch are respectively a Horspool variation and a "
db "simplified variation which perform well depending on the "
db "hardware and the patterns being searched."
db "The minimum pattern length is 2 bytes but these algorithms are "
db "slower on very short paterns than a classic byte scanner."
db "The algorithms have about a 300 character penalty at startup "
db "as they must construct a table so for either very short "
db "patterns or short data it is more efficient to use a byte "
db "scanner.",0
size3 equ $-str3
str4 db "it is more efficient to use a byte scanner.",0
size4 equ $-str4
.code
;==============================================================================
start:
invoke GetCurrentProcess
invoke SetProcessAffinityMask, eax, 1
invoke crt_strstr, ADDR str1, ADDR str2
sub eax, OFFSET str1
printf("%d\n",eax)
invoke BMBinSearch, 0, ADDR str1, size1, ADDR str2, size2
printf("%d\n\n",eax)
invoke crt_strstr, ADDR str3, ADDR str4
sub eax, OFFSET str3
printf("%d\n",eax)
invoke BMBinSearch, 0, ADDR str3, size3, ADDR str4, size4
printf("%d\n\n",eax)
invoke Sleep, 3000
counter_begin 1000000, HIGH_PRIORITY_CLASS
invoke crt_strstr, ADDR str1, ADDR str2
counter_end
printf("%d cycles, strstr\n", eax)
counter_begin 1000000, HIGH_PRIORITY_CLASS
invoke BMBinSearch, 0, ADDR str1, size1, ADDR str2, size2
counter_end
printf("%d cycles, BMBinSearch\n\n", eax)
counter_begin 1000000, HIGH_PRIORITY_CLASS
invoke crt_strstr, ADDR str3, ADDR str4
counter_end
printf("%d cycles, strstr\n", eax)
counter_begin 1000000, HIGH_PRIORITY_CLASS
invoke BMBinSearch, 0, ADDR str3, size3, ADDR str4, size4
counter_end
printf("%d cycles, BMBinSearch\n\n", eax)
inkey
exit
;==============================================================================
end start
Running on a P3:
53
53
600
600
211 cycles, strstr
812 cycles, BMBinSearch
2579 cycles, strstr
1396 cycles, BMBinSearch
Oops, I'm off by one on my sizes, but it's not worth fixing.
QuoteThe output message is: "procedure failed"
good time on that, though :P
i am actually a little surprised
Quote from: SteveAsm on April 19, 2012, 11:43:05 PM
In every test I ran this, the C strstr ran the fastest by about 8-9%.
...
MasmBasic was always a close second.
Timings may change by some % depending on the location of the code (see here (http://www.masm32.com/board/index.php?topic=11454.0)), and of course they differ by CPU:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
754 cycles for MyTestCode (UoSunSetter)
615 cycles for InString, result=100
412 cycles for Instr_ (MasmBasic), result=100
399 cycles for Instr_, case-insensitive, result=100
554 cycles for C strstr, result=100
86779 cycles for StrStrI, result=100
The MasmBasic implementation is pretty fast especially for longer strings, but it has a bit of overhead since it handles also case-insensitive and Unicode strings. The C functions are generally very competitive, though. The only surprise here is the StrStrI result. By the way, Lingo wrote SSE4 code (http://www.masm32.com/board/index.php?topic=9370.msg132644#msg132644) that beats everything here, but beware of GPFs :wink
From the beginning I didn't noticed one of the fixes that jj2007 (about getting strings' pointer) made to my codes. SteveAsm was also right :) I just paid too much attention to algorithm that I missed other points.
Also, after spending 2-3 more hours, I accepted some facts: Whoever wrote the InString function is far more experienced than me. Also the time he/she spent to write the codes is probably more than me. So understanding what it is written and manipulating it according to the cases that I will face seems quite beneficial.
But before I go, I got a question: Is there anyway to check if the CPU supports SSE4 instruction set so that I can take the advantage of the link below(thanks jj2007 for pointing and lingo for codes)???? Also if the CPU doesn't support I should be able to call standart(or edited) InString function. And of course it should be checked on execution.
http://www.masm32.com/board/index.php?topic=9370.msg132644#msg132644
if you enable pentium instructions...
.586
you can use CPUID
with all it's caveats, it is pretty easy to get the level of SSE support
most everyone has SSE2 or SSE3 - at least (not all, though)
more recent processors support SSE4 and better
http://www.intel.com/Assets/PDF/appnote/241618.pdf
http://support.amd.com/us/Embedded_TechDocs/25481.pdf
@ uosunsetter
I was looking at your code, trying to debug it and I decided it was easier to just rewrite it, since I knew what you were trying to do.
Maybe somebody can insert this as a replacement for MyTestCode and time it.
hi Steve
timing code is pretty easy
i know Jochen uses some rather "advanced" conditional assembly techniques :red
most of us use MichaelW's timers.asm file
it may be found in the first post of the first thread in the Laboratory sub-forum
http://www.masm32.com/board/index.php?topic=770.msg5281#msg5281
i have included the timers.asm file in the attachment, though
we generally place the timers.asm file in the \masm32\macros folder
here is a very simple example of timing some code...
Quote from: SteveAsm on April 21, 2012, 02:38:44 AM
@ uosunsetter
I was looking at your code, trying to debug it and I decided it was easier to just rewrite it, since I knew what you were trying to do.
Maybe somebody can insert this as a replacement for MyTestCode and time it.
Doesn't even assemble without errors on my Masm32 installation. Could you rewrite it in the form used by the others? Thanks.
InstrSteve proc pString, pPattern
.. code ..
ret ; return position in eax
InstrSteve endp
Thanks dedndave!
Quote from: SteveAsm on April 21, 2012, 02:38:44 AM
@ uosunsetter
I was looking at your code, trying to debug it and I decided it was easier to just rewrite it, since I knew what you were trying to do.
Maybe somebody can insert this as a replacement for MyTestCode and time it.
Actually, as I said, on jj2007's benchmark attachments, he has the fixed version of my code. The errors simply caused by using lea instead of using mov and offset.
What I was doing:
lea esi, str1
What should have been:
mov esi, offset str1
After the replacement I got proper benchmark results which were quite disappointing(and normal :P).
Quote from: uosunsetter on April 21, 2012, 09:52:29 AM
After the replacement I got proper benchmark results which were quite disappointing(and normal :P).
Your code is already faster than the Windows API StrStr version. Comment out two lines, and it's 100 cycles faster:
PrepareForNextComp:
dec cl
;cmp cl, 0
je notfound ; cl=0: NOT FOUND
mov edi, pPattern
inc edx
mov esi, edx
mov bh, ch
Compare:
mov al, [esi]
mov ah, [edi]
cmp al, ah
jne PrepareForNextComp
dec bh
; cmp bh, 0 ;if done pPattern's len times, than found
je found
inc esi
inc edi
jmp Compare
Ewww wouldn't it crash or loops infinetely if I remove them?
no - the DEC instruction sets the Zero Flag if the result is zero - no need to compare with 0
also - if the upper 3 bytes of EBX and ECX are 0, then DEC EBX and DEC ECX are probably slightly faster
that's because DWORD operations are generally prefered on 32-bit processors
and - DEC CL/BL are 2 byte instructions - DEC ECX/EBX are 1 byte instructions
I just needed that kind of informations! Very good tips. Also intresting but makes sense. I learned a lot from this topic :U :8)
ok - another tip :P
it seems that DEC ECX is a little slow on some processors
actually, the instruction is fast if you do not have to wait for the flags
dec ecx
jz notfound
the processor has to wait for the flags to be altered before it can decide which way to branch
sub ecx,1
jz notfound
may be slightly faster, even though SUB ECX,1 is 3 bytes
another way around it is to put some other instruction in there that does not affect the flags
in this code, you don't have one to stick in there :P
but, to illustrate...
dec ecx
mov esi,123456
jnz SomeLabel
is probably faster than
mov esi,123456
dec ecx
jnz SomeLabel
the processor can do something else while it waits for the flags to be set
Hi,
As Dave says, DEC and INC will set the Zero Flag properly.
But unlike ADD and SUB they do not affect the Carry Flag.
So you cannot use all of the conditioal jumps with INC and DEC,
while you can with ADD and SUB. Just a heads up.
$0.02,
Steve N.
Quotethe processor can do something else while it waits for the flags to be set
It makes sense. This is, however, a bit confusing for me. I always thought that only one thing can be done at a time on CPU. Again I thought that maybe there as some exceptions on multi-threading; but as far as I know, even multi-threading only organize and put things in queue without blocking each other.
And 1 other thing that I've been always wondering about is
waiting. All these stuff including waiting(sleep functions, or waiting for a flag etc) do they rapidly check for the condition to be satisfied?
modern CPU's execute many instructions "out-of-order" to achieve speed
they look ahead to see what instructions may be executed without being affected by those before them
and, they execute code in multiple pipe-lines
a major part of optimizing code is understanding how the CPU executes instructions
with modern CPU's - that is not a simple thing - i am mystified by what is faster quite often - lol
as for the Sleep function - that is unrelated to code execution time
when your process (or program, or thread) is asleep, the CPU and operating system allow other threads to execute
a good example is waiting for a key to be pressed by the user
if i call the operating system function to get a key in a loop, it executes very fast
and - it hogs all the processor power waiting for a key :P
so - we insert a call to the Sleep function to allow other threads to execute
i typically use something like 40 or 50 milliseconds
40 mS will accommodate a typist going 250 WPM :bg
even faster, actually...
i write the loop so that, if a keypress is received, it does not Sleep before the next test to see if a key is present
only when no keypress is in the buffer do i call Sleep