I was thinkin is someone out there can make a string searching faster than mine.
Have a look for this function
TextBoxProc proc hWnd:dword,uMsg:dword,wParam:dword,lParam:dword
LOCAL lptext:dword
LOCAL buff[256]:dword
LOCAL hFile:dword
LOCAL nIndex:dword
.if uMsg==WM_KEYUP
invoke mAlloc,1024
mov lptext,eax
xor ecx,ecx
mov ecx,0fffeh
mov [eax],ecx
invoke SendMessageW,hEdit,EM_GETLINE,0,lptext
invoke memfill,addr buff,1024,0
invoke WideCharToMultiByte,CP_ACP,0,lptext,38,addr buff,MAX_PATH,0,0
invoke SendMessageW,hEdit,EM_LINELENGTH,0,0
.if eax!=0
invoke SearchString,fFile,addr buff
mov nIndex,eax
.endif
dec nIndex
.if nIndex!=-1
push nIndex
pop f_ind
.endif
invoke SendMessage,lb,LB_SETCURSEL,f_ind,0
invoke GlobalFree,lptext
.endif
invoke CallWindowProc,tbp,hWnd,uMsg,wParam,lParam
ret
TextBoxProc endp
Farabi,
Tell us what character set you are using or more correctly what code page you are using as you should be able to use nearly any search algo to find words that you select. It looks like you are converting from Unicode to a multibyte format and if we know what you are doing, someone may be able to help.
Farabi,
I've tried to get your app assembled but it does not contain hardcoded include paths, so I gave up.
Without testing, the only section that is probably slow is
Quoteinvoke GetWord,lpFile,ecx,1
mov eax,l_str
; .if [edx].frbWordStruct.nSize>=eax
invoke szCmpi,[edx].frbWordStruct.lpAddress,lpString,l_str
No idea what GetWord is, it's not found in your *.asm code, but szCmpi is probably the bottleneck. Try replacing it with crt_strstri. Have a look at the strstri (http://msdn.microsoft.com/en-us/library/bb773439%28VS.85%29.aspx) docu.
Quote from: hutch-- on December 30, 2009, 08:49:29 AM
Farabi,
Tell us what character set you are using or more correctly what code page you are using as you should be able to use nearly any search algo to find words that you select. It looks like you are converting from Unicode to a multibyte format and if we know what you are doing, someone may be able to help.
Im using the conversion only because the Text box is unicode. So the character set should be the standard
Quote from: jj2007 on December 30, 2009, 09:28:04 AM
Farabi,
I've tried to get your app assembled but it does not contain hardcoded include paths, so I gave up.
Without testing, the only section that is probably slow is
Quoteinvoke GetWord,lpFile,ecx,1
mov eax,l_str
; .if [edx].frbWordStruct.nSize>=eax
invoke szCmpi,[edx].frbWordStruct.lpAddress,lpString,l_str
No idea what GetWord is, it's not found in your *.asm code, but szCmpi is probably the bottleneck. Try replacing it with crt_strstri. Have a look at the strstri (http://msdn.microsoft.com/en-us/library/bb773439%28VS.85%29.aspx) docu.
I think I miss uploaded it.
I used my template and some of the component is not on your.
This should work on your system. GetWord is on hNullString
Hi Farabi,
I haven't checked the code yet, I will though, but from a look at the posts I have to ask why you would convert back and forth to Unicode ? Since the NT API is natively Unicode for the most part, definitely the string functions, you're wasting an enormous amount of time in conversions. I have not written an application in ANSI for quite some time, the overhead for conversion is too costly for any string APIs as well as anything that uses COM. Since your target character set is Arabic, you need to convert back and forth to Unicode to search strings, R2L reading is used and the sorting would be affected since it is a non-roman alphabet. Trying to handle all of these in ANSI would be a chore and would definitely slow things down, again I haven't examined the code yet but switching your application to native Unicode can't do anything but help. Remember that most of the string search algos you'll find here are designed for L2R reading and a roman alphabet and would have to be adapted or discarded as unworkable in your target character set.
I also worked with trying to make a arabic dictionary/translator, it might work, after all arabic unicodes hibyte is same 06 all the time ala all arabic letters are in the shape 06xxh, but maybe you need to reverse the alphabetically search on arabic, after all my small book dictionary is backwards and so is all arabic text, reversed compared to latin letters
all it requires is change the code that handles unicode arabic buffers from this
MOV al,[ebx+buffer] ;ebx holds char number
to
MOV al,[ebx*2+buffer]
you see I mean just change to scale by 2 on all places you access the buffer in a sort algo or whatever algo you need to use there
Quote from: daydreamer on December 30, 2009, 07:35:40 PM
all arabic letters are in the shape 06xxh
If speed is an issue, that opens another road: Translate your source file into a single byte file without the 06 bytes, and use standard strstr or whatever. For displaying, you need to add the 06h again, of course.
Quote from: donkey on December 30, 2009, 04:00:26 PM
Hi Farabi,
I haven't checked the code yet, I will though, but from a look at the posts I have to ask why you would convert back and forth to Unicode ? Since the NT API is natively Unicode for the most part, definitely the string functions, you're wasting an enormous amount of time in conversions. I have not written an application in ANSI for quite some time, the overhead for conversion is too costly for any string APIs as well as anything that uses COM. Since your target character set is Arabic, you need to convert back and forth to Unicode to search strings, R2L reading is used and the sorting would be affected since it is a non-roman alphabet. Trying to handle all of these in ANSI would be a chore and would definitely slow things down, again I haven't examined the code yet but switching your application to native Unicode can't do anything but help. Remember that most of the string search algos you'll find here are designed for L2R reading and a roman alphabet and would have to be adapted or discarded as unworkable in your target character set.
No Im not going to make an arabic string searching, thats why I convert the string to an ANSI.
Farabi, to make it fast, you have several options. Here are two of them:
1. Load the dictionary to memory, allocate sorted (important) string pointer array and fill it with string pointers (to each keyword). Use then binary search to find a string in up to log2(stringcount) iterations: max 16 string-compare calls for 64k keywords.
2. If the dict is too large, divide it into blocks, for example each block for word-start letters (A-Z in english). Save block file offset and length somewhere. Now before searching a string, pick the first letter, load the corresponding group into memory as in first step, and again do a binary search.
blockindex = pszKeywordToFind[0] - firstletter_ascii ; english: pszKeywordToFind[0] - 'A'
loadBlock(hFile, blockOffsets[blockindex], blockSizes[blockindex])
To search without carrying of the letters case, convert your dict and the keyword to upper or lower case.
Binary search:
min = 0
max = NumberOfKeywords-1
while (min <= max)
{
pos = min + ((max - min) / 2)
cmpr = strcmp(keywords[pos], pszKeywordToFind)
if (cmpr == 0) then return pos ; found
if (cmpr < 0) then min = pos+1
else max = pos-1
}
return -1 ;not found
Note: string compare function MUST be the same in search and sort subroutines: strcmp, wcscmp, lstrcmp, ...
Well, the above code is the best search function I had.
I just wonder if anyone beat it.
Quote from: Farabi on January 02, 2010, 03:13:26 AM
Well, the above code is the best search function I had.
I just wonder if anyone beat it.
We (Hutch, Lingo, ...) can beat any search algo (http://www.masm32.com/board/index.php?topic=9370.0), provided we get a testbed that meets some basic standards, i.e.
1) is written in
ml or
Jwasm cpmpatible syntax
2) has Masm32 include paths, i.e. include
\masm32\include\masm32rt.inc etc
3) contains
all necessary libraries and includes.
Quote from: akane on December 31, 2009, 12:16:38 AM
To search without carrying of the letters case, convert your dict and the keyword to upper or lower case.
Good news: In Arabic there is only one case. :green