Print Page - A demonstration of indexed lookup

Title: A demonstration of indexed lookup
Post by: Larry Hammick on December 13, 2009, 02:40:24 PM

Attached is a demonstration of the generation and use of an index into a sorted table. The program reads a sorted list of strings from a text file. It then counts the lines and generates
-- a table of the addresses of those lines in memory
-- a table of dwords, one for each line, in which the lower word is a negative offset and the upper word a positive offset, allowing the main routine (IndexedLookup) to find a string among the 198 given ones after at most 8 string comparisons. :8)
The ASM file is about 350 lines. It's approach would be okay for a list of up to 16K different strings, but the code could be easily adjusted for larger tables.

Title: Re: A demonstration of indexed lookup
Post by: z941998 on December 14, 2009, 07:29:18 AM

Thanks Larry.

I was not expecting re-entrant code, but now I see how to use re-entrant code with indexs/arrays/tables.
Of course with me programing, I would have to add many safety checks everywhere, as I know there is a limit before I blow it up.

Steve

Title: Re: A demonstration of indexed lookup
Post by: Larry Hammick on December 14, 2009, 11:39:17 AM

Since each re-entry splits a portion of table into two nearly equal pieces, the stack growth is predictable, and it is only on the order of log_2(tablesize). There might be a better way to construct that sort of index. It's easy if the table has 2^n - 1 entries, but not so easy in the general case. For say 15 entries, it looks like this:

Code Select

Title: Re: A demonstration of indexed lookup
Post by: ecube on December 14, 2009, 11:51:41 AM

Can you elaborate more on how this works please, how does the at most 8 searches yeild the found entry always with up to 16k entries? I get you're turning them into dwords or something, but how does signed dwords help you? Also why do you start in the middle of the table?

Title: Re: A demonstration of indexed lookup
Post by: z941998 on December 14, 2009, 12:14:03 PM

When you do a serach starting at the midway point, there are three posible outcomes. One: the serach table and the serach item match and you are done, two: the search item is less than the current (midway) value therefore you must seek the value to the right persay, three: the search item is greater than the current (midway) value therefore you must seek the value to the left persay.

Each decision made reduces your search area in approximatelly have.

Example 16 items, midway point is 8, min is 0, search for 2.
first evaluation [16,8,0,2] 8>4 therefore look left (adjust your pointers here min=min, max=oldmid, newmid = (max-min)/2
second evaluate [8,4,0,2] 4>2 therefore leftagain
third evalue [4,2,0,2] 2=2 we are done.

When quanties grow each decision is approximately n/2 away if they don't match up.

Title: Re: A demonstration of indexed lookup
Post by: dedndave on December 14, 2009, 12:30:39 PM

E^cube - i think the 8 passes was for 198 entries - sounds like a recursive binary search tree

Title: Re: A demonstration of indexed lookup
Post by: ecube on December 14, 2009, 12:44:53 PM

Wow that's awesome, wikipedia said D. A. Heger (2004)[2] presented a performance comparison of binary search trees.Treap was found to have the best average performance, while red-black tree was found to have the smallest amount of performance fluctuations. How do you think recursive binary search trees rate against those, as thing seems really fast from your explanation.

Title: Re: A demonstration of indexed lookup
Post by: z941998 on December 14, 2009, 12:50:57 PM

Dedndave is right.

Use recursive to build your lookup table as Larry as shown.

Then control your three pointers to do the evaluations, no recursive at this level.

With word ptrs we have 64K possible outcomes, divide in have for signed control gives 32K and with 8 decisions you have 16k per side.

Sorry my earilier message was bit off, Daves reply always puts me back on track.

Title: Re: A demonstration of indexed lookup
Post by: z941998 on December 14, 2009, 12:54:01 PM

Do a search for multiway trees, you get better examples that the red-black examples.

Title: Re: A demonstration of indexed lookup
Post by: ecube on December 14, 2009, 01:03:06 PM

thanks i'll bing that,also what about really large searches, of files containing megabytes/gigabytes of entries, will this method still be plausible?

Title: Re: A demonstration of indexed lookup
Post by: Larry Hammick on December 15, 2009, 01:40:22 AM

8 searches are enough for up to 255 strings. For 2^n - 1 strings you would need n searches, never more, but usually n or n-1.

Quote from: E^cube on December 14, 2009, 01:03:06 PM
thanks i'll bing that,also what about really large searches, of files containing megabytes/gigabytes of entries, will this method still be plausible?

Good question. :D I'm gonna try it, and that's why I am right now writing a tool for making very big obj files containing data only -- world's crudest assembler, written in QBasic for DOS!

In the source code that I posted, the index entries are multiples of 4. Better idea: make them just signed short integers and let the searching routine apply the scaling factor of 4.

Title: Re: A demonstration of indexed lookup
Post by: Slugsnack on December 15, 2009, 01:44:37 AM

Now try skip lists ^^ Those are even cooler

Title: Re: A demonstration of indexed lookup
Post by: hutch-- on December 15, 2009, 02:31:52 AM

Larry,

Give either FDA.EXE or FDA2.EXE a try, they work is already done to make object modules from data files. FDA and FDA2 are two toys in the masm32 project.

Title: Re: A demonstration of indexed lookup
Post by: Larry Hammick on December 15, 2009, 04:38:41 PM

Here's a variant of the re-entrant routine called "tree" that generates an index table. This time the records are 8 bytes apiece, and the scaling factor of 8 is left out of the index. Thus it's general enough, in principle, for tables of up to 2^32 - 1 records.

Code Select

tree_8 proc bottom:DWORD,top:DWORD
;Re-entrant routine, fills in a table of quadwords with pairs
;of signed integers determined by the size of the table.
;The input variables are the top and bottom of a memory chunk of
;uninitialized 8-byte records. tree_8 fills it in with positive
;and negative offsets (by record number, not record address)
;and returns eax=the address (not record number) of the entry point
;for a search routine.
local midd:DWORD
    mov  edx,bottom
    mov  eax,top
    sub  eax,edx
    shr  eax,4
    shl  eax,3         ;sets or clears the zero flag
    lea  eax,[eax+edx]
    mov  midd,eax
    mov  dword ptr[eax],0
    jz   short @F
      invoke tree_8,edx,eax
    mov  edx,midd
    sub  eax,edx
    sar  eax,3
    mov  [edx],eax
@@: mov  edx,midd
    add  edx,8
    cmp  edx,top
    je   short @F
      invoke tree_8,edx,top
    mov  edx,midd
    sub  eax,edx
    sar  eax,3
    mov  [edx+4],eax
@@: mov  eax,midd
    ret
tree_8 endp

I googled up a word list of 90,000+ words, and my low-tech data object assembler is working, so, soon I'll have a demo of a honkin' fast dictionary-style lookup.

Title: Re: A demonstration of indexed lookup
Post by: z941998 on December 16, 2009, 06:10:23 AM

@slugsnack: Very nice evolution to multiway trees/nodes and I like the probalistic (conditional logic) approach.

@E^cube: if you have 300 keys (indexs) per a node and using a multiway tree approach with an order hieght of 5, you can have approximately 6 billions keys to search/find. I have included a small spreadsheet that you can change the number of keys per node and then look at the 5 levels and see how many keys you will have. If you evolutionize to skip tree, I don't know what will happen, but at first glance it appears to be better than multiway process.

@Larry: Question: Your first listing had a cmp 0A0Ah, if zero then exit out. Does this mean a match was found?

Title: Re: A demonstration of indexed lookup
Post by: Larry Hammick on December 16, 2009, 08:26:18 AM

Quote from: z941998 on December 16, 2009, 06:10:23 AM
@Larry: Question: Your first listing had a cmp 0A0Ah, if zero then exit out. Does this mean a match was found?

Yes. The strings have been trimmed and the line-ends have been replaced with the line feed character 0Ah.

Title: Re: A demonstration of indexed lookup
Post by: Larry Hammick on December 17, 2009, 12:37:11 PM

http://www3.telus.net/ldh/asm/WORDLIST.ZIP

That zip contains the source code and build files for wordlist.dll, wordlist.inc, wordlist.lib, and demo.exe. The dll contains the indexing and lookup routines, and demo.exe specifies a text file (included) of 90,000+ sorted strings, and calls the dll to look up specific strings or determine if they are not present.

Update: that demo breaks down for bigger text files, because of these errors in wordlist.asm:

Code Select

    jc   short IL_look_higher
;IL_look_lower:
    movsx eax,word ptr[esi+8*ebx+4]
    jmp  short IL_adjust
IL_look_higher:
    movsx eax,word ptr[esi+8*ebx]
IL_adjust:

Should be:

Code Select


    jc   short IL_look_higher
;IL_look_lower:
    mov eax,[esi+8*ebx+4]
    jmp  short IL_adjust
IL_look_higher:
    mov eax,[esi+8*ebx]
IL_adjust:

I used my old friend QBasic to generate a text file of 450,000+ "words", and after a test or two, I'll upload the demo for files of that size.

Update:
Okay, the above zip file is now corrected and contains the new and bigger word list.

Title: Re: A demonstration of indexed lookup
Post by: ecube on December 25, 2009, 09:32:15 PM

Outstanding work :U I tested and it seems to work correctly :bg this would be great on databases, I think with your permission Hutch should try and add to masm32 so all can benefit from blazing fast searches

Title: Re: A demonstration of indexed lookup
Post by: Larry Hammick on January 02, 2010, 04:00:48 AM

Quote from: E^cube on December 25, 2009, 09:32:15 PM
Outstanding work :U I tested and it seems to work correctly :bg this would be great on databases, I think with your permission Hutch should try and add to masm32 so all can benefit from blazing fast searches

Anybody can do anything he likes with that one.

Title: Re: A demonstration of indexed lookup
Post by: z941998 on May 13, 2011, 02:19:53 AM

I took the next step to creating an automatic platform for handling Trees/Indexed Lookups for my own library of tools.

I added a few extra features, like being able to control the terminator value when normalizing and added more compare routines (ones that can handle numeric data - integer format at this time).

I can load my test data, create the tree/index's with no apparent problems, however, when I perform the lookup, I am getting GPFs.

I think this is a result of not understanding how each designed level (ie: byte, word, and dword) of Trees/index's should be controlled. I am thinking I may have an issue with creating the Tree and also the lookup to sync with the tree.

I need to get back on track, so have a crack and provide me your feedback. See attachment. Thanks Steve

The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: Larry Hammick on December 13, 2009, 02:40:24 PM