News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Table

Started by ecube, July 16, 2009, 11:08:57 PM

Previous topic - Next topic

hutch--

cube,

what is the criterion that you are searching by ? It effects how you best do the search you need to do. If it was words of text you do it alphabetically, with numbers you can use a range etc ...

Give us some idea and there may be a really neat way to do what you are after.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Keep it simple:

include \masm32\include\masm32rt.inc

.data
MyTable dd 12, 34, 56, 78

.code
start:
xor ebx, ebx
.Repeat
print str$(ebx), 9
.if MyTable[4*ebx]>77 ; value to compare with
print "Gotcha"
.Break
.endif
print str$(MyTable[4*ebx]), 13, 10
inc ebx
.Until ebx==4
getkey
exit

end start

Neo

Quote from: E^cube on July 17, 2009, 08:17:58 AM
Slugsnack can you elaborate on what a binary tree is and why its faster? I mean we're loopin through dwords, how can you move any faster? I've always been curious how databases operate so fast, but been strugglin to find an answer.
It completely depends on what you're searching through and what you're searching for.  Slugsnack was probably referring to a Binary Search Tree (BST).  A Balanced BST can guarantee O(logn) time to search through n data points, which can be better than linear searching for large enough n, but the cross-over point when it becomes faster varies a lot with the problem at hand.

When databases use a tree instead of a hash (a data structure good for looking up values based on exact key-numbers), they use a slightly different type of search tree known as a B-Tree (or a B+ Tree).  The tree has the advantage of being able to check for values near a query value or in a range of values, whereas a good hash has the advantage of faster lookup of exact values (and sometimes less space).

Hope that helps (or at least gives you some interesting stuff to read)  :U

Slugsnack

well at the moment you have a linear search time. the average time taken to find something is n/2. a binary search tree works by having nodes which you traverse down to get to the one you want

first off, each node has 2 pointers attributed to it. this can be coded as a structure.. with a dword for the value and a left and a right pointer. so 3 dwords in total.

now then say we started with a value 50. then our first node is 50 with 2 null pointers for left + right. in an add function you wanna check whether the next node you wanna add is bigger or smaller. let's say we wanna add 25. allocate memory for the new structure. initialise the value of the new structure. and zero its left + right node. then since it's smaller than 50, change the first structure's left pointer to the address of the newly allocated structure. you keep doing this and then when you are finding something you traverse down the tree on average splitting the tree in half each time. i think you can see how that's faster.

however if you wanna make it better you want to make it a balanced tree. unbalanced/degenerate trees are where each parent only has one child. so to get to something the tree then has the same efficiency as a regular linear search like what you're currently doing.

bsts are very easy to implement and i'm sure you will not have problems with doing so. the 'best' way or imo the most natural way to code the functions for traversal/deletion/adding is to do so recursively. since a binary search tree is essentially a recursive abstract data type this seems to me the best way to do it and in the past every time i've implemented one i found this was the easiest. alternatively.. do it in a while loop and break when you succeed finding the node you want. i have some C code for it if you want but there's loads of example code around the net

ecube

I have no specific goals in mind I was just curious how someone could possible speed up a search when you're already loopin via dwords. Granted using some kind of cache or somthing on common requests and jump right to that  common spot when needed would provide a speed up other then that im lost, i'm trying to understand this binary search tree stuff but still isn't syncing in, so i'll google more thanks for the information.

Slugsnack

oh you're looking at it wrong haha. it's not about the code. it's about the algorithm. a binary search tree works essentially because when you add nodes you make sure the resulting tree is ORDERED. and then you traverse down saying, is what i am looking for bigger or smaller than this node. if smaller go left, if bigger go left. anyway i think with a few diagrams you'll understand straight away

dedndave

simply put...
a binary search tree continually divides the data in half until the item is found
1) is it this half or that half
2) if item is in this half - split that in half and go to 1
3) item not in this half - select the other half and go to 2
that is why it's called a "binary" tree - it's either a 0 or a 1

ecube

oh thats simple enough, why couldn't they have said that on various sites I been too  :bg I was beginning to think binary trees even in their simplest forms where some kind of heavy duty calculas.  ::)

dedndave

or a tree with two kinds of leaves   :P

hutch--

There are a few tricks with trees, randomise the source before building the tree then balance the tree and it will keep performance up across a wider range of items in the tree. If you don't need to expand it forever a hash table is usually faster if its designed properly.

A database usually uses an index so you scan the index by whatever criterion rather than scan all of the data and this is not all that hard to emulate in assembler, allocate an array of whatever sized elements then allocate an array of pointers to the first array and you can then rip through the pointers reasonably quickly.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

FORTRANS

Quote from: E^cube on July 17, 2009, 10:24:58 PM
I have no specific goals in mind I was just curious how someone could possible speed up a search when you're already loopin via dwords. Granted using some kind of cache or somthing on common requests and jump right to that  common spot when needed would provide a speed up other then that im lost, i'm trying to understand this binary search tree stuff but still isn't syncing in, so i'll google more thanks for the information.

Hi, E^cube

   A simple analysis of a linear search versus a binary search.
I leave out the tree structure for now.  And I will over simplify
some things as well, just to let the real guys criticize.

   Let there be a set of numbers that need to be searched.  On
Friday there is a raffle at some place.  255 people give their
business cards to the office droid, he types the information into
a spreadsheet, and then puts the card in a fish bowl.  After all
the cards are processed, the cards are jumbled up, and one is
selected.  C3PO then searches the data array in a linear fashion.

   (1)  He starts at the first element in the array.
   (2)  He then compares the local phone number on the card to the
one in the array, and sees if they are equal.
   (2a)  If yes, he marks that record as the winner.  Then he goes
to 3.
   (2b)  If they are not equal, he looks at the next record and
goes to 2.
   (3)   Quit, or process the second and third place winners as
required.
   Eventually the right number will be found.

   In the best case the numbers will match on the first number in
the array (#1) with one comparision.  In the worst case the numbers
will match on the last number in the array (#255), and there will
have been 255 comparisions.  (You can go from zero instead of one
and go to 254, or from any # to # + 254 if wanted.)  On average it
will take 127.5 comparisions to find a match.

   Now let's look at a binary search.  After the data is entered,
sort the array from smallest to the largest.  The smallest is in
Array(1) and the largest is in Array(255).  3CPO is now told to set
a minimum indicator to 1 for Array(1), and a maximum indicator to
255 for Array(255).  I will call those AMin and AMax as I'm a poor
typist.  And now he uses the following algorithm.

   (1)  He takes AMin, adds AMax, and divides by 2, call that AMid.
   (2)  He looks at Array(Amid) and compares that to the phone
number on the business card, and sees if they are equal.
   (2a)  If yes, he marks that record as the winner.  Then he goes
to 4.
   (3)  They are not equal, he sees if the card is larger than the
Array(Amid) number.  If no, go to 3a, else go to 3b.
   (3a)  Current Array entry is too small, look at the remaining
larger values.  Take AMid + 1 and assign that to AMin.  Go to 1.
   (3b)  Current Array entry is too large, look at the remaining
smaller values.  Take AMid - 1 and assign that to AMax.  Go to 1.
   (4)   Quit, or process the second and third place winners as
required.
   Eventually the right number will be found.

   Let's walk through this for some contrived example.  Set the
Array elements equal to their array index.  Array(1) equals one
and so forth.  Set the card number to 48.

   (1)  AMid = ( 1 + 255 ) / 2 = 128. (2) Array(128) is 128 and
not equal to 48, so go to 3.  (3)  128 is larger than 48, go to 3b.
(3b)  So set AMax to 128 - 1, and go to 1.
   (1)  AMid = ( 1 + 127 ) / 2 = 64. (2) Array(64) is not equal,
go to 3.  (3)  64 is larger than 48, go to 3b.  (3b)  So set AMax
to 64 - 1, and go to 1.
   (1)  AMid = ( 1 + 63 ) / 2 = 32. (2) Array(32) is not equal,
go to 3.  (3)  32 is smaller than 48, go to 3a.  (3a)  So set
AMin to 32 + 1, and go to 1.
   (1)  AMid = ( 33 + 63 ) / 2 = 48. (2) Array(48) is equal to 48,
mark winner, and go to 4.  (4)  Quit.

   It took four iterations to find the match.  With the linear
search, it would have taken 48 comparisions to find it.  The best
case is still finding the match on the first comparision.  But now
the worst case is nine sets of comparisons A.O.T. 255.  A crude
example of a search for 1 in our contrived case gives the following
values.

Iteration  AMin AMid AMax
    1         1  128  255
    2         1   64  127
    3         1   32   63
    4         1   16   31
    5         1    8   15
    7         1    4    7
    8         1    2    3
    9         1    1    1

   A slight modification can eliminate the last degenerate case
for the stated conditions.  If the search key can be missing from
the database, it will be required.

   So, a binary search uses much fewer comparisons than a linear
search for most searches.  A big advantage.  The binary algorithm
is more complex, a disadvantage.  However it requires sorting the
data before using the search algorithm.  A much bigger disadvantage.

   For a small number of searches, you are better off using a linear
search.  But if you need a large number of searches in a large
database, the binary algorithm will eat the linear algorithm's lunch.

   The tree idea uses an index array, or list, to avoid the prelimiary
sort, and makes adding and deleting records in the database easier.
Another increase in complexity to gain speed when used often.

   Did that help any?

Regards,

Steve N.

Slugsnack

one nice thing about using an ADT is that it is dynamic. whether you are creating a new object if you are using OOP or allocating new memory for a sequential program, that is one benefit. of course you have a slight overhead on each new/deleted item. mind you, if you have a regular array of dwords and you were to delete an item.. then unless you shift all the elements after the deleted one left one, then you now waste 1 compare every loop. a sort of 'dynamic version' of an array is thereby a linked list ^_^ where deleting/adding is a case of allocating/freeing memory and changing pointers again. you can get ordered linked lists.. doubly linked lists, etc.

ADTs are very cool and useful in practice and something worth learning about. i'll leave you to read up on the ones i've mentioned if you're interested : ) but any questions and ask back here and i'm sure a lot of people will know. admittedly i've rarely seen ADTs used in assembly language but there is no real reason for that other than that it's a bit more complicated but once you do get to know them you can see where they are applicable and you'll learn to love them !

if you do want some example code i can rustle some up for you ; )