News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Binary Search Help

Started by chris.bowden, December 09, 2008, 02:31:23 PM

Previous topic - Next topic

chris.bowden

Hi Everyone! Hopefully most of the traffic on this site doesn't come from us kiddies in assembler classes  :wink
I'm having slight troubles with a Binary Search program in assembler. We're not using MASM (we're using good 'ol Motorola 68k) but I think that someone will be able to lead me in the right direction :)

The purpose of the program is to load an array of data from some random file. The data in this random file is always integers -- positive and negative. Then we had to craft 2 subroutines, one to collect user input, one for the binary tree recursion. My user input works just dandy. My binary tree subroutine however is a little ... well off. I am purposely controlling the input so that elements will be "found" in the array. For some values, the algorithm can find it and correctly return the index back to the driving program to let the user know "hey, we found your value".

I have spent a few hours debugging and have found the troublesome code, but I can't seem to piece it together in the right way. What is happening is the binary search function will run a couple times, and then randomly get hooked up on one index, and continually stare at that index. We are passing addresses rather than values from the driving program for the hi & lo properties of the binary search.

Any ideas are appreciated :)

Mirno

Firstly take your problem down to a minimum simplified example...
As your code is recursive, unless there is some state persisting (a stack overflow perhaps) then each recursive call should be identical to the others semantically - so drop down to the simplest search of 2 numbers (one doesn't work as you never traverse a branch).
If that works then 3 (there are subtleties with these two cases).

My immediate thought was that you'll be failing whenever the last item is either the last thing in the left or right hand side of your list.

The second thing will be testing that you don't accidentally discard the middle item when searching an odd length list (which is why you also need to test 3 items too).

Mirno

Rockoon

Can you give a high level pseudo code example of your traversal algorithm?

A rough implementation is something like:

function bsearch(node, value)
  if (node=0) then return node
  if (value < node.value) return bsearch(node.lowchild, value)
  if (value > node.value) return bsearch(node.highchild, value)
  return node


here, 'node' is assumed to be a pointer with '0' indicating an illegal/unallocated node.. it is also presumed that all nodes either have a valid pointer in lowchild and highchild, or a pointer = 0.

It is the last part there that could be tripping you up. Are unallocated nodes handled correctly?

The above function would return a pointer of 0 if the value is not found within the tree
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

KeepingRealBusy

Chris,

I have found that the problem was always related to calculating the mid point. If using indexes, add the low and high and divide by 2. However, if the low is 1 and the high is 2, 1+2=3, 3/2, is 1, and then 1+1 is 2, 2/2 = 1 = you are stuck there  forever since you need to get to 2 for the match. The solution is to check if the first+1 is the same as the second, if so choose the second, if not a match, then search for 1 to 2-1 or 1 to 1 thus finding the match. Watch out if using offsets to values. Insure tha the array of pointers (offsets) is aligned on a DWORD. (Low+High)/2 could get you an odd address, i.e., 100000 + 100004 = 200004, 200004/2 = 100002, not a valid offset in the array. You have to mask any result with -4 to insure good pointers, and the same procedure as with indexes, if the first offset + sizeof DWORD is the same as the second offset, then use the second.

Dave.

Rockoon

You mentioned that your data was random, but never mentioned sorting it. I assumed it was a binary TREE search...

A regular binary search through a sorted list definitely has a few gotchas. I always refer to Knuth when implementing it.

His technique essentialy uses the idea that your high index should never be inclusive to the window being searched.

A window of 1 element has high = low + 1. The only possibility is table[low]
A window of 2 elements has high = low + 2. The only possibilites are table[low] or table[low + 1]
and so on

In this way, you never have an issue with the middle point rounding

This implementation has the added advantage that you only need 1 comparison per iteration!

pseudo-code:

function bsearch(table, value)
  low = 0
  high = table.length
  while (low < high)
    mid = (low + high) >> 1
    if (table[mid] < value)  low = mid + 1  else  high = mid
  wend
  return low

upon return, table[low] = value OR low is the insertion point if you intend to insert said value

a single comparison per iteration, returns the insertion point too.. whats not to like?

When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.