How do I Search an Array?
This lesson describes two frequently used algorithms for finding entries in an array. Sample code is also provided.
There are a number of ways to search arrays and the choice of search depends largely on the type and size of the data set you are searching through. In particular, you may want to consider whether or not an array is to be sorted. A sorted array can significantly increase the speed with which you can search for an item in an array.
Refer to: How do I sort an array? for information on sorting arrays.
If an array is not sorted, your search algorithm will most likely have linear performance characteristics, and you would have to test every item in the array until you find the matching record. In the worst case scenario you would have to test every single item in the array. This is not necessarily a problem if the array is small or you run searches infrequently.
If you are using large data sets or you are searching for records on a frequent basis, then you may want to use a binary search algorithm that has logarithmic performance characteristics. This kind of algorithm is also called a boolean search and makes assumptions about how the data is stored in the array. These assumptions allow the search space to be halved at each iteration.
To start with, we will create the array we want to search through. LiveCode arrays are associative arrays and store both a key and an entry. We can use this representation to sort the arrays or index the entries through the keys. In order to provide an array that works for both the linear and logarithmic searching algorithm, we order the content of the array alphabetically. You can test how the binary search algorithm starts to deteriorate when you start to change the order of the array.
# first we set up the entries in our array local tArray put "Adams, Alexander" into tArray put "Beaumont, Benjamin" into tArray put "Best, Tony" into tArray put "Case, Doreen" into tArray put "Frog, Diana" into tArray put "Kenyon, Oliver" into tArray put "Macphail, Ian" into tArray put "Miller, Kevin" into tArray put "Nobody, Peter" into tArray put "Pepe, Tio" into tArray put "Ralph, Sue" into tArray put "Scott, Peter" into tArray put "Ship, Steve" into tArray put "Smith, John" into tArray put "Walters, Chris" into tArray
Note: The array key starts at 1 and increments by 1 for each new item that is added. It is important that you follow this numbering scheme for the binary search algorithm in this example to work. An item stored in array element 0 would not be found.
A Linear Search algorithm is an iterative process that tests one entry of the array at a time. In this case we start at the beginning of the array and compare every item in the array against the item we are searching for. The process stops once the item we were looking for has been found or all the array items have been tested and a match could not be found.
The worst case scenario for this algorithm is not finding a matching record. In this case it is necessary to step through the entire array and match against every single item. If we have 1024 items in the array, then we have to test all 1024 items to confirm that a match cannot be found.
function linearSearch @pArray pItem local tIndex local tResult # return 0 if we cannot find an item in the array put 0 into tResult # get the keys of the array get the keys of pArray split it by return # visit every array element indexed by a key repeat for each element tIndex in it # test if the current array element is a match with the item we are looking for if pArray[tIndex] = pItem then put tIndex into tResult exit repeat end if end repeat return tResult end linearSearch
The function can be called as follows:
put linearSearch (tArray, "Best, Tony") into tArrayKey
The first argument to the function is the array we are searching through. This is passed in by reference only. The second argument is the item we are looking for. The result is the key that refers to the item of the array we matched against. This function returns 0 if no item is found. Remember how this ties in with the representation of the array where the keys start at 1.
A Logarithmic Search algorithm reduces the search area in the array by one half at each iteration. This provides significant performance advantages, compared to linear approaches. This performance is maintained, regardless if a match can be found in the array or not. If the array we are searching through had 1024 items, then the worst case scenario would only take 10 iterations to complete.
As the array is sorted, we compare our search item against the item that lies in the middle of the array and then test whether or not our item is greater or smaller than the item we are comparing it against. The result of this comparison allows us to discard half of the array items. So the first comparison would reduce the search space from 1024 items to 512. The second search would reduce the search space from 512 to 256 and so on, until we narrow our search down to one item that is or is not a match.
The algorithm in this example implements binary search using a programming technique referred to as recursion. Recursion means that the function can call itself from within itself. The variables in each instance of the call history are local to that instance of the function call. This allows us to break down the search space by a half, each time the binary search algorithm calls itself, without having to worry about keeping track of where exactly we are searching in the array.
function logarithmicSearch @pArray pItem pLeft pRight local tIndex local tResult # create a new range pointer that points to the item that lies # right between the left and right range pointers put round ((pLeft + pRight) / 2) into tIndex # if we have found the matching item then stop processing and return it if pArray[tIndex] = pItem then put tIndex into tResult # if any of the range pointers have the same value # then the item we are looking for does not exist in the array and we return 0 else if (pLeft = pRight or pLeft = tIndex or pRight = tIndex) then put 0 into tResult # if we have not yet found a match and none of the range pointers have the same # value then call logarithmicSearch again with a smaller search range # we are effectively halving the search area, each time we call logarithmicSearch else if pArray[tIndex] > pItem then put logarithmicSearch (pArray, pItem, pLeft, tIndex) into tResult else put logarithmicSearch (pArray, pItem, tIndex, pRight) into tResult end if return tResult end logarithmicSearch
The function can be called as follows:
put logarithmicSearch (tSortedArray, "Best, Tony", 0, 15) into tArrayKey
The first argument to the function is the array we are searching through. We are passing this into the function by reference only. The second argument is the item we are looking for. The third and fourth arguments provide the array range we are searching through. The range starts with 0 and ends with 15. 0 is one less than the start key of the array and 15 is the last key of the array. The result is the value of the key that refers to the item of the array we matched against. This function returns 0 if no item is found. This ties in with the representation of the array when the keys start at 1.
Note: The reason for providing 0 as the starting point of the array is due to the way by which the array range is halved at each iteration. In this implementation we round the result of halving the array space. This means that we would never access an array item with a key of 0.