Serial search and hashing in C++ data structures


In a serial search, we step through an array one item at a time, looking for a desired item; the search stops when the item is found or when the search has examined each item without success. This technique is probably the easiest to implement and is applicable to many situations. The running time of serial search is easy to analyze: we count the number of operations required by the algorithm, rather than measuring the actual time. For searching an array, a common approach is to count one operation each time the algorithm accesses an element of the array. Usually, when we discuss running times, we consider the "hardest" inputs: for example, a search that requires the algorithm to access the largest number of array elements.
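To make the counting concrete, here is a minimal sketch of serial search in C++. The function name and the int element type are illustrative choices, not taken from the original text:

    #include <cstddef>

    // Serial search: examine data[0], data[1], ... in turn.
    // Returns the index of the first occurrence of target,
    // or size if the target is not present.
    std::size_t serial_search(const int data[], std::size_t size, int target)
    {
        for (std::size_t i = 0; i < size; ++i)
        {
            if (data[i] == target)  // one array access per comparison
            {
                return i;           // found after i + 1 accesses
            }
        }
        return size;                // not found: all size elements accessed
    }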

This is called the worst-case running time. For serial search, the worst-case running time occurs when the desired item is not in the array. In this case, the algorithm accesses every element. Thus, for an array of n elements, the worst-case time for serial search requires n array accesses. An alternative to the worst-case running time is the average-case running time, which is obtained by averaging the running times over all inputs of a particular kind.

For example, if our array contains ten elements, then searching for a target that occurs at the first location requires just one array access, searching for a target at the second location requires two array accesses, and so on through the final location, which requires ten accesses. The average of all these searches is (1 + 2 + ... + 10) / 10 = 5.5 accesses; in general, for an array of n elements, the average is (n + 1) / 2 accesses. Both the worst-case time and the average-case time are O(n), but the average case is nevertheless about half the worst case.

A third way to measure running time is called the best-case running time, and as the name suggests, it takes the most optimistic view. The best-case running time is defined as the smallest of all the running times on inputs of a particular size. For serial search, the best case occurs when the target is found at the front of the array, requiring only one array access. Thus, for an array of n elements, the best-case time for serial search is just one array access. Unless the best-case behavior occurs with high probability, the best-case running time is generally not used during analysis.

Hashing has worst-case behavior that is linear for finding a target, but with some care, hashing can be dramatically fast in the average case. Hashing also makes it easy to add and delete elements from the collection that is being searched. To be specific, suppose the information about each student is an object of the following form, with the student ID stored in its key field. We call each of these objects a record.
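The original listing for this record type has not survived in this copy; a plausible minimal reconstruction in C++ is a struct whose key member holds the student ID:

    // One of these objects is kept for each student; the student ID
    // serves as the record's key. The other fields are placeholders.
    struct StudentRecord
    {
        int key;    // student ID
        // ... name, address, and other data could follow
    };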

Of course, there might be other information in each student record. If the student IDs are all in the range 0 to 99, the record for student ID k can be retrieved immediately, since we know it is in data[k]. What, however, if the student IDs do not form a neat range like 0 to 99? Suppose that we only know that there will be a hundred or fewer students and that their IDs will be distributed in the range 0 to 9999. We could then use an array with 10,000 components, but that seems wasteful, since only a small fraction of the array would be used.
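In the neat-range case, retrieval is a single indexing step. A sketch, assuming the StudentRecord type shown above and IDs in the range 0 to 99 (the function name is illustrative):

    // Direct addressing: the student ID itself is the array index.
    StudentRecord data[100];

    StudentRecord& record_for(int id)   // requires 0 <= id <= 99
    {
        return data[id];                // one array access, no searching
    }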

It appears that we have to store the records in an array with 100 elements and to use a serial search through this array whenever we wish to find a particular student ID. But if we are clever, we can store the records in a relatively small array and still retrieve students by ID much faster than we could by serial search. In this case, we can store the records in an array called data with only 100 components. We'll store the record with student ID k at location data[k % 100].

For example, a record whose student ID ends in the digits 07 is stored in array component data[7]. This general technique is called hashing. Each record requires a unique value called its key. In our example the student ID is the key, but other, more complex keys are sometimes used. A function, called the hash function, maps keys to array indices. Suppose we name our hash function hash. If a record has a key of k, then we will try to store that record at location data[hash(k)].

Using the hash function to compute the correct array index is called hashing the key to an array index. The hash function must be chosen so that its return value is always a valid index for the array. Given this hash function, any set of keys in which no two keys share the same last two digits produces a different index for every key when it is hashed.

Thus, for such keys, hash is a perfect hash function. Unfortunately, a perfect hash function cannot always be found. Suppose the collection instead contains two student IDs that end with the same two digits, say two IDs that both end in 03. The first record will be stored in data[3], but where will the second be placed? There are now two different records that belong in data[3].

This situation is known as a collision. In this case, we could redefine our hash function to avoid the collision, but in practice you do not know the exact numbers that will occur as keys, and therefore you cannot design a hash function that is guaranteed to be free of collisions. Typically, though, you do know an upper bound on how many keys there will be. The usual approach is to use an array size that is larger than needed; the extra array positions make collisions less likely.
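The effect is easy to demonstrate with the k % 100 mapping used above. The two IDs below are hypothetical, chosen only because they share their last two digits:

    #include <iostream>

    int main()
    {
        // Two hypothetical student IDs that both end in 03:
        std::cout << (3203 % 100) << '\n';  // prints 3
        std::cout << (9903 % 100) << '\n';  // prints 3 -- same index: a collision
    }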

A good hash function will distribute the keys uniformly throughout the locations of the array. If the array indices range from 0 to 99, then you might use the following hash function to produce an array index for a record with a given key:
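The listing itself is missing from this copy of the text; a standard choice for indices 0 to 99, consistent with the k % 100 examples above, is the remainder operator:

    #include <cstddef>

    // Maps any non-negative key to a valid index in the range 0 to 99.
    std::size_t hash(int key)
    {
        return key % 100;
    }

    // A record with key k is then stored at data[hash(k)].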
