Supplemental Material for Baldwin and Scragg, Algorithms and Data Structures: The Science of Computing; Charles River Media, 2004
This lab revolves around a recursive algorithm for finding the largest value in an array. The algorithm’s best and worst case execution times differ dramatically. Recurrence relations are used to predict both times, and then an experiment determines whether these mathematical derivations describe the real behavior of the algorithm.
Consider finding the largest value in the first n positions in an array of integers. This problem might be described pictorially as follows:
One approach is to find the largest value in the first n-1 positions of the array and compare it to the nth value. If the nth value is larger, return it, otherwise return the largest value from the first n-1 positions. This idea can be phrased as a recursive algorithm in Java as follows. Note that A is the array in which to find the largest item.
static int findLargest( int[] A, int n ) {
if ( n <= 1 ) {
return A[0];
}
else if ( A[n-1] > findLargest( A, n-1 ) ) {
return A[n-1];
}
else {
return findLargest( A, n-1 );
}
}
Notice that if the nth value is the largest, then this
algorithm only sends one recursive findLargest
message (the one
in the comparison to A[n-1]). However, if the nth value
is not the largest, then this algorithm sends two recursive messages — one
in the comparison, and a second in order to return the maximum of the first
n-1 values. These observations suggest that the algorithm’s
best and worst case execution times will be different. This lab involves quantifying
and measuring the difference.
The algorithm’s best case will happen when every invocation of findLargest
discovers
that the last value in its section of the array is the largest — in
other words, when the values in the array are exactly in increasing order.
Conversely,
the worst case occurs when no invocation of findLargest
ever discovers
that the last value in its section is the largest — in other words, when
the values in the array are exactly in decreasing order.
To abstractly estimate the execution time of a recursive algorithm, set up
and solve a recurrence relation that counts the number of times the algorithm
executes some “typical” operation — an operation that is
somehow representative of the amount of work the algorithm does. In many value-producing
algorithms,
the number of return
statements executed is a good choice
for such an operation: every invocation of the algorithm must return exactly
once, so the number of return
statements executed “represents” the
total number of times the algorithm executes.
Recurrence analysis of findLargest
will produce mathematical
expressions for its best and worst case operation counts in terms of n.
The experiment then tests
whether these expressions also describe the algorithm’s real-world running
time. At best, a real program’s running time will be proportional to
the value of a theoretical operation count, so the hypothesis
for the experiment will be, roughly, that the best case running time of findLargest
is
proportional to one function of n, and the worst case running time
proportional to another.
The best case running times will probably be small, and may contain outliers. This makes those times hard to measure accurately. Relatively large values of n may be necessary in order to get measurable times. Chapter 4 of Algorithms and Data Structures: The Science of Computing contains other information about measuring running times, averaging multiple measurements to reduce random error, etc. If the experiment is done on a particularly fast computer, or with a low resolution clock function, the technique in Section 9.3 for measuring times less than the clock’s resolution may be helpful.
In at least some Java systems, the first time a program does something often
takes significantly longer than subsequent times. For example, the first findLargest
below
might take more time than the later two:
public static void main( String[] args ) {
int[] A = new int[ ... ];
int n = ...;
findLargest( A, n );
findLargest( A, n );
findLargest( A, n );
}
This suggests that writing a program that calls (and times) findLargest
once,
and then running that program many times, is not an accurate way to get multiple
measurements
of how long findLargest
takes. Rather, write a program that calls (and times)
findLargest
many times, and run the program once.
The worst case running times for findLargest
will probably be very large.
Surprisingly small values of n may be necessary to keep those times low enough
that the experiment completes in a reasonable amount of time.
As mentioned earlier, the hypothesis for this experiment will be that execution times are proportional to certain functions of n. Data analysis will therefore require testing the measured times for proportionality to the functions. Think about the precise meaning of “proportional” in order to devise an appropriate test.
This exercise consists of two main steps: mathematically deriving the best
and worst case performances of findLargest
, and then conducting
an experiment to verify that the mathematical results describe real-world performance.
Set up recurrence relations for the best and worst (i.e., smallest and largest)
number of return
statements findLargest
executes
while finding the largest value in an n-item array section.
Find, and
prove
correct,
closed
forms for
both recurrence relations. The recurrence relations and closed forms should
give the number of operations executed as functions of n.
The closed forms from the recurrence relations are, theoretically, proportional
to the actual best and worst case running times of findLargest
.
The second step in this lab is
to design
and conduct an experiment that tests this hypothesis. This entails writing
a complete program around the findLargest
method. This program
will call findLargest
on various n values for both
best and worst case arrays, time
those
calls,
and display the results in a form that can be analyzed for consistency with
the closed forms. See “The Experiment” in
the “Background” section
of this document for more suggestions on designing and conducting
this experiment.
This lab is due on Monday, February 28. Turn in a lab report describing your experiment. This report should discuss the following:
I would expect that you can cover these things in about 2 pages of text, not counting the program printout. However, I do not have hard and fast limits on how long the report has to be — the real requirement is that it say everything a reader needs to know in order to determine what your results are and whether they are believable, without using more text than necessary to say those things.
Portions copyright © 2004. Charles River Media. All rights reserved.
Revised Feb. 20, 2005