Supplemental Material for Baldwin and Scragg, Algorithms and Data Structures: The Science of Computing; Charles River Media, 2004 (Now published by Cengage Learning)
This document discusses the laboratory exercise entitled “Recurrence Relations and Execution Time”. This exercise requires students to use recurrence relations to predict the performance of an algorithm, and then do an experiment to test their predictions.
The primary purpose of this lab is to establish the relevance of recurrence relations to algorithms’ real-world behavior. To some extent, every experiment is about establishing the relevance of some theoretical model to real-world behavior, so this exercise can be seen as a paradigmatic exposure to experimentation’s role in science.
In addition to demonstrating that recurrence relations really can predict observable behavior, this lab has a number of secondary and tertiary roles. In order to demonstrate the real-world relevance of recurrence analyses, the lab asks students to perform two such analyses, and then to conduct an experiment. Thus the lab provides important practice working with recurrences, and designing and conducting experiments. Other benefits include
an early, informal encounter with exponential running times (findLargest's running times range from linear in the best case to exponential in the worst case). We deliberately designed the findLargest algorithm this way, in order to begin building students' intuition for how dramatically different exponential and linear growth are, and for how appallingly fast exponential times grow. Similarly, we deliberately set this lesson in an experiment, so that students will have direct, personal experience with the consequences of exponential growth.

This lab is intended to be used with Section 7.2 of Algorithms and Data Structures: The Science of Computing, the section that introduces recurrence relations. However, certain things in the lab preview material from later chapters: exponential growth reappears as the main topic in Chapter 15, and testing empirical data for consistency with mathematical predictions is done more rigorously in Section 9.2.3, once asymptotic notation is available. We feel that these are important ideas, and that exposing students to them relatively informally now helps set the stage for their later formal treatment. Instructors may want to refer explicitly to this lab as an example or motivator when students encounter the material more formally later in the book.
This lab consists of two distinct parts (setting up and solving recurrence relations, and conducting an experiment), each of which can be used by itself if desired. For example, the recurrence analysis of findLargest can be used as a homework exercise separately from the experiment (or as a separately collected prelab exercise for the experiment). Similarly, instructors can give students the closed forms for the recurrence relations and only ask students to do the experiment; this is particularly appropriate if the course schedule is such that this lab serves as an introductory motivator for studying recurrence relations, before students learn to create recurrences and find closed forms on their own. Instructors who wish to provide an easier exercise in recurrence relations can also give students the recurrence relations, and ask students just to find the closed forms (and do the experiment).
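For instructors who want a quick way to illustrate the difference between a recurrence relation and its closed form when assigning only part of the lab, a purely illustrative example (this is not the recurrence for findLargest, which students derive themselves) is

f(1) = b
f(n) = 2 f(n-1) + c    for n > 1

which has the closed form

f(n) = b 2^(n-1) + c (2^(n-1) - 1)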
Instructors should be prepared to help students with several features of this lab.
Many students have only a very vague notion of what “proportional” really means, at best understanding it to mean that one function gets bigger when another does. These students don’t realize that “proportional” specifically means that one function is a constant multiple of the other. We deliberately did not tell students how to test their data for proportionality to the functions from the recurrence analysis, because we wanted to take this opportunity to force students to come to some understanding of proportionality. This does not, however, mean that every student should be forced to devise a test by themselves — in most cases, simply coming to the realization that they do not understand proportionality well enough to test it readies students to understand and remember an explanation offered by the instructor. Give students a chance to think about proportionality, but then answer their questions when they ask.
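For example, one simple check an instructor might suggest (this is only a sketch, not something prescribed by the lab handout) is to divide each measured time by the value of the closed form at the corresponding n; if running time really is proportional to the closed form, the quotients should all be roughly equal. In Java, with hypothetical data:

// Illustrative proportionality check; the n values, times, and closed form f
// below are hypothetical placeholders for the students' own data and analysis.
public class ProportionalityCheck {

    // Hypothetical closed form from a recurrence analysis; replace as appropriate.
    static double f( int n ) {
        return n;
    }

    public static void main( String[] args ) {
        int[] ns = { 2000, 4000, 8000, 16000 };       // hypothetical problem sizes
        double[] times = { 1.1, 2.0, 4.2, 8.1 };      // hypothetical measured times (ms)

        // If time is proportional to f(n), these quotients should be roughly constant.
        for ( int i = 0; i < ns.length; i++ ) {
            System.out.println( "n = " + ns[i] + "  time/f(n) = " + times[i] / f( ns[i] ) );
        }
    }
}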
As mentioned earlier, many students initially understand an algorithm’s best case to be the problem that produces the smallest possible execution time, i.e., the smallest problem. Instructors may have to explain that the technical notions of “best case” and “worst case” are really ways of capturing factors other than problem size that influence an algorithm’s execution time. In this lab, the relevant factor is where in the array section the maximum value lies, which is independent of the natural measure of problem size (the size of the section).
Running Times and n Values

In the environment in which we developed this lab (Macintosh G4 processors running Mac OS X), n values of two or three thousand are required to get measurable times for the best case runs of findLargest, while n values around 25 yield painful (several seconds) run times for the worst case, and worst case run times become intolerable (several minutes) for n values around 30. This huge difference in n values will cause problems for some students: either they won't realize how big n should be for the best case, or they will try to use values too large for the worst case.
Best case running times will probably be small, with lots of variation between measurements, even with large values of n. These conditions may make it hard for students to gather data that definitively supports or refutes their hypothesis. Instructors should encourage students to do everything they can to gather good data (e.g., use a wide range of n values, make many measurements for each, try to eliminate as many external error sources as possible, etc.). Even with these efforts, however, students may not be able to get a consistent proportionality between their theoretical formula and their measured times.
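One way to make very small times measurable (again, a sketch of a technique an instructor might suggest, not part of the lab handout) is to time many repetitions of the call and divide by the number of repetitions, so that the total elapsed time is large compared to the clock's resolution. The inner loop below is a placeholder; students would replace it with a call to findLargest on an array set up to produce whichever case (best or worst) they are measuring.

// Illustrative timing loop; the inner "work" is a placeholder for a call to findLargest.
public class TimingHarness {
    public static void main( String[] args ) {
        int trials = 100;       // many repetitions, so the total time dwarfs clock resolution

        long start = System.currentTimeMillis();
        for ( int t = 0; t < trials; t++ ) {
            // Placeholder work; replace with the findLargest call being measured.
            long sum = 0;
            for ( int i = 0; i < 1000000; i++ ) {
                sum += i;
            }
        }
        long end = System.currentTimeMillis();

        System.out.println( "Average time per trial: "
                            + ( end - start ) / (double) trials + " ms" );
    }
}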
Students who use excessively large ns to measure worst case running time present an excellent opportunity to drive home the significance of exponential time. These students will often have some preliminary data for smaller ns, from which they can extrapolate the running time for the n they are currently using. If the closed form for the relevant recurrence relation is f(n), n1 and n2 are two n values, and t1 and t2 are the corresponding running times, then one expects that
t2 = t1 × f(n2) / f(n1)
Instructors may have to coach students a little on how to do this extrapolation. When students do the extrapolation, it is not uncommon for them to discover that they have started a run that would quite literally take millions of years to finish. The problem of exponential time is far more meaningful to a student who realizes that he or she has personally begun to wait out such a running time than to one who has only read or heard about it.
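As a purely hypothetical illustration of the arithmetic (the closed form 2^n and the numbers below are made up, not results from the lab): if a run with n1 = 25 took 5 seconds, then a run with n2 = 45 would be expected to take 5 × 2^45 / 2^25 = 5 × 2^20 seconds, or roughly 60 days. A few lines of Java make the same point:

// Hypothetical extrapolation of a worst case running time.  The closed form (2^n)
// and the measured values are illustrative only.
public class Extrapolate {
    public static void main( String[] args ) {
        double t1 = 5.0;        // measured running time for n1, in seconds
        int n1 = 25;
        int n2 = 45;            // the n value of the run now in progress
        double ratio = Math.pow( 2, n2 ) / Math.pow( 2, n1 );   // f(n2)/f(n1) with f(n) = 2^n
        double t2 = t1 * ratio; // predicted running time for n2, in seconds
        System.out.println( "Predicted time: about " + t2 / 86400.0 + " days" );
    }
}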
The findLargest Algorithm

The findLargest algorithm in this lab is not an obvious way of finding the largest element in an array. We deliberately chose it for its interesting running times, not for its practical utility. A few students will need help understanding how it works.
As n gets larger, the recursion in findLargest will eventually produce a stack overflow. This may be a problem for measuring best case running times; worst case running times become prohibitive long before the stack overflows. The value of n at which stack overflow occurs varies considerably from one Java environment to another. For example, we ran best case searches with n as high as 20000 without any overflow under Mac OS X on a PowerBook G4 with 768 megabytes of RAM, using both the CodeWarrior development environment and the command line Java interpreter (version 1.4.2_03). On a Sun SparcStation 5 using the command line Java interpreter (version 1.4.2_02), stack overflow occurred when n was slightly under 4000.
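If stack overflow does interfere with best case measurements, one workaround available on many Java implementations (a suggestion from us, not something the lab requires) is to request a larger thread stack when starting the program, where MainClassName below stands for the student's main class:

java -Xss16m MainClassName

The -Xss option sets the stack size for Java threads; how much it helps varies between implementations, so students may need to experiment with the value.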
It is very tempting for students to state the conclusion that they believe is the desired one for an experiment, even if it is not the conclusion that the data supports. Such behavior should not be tolerated! The whole point of experimentation is to find out if expectations are wrong. We tell students to always draw the conclusion their data supports (after carefully checking for errors in the experiment if the data seems unbelievable), and we take support for the stated conclusion into consideration when grading lab reports.
In this experiment, the most likely reason (other than poor experiment design) for a student to have data that do not support the expected conclusion is low-quality times for findLargest's best case (see the discussion under “Running Times and n Values” for reasons why this data may be poor). In this situation, students should draw a cautious conclusion (e.g., “the data are unclear, but seem to suggest…”, or even “the data neither support nor refute the hypothesis”).
While Java's System.currentTimeMillis method always measures time in units of milliseconds, it need not have a resolution of 1 millisecond in all Java implementations. In other words, the clock may only “tick” every couple of milliseconds, as long as the value returned by System.currentTimeMillis changes by the appropriate number of milliseconds at each tick. For example, we have seen some Java implementations in which the clock ticks every millisecond, and others in which it ticks approximately every 10 milliseconds. Thus, do not be surprised if students cannot measure a change in time as small as 1 millisecond.
The following code can be run to find the resolution of System.currentTimeMillis in a particular Java implementation:
long start = System.currentTimeMillis();
// Busy-wait until the reported time changes, i.e., until the clock's next tick
while ( System.currentTimeMillis() == start ) { }
long end = System.currentTimeMillis();
// The difference between consecutive ticks is the clock's resolution, in milliseconds
System.out.println( "Clock resolution is " + ( end - start ) );
Copyright © 2004 Charles River Media. All rights reserved.
Revised Aug. 9, 2005 by Doug Baldwin