Supplemental Material for Baldwin and Scragg, Algorithms and Data Structures: The Science of Computing; Charles River Media, 2004 (Now published by Cengage Learning)
This document discusses the laboratory exercise entitled “Recurrence Relations and Execution Time”. This exercise requires students to use recurrence relations to predict the performance of an algorithm, and then do an experiment to test their predictions.
The primary purpose of this lab is to establish the relevance of recurrence relations to algorithms’ real-world behavior. To some extent, every experiment is about establishing the relevance of some theoretical model to real-world behavior, so this exercise can be seen as a paradigmatic exposure to experimentation’s role in science.
In addition to demonstrating that recurrence relations really can predict observable behavior, this lab has a number of secondary and tertiary roles. In order to demonstrate the real-world relevance of recurrence analyses, the lab asks students to perform two such analyses, and then to conduct an experiment. Thus the lab provides important practice working with recurrences, and designing and conducting experiments. Other benefits include
an early, informal encounter with exponential running times (findLargest's running times range from linear in the best case to exponential in the worst case). We deliberately designed the findLargest algorithm this way, in order to begin building students' intuition for how dramatically different exponential and linear growth are, and for how appallingly fast exponential times grow. Similarly, we deliberately set this lesson in an experiment, so that students will have direct, personal experience with the consequences of exponential growth.

This lab is intended to be used with Section 7.2 of Algorithms and Data Structures: The Science of Computing, the section that introduces recurrence relations. However, certain things in the lab preview material from later chapters: exponential growth reappears as the main topic in Chapter 15, and testing empirical data for consistency with mathematical predictions is done more rigorously in Section 9.2.3, once asymptotic notation is available. We feel that these are important ideas, and that exposing students to them relatively informally now helps set the stage for their later formal treatment. Instructors may want to refer explicitly to this lab as an example or motivator when students encounter the material more formally later in the book.
This lab consists of two distinct parts (setting up and solving recurrence relations, and conducting an experiment), each of which can be used by itself if desired. For example, the recurrence analysis of findLargest can be used as a homework exercise separately from the experiment (or as a separately collected prelab exercise for the experiment). Similarly, instructors can give students the closed forms for the recurrence relations and only ask students to do the experiment; this is particularly appropriate if the course schedule is such that this lab serves as an introductory motivator for studying recurrence relations, before students learn to create recurrences and find closed forms on their own. Instructors who wish to provide an easier exercise in recurrence relations can also give students the recurrence relations, and ask students just to find the closed forms (and do the experiment).
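For instructors who want a quick way to illustrate the difference between a recurrence relation and its closed form when assigning only part of the lab, a purely illustrative example (this is not the recurrence for findLargest, which students derive themselves) is

f(1) = b
f(n) = 2 f(n-1) + c    for n > 1

which has the closed form

f(n) = b 2^(n-1) + c (2^(n-1) - 1)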
Instructors should be prepared to help students with several features of this lab.
Many students have only a very vague notion of what “proportional” really means, at best understanding it to mean that one function gets bigger when another does. These students don’t realize that “proportional” specifically means that one function is a constant multiple of the other. We deliberately did not tell students how to test their data for proportionality to the functions from the recurrence analysis, because we wanted to take this opportunity to force students to come to some understanding of proportionality. This does not, however, mean that every student should be forced to devise a test by themselves — in most cases, simply coming to the realization that they do not understand proportionality well enough to test it readies students to understand and remember an explanation offered by the instructor. Give students a chance to think about proportionality, but then answer their questions when they ask.
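For example, one simple check an instructor might suggest (this is only a sketch, not something prescribed by the lab handout) is to divide each measured time by the value of the closed form at the corresponding n; if running time really is proportional to the closed form, the quotients should all be roughly equal. In Java, with hypothetical data:

// Illustrative proportionality check; the n values, times, and closed form f
// below are hypothetical placeholders for the students' own data and analysis.
public class ProportionalityCheck {

    // Hypothetical closed form from a recurrence analysis; replace as appropriate.
    static double f( int n ) {
        return n;
    }

    public static void main( String[] args ) {
        int[] ns = { 2000, 4000, 8000, 16000 };       // hypothetical problem sizes
        double[] times = { 1.1, 2.0, 4.2, 8.1 };      // hypothetical measured times (ms)

        // If time is proportional to f(n), these quotients should be roughly constant.
        for ( int i = 0; i < ns.length; i++ ) {
            System.out.println( "n = " + ns[i] + "  time/f(n) = " + times[i] / f( ns[i] ) );
        }
    }
}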
As mentioned earlier, many students initially understand an algorithm’s best case to be the problem that produces the smallest possible execution time, i.e., the smallest problem. Instructors may have to explain that the technical notions of “best case” and “worst case” are really ways of capturing factors other than problem size that influence an algorithm’s execution time. In this lab, the relevant factor is where in the array section the maximum value lies, which is independent of the natural measure of problem size (the size of the section).
Running Times and n Values

In the environment in which we developed this lab (Macintosh G4 processors running Mac OS X), n values of two or three thousand are required to get measurable times for the best case runs of findLargest, while n values around 25 yield painful (several seconds) run times for the worst case, and worst case run times become intolerable (several minutes) for n values around 30. This huge difference in n values will cause problems for some students: either they won't realize how big n should be for the best case, or they will try to use values too large for the worst case.
Best case running times will probably be small, with lots of variation between measurements, even with large values of n. These conditions may make it hard for students to gather data that definitively supports or refutes their hypothesis. Instructors should encourage students to do everything they can to gather good data (e.g., use a wide range of n values, make many measurements for each, try to eliminate as many external error sources as possible, etc.). Even with these efforts, however, students may not be able to get a consistent proportionality between their theoretical formula and their measured times.
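One way to make very small times measurable (again, a sketch of a technique an instructor might suggest, not part of the lab handout) is to time many repetitions of the call and divide by the number of repetitions, so that the total elapsed time is large compared to the clock's resolution. The inner loop below is a placeholder; students would replace it with a call to findLargest on an array set up to produce whichever case (best or worst) they are measuring.

// Illustrative timing loop; the inner "work" is a placeholder for a call to findLargest.
public class TimingHarness {
    public static void main( String[] args ) {
        int trials = 100;       // many repetitions, so the total time dwarfs clock resolution

        long start = System.currentTimeMillis();
        for ( int t = 0; t < trials; t++ ) {
            // Placeholder work; replace with the findLargest call being measured.
            long sum = 0;
            for ( int i = 0; i < 1000000; i++ ) {
                sum += i;
            }
        }
        long end = System.currentTimeMillis();

        System.out.println( "Average time per trial: "
                            + ( end - start ) / (double) trials + " ms" );
    }
}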
Students who use excessively large ns to measure worst case running time present an excellent opportunity to drive home the significance of exponential time. These students will often have some preliminary data for smaller ns, from which they can extrapolate the running time for the n they are currently using. If the closed form for the relevant recurrence relation is f(n), n1 and n2 are two n values, and t1 and t2 are the corresponding running times, then one expects that
t2 = t1 × f(n2) / f(n1)
Instructors may have to coach students a little on how to do this extrapolation. When students do the extrapolation, it is not uncommon for them to discover that they have started a run that would quite literally take millions of years to finish. The problem of exponential time is far more meaningful to a student who realizes that he or she has personally begun to wait out such a running time than to one who has only read or heard about it.
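As a purely hypothetical illustration of the arithmetic (the closed form 2^n and the numbers below are made up, not results from the lab): if a run with n1 = 25 took 5 seconds, then a run with n2 = 45 would be expected to take 5 × 2^45 / 2^25 = 5 × 2^20 seconds, or roughly 60 days. A few lines of Java make the same point:

// Hypothetical extrapolation of a worst case running time.  The closed form (2^n)
// and the measured values are illustrative only.
public class Extrapolate {
    public static void main( String[] args ) {
        double t1 = 5.0;        // measured running time for n1, in seconds
        int n1 = 25;
        int n2 = 45;            // the n value of the run now in progress
        double ratio = Math.pow( 2, n2 ) / Math.pow( 2, n1 );   // f(n2)/f(n1) with f(n) = 2^n
        double t2 = t1 * ratio; // predicted running time for n2, in seconds
        System.out.println( "Predicted time: about " + t2 / 86400.0 + " days" );
    }
}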
The findLargest Algorithm

The findLargest algorithm in this lab is not an obvious way of finding the largest element in an array. We deliberately chose it for its interesting running times, not for its practical utility. A few students will need help understanding how it works.
As n gets larger, the recursion in findLargest will eventually produce a stack overflow. This may be a problem for measuring best case running times; worst case running times become prohibitive long before the stack overflows. The value of n at which stack overflow occurs varies considerably from one Java environment to another. For example, we ran best case searches with n as high as 20000 without any overflow under Mac OS X on a PowerBook G4 with 768 megabytes of RAM, using both the CodeWarrior development environment and the command line Java interpreter (version 1.4.2_03). On a Sun SparcStation 5 using the command line Java interpreter (version 1.4.2_02), stack overflow occurred when n was slightly under 4000.
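If stack overflow does interfere with best case measurements, one workaround available on many Java implementations (a suggestion from us, not something the lab requires) is to request a larger thread stack when starting the program, where MainClassName below stands for the student's main class:

java -Xss16m MainClassName

The -Xss option sets the stack size for Java threads; how much it helps varies between implementations, so students may need to experiment with the value.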
It is very tempting for students to state the conclusion that they believe is the desired one for an experiment, even if it is not the conclusion that the data supports. Such behavior should not be tolerated! The whole point of experimentation is to find out if expectations are wrong. We tell students to always draw the conclusion their data supports (after carefully checking for errors in the experiment if the data seems unbelievable), and we take support for the stated conclusion into consideration when grading lab reports.
In this experiment, the most likely reason (other than poor experiment design) for a student to have data that do not support the expected conclusion is low-quality times for findLargest's best case (see the discussion under “Running Times and n Values” for reasons why this data may be poor). In this situation, students should draw a cautious conclusion (e.g., “the data are unclear, but seem to suggest…”, or even “the data neither support nor refute the hypothesis”).
While Java's System.currentTimeMillis method always measures time in units of milliseconds, it need not have a resolution of 1 millisecond in all Java implementations. In other words, the clock may only “tick” every couple of milliseconds, as long as the value returned by System.currentTimeMillis changes by the appropriate number of milliseconds at each tick. For example, we have seen some Java implementations in which the clock ticks every millisecond, and others in which it ticks approximately every 10 milliseconds. Thus, do not be surprised if students cannot measure a change in time as small as 1 millisecond.
The following code can be run to find the resolution of System.currentTimeMillis in a particular Java implementation:
long start = System.currentTimeMillis();
// Busy-wait until the reported time changes, i.e., until the clock's next tick
while ( System.currentTimeMillis() == start ) { }
long end = System.currentTimeMillis();
// The difference between consecutive ticks is the clock's resolution, in milliseconds
System.out.println( "Clock resolution is " + ( end - start ) );
Copyright © 2004 Charles River Media. All rights reserved.
Revised Aug. 9, 2005 by Doug Baldwin