Supplemental Material for Baldwin and Scragg, Algorithms and Data Structures: The Science of Computing; Charles River Media, 2004 (Now published by Cengage Learning)
This document supplies information beyond what is in the student instructions for the “Search Times for Lists and Trees” laboratory exercise.
The main goal of the exercise is to demonstrate to students the huge difference between linear and logarithmic times. The exercise does this most effectively when students measure the times for searches in both data structures in a single experiment. However, the resulting lab will probably take considerably longer than our nominal lab session of two hours.
Pre- and post-lab activities can significantly reduce the amount of time needed in the lab. For example, students can derive the theoretical execution time for searching a list, and design the algorithm for initializing a balanced tree, before coming to lab. They can write the lab report after leaving the lab.
The exercise can also be broken into two experiments, one on lists and one on trees, to fit it into two lab sessions. Many students can do the list experiment in well under two hours; the time for doing the tree experiment will be mostly determined by the time it takes students to devise the algorithm for initializing balanced trees. Splitting the lab in this manner can also be helpful if a course covers lists and trees at widely separated times, and an instructor wants to use the list and tree parts of this experiment to reinforce the corresponding sections of the course. In any split experiment, however, try to do something (e.g., class discussion, a required section in the second lab report) to make students reflect on the observable relative performances of lists and trees.
We frequently notice that it takes a Java program longer to do something for
the first time than to do it subsequently. For example, in this lab, the first
find
message a program sends to a list or tree is likely to take
much longer to execute than other find
messages to the same object.
For best accuracy, students should therefore send one or even two find
messages
to their data structures before starting to time find
s.
One natural extension to this lab is to ask students to prove their algorithm
for constructing a balanced binary tree correct, and/or to derive its execution
time in terms of the number of items in the tree. The first of these exercises
is interesting because the algorithm will probably not be obviously correct
to students, and because the validity of the experiment depends crucially on
its correctness — i.e., students have a personal stake in the algorithm
being correct. The second exercise is interesting because there are two slightly
different algorithms students are likely to devise, one of which builds the
tree from the root down via addNode
messages, and the other of
which builds the tree from the leaves up by recursively creating new trees.
The former of
these algorithms runs in Θ( n logn )
time, while the latter runs in Θ( n )
time. Deriving execution times and comparing them across students (perhaps
via class discussions
after the lab) can thus demonstrate
in a distinctly personal way the reality of faster and slower solutions
to a single problem.
We did not include these extensions in the lab exercise because it is already very long, but instructors who wish can add them. This may be particularly appropriate if the lab is split over several laboratory sessions.
The subclasses that students write for this exercise both have a constructor
that initializes a data set containing a range of integers. The constructor
for the List
subclass can be written either iteratively
or recursively.
We try
to
encourage
students to think recursively,
and so favor the recursive approach, but there is no other reason to prefer
one approach
over the other. The constructor
for the OrderedTree
subclass would be very difficult to write
without recursion.
Searching large lists can lead to stack overflows in the Java interpreter. In the environment most of our students use (the CodeWarrior development environment running under Mac OS X on dual G4 processors with a total of 768 megabytes of main memory), lists with tens of thousands of elements can be searched. However, other environments may encounter stack overflows for smaller lists. In any environment, students should be able to search lists of up to a thousand elements or more, and as long as students measure times carefully and accurately, this should provide an ample range of data for testing the hypothesis.
Graphing the search times for both data structures on the same axes can be an illuminating way to see the difference between linear and logarithmic performance — typically, a graph that shows the linear growth of search time for lists makes the logarithmic time to search trees indistinguishable from 0. Drawing such a graph drives home to students that “log n” is not just an obscure mathematical function when it describes execution time. Beware, however, that because the graph does not clearly show the shapes of both execution time functions, it is not by itself an adequate analysis of the data.
Copyright © 2004 Charles River Media. All rights reserved.
Revised Aug. 9, 2005 by Doug Baldwin