SUNY Geneseo, Department of Computer Science


Lab 12

CSci 141 , Fall 2003

Prof. Doug Baldwin

Due Wednesday, November 19

Purpose

This lab is intended to show you directly what the Theta(log n) and Theta(n) performances of binary tree search and list search mean, especially in comparison to each other. The lab is also intended to increase your experimental skills.

Background

Mathematical analysis predicts that the worst-case time to search a balanced binary search tree of n items is Theta(log n). Mathematical analysis also predicts that the worst-case time to search a list of n items is Theta(n). It's not immediately clear what these mathematical abstractions mean in real-world terms. Your job in this lab is to test the above claims experimentally, and in so doing get some personal sense of what Theta(log n) and Theta(n) mean in practice.

Experimental Techniques

The execution times of searches on small data structures will probably be less than the resolution of Java's clock function (System.currentTimeMillis). The Science of Computing describes a technique for measuring times less than the clock's resolution that you will therefore probably find helpful for this experiment. It is in Section 9.2 of the book.

Once you have collected data on how long it takes the algorithms to run, you will need to test your data to see if it is consistent with the theoretical execution times. The theoretical times are given as asymptotic expressions, although this is basically just a way of saying that the times are roughly proportional to certain formulas. You have already analyzed experimental data to see if it is consistent with hypotheses about proportionality, and you can use the same idea (dividing the time to process an n-item data structure by the appropriate formula's value for that n, and seeing if the ratios settle towards some constant of proportionality) here. For a longer discussion of this analysis and its justification in terms of asymptotic notation, see Section 9.3.3 of The Science of Computing.

Be sure you are familiar with the material in Sections 9.2 and 9.3.3 of The Science of Computing before lab!

Exercise

Design and conduct an experiment that tests two hypotheses:

To do this experiment, you will need classes of trees and lists that carry out searches. I suggest that you define a "binary search tree" class that extends our BinaryTree class, and a "searchable list" class that extends our List class. Both of these subclasses should provide a search message. This message causes a tree or list to search itself for a particular string (the parameter to the message), returning true or false according to whether the string was in the data structure or not. You can then time how long instances of your classes take to handle search messages.

I have provided a number of trees and lists that you can search. They are all in files, and all the files are in a folder named "Data Structures", in our course folder on the "Course Materials" file server. All the data structures contain words taken from the beginning of a dictionary of English. The files have names such as "1WordTree", "7WordList", etc., which indicate the size and type of the data structure they contain. All the trees are full, and ordered as binary search trees.

Since the trees and lists I have given you contain English words ('though some are pretty obscure words), you can guarantee that your searches will fail (which is the worst case) if you search for something that isn't an English word -- for example "blstf", "this is a phrase", "Geneseo" (there are no proper nouns in the data structures either), etc.

The trees and lists I give you are plain, regular, binary trees and lists, respectively, so you will need a way to build instances of your searchable tree and list classes from them. I suggest that you provide copying constructors for your classes that initialize a searchable tree or list to be a copy of a regular tree or list, much as you did in Lab 9 and Lab 11.

The times you measure in this experiment will generally be short, which means they are highly influenced by random error. You will therefore probably need to be more careful in this experiment than in past ones to average multiple measurements of each time, to watch out for outliers and other evidence of error in your data, etc.

Follow-Up

Turn in a report on your experiment. This report should explain how you did the experiment (as usual, a printout of any code that you wrote can provide a large part of the necessary explanation), show the data you collected and the results of your analysis of that data, and explain the conclusions you reached about the hypotheses from that analysis.

Your report should also contain a discussion of how the times to search lists compare to the times to search trees, and how significant you think any differences between the structures' search times are in practical terms. I expect that one paragraph will be enough for this discussion.