SUNY Geneseo Department of Computer Science

Lab 2 -- Introduction to Experimentation

CSci 240, Spring 2007

Prof. Doug Baldwin

Due Tuesday, January 30

Purpose

The purpose of this lab is to give you concrete experience with doing experiments in computer science, and with some of the concerns that arise in doing experiments.

Background

This exercise asks you to design and conduct an experiment that tests a hypothesis about how the time needed to draw certain tree-like figures relates to the size of the figure. General information about conducting experiments in computer science appears in chapter 4 of our text, and will be discussed in the January 24 lecture. The folowing sections of this handout provide information on the hypothesis you will test and some updated information about measuring time in Java 1.5.

Drawing Trees

A simple tree-like figure can be defined recursively as follows:

A tree of size 1 is a short green line (looks a bit like a leaf)
A tree of size n > 1 is a gray line (a trunk) at whose end are two trees of size n-1, angled left and right of the trunk.

For example, here are trees of sizes 1, 2, and 3:

[Recursive Trees]

I have provided you a Java program that draws such trees. The program is on the "Outbox" file server, in folder ComputerScience/baldwin/CSci240/TreePicture. It is set up as a complete XCode project. If you use XCode, you can simply copy this project to your own disk and open it up. If you use some other environment, the key things you need are the source files "TreePicture.java" and "TreeCanvas.java." As delivered, the program will draw a tree of size 2. However, it is easily changed to draw trees of any size from 1 to 13, simply by changing the constant TREE_SIZE defined in "TreePicture.java."

Notice that a tree of size n contains 2 trees of size n-1. This suggests that the time required to draw a tree of size n should be roughly twice the time required to draw a tree of size n-1. Put another way, the time required to draw a tree roughly doubles every time you increase the size by 1. This suggests that the time to draw a tree of size n should be roughly proportional to 2ⁿ. This supposed proportionality is the hypothesis you will test in your experiment.

Measuring Time in Java 1.5

Java 1.5 introduced a new function for measuring time, System.nanoTime. This function is superior to the older System.currentTimeMillis for timing, because System.nanoTime is guaranteed to use the most accurate clock available on the computer running the program, and it returns a number that measures time in units of nanoseconds. While you are unlikely to find a computer whose clock actually has a precision of 1 nanosecond, you will often find better precisions than 1 millsecond. For example, my Macintosh seems to offer a precision of 1 microsecond via System.nanoTime. System.nanoTime takes no parameters, and returns a long. Typical code to time something via System.nanoTime thus looks like this:

    long startTime = System.nanoTime();
    // ... The code you want to time goes here ...
    long endTime = System.nanoTime();
    // Do whatever you want based on startTime and endTime.
    // For example, the time for the code to execute is endTime - startTime, etc.

Exercise

Design and carry out an experiment that tests the hypothesis that the time required to draw the trees described in the "Background" section is indeed roughly proportional to 2ⁿ, where n is the size of the tree.

You can use my tree-drawing program as a starting point for your experiment. The program has everything needed to draw trees, and a redraw method that may be helpful if you want to time several tree-drawing operations in a single run of the program, but no code that really gathers any data on how long it takes to draw a tree or otherwise performs any really useful part of your experiment. You are free to add any such code you wish to either "TreePicture.java" or "TreeCanvas.java," as you see fit.

The times required to draw anything in Java tend to be highly variable. In particular, the time required to draw a tree when a window first becomes visible is likely to be much longer than the time for the average drawing thereafter. Take this variability, and the likelihood that the time for the first drawing is not very representative of drawing times in general, into account in designing your experiment.

Because of the variability in drawing times, you will probably see a number of outliers in the data you gather. There are several ways you can deal with outliers (see the discussion on page 104 of the text), although my preference is to rely on averaging many measurements to smooth away their impact. If you set up a spreadsheet to do your data analysis, you can easily average tens or hundreds of samples for each n value.

Note that if you try to draw a tree of size greater than 13 in my program as delivered, parts of the tree will lie outside the program's window. I would assume that "drawing" things outside of a window is faster than drawing things inside it, since things outside the window don't have to be drawn at all. This means that if you want to draw trees of sizes greater than 13, you will have to change the size of the program's window. Unless you have a very big screen, it is unlikely that a larger window will fit on it. Thus, I expect that you will have to restrict your experiment to looking at trees with sizes between 1 and 13. In fact, if you are running on a smaller screen than the one on which I developed the tree-drawing program, you may have to reduce the size of the window, and thus the largest tree size you can draw.

Your job is to test a hypothesis that tree drawing time is "roughly proportional" to something. To test this hypothesis, you will need some prediction that you can look for in the data you gather. A good prediction is based on the definition of "proportional:" x is proportional to y if there is some constant of proportionality, c, such that x = cy. Thus a good prediction to start with is that if drawing time is indeed proportional to 2ⁿ, then dividing drawing time by 2ⁿ for a variety of different n values will reveal a common constant of proportionality. Unfortunately, the real world is nowhere near this simple -- theoretical analyses tend to gloss over minor details (for instance, the time required to draw the trunk of a tree), computers do all sorts of things behind the scenes that add to real execution time without appearing in theoretical analyses, etc. Thus, it is more realistic to predict that if execution time is proportional to 2ⁿ, then for large enough n values, ratios of running time to 2ⁿ will settle into some range of "constants" of proportionality.

Follow-Up

Turn in a written report on your experiment and the conclusions you reach from it. This report must be in my hands on the due date given at the beginning of this handout.

Your report should follow common professional practice in computer science for reporting scientific results. Specifically, it should address each of the following areas:

Introduction: What is the problem or question your work addresses? Why is it an interesting problem or question? For this report, you might or might not feel that this is an interesting problem -- if you don't, that's OK; if you do, you should certainly say why.
Background: What is already known about this problem or question? This is a good place to cite or quote others' work that is related to what you are doing. For a class laboratory report, you do not need elaborate citations and bibliographies -- it is sufficient to summarize your starting points with phrases such as "as shown in class..." or "as explained in the lab handout...," etc. For more formal reports, bibliographies and citations are definitely called for.
What You Did: This is likely to be one of the main parts of the report, describing what you did. In this report, it will be a description of your experiment, including details of how you modified the program, what n values you collected data for, etc. Attaching a printout of the modified program as an appendix to the report is an excellent way to provide many details of your experiment. This allows you to say things such as "as shown in method such-and-such in the attached code..." in the main report instead of trying to explain programming details in prose.
Results: What were the results of the things you did? In the case of a report on an experiment, this will primarily be a presentation of the data you collected and any analysis of that data that you did (e.g., averages you computed, other calculations you did to test your hypothesis, etc.) Some explanation of why you analyzed the data the way you did, if not already discussed as part of "What You Did," is also appropriate here.
Conclusions: What do you conclude from your results? Was your initial hypothesis or other belief confirmed or not? Is further work needed to finish answering the questions you started with, or to follow up new questions raised by your results?

Typically, each of these things will be addressed in a separate section of the report, usually appearing in the order listed above. For a report on a class experiment such as this one, many sections will only be a paragraph, or even a couple of sentences, long. I would expect your entire report to fit in 2 to 3 pages, exclusive of any program printout you attach.