Instructor Notes

Word Counting Lab

Supplemental Material for Baldwin and Scragg, Algorithms and Data Structures: The Science of Computing; Charles River Media, 2004 (Now published by Cengage Learning)

Site Index


The exercise on Counting Words takes longer to complete, and is more open-ended, than most lab exercises written for Algorithms and Data Structures: The Science of Computing. For these reasons, students may need somewhat different support for this exercise than for others. These notes discuss some of the support I have found helpful for students, and provide other tips for using the exercise in a course.

Integrating the Lab into a Course

Because of its length, this project works better as a homework assignment than as a closed laboratory exercise. Allowing students one and a half to two weeks to complete the project gives them time to identify the design choices they face and make those choices carefully, to thoroughly test their program, etc. The instructor also has an opportunity to discuss common questions or problems in class meetings. If done as a closed laboratory exercise, students should be given specific parts of the project to do before and after the lab meeting. For example, students might be required to choose data structures and design a class structure for their program before the lab meeting, to code the program and test it for logical correctness during the lab, and to test it for performance on large texts and write a lab report after the lab.

Because of its open-endedness, this exercise makes a good capstone project for a course. It asks students to think back to and compare multiple data structures, to understand the implications of those structures’ theoretical performance for real programs, and to make some non-trivial design decisions. All of these things should help students synthesize what they learned from individual sections of Algorithms and Data Structures: The Science of Computing into a coherent whole.

Testing

One of the beauties of this exercise is that asymptotic differences in data structures and algorithms translate into significant differences in the real performance of the final programs. For example, I once compared a list-based solution (Θ(n2) time to process an n-word text) to a tree-based solution (Θ(nlogn) time) as an in-class demonstration. The list-based solution took about 9 minutes to process Tolstoy’s novel War and Peace (allowing lots of time to discuss the pros and cons of theoretical performance analysis); the tree-based solution took only a few seconds (barely allowing time to note that the program had started). Good performance is therefore as important a criterion for success in this exercise as is correct code. This aspect of the project will be foreign to most students, and few will decide on their own to test their programs’ performance on large texts. I therefore recommend providing texts on which students can test their programs. I give students a few short, artificial, test files with which to test their program’s correctness, but also require them to demonstrate their programs running on files containing tens to hundreds of thousands of words.

Where can instructors find text files containing tens of thousands of words? One excellent source is a project called “Project Gutenberg.” Project Gutenberg is a long-running effort to make as many as possible of the world’s uncopyrighted books freely available in electronic form. Generally the books available from Project Gutenberg are older works on which the copyrights have expired. They include many classic works of literature (i.e., exactly the kinds of text used to motivate the exercise), ranging in length from short stories to large novels (I got the copy of War and Peace mentioned above from Project Gutenberg). Project Gutenberg is a good resource for anyone interested in on-line literature to know of, and using it in connection with this exercise is a good opportunity to expose students to it. For more information on Project Gutenberg, see http://www.gutenberg.org/

Associative Data Structures

Most of the examples of data structures in Algorithms and Data Structures: The Science of Computing contain simple, atomic, data (e.g., strings or integers). Many students have trouble generalizing these examples to structures that can associate a word with a count. I try to lead these students to seeing that they can define objects that contain both a word and a count (i.e., records), and can then define variations (i.e., subclasses) of the text’s data structures that use just the word part in searches.

Variations

The exercise as written concentrates on appropriate choice of data structures for solving a realistically sized problem. It does not emphasize such things as defining words in humanly intuitive ways, sophisticated text analysis, etc. These are, however, all ways in which the project can be extended. Instructors who want a more challenging or realistic project can assign a variation on the original exercise; other instructors may want to simply suggest extensions as options for students. Examples of possible extensions include…


Copyright © 2005 Charles River Media. All rights reserved.

Revised Aug. 29, 2005 by Doug Baldwin

Site Index