SUNY Geneseo Department of Computer Science
Motivation for Trees
{Date}
CSci 240, Spring 2007
Prof. Doug Baldwin
Return to List of Lectures
Previous Lecture
Misc
Hour Exam 2
- Monday, April 2
- Covers material since exam 1 (e.g., loop invariants,
correctness and performance analysis of loops,
sorting)
- Similar in style and rules to exam 1
Problem Set 11 due today
Questions?
Recurrences for Master Theorem
Word Counting, Revisited
Idea for a really fast word counter (already explored by
some of you)
- Define a class that represents word "records" containing
the text of a word and the number of times it has been
seen
class WordRecord {
public String text;
public int occurrences;
}
- Maintain a set of these records as you read the file.
- For each word read, see if there is already a record for
it in the set
- If so, increment that record's number of occurrences
- Otherwise, add a word record for that word, with 1
occurrence
- Let d be the number of distinct words in a document
- d ≤ w
- Processing this set of word records is likely to take
time dependent on d, not w
- e.g., sorting it should take Θ(d logd ) time which is
less than Θ( w logw ), printing should take Θ(d) time,
less than Θ(w), etc.
But...
while not at end of file
String word = file.next()
search data set for a record containing this word
if search succeeds, record is r
r.occurrences = r.occurrences + 1
else
add new WordRecord(word,1) to data set
- Running time depends on search algorithm
- Linear search, Θ(d)
- Total time for word count would be approximately Θ(wd) > Ω(d2)
- Binary search, Θ( logd )
- Requires sorted data set
- If data set is a vector, array, etc. keeping it
sorted means adding in right place, which takes Θ(d) time for shifting
- (hash tables, Θ(1) average, Θ(d) worst case)
Want a data structure that supports
- Searching in Θ(logd) time
- Inserting in Θ(logd) time
- A tree!
Next
(After exam)
Tree terminology, properties, etc.
Read Chapter 13 to but not including Section 13.3.2
Next Lecture