SUNY Geneseo Department of Computer Science

Motivation for Trees

{Date}

CSci 240, Spring 2007
Prof. Doug Baldwin

Return to List of Lectures

Previous Lecture

Misc

Hour Exam 2

Monday, April 2
Covers material since exam 1 (e.g., loop invariants, correctness and performance analysis of loops, sorting)
Similar in style and rules to exam 1

Problem Set 11 due today

Questions?

Recurrences for Master Theorem

[a = Number of Recursions, b = Divisor in Recursion, g(n) = Work Other than Recursion]

So for problem 3
- T(n) = 8 T(n/2) + Θ(n²)

Word Counting, Revisited

Idea for a really fast word counter (already explored by some of you)

Define a class that represents word "records" containing the text of a word and the number of times it has been seen

        class WordRecord {
            public String text;
            public int occurrences;
        }

Maintain a set of these records as you read the file.
- For each word read, see if there is already a record for it in the set
- If so, increment that record's number of occurrences
- Otherwise, add a word record for that word, with 1 occurrence
Let d be the number of distinct words in a document
- d ≤ w
- Processing this set of word records is likely to take time dependent on d, not w
- e.g., sorting it should take Θ(d logd ) time which is less than Θ( w logw ), printing should take Θ(d) time, less than Θ(w), etc.

But...

        while not at end of file
            String word = file.next()
            search data set for a record containing this word
            if search succeeds, record is r
                r.occurrences = r.occurrences + 1
            else
                add new WordRecord(word,1) to data set

Running time depends on search algorithm
- Linear search, Θ(d)
  - Total time for word count would be approximately Θ(wd) > Ω(d²)
- Binary search, Θ( logd )
  - Requires sorted data set
  - If data set is a vector, array, etc. keeping it sorted means adding in right place, which takes Θ(d) time for shifting
    - Θ(d²) time again!
- (hash tables, Θ(1) average, Θ(d) worst case)

Want a data structure that supports

Searching in Θ(logd) time
Inserting in Θ(logd) time
A tree!

(After exam)

Tree terminology, properties, etc.

Read Chapter 13 to but not including Section 13.3.2

Pages 421 - 437

Next Lecture