SUNY Geneseo Department of Computer Science
Probabilistic Algorithms
Tuesday, May 7
CSci 242, Spring 2013
Prof. Doug Baldwin
Misc
Final
- Tuesday, May 14, noon
- Comprehensive, but emphasizing material since 2nd hour exam (e.g., backtracking, hashing, probability & expectation, randomized algorithms)
- Designed for 2 hours, but you have 3
- Format and rules otherwise similar to hour exams (especially open reference materials)
- Donuts and cider
Questions?
Probabilistic and Randomized Algorithms
Example: an alternative way to represent a dictionary (good if you want to minimize memory use)
- Hash table T mapping words to bits: for every word w in the dictionary, set T[ h(w) ] = 1
- String w is considered to be in the dictionary iff T[ h(w) ] = 1
- Correctness is a matter of probability
- Probability that if T[ h(w) ] = 0 then w is not in the dictionary
- P{ w is misspelled | T[ h(w) ] = 0 }
- = 1
- Probability that if T[ h(w) ] = 1 then w is in the dictionary
- P{ w is correctly spelled | T[ h(w) ] = 1 } = ?
- From definition of conditional probability
- P{ w is correctly spelled | T[ h(w) ] = 1 } = P{ w is correctly spelled and T[ h(w) ] = 1 } / P{ T[h(w)] = 1 }
- P{ T[h(w)] = 1 | w is correctly spelled } = 1
- P{ T[h(w)] = 1 | w is misspelled } = 1/2 (approximately, assuming about half of T's entries are 1)
- So P{ T[h(w)] = 1 }
- = P{ T[h(w)] = 1 | w is correctly spelled } * P{w is correctly spelled} + P{ T[h(w)] = 1 | w is misspelled } * P{w is misspelled} (by the law of total probability)
- Let S = P{w is correctly spelled}
- P{ T[h(w)] = 1 } = P{ T[h(w)] = 1 | w is correctly spelled } * S + P{ T[h(w)] = 1 | w is misspelled } * (1-S)
- = S + (1-S)/2
- P{ w is correctly spelled | T[ h(w) ] = 1 }
- = P{ w is correctly spelled and T[ h(w) ] = 1 } / P{ T[h(w)] = 1 }
- = P{ w is correctly spelled and T[ h(w) ] = 1 } / ( S + (1-S)/2 )
- But now what?
- Applying Bayes’s Theorem...
- Bayes’s Theorem
- P{ A | B } = P{ A and B } / P{B}
- P{ A and B } = P{ A|B } P{B}
- P{ B and A } = P{B|A} P{A}
- P{ A|B } P{B} = P{B|A} P{A}
- P{ A|B } = P{B|A} P{A} / P{B} (Bayes’s Theorem)
- P{ w is correctly spelled | T[ h(w) ] = 1 }
- = P{T[h(w)] = 1 | w is correctly spelled} * P{w is correctly spelled} / P{T[h(w)] = 1}
- = 1*S/( S + (1-S)/2 )
- = 2*S/(1+S)
- Depending on S (i.e., how good a speller the user is), this may or may not be good enough; e.g., S = 0.9 gives 2(0.9)/1.9 ≈ 0.95, so roughly 1 in 20 “in the dictionary” verdicts is wrong
- But if you add a second hash table with an independent hash function, and check w in both, the error probability goes down quadratically
- Error probability with 1 hash table = 1 - 2S/(1+S)
- Error probability with 2 hash tables = (1 - 2S/(1+S))^2
- With 3 hash tables = (1 - 2S/(1+S))^3
- With n hash tables = (1 - 2S/(1+S))^n
- Hash tables are small and thus fairly cheap to add, and doing so drives down the error probability exponentially! (See the sketch below.)
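A minimal Python sketch of this scheme (the lecture gives no code, so the SHA-256-based hash functions, the table size, and the salts are illustrative assumptions, not part of the notes). The structure is closely related to what is now called a Bloom filter.

import hashlib

TABLE_SIZE = 1 << 20   # arbitrary illustrative size

def make_hash(salt):
    # Build a hash function h(w) in [0, TABLE_SIZE); different salts give
    # (approximately) independent functions, one per table.
    def h(word):
        digest = hashlib.sha256((salt + word).encode()).digest()
        return int.from_bytes(digest[:8], "big") % TABLE_SIZE
    return h

class BitDictionary:
    def __init__(self, num_tables):
        self.hashes = [make_hash(str(i)) for i in range(num_tables)]
        # One byte per entry keeps the sketch simple; a real implementation
        # would pack 8 entries per byte to actually minimize memory.
        self.tables = [bytearray(TABLE_SIZE) for _ in range(num_tables)]

    def add(self, word):
        # For each table, set T[ h(w) ] = 1.
        for h, T in zip(self.hashes, self.tables):
            T[h(word)] = 1

    def __contains__(self, word):
        # w is reported as in the dictionary iff T[ h(w) ] = 1 in every
        # table; per the notes' formula, each extra table multiplies the
        # error probability by another factor of (1 - 2S/(1+S)).
        return all(T[h(word)] == 1 for h, T in zip(self.hashes, self.tables))

# Numeric check of the error formula for a user who spells 90% of
# words correctly (S = 0.9):
S = 0.9
for n in range(1, 5):
    print(n, "tables:", (1 - 2 * S / (1 + S)) ** n)

For S = 0.9 this prints error probabilities of roughly 0.053, 0.0028, 0.00015, and 0.0000077 for 1 through 4 tables: the exponential drop described above.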
Generalizing…
- Probabilistic algorithms have the property seen above that something about them (e.g., correctness) is a matter of probability, not certainty
- Randomized algorithms use a random number generator to be probabilistic
- In practice, probabilistically correct algorithms are usually decision algorithms (i.e., they answer “yes” or “no”) with one-sided error: they are guaranteed to be right when they give one answer, and right with probability less than 1 when they give the other
- The error probability can then be made arbitrarily low by running multiple copies of the algorithm with different parameters (e.g., different random choices) and only accepting the probably-correct answer if all copies produce it
- A common example today is randomized primality checking
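As a concrete sketch of that last point, here is the Miller-Rabin test, one standard randomized primality check (the notes don't name a specific test; this choice is an assumption). A “composite” answer is always correct; a “probably prime” answer is wrong with probability at most 1/4 per trial, so k independent trials drive the error below (1/4)^k, exactly the amplification pattern described above.

import random

def is_probably_prime(n, trials=20):
    # Miller-Rabin: returns False only when n is certainly composite;
    # returns True when n is prime, with error probability <= (1/4)^trials.
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 = 2^r * d with d odd.
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(trials):          # each trial is an independent "copy"
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)             # a^d mod n
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False             # a is a witness: n is definitely composite
    return True                      # no witness found: probably prime

print(is_probably_prime(561))       # False; 561 = 3 * 11 * 17 (a Carmichael number)
print(is_probably_prime(104729))    # True; 104729 is prime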
(No More Lectures!)