SUNY Geneseo Department of Computer Science
Probabilistic Algorithms
Tuesday, May 7
CSci 242, Spring 2013
Prof. Doug Baldwin
Misc
Final
- Tuesday, May 14, noon
- Comprehensive, but emphasizing material since 2nd hour exam (e.g., backtracking, hashing, probability & expectation, randomized algorithms)
- Designed for 2 hours, but you have 3
- Format and rules otherwise similar to hour exams (especially open reference materials)
- Donuts and cider
Questions?
Probabilistic and Randomized Algorithms
Example: an alternative way to represent a dictionary (good if you want to minimize memory use)
- Hash table T mapping words to bits: for every word w in the dictionary, set T[ h(w) ] = 1
- String w is considered to be in the dictionary iff T[ h(w) ] = 1
- Correctness is a matter of probability
- Probability that if T[ h(w) ] = 0 then w is not in the dictionary
- P{ w is misspelled | T[ h(w) ] = 0 }
- = 1
- Probability that if T[ h(w) ] = 1 then w is in the dictionary
- P{ w is correctly spelled | T[ h(w) ] = 1 } = ?
- From definition of conditional probability
- P{ w is correctly spelled | T[ h(w) ] = 1 } = P{ w is correctly spelled and T[ h(w) ] = 1 } / P{ T[h(w)] = 1 }
- P{ T[h(w)] = 1 | w is correctly spelled } = 1
- P{ T[h(w)] = 1 | w is misspelled } = 1/2 (approximately, assuming about half of T's entries are 1)
- So P{ T[h(w)] = 1 }
- = P{ T[h(w)] = 1 | w is correctly spelled } * P{w is correctly spelled} + P{ T[h(w)] = 1 | w is misspelled } * P{w is misspelled} (by the law of total probability)
- Let S = P{w is correctly spelled}
- P{ T[h(w)] = 1 } = P{ T[h(w)] = 1 | w is correctly spelled } * S + P{ T[h(w)] = 1 | w is misspelled } * (1-S)
- = S + (1-S)/2
- P{ w is correctly spelled | T[ h(w) ] = 1 }
- = P{ w is correctly spelled and T[ h(w) ] = 1 } / P{ T[h(w)] = 1 }
- = P{ w is correctly spelled and T[ h(w) ] = 1 } / ( S + (1-S)/2 )
- But now what?
- Applying Bayes’s Theorem...
- Bayes’s Theorem
- P{ A | B } = P{ A and B } / P{B}
- P{ A and B } = P{ A|B } P{B}
- P{ B and A } = P{B|A} P{A}
- P{ A|B } P{B} = P{B|A} P{A}
- P{ A|B } = P{B|A} P{A} / P{B} (Bayes’s Theorem)
- P{ w is correctly spelled | T[ h(w) ] = 1 }
- = P{T[h(w)] = 1 | w is correctly spelled} * P{w is correctly spelled} / P{T[h(w)] = 1}
- = 1*S/( S + (1-S)/2 )
- = 2*S/(1+S)
- Depending on S (i.e., how good a speller the user is), this may or may not be good enough; e.g., S = 0.9 gives 2(0.9)/1.9 ≈ 0.95, so roughly 1 in 20 “in the dictionary” verdicts is wrong
- But if you add a second hash table with an independent hash function, and check w in both, the error probability goes down quadratically
- Error probability with 1 hash table = 1 - 2S/(1+S)
- Error probability with 2 hash tables = (1 - 2S/(1+S))^2
- With 3 hash tables = (1 - 2S/(1+S))^3
- With n hash tables = (1 - 2S/(1+S))^n
- Hash tables are small and thus fairly cheap to add, and doing so drives down the error probability exponentially! (See the sketch below.)
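A minimal Python sketch of this scheme (the lecture gives no code, so the SHA-256-based hash functions, the table size, and the salts are illustrative assumptions, not part of the notes). The structure is closely related to what is now called a Bloom filter.

import hashlib

TABLE_SIZE = 1 << 20   # arbitrary illustrative size

def make_hash(salt):
    # Build a hash function h(w) in [0, TABLE_SIZE); different salts give
    # (approximately) independent functions, one per table.
    def h(word):
        digest = hashlib.sha256((salt + word).encode()).digest()
        return int.from_bytes(digest[:8], "big") % TABLE_SIZE
    return h

class BitDictionary:
    def __init__(self, num_tables):
        self.hashes = [make_hash(str(i)) for i in range(num_tables)]
        # One byte per entry keeps the sketch simple; a real implementation
        # would pack 8 entries per byte to actually minimize memory.
        self.tables = [bytearray(TABLE_SIZE) for _ in range(num_tables)]

    def add(self, word):
        # For each table, set T[ h(w) ] = 1.
        for h, T in zip(self.hashes, self.tables):
            T[h(word)] = 1

    def __contains__(self, word):
        # w is reported as in the dictionary iff T[ h(w) ] = 1 in every
        # table; per the notes' formula, each extra table multiplies the
        # error probability by another factor of (1 - 2S/(1+S)).
        return all(T[h(word)] == 1 for h, T in zip(self.hashes, self.tables))

# Numeric check of the error formula for a user who spells 90% of
# words correctly (S = 0.9):
S = 0.9
for n in range(1, 5):
    print(n, "tables:", (1 - 2 * S / (1 + S)) ** n)

For S = 0.9 this prints error probabilities of roughly 0.053, 0.0028, 0.00015, and 0.0000077 for 1 through 4 tables: the exponential drop described above.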
Generalizing…
- Probabilistic algorithms have the property seen above that something about them (e.g., correctness) is a matter of probability, not certainty
- Randomized algorithms use a random number generator to be probabilistic
- In practice, probabilistically correct algorithms are usually decision algorithms (i.e., they answer “yes” or “no”) with one-sided error: they are guaranteed to be right when they give one answer, and right with probability less than 1 when they give the other
- The error probability can then be made arbitrarily low by running multiple copies of the algorithm with different parameters (e.g., different random choices) and only accepting the probably-correct answer if all copies produce it
- A common example today is randomized primality checking
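As a concrete sketch of that last point, here is the Miller-Rabin test, one standard randomized primality check (the notes don't name a specific test; this choice is an assumption). A “composite” answer is always correct; a “probably prime” answer is wrong with probability at most 1/4 per trial, so k independent trials drive the error below (1/4)^k, exactly the amplification pattern described above.

import random

def is_probably_prime(n, trials=20):
    # Miller-Rabin: returns False only when n is certainly composite;
    # returns True when n is prime, with error probability <= (1/4)^trials.
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 = 2^r * d with d odd.
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(trials):          # each trial is an independent "copy"
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)             # a^d mod n
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False             # a is a witness: n is definitely composite
    return True                      # no witness found: probably prime

print(is_probably_prime(561))       # False; 561 = 3 * 11 * 17 (a Carmichael number)
print(is_probably_prime(104729))    # True; 104729 is prime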
(No More Lectures!)