SUNY Geneseo Department of Computer Science
CSci 242, Spring 2013
Prof. Doug Baldwin
Due Wednesday, May 1In our April 18 lecture we talked about the problem of breaking substitution ciphers, and considered how a backtracking search through possible substitutions, supported by a dictionary for the language in question, could solve the problem. Hash tables are an ideal data structure for representing the dictionary, and this exercise asks you to do exactly that. Note, however, that this exercise asks you to implement a standard dictionary, specifically a data structure that can be searched quickly for any given English word. The dictionary representation we talked about on April 18 was designed to allow quick searches by length of word. You can use dictionaries as implemented in this assignment for cryptanalysis, and conversely, you can use ideas from hash tables to build a dictionary accessed by word length should you decide to do the larger cryptanalysis exercise that way.
Chapter 11 of our textbook talks about hash tables. The most important sections for this exercise are probably 11.2 and 11.3, describing the basic idea of hashing and hash functions, respectively. We talked about this material in our April 23 lecture.
To test your solution to this exercise, and to eventually apply it to code breaking, you will need a dictionary of English. One popular choice is the Unix spelling dictionary, derived from Webster’s second international edition and thus no longer copyrighted. It is simply a plain text list of approximately 1/4 million words, one per line. There are no definitions, parts of speech, or similar information that a complete dictionary would provide, but for applications such as this exercise you don’t need those things. On my Macintosh it can be found in directory /usr/share/dict; on other Unix-derived operating systems you can probably find it in this directory or one with a similar name. I have also put a copy on the Web at http://cs.geneseo.edu/~baldwin/csci242/web2.txt.
Write code to implement a hash-table-based dictionary. The main operation the dictionary should provide is one that takes a word (i.e., a string) as its parameter and searches for that word in the hash table; this operation only needs to return True if the word is present in the table and False otherwise. Searches should plausibly run in Θ(1) expected time, although you do not have to rigorously demonstrate that they do. You will also want some way to initialize the hash table from a dictionary file such as the Unix dictionary described above.
You can write your dictionary code in any programming language you choose. Keep in mind, however, that a future exercise will ask you to combine it with a backtracking search algorithm for breaking substitution ciphers, so you should pick a language that you are willing to write further code in too.
Because your dictionary will be used in a larger program, you should put some thought into making it reusable. For example, it makes sense to write it as a class with a fairly simple interface (e.g., a search message as described above and a constructor that initializes it from a file). Code that you need to test the dictionary can be in a separate class in order to be easily left behind when you use the dictionary proper in another program.
I will grade this exercise in a face-to-face meeting with you. Please bring your dictionary to your meeting in some form that I can both read and run. During the meeting, I will look over your code, give you my reactions to it, answer any questions you have about it, etc.
Meetings should take about 15 minutes. Sign up either on the paper schedule outside my office or via Google calendar. If you work in a group, the entire group can schedule a single meeting with me. Be sure your meeting ends by the end of the day on the due date above.