SUNY Geneseo Department of Mathematics
Math 230 02
Spring 2015
Prof. Doug Baldwin
Complete by Friday, May 1
Grade by Tuesday, May 5
This lesson develops your understanding of cell arrays in Matlab. In the process, it introduces you to the mathematical notion of a “relation” and how it provides a mathematical foundation for computer databases.
Many programs operate on collections of data. Vectors (and matrices) are familiar ways of forming collections in Matlab. However, all of the elements in a vector or matrix must be the same type. For instance, you can’t have a vector in which some elements are numbers and others are strings: either all must be numbers, or all must be strings (and in fact, strings of the same length). Cell arrays are a Matlab data structure that allows you to create things much like vectors and matrices, but with elements of different types. Section 8.1 of Attaway’s text discusses cell arrays, and we talked about them in class on April 22.
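As a small illustration of the difference between a vector and a cell array, the following sketch may help. The variable names and values are made up for this example and are not part of the lab's database.

```matlab
% A numeric vector: every element is a number.
v = [3, 1, 4];

% Mixing a number and a string inside square brackets does not keep both
% types; Matlab converts the number 65 to the character with that code.
mixed = [65, 'bc'];      % a 1-by-3 char array, not a number plus a string

% A cell array (curly braces) lets each element keep its own type.
c = {1687, 'Isaac Newton', [1 2 3]};

disp(class(v))           % double
disp(class(mixed))       % char
disp(class(c))           % cell
```

Note that the cell array holds a number, a string, and a vector side by side, which is exactly what a plain vector cannot do.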
One type you will need for this lab that we haven’t talked about in class, although you have used it informally, is the string. We will talk about strings in class on April 27, and Attaway discusses them in sections 7.1 and 7.2 of her text.
Programs that work with large data structures (such as cell arrays) typically also need to save those structures in, and retrieve them from, files. Matlab provides a very easy-to-use way of saving Matlab variables in so-called “.mat” files. Matlab also makes it easy to read variables from “.mat” files. See section 9.3 of Attaway’s book for more information. We will discuss files in class on April 24.
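A minimal sketch of saving a variable to a “.mat” file and reading it back might look like the following. The variable `constants` and the temporary file name are assumptions made for this example; in the lab you would use your own variables and file names.

```matlab
% Hypothetical variable to store; any Matlab variable would do.
constants = {'pi', 3.14159; 'e', 2.71828};

% Use a temporary file name so this sketch does not overwrite your files.
fileName = [tempname(), '.mat'];

% Write the named variable to the file.
save(fileName, 'constants');

% Remove the variable from the workspace, then restore it from the file.
clear constants
load(fileName);

disp(constants)

% Clean up the temporary file.
delete(fileName);
```

Calling `load` with only a file name restores every variable stored in that file under its original name.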
More sophisticated data storage than single files is often provided by “databases.” A database is typically a collection of files that work together to support quick and large-scale searching and updating. For example, most modern online services such as email, e-commerce, etc. are supported by databases that record email messages and their status (read, archived, etc.), items for sale, customer information (name, password, credit card number, etc.), and so forth. Databases are very conveniently modelled via the mathematical concept of a “relation.” For an introduction to relations, see the “Introduction to Relations” video lecture at http://www.geneseo.edu/proofspace/ch5sec1. We talked about this material and some extensions to it in class on April 17.
The following questions introduce you to the relational data model, and some of the operations that are common in relational databases. These operations are technically operations of the so-called “relational algebra.”
A relation can very nicely be represented by a two-dimensional cell array in Matlab, with each row representing one tuple. I have provided a small database represented in this way. The database is in a file named “database.mat,” which you can download from our “Exercises” page on myCourses. It consists of four relations: three of them (in variables mathContributions, physicsContributions, and csContributions) describe some famous mathematicians, physicists, and computer scientists, respectively, some of their major contributions to mathematics/physics/computer science, and a year (more or less, some are historically unclear) for that contribution. The fourth relation, in variable institutions, describes institutions with which some of the people in the first relations were associated, along with the years the association began and ended.
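To make the row-per-tuple layout concrete, here is a sketch of a tiny relation built by hand. The variable `toyContributions`, its contents, and the order of its attributes are assumptions for illustration only; the relations in “database.mat” may be organized differently.

```matlab
% Hypothetical relation with three attributes: name, contribution, year.
% Each row of the cell array is one tuple.
toyContributions = { ...
    'Ada Lovelace', 'notes on the Analytical Engine', 1843; ...
    'Alan Turing',  'Turing machines',                1936; ...
    'Alan Turing',  'the Turing test',                1950 };

% The number of rows is the number of tuples; the number of columns is
% the number of attributes in each tuple.
[numTuples, numAttributes] = size(toyContributions);
fprintf('%d tuples, %d attributes per tuple\n', numTuples, numAttributes);
```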
Load the database into Matlab, and look at the four relations to get a sense for what is in them and how it is represented.
Figure out the general form of Matlab expression that picks a single attribute out of a single tuple in a relation. Test your ideas on the relations from my database. Is picking an attribute out of a tuple better done with cell indexing or content indexing? (Be sure you understand the difference!) Why? Can you use a similar expression to assign a new value to an attribute within a tuple?
A simple but useful operation in relational algebra is projection, which simply amounts to extracting one or more attributes from every tuple in a relation, producing a new relation with just those attributes. For example, the query “in what year(s) did each mathematician make his or her contributions?” could be answered by projecting the year attribute out of the mathContributions relation in the database. You could also project the mathematician’s name attribute at the same time, so that you didn’t end up with years without any indication of who did something in them.
Figure out the general form of Matlab expression that does a projection on a relation represented by a cell array. Since projection is supposed to produce a new relation, the result of your projection should be another cell array. Should you use cell indexing or content indexing? Try your ideas on some of the relations from the database.
Another common relational algebra operation is selection, which amounts to extracting one or more tuples from a relation (again, producing a new relation containing just those tuples). Selection typically extracts all tuples that contain a given value in a given attribute. For example, if you wanted to know what Isaac Newton did, you could select the tuples containing “Isaac Newton” in their name attribute from the mathContributions and physicsContributions relations in the database.
Write a Matlab function that performs a selection from a relation represented as a cell array. Your function should take the cell array, a value, and the index of an attribute to compare to that value as its arguments. It should return a new cell array that contains all, but only, the tuples from the input relation that have the specified value in the specified attribute.
Beware that the sizes of the value you are looking for and the attribute value you are comparing it to might be different. This will make doing the comparison a little bit tricky, but not impossible.
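The size issue can be seen in a short sketch like the one below. It uses made-up values and the built-in `isequal` function as one possible way to compare whole values; it is not meant as the required approach, and other comparisons may suit your function better.

```matlab
target = 'Isaac Newton';    % 12 characters
other  = 'Euler';           % 5 characters

% Writing target == other would raise an error here, because == compares
% element by element and these two char arrays have different sizes.

% isequal compares whole values and reports false on a size mismatch
% instead of raising an error.
sameAsOther  = isequal(target, other);            % false
sameAsItself = isequal(target, 'Isaac Newton');   % true

% It also accepts values of different types, such as a year and a name.
yearMatches  = isequal(1687, 1687);               % true
yearVsName   = isequal(1687, target);             % false

disp([sameAsOther, sameAsItself, yearMatches, yearVsName])
```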
Test your selection function on one or more relations from the database.
Create an entire relation of your own for the database. The relation can represent anything you want, although it might work better with the existing relations if it has something to do with the history of math or science. The facts stored in your new relation can come from any source you wish.
Save your modified database in a file. You can save it in a different “.mat” file from the original if you don’t want to risk changing the original.
Since relations are sets, all set operations (union, intersection, etc.) can be performed on them. Union is particularly useful when the relations come from a database. For example, the database I gave you might in many cases be more useful if you could take the union of the mathContributions, physicsContributions, and csContributions relations to get a new relation that contained all the contributions.
Write a function that calculates the union of two relations, and returns the resulting relation. (Recall that the union of two sets is a new set that contains every element that was in either, or both, of the input sets. The only thing that is tricky about doing this in a program is that if the same element (i.e., tuple in this case) appears in both sets, only one copy of it should appear in the result. Matlab has a union function, but I don’t believe it works on cell arrays as they are used in this database.)
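As a reminder of what union means, the sketch below applies Matlab’s built-in `union` function to two plain numeric vectors (made-up values). It only illustrates the “one copy of each element” rule; it does not handle relations stored as two-dimensional cell arrays, which is the part you need to write.

```matlab
a = [3 1 2];
b = [2 5 3];

% Elements 2 and 3 appear in both inputs but only once in the result.
% The built-in function also returns its result in sorted order.
u = union(a, b);

disp(u)      % 1 2 3 5
```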
I will grade this exercise in a face-to-face meeting with you. During this meeting I will look at your solution, ask you any questions I have about it, answer questions you have, etc. Please bring a written solution to the exercise to your meeting, as that will speed the process along.
Sign up for a meeting via Google calendar. If you worked in a group on this exercise, the whole group should schedule a single meeting with me. Please make the meeting 15 minutes long, and schedule it to finish before the end of the “Grade By” date above.