SUNY Geneseo Department of Computer Science


Lesson 12—Cell Arrays, Files, and the Mathematics of Databases, Continued

CSci 120, Spring 2014

Prof. Doug Baldwin

Complete by Monday, May 5

Purpose

This lesson mainly reinforces your ability to write Matlab programs, particularly ones that work with cell arrays. It secondarily extends your knowledge of relational databases a bit.

Background

This lesson continues the work you started in lesson 11 with cell arrays and relational databases. In order to be ready for this new lesson you should be finished with lesson 11, and have read or watched the readings and videos associated with it (including the discussion of relations in the lesson 11 handout).

Exercise

In lesson 11 you learned about the projection, selection, and union operations from relational algebra. Your job in this lesson is to write Matlab functions that implement two other relational algebra operations that are more complicated than these.

You can use the database from lesson 11 as a source of test data for this lesson.

I give examples of both of the new operations based on the following relations: The first describes some historical mathematicians and scientists and the subjects they helped develop:

Name Subject Date
Isaac Newton Calculus 1666
Gottfried Leibniz Calculus 1675
Isaac Newton Mechanics 1687

The second describes some academic subjects and the prices of hypothetical textbooks for them:

Subject Title Price
Calculus Calc 1 $150.00
Calculus Calc 2 $102.50
Mechanics Intro Mechanics $190.00

Question 1: Cross Product

The cross product of two relations, A and B, is what you get if you append every tuple from B onto every tuple from A. “Appending” one tuple, u, onto another, t, creates a new tuple that contains all the attributes of t followed by all the attributes of u. Thus, if A is the mathematicians/scientists relation above, and B is the textbooks relation, their cross product is

Name Subject (A) Date Subject (B) Title Price
Isaac Newton Calculus 1666 Calculus Calc 1 $150.00
Isaac Newton Calculus 1666 Calculus Calc 2 $102.50
Isaac Newton Calculus 1666 Mechanics Intro Mechanics $190.00
Gottfried Leibniz Calculus 1675 Calculus Calc 1 $150.00
Gottfried Leibniz Calculus 1675 Calculus Calc 2 $102.50
Gottfried Leibniz Calculus 1675 Mechanics Intro Mechanics $190.00
Isaac Newton Mechanics 1687 Calculus Calc 1 $150.00
Isaac Newton Mechanics 1687 Calculus Calc 2 $102.50
Isaac Newton Mechanics 1687 Mechanics Intro Mechanics $190.00

Notice how each tuple in this cross product consists of a tuple from the mathematicians/scientists relation followed by a tuple from textbooks relation. Each tuple from each original relation appears three times, once paired with each of the tuples from the other relation. Also notice that each tuple in the cross product has two “Subject” attributes, one from each of the original relations; I’ve identified these as “Subject (A)” and “Subject (B)” to indicate which relations each comes from.

Use the notation “[a,b]” to denote appending tuple b onto tuple a. Then the cross product of A and B, written A × B, can be formally defined as the set of all tuples [a,b] such that a is a tuple from A and b is a tuple from B.

Generate two or three examples of cross products of your own, and be sure you understand what a cross product is. Then write a Matlab function that takes two relations as its parameters, and returns a new relation that is the cross product of the parameters.

Question 2: Join

Another important relational algebra operation is the so-called (natural) “join.” This operation combines two relations by matching their tuples on some shared attribute. The intuition behind joining two relations, A and B, on attributes i and j respectively, is to find every pair of tuples, one from A and one from B, such that the value of attribute i in the tuple from A equals the value of attribute j in the tuple from B, and to then append these tuples to create a new one. (Technically the new tuple should have only one copy of the shared attribute, but you can ignore this feature for the sake of this exercise.)

As an example, suppose A is the mathematicians/scientists relation above, and B is the textbooks relation. Then joining these relations on their “Subject” attributes would be…

Name Subject (A) Date Subject (B) Title Price
Isaac Newton Calculus 1666 Calculus Calc 1 $150.00
Isaac Newton Calculus 1666 Calculus Calc 2 $102.50
Gottfried Leibniz Calculus 1675 Calculus Calc 1 $150.00
Gottfried Leibniz Calculus 1675 Calculus Calc 2 $102.50
Isaac Newton Mechanics 1687 Mechanics Intro Mechanics $190.00

The first two tuples in this new relation are produced by pairing the “Isaac Newton / Calculus” tuple from the mathematicians/scientists with the “Calc 1” and “Calc 2” tuples from the textbooks. Similarly, the third and fourth tuples in the join come from pairing the “Gottfried Leibniz / Calculus” tuple in the mathematicians/scientists with the “Calc” tuples from the textbooks. Finally, the last tuple in the join is the pairing of “Isaac Newton / Mechanics” from the mathematicians/scientists with “Intro Mechanics” from the textbooks. Note that it is common for tuples to participate in multiple pairs in a join.

Formally, a join of A and B is a subset of A × B. In particular, joining A on attribute i with B on attribute j is the set of all tuples [a,b] such that a is a tuple from A, b is a tuple from B, and a’s attribute i equals b’s attribute j.

As the example above suggests, the result of a join is often a strange-looking relation. But joins are very important in real-world databases, because they are how you pull information from several relations together to answer questions that no relation could answer by itself. For instance, the joined relation above could be useful if you wanted to know who established now-expensive fields of study.

Generate two or three examples of joins of your own, and be sure you understand how join works. Then write a Matlab function that takes two relations, and two attribute indices, one for each relation, as parameters, and that returns the join of the two relations on the two attributes.

Bonus

You have been studying operations that form something called “relational algebra.” From the name, you might expect that there are algebraic laws about how these operations interact, and in fact there are. For example, see if you can prove each of the following:

For the first of these proofs, define projection formally as follows: the projection of relation A on attribute i is the set of all values x such that some tuple in A has value x in attribute i. Note that this definition is slightly different from (but technically more correct than) the one you used in lesson 11: in lesson 11 you could have duplicates in the result of a projection, under this new definition you cannot.

Also, give a counter-example to show that cross product is not commutative (i.e., that A × B ≠  B × A).

Follow-Up

Every study group should have a meeting with me after you start this lesson in class. During those meetings, I would like to see what progress you are making on this lesson. You don’t have to have the lesson finished by that meeting, but you should have made some progress on it. You do not necessarily have to show me your finished results from this lesson, although I will be happy to look at them and give you my reactions if you would like.