Using Castor JDO for SQL Mapping
Class Relationships
For purposes of illustration, it's best to have a set of classes that
exhibit the following relations between objects:
- one-to-one
- one-to-many
- many-to-many
- class dependent on another (weak entity)
- class related to another
- class type extending another
The first three are self-explanatory cardinality relationships among class
instances. Number 4 has already been covered. The fifth item, class
related to another, is the case where a class has a data member
reference to another, nondependent class. The referenced class in this case
is a strong entity, whose instances exist independently of the
parent class instance.
The sixth type of relation is that where a class extends (is derived from)
another Java class. The extended class type may contain additional data
members (fields) not defined in the base class. If the base class and its
data members are already mapped to a database table, then it would be
desirable to be able to reuse that mapping for a derived class. In this case,
a new table is created to hold the extended class' data members, and the
members that are inherited are stored in the table used for the base class. In
other words, a class that extends another class can be mapped in way that,
when instances of the extended class are saved, data member values of
that class are stored in two tables: the base type table, and the extended
type table.
Example: Shipping Line
I'll use for an example a description of a merchant shipping line. At the
top level, the shipping line has ships, and each ship has ports-of-call. This
gives us three main domain objects:
- the shipping line company
- the ships of the line
- the ports-of-call
Each of these domain objects in turn will have properties, both simple and
complex. The easiest way to show the relation between line, ship, and port,
along with their respective properties, is to see the object marshalled into
an XML document that contains sample information for all of the above:
<shipping-line name="Burble Bros. Shipping">
<ship name="Bubbles" registry="Liberian">
<port-of-call name="Seattle" pier="41">
<arrival-time day="2002-12-03" time="02:35" zone="GMT"/>
<departure-time day="2002-12-05" time="22:45" zone="GMT"/>
</port-of-call>
<port-of-call name="San Fransisco" pier="1">
...
</port-of-call>
...
</ship>
<ship name="Spring Surprise" registry="Sierra Leone">
<port-of-call ...>
...
</shipping-line>
As you can see in this XML document, each shipping line has multiple
ships, and each ship has multiple ports-of-call. In addition, each element
above has one or more attributes. In Java, such attributes are represented as
class properties. Later, I'll show how these same attributes correspond to
columns in a database table.
The ShippingLine Class
Each shipping line has a name and several ships. An interface to the
ShippingLine class might look like the following:
public interface IShippingLine {
public String getName();
public Ship getShip( int i );
public ArrayList getShips();
public void addShip( Ship s );
public void setName( String name );
// more methods...
};
The ShippingLine class has a Java-bean-type interface, with get/set
methods for every data member of the class. Because a ShippingLine object can
contain many Ship instances, the methods for adding and retrieving Ship
objects are different, because they manipulate objects in a collection.
The actual data members of ShippingLine are as follows:
public class ShippingLine {
private String _name;
private ArrayList _ships;
private int _id;
// methods...
};
You'll note the addition of an _id data member that was not mentioned in
the interface methods shown previously. The _id member will be used only by the
JDO persistence engine as a table row identifier. It holds, in effect, the
value of the primary key of the row in database table that contains
ShippingLine object data. The _id key value will be
autogenerated (as you shall see), and will therefore be unique.
The shipping_line Table
We will use the following SQL statement to create a table for holding
ShippingLine information in our Postgres database:
CREATE TABLE shipping_line ( id SERIAL, name VARCHAR(80) );
Since most SQL databases (including Postgres) are case-insensitive, we've
named the table shipping_line rather than shippingline to
make the table name easier to read. For the definition itself, the language
keywords have been capitalized, but again, that is only for legibility. Since
SQL does not have an array type, there isn't a ships column in the
table definition. In order to incorporate ships in shipping line, we will
have to use two tables. The Mapping Extended Classes section describes how this is done. For now, I'll concentrate on binding just this one table's information in the section to follow.
Before moving on to explain how to map a Java class to a table, let's
first explain the table column definitions above in more detail.
id: The id column is defined as SERIAL, which is a Postgres notational shortcut for defining an auto-incrementing integer field. Such
auto-incrementing fields are useful as primary keys; in other words, the
columns used to uniquely identify rows in a table. When defining a column
of type SERIAL, a SEQUENCE table is generated in the background. A
sequence table is a special database table that holds the last generated
sequence number and has methods for retrieving the next sequence value.
More on this in the key generators section to follow.
name: The name column is defined as being of type VARCHAR
(a.k.a. CHARACTER VARYING) of maximum length 80. Postgres doesn't
actually require a maximum VARCHAR length to be specified, but most
database systems do. VARCHAR columns vary the number of bytes being
stored based on the length of the string values, and are therefore more
storage efficient than fixed-lenghth columns.
Natural Keys vs. Surrogate Keys
It may be that the name of a shipping line is sufficient to identify it
from any other. It may be likely that such is the case, and if so, then
shipping_line.name is sufficient as a row identifier in our table. The
shipping line name would then serve as a natural key. To be safe,
though, I'm using a generated sequential id, known as a surrogate
key. Surrogate keys have no semantics associated with them -- they merely
identify each row in the shipping_line table, and therefore each shipping
line. With surrogate keys, it is easy to ensure that no identity collisions
take place.
For table joins performed during a SQL query, integer-based surrogate key
lookups will prove faster than natural keys that require string comparisons.
Therefore, our foreign-key/primary-key columns in the database will use these
surrogate keys instead of the natural ones. However, it will be necessary to
place a unique index on the natural key as well as the surrogate, so that the
two are always mapped one-to-one (and the id is associated with one and only one
name):
CREATE TABLE shipping_line ( id SERIAL, name VARCHAR(20)
CONSTRAINT uniq_name UNIQUE (name));
Note that it was not necessary to define id as unique, since the SERIAL
datatype covers that base already.
Let's now proceed to write the XML mapping file.