Java RMI: Serialization
How to Make a Class Serializable
So far, we've focused on the mechanics of serializing an object.
We've assumed we have a serializable object and discussed, from the point of
view of client code, how to serialize it. The next step is discussing how to
make a class serializable.
There are four basic things you must do when you are making a
class serializable. They are:
- Implement the Serializable interface.
- Make sure that instance-level, locally defined state is
serialized properly.
- Make sure that superclass state is serialized properly.
- Override equals( ) and hashCode( ).
Let's look at each of these steps in more detail.
Implement the Serializable Interface
This is by far the easiest of the steps. The Serializable interface is an
empty interface; it declares no methods at all. So implementing it amounts to
adding "implements Serializable" to your class declaration.
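As a rough illustration of what the empty interface buys you, the sketch below compares two hypothetical classes, Tagged and Untagged (illustrative names, not classes from this chapter), that differ only in the "implements Serializable" clause. The helper canSerialize( ) is likewise an assumption made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MarkerDemo {
    // Hypothetical class: identical to Untagged except for the marker interface.
    static class Tagged implements Serializable {
        int value = 42;
    }

    // Hypothetical class: no marker interface, so serialization refuses it.
    static class Untagged {
        int value = 42;
    }

    // Returns true if the object can be written to an ObjectOutputStream.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false; // the class does not implement Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Tagged()));   // prints true
        System.out.println(canSerialize(new Untagged())); // prints false
    }
}
```

The only difference between the two classes is the declaration, which is the point: the interface carries no behavior, only permission.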
Reasonable people may wonder about the utility of an empty
interface. Rather than define an empty interface and require class
definitions to implement it, why not simply make every object
serializable? The main reason not to do this is that there are some classes
that don't have an obvious serialization. Consider, for example, an instance
of File. An instance of File represents a file. Suppose, for example, it was
created using the following line of code:
File file = new File("c:\\temp\\foo");
It's not at all clear what should be written out when this is
serialized. The problem is that the file itself has a different lifecycle than
the serialized data. The file might be edited, or deleted entirely, while the
serialized information remains unchanged. Or the serialized information might
be used to restart the application on another machine, where
"c:\\temp\\foo" is the name of an entirely different file. (In practice,
java.io.File does implement Serializable, but what it writes out is the
path, not the file, so both of these problems remain.)
Another example is provided by the Thread class. (If you don't know much
about threads, just wait a few chapters and then revisit this example. It will
make more sense then.) A thread represents a flow of execution within a
particular JVM. To serialize one, you would have to store not only the stack
and all the local variables, but also all the related locks and threads, and
then restart all the threads properly when the instance is deserialized.
TIP: Things get worse when you consider
platform dependencies. In general, any class that involves native code is
not really a good candidate for serialization.
Make Sure That Instance-Level, Locally Defined State Is Serialized Properly
Class definitions contain variable declarations. The
instance-level, locally defined variables (i.e., the nonstatic variables) are
the ones that contain the state of a particular instance. For example, in our
Money class, we declared one such field:
public class Money extends ValueObject {
    private int _cents;
    ....
}
The serialization mechanism has a nice default behavior: if all
the instance-level, locally defined variables have values that are either
serializable objects or primitive datatypes, then the serialization mechanism
will work without any further effort on our part. For example, our
implementations of Account, such as Account_Impl, would present no problems
for the default serialization mechanism:
public class Account_Impl extends UnicastRemoteObject implements Account {
    private Money _balance;
    ...
}
While _balance doesn't have a primitive type, it does refer to an instance of
Money, which is a serializable class.
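To make the default behavior concrete, here is a minimal sketch using two hypothetical stand-ins, SimpleMoney and SimpleAccount. They are simplified assumptions for illustration only: unlike the chapter's Money and Account_Impl, they extend neither ValueObject nor UnicastRemoteObject. Neither class contains any serialization code beyond the interface declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class DefaultGraphDemo {
    // Hypothetical stand-in for Money: one primitive field.
    static class SimpleMoney implements Serializable {
        private final int cents;
        SimpleMoney(int cents) { this.cents = cents; }
        int getCents() { return cents; }
    }

    // Hypothetical stand-in for an account: one field referring to a serializable object.
    static class SimpleAccount implements Serializable {
        private final SimpleMoney balance;
        SimpleAccount(SimpleMoney balance) { this.balance = balance; }
        SimpleMoney getBalance() { return balance; }
    }

    // Serializes the account to memory and reads it back.
    static SimpleAccount copyViaSerialization(SimpleAccount original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (SimpleAccount) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleAccount copy = copyViaSerialization(new SimpleAccount(new SimpleMoney(1250)));
        System.out.println(copy.getBalance().getCents()); // prints 1250
    }
}
```

The nested SimpleMoney instance travels along with the account because the default mechanism follows the reference and finds another serializable object.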
If, however, some of the fields don't have primitive types, and
don't refer to serializable classes, more work may be necessary. Consider, for
example, the implementation of ArrayList from the java.util package. An
ArrayList really has only two pieces of state:
public class ArrayList extends AbstractList
        implements List, Cloneable, java.io.Serializable {
    private Object elementData[];
    private int size;
    ...
}
But hidden in here is a problem: ArrayList is a generic container class whose
state is stored as an array of objects. Arrays are first-class objects in
Java, and they are serializable, but the default mechanism would write out
the whole array, including any unused slots beyond size. This means that
ArrayList can't just implement the Serializable interface and stop there. It
has to provide extra information to help the serialization mechanism handle
fields that the default behavior doesn't suit. There are three basic
solutions to this problem:
- Fields can be declared to be transient.
- The writeObject( )/readObject( ) methods can be implemented.
- serialPersistentFields can be declared.
Declaring transient fields
The first, and easiest, thing you can do is simply mark some
fields using the transient keyword. In ArrayList, for example,
elementData is really declared to be a transient field:
public class ArrayList extends AbstractList
        implements List, Cloneable, java.io.Serializable {
    private transient Object elementData[];
    private int size;
    ...
}
This tells the default serialization mechanism to ignore the
variable. In other words, the serialization mechanism simply skips over the
transient variables. In the case of ArrayList, the default serialization
mechanism would attempt to write out size, but ignore elementData entirely.
This can be useful in two, usually distinct, situations:
- The variable isn't serializable
  If the variable isn't serializable, then the serialization mechanism will
  throw an exception when it tries to serialize the variable. To avoid this,
  you can declare the variable to be transient.
- The variable is redundant
  Suppose that the instance caches the result of a computation. Locally, we
  might want to store the result of the computation, in order to save some
  processor time. But when we send the object over the wire, we might worry
  more about consuming bandwidth and thus discard the cached computation
  since we can always regenerate it later on.
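The second situation can be sketched as follows. Circle is a hypothetical class invented for this example; it caches its area in a transient field, so the cached value is dropped on the way out and recomputed on demand after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientCacheDemo {
    // Hypothetical class with one real piece of state and one cached value.
    static class Circle implements Serializable {
        private final double radius;
        private transient Double cachedArea; // redundant: derivable from radius

        Circle(double radius) { this.radius = radius; }

        double area() {
            if (cachedArea == null) {
                cachedArea = Math.PI * radius * radius; // recompute on demand
            }
            return cachedArea;
        }

        boolean isAreaCached() { return cachedArea != null; }
    }

    // Serializes the circle to memory and reads it back.
    static Circle copyViaSerialization(Circle original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Circle) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Circle original = new Circle(2.0);
        original.area();                                  // fills the cache
        Circle copy = copyViaSerialization(original);
        System.out.println(copy.isAreaCached());          // prints false
        System.out.println(copy.area() == original.area()); // prints true
    }
}
```

A transient reference comes back as null (and a transient primitive as zero), so the class has to tolerate the missing value, here by recomputing lazily.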
Implementing writeObject( ) and readObject( )
Suppose that the first case applies: a field can't be left to the default
mechanism. If the field is still an important part of the state of
our instance, such as elementData in the case of an ArrayList, simply
declaring the variable to be transient isn't good enough. We need to save and
restore the state stored in the variable. This is done by implementing a pair
of methods with the following signatures:
private void writeObject(java.io.ObjectOutputStream out) throws IOException;
private void readObject(java.io.ObjectInputStream in)
        throws IOException, ClassNotFoundException;
When the serialization mechanism starts to write out an object,
it will check to see whether the class implements writeObject( ). If so, the
serialization mechanism will not use the default mechanism and will not write
out any of the instance variables. Instead, it will call writeObject( ) and
depend on the method to store out all the important state. Here is
ArrayList's implementation of writeObject( ):
private synchronized void writeObject(java.io.ObjectOutputStream stream)
        throws java.io.IOException {
    stream.defaultWriteObject( );
    stream.writeInt(elementData.length);
    for (int i=0; i<size; i++)
        stream.writeObject(elementData[i]);
}
The first thing this does is call defaultWriteObject( ).
defaultWriteObject( ) invokes the default serialization mechanism, which
serializes all the nontransient, nonstatic instance variables. Next, the
method writes out elementData.length and then calls the stream's
writeObject( ) for each of the first size elements of elementData.
There's an important point here that is sometimes missed:
readObject( ) and writeObject( ) are a pair of methods that need to be
implemented together. If you do any customization of serialization inside one
of these methods, you need to implement the other method. If you don't, the
serialization algorithm will fail.
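To show the two methods working as a pair, here is a sketch of a hypothetical container, ObjectBag, modeled loosely on the writeObject( ) listing above. It is not the JDK's ArrayList code; the class, its methods, and the matching readObject( ) are assumptions written for this example. The key point is that readObject( ) reads values back in exactly the order writeObject( ) wrote them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class HookPairDemo {
    // Hypothetical container whose backing array is transient.
    static class ObjectBag implements Serializable {
        private transient Object[] elementData = new Object[4];
        private int size;

        void add(Object element) {
            if (size == elementData.length) {
                elementData = Arrays.copyOf(elementData, size * 2);
            }
            elementData[size++] = element;
        }

        Object get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return elementData[index];
        }

        int size() { return size; }

        private void writeObject(ObjectOutputStream stream) throws IOException {
            stream.defaultWriteObject();          // writes size
            stream.writeInt(elementData.length);  // writes the capacity
            for (int i = 0; i < size; i++) {
                stream.writeObject(elementData[i]);
            }
        }

        private void readObject(ObjectInputStream stream)
                throws IOException, ClassNotFoundException {
            stream.defaultReadObject();           // restores size
            int capacity = stream.readInt();      // same order as writeObject
            elementData = new Object[Math.max(capacity, size)];
            for (int i = 0; i < size; i++) {
                elementData[i] = stream.readObject();
            }
        }
    }

    // Serializes the bag to memory and reads it back.
    static ObjectBag copyViaSerialization(ObjectBag original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ObjectBag) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ObjectBag bag = new ObjectBag();
        bag.add("a");
        bag.add("b");
        ObjectBag copy = copyViaSerialization(bag);
        System.out.println(copy.size() + " " + copy.get(0) + " " + copy.get(1)); // prints 2 a b
    }
}
```

Note that readObject( ) must allocate the array itself: the field initializer on elementData does not run during deserialization, because the serializable class's own constructors are not called.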
Unit Tests and Serialization
Unit tests are used to test a specific piece of
functionality in a class. They are explicitly not end-to-end or
application-level tests. It's often a good idea to adopt a
unit-testing harness such as JUnit when developing an application. JUnit
gives you an automated way to run unit tests on individual classes and is
available from http://www.junit.org/.
If you adopt a unit-testing methodology, then any
serializable class should pass the following three tests:
- If it implements readObject( ), it should implement writeObject( ), and
vice versa.
- It is equal (using the equals( ) method) to a serialized copy of itself.
- It has the same hashcode as a serialized copy of itself.
Similar constraints hold for classes that implement the Externalizable
interface.
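One way the three tests might be expressed is sketched below. It uses plain Java rather than JUnit so that it needs no extra library; the class SerializationChecks and its method names are hypothetical, and the checks could just as well be wrapped in JUnit test methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationChecks {
    // Test 1: writeObject and readObject are declared together or not at all.
    static boolean hooksArePaired(Class<?> type) {
        return declares(type, "writeObject", ObjectOutputStream.class)
            == declares(type, "readObject", ObjectInputStream.class);
    }

    private static boolean declares(Class<?> type, String name, Class<?> parameter) {
        try {
            type.getDeclaredMethod(name, parameter);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Tests 2 and 3: a serialized copy is equal and has the same hash code.
    static boolean survivesCopy(Serializable original) {
        Object copy = copyViaSerialization(original);
        return original.equals(copy) && original.hashCode() == copy.hashCode();
    }

    // Serializes the object to memory and reads it back.
    static Object copyViaSerialization(Serializable original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical class that inherits equals( ) from Object, so it fails test 2.
    static class NoEquals implements Serializable {
        int value = 1;
    }

    public static void main(String[] args) {
        System.out.println(hooksArePaired(java.util.ArrayList.class)); // prints true
        System.out.println(survivesCopy("hello"));                     // prints true
        System.out.println(survivesCopy(new NoEquals()));              // prints false
    }
}
```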
Declaring serialPersistentFields
The final option that can be used is to explicitly declare which
fields should be stored by the serialization mechanism. This is done using a
special static final variable called serialPersistentFields, as shown in the
following code snippet:
private static final ObjectStreamField[] serialPersistentFields = {
    new ObjectStreamField("size", Integer.TYPE), ....
};
This line of code declares that the field named size, which is of type int,
is a serial persistent field and will be written to the output stream by the
serialization mechanism. Declaring serialPersistentFields is almost the
opposite of declaring some fields transient. The meaning of transient is,
"This field shouldn't be stored by serialization," and the meaning of
serialPersistentFields is, "These fields should be stored by serialization."
But there is one important difference between declaring some
variables to be transient and others to be serialPersistentFields. In order
to declare variables to be transient, they must be locally declared. In other
words, you must have access to the code that declares the variable. There is
no such requirement for serialPersistentFields. You simply provide the
name of the field and the type.
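A small sketch may help show the "these fields should be stored" behavior. Counter is a hypothetical class with two ordinary fields, only one of which is listed in serialPersistentFields; the field names and the copy helper are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

class PersistentFieldsDemo {
    // Hypothetical class that lists its persistent fields explicitly.
    static class Counter implements Serializable {
        // Only "size" is listed, so default serialization writes size and nothing else.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("size", Integer.TYPE)
        };

        private int size;
        private String label; // not listed, so not stored, although it isn't transient

        Counter(int size, String label) {
            this.size = size;
            this.label = label;
        }

        int getSize() { return size; }
        String getLabel() { return label; }
    }

    // Serializes the counter to memory and reads it back.
    static Counter copyViaSerialization(Counter original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Counter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Counter copy = copyViaSerialization(new Counter(3, "scratch"));
        System.out.println(copy.getSize());  // prints 3
        System.out.println(copy.getLabel()); // prints null
    }
}
```

The declaration has to be private, static, and final for the serialization mechanism to recognize it.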
TIP: What if you try to do both? That is, suppose you declare some variables
to be transient, and then also provide a definition for
serialPersistentFields? The answer is that the transient keyword is ignored;
the definition of serialPersistentFields is definitive.
So far, we've talked only about instance-level state. What about
class-level state? Suppose you have important information stored in a static
variable. Static variables won't get saved by serialization unless you add
special code to do so. In our context (shipping objects over the wire between
clients and servers), statics are usually a bad idea anyway.
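The following sketch illustrates the point about statics. Settings is a hypothetical class with one static and one instance field; the static is changed between serialization and deserialization, and the deserialized copy has no effect on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StaticStateDemo {
    // Hypothetical class with class-level and instance-level state.
    static class Settings implements Serializable {
        static String mode = "default"; // class-level: never written to the stream
        int retries;                    // instance-level: written by default
    }

    static byte[] toBytes(Object original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Settings settings = new Settings();
        settings.retries = 3;
        Settings.mode = "fast";
        byte[] bytes = toBytes(settings);   // "fast" is not part of these bytes

        Settings.mode = "slow";             // simulates a different JVM's value
        Settings copy = (Settings) fromBytes(bytes);
        System.out.println(copy.retries);   // prints 3
        System.out.println(Settings.mode);  // prints slow
    }
}
```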
Make Sure That Superclass State Is Handled Correctly
After you've handled the locally declared state, you may still
need to worry about variables declared in a superclass. If the superclass
implements the Serializable interface, then you don't need to do anything.
The serialization mechanism will handle everything for you, either by using
default serialization or by invoking writeObject( )/readObject( ) if they are
declared in the superclass.
If the superclass doesn't implement Serializable, you will need to store its
state. There are two different ways to approach this. You can use
serialPersistentFields to tell the serialization mechanism about some of the
superclass instance variables, or you can use writeObject( )/readObject( )
to handle the superclass state explicitly. Both of these, unfortunately,
require you to know a fair amount about the superclass. If you're getting the
.class files from another source, you should be aware that versioning issues
can cause some really nasty problems. If you subclass a class, and that
class's internal representation of instance-level state changes, you may not
be able to load in your serialized data. While you can sometimes work around
this by using a sufficiently convoluted readObject( ) method, this may not be
a solvable problem. We'll return to this later. However, be aware that the
ultimate solution may be to just implement the Externalizable interface
instead, which we'll talk about later.
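The writeObject( )/readObject( ) approach can be sketched as follows. Shape and Square are hypothetical classes; Shape is not serializable but exposes its state through an accessor pair, which is an assumption the example depends on. Without such accessors, the subclass would have no clean way to reach the superclass state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SuperStateDemo {
    // Hypothetical superclass: not Serializable, but has a zero-argument constructor.
    static class Shape {
        private String name = "unnamed";
        Shape() { }
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }

    // Hypothetical serializable subclass that saves the superclass state by hand.
    static class Square extends Shape implements Serializable {
        private double side;

        Square(String name, double side) {
            setName(name);
            this.side = side;
        }

        double getSide() { return side; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();   // Square's own state (side)
            out.writeObject(getName()); // superclass state, written explicitly
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            setName((String) in.readObject()); // restore superclass state
        }
    }

    // Serializes the square to memory and reads it back.
    static Square copyViaSerialization(Square original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Square) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Square copy = copyViaSerialization(new Square("unit", 2.0));
        System.out.println(copy.getName() + " " + copy.getSide()); // prints unit 2.0
    }
}
```

If the two hook methods were removed, the copy would come back with the name "unnamed", because only Square's own fields would be written.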
Another aspect of handling the state of a nonserializable
superclass is that nonserializable superclasses must have a zero-argument
constructor. This isn't important for serializing out an object, but it's
incredibly important when deserializing an object. Deserialization works by
creating an instance of a class and filling out its fields correctly. During
this process, the deserialization algorithm doesn't actually call any of the
serialized class's constructors, but does call the zero-argument constructor
of the first nonserializable superclass. If there isn't a zero-argument
constructor, then the deserialization algorithm can't create instances of the
class, and the whole process fails.
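The constructor behavior described above can be observed with a pair of hypothetical classes, Base and Child, that simply count how often their constructors run. The counters and the helper method are assumptions added for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ConstructorCallDemo {
    static int baseCalls;
    static int childCalls;

    // Hypothetical nonserializable superclass with a zero-argument constructor.
    static class Base {
        Base() { baseCalls++; }
    }

    // Hypothetical serializable subclass.
    static class Child extends Base implements Serializable {
        Child() { childCalls++; }
    }

    // Creates one Child, copies it through serialization, and reports
    // how often each constructor ran, as { base, child }.
    static int[] countConstructorCalls() {
        baseCalls = 0;
        childCalls = 0;
        Child original = new Child();   // runs Base( ) and Child( )
        copyViaSerialization(original); // runs Base( ) only
        return new int[] { baseCalls, childCalls };
    }

    // Serializes the object to memory and reads it back.
    static Object copyViaSerialization(Object original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = countConstructorCalls();
        System.out.println(counts[0] + " " + counts[1]); // prints 2 1
    }
}
```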
WARNING: If you can't create a zero-argument constructor in the first
nonserializable superclass, you'll have to implement the Externalizable
interface instead.
Simply adding a zero-argument constructor might seem a little
problematic. Suppose the object already has several constructors, all of which
take arguments. If you simply add a zero-argument constructor, then the
serialization mechanism might leave the object in a half-initialized, and
therefore unusable, state.
However, since serialization will supply the instance variables
with correct values from an active instance immediately after instantiating
the object, the only way this problem could arise is if the constructors
actually do something with their arguments--besides setting variable
values.
If all the constructors take arguments and actually execute
initialization code as part of the constructor, then you may need to refactor
a bit. The usual solution is to move the local initialization code into a new
method (usually named something like initialize( )), which is then called
from the original constructor. That is, you change code that looks like:
public MyObject(arglist) {
    // set local variables from arglist
    // perform local initialization
}
to something that looks like:
protected MyObject( ) {
    // zero-argument constructor, invoked by serialization
    // and never by any other
    // piece of code.
    // it must not be private, or serialization can't use it
    // note that it doesn't call initialize( )
}
public MyObject(arglist) {
    // set local variables from arglist
    initialize( );
}
private void initialize( ) {
    // perform local initialization
}
After this is done, writeObject( )/readObject( ) should be implemented, and
readObject( ) should end with a call to initialize( ). Sometimes this will
result in code that simply invokes the default serialization mechanism, as in
the following snippet:
private void writeObject(java.io.ObjectOutputStream stream)
        throws java.io.IOException {
    stream.defaultWriteObject( );
}
private void readObject(java.io.ObjectInputStream stream)
        throws java.io.IOException, ClassNotFoundException {
    stream.defaultReadObject( );
    initialize( );
}
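Put together, the pattern might look like the following sketch. Label is a hypothetical class whose description field stands in for "local initialization": it is derived state, built by initialize( ) both in the constructor and at the end of readObject( ).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class InitializeDemo {
    // Hypothetical class whose constructor does more than set fields.
    static class Label implements Serializable {
        private final int width;
        private transient String description; // derived state, rebuilt by initialize()

        Label(int width) {
            this.width = width;
            initialize();
        }

        private void initialize() {
            description = "width=" + width; // the "local initialization" step
        }

        String getDescription() { return description; }

        private void writeObject(ObjectOutputStream stream) throws IOException {
            stream.defaultWriteObject();
        }

        private void readObject(ObjectInputStream stream)
                throws IOException, ClassNotFoundException {
            stream.defaultReadObject(); // restores width
            initialize();               // the constructor did not run, so do its work here
        }
    }

    // Serializes the label to memory and reads it back.
    static Label copyViaSerialization(Label original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Label) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Label copy = copyViaSerialization(new Label(12));
        System.out.println(copy.getDescription()); // prints width=12
    }
}
```

Without the call to initialize( ) in readObject( ), the copy's description would be null, which is the half-initialized state the text warns about.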
TIP: If creating a zero-argument constructor is difficult (for example, you
don't have the source code for the superclass), your class will need to
implement the Externalizable interface instead of Serializable.
Override equals( ) and hashCode( ) if Necessary
The default implementations of equals( ) and hashCode( ), which are inherited
from java.lang.Object, are based on an instance's identity rather than on its
state. This can be problematic. Consider our previous deep copy code example:
ByteArrayOutputStream memoryOutputStream = new ByteArrayOutputStream( );
ObjectOutputStream serializer = new ObjectOutputStream(memoryOutputStream);
serializer.writeObject(serializableObject);
serializer.flush( );
ByteArrayInputStream memoryInputStream =
    new ByteArrayInputStream(memoryOutputStream.toByteArray( ));
ObjectInputStream deserializer = new ObjectInputStream(memoryInputStream);
Object deepCopyOfOriginalObject = deserializer.readObject( );
The potential problem here involves the following boolean test:
serializableObject.equals(deepCopyOfOriginalObject)
Sometimes, as in the case of Money and DocumentDescription, the answer should
be true. If two instances of Money have the same values for _cents, then they
are equal. However, the implementation of equals( ) inherited from Object
will return false.
The same problem occurs with hashCode( ). Note that Object implements
hashCode( ) by returning a value derived from the identity of the instance
(typically, though not necessarily, its memory address). Hence, two distinct
instances will almost never have the same hashCode( ) using Object's
implementation. If two objects are equal, however, then they should have the
same hashcode. So if you need to override equals( ), you probably need to
override hashCode( ) as well.
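For a value class with a single field, the overrides can be quite short. The sketch below uses a hypothetical class, Cents, as a simplified stand-in for Money (it does not extend ValueObject, and its equals( ) and hashCode( ) are one reasonable choice rather than the chapter's actual implementation).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class EqualsDemo {
    // Hypothetical value class: two instances are equal if they hold the same amount.
    static class Cents implements Serializable {
        private final int cents;

        Cents(int cents) { this.cents = cents; }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Cents)) {
                return false;
            }
            return cents == ((Cents) other).cents;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(cents); // equal objects produce equal hash codes
        }
    }

    // Serializes the value to memory and reads it back.
    static Cents copyViaSerialization(Cents original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Cents) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Cents original = new Cents(1999);
        Cents copy = copyViaSerialization(original);
        System.out.println(original == copy);                       // prints false
        System.out.println(original.equals(copy));                  // prints true
        System.out.println(original.hashCode() == copy.hashCode()); // prints true
    }
}
```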
TIP: With the exception of declaring variables
to be transient, all our changes involve adding functionality. Making a
class serializable rarely involves significant changes to its functionality
and shouldn't result in any changes to method implementations. This means
that it's fairly easy to retrofit serialization onto an existing object
hierarchy. The hardest part is usually implementing equals( ) and
hashCode( ).