Java RMI: Serialization
Using Serialization
Serialization is a mechanism built into the core Java libraries
for writing a graph of objects into a stream of data. This stream of data can
then be programmatically manipulated, and a deep copy of the objects can be
made by reversing the process. This reversal is often called deserialization.
In particular, there are three main uses of serialization:
- As a persistence mechanism
  - If the stream being used is FileOutputStream, then the data will automatically be written to a file.
- As a copy mechanism
  - If the stream being used is ByteArrayOutputStream, then the data will be written to a byte array in memory. This byte array can then be used to create duplicates of the original objects.
- As a communication mechanism
  - If the stream being used comes from a socket, then the data will automatically be sent over the wire to the receiving socket, at which point another program will decide what to do.
The important thing to note is that the use of serialization is
independent of the serialization algorithm itself. If we have a serializable
class, we can save it to a file or make a copy of it simply by changing the
way we use the output of the serialization mechanism.
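To make that independence concrete, here is a minimal sketch. The helper name serializeTo is hypothetical, and a temporary file stands in for a real persistence location. The same serialization call is handed a FileOutputStream in one case and a ByteArrayOutputStream in the other, and both destinations end up holding identical bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SerializationTargets {

    // Hypothetical helper: serializes obj onto whatever stream the caller
    // supplies. The code is identical for a file, a byte array, or a socket.
    public static void serializeTo(Object obj, OutputStream destination) throws IOException {
        ObjectOutputStream serializer = new ObjectOutputStream(destination);
        serializer.writeObject(obj);
        // flush() pushes buffered data through without closing destination;
        // the caller stays responsible for closing it.
        serializer.flush();
    }

    public static void main(String[] args) throws IOException {
        List<String> data = new ArrayList<>(List.of("alpha", "beta"));

        // Persistence: the destination is a (temporary) file.
        File file = File.createTempFile("serialized", ".bin");
        file.deleteOnExit();
        try (FileOutputStream fileOut = new FileOutputStream(file)) {
            serializeTo(data, fileOut);
        }

        // Copying: the destination is a byte array in memory.
        ByteArrayOutputStream memoryOut = new ByteArrayOutputStream();
        serializeTo(data, memoryOut);

        // Both destinations received the same serialized bytes.
        byte[] fromFile = Files.readAllBytes(file.toPath());
        System.out.println(Arrays.equals(fromFile, memoryOut.toByteArray()));
    }
}
```

Only the stream passed in changes; the serialization step itself is untouched.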
As you might expect, serialization is implemented using a pair of streams. Even though the code that underlies serialization is quite complex, the way you invoke it is designed to make serialization as transparent as possible to Java developers. To serialize an object, create an instance of ObjectOutputStream and call the writeObject( ) method; to read in a serialized object, create an instance of ObjectInputStream and call the readObject( ) method.
ObjectOutputStream
ObjectOutputStream, defined in the java.io package, is a stream that implements the "writing-out" part of the serialization algorithm. (RMI actually uses a subclass of ObjectOutputStream to customize its behavior.)
The methods implemented by ObjectOutputStream can be grouped into three categories: methods that write information to the stream, methods used to control the stream's behavior, and methods used to customize the serialization algorithm.
The "write" methods
The first, and most intuitive, category consists of the "write"
methods:
public void write(byte[] b);
public void write(byte[] b, int off, int len);
public void write(int data);
public void writeBoolean(boolean data);
public void writeByte(int data);
public void writeBytes(String data);
public void writeChar(int data);
public void writeChars(String data);
public void writeDouble(double data);
public void writeFields( );
public void writeFloat(float data);
public void writeInt(int data);
public void writeLong(long data);
public void writeObject(Object obj);
public void writeShort(int data);
public void writeUTF(String s);
public void defaultWriteObject( );
For the most part, these methods should seem familiar. writeFloat( ), for example, works exactly as you would expect after reading Chapter 1: it takes a floating-point number and encodes the number as four bytes. There are, however, two new methods here: writeObject( ) and defaultWriteObject( ).
writeObject( ) serializes an object. In fact, writeObject( ) is often the instrument of the serialization mechanism itself. In the simplest and most common case, serializing an object involves doing two things: creating an ObjectOutputStream and calling writeObject( ) with a single "top-level" instance. The following code snippet shows the entire process, storing an object (and all the objects to which it refers) into a file:
FileOutputStream underlyingStream = new FileOutputStream("C:\\temp\\test");
ObjectOutputStream serializer = new ObjectOutputStream(underlyingStream);
serializer.writeObject(serializableObject);
serializer.close( ); // flushes buffered data and closes underlyingStream
Of course, this works seamlessly with the other methods for
writing data. That is, if you wanted to write two floats, a String, and an
object to a file, you could do so with the following code snippet:
FileOutputStream underlyingStream = new FileOutputStream("C:\\temp\\test");
ObjectOutputStream serializer = new ObjectOutputStream(underlyingStream);
serializer.writeFloat(firstFloat);
serializer.writeFloat(secondFloat);
serializer.writeUTF(aString);
serializer.writeObject(serializableObject);
serializer.close( ); // flushes buffered data and closes underlyingStream
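Data written this way has to be read back with the matching "read" methods, in exactly the same order. The following sketch shows the full round trip. Assumptions: a temporary file replaces C:\temp\test, a java.util.Date stands in for serializableObject, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

public class MixedStreamExample {

    // Writes two floats, a String, and an object, then reads them back.
    // The reads mirror the writes exactly: same methods, same order.
    public static Object[] roundTrip(File file, float first, float second,
                                     String text, Serializable obj)
            throws IOException, ClassNotFoundException {
        try (ObjectOutputStream serializer =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            serializer.writeFloat(first);
            serializer.writeFloat(second);
            serializer.writeUTF(text);
            serializer.writeObject(obj);
        } // closing flushes buffered data and closes the FileOutputStream

        try (ObjectInputStream deserializer =
                 new ObjectInputStream(new FileInputStream(file))) {
            float firstBack = deserializer.readFloat();
            float secondBack = deserializer.readFloat();
            String textBack = deserializer.readUTF();
            Object objBack = deserializer.readObject();
            return new Object[] {firstBack, secondBack, textBack, objBack};
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("mixed", ".bin");
        file.deleteOnExit();
        Object[] values = roundTrip(file, 1.5f, -2.25f, "hello", new Date(0L));
        for (Object value : values) {
            System.out.println(value);
        }
    }
}
```

Reading the values in a different order (say, calling readUTF( ) first) would misinterpret the bytes, because the stream carries no labels for primitive data.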
TIP: ObjectOutputStream's constructor takes an OutputStream as an argument. This is analogous to many of the streams we looked at in Chapter 1. ObjectOutputStream and ObjectInputStream are simply encoding and transformation layers. This enables RMI to send objects over the wire by opening a socket connection, associating the OutputStream with the socket connection, creating an ObjectOutputStream on top of the socket's OutputStream, and then calling writeObject( ).
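The layering described in the tip can be sketched with plain sockets. This is not how RMI itself is implemented; it only illustrates stacking an ObjectOutputStream on a socket's OutputStream. The class and method names are hypothetical, and a loopback connection on an ephemeral port stands in for a real network link.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketSerializationExample {

    // Sends obj from a client thread to a server socket on the loopback
    // interface and returns the object the server side deserializes.
    public static Object sendOverLoopback(Object obj) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort(); // port 0 asked the OS for a free port

            Thread client = new Thread(() -> {
                try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
                     // The encoding layer sits on top of the socket's OutputStream
                     ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
                    out.writeObject(obj);
                    out.flush();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();

            try (Socket connection = server.accept();
                 // The decoding layer sits on top of the socket's InputStream
                 ObjectInputStream in = new ObjectInputStream(connection.getInputStream())) {
                Object received = in.readObject();
                client.join();
                return received;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendOverLoopback("sent over a socket"));
    }
}
```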
The other new "write" method is defaultWriteObject( ). defaultWriteObject( ) makes it much easier to customize how instances of a single class are serialized. However, defaultWriteObject( ) has some strange restrictions placed on when it can be called. Here's what the documentation says about defaultWriteObject( ):
Write the nonstatic and nontransient fields of the current class to this stream. This may only be called from the writeObject method of the class being serialized. It will throw the NotActiveException if it is called otherwise.
That is, defaultWriteObject( ) is a method that works only when it is called from another specific method at a particular time. Since defaultWriteObject( ) is useful only when you are customizing the information stored for a particular class, this turns out to be a reasonable restriction. We'll talk more about defaultWriteObject( ) later in the chapter, when we discuss how to make a class serializable.
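As a preview, here is a minimal sketch of the one place where defaultWriteObject( ) is allowed to run: a private writeObject method on the class being serialized. The Rectangle class and its FORMAT_VERSION value are hypothetical. The transient area field is skipped by the default mechanism and rebuilt on the reading side. The sketch also shows that calling defaultWriteObject( ) anywhere else throws NotActiveException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.NotActiveException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DefaultWriteObjectExample {

    // Hypothetical class: width and height are written by defaultWriteObject(),
    // while the transient area field is recomputed after reading.
    public static class Rectangle implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final int FORMAT_VERSION = 1; // static, so never serialized

        int width;
        int height;
        transient int area; // derived data, skipped by the default mechanism

        public Rectangle(int width, int height) {
            this.width = width;
            this.height = height;
            this.area = width * height;
        }

        // Invoked by ObjectOutputStream while this object is being serialized
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();     // writes width and height
            out.writeInt(FORMAT_VERSION); // extra data after the default fields
        }

        // Mirror image: default fields first, then the extra data
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            int version = in.readInt();
            if (version != FORMAT_VERSION) {
                throw new InvalidObjectException("unknown format " + version);
            }
            area = width * height; // rebuild the transient field
        }
    }

    public static Rectangle roundTrip(Rectangle original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Rectangle) in.readObject();
        }
    }

    // Returns true if defaultWriteObject() is rejected outside a writeObject() method
    public static boolean defaultWriteObjectOutsideWriteObjectFails() throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.defaultWriteObject();
            return false;
        } catch (NotActiveException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Rectangle copy = roundTrip(new Rectangle(3, 4));
        System.out.println(copy.width + "x" + copy.height + ", area " + copy.area);
        System.out.println(defaultWriteObjectOutsideWriteObjectFails());
    }
}
```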
The stream manipulation methods
ObjectOutputStream also implements four methods that deal with the basic mechanics of manipulating the stream:
public void reset( );
public void close( );
public void flush( );
public void useProtocolVersion(int version);
With the exception of useProtocolVersion( ), these methods should be familiar. close( ) and flush( ) are standard stream methods. reset( ) has a serialization-specific meaning: ObjectOutputStream remembers every object it has already written, so that writing the same object again produces only a reference to the earlier copy, and reset( ) discards that record so that subsequent writes start fresh.
useProtocolVersion( ), on the other hand, changes the version of the serialization protocol that is used. This is necessary because the serialization format and algorithm may need to change in a way that's not backward-compatible. If another application needs to read in your serialized data, and the applications will be versioned independently (or run in different versions of the JVM), you may want to standardize on a protocol version.
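The effect of reset( ) is easiest to see with a mutable object written twice. In this sketch (class and method names are hypothetical), a list is written, modified, and written again. Without a reset, the second write is only a back-reference, so the reader sees the old state twice; with a reset in between, the second write carries the updated list in full.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResetExample {

    // Writes the same list twice, changing it in between, and returns the
    // size of each copy as seen by the reader.
    public static int[] writeTwice(boolean resetBetweenWrites) throws IOException, ClassNotFoundException {
        List<String> items = new ArrayList<>();
        items.add("first");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(items);  // writes the list in full
            items.add("second");     // mutate the list after it has been written
            if (resetBetweenWrites) {
                out.reset();         // forget every object written so far
            }
            out.writeObject(items);  // full copy after a reset, back-reference otherwise
        }

        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            List<?> firstCopy = (List<?>) in.readObject();
            List<?> secondCopy = (List<?>) in.readObject();
            return new int[] {firstCopy.size(), secondCopy.size()};
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(writeTwice(false))); // [1, 1]
        System.out.println(Arrays.toString(writeTwice(true)));  // [1, 2]
    }
}
```

This is why a long-lived stream that repeatedly sends the same mutable objects usually needs an occasional reset( ).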
TIP: There are two versions of the serialization protocol currently defined: PROTOCOL_VERSION_1 and PROTOCOL_VERSION_2 (both constants in java.io.ObjectStreamConstants). If you send serialized data to a 1.1 (or earlier) JVM, you should probably use PROTOCOL_VERSION_1. The most common case of this involves applets. Most applets run in browsers over which the developer has no control. This means, in particular, that the JVM running the applet could be anything, from Java 1.0.2 through the latest JVM. Most servers, on the other hand, are written using JDK 1.2.2 or later. (The main exception is EJB containers that require earlier versions of Java. At this writing, for example, Oracle 8i's EJB container uses JDK 1.1.6.) If you pass serialized objects between an applet and a server, you should specify the serialization protocol.
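Here is a minimal sketch of selecting the protocol version. It does not involve an old JVM; it simply writes with each version and reads the result back with a current ObjectInputStream. The class and method names are hypothetical. The important detail is ordering: useProtocolVersion( ) must be called before any object is written, and a later call throws IllegalStateException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;

public class ProtocolVersionExample {

    // Serializes obj using the requested protocol version and returns the bytes.
    public static byte[] serialize(Object obj, int protocolVersion) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.useProtocolVersion(protocolVersion); // must precede any writeObject call
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Returns true if switching versions after an object was written is rejected
    public static boolean switchingAfterWriteFails() throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject("already written");
            out.useProtocolVersion(ObjectStreamConstants.PROTOCOL_VERSION_1);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] v1 = serialize("hello", ObjectStreamConstants.PROTOCOL_VERSION_1);
        byte[] v2 = serialize("hello", ObjectStreamConstants.PROTOCOL_VERSION_2);
        System.out.println(deserialize(v1) + " " + deserialize(v2));
        System.out.println(switchingAfterWriteFails());
    }
}
```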
Methods that customize the serialization mechanism
The last group of methods consists mostly of protected methods that provide hooks that allow the serialization mechanism itself, rather than the data associated with a particular class, to be customized. These methods are:
public ObjectOutputStream.PutField putFields( );
protected void annotateClass(Class cl);
protected void annotateProxyClass(Class cl);
protected boolean enableReplaceObject(boolean enable);
protected Object replaceObject(Object obj);
protected void drain( );
protected void writeObjectOverride(Object obj);
protected void writeClassDescriptor(ObjectStreamClass classdesc);
protected void writeStreamHeader( );
These methods are more important to people who tailor the
serialization algorithm to a particular use or develop their own
implementation of serialization. As such, they require a deeper understanding
of the serialization algorithm. We'll discuss these methods in more detail
later, after we've gone over the actual algorithm used by the serialization
mechanism.
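As a small taste of these hooks, the following sketch subclasses ObjectOutputStream, calls enableReplaceObject(true) in its constructor, and overrides replaceObject( ) to substitute a serializable stand-in for a class that is not serializable. Handle, HandleReference, and ReplacingOutputStream are hypothetical names. Writing a Handle with a plain ObjectOutputStream would instead fail with NotSerializableException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class ReplaceObjectExample {

    // Hypothetical class that does NOT implement Serializable.
    public static class Handle {
        public final String id;
        public Handle(String id) { this.id = id; }
    }

    // Hypothetical serializable stand-in written in place of a Handle.
    public static class HandleReference implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String id;
        public HandleReference(String id) { this.id = id; }
    }

    // Uses the replaceObject hook to swap each Handle for a HandleReference.
    static class ReplacingOutputStream extends ObjectOutputStream {
        ReplacingOutputStream(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true); // replaceObject() is only consulted once enabled
        }

        @Override
        protected Object replaceObject(Object obj) {
            if (obj instanceof Handle) {
                return new HandleReference(((Handle) obj).id);
            }
            return obj; // everything else is written unchanged
        }
    }

    public static Object writeAndRead(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ReplacingOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object received = writeAndRead(new Handle("db-connection-1"));
        System.out.println(received.getClass().getSimpleName() + " " + ((HandleReference) received).id);
    }
}
```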
ObjectInputStream
ObjectInputStream, defined in the java.io package, implements the "reading-in" part of the serialization algorithm. It is the companion to ObjectOutputStream: objects serialized using ObjectOutputStream can be deserialized using ObjectInputStream. Like ObjectOutputStream, the methods implemented by ObjectInputStream can be grouped into three categories: methods that read information from the stream, methods that are used to control the stream's behavior, and methods that are used to customize the serialization algorithm.
The "read" methods
The first, and most intuitive, category consists of the "read"
methods:
public int read( );
public int read(byte[] b, int off, int len);
public boolean readBoolean( );
public byte readByte( );
public char readChar( );
public double readDouble( );
public float readFloat( );
public int readInt( );
public long readLong( );
public Object readObject( );
public short readShort( );
public int readUnsignedByte( );
public int readUnsignedShort( );
public String readUTF( );
public void defaultReadObject( );
Just as with ObjectOutputStream's write( ) methods, these methods should be familiar. readFloat( ), for example, works exactly as you would expect after reading Chapter 1: it reads four bytes from the stream and converts them into a single floating-point number, which is returned by the method call. And, again as with ObjectOutputStream, there are two new methods here: readObject( ) and defaultReadObject( ).
Just as writeObject( ) serializes an object, readObject( ) deserializes it. Deserializing an object involves doing two things: creating an ObjectInputStream and then calling readObject( ). The following code snippet shows the entire process, creating a copy of an object (and all the objects to which it refers) from a file:
FileInputStream underlyingStream = new FileInputStream("C:\\temp\\test");
ObjectInputStream deserializer = new ObjectInputStream(underlyingStream);
Object deserializedObject = deserializer.readObject( );
This code is the exact inverse of the code we used to serialize the object in the first place. If we wanted to make a deep copy of a serializable object, we could first serialize the object and then deserialize it, as in the following code example:
// Serialize the object into an in-memory buffer
ByteArrayOutputStream memoryOutputStream = new ByteArrayOutputStream( );
ObjectOutputStream serializer = new ObjectOutputStream(memoryOutputStream);
serializer.writeObject(serializableObject);
serializer.flush( );
// Deserialize from the same bytes to obtain an independent copy
ByteArrayInputStream memoryInputStream = new ByteArrayInputStream(memoryOutputStream.toByteArray( ));
ObjectInputStream deserializer = new ObjectInputStream(memoryInputStream);
Object deepCopyOfOriginalObject = deserializer.readObject( );
This code simply places an output stream into memory, serializes
the object to the memory stream, creates an input stream based on the same
piece of memory, and runs the deserializer on the input stream. The end result
is a deep copy of the object with which we started.
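The same steps can be packaged as a reusable, generic helper. This is a minimal sketch: the name deepCopy is hypothetical, and the cast back to T is unchecked. It demonstrates that even a nested list is duplicated, so changing the copy leaves the original alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class DeepCopyExample {

    // Deep-copies a serializable object graph by serializing it into memory
    // and deserializing it again.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T deepCopy(T original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream memoryOutputStream = new ByteArrayOutputStream();
        try (ObjectOutputStream serializer = new ObjectOutputStream(memoryOutputStream)) {
            serializer.writeObject(original);
        }
        try (ObjectInputStream deserializer = new ObjectInputStream(
                new ByteArrayInputStream(memoryOutputStream.toByteArray()))) {
            return (T) deserializer.readObject(); // unchecked: the stream holds a T
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<List<String>> original = new ArrayList<>();
        original.add(new ArrayList<>(List.of("a", "b")));

        ArrayList<List<String>> copy = deepCopy(original);
        copy.get(0).add("c"); // modifies only the copy's nested list

        System.out.println(original); // [[a, b]]
        System.out.println(copy);     // [[a, b, c]]
    }
}
```

Every object in the graph must be serializable for this to work, and for large graphs it is considerably slower than a hand-written copy.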
The stream manipulation methods
There are five basic stream manipulation methods defined for
ObjectInputStream:
public int available( );
public void close( );
public void readFully(byte[] data);
public void readFully(byte[] data, int offset, int size);
public int skipBytes(int len);
Of these, available( ) and close( ) are methods first defined on InputStream. available( ) returns the number of bytes that can be read without blocking, and close( ) closes the stream.
The three new methods are also straightforward. skipBytes( ) skips the indicated number of bytes in the stream, blocking until all the information has been read. And the two readFully( ) methods perform a batch read into a byte array, also blocking until all the data has been read in.
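Here is a minimal sketch of skipBytes( ) and readFully( ) working together. The class and method names are hypothetical. A raw byte payload is written with ObjectOutputStream's write( ) method; the reader then skips a fixed-size prefix and reads the remainder in one batch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public class ReadFullyExample {

    // Writes raw bytes, then reads them back, skipping prefixLength bytes
    // and reading everything after the prefix with a single readFully call.
    public static byte[] readAfterPrefix(byte[] payload, int prefixLength) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.write(payload);
        }

        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            int skipped = in.skipBytes(prefixLength); // returns how many bytes were actually skipped
            byte[] rest = new byte[payload.length - skipped];
            in.readFully(rest); // fills rest completely, or throws EOFException
            return rest;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {10, 20, 30, 40, 50};
        System.out.println(Arrays.toString(readAfterPrefix(payload, 2))); // [30, 40, 50]
    }
}
```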
Methods that customize the serialization mechanism
The last group of methods consists mostly of protected methods that provide hooks, which allow the serialization mechanism itself, rather than the data associated with a particular class, to be customized. These methods are:
protected boolean enableResolveObject(boolean enable);
protected Class resolveClass(ObjectStreamClass v);
protected Object resolveObject(Object obj);
protected Class resolveProxyClass(String[] interfaces);
protected ObjectStreamClass readClassDescriptor( );
protected Object readObjectOverride( );
protected void readStreamHeader( );
public void registerValidation(ObjectInputValidation obj, int priority);
public ObjectInputStream.GetField readFields( );
These methods are more important to people who tailor the serialization algorithm to a particular use or develop their own implementation of serialization. As before, they require a deeper understanding of the serialization algorithm, so we'll hold off on discussing them for now.
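For a glimpse of the reading-side counterpart to replaceObject( ), here is a sketch that subclasses ObjectInputStream, calls enableResolveObject(true), and overrides resolveObject( ) so that each deserialized Currency is swapped for the canonical instance already present in the receiving JVM. Currency and CanonicalizingInputStream are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResolveObjectExample {

    // Hypothetical value class that keeps one canonical instance per code.
    public static class Currency implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final Map<String, Currency> CANONICAL = new ConcurrentHashMap<>();

        public final String code;

        private Currency(String code) { this.code = code; }

        public static Currency of(String code) {
            return CANONICAL.computeIfAbsent(code, Currency::new);
        }
    }

    // Uses the resolveObject hook to replace each freshly deserialized
    // Currency with the canonical instance for its code.
    static class CanonicalizingInputStream extends ObjectInputStream {
        CanonicalizingInputStream(InputStream in) throws IOException {
            super(in);
            enableResolveObject(true); // resolveObject() is only consulted once enabled
        }

        @Override
        protected Object resolveObject(Object obj) {
            if (obj instanceof Currency) {
                return Currency.of(((Currency) obj).code);
            }
            return obj; // everything else is returned unchanged
        }
    }

    public static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new CanonicalizingInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Currency euro = Currency.of("EUR");
        Object copy = roundTrip(euro);
        System.out.println(copy == euro); // true: identity is preserved
    }
}
```

Without the hook, readObject( ) would return a second, distinct Currency object with the same code.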