Java RMI: Serialization
Making DocumentDescription Serializable
To make this more concrete, we now turn to the
DocumentDescription class from the RMI version of our
printer server, which we implemented in Chapter 4. The code for the first
nonserializable version of DocumentDescription was
the following:
public class DocumentDescription implements PrinterConstants {
    private InputStream _actualDocument;
    private int _length;
    private int _documentType;
    private boolean _printTwoSided;
    private int _printQuality;

    public DocumentDescription(InputStream actualDocument) throws IOException {
        this(actualDocument, DEFAULT_DOCUMENT_TYPE, DEFAULT_PRINT_TWO_SIDED,
            DEFAULT_PRINT_QUALITY);
    }

    public DocumentDescription(InputStream actualDocument, int documentType,
            boolean printTwoSided, int printQuality) throws IOException {
        _documentType = documentType;
        _printTwoSided = printTwoSided;
        _printQuality = printQuality;
        BufferedInputStream buffer = new BufferedInputStream(actualDocument);
        DataInputStream dataInputStream = new DataInputStream(buffer);
        ByteArrayOutputStream temporaryBuffer = new ByteArrayOutputStream( );
        _length = copy(dataInputStream, new DataOutputStream(temporaryBuffer));
        _actualDocument = new DataInputStream(new
            ByteArrayInputStream(temporaryBuffer.toByteArray( )));
    }

    public int getDocumentType( ) {
        return _documentType;
    }

    public boolean isPrintTwoSided( ) {
        return _printTwoSided;
    }

    public int getPrintQuality( ) {
        return _printQuality;
    }

    private int copy(InputStream source, OutputStream destination) throws
            IOException {
        int nextByte;
        int numberOfBytesCopied = 0;
        while (-1 != (nextByte = source.read( ))) {
            destination.write(nextByte);
            numberOfBytesCopied++;
        }
        destination.flush( );
        return numberOfBytesCopied;
    }
}
We will make this into a serializable class by following the
steps outlined in the previous section.
Implement the Serializable interface
This is easy. All we need to do is change the class
declaration:
public class DocumentDescription implements Serializable, PrinterConstants
Make sure that instance-level, locally defined state is
serialized properly
We have five fields to take care of:
private InputStream _actualDocument;
private int _length;
private int _documentType;
private boolean _printTwoSided;
private int _printQuality;
Of these, four are primitive types that serialization can handle
without any problem. However, _actualDocument is a
problem. InputStream is not a serializable class.
And the contents of _actualDocument are very
important; _actualDocument contains the document we
want to print. There is no point in serializing an instance of
DocumentDescription unless we somehow serialize
_actualDocument as well.
If we have fields that serialization cannot handle, and they
must be serialized, then our only option is to implement
readObject( ) and writeObject( ). For
DocumentDescription, we declare
_actualDocument to be transient and then implement
readObject( ) and writeObject( ) as follows:
private transient InputStream _actualDocument;

private void writeObject(java.io.ObjectOutputStream out) throws IOException {
    out.defaultWriteObject( );
    copy(_actualDocument, out);
}

private void readObject(java.io.ObjectInputStream in) throws IOException,
        ClassNotFoundException {
    in.defaultReadObject( );
    ByteArrayOutputStream temporaryBuffer = new ByteArrayOutputStream( );
    copy(in, temporaryBuffer, _length);
    _actualDocument = new DataInputStream(new
        ByteArrayInputStream(temporaryBuffer.toByteArray( )));
}

private void copy(InputStream source, OutputStream destination, int length)
        throws IOException {
    int counter;
    int nextByte;
    for (counter = 0; counter < length; counter++) {
        nextByte = source.read( );
        if (nextByte == -1) {
            // The stream ended before the expected number of bytes arrived.
            throw new java.io.EOFException("Document data ended after "
                + counter + " bytes");
        }
        destination.write(nextByte);
    }
    destination.flush( );
}
Note that we declare _actualDocument to be transient and call
defaultWriteObject( ) in the first line of our
writeObject( ) method. Doing
these two things allows the standard serialization mechanism to serialize the
other four instance variables without any extra effort on our part. We then
simply copy _actualDocument to the stream.
Our implementation of readObject( ) simply calls
defaultReadObject( ) and then reads
_actualDocument from the stream. In order to read
_actualDocument from the stream, we used the length
of the document, which had previously been written to the stream. In essence,
we needed to encode some metadata into the stream, in order to correctly pull
our data out of the stream.
This code is a little ugly. We're using serialization, but we're
still forced to think about how to encode some of our state when we're sending
it over the stream. In fact, the code for
writeObject( ) and readObject( ) is remarkably similar to
the marshalling code we implemented directly for the socket-based version of
the printer server. This is, unfortunately, often the case. Serialization's
default implementation handles simple objects very well. But, every now and
then, you will want to send a nonserializable object over the wire, or improve
the serialization algorithm for efficiency. Doing so amounts to writing the
same code you would write if you implemented all the socket handling yourself, as in
our socket-based version of the printer server.
TIP: There is also an order dependency here.
The first value written must be the first value read. Since we start writing
by calling defaultWriteObject( ), we have to
start reading by calling
defaultReadObject( ). On the bright side, this means we'll
have an accurate value for _length before we try
to read _actualDocument from the stream.
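The pattern described above (default fields first, then hand-written bytes whose count comes from an already-restored field) can be tried in isolation. The following is a minimal sketch, not the printer server's code: Attachment is a hypothetical stand-in for DocumentDescription, and AttachmentDemo and roundTrip( ) are names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

class AttachmentDemo {

    /** Hypothetical stand-in for DocumentDescription: one int plus a stream. */
    static class Attachment implements Serializable {
        private static final long serialVersionUID = 1L;

        private int length;                    // handled by default serialization
        private transient InputStream content; // skipped by default; handled by hand

        Attachment(byte[] bytes) {
            length = bytes.length;
            content = new ByteArrayInputStream(bytes);
        }

        InputStream getContent() {
            return content;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();   // writes length first
            copy(content, out, length); // then the raw bytes (this consumes content)
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();     // restores length before we need it
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            copy(in, buffer, length);
            content = new ByteArrayInputStream(buffer.toByteArray());
        }

        private static void copy(InputStream source, OutputStream destination, int count)
                throws IOException {
            for (int i = 0; i < count; i++) {
                int nextByte = source.read();
                if (nextByte == -1) {
                    throw new EOFException("expected " + count + " bytes, got " + i);
                }
                destination.write(nextByte);
            }
        }
    }

    /** Serializes an Attachment to memory, reads it back, and returns its bytes. */
    static byte[] roundTrip(byte[] data) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject(new Attachment(data));
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            Attachment restored = (Attachment) in.readObject();
            byte[] result = new byte[data.length];
            new DataInputStream(restored.getContent()).readFully(result);
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] restored = roundTrip("a small document".getBytes("UTF-8"));
        System.out.println(new String(restored, "UTF-8"));
    }
}
```

Swapping the two statements in readObject( ) would break the round trip, because length would still be zero when the bytes are copied. That is the order dependency in practice.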
Make sure that superclass state is handled correctly
This isn't a problem. The superclass,
java.lang.Object, doesn't actually have any important
state that we need to worry about. Since it also already has a zero-argument
constructor, we don't need to do anything.
Override equals() and hashCode( ) if necessary
In our current implementation of the printer server, we don't
need to do this. The server never checks for equality between instances of
DocumentDescription. Nor does it store them in a
container object that relies on their hashcodes.
Did We Cheat When Implementing Serializable for
DocumentDescription?
It may seem like we cheated a bit in implementing
DocumentDescription. Three of the four
steps in making a class serializable didn't actually result in changes
to the code. Indeed, the only work we really did was implementing
readObject( ) and
writeObject( ). But it's not really cheating.
Serialization is just designed to be easy to use. It has a good set of
defaults, and, at least in the case of value objects intended to be
passed over the wire, the default behavior is often good
enough.
The Serialization Algorithm
By now, you should have a pretty good feel for how the
serialization mechanism works for individual classes. The next step in
explaining serialization is to discuss the actual serialization algorithm in a
little more detail. This discussion won't handle all the details of
serialization (though we'll come close).
Instead, the idea is to cover the algorithm and protocol, so you can
understand how the various hooks for customizing serialization work and how
they fit into the context of an RMI application.
The Data Format
The first step is to discuss what gets written to the stream
when an instance is serialized. Be warned: it's a lot more information than
you might guess from the previous discussion.
An important part of serialization involves writing out
class-related metadata associated with an instance. Most instances are
instances of more than one class. For example, an instance of
String is also an instance of Object. Any given instance,
however, is an instance of only a few classes. These classes can be written as
a sequence: C1, C2, ..., CN, in which C1 is a superclass of C2,
C2 is a superclass of C3, and so on. This is actually a linear sequence because
Java is a single inheritance language for classes. We call
C1 the least superclass and CN the most-derived class. See Figure
10-4.

Figure 10-4. Inheritance diagram
After writing out the associated class information, the
serialization mechanism stores out the following information for each
instance:
- A description of the most-derived class.
- Data associated with the instance, interpreted as an
instance of the least superclass.
- Data associated with the instance, interpreted as an
instance of the second least superclass.
And so on until:
- Data associated with the instance, interpreted as an
instance of the most-derived class.
So what really happens is that the type of the instance is
stored out, and then all the serializable state is stored in discrete chunks
that correspond to the class structure. But there's a question still
remaining: what do we mean by "a description of the most-derived class"? This
is either a reference to a class description that has already been recorded
(i.e., an earlier location in the stream) or the following information:
- The version ID of the class, which is a long integer used
to validate the .class files
- A boolean stating whether
writeObject( )/readObject( ) are implemented
- The number of serializable fields
- A description of each field (its name and type)
- Extra data produced by ObjectOutputStream's
annotateClass( ) method
- A description of its superclass if the superclass is
serializable
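Much of this metadata can be inspected from ordinary code through java.io.ObjectStreamClass. The following is a small sketch; DescriptorDemo, Point, and describe( ) are names made up for the illustration, and the output format is arbitrary.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;

class DescriptorDemo {

    /** Hypothetical value class used only for this illustration. */
    static class Point implements Serializable {
        private static final long serialVersionUID = 42L;
        int x;
        int y;
        transient int cachedHash; // transient, so not part of the description
    }

    /** Summarizes the serialization metadata the JVM derives for a class. */
    static String describe(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            // lookup() returns null for classes that are not serializable
            return type.getName() + " is not serializable";
        }
        StringBuilder text = new StringBuilder();
        text.append(descriptor.getName());
        text.append(" uid=").append(descriptor.getSerialVersionUID());
        for (ObjectStreamField field : descriptor.getFields()) {
            text.append(' ').append(field.getName());
            text.append(':').append(field.getType().getSimpleName());
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Point.class));
        System.out.println(describe(Object.class));
    }
}
```

The transient field does not appear in the field list, which matches the rule that only serializable fields are described in the stream.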
This should, of course, immediately seem familiar. The class
descriptions consist entirely of metadata that allows the instance to be read
back in. In fact, this is one of the most beautiful aspects of serialization;
the serialization mechanism automatically, at runtime, converts class objects
into metadata so instances can be serialized with the least amount of
programmer work.
A Simplified Version of the Serialization Algorithm
In this section, I describe a slightly simplified version of the
serialization algorithm. I then proceed to a more complete description of the
serialization process in the next section.
Writing
Because the class descriptions actually contain the metadata,
the basic idea behind the serialization algorithm is pretty easy to describe.
The only tricky part is handling circular references.
The problem is this: suppose instance A refers to instance B, and
instance B refers back to instance A. Completely writing out
A requires you to write out B. But writing out
B requires you to write out A.
Because you don't want to get into an infinite loop, or even write out an
instance or a class description more than once, you need to keep track of
what's already been written to the stream. (Serialization is a slow process:
it uses the reflection API quite heavily, in addition to consuming bandwidth.)
ObjectOutputStream does this by
maintaining a mapping from instances and classes to handles. When
writeObject( ) is called with an argument that has
already been written to the stream, the handle is written to the stream, and
no further operations are necessary.
If, however, writeObject( ) is passed
an instance that has not yet been written to the stream, two things happen.
First, the instance is assigned a reference handle, and the mapping from
instance to reference handle is stored by
ObjectOutputStream. The handle that is assigned is the
next integer in a sequence.
TIP: Remember the reset( ) method on
ObjectOutputStream? It clears
the mapping and resets the handle counter to 0x7E0000. RMI also
automatically resets its serialization mechanism after every remote method
call.
Second, the instance data is written out as per the data format
described earlier. This can involve some complications if the instance has a
field whose value is also a serializable instance. In this case, the
serialization of the first instance is suspended, and the second instance is
serialized in its place (or, if the second instance has already been
serialized, the reference handle for the second instance is written out).
After the second instance is fully serialized, serialization of the first
instance resumes. The contents of the stream look a little bit like Figure
10-5.

Figure 10-5. Contents of Serialization's data stream.
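The effect of the handle table is visible from the outside. The sketch below (HandleDemo and Node are hypothetical names) serializes a two-object cycle, and then writes one instance twice, with and without a call to reset( ) in between.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class HandleDemo {

    /** Hypothetical node type: just enough state to build a cycle. */
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        Node other;
    }

    /** Serializes a <-> b and checks that the copy still points back at itself. */
    static boolean cycleSurvives() throws IOException, ClassNotFoundException {
        Node a = new Node();
        Node b = new Node();
        a.other = b;
        b.other = a;
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject(a); // b is written inline; b.other becomes a handle to a
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            Node copyOfA = (Node) in.readObject();
            return copyOfA.other.other == copyOfA;
        }
    }

    /** Writes one instance twice and reports whether the reader sees one object. */
    static boolean sameInstanceAfterTwoWrites(boolean resetBetween)
            throws IOException, ClassNotFoundException {
        Node node = new Node();
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject(node);
            if (resetBetween) {
                out.reset();    // forget every handle assigned so far
            }
            out.writeObject(node);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return first == second;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cycle preserved: " + cycleSurvives());
        System.out.println("shared without reset: " + sameInstanceAfterTwoWrites(false));
        System.out.println("shared with reset: " + sameInstanceAfterTwoWrites(true));
    }
}
```

Without reset( ), the second write is only a handle, so the reader returns the same object twice. After reset( ), the instance is written out in full again and the reader creates a second, independent copy.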
Reading
From the description of writing, it's pretty easy to guess most
of what happens when readObject( ) is called.
Unfortunately, because of versioning issues, the implementation of
readObject( ) is actually a little bit more complex than
you might guess.
When it reads in an instance description,
ObjectInputStream gets the following information:
- Descriptions of all the classes involved
- The serialization data from the instance
The problem is that the class descriptions that the instance of
ObjectInputStream reads from the stream may not be
equivalent to the class descriptions of the same classes in the local JVM. For
example, if an instance is serialized to a file and then read back in three
years later, there's a pretty good chance that the class definitions used to
serialize the instance have changed.
This means that
ObjectInputStream uses the class descriptions in two ways:
- It uses them to actually pull data from the stream,
since the class descriptions completely describe the contents of the stream.
- It compares the class descriptions to the classes it
has locally and tries to determine if the classes have changed, in which
case it throws an exception. If the class descriptions match the local
classes, it creates the instance and sets the instance's state
appropriately.
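To watch the comparison happen, one option is to subclass ObjectInputStream and override its protected readClassDescriptor( ) method. The sketch below only reports whether the version ID found in the stream equals the one computed for the local class; it is an illustration of the idea, not a copy of what ObjectInputStream does internally, and VersionCheckDemo, Point, and ReportingInputStream are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class VersionCheckDemo {

    /** Hypothetical value class used only for this illustration. */
    static class Point implements Serializable {
        private static final long serialVersionUID = 7L;
        int x;
    }

    /** Records, for each class description in the stream, whether it matches locally. */
    static class ReportingInputStream extends ObjectInputStream {
        final List<String> report = new ArrayList<>();

        ReportingInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected ObjectStreamClass readClassDescriptor()
                throws IOException, ClassNotFoundException {
            ObjectStreamClass fromStream = super.readClassDescriptor();
            // Assumption: the class is visible to this demo's own class loader.
            Class<?> localClass = Class.forName(
                fromStream.getName(), false, VersionCheckDemo.class.getClassLoader());
            ObjectStreamClass local = ObjectStreamClass.lookup(localClass);
            boolean matches = local != null
                && local.getSerialVersionUID() == fromStream.getSerialVersionUID();
            report.add(fromStream.getName() + (matches ? " matches" : " differs"));
            return fromStream;
        }
    }

    /** Writes a Point, reads it back, and returns the comparison report. */
    static List<String> readAndReport() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject(new Point());
        }
        try (ReportingInputStream in =
                new ReportingInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            in.readObject();
            return in.report;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAndReport());
    }
}
```

When the version IDs really do differ, the standard mechanism does not just report it: readObject( ) fails with a java.io.InvalidClassException.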
RMI Customizes the Serialization Algorithm
RMI doesn't actually use ObjectOutputStream and
ObjectInputStream. Instead, it uses custom subclasses so
it can modify the serialization process by overriding some protected methods.
In this section, we'll discuss the most important modifications that RMI makes
when serializing instances. RMI makes similar changes when deserializing
instances, but they follow from, and can easily be deduced from, the
description of the serialization changes.
Recall that
ObjectOutputStream contains the following protected methods:
protected void annotateClass(Class cl)
protected void annotateProxyClass(Class cl)
protected boolean enableReplaceObject(boolean enable)
protected Object replaceObject(Object obj)
protected void drain( )
protected void writeObjectOverride(Object obj)
protected void writeClassDescriptor(ObjectStreamClass classdesc)
protected void writeStreamHeader( )
These all have default implementations in
ObjectOutputStream. That is, annotateClass( ) and
annotateProxyClass( ) do nothing,
enableReplaceObject( ) returns
false, and so on. However, these methods are still called
during serialization. And RMI, by overriding these methods, customizes the
serialization process.
The three most important methods from the point of view of RMI
are:
protected void annotateClass(Class cl)
protected boolean enableReplaceObject(boolean enable)
protected Object replaceObject(Object obj)
Let's describe how RMI overrides each of these.
annotateClass( )
ObjectOutputStream calls
annotateClass( ) when it writes out class descriptions.
Annotations are used to provide extra information about a class that comes
from the serialization mechanism and not from the class itself. The basic
serialization mechanism has no real need for annotations; most of the
information about a given class is already stored in the stream.
TIP: RMI's dynamic classloading system uses
annotateClass( ) to record where .class files are stored. We'll discuss this more in
Chapter 19.
RMI, on the other hand, uses annotations to record codebase information. That is,
RMI, in addition to recording the class descriptions, also records information
about the location from which it loaded the class's bytecode. Codebases are
often simply locations in a filesystem. Incidentally, locations in a
filesystem are often useless information, since the JVM that deserializes the
instances may have a very different filesystem than the one from where the
instances were serialized. However, a codebase isn't restricted to being a
location in a filesystem. The only restriction on codebases is that they have
to be valid URLs. That is, a codebase is a URL that specifies a location on
the network from which the bytecode for a class can be obtained. This enables
RMI to dynamically load new classes based on the serialized information in the
stream. We'll return to this in Chapter 19.
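The annotation hook is easy to try without RMI. In the sketch below, a subclass of ObjectOutputStream writes a codebase string from annotateClass( ), and a matching subclass of ObjectInputStream reads it back in resolveClass( ), the corresponding hook on the reading side. This is not RMI's actual implementation: the class names are invented, and the CODEBASE URL is a placeholder that nothing is ever loaded from.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class AnnotationDemo {

    /** Placeholder codebase; a real one would point at downloadable bytecode. */
    static final String CODEBASE = "http://example.com/classes/";

    /** Hypothetical value class used only for this illustration. */
    static class Greeting implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;

        Greeting(String text) {
            this.text = text;
        }
    }

    /** Appends a codebase annotation to every class description it writes. */
    static class AnnotatingOutputStream extends ObjectOutputStream {
        AnnotatingOutputStream(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        protected void annotateClass(Class<?> cl) throws IOException {
            writeUTF(CODEBASE);
        }
    }

    /** Reads the annotation back, then resolves the class in the default way. */
    static class AnnotationReadingInputStream extends ObjectInputStream {
        final List<String> codebases = new ArrayList<>();

        AnnotationReadingInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            codebases.add(readUTF()); // must mirror what annotateClass() wrote
            return super.resolveClass(desc);
        }
    }

    /** Sends one Greeting through both streams and returns the codebases seen. */
    static List<String> roundTripCodebases() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new AnnotatingOutputStream(wire)) {
            out.writeObject(new Greeting("hello"));
        }
        try (AnnotationReadingInputStream in = new AnnotationReadingInputStream(
                new ByteArrayInputStream(wire.toByteArray()))) {
            Greeting copy = (Greeting) in.readObject();
            if (!"hello".equals(copy.text)) {
                throw new IOException("unexpected payload: " + copy.text);
            }
            return in.codebases;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTripCodebases());
    }
}
```

A stream that loaded classes dynamically would use the recovered string to pick a class loader instead of simply recording it.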
replaceObject( )
The idea of replacement is simple; sometimes the instance that
is passed to the serialization mechanism isn't the instance that ought to be
written out to the data stream. To make this more concrete, recall what
happened when we called
rebind( ) to register a
server with the RMI registry. The following code was used in the bank
example:
Account_Impl newAccount = new Account_Impl(serverDescription.balance);
Naming.rebind(serverDescription.name, newAccount);
System.out.println("Account " + serverDescription.name + " successfully launched.");
This creates an instance of
Account_Impl and then calls
rebind( ) with that instance.
Account_Impl is a
server that implements the Remote interface, but
not the Serializable interface. And yet, somehow,
the registry, which is running in a different JVM, is sent something.
What the registry actually gets is a stub. The stub for
Account_Impl, which was automatically generated by
rmic, begins with:
public final class Account_Impl_Stub extends java.rmi.server.RemoteStub
java.rmi.server.RemoteStub is a class
that implements the Serializable interface. The RMI
serialization mechanism knows that whenever a remote server is "sent" over the
wire, the server object should be replaced by a stub that knows how to
communicate with the server (e.g., a stub that knows on which machine and port
the server is listening).
Calling Naming.rebind( ) actually
winds up passing a stub to the RMI registry. When clients make calls to
Naming.lookup( ), as in the following code snippet, they
also receive copies of the stub. Since the stub is serializable, there's no
problem in making a copy of it:
_account = (Account)Naming.lookup(_accountNameField.getText( ));
In order to enable this behavior,
ObjectOutputStream supports instance replacement through
enableReplaceObject( ) and
replaceObject( ). A subclass switches replacement on by calling
enableReplaceObject(true). After that, when an instance is about to be serialized,
ObjectOutputStream does the following:
- It checks whether instance replacement has been enabled.
- If instance replacement is enabled, it calls
replaceObject( ), passing in the instance it was about
to serialize, to find out which instance it should really write to the
stream.
- It then writes the appropriate instance to the stream.
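The replacement steps can be sketched with a small ObjectOutputStream subclass. This is an illustration of the hook, not RMI's implementation: CounterServer and CounterStub are hypothetical stand-ins for a remote server and its stub, and the host and port values are made up. Note that in the JDK, replacement is off by default, so the subclass turns it on by calling enableReplaceObject(true) in its constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

class ReplacementDemo {

    /** Hypothetical server object; deliberately NOT serializable. */
    static class CounterServer {
        final String host;
        final int port;

        CounterServer(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Hypothetical serializable stand-in that records where the server lives. */
    static class CounterStub implements Serializable {
        private static final long serialVersionUID = 1L;
        final String host;
        final int port;

        CounterStub(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Swaps every CounterServer for a CounterStub as it is written. */
    static class StubbingOutputStream extends ObjectOutputStream {
        StubbingOutputStream(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true); // replacement is disabled until we ask for it
        }

        @Override
        protected Object replaceObject(Object obj) throws IOException {
            if (obj instanceof CounterServer) {
                CounterServer server = (CounterServer) obj;
                return new CounterStub(server.host, server.port);
            }
            return obj; // everything else is written unchanged
        }
    }

    /** Writes the object through the stubbing stream and reads back what arrived. */
    static Object sendOverWire(Object original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new StubbingOutputStream(wire)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object received = sendOverWire(new CounterServer("localhost", 1099));
        System.out.println(received.getClass().getSimpleName());
    }
}
```

Writing the same CounterServer to a plain ObjectOutputStream would fail with a NotSerializableException. With replacement enabled, the receiver gets a stub and never sees the server class at all.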
Maintaining Direct Connections
A question that frequently arises as distributed applications
get more complicated involves message forwarding. For example, suppose that we
have three communicating programs: A, B, and C. At the start,
A has a stub for B,
B has a stub for C, and
C has a stub for A. See Figure 10-6.

Figure 10-6. Communication between three applications.
Now, what happens if A calls a
method, for example, getOtherServer( ), on
B that "returns" C? The answer
is that A gets a deep copy of the stub
B uses to communicate with C.
That is, A now has a direct connection to
C; whenever A tries to send a
message to C, B is not
involved at all. This is illustrated in Figure
10-7.

Figure 10-7. Improved communication between three applications.
This is very good from a bandwidth and network latency point of
view. But it can also be somewhat problematic. Suppose, for example,
B implements load balancing. Since
B isn't involved in the A to C communication, it has no direct way of knowing
whether A is still using
C, or how heavily. We'll revisit this in later chapters,
when we discuss the distributed garbage collector and the
Unreferenced interface.