JBoss Optimizations 101
When Loading Data Once May Be Enough ...
This heavy database usage stems from the cache, or, more
accurately, from the absence of a cache. For entity beans, the EJB specification
defines three commit options, which can be split into two main categories:
- I own the database (AKA Commit Option A): If any data must be
modified, it will be done through the container. The container is the only
point of write access to the database. As such, the container can cache data
across transactions without the risk of having an unsynchronized cache.
- I don't own the database (AKA Commit Options B or C): Data may be
modified by systems other than the EJB container. Consequently, the EJB container
cannot keep data in cache across transactions, as it may have been modified
externally. It must reload the required data from the database for each
transaction.
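To make the difference between the two categories concrete, here is a minimal sketch in plain Java. It is not JBoss code: FakeDatabase, Container, and the ownsDatabase flag are hypothetical names invented for this illustration, and the "transaction" is reduced to a single endTransaction() call. It only shows the effect on the number of database loads.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a toy model of an entity container cache, not the JBoss implementation.
class CommitOptionDemo {

    /** Hypothetical database that counts how often a row is loaded. */
    static class FakeDatabase {
        private final Map<String, String> rows = new HashMap<>();
        int loads = 0;

        void put(String key, String value) {
            rows.put(key, value);
        }

        String load(String key) {
            loads++;
            return rows.get(key);
        }
    }

    /** Hypothetical container: the cache survives a transaction only if the container owns the database. */
    static class Container {
        private final FakeDatabase db;
        private final boolean ownsDatabase; // true ~ Commit Option A, false ~ Commit Options B or C
        private final Map<String, String> cache = new HashMap<>();

        Container(FakeDatabase db, boolean ownsDatabase) {
            this.db = db;
            this.ownsDatabase = ownsDatabase;
        }

        String read(String key) {
            // Load from the database only when the entry is not cached yet.
            return cache.computeIfAbsent(key, db::load);
        }

        void endTransaction() {
            // Without ownership, cached state cannot be trusted in the next transaction.
            if (!ownsDatabase) {
                cache.clear();
            }
        }
    }

    /** Runs the given number of transactions, each reading the same page twice, and returns the load count. */
    static int loadsFor(boolean ownsDatabase, int transactions) {
        FakeDatabase db = new FakeDatabase();
        db.put("home", "<h1>Welcome</h1>");
        Container container = new Container(db, ownsDatabase);
        for (int i = 0; i < transactions; i++) {
            container.read("home");
            container.read("home"); // second read in the same transaction is served from the cache
            container.endTransaction();
        }
        return db.loads;
    }

    public static void main(String[] args) {
        System.out.println("I own the database:       " + loadsFor(true, 100) + " load(s)");   // 1 load(s)
        System.out.println("I don't own the database: " + loadsFor(false, 100) + " load(s)");  // 100 load(s)
    }
}
```

In this toy model, one hundred transactions cost a single load when the container owns the database, and one load per transaction when it does not.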
By default, JBoss' entity bean containers are configured not to reuse cached
data across transactions (i.e., Commit Option B). As the CMS101 development team hasn't changed the
default configuration, the database becomes the bottleneck. For each
request, the page description, content, header, footer, and left and right
sides are reloaded!
Note: I have even seen a real-life situation where CMP 1.1 was used
and all data that was read during the transaction was written back at the end
of the transaction, even though no fields had been changed.
As the database used by CMS101 is only used by their application, they
decide to activate caching and switch to Commit Option A:
<jboss>
  <enterprise-beans>
    <container-configurations>
      <!--
        We define a new configuration that simply overrides
        the default CMP 2.x configuration defined in
        conf/standardjboss.xml by changing its commit
        option
      -->
      <container-configuration extends="Standard CMP 2.x EntityBean">
        <container-name>CMP 2.x and Cache</container-name>
        <commit-option>A</commit-option>
      </container-configuration>
    </container-configurations>
    <entity>
      <ejb-name>WebPage</ejb-name>
      <configuration-name>CMP 2.x and Cache</configuration-name>
      <method-attributes>
        <method>
          <method-name>get*</method-name>
          <read-only>true</read-only>
        </method>
      </method-attributes>
    </entity>
    <entity>
      <ejb-name>PageContent</ejb-name>
      <configuration-name>CMP 2.x and Cache</configuration-name>
      <method-attributes>
        <method>
          <method-name>get*</method-name>
          <read-only>true</read-only>
        </method>
      </method-attributes>
    </entity>
    <!-- and so on for Header, Footer,
         LeftSide and RightSide -->
    <session>
      <ejb-name>PageRenderer</ejb-name>
      <jndi-name>PageRenderer</jndi-name>
    </session>
  </enterprise-beans>
</jboss>
The development team runs the test suite again and sees that both the
scalability and level of database usage are excellent! After a little more
testing, they will be ready to go into production and sell their (highly
value-added) CMS101!
Cluster, You Said Cluster? Oops, Forgot that Detail ...
After months of prospecting, the CMS101 commercial team finds its first
customer. There is, however, a small discrepancy between what the sales force
has sold and what the development team has implemented (which is quite an
unusual situation): the customer expects a high number of requests on its web
site and thus wants CMS101 to run in a cluster to balance the load.
Clustering CMS101 is not a problem in itself, as JBoss supports clustering
features. The problem is that by doing so, they will lose the performance
optimizations they just implemented through Commit Option A. By running a
cluster of JBoss instances, more than one JBoss node will access the same
database. Furthermore, they will not only read data, but may also update web
page content, for example. Consequently, we now have as many points of write
access to the database as we have JBoss instances in the cluster. If a user
modifies a web page on a specific JBoss node, the database and the local cache
will be updated. However, the other JBoss instances will never reload fresh
data from the database, instead using their own caches, now containing stale
data.

Figure 5. Unsynchronized cache data
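The problem can be reproduced with a few lines of plain Java. This is only a sketch under simplifying assumptions: Node is a hypothetical class standing in for a JBoss instance that caches across transactions, and the shared database is modeled as a simple map.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: two toy "nodes" sharing one database, each with its own private cache.
class StaleCacheDemo {

    /** Hypothetical cluster node that keeps cached data across transactions. */
    static class Node {
        private final Map<String, String> database; // shared by all nodes
        private final Map<String, String> cache = new HashMap<>(); // private to this node

        Node(Map<String, String> database) {
            this.database = database;
        }

        String read(String key) {
            // Once cached, an entry is never reloaded from the database.
            return cache.computeIfAbsent(key, database::get);
        }

        void write(String key, String value) {
            // The writing node updates the database and its own cache only.
            database.put(key, value);
            cache.put(key, value);
        }
    }

    /** Returns what node 1 sees, what node 2 sees, and what the database holds after an update on node 1. */
    static String[] run() {
        Map<String, String> database = new HashMap<>();
        database.put("home", "v1");
        Node node1 = new Node(database);
        Node node2 = new Node(database);

        node1.read("home");        // both nodes cache version v1
        node2.read("home");
        node1.write("home", "v2"); // a user edits the page on node 1

        return new String[] { node1.read("home"), node2.read("home"), database.get("home") };
    }

    public static void main(String[] args) {
        String[] result = run();
        // prints: node1 sees v2, node2 sees v1, database holds v2
        System.out.println("node1 sees " + result[0] + ", node2 sees " + result[1]
                + ", database holds " + result[2]);
    }
}
```

Node 2 keeps serving v1 even though the database already holds v2, which is exactly the stale data situation described above.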
Once again, let's analyze the specific requirements of this application. In
the clustered case, our problem is that the data is never refreshed in the
other nodes' caches. Consequently, we need a way to force other nodes' caches
to reload a specific bean from the database when it is modified on another
node. The node that modifies data must send some kind of invalidation
message to the other nodes' caches. Luckily, the cache invalidation message
doesn't need to be sent transactionally to the other caches: we're
dealing with web pages, not bank accounts.
For these scenarios, JBoss incorporates a handy tool: the cache
invalidation framework. It provides automatic invalidation of cache
entries in a single node or across a cluster of JBoss instances. As soon as an
entity bean is modified on a node, an invalidation message is automatically
sent to all related containers in the cluster and the related entry is removed
from the cache. The next time the data is required by a node, it will not be found
in cache, and will be reloaded from the database:

Figure 6. Cache invalidation framework
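The principle can be sketched in plain Java. The code below is not the JBoss framework and does not use its API: InvalidationBus and Node are hypothetical names, and the cluster-wide message is simplified to a direct, in-process method call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a toy model of cache invalidation between nodes, not the JBoss implementation.
class CacheInvalidationDemo {

    /** Hypothetical stand-in for the channel that carries invalidation messages between nodes. */
    static class InvalidationBus {
        private final List<Node> nodes = new ArrayList<>();

        void register(Node node) {
            nodes.add(node);
        }

        void invalidate(Node source, String key) {
            // Tell every node except the one that made the change to drop its entry.
            for (Node node : nodes) {
                if (node != source) {
                    node.evict(key);
                }
            }
        }
    }

    /** Hypothetical cluster node with a private cache and a shared database. */
    static class Node {
        private final Map<String, String> database;
        private final InvalidationBus bus;
        private final Map<String, String> cache = new HashMap<>();
        int databaseLoads = 0;

        Node(Map<String, String> database, InvalidationBus bus) {
            this.database = database;
            this.bus = bus;
        }

        String read(String key) {
            String value = cache.get(key);
            if (value == null) {
                // Not in cache (never loaded, or invalidated): reload from the database.
                databaseLoads++;
                value = database.get(key);
                cache.put(key, value);
            }
            return value;
        }

        void write(String key, String value) {
            database.put(key, value);
            cache.put(key, value);
            bus.invalidate(this, key); // notify the other nodes
        }

        void evict(String key) {
            cache.remove(key);
        }
    }

    /** Returns what node 2 sees after node 1 updates the page, and how many loads node 2 needed. */
    static String run() {
        Map<String, String> database = new HashMap<>();
        database.put("home", "v1");
        InvalidationBus bus = new InvalidationBus();
        Node node1 = new Node(database, bus);
        Node node2 = new Node(database, bus);
        bus.register(node1);
        bus.register(node2);

        node1.read("home");
        node2.read("home");        // node 2 caches v1 (first load)
        node1.write("home", "v2"); // node 2's entry is invalidated
        String seen = node2.read("home"); // cache miss, reloaded from the database (second load)

        return seen + " after " + node2.databaseLoads + " loads";
    }

    public static void main(String[] args) {
        System.out.println("node2 sees " + run()); // prints: node2 sees v2 after 2 loads
    }
}
```

Note that the invalidation message carries only the key, not the new data: each node reloads the entry from the database the next time it needs it, and keeps the benefit of its cache for everything that has not changed.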
To activate this behavior in JBoss, the development team has to run JBoss
clustered and modify the jboss.xml deployment descriptor:
<jboss>
  <enterprise-beans>
    <entity>
      <ejb-name>WebPage</ejb-name>
      <configuration-name>Standard CMP 2.x with cache invalidation</configuration-name>
      <method-attributes>
        <method>
          <method-name>get*</method-name>
          <read-only>true</read-only>
        </method>
      </method-attributes>
      <cache-invalidation>True</cache-invalidation>
    </entity>
    <entity>
      <ejb-name>PageContent</ejb-name>
      <configuration-name>Standard CMP 2.x with cache invalidation</configuration-name>
      <method-attributes>
        <method>
          <method-name>get*</method-name>
          <read-only>true</read-only>
        </method>
      </method-attributes>
      <cache-invalidation>True</cache-invalidation>
    </entity>
    <!-- and so on for Header, Footer,
         LeftSide and RightSide -->
    <session>
      <ejb-name>PageRenderer</ejb-name>
      <jndi-name>PageRenderer</jndi-name>
    </session>
  </enterprise-beans>
</jboss>
Note that we have removed our customized container configuration, instead
using the one named "Standard CMP 2.x with cache invalidation," pre-defined in
conf/standardjboss.xml. No additional configuration is required to
get this behavior. Many other fancy designs can be built using this framework.
Note: JBoss 4.0 will not only contain the distributed invalidation
framework, but will also include a full-fledged transactional distributed cache.
This way, the development team can keep all of the advantages from previous
optimizations and get even better throughput, thanks to the cluster, all the
while satisfying the customer requirements.
Conclusion
The basic optimizations described in this article do not just apply to
CMS101, but to any kind of J2EE application with similar data taxonomy.
Remember that this analysis can be done on a per-EJB basis, not just for your
entire application as a monolithic whole.
Long live CMS101, and see you on the JBoss.org forums!
Sacha Labourey is one of the core developers of JBoss Clustering and the General Manager of JBoss Group Europe.
Juha Lindfors is a computer scientist at the University of Helsinki.