Using Global/Distributed Transactions in Java/JDBC with Oracle Real Application Clusters
by Sachin Shetty
12/21/2005
This document describes some of the ways to leverage the high availability features of Oracle Real Application Cluster (RAC) databases by moving the load balancing functionality from the Oracle server/Oracle driver to your application. In this article, I discuss the ways to achieve load balancing of connections across RAC nodes and yet maintain the sticky nature to the same RAC node for the all of the participants of a single global transaction. I have used Java examples wherever possible, but most of the programming constructs can be mapped to other sufficiently advanced programming languages.
An Oracle Real Application Cluster database is a clustered database. A cluster is a group of independent servers that cooperate as a single system. Clusters provide improved fault resilience and modular incremental system growth over single, symmetrical multiprocessor systems. Oracle's RAC page has more information on RAC technology.
A global transaction is a mechanism that allows a set of programming tasks, potentially using more than one resource manager and potentially executing on multiple servers, to be treated as one logical unit. Once a process is in transaction mode, any service requests made to servers may be processed on behalf of the current transaction. The services that are called and join the transaction are referred to as transaction participants. The value returned by a participant may affect the outcome of the transaction. A global transaction may be composed of several local transactions, each accessing the same resource manager. The resource manager is responsible for performing concurrency control and atomicity of updates.
Problems with Global Transactions and Oracle RAC
Problems occur when connections participating in a distributed transaction are routed to different instances of an RAC cluster. A split transaction is a distributed transaction that spans more than one instance of an RAC database. This implies that different branches of the same distributed transaction are located on different instances of an RAC database. This could result in erroneous situations, as follows:
- During normal operation: Neither branch can see changes made by the other branch. This can cause row-level lock conflicts amongst these branches, leading to ORA-2049 errors (timeout waiting for distributed lock).
- During recovery operation: Failures can occur during two-phase commit (2PC). Sometimes 2PC requires its own connection to the database (e.g., an abort). In such cases, a 2PC operation may be attempted on a transaction branch at an instance where that branch does not exist, causing missing transaction errors (ORA-24756). This in turn leaves the branch hanging as an active branch to be cleaned up by the Process Monitor daemon (PMON). While the branch is active, it still holds row-level locks on all rows that it modified. Similarly, if the 2PC operation that failed is a commit, then in-doubt transactions can remain in the database. This can cause ORA-1591 errors when another transaction attempts to access data that has been modified by the in-doubt transaction. You can find more detailed explanation of this problem in Oracle's "Best Practices for using XA with RAC" (PDF, 219 KB).
Recommended Approach
The following design forms the foundation of the recommended approach:
[1] [2] Next