Two-phase commit is a feature of transaction processing systems that enables databases or other transacted resources to be returned to the pre-transaction state if some error condition occurs. It is sometimes abbreviated 2PC. Under 2PC, a single transaction can update many different databases or resources, and these resources may be distributed across networks and have independent availability and failure modes. The two-phase commit strategy is designed to ensure that either all the resources are updated or none of them are, so that the resources under transactional control remain synchronized.
Resources that participate in 2PC agree to be managed by a transaction manager.
In the X/Open model, there are three parties: the application (App), the resource manager (RM), and the transaction manager (TM). An example of an RM might be a database (like Oracle, DB2, SQL Server) or a transactional message queue (like IBM MQSeries or Microsoft Message Queue). The App is the code that defines the transaction's operations. The TM is often invisible to the App, but plays the role of director when multiple distributed RMs participate in a transaction.
The way it works:
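In outline, the TM first asks every participating RM to prepare (phase one), and then, depending on the votes, tells all of them to commit or all of them to roll back (phase two). A minimal in-memory sketch in Python follows. The class and method names are hypothetical and are not the X/Open XA interface, and the sketch leaves out durable logging, timeouts, and crash recovery, which a real TM needs.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class ResourceManager:
    """Hypothetical in-memory RM; a real one would log to durable storage."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.committed = {}   # stands in for durable state
        self.pending = None   # work staged during phase one

    def prepare(self, update):
        # Phase one: stage the work and promise to commit it if asked.
        if not self.can_commit:
            return Vote.NO
        self.pending = dict(update)
        return Vote.YES

    def commit(self):
        # Phase two, success path: make the staged work permanent.
        if self.pending is not None:
            self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        # Phase two, failure path: discard the staged work.
        self.pending = None


class TransactionManager:
    """Hypothetical coordinator for a list of (rm, update) pairs."""

    def run(self, work):
        # Phase one: collect a vote from every RM.
        votes = [rm.prepare(update) for rm, update in work]
        # Phase two: commit everywhere only if every vote was YES.
        if all(vote is Vote.YES for vote in votes):
            for rm, _ in work:
                rm.commit()
            return True
        for rm, _ in work:
            rm.rollback()
        return False
```

For example, `TransactionManager().run([(db, {"row": 1}), (queue, {"msg": "done"})])` returns `True` and updates both RMs, while a single `NO` vote leaves every RM unchanged.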
It has been my understanding that two-phase commit has largely been discredited, at least as a way of keeping databases synchronized, since if any of the several databases is down, none can continue. More recent replication strategies (while not as simple as the salesman might suggest) do a better job. Any input from real-world users?
It's also true that waiting for all databases to be updated is a bad strategy. Even if no database is down, you can be sure that one of them is the slowest, and if you wait for all databases to be updated before continuing, then you're running at the speed of the slowest one.
However, there are cases where two-phase commit makes sense. For example, suppose you have an application design that accepts incoming messages on a queue. When a message arrives, the design calls for a row in a database to be updated. How does one coordinate the updates across these two independent stores? A two-phase commit makes this sort of application possible.
2PC != Transaction
Transactions are sometimes confused with 2PC. 2PC is an example of a distributed transaction commit protocol. While transactions can be used across distributed resources, the transaction concept is also often (most often?) used within a single resource manager, for example to keep tables consistent within a single database.
Example: Let's say you want to transfer $20 from your checking account to your savings account. The bank's computer might do something like this:
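A minimal sketch of such a transfer inside a single database, using Python's standard `sqlite3` module. The `accounts` table, its columns, the starting balances, and the `transfer` function are assumptions made for illustration. The point is that both updates happen inside one transaction, so either both take effect or neither does.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        # The debit above has already been rolled back by `with conn`.
        return False


# Hypothetical starting state for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("checking", 100), ("savings", 50)],
)
conn.commit()
```

Calling `transfer(conn, "checking", "savings", 20)` debits one account and credits the other in one step. A transfer that would overdraw the checking account is rolled back and leaves both balances untouched.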
As long as you're not running with distributed databases, "single phase" commit is what is used. All the information necessary for the transaction to be undone or completed is written to persistent storage (typically called a "transaction log") in an atomic step. The transaction is committed as soon as this information is permanently recorded. This is not possible in a distributed system, as there's no guarantee that the commit record is written on all participating systems. With distributed databases, TwoPhaseCommit solves this problem.
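The single-resource case can be sketched as an append-only log in which a transaction counts as committed once its record has been flushed to disk. This is only an illustration: the JSON record layout and the `commit` and `recover` functions are assumptions, and a real database log is considerably more involved.

```python
import json
import os


def commit(log_path, txn_id, changes):
    """Append one commit record; the transaction is durable once fsync returns."""
    record = json.dumps({"txn": txn_id, "changes": changes})
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(record + "\n")
        log.flush()
        os.fsync(log.fileno())  # force the record onto persistent storage


def recover(log_path):
    """Rebuild state by replaying every complete record in the log."""
    state = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A torn final record means that transaction never committed.
                break
            state.update(record["changes"])
    return state
```

With several independent systems there is no single log like this one, which is why the commit decision has to be coordinated between them.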
TwoPhaseCommit assumes reliable communications and tends to rely on locking (between receipt of 'Prepare' and the TM's final 'Commit'). If communications are unreliable (especially if the TM can go down in the middle of a commit) or if blocking is undesirable, ThreePhaseCommit will be required.
In VersionControl (CVS), most developers do an update (which checks for conflicts with the repository), then a commit (which writes to the repository). Most of the time it's a two-step process, but sometimes it takes the form: update and, if there are no conflicts, commit (as one automated step). It's a little different from SQL transactions, but the goal is always to handle conflicts.