(tangent from ResultSetSizeIssues)
That's why we have transactions and row locks. Nothing can delete the records in the result set being spooled to the client.
You appear to be suggesting we abandon AcId to avoid the possibility of exhausting virtual memory. AcId is a good technology. I will sic Costin on you if you defame AcId. And there are usually ways to handle run-away queries where they are likely to occur. Client-induced row locks are a pain in the arse. A bad lock can hold up a lot of processes. I've seen cases where somebody left for lunch or their machine crashed, leaving a row lock standing and pissing off a lot of executives. If there is an upper limit on all such locks, then there is the possibility a client can crash because a process takes too long. You are trading upper bounds on disk for upper bounds on time, it seems. Yes, you can put in time-out handling on the client, but this is analogous to putting result set size quotas and handling on the client. --top
No one suggested abandoning AcId. Transactions and row locks are ways to enforce AcId. These "clients" can't go to lunch, nor do they create "run-away" queries. They aren't humans. They are RDBMS client processes like the X9.37 writer described above. Row locks are held for a few milliseconds at most, and usually much less.
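For concreteness, here is a minimal sketch of such a short-lived client transaction in Python, assuming PostgreSQL and the psycopg2 driver (the table, columns, and connection string are invented for illustration). The lock_timeout setting also speaks to the time-out point above: it bounds how long this client will wait on somebody else's stale lock, without the client inventing its own quota machinery.

 # Sketch: a batch client holding a row lock for only milliseconds.
 # Assumes PostgreSQL via psycopg2; names below are illustrative.
 import psycopg2

 conn = psycopg2.connect("dbname=checks")
 with conn:                        # commits on success, rolls back on error
     with conn.cursor() as cur:
         # Bound how long we'll wait if someone else holds the lock.
         cur.execute("SET lock_timeout = '2s'")
         # FOR UPDATE takes the row lock; it lives only until COMMIT.
         cur.execute("SELECT status FROM items WHERE id = %s FOR UPDATE",
                     (42,))
         row = cur.fetchone()      # row is locked; do the quick work
         cur.execute("UPDATE items SET status = 'written' WHERE id = %s",
                     (42,))
 # Leaving the 'with conn' block committed and released the lock.
 conn.close()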
Are you talking about client-side techniques or RDBMS implementation techniques?
Client side techniques. Why?
Explicit row locks are not part of AcId.
Why aren't they?
- AcId is not explicit about the means used to provide the guarantees. Row locks are one of a number of different kinds of means. Although I don't know if that was the intended point of top or whoever wrote the above. -- Doug
(As usual, Doug said what I tried to say in an earlier edit, but much more succinctly. I'll just note that PostgreSQL and, I believe, SQL Server use an interesting versioning technique to reduce the number and duration of locks used to implement AcId. -- DanMuller)
- You're too kind. :-) The generic technique to avoid locks is to introduce generation numbers. Each transaction creates a new generation of the DB. Any given client operates on some particular generation, not necessarily the most recent, and their operations within a transaction ignore DB data stamped with generation numbers more recent than the generation in which their transaction began. When their transaction tries to complete, it is examined to see whether it conflicts with any transaction completed meanwhile, and if so, their transaction fails, otherwise, it is applied. This allows all clients to operate lock-free, with locks being used only by the part of the server that applies successful transactions to the DB -- which can be done very efficiently by simply adding it to a cache of recent data, to be merged onto disk gradually. But obviously this cannot be done for clients that require e.g. traversing all rows and seeing only the most recent data; they still need to lock. Works great for e.g. live backups, though. -- Doug
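For the curious, here is a toy Python sketch of the generation-number scheme just described. The class names are invented for illustration; a real engine keeps version chains on disk and applies successful commits through its own log and cache machinery.

 # Toy sketch of generation-number (optimistic) concurrency control.
 class ConflictError(Exception):
     pass

 class GenerationStore:
     def __init__(self):
         self.generation = 0      # most recent committed generation
         self.rows = {}           # key -> list of (generation, value)

     def begin(self):
         return Txn(self, self.generation)

 class Txn:
     def __init__(self, store, generation):
         self.store = store
         self.start_gen = generation   # snapshot: ignore anything newer
         self.reads = set()
         self.writes = {}

     def read(self, key):
         self.reads.add(key)
         if key in self.writes:
             return self.writes[key]
         # Newest version no newer than our snapshot generation.
         for gen, value in reversed(self.store.rows.get(key, [])):
             if gen <= self.start_gen:
                 return value
         return None

     def write(self, key, value):
         self.writes[key] = value

     def commit(self):
         # Only this step needs a lock in a concurrent implementation;
         # readers and writers above never block each other.
         for key in self.reads | set(self.writes):
             versions = self.store.rows.get(key, [])
             if versions and versions[-1][0] > self.start_gen:
                 raise ConflictError(key)   # someone committed meanwhile
         self.store.generation += 1
         for key, value in self.writes.items():
             self.store.rows.setdefault(key, []).append(
                 (self.store.generation, value))

 # Demonstration: two transactions race, the slower one fails.
 store = GenerationStore()
 t1 = store.begin(); t1.write("balance", 100); t1.commit()
 t2 = store.begin()                  # snapshot at generation 1
 t3 = store.begin(); t3.write("balance", 150); t3.commit()
 t2.write("balance", t2.read("balance") - 25)
 t2.commit()                         # raises ConflictError: t3 got there first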
(Yup, that's the technique I was referring to. Now we'll wait for someone to protest that it doesn't provide any guarantees for update transaction success, at which point you'll explain why those guarantees don't exist anyway, and so on, resulting eventually in a new page on this transaction implementation technique. :) -- DanM)
- Guaranteeing update transaction success is an interesting problem. There's a trivial proof that it cannot always be provided: have your transaction complete 5 years after it began. The entire database will have changed meanwhile. We can't allow you to lock everyone out for 5 years, nor should you expect it to be successful when it's 5 years out of date. However, we do want to provide statistical guarantees about success and failure rates for more reasonable transactions, and that's where it gets complicated, as statistical algorithms and analysis always do -- and where many vendors (<cough>oracle<cough>) have been strongly faulted for doing far, far worse than the state of the art allows. (Caveat: I'm no expert on what various vendors do and do not provide). -- Doug
I find a mix is the best solution: use AcId for general processing, but as soon as you have to wait for the user (or anything else that's very slow), release all locks. You then check the generation counter when the user finally returns the modified data. If there have been other updates in the meantime, the user gets an error or a conflict-resolution request.
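In code, that pattern looks something like the following sketch, using Python's built-in sqlite3 and a version column as the generation counter (the table and column names are illustrative):

 # Sketch: release all locks while waiting on the user, then apply the
 # edit only if the row's version (generation counter) is unchanged.
 import sqlite3

 conn = sqlite3.connect(":memory:")
 conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY,"
              " balance INTEGER, version INTEGER)")
 conn.execute("INSERT INTO account VALUES (1, 100, 0)")
 conn.commit()

 # 1. Read the row and its version; hold NO locks while the user edits.
 balance, version = conn.execute(
     "SELECT balance, version FROM account WHERE id = 1").fetchone()

 # ... user goes to lunch; other processes are free to update the row ...

 # 2. On return, the update succeeds only against the version we read.
 cur = conn.execute(
     "UPDATE account SET balance = ?, version = version + 1"
     " WHERE id = 1 AND version = ?",
     (balance - 25, version))
 conn.commit()

 if cur.rowcount == 0:
     # Someone updated the row meanwhile: surface the conflict to the
     # user rather than silently overwriting the other change.
     print("conflict: row changed since it was read")

The nice property is that a crashed client or a long lunch costs nobody anything: the only thing left standing is a stale version number, not a lock.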