Result Set Size Issues

Moved from ChallengeSixVersusFpDiscussion.

The point is that there is not enough regularity in many loops to take much advantage of the savings. Unless abstractions or wrappings provide significant savings, they often get in the way and create too much "creativity" (inconsistency) to be of much help. Further, if simplifying query loops is our goal, there are some non-FP ways to shorten things if a compactly-designed wrapper (or wrappers) is built around the DB APIs. Although it may vary per language, something like this is not out of the question:

PageAnchor: short_procedural

  rs = query("select * from foo...", optionX=bar);
  while (getNextRow(rs)) {
     stuff(rs, ...);
  }
Contrast with yours:

  withEachRow("Select * from foo...", 
               function(rs), optionX=bar) {
     stuff(rs, ...);
  }
Your approach is only two tokens smaller. (Plus, you may have to put the named-parameter options at the end, which is awkward.) The "getNextRow()" function can automatically close the connection upon detection of DB "EOF", depending on the settings. In fact, if we follow SeparateIoFromCalculation, then "rs" is only a local variable (a record-set structure copy) and the "query" function can close the connection, if so desired. MS has some "cursor tricks" possible with its DB APIs, and IIRC that is sometimes why the connection is left open during looping. But they are not of much use for most web apps.
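
For illustration, here is a minimal sketch (Python, using sqlite3 purely as a stand-in backend; the names query/getNextRow mirror the pseudocode above, and everything else is an assumption) of such a procedural wrapper: "query" copies the whole result set into a plain local structure and closes the connection immediately, so the loop never touches a live DB handle.

  import sqlite3

  def query(sql, params=(), db_path="app.db"):
      conn = sqlite3.connect(db_path)
      try:
          cur = conn.execute(sql, params)
          cols = [d[0] for d in cur.description]
          rows = [dict(zip(cols, r)) for r in cur.fetchall()]
      finally:
          conn.close()                    # done talking to the DB here
      return {"rows": rows, "pos": -1, "current": None}

  def getNextRow(rs):
      rs["pos"] += 1
      if rs["pos"] >= len(rs["rows"]):
          return False                    # DB "EOF"
      rs["current"] = rs["rows"][rs["pos"]]
      return True

  # usage, matching the loop above:
  #   rs = query("select * from foo")
  #   while getNextRow(rs):
  #       stuff(rs["current"], ...)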

In practice, I generally don't strive for super-compact non-FP wrappers because they confuse other developers, or they think saving a line here and there is not worth more parts that can break or hinder changes. Higher abstractions (both FP and non-FP versions) can result in DiscontinuitySpike change rework when they can't handle a situation they were not designed for. Thus, I tend to spend such effort only on bigger savings.

No, having a "getNextRow() that automatically closes the connection at EOF" wouldn't help here. What would happen in the non-FP version if an exception is thrown inside the block, supposing getNextRow hadn't reached EOF yet? Your automatically-closed-at-EOF getNextRow will still leave the connection open, won't it? The FP version won't. See, while saving only two tokens, we have also added robustness to the system. As I have said before, reducing source code and increasing robustness at the same time is a significant improvement.
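
A minimal sketch of the "withEachRow" style being argued for here, showing the robustness point: the wrapper owns the connection and closes it in a finally clause, so an exception thrown by the per-row function cannot leak it. (Python with sqlite3 as an illustrative backend; the names are assumptions.)

  import sqlite3

  def withEachRow(sql, fn, params=(), db_path="app.db"):
      conn = sqlite3.connect(db_path)
      try:
          for row in conn.execute(sql, params):
              fn(row)                 # may raise; connection still gets closed
      finally:
          conn.close()

  # usage:
  #   withEachRow("select * from foo", lambda row: stuff(row))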

What connection? We don't need the connection inside the loop. The result set is copied to an internal data structure/table. We are done talking to the DB directly at that point (although MS offers the option of direct talk, and that is why their APIs are a little bloated). If there is an exception, it needn't have anything to do with the DB. True, in the few times that we may need live cursors it may take more code, but that will only make a minor total difference because it is a relatively rare occurrence. The original premise is "significant". -- top


PageAnchor: memory_management

How do you guarantee that the result set will fit in available virtual memory? How do you guarantee that your solution works in bounded memory?

A good language can cache that kind of stuff to disk if need be. Generally, result recordsets shouldn't be more than about 200 records at a time. If you need more than that, perhaps the approach needs to be rethought. If by chance you actually need cursors instead of record sets, then that is certainly an option that I didn't rule out. I am only saying it is not that common a need, so won't be a big source of code reduction. -- top

A "good language" would need a connection to feed the cache. And you can't tell me that entire result sets should be less than 200 records, or that working with result sets that won't fit in virtual memory isn't a "common need". My applications deal with result sets containing hundreds of thousands to millions of records at a time. (And some of those records contain images.) We use functors to guarantee that all connections, result sets and prepared statements are closed as soon as possible because resource leakage in 24/7 mission critical business applications is not acceptable.

I cannot comment without more domain-specific details. There may be a way to reorganize it such that the calculations can be done in the DB server, such as an SQL "SET" operation instead of via loops. It hints at a yellow-alert design, but I don't have the details. The few times I've seen an actual need to chomp straight through that many records, it was written as a stored procedure using native cursors in PL/SQL or Transact-SQL. These languages tend to be the "big batch" languages of choice.

We aren't talking about situations where calculations can be performed in the RDBMS. We're talking about situations where you need to traverse result sets in the client. Transact-SQL can't write X9.37 files. Those can easily span hundreds of thousands or millions of rows.

I would need to know more about your specific app requirements to comment. It is curious why you need so much stuff on the client. If there is a lot of sharing of such info between clients, then a more server-centric approach would seem more appropriate. If there is not a lot of sharing, then the client and the server are essentially the same thing.

I gave you the specific app requirements. Write 1 million rows as an X9.37 file. The client needs every row because it writes the file. The RDBMS doesn't provide the tools for writing X9.37 files. This is a "server-centric" approach. The RDBMS's client is an application or batch server. I don't know what you mean by "sharing". The client and the RDBMS are not the same thing.

PageAnchor: cursor_emulation

I still question the "why" of this; I am looking for good domain reasons for needing such. Otherwise, it is merely an academic exercise or "problem" that may not reflect the real world. But, I will go with it for now. As already mentioned, it is possible to emulate cursors with record sets. It sounds like the bottleneck of processing here is the image file generation, not the database. Cursors can be emulated using something like this:

  row = queryOne("select top 1 * from foo order by xid");  // get first
  if (!row) {raiseError(...);}
  process(row);
  while (row = queryOne("select top 1 * from foo where xid > "
                        . row['xid'] . " order by xid")) {
    process(row);
  }
  // note that "top 1" syntax varies per vendor
This way, only one record will be delivered to the client at a time. "queryOne" is a routine that delivers only the first row of the result set. (We only get one row anyhow in this case due to our SQL.) It avoids the need to have an intermediate "rs" variable.
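
For concreteness, a possible "queryOne" helper might look like this sketch (Python with sqlite3 as a stand-in; the names are assumptions): run the query, return the first row as a dictionary (or nothing), and close the connection right away.

  import sqlite3

  def queryOne(sql, params=(), db_path="app.db"):
      conn = sqlite3.connect(db_path)
      try:
          cur = conn.execute(sql, params)
          row = cur.fetchone()
          if row is None:
              return None
          cols = [d[0] for d in cur.description]
          return dict(zip(cols, row))
      finally:
          conn.close()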

But now you're issuing several hundred thousand queries for each file instead of one. Performance needs won't allow that.

Another approach is to first download just the list of available IDs (primary keys). Getting a full list of keys is usually a much smaller job than getting every data record. The list is temporarily stored on the client, and is used to grab and process each record in a loop.

 myList = queryToList("select xid from foo order by xid");
 for (each ix in myList) {  // traverse the list of keys
   row = queryOne("select * from foo where xid = " . ix);
   process(row);
 }

Again we see several hundred thousand queries plus the need for client memory to store the list of IDs.

A third technique is to process them in chunks. For example, you might grab groups of 100 at a time using a technique similar to the first example. It is more complicated, but may reduce network traffic.
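
A sketch of that chunked ("keyset") variant, assuming a hypothetical query_many(sql, params) helper that returns a list of row dictionaries: each round trip carries up to 100 rows, and the client never holds more than one chunk at a time.

  CHUNK = 100

  def process_in_chunks(query_many, process):
      last_id = -1                        # assumes a numeric, non-negative xid
      while True:
          rows = query_many(
              "select * from foo where xid > ? order by xid limit ?",
              (last_id, CHUNK))
          if not rows:
              break
          for row in rows:
              process(row)
          last_id = rows[-1]["xid"]       # resume after the last key seen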

Why would it reduce network traffic? The result set mechanism spools chunks to the client already, and it does so with only one query.

I am not sure what you mean. This sounds like the built-in cursoring that some APIs already provide.

Only if you keep the connection open.

Or you could leave the connection open and let the result set handle the caching.

. . . .

I still don't feel I have enough info to make a fuller judgement. Maybe it is one of the cases where my techniques don't work well. Nothing is 100% right for 100% of all situations. It is possible to emulate cursors with result sets, I would note. But do you need cursors for hundreds of loops or just one? The alleged code savings won't amount to much unless they can be applied to many loops. I don't dispute that your suggestions may save a bit of code for a small percentage of the loops. But the premise is a GoldenHammer, not a brass hammer. -- top

You lost me. All I'm saying is that you need to maintain a connection to the RDBMS while traversing any non-trivial result set. That's a fact of life. It doesn't make sense to copy the entire thing across the network up front and it may not fit into available RAM. Take it as a given that most real world use of databases will have this requirement.

The vast majority of interaction I encounter with the DB can be done via result sets that are a small subset of the entire table (unless the table is small). And, those that don't are usually done via stored procedures such that cursors can be effectively used. What I don't understand about your anecdote is why the entire result set has to be processed on the client side and why it has to be re-processed often. In other words, why do you have to repeatedly loop thru the same large entire data set?

Most of my interactions with the DB involve hundreds of thousands of rows. If they don't it usually isn't worth the additional overhead of database hosting, licensing and administration. As I said above, Transact-SQL doesn't provide the tools for generating X9.37 files. Hence the need to send hundreds of thousands of rows to the client. I don't repeatedly loop through the same data set. I repeatedly loop through new result sets composed of rows that haven't yet been exported.

Usually there is a pattern such that the result set can be narrowed down much further or the RDBMS can do much of the processing if the info is normalized.

Again, we aren't talking about that. We're talking about processing that must be done on the client over potentially large result sets. The example is simple. There are millions of rows in a table. The application must create records in an X9.37 file (a cash letter file used for exchanging check images and data with banks and the Federal Reserve) for several hundred thousand of those records. No pattern will further narrow the result set. Several hundred thousand checks have to be exported to a bank. The RDBMS can't create X9.37 files. If it could we wouldn't be writing this software.

Without digging into the actual details of your project, I cannot make any recommendations here. Again, it is a matter of probability. The two approaches cover the vast majority of stuff I actually encounter, such that "living loops" are not necessary most of the time, and thus FP won't provide much simplification most of the time. I chose challenge 6 to provide a coded example to compare, to move beyond hard-to-pin-down anecdotes, and as far as I can tell, none of its loops need "live" cursors.

This anecdote is easy to pin down.

I beg to differ.

What's so hard to pin down about it? The requirements are simple:

 1. Write a program that creates X9.37 files containing check images and data from a database.
 2. The table containing non-image data has a state column. Only write the checks with state=READY_FOR_EXPORT.
 3. Don't stop other processes from adding records to the tables or modifying records that aren't being written.

By the way, what "good language can cache that kind of stuff to disk if need be"?

I haven't tested every language I've used, but for a known example, FoxPro and dBASE V could easily cache SQL query results to local tables (DBF), which were automatically cached to disk. RAM-to-disk caching is built into modern OSs anyhow, isn't it? Generally you want to avoid too much caching, but if caching is needed, it just slows things down rather than outright crashing apps with "out of memory" errors. Caching allows more graceful degradation than a mere crash. Hopefully one reworks the app or buys more RAM if cache monitoring shows the cache being used too heavily. I haven't seen an out-of-memory error for several years now. Sometimes I wish it would, because run-away processes sometimes take forever to self-clean from disk cache when finally stopped. But usually such is from mistakes rather than regular production runs. Some notes about large queries can also be found in QueryByExample. -- top

If FoxPro and dBASE V cache large result sets to disk I wouldn't call them "good". The result set is already cached on disk (or RAM) in the RDBMS. If those languages maintained a connection they could scroll through them in chunks instead of writing them to disk and reading them back out. I suspect that's what they do and neither of us knows it.

I was thinking in terms of querying an RDBMS other than FoxPro. FoxPro and other ExBase dialects adjusted from LAN-based DBs to client/server DB setups by using tables more as temporary recordsets instead of "master" tables. But ExBase and company were initially cursor-centric tools, which is one of the reasons they are not "true" relational. But the technique still worked well for temporary result recordsets. -- top

What technique? Copying the entire result set to a local drive?

I didn't mean to imply the result set was the same size as the master table.

I didn't think you did. It sounds like you're saying that FoxPro and dBASE V will copy the entire result set to a local drive. I seriously doubt that they do this, and if they do I think that's foolish.

PageAnchor: local_tables

It is rare one needs a local copy of the entire master table (which is on the RDBMS). One can usually get by with a subset of rows and/or columns for a particular task. Even the somewhat rare times where one needs every row, it is rare to need every column such that the client-side copy takes up only about 5 to 10 percent of the space of the master. For the few times the full thing is needed, a stored procedure can be put on the RDBMS server to avoid having to ship all the info to the client. I would break it down roughly like this:

I agree that currently-favored tools don't support local auto-disk-buffered tables, and to me this is a crying shame. Vendors thought RAM-OOP would magically solve those kind of issues and thus abandoned client-side tables, taking us back to the dark-ages of data handling. TableOrientedProgramming worked! --top


PageAnchor: most_loops

Re: "If I can perform a calculation in the RDBMS, I do so. If not, I have to send data to the client for processing. It is rare that I can predict with 100% certainty that the size of that data will always be within the bounds of available memory. Therefore every client side loop I write must spool the result set from the RDBMS in manageable chunks, and it must keep a connection open while it does so. It isn't that I need "many" of these kinds of loops. These are the only kind of loops I need."

So you are saying this situation applies to more than just the X9.37 example? This would suggest that it is not just cases where you need to process an entire table in one chunk, but any result-set chunk is a risk because you cannot predict the upper bound with certainty?

If so, this was addressed above under "memory management". Most systems cache RAM to disk when it gets large. Thus, the biggest likely impact is that some chunks may process slowly, not an outright crash. For interactive apps, if the user does something that grabs too many records, put a quota on it and tell the user they have exceeded the quota. See QueryByExample for more tips.

No, that wasn't addressed. As I said above, exhausting virtual memory is not a valid strategy.

Live cursors may have other risks which vary depending on how they are implemented. For example, a record may be deleted just before you are about to process it. The "snapshot" nature of a pure record-set avoids many of these kinds of now-you-see-them-now-you-don't problems. And, it is generally better for network traffic to use full result sets rather than live cursors because they require fewer total network packet headers.

-- top

That's why we have transactions and row locks. Nothing can delete the records in the result set being spooled to the client.

(Tangent moved to AcidAndRowLocks)


I haven't been following this conversation much, but I just noticed the cursor emulation technique in the middle of the page. You guys realize that that approach, although workable when supported, really sucks, right? With cursors you'd just have a linear traversal, but this emulation makes a single traversal quadratic! With a huge number of rows, quadratic time means you can't do it that way; over ten million rows, it'd never finish. -- Doug

One of us realizes that approach really sucks. The other will probably say that he never has to traverse ten million rows, and that if we followed his teachings neither would we.

What about the approach that breaks it into chunks? Besides, keeping a cursor open may also require lots of RAM usage on either the server or the client, depending on technique.

Why would keeping a cursor open require lots of RAM?

When you ask for the "next" row, it must get that info from somewhere. Either it does something similar to the emulated cursors above, or it creates a list of pointers/keys up front.

The RDBMS keeps track of which rows and columns constitute the result set, but that doesn't require a lot of RAM. It can do that in RAM or on disk. It can then read and spool chunks of the result set to the client as it traverses the set. Commercial RDBMSs do a good job of optimizing all of that so the client always has the next record available when it needs it. The important part of this is that the chunks have a finite, predictable and often tunable size. RDBMS writers use bounded memory algorithms and so should you.
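
For what it's worth, here is roughly what that bounded-memory spooling looks like from the client side with a standard Python DB-API cursor: one query, one connection held open, and only a driver-sized chunk of rows in client memory at any moment. ("connect" stands in for whatever driver is in use; the table and column names are borrowed from the X9.37 example and are otherwise assumptions.)

  def export_ready_checks(connect, write_record, batch_size=1000):
      conn = connect()
      try:
          cur = conn.cursor()
          cur.execute("select * from checks where state = 'READY_FOR_EXPORT'")
          while True:
              rows = cur.fetchmany(batch_size)   # driver spools the next chunk
              if not rows:
                  break
              for row in rows:
                  write_record(row)
      finally:
          conn.close()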

No! And I am going to run with scissors and drink milk past the expiration date. So there!

[You're a very silly person, and I appreciate that. Back to cursors, though, the question perhaps is what you can reasonably expect an RDBMS to do. If you simulate cursors using repeated queries, there is every reason for the RDBMS to think that you have completed your query once, well, once you have completed the query, and no reason to expect it to hang onto the position of your last result from your last query, and still less reason to expect it to continue your *next*, apparently unrelated (from its point of view) query from *that* position.]

[That's basically asking for magic. Instead, you should expect that it is reasonable for all RDBMS'es to treat each new query over all rows as statistically independent, because there is no reason for it to expect that your next query will just pick up where the last one left off. This is typically *not* how queries behave, other than when simulating cursors, after all. Therefore, it is reasonable for the server to optimize the order of row traversals based on criteria other than where your last query had its last match. And as we all know, they have a zillion ways to attempt more reasonable optimizations.]

[In short, like I said before, simulating cursors leads to O(N^2) behavior, and you can't expect the server to optimize that away. Cursors have drawbacks, but this isn't one of them; this is where they are strongest, and where there are the fewest alternatives.]

[Some of the above discussion puzzles me, though. The cost of a cursor, to the RDBMS, is as close to zero as it gets. It's just a row index within whatever traversal ordering the server wishes to use, that must remain valid even if the server changes row orderings...no big deal; most of the time the row ordering doesn't change anyway, so mostly it's just, conceptually, a simple index, and the exceptions aren't important. Supporting cursors is really cheap. The downside is merely the usual issue with prolonged transactions, which cursors encourage, but still are not unique to cursors. -- Doug]

A cursor solution for this case is not something I will complain loudly about. I would still look for a non-cursor solution but probably wouldn't change an existing cursor solution to such if I encountered it as long as there were not reported problems. A file dump or groups of say 100 are still not out of the question, though. But it is still an exceptional situation for the reasons stated above. It is a batch job that just happens to need client-only tools.

A file dump buys you nothing and increases total I/Os. Fetching groups of 100 buys you nothing, introduces the potential for skipped and duplicated rows and forces the RDBMS to do more work than it needs to. Why would you complain about a cursor solution at all?

[[You guys should be careful to distinguish the use cases that you're talking about. The devil's in the details, as usual. For UI presentation of long lists, fetching a page at a time can be a reasonable solution, because you get a page of self-consistent results cached locally for screen updates. If you have multiple UI elements (e.g. a context list and drop-down lists), you might have to worry about consistency of snapshots across the elements. For batch processes, transaction-and-cursor methods are fine. If you're going to traverse the entire query (likely), Top has a point that fetching the entire result set will, in toto, usually have less network overhead than getting a record at a time. The tradeoff is in client-side storage needs. Recent solutions in the product I work on have used client-side snapshots, and performance has been acceptable. But in our case, this is motivated mainly by a lack of reasonable ACID support in the 'RDBMS' for cursors. When performance becomes an issue due to snapshot size, we try to break a batch process down into smaller atomic units and smaller query result sets. As with most optimizations, direct measurement of your particular case is essential. -- DanMuller]] [[Could be that I misunderstood, too. Perhaps Top or his anonymous debater will clarify for us. In any case, as you allude to, the degree of currency required of the data will vary by task, and this also affects implementation choices. -- DanM]]


Regarding batched cursor emulation (groups of 100), why would it result in duplicate or lost records any more than live cursors would? Live cursors will probably do one of:

 1. Take a snapshot of the entire result set up front.
 2. Re-search ("re-query") the tables for the next matching row on each advance.
 3. Build a list of keys/pointers up front and fetch each row as it is needed.

In the second two, the conditions can change, just like groups. We could also do the same thing: get all the keys up front, and then bring those records in via chunks (100's).

-- top

The terminology can be confusing here, because it tends to be vendor-specific. (I may have used the term 'client-side cursor' incorrectly earlier, too.) If by 'live' cursor, you mean one that does not guarantee consistency, then you're correct. Many DBMS offer such as a performance optimization, but I've personally never had use for one. However, you can also have a server-side cursor that traverses a query result set without any of the problems mentioned above, and without placing a large burden on the server. See AcidAndRowLocks for mention of the 'generational' or 'versioning' technique that a DBMS can use to implement this efficiently. -- DanM

Given 1,000,000 rows in a result set it would take 10,000 queries to traverse the set in chunks of 100 rows each. If you put all 10,000 queries inside the same transaction and lock all tables involved then you won't have lost or duplicated rows, but you will issue 10,000 queries in a single transaction that keeps all its tables locked. If you put each query in its own transaction then you can easily have lost and duplicated rows because other clients can change the data that formed the original result set. Cursors don't work the way you describe. There's no need to "re-search" during a single transaction. The result set is unchanged from start to finish.

If the result set is "unchanged from start to finish", then something might change in the actual tables in-between. Are you suggesting that your cursor takes a snapshot up front, and then slowly feeds that snapshot to the client?

No, I'm saying we rowlock the result set so it can't change during the transaction.

If so, that is not much different than a file dump-and-xfer.

Except that there is no file and no additional file I/O.

As far as record locks, I thought this was a read-only operation, by the way.

It could be, or you may have to mark the exported rows as exported. It doesn't matter. The locks are there to keep other processes from changing this data.

[[See http://ldp.rtin.bz/LDP/LG/issue68/mitchell.html for a discussion of how this can be done without locks. I don't know if this is a good presentation of the technique, but it's one of the first reasonable-looking ones that I could find. -- DanM]]

How does that strategy handle changes to concurrent versions? What if one transaction is updating all rows marked "READY_FOR_EXPORT" with "EXPORTED" as it exports them, while another transaction is deleting all rows marked "READY_FOR_EXPORT"? How do we know we won't export a row that ends up deleted? Isn't it a matter of chance which version's transaction commits first?

[I covered this already. See above, search for "generation" -- Doug]

[[Err... actually, somebody moved it to another page already, like I predicted. :) See AcidAndRowLocks. -- DanM]]

Doug's description makes it sound like the first commit always wins (because the 2nd commit's version can't be reconciled with the 1st's version.) That's probably OK for some kinds of DB interactions, but it puts slower transactions at a disadvantage. I can easily see scenarios where slower transactions never complete successfully because faster transactions always corrupt their working set.

[[Well, this could get rather involved, and the details can vary a lot between implementations. But as a hint of what this technique is about, note that reading data is far more common than writing in most applications. Versioning allows readers to get consistent snapshots without disturbing or being disturbed by writers. For writers, various locking techniques still apply. A row update, for instance, might still lock that record to prevent further updates, causing conflicting writers to fail their transaction early. One sometimes also distinguishes between regular reads and read-with-intent-to-update, again to encourage early failure of conflicting transactions. -- DanM]]


Some have expressed concern that "record sets" leave too much room for something to go wrong. Non-result-set algorithms supposedly have more predictable bounds. While this may be true, in my opinion the utility of record sets still outweighs the case for discarding them in most situations. Further, the risk of any given query can generally be determined up front.

Here is a general approach I use to keep record sets from producing the kind of surprises they are accused of causing.

For any given query, we roughly need to know the expected distribution of result-set sizes (measured or estimated) and how much harm an unusually large result would cause.

Here is an example of a fairly typical log-scale distribution:

  000000 to 000001 ****** 
  000002 to 000010 ******* 
  000011 to 000100 ******
  000101 to 001000 ***
  001001 to 010000 **
  010001 to 100000 *
We almost certainly don't want 10,000+ item queries. Thus we need to decide what to do if such happens or is about to happen.

Most RDBMS allow one to put upper limits on the result set (specifically, number of records returned) for a given query. In MySql, it looks like:

  SELECT * FROM bar WHERE blah=goop LIMIT 1000
Here, at most 1,000 records will be returned. If we actually receive 1,000 records, then we know we probably reached the quota. (A more definite answer might be nicer than inference, but I find it does not matter much; set the limit to 1,001 if the official quota is 1,000, and receiving that 1,001st row means the quota was definitely exceeded.)
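
One way to make the quota check explicit, as a sketch (reusing the hypothetical queryToList helper from earlier on this page): ask for one row more than the quota, so receiving that extra row is a definite "over the limit" signal rather than an inference.

  QUOTA = 1000

  def query_with_quota(queryToList, where_clause):
      rows = queryToList("select * from bar where " + where_clause
                         + " limit " + str(QUOTA + 1))
      if len(rows) > QUOTA:
          return None, True     # over quota; ask the user to narrow the filter
      return rows, False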

In interactive apps, we can tell the user that they reached the quota and suggest they re-query with more specific filters. (See QueryByExample) A large result set generally means that the query is not what the user intended.

However, in some cases the user may still want a large result, such as dumping data to disk for later use by a spreadsheet. If such activity is permitted, then the software can prompt:

  Do you really want to download a gazillion records? (Y,N): 
This will not guarantee that system utilization won't go through the roof, but it greatly reduces the chance. Most large queries will be only those intended to be large. Unless there is a purposeful DenialOfService-like attack, or some other special circumstance, it will rarely be an issue. (More on such probabilities later.)

We still may want to put an upper limit on such queries. We may allow downloading 10,000 records, but maybe not one million. If the user really needs that many, they should probably contact the DBA anyhow for special arrangements. It might be better to break the information into smaller chunks anyhow. The user just may not know or care about such an option. Thus, a quota limit check can still be in place for higher amounts. We now have two design questions to ask: where to set that higher limit, and what the software should do when it is reached (refuse, prompt, or refer the user to the DBA).

Things are a bit different for batch (non-interactive) processes. We need to decide whether to put a quota on result sets and what to do if it does reach the quota. It is probably a good idea to put a "blue sky" quota on every non-trivial batch query just in case. If you don't expect a million records no matter what, then a quota at a million will serve as a kind of SanityCheck. If it does reach a million, then something is probably drastically wrong and we want our software to stop processing as soon as possible.
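
As a sketch of that "blue sky" quota (again reusing the hypothetical queryToList helper): if the row count ever reaches a limit far above anything expected, something is probably drastically wrong, so the batch aborts rather than grinding on.

  SANITY_LIMIT = 1000000

  def run_batch(queryToList, process):
      rows = queryToList("select * from foo where state = 'READY_FOR_EXPORT'"
                         + " limit " + str(SANITY_LIMIT))
      if len(rows) >= SANITY_LIMIT:
          raise RuntimeError("batch query hit sanity limit; aborting")
      for row in rows:
          process(row)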

For smaller but still large values, you still need to decide whether to live with them or not. If an occasional spike will not cause significant harm, then it may make sense to let it go. For example, if only one in 500 queries in a multi-user system produces a result between 100,000 and a million records, and your system can handle that frequency distribution, then you may be able to live with it.

But, perhaps a coincidental or situational PerfectStorm may result in many users doing the same big queries at the same time. For example, what if a surprise auditing lawsuit results in many departments doing large data dumps or queries on the same day? One result is that the system might start running real slow as it has to deal with more disk thrashing from the cache. If your distribution analysis suggests that this happens only once every 5 years, for example, then you have a financial decision to make: spend money to prevent that once-per-5-year event, or live with it. (In the auditing case, maybe the DBA can create CDs to give to the various departments.)

If you truly don't know what the distribution pattern is, or cannot live with occasional PerfectStorms bringing the system to a near crawl, then maybe one should not use result sets, using cursor-oriented techniques instead. Cursors have their place, but it's best to reduce dependence on them in my opinion. These are tradeoffs that have to be considered. Result sets are a convenient sub-paradigm. Not using them will complicate development.

Cursors also have their own performance issues. They can consume more network traffic in many cases because the same information has to come over the network in smaller but more frequent network packets. Packets contain "headers" that describe what the content is, where it came from, where it is going, and other internal "administrative" information. The ratio of administrative information to actual data tends to go up under cursors. Generally, result sets are more network-friendly than cursors. And there is the AcId issue, which cursors make tougher.

It seems that the tradeoffs are similar to those found in AreRdbmsSlow. Yes, result sets may sacrifice a bit of predictability to get allegedly higher-level or simpler software from a coder's and maintainer's viewpoint. But the extra risk is usually a manageable risk. If your distribution estimate is reasonably close, then you know the risks up front.

Further, cursors may have their own PerfectStorms and risk. I just personally have not given that side as much thought.

-- top

I don't understand. You speak as if result sets were transmitted to the client in one bulk transfer and not traversed in chunks.

From the app developer's perspective, they usually are.

No. All of the result set mechanisms I've used (me, an app developer) spool result sets in chunks.

{And your app program has to deal with them as chunks? Do you have some example code?}

No, the chunks are spooled by the driver. The app code iterates across the result set one row at a time. Have you never used result sets?

If they are so big that they have to be micro-managed, then perhaps it is time to rearrange stuff. As a rule of thumb, more than about 1,000 records from a "wide" table or more than 10,000 from a "narrow" table is a yellow alert.

Utter nonsense. If you have to transmit 100,000 credit card transactions, you're going to have to traverse at least 100,000 records.

{Do you mean like bulk transfer between servers? I would have to look at specific scenarios. Bulk data transfers are often considered something outside of app work. And starting sentences with "Utter nonsense" is not very diplomatic. I suggest you resist typical techie urges to exclude diplomacy.}

You also imagine a fairly static database that doesn't change between separate queries. And you seem to think that cursors cause problems for ACID transactions.

A quote taken from my website:

: Another thing to consider is consistency. Relational results tend to be "snapshots in time". If we only use references, some pieces that exist at the time of the query may not be available, or may be out-of-sync, when they are used later, creating a host of conundrums. For example, suppose the user enters a bunch of search or filtering criteria for a report (such as a query-by-example screen). Between the time the report is queried and the time it is displayed, some of the data may change if we rely heavily on references [cursors]. In my experience this results in phone calls from angry or confused users.

: They may ask for all products allocated to Texas, for example; but some may be reallocated to Hong Kong by another user between that time-gap. Thus, the report user may see Hong Kong items in their report even though they only asked for Texas items. (Some kind of "refresh" button would be helpful if they want to see updated info.) The snap-shot approach is thus not a flaw, but a tool used to provide an internally consistent view.

How is that relevant?

It is an example of where using cursors can cause undesired results.

Right, but how is that relevant to my statements?

If a language supports auto-disk-buffered local tables, then I rarely see a need to have to deal with chunks in the app code itself. Maybe it chunk-ifies at the transfer level, but that should be hidden from the app code most of the time (other than maybe a progress call-back). See PageAnchor "local_tables" above.
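
As a rough sketch of the "local table" idea, an on-disk SQLite file can serve as the client-side, auto-disk-buffered copy of a result set, in the spirit of the DBF-style local tables mentioned above. (The fetch_rows iterator, yielding (xid, data) tuples, is an assumption.)

  import sqlite3

  def cache_locally(fetch_rows, path="local_cache.db"):
      local = sqlite3.connect(path)       # SQLite/OS decide RAM vs. disk
      local.execute("create table if not exists cache (xid integer, data text)")
      with local:                          # commit on success
          local.executemany("insert into cache values (?, ?)", fetch_rows())
      return local   # later: for row in local.execute("select * from cache"): ...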


CategoryConcurrency, CategoryOptimization

