Connection Pool

In a simple system, you create a fresh connection for each transaction. However, there is a fair overhead for doing that, so instead you can create a set of connection objects and place them in a pool and recycle them. Having a finite pool is also a way of preventing the number of simultaneous threads from growing without bound. In some databases, you pay extra for the extra connections, so you work to squeeze everything through one or a small number of connections.

Two examples of connection pools:

For this in pattern form, see RecycleBin.


I always wondered why it is something that the app developer had to worry about. When the API user asks for a connection reference/object/pointer, they don't care where it came from, a pool or bottle of milk, as long as it works. Why can't that issue be hidden or moved to a centeralized config flag somewhere?

Because connection pooling is not guaranteed to be 100% transparent to both application and database semantics. Getting a connection from a connection pool and getting a fresh new connection to the database, are not exactly the same thing.

I have never understood what the overhead is. Opening a socket is extremely cheap, it should only take about a millisecond on a LAN. If there is some heavy-duty handshaking going on to establish the connection then multiplexing all requests through one socket is another reasonable option. So why is pooling the popular solution?

It takes about 6 milliseconds to open a connection between a Java app and Microsoft SQL Server, both running locally on my P4 3.5 Ghz Windows box. 6 milliseconds is cheap in some contexts, expensive in others. I suspect the bulk of the time is spent authenticating the connect request.

Doing things solo on a local box is always best; it avoids scaling issues. Pooling only works if the connection is returned to the pool as soon as it can be.

Anyway, do the sums.... 6 ms to establish a connetion, but 1000 connections are needed NOW, that makes 6 secs, but oh, there is also the additional work / overhead to take into account, lets make it 10 secs (finger in air). That's starting to get noticeably slow.

Ok, now lets take the pooling approach, the connection has already been made, so lets say it (overly pessimistically slow) 2 ms to establish a connection, so that is 2 secs for the 1000 connections, and there is no additional work / overhead to take into account.

10s versus 2s, a bit of a no-brainer. Now let's scale it a bit. 1000 more connections are needed a little later, thats an additional 10s versus 2s. Now think of a web site that is getting 1 million hits a day. Would it be better to pool those connections or not? Obviously the numbers have been used to make the point. In a simple system, as has been said, this isn't important.

In summary - nothing is ever 'cheap'. -- BarnySwain


EditText of this page (last edited October 24, 2006) or FindPage with title or text search