Priority Queue

from discussion in IhadToWriteMyOwnLinkedList ...

A PriorityQueue is a data structure that is designed to allow efficient removal of the highest priority item. In a simulation event queue, that would typically be the event which happens next. Your linked list was in fact such a queue. You removed an item, then added it with a new priority. My point (if I had one) was that perhaps a priority queue implementation existed for your use and you didn't have to create anything. If there wasn't such a thing, then you did DoTheSimplestThingThatCouldPossiblyWork, and if the queue never got too long, probably the highest performing as well.

Here is an example of a priority queue in use. It sorts on the fly so that every queue.get() returns the smallest number. In this case, as is the case in simulation, no number is added to the queue that is smaller than the last number retrieved.

  Random rand = new Random();
  PriorityQueue queue = new PriorityQueue();
  // initialize with 1000 elements
  for(int i=0; i<1000; i++)
    queue.put(rand.nextDouble());
  // exercise queue by turning these over 1000 times
  for(int i=0; i<10000000; i++)
    queue.put(queue.get()+rand.nextDouble())

These numbers, 1000 elements and 1000000 gets, are typical of simulation. This could be a simulation of 1000 computers doing 1000 things each, say a small cell site over a short period of time. Look to height-balanced trees for good implementations. DonKnuth recommends the LeftistTree in the ArtOfComputerProgramming, and this will code up nicely in objects. I used java.util's TreeMap in a simulator recently but consider this very dangerous because it will discard duplicates. (Quiz: what is the probability of at least one duplicate in the above example.) -- WardCunningham

PriorityQueues can also be implemented with a HeapDataStructure.

Is it still meaningfully a priority queue when you need to resort the queue to determine which is the highest priority element to pull out? Most of the ways to implement a priority queue would be completely useless ...

I would call it a PriorityQueue when both adding an element and removing the highest-priority element are at most O(log n).

But priority queues do exist already. They're called relational databases.

For XP/agile methods: If there isn't one already, there definitely needs to be an agile precept called DontReinventTheWheel? (DRITW). Large PriorityQueues? are handled quite effectively by RDBMSs - which means for better or worse and whether one likes it or not - much of the computer science one might want to employ is already encapsulated inside SQL and Oracle, Sybase, DB2, etc. If SQL is less than desirable (what "right thinking" XP programmer would want to...) use Matisse since it (allegedly) enables both SQL and OO manipulation and access as well.

Ignoring DRITW and just going with YAGNI and DoTheSimplestThingThatCouldPossiblyWork, one could conclude at a certain point "I don't know if I need a queue or a priority queue, therefore I'll implement just a simple queue." But this turns out to be a misleading, costly, and wasteful decision later on if it happens that you actually *did* need a priority queue - especially a large one. So it's important to provide some reasonable validation for DoTheSimplestThingThatCouldPossiblyWork. Not validated, it could be risky - regardless of whether you choose to then move to an RDBMS or continue your own re-invention process of the priority queue.

The RDBMS will be slower than an object mechanism for simple queue processing. But given modern indexing, the RDBMS will often be just as efficient as a PriorityQueue coded by hand - why MQ systems often build on top of an RDBMS.

If you're good at doing queueing, you enjoy it, and someone will pay you to do it, go with YAGNI by itself, start off with a simple queue and add on as necessary. But in many cases, doing so is simply avoiding the inevitable and DRITW comes into play as an important consideration.

-- RichKatz

I've never heard of a RelationalDatabases being referred to as a PriorityQueue before. I certainly wouldn't want to rely on an RDBMS for an event simulation or an operating system, or any of the other typical uses of PriorityQueues?.

--- I can certainly understand that you've never heard of RDBMS being "referred to" as a PriorityQueue. RDBMSs are often used to implement a priority queue. However, what you are extending with "any of the other typical" uses are actually realtime scientific and system software. That's a bit narrow. There are many queue systems that model business processes and are more granular. What is typical for OS internals is not necessarily the same as what's typical for business apps.

Message queues, conferencing, scheduling, and reservation systems are applications that require high degree of integrity and tracking and comparatively less performance than what operating system software requires. Sonic Software, for instance, implemented their SonicMQ on top of an RDBMS and I believe many (but certainly not all) MQ systems (as I mentioned previously) are implemented this way.

-- RichKatz

RDBMS are actually search trees. You can abuse them as priority queues, which yields maximum asymptotic efficiency (O(log n) for both insert and extract), though at a considerable overhead. If you need a mergeable PriorityQueue (merge in O(log m + log n) or better) or need to increase priorities (increase_key is possible in amortized O(1)) an RDBMS just doesn't cut it.

In fact, Sonic Software's persistence database is an object data base called PSCPRO, which supersedes Cloudscape, another object database(currently marketed by IBM). -- Tom B

It has been my experience that these priority queues are flawed in practice. (Right now, I am speaking about ranking ToDoList tasks 1 to 5, 1 being the highest priority, and completing them in order of priority. I have, in the past, done this on paper.) Some things, which are clearly important, never reach the highest priority in a priority queue. I have found it important to invest time in certain basic areas regardless of priorities. I could easily put out priority 1 fires until I'm blue in the face, and then find out that I haven't made any marketing calls and don't have any signed contracts, and it will take two months before I have income. In other words, this now becomes priority 1 fire. Then you frenetically oscillate between several necessary areas as each one becomes "on fire." (Marketing work, bill collecting, delivery.) (As I talk about this now, it reminds me of what BarryOshry? calls the DanceOfBlindReflex? (in the book SeeingSystems?). And as I think about it now, the symptoms I'm describing are related to the complexity of managing delays in the feedback loop.)

Instead, life is easier if you commit portions of your time to each necessary area, and adjust the portions from a top-level perspective. For example, spend six hours a week marketing, two hours bill collecting, and the rest putting out fires NO MATTER WHAT (that's the hard part). After six weeks, see if you need to adjust the numbers.

I wonder if anyone has supporting or detracting evidence here. -- JasonFelice

You need to decide if your single priority figure represents importance/severity or urgency. The two are related, but not the same, and in real life these change over time.

I believe the priority number (if you are to use one number) is necessarily influenced by both importance and urgency.

I've seen a four-quadrant model with importance and urgency on different axes, with the suggestion that something from each quadrant must be done every day. (I forget the name of the book.) That means that the author believes that you must do something unurgent and unimportant every day, too. That is closer to a useful system in my experience (though not perfect, of course).

I predict that simply picking one axis to rate priority based on would result in spectacular disaster. Probably moreso if you picked urgency over importance, but still...

--JasonFelice

CategoryDataStructure