Graph Algorithms With Tables

If tables are general purpose data structures, could someone show me how you'd represent a graph as a table? Then show an implementation of DijkstrasAlgorithm using tables. Then compare it to this page: http://www.python.org/doc/essays/graphs.html or any other implementation using the standard EdgeList? or AdjacencyMatrix? representations.

Most graph-like arrangements are done using many-to-many tables. If all the nodes are in the same table, then we only need two columns in our many-to-many table: fromID and toID. Referential integrity and/or triggers can be used to avoid duplicate links.
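As a concrete sketch of the many-to-many arrangement described above (SQLite syntax via Python's sqlite3 module; the table and column names are illustrative, not from any particular schema), the UNIQUE constraint stands in for the referential-integrity checks that prevent duplicate links:

```python
import sqlite3

# In-memory database for illustration; any SQL engine would do.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY);
    -- One row per directed edge. The UNIQUE constraint plays the
    -- role of the integrity checks mentioned above, rejecting
    -- duplicate links.
    CREATE TABLE edges (
        fromID TEXT NOT NULL REFERENCES nodes(id),
        toID   TEXT NOT NULL REFERENCES nodes(id),
        UNIQUE (fromID, toID)
    );
""")
conn.executemany("INSERT INTO nodes VALUES (?)", [("A",), ("B",), ("C",)])
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("A", "B"), ("A", "C"), ("B", "C")])

# Neighbours of A, via an ordinary query:
neighbours = [r[0] for r in
              conn.execute("SELECT toID FROM edges WHERE fromID = 'A'")]
```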

This is beautiful. It perfectly illustrates why people get really angry at top. I asked him to show a solution to a fundamental programming problem: representation of graphs and graph traversal algorithms. I even gave a pointer to a trivial implementation in Python. Instead he hand-waves an answer about many-to-many tables.

{{It seems to me that top has answered the question that was asked ("how would you represent a graph as a table?"). The mathematical definition of a graph is already expressed as a pair of tables (V, E), since sets and relations are special cases of tables. The (fromID, toID) table mentioned above is E (V is only needed if the graph can have isolated vertices with no edges). The list and matrix representations are two possible representations of tables; they're not specific to graphs (more generally, you can use an n-cube to represent an n-ary relation, which corresponds to a hypergraph).}}

Maybe I'm not communicating well but I'd like to see

  1. an SQL schema that represents a graph, and
  2. an SQL SELECT statement that retrieves the shortest path between arbitrary nodes using Dijkstra's algorithm.

{{SQL is a toy language. You need a more expressive language for this. The Python implementation referenced above is not table-oriented because it depends on a specific edge-list representation for E. It can be made table-oriented by accessing E only using table operations (which is no less efficient, and merges the edge list and adjacency matrix versions of the program).

For example, here's the find_path function so modified, in a hypothetical table-oriented variant of Python:

   def find_path(graph, start, end, path=[]):
      path = path + [start]
      if start == end:
          return path
      if not (start in dom(graph)):
          return None
      for node where graph(start, node):
          if node not in path:
              newpath = find_path(graph, node, end, path)
              if newpath: return newpath
      return None
Here "a in dom(R)" tests whether a is in the domain of a relation R, and "for x where R(a, x)" loops through the values x such that R(a, x) is true. These are both primitive relational operations. Below I'll use "R.hintIndexBy(n)" to declare a hint that a table should be indexed by the n'th field of each tuple.

You may note that the code still uses array syntax. In this hypothetical Python variant, an array would be equivalent to a relation between array indices and values, and the array syntax would be syntactic sugar. -- David Hopwood }}
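For comparison, the same idea can be approximated in today's Python (not Hopwood's hypothetical notation) by representing the graph as a set of (from, to) pairs, i.e. the relation E itself; "in dom(graph)" and the where-loop then become set comprehensions. A sketch, under that representational assumption:

```python
def find_path(graph, start, end, path=[]):
    # graph is a set of (from, to) tuples -- the relation E.
    path = path + [start]
    if start == end:
        return path
    if start not in {a for (a, b) in graph}:          # "start in dom(graph)"
        return None
    for node in {b for (a, b) in graph if a == start}:  # "where graph(start, node)"
        if node not in path:
            newpath = find_path(graph, node, end, path)
            if newpath:
                return newpath
    return None

# The graph from the Python essay, as a table of edge tuples:
g = {('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D'),
     ('C', 'D'), ('D', 'C'), ('E', 'F'), ('F', 'C')}
```

The algorithm touches E only through table-style operations, so the set could be swapped for any other relation representation without changing the traversal code.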

If tables are general purpose data structures (that are as good as hashmaps/dictionaries) then this should be trivial. Of course past experience suggests that top will not write any SQL in response.

I never claimed that SQL could completely solve/implement the entirety of ALL algorithms. You seem to have some misconceptions about my viewpoints. In actual work (as I encounter it), SQL is often used to "pre-process" the information, reducing the workload/amount of procedural code. In some algorithms it may be 90+ percent, in others 10. Whether it can do large amounts with your suggested algorithm, I don't know yet. A possible failure to use it significantly for one particular algorithm does not flatten my claims, for there are jillions of other algorithms. This topic is not really about SQL anyhow.

My point is that this is an entire branch of fundamental knowledge that is very easy to represent using associative arrays and lists.

Same with tables.

It happens to be both a very important branch of knowledge and something that is very hard to represent with tables. The point I'm making is that tables/relational is a nice data structure whereas lists, associative arrays, trees, graphs, stacks, queues are truly general purpose structures that can be used to represent anything. Tables aren't general purpose because they only work for some niches. Lists, associative arrays, etc are general because they work for all niches.

If that is the case, then why is Oracle such a huge company? I don't think any structure is the "ideal" for 100% of all niches, and I doubt you do either. Tables can represent anything that the dedicated structures can, and bend better to new requirements.

Since I suspect you won't be convinced by my argument, I suggest you try to implement graphs. In the python link I provided above they show that an edge-list representation of a graph is as simple as g = {'nodeKey':['otherNodeA', 'otherNodeB']}.

{{And the corresponding table-oriented code could be as simple as "g = {'nodeKey' -> 'otherNodeA', 'nodeKey' -> 'otherNodeB'}" and "g.hintIndexBy(1);" The advantage here is that you can add other indices, or fields, or change to a matrix representation, or add transaction support, or whatever, without changing existing code that uses the graph. A smart implementation could even guess based on usage patterns what the best representation is.}}
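Approximating that hypothetical notation in ordinary Python (a sketch, not the notation itself): the graph is a set of (from, to) pairs, and "hintIndexBy(1)" becomes an index built once over the first field:

```python
# The graph as a bare relation of (from, to) pairs.
g = {('nodeKey', 'otherNodeA'), ('nodeKey', 'otherNodeB')}

def index_by_first(table):
    """Build an index over the first field of each tuple,
    roughly what hintIndexBy(1) would ask the system to maintain."""
    index = {}
    for a, b in table:
        index.setdefault(a, set()).add(b)
    return index

idx = index_by_first(g)
```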

I don't see link weights in that.

Adding link weights to a many-to-many table is trivial.

Algorithms to process that are fairly easy. Any equivalent sql schema would be far more complex and awkward.

{{Why the assumption that using a table-oriented programming model necessarily has anything to do with SQL?}}

In closing: please try to implement this so you'll immediately see the difference.

Why not just replace the array references with function calls that use SQL or whatever internally? (Array calls don't allow one to change to a different data container anyhow.) The data can then remain in tables, where it probably already sits in practice. Whether SQL can be used to shorten the algorithm, I don't know yet. I don't specialize in shortest-path algorithms; I'll have to study the problem more. This is more like a computer science term paper project than the kind of stuff I encounter in the real world. It would be interesting to see you guys use stacks, arrays, etc. to nicely implement this: http://www.geocities.com/tablizer/chal06.htm

Aaaargh! I can't believe you just did it again. You've hand-waved about link-weights, challenged people to write code (which is here by the way: http://www.python.org/doc/essays/graphs.html), made an appeal to authority and still didn't answer the question. This is increasingly aggravating but I shall try one more time.

  1. Graph theory underlies huge amounts of business applications from Amazon to DHL to genetic sequencing.
  2. Implementing a graph and traversing it is so trivial that it can be done in a handful of lines of Python.
Therefore prove that tables are general purpose by showing an SQL schema and some sql that does the equivalent of the Python code linked to. You don't have to be an expert in graph theory. This is first year undergraduate computer science. If the problem can easily be solved using tables then show me how. I'm not asking for miracles just for you to implement trivial functionality. If you can't or won't then say why. Is it because you don't know how to implement algorithms in SQL? Or because you need an English description of the shortest path algorithm? The DijkstrasAlgorithm page now has links to introductory pages that explain it but I'm quite willing to write out the pseudo code if first year comp-sci is too advanced for you.

[Bad example, because graphs are fairly easy to represent with relational tables. Easier, IMHO, than the python example. Since TopMind won't provide code, I will. I won't use SQL because SqlSucks?, but here's the Python example in pseudocode-ish relational model:

  Table: vertices
  node
  -----
  A
  B
  C
  D
  E
  F

  Table: edges
  from to cost
  ----------------
  A    B   1
  A    C   1
  B    C   1
  B    D   1
  C    D   1
  D    C   1
  E    F   1
  F    C   1

  Table: current_costs
  node cost
  -----------
  A    0
  (initialize all the rest to some variable representing infinity. SUM(edges.cost) should work)

  Table: predecessors
  node pred
  ------------
  (empty to start)

  Table: visited
  node
  -----
  (empty to start)

  def shortest_paths
    while count(difference(vertices; visited)) != 0
      current_cheapest = min(cost; difference(current_costs; visited))
      insert(visited, current_cheapest)
      relax(current_cheapest;
            project(name;
                    select(edges.from = current_cheapest.node;
                           cross-product(edges; current_cheapest))))

  def relax(cheapest_node; adjacency_set)
    adjacent_costs = select(current_costs.node = adjacency_set.node;
                            cross-product(adjacency_set; current_costs))
    link_costs = select(cheapest_node.node = edges.from && adjacent_costs.node = edges.to;
                        cross-product(adjacent_costs; edges; cheapest_node))
    update(adjacent_costs;
           adjacent_costs.node = min(adjacent_costs.cost, cheapest_node.cost + link_costs.cost))
    merge(predecessors;
          cross-product(rename(cheapest_node.node; pred);
                        project(node; select(adjacent_costs.cost < cheapest_node.cost + link_costs.cost))))
The start node has to be placed in current_costs with a cost of 0, and then the shortest path to a given node is obtained by following the predecessors table back to the start node. This (most likely) requires a recursive query:

  while current_node != start
    current_node = project(pred; select(node = current_node; predecessors))
But we're talking about tables here, not databases, and there's no inherent reason why a recursive query on an in-process table structure would be slower than the equivalent list or map.
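The predecessor-following loop above, in plain Python (assuming for illustration that the predecessors table is held as a dict from node to pred):

```python
def path_from_predecessors(predecessors, start, current_node):
    """Walk the predecessors table back from current_node to start,
    returning the path in forward order."""
    path = [current_node]
    while current_node != start:
        # project(pred; select(node = current_node; predecessors))
        current_node = predecessors[current_node]
        path.append(current_node)
    return list(reversed(path))

# Example contents after the algorithm has run: A -> B -> D.
preds = {'D': 'B', 'B': 'A'}
```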

''Do you have any idea how slow select(edges.from = current_cheapest.node; cross-product(edges; current_cheapest)) is going to be in the absence of a query optimizer like a RDBMS has? You assume some magic compiler for Python is going to do that...''

Note that this does more than the Python example, because the Python example neglects the cost of each edge (one of the fundamental parts of Dijkstras algorithm). I included it here to show that the relational model could handle it. It would be good to extend the Python example to include costs, just to see how much code it would take then.
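As a rough indication of how much code it might take, here is one minimal sketch of a cost-aware shortest path in plain Python using the standard heapq module (this is not Guido's code from the essay, just an illustration; the graph maps node -> list of (neighbour, cost) pairs):

```python
import heapq

def shortest_path(graph, start, end):
    """Dijkstra-style search; graph maps node -> [(neighbour, cost)].
    Returns (total_cost, path) or None if end is unreachable."""
    # Priority queue of (distance-so-far, node, path-taken).
    queue = [(0, start, [start])]
    done = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == end:
            return dist, path
        if node in done:
            continue
        done.add(node)
        for neighbour, cost in graph.get(node, []):
            if neighbour not in done:
                heapq.heappush(queue, (dist + cost, neighbour, path + [neighbour]))
    return None

g = {'A': [('B', 1), ('C', 4)],
     'B': [('C', 1), ('D', 5)],
     'C': [('D', 1)]}
```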

I've taken a few liberties with the relational operators in the interests of clarity. Min and count, for example, aren't really operators, but everybody knows what they do, and if you need a real equivalent you can see how to do min at http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter3/node8.html. Also, the relational algebra doesn't have insert and update as operators; I define insert to take a relation and a tuple and insert the tuple into that relation, and update to take a relation and an expression and apply that expression to all fields of that relation. And I use an operation I've termed "merge" which updates a tuple if it already exists (setting all fields) and creates it if not. The semantics of these are probably not as clean or well-defined as they should be, but we're trying to express a graph traversal, not extend the relational algebra here.

-- JonathanTang]

[This isn't an academic exercise. I have examples above of multi-billion pound companies that have to solve variations of the above in order to stay in business. This is how Amazon and various logistics companies work out how to get parcels and other items to recipients in the quickest and cheapest way. Google for Amazon's research department if you're interested in the business applications of graph theory.]

In my decade-and-a-half in biz apps, I have never encountered a single implementation. I thought such algorithms fall into a category by themselves such that there are dedicated specialists. If I was put in charge of implementing one, I would survey existing algorithms and use what I find. It appears most existing algorithms for such are procedural in nature and not relational. But that does not mean that a relational version cannot or does not exist. More on this below.

But you must admit that those implementations are neatly sitting somewhere, patiently waiting to impress us all with a timely worldwide Fedex pre-order of an XBOX360 title. I for one think that only the pure of heart are rewarded with having this kind of algorithm implementation as their critical path milestone, while the rest of us mortals battle with some stupid compiler oddities and pointy haired managers. -- LeoBighetti?


Jonathan you've illustrated my point. Doing what you did using what is the de facto language for real work with databases is so painful you've had to make up your own notation instead. I also wanted to avoid link weights and stick to the simplest possible type of graph so that we can see just how much work it takes just to represent a graph using SQL/tables. This way topmind wouldn't be able to weasel out of writing code by claiming that the maths was too hard for him. I suppose I should have learned my lesson by now. He's never provided a working program to anyone ever but he always sounds like he will if you just jump through one more hoop for him.

I think I'll save myself any future aggravation by ignoring him and focusing on Mr Tang's interesting question about the difficulty of adding link-weights to the Python solution (which currently is the only executable implementation anyone has provided).

[ Code will go here: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/119466

You'll note that this solution, which someone else already derived from Guido's original implementation, requires several different kinds of dedicated data structures. These would all have to be re-implemented using SQL if you were going from the simple version to the complex version. If you had built the simple version in SQL (which you didn't because it was too hard) you would now have to re-write the whole lot again in SQL (which would be even harder). Thus the claim that tables, sql and databases make it easier to change things is shown to be bogus for any domains that require complex algorithms. ]

{{ Here is that code modified to be table-oriented, using the same variant of Python as before:

   def Dijkstra(G,start,end=None):
      """
      Find shortest paths from the start vertex to all
      vertices nearer than or equal to the end.

      The input graph G is assumed to be modelled as a ternary
      relation Vertex x Vertex x Length, where (v, w, x) models
      an edge v -> w with length x.

      Length must be an arithmetic type that can be added to 0
      and itself. Vertex can be any type.
      """

      D = {}   # table of final distances
      P = {}   # table of predecessors
      Q = {}   # est.dist. of non-final vert.
      Q.hintIndexBy(1)
      Q.sortedBy(2)   # Q must remain sorted by the estimated distances
      Q[start] = 0

      for v in Q:
         D[v] = Q[v]
         if v == end: break

         for w, x where G(v, w, x):
            vwLength = D[v] + x
            if w in D:
               if vwLength < D[w]:
                  raise ValueError, "Dijkstra: found better path to already-final vertex"
            elif w not in Q or vwLength < Q[w]:
               Q[w] = vwLength
               P[w] = v

      return (D,P)

[code for shortestPath unchanged]

Why is this so similar to the original code? Simple: the algorithm is already in an (iterative) table-oriented style; it just needs to be abstracted away from any particular representation of the relation G. Note that the only "dedicated data structure" in the original code, a priority dictionary (see http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/117228), has become just a table sorted by its second column and indexed by its first. -- David Hopwood }}

["Thus the claim that tables, sql and databases make it easier to change things is shown to be bogus for any domains that require complex algorithms." I'm not sure about that. Adding link weights involved 2 new tables, and an additional column to edges. It did require that the code itself be rewritten, but that was the case with the Python version too. And if I'd properly factored the code (which is possible in relational algebra, because of algebraic closure, but not in SQL), then large segments of it could probably have been reused.

Regardless, it's probably best not to argue further until we have a decent (i.e. non-SQL) relational language. Not having executable code allows one to get away with a great deal of fuzzy thinking (as I'm sure you've noticed), so the discussion has a way of going around in circles. The Python version works, which is basically all that matters for an algorithm like this. -- JonathanTang]

Doing what you did using what is the de facto language for real work with databases is so painful you've had to make up your own notation instead.

Would you be more open to relational techniques if it had better implemented languages? Extendable relational languages tend to resemble functional programming I would note. -- top

Yes. People claim that relational techniques are very expressive but all the actual mainstream implementations as opposed to theoretical papers are hilariously bad. Although I rather like SchemeQL.


I think we need to divide this up into two issues:

I think the first has been clearly established. Tables can easily store graphs. Agreed?

The second point I already made, but it has been tossed aside for some reason. I never claimed that SQL or relational languages solve all processing all the time (I am not sure they even should). A possible failure of it to help significantly with this particular algorithm means very little. In actual work, the percentage of the behavior part that SQL solves varies widely. If SQL can't provide significant pre-processing, I simply write procedural code around simple Select statements. To get a better picture of what SQL can solve and can't solve we need more data points. Your example is only one point at best. (Note that I agree with Jonathan that SQL is far from the pinnacle of relational languages.) -- top

[I agree that tables can store graphs. I've done it myself when building content management systems. My point is that if one stays only in the relational/sql model then it becomes very, very, very difficult to express the algorithms that provide useful behaviour (after all, data without behaviour is worthless). Therefore relational isn't sufficient or general purpose because it must always be supplemented by something else. Just like OO isn't sufficient or general purpose because it must always be supplemented by something else. Namely imperative or functional paradigms.]

{{ Hear, hear. MultiparadigmLanguages and systems are clearly the right thing. However, if you ignore the bias against OO, I've found some of TopMind's stuff to be helpful in clarifying my understanding of how the RelationalModel fits into multi-paradigm systems. }}

He's never provided a working program to anyone ever but he always sounds like he will if you just jump through one more hoop for him.

You are being a hypocrite. I have been listing examples such as http://www.geocities.com/tablizer/chal06.htm for more than a year. Yet you ignore them and then go ahead and make your own. Then when I don't give instant turn-around to your academic-ish example, you accused me of "ignoring code challenges". In "playground-speak", you ignored my example before I ignored yours. I gave you code to compare first. You finish mine, and then I'll finish yours. Otherwise, I will be tempted to accuse you of creating a different example to distract from the fact that you are afraid of the first one.

[This is the first time you've mentioned challenge 6 in this conversation. This discussion emanated from the claim that tables were sufficient and general purpose data structures. My point has always been that tables are not enough. That you need the vast corpus of computer science to do anything useful and that automatically reaching for a relational solution to problems is unwise because no paradigm is always the best solution.]

"General purpose" and "always best solution" are not necessarily the same thing. General purpose may not even be possible in an absolute sense. See GeneralPurposeProgrammingLanguage. If you take the absolutist version, then if dedicated structures were general purpose, people would not need RDBMS, ever! {That's not what general purpose means. It means that it can be used to solve any problem. Sometimes that means we, the computer science community, can use dedicated data structures to build a tool (like an RDBMS) to solve pretty much any problem. We use bricks to build walls and use walls to build houses. We don't use walls to build bricks because that kind of AbstractionInversion is silly. Remember general purpose is only a starting point. Everything can be built out of 1s and 0s but we don't use them directly.}

[That's why I created WhenNotToUseTableOrientedProgramming. That's why JeffPanici created TheTopChallenge (which you still haven't replied to).]

Have too.

[As for challenge 6. It's a reporting system. I would build it using the reporting tools provided by my database vendor because that's one situation where that's the best solution. I might even consider using RPG.]

In practice firms don't use off-the-shelf tools for such because they can't customize them enough, and also because such tools have not provided a convenient and flexible web interface yet. Plus, they are expensive. I agree that such example as-is is too limiting. But a company could build on it to get what they want. You can't do that with proprietary code.

[Thing is there's all these different kinds of software out there that aren't reporting systems. The kind of work I and lots of programmers do require sophisticated algorithms and data structures. In short it involves computer science. If you'd ever done any work with data mining, financial analytics or even e-commerce web-apps]

I have too. Why not an e-commerce example instead of university lab games?

[ you'd realize that paradigms, data structures, etc are all necessary because you never know what may come up next. The narrowness of your experience is why you believe so firmly in your particular GoldenHammer.]

I have admitted that my experience is limited to a specific niche. "General purpose" topic was to explore them under other niches.

[I've looked at your website over the years and you keep proposing the same sorts of problems. I keep finding myself saying: with those criteria I'd use a database. But when people propose problems that would require you to broaden yourself you start name-calling and various other negative behaviours.]

I have avoided responding to FlameBait on many many many occasions here. You guys are no angels. Do I need to document and log your anti-social behavior to convince you? {You mean the way various people have logged your behaviour in the past? Don't you think that might be tempting fate just a little?}

[How exactly does your MSAccess based db front-end show that tables are general purpose/fundamental/sufficient data-structures? Or that graphs are best represented using tables?]

Dijkstra was pretty good at such algorithms. It may require a relational version of Dijkstra's mind to produce an acceptable relational version. I probably could not fathom a procedural solution either without lots of past experience on such problems. YOU did not come up with the Dijkstra algorithm, so why do you expect me to come up with a relational version? Your "challenge" is rigged. Dijkstra did the hard work for you. I don't claim to be as smart as Dijkstra on graph problems, relational nor procedural. A nice relational version is still a possibility, but not likely from me. That does not mean that relational stinks because I couldn't do a procedural (dedicated struc) one quickly either. It is the nature of the problem, not a failing of relational. This appears to be an intimidation ploy. You are comparing smart apples to dumb oranges (WRT graph algs).

{This is exactly my point. I can build software _today_ using existing data structures. You can't.}

It is due to where past research has been, not an in-born fault of relational.

{ Instead you have to wait for the 'relational Dijkstra'. Moreover this shows your ignorance of the _nature_ of algorithms. You don't need to invent a new algorithm. All you need to do is _implement_ an algorithm. Do you understand the difference between these things?}

You did not say which form would satisfy you because you seemed to focus more on SQL than the data container. Anyhow, the biggest problem I encounter in implementing that algorithm is figuring out how to read Python (if I mirror that version), and then finding a way to use recursion with tables. Most languages don't support pass-by-value tables. It can be emulated with a stack-like setup, but that gets kind of messy.

[It's tail-recursive. Convert it to a loop. Or look at the pseudocode on the first link on DijkstrasAlgorithm, which has an iterative version with external data storage anyway. -jt]
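Following that suggestion, one way to make the search iterative in ordinary Python is to replace the recursion with an explicit stack of candidate paths (a sketch; graph maps node -> list of neighbours, as in the essay):

```python
def find_path_iterative(graph, start, end):
    """Depth-first search for a path, with an explicit stack
    instead of recursion. Returns a path list or None."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in path:
                stack.append(path + [neighbour])
    return None

# The graph from the Python essay:
g = {'A': ['B', 'C'], 'B': ['C', 'D'], 'C': ['D'],
     'D': ['C'], 'E': ['F'], 'F': ['C']}
```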

[If you were to join an open source project or two and try to build useful non-relational applications you'd learn that whilst databases are often useful they only solve a tiny sliver of the set of problems working programmers have to deal with.]

I started out in this biz without relational. It sucked. Linked lists, arrays. Yuck! I have found hash maps are useful for quick interface stuff, but I don't store structures with them. {I'm not proposing the replacement of the databases with persistent hashmaps.}

[You actually have valid ideas about the limitations of OO, hierarchies and the benefits of databases but it all gets clouded by your negative behaviours. I'll sign off by reminding you of Winston Churchill's quip that "a fanatic is someone who can't change their mind and won't shut up".]

I don't see a lot of open minds around here. This wiki is awash in argument by authority and argument by voting.

{{ Voting is pretty uncommon, AFAICS. Anyway no-one takes it seriously. }}

You don't see open minds, because the things you suggest are typically ignorant. You ignore established research

There is a lot of research on "how to do X", but very little objective research on "why X is better than Y". FP, OO, and P/R can pretty much solve most problems with the same amount of code and same amount of change-points per change scenarios. None has a clear monopoly. If there is such research, please link to it!

If you think all those methodologies are equal, then you really are inexperienced.

and then promote your own ideas as if you're some kind of ignored unsung genius. You want people to pay attention, then learn how things are done before trying to change them. No scientist would be accepted without following established protocol, just as no one will listen to a programmer who completely ignores established practices without offering up anything better.

Protocol my ass, software engineering is a messy art. See DisciplineEnvy. The "discipline" is full of fads and argument-by-authority.

Learn to read there, genius; that was a metaphor. Established protocol was referring to science, not programming.

You see, even if tables were in process and lightning fast, they would still be horrible for general purpose use in algorithms. The fact that you think tables sprinkled with sql is better than an object specifically built for a task, does nothing but show your inexperience in real programming. I know, I know, that's outside your domain experience, which is my whole point, you don't really know what you're doing.

{Nobody knows everything. One should spend at least about 7 years in a given domain to get a reasonable feel for it and watch changes come and go in it. If a person lives 70 years, working 40 of those, then they can cover only about 5 or 6 domains under ideal timing, and probably 3 under more realistic conditions. Thus, nobody can know everything about every domain.}

Of course not, which is why you need to listen to what other people who are much better than you tell you. You are admittedly limited to a small domain, thus you should learn to pay attention to real programmers who are far more experienced than you. Far more experienced programmers constantly tell you that you are wrong, yet you constantly argue with them, and you wonder why no one listens.

I will agree that "general purpose" is perhaps too wide, if you agree that this example only shows that one person is crappy at graph algorithms (in any paradigm). Deal? -- top

But, I still believe in the potential of tables. The world is just not ready for them yet. Your arguments appear to be a form of QwertySyndrome. -- top

No, you just think you are smarter than the world, which is why no one listens. Tables are an inferior solution to most problems, that's why they aren't used.

You have not demonstrated it for "most problems". Only one, and it is mostly QwertySyndrome, not an in-born fault of tables. And, it is not me that is "smarter than everybody else", it is DrCodd that is.

Tables are great for one thing, ad hoc queries of data. That's it, and that's what they are built for, and they do that better than anything else. But that is a very limited area, most problems don't require ad hoc queries, and are much better solved by other structures. Yes yes, we all know you don't agree with this, we can't help it you aren't educated in writing software.

In my domain as I have encountered it so far, I am quite competent. Perhaps you are one of those people who wish real life was like university classes and keeps looking for that lost feeling by digging it out of any corner you can.

{No I'm what's called a programmer. I see you've decided to start the name-calling again. Pity it almost seemed like you were finally learning something.}


1-I wanted to see an attempt to query a graph using sql just so we could all agree on how painful it is to express complex logic/algorithms using _only_ SQL. I think we've all agreed that doing complex logic in SQL is far too painful for it to become the norm. Other programming languages are better for expressing logic.

Nobody ever suggested doing *entire* algorithms with SQL. At least not me. Like I said twice before, sometimes it will solve 90+ percent of a problem, and sometimes zilch. The net benefits for an entire application are positive in my experience. {You have been advocating exactly that for half a decade. It's all in google groups for those who care enough to search.}

2- Maybe I'm a Python bigot but I'm shocked that you can't read the Python code linked to at the top. That's a deeply disturbing admission that makes me question your programming background. Perhaps you're suffering from AntiExperience? In which case there's no point trying to have a conversation with you because you won't have the foggiest notion what I'm on about.

Oh well at least the shallowness of this page: http://www.geocities.com/tablizer/langopts.htm makes sense now.

If you use instant-Python-readability as a litmus test for general IT intelligence, then I guess I flunk. I studied Python a bit a few years ago, but found some of its syntax and conventions odd. {I just found that it indicates that you can't read code. Which suggests you can't write code which suggests that there's no point in trying to talk to you about code. It's especially damning given that you claim to have studied the language. It wouldn't have been so bad if you'd been living on a desert island and had never seen Python before.}

Handling Change in Requirements

Re: AntiExperience: refers to people who have been doing things wrong for so long that it is actually worse to have them on your team than it is to have inexperienced people.{Just about sums you up.}

For my domain as I experience it, tables are a Godsend. You can perhaps accuse me of over-extrapolating my experience in one niche to other niches, but inside is another thing. I *have* used dedicated structures multiple times where table engines were unavailable, poorly implemented, or not permitted. I see no magic in them. They are not change-friendly. I think the problem you encountered adding weights to your structure is indicative of this problem. If requirements rarely changed, dedicated structures (DS) may be indeed the way to go. But I see change left and right. That is the world I live in. If your world is different, I apologize for over-extrapolating into it and bothering you. There have been 3 demonstrations of change problems with DS here. I made a decent case WRT change-handling. If requirements don't change, perhaps tables are a little more code and a little slower. However, when requirements do change, they handle it better overall. (At least in my domain.{Which consists of building reports on top of databases. And for people whose domain consists only of building static web pages then html is the perfect language. Most of those at least have the sense not to advocate using HTML for everything.})

I will, for the sake of argument, give you the first two. But you have not demonstrated the 3rd or the 4th.

See AreBusinessAppsBoring


Summary Lessons Learned

{Somebody greatly altered this, and I disagree with the changes. Perhaps we should re-org it into a pro-con list since there is no agreement with the summary. See the formatting near the bottom of AreTablesGeneralPurposeStructures.}


Summary of Arguments

(Note: the indentation level has exceeded browser capabilities)


Re: Some feel that tables structure-wise better absorbed the addition of weights.

I disagree. One relational 'implementation' was provided using a pseudo-language. In other words, it wasn't real code.

I was referring to the "container", the structure without consideration of the algorithm. In theory relational divorces table design from usage (other than adding or deleting attributes or tables if not used). One should not have to alter tables just because an algorithm changes. (Work tables may still be needed, but that does not affect existing data.)

[The container is worthless without the algorithm. This discussion is about implementing an algorithm, which the RelationalWeenie hasn't even offered up an implementation of yet. You just lost this argument.]

{I lost interest. Maybe I will come back to it another day.}[Is this your cute little way of admitting you're wrong?] {Believe what you want to. Honest, I have no interest in this. It does not relate to actual stuff I see at work.}

Secondly, that implementation used weights from the beginning, so it did not adapt to weights; it had weights already.

I did not mean it that way. If it started out without weights, one only has to add a column to an existing table. This is compared to changing the container altogether (from list to map). The first approach does not require a change to existing data nor any existing algorithm that uses that data (assuming it won't need weights).
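A minimal Python sketch (the record layout and function name are illustrative, not from the original) of the point above: with an edge table, adding weights just adds a column, and existing code that reads only the columns it needs keeps working unchanged.

```python
# Hypothetical sketch: an edge "table" as a list of records (dicts).
# Unweighted version:
edges = [{"fromID": "A", "toID": "B"},
         {"fromID": "B", "toID": "C"}]

def successors(table, node):
    """Existing 'algorithm' code: reads only the columns it needs."""
    return [row["toID"] for row in table if row["fromID"] == node]

# Requirement change: add weights. Each record just gains a column;
# successors() works unchanged because it never looked at the extra field.
edges = [{"fromID": "A", "toID": "B", "weight": 4},
         {"fromID": "B", "toID": "C", "weight": 2}]

print(successors(edges, "A"))  # ['B']

# A dedicated adjacency list, by contrast, changes shape:
#   {"A": ["B"]}       before weights
#   {"A": {"B": 4}}    after weights
# so every loop over graph[node] must be rewritten.
```

The contrast in the trailing comment is the "list to map" container change discussed above.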

Thirdly, the actual working code that was provided adapted pretty well to the addition of weighting. The only thing missing was some kind of layer that provided encapsulation of the underlying data structure behind an interface. Hmm, doesn't that sound like an object?

Or a database. A database should be considered an interface, not an implementation.

Really? Is it Oracle? Is it MySQL? Is it MS-SQL? DB2? Postgres? Jet? A tab-delimited flat file? XML? An interface is typically used to hide this. This kind of algorithm shouldn't concern itself with explicit storage details, like which flavor of SQL syntax will work with the chosen tool, or what the magic incantation is to produce a connection to a data source.

It is not always possible to map and wrap divergent interfaces one-to-one. Wrapping has limits. For one, interface A may assume a feature that implementation X simply does not support. It would be interesting to see your dedicated structure wrappers, by the way. I bet the more generic they become, the more they will resemble a table interface.

True, not every implementation provides every feature. It's a question of priorities: for an app, what's more important, the feature or the flexibility? You can either code to the LowestCommonDenominator, or add CapabilityQuerys? to the code, and hopefully, degrade gracefully.

How do you define 'table interface'? If table interface == SQL, then the answer is no. If table interface == associative array of records, well, that's pretty generic, isn't it?

What do you need for the algorithm? SQL's main strengths are aggregate queries (summations), relational queries (joins), and interesting subsets (complex WHERE). If the algorithm doesn't need those, then those features don't need to be in the interface. Specifically, these kinds of queries aren't likely to be needed by typical graph algorithms. SQL can't do a topological sort.
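To make the "what do you need for the algorithm?" question concrete, here is a hedged sketch of a minimal table interface (the class and method names are my own assumptions): the only operation a typical traversal needs is selecting rows by column values, with no joins, aggregates, or complex WHERE.

```python
# A minimal "table interface" sketch -- an assumption, not a standard API.
class Table:
    def __init__(self, rows=None):
        self.rows = list(rows or [])

    def insert(self, **row):
        self.rows.append(row)

    def select(self, **match):
        """Return rows whose named columns equal the given values."""
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in match.items())]

graph = Table()
graph.insert(fromID="A", toID="B")
graph.insert(fromID="A", toID="C")

# A graph traversal needs only select():
print([r["toID"] for r in graph.select(fromID="A")])  # ['B', 'C']
```

Anything a particular algorithm does not need (SQL's aggregates, joins, subqueries) can simply be left out of the interface.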

Perhaps not, but perhaps there are algorithms where it can do a large part of the work. Like I said many times, SQL and/or relational rarely do the entire job in applications I work on, but rather help out a lot with "pre-processing" kinds of stuff. How to do this with graph algorithms, I don't know at this point. I haven't done any since long ago in school, and am thus rusty.

(please take a look at this topological sort: http://www.danbala.com/python/tsort.py.html and see if there are ways SQL could help.)

{{This is another example of code that would benefit from manipulating graphs via a relational interface. Currently it will only work with a specific representation, using a list of edges [(from, to) pairs], even though there is no reason for the edges to be ordered.}}
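As a hedged illustration of the point just made (this is my sketch, not the code at the URL above): a topological sort can touch the edge relation E only through set/table operations (membership, selection), so it is indifferent to how E is stored or ordered.

```python
# Sketch: Kahn-style topological sort over an *unordered* edge relation.
def tsort(edges):
    """edges: a set of (frm, to) pairs; returns nodes in topological order."""
    nodes = {n for e in edges for n in e}
    remaining = set(edges)
    result = []
    while nodes:
        # select the nodes with no incoming edge in the remaining relation
        ready = {n for n in nodes if not any(t == n for (_, t) in remaining)}
        if not ready:
            raise ValueError("graph has a cycle")
        result.extend(sorted(ready))     # sorted only for deterministic output
        # delete the rows whose fromID was just emitted
        remaining = {(f, t) for (f, t) in remaining if f not in ready}
        nodes -= ready
    return result

print(tsort({("A", "B"), ("A", "C"), ("B", "C")}))  # ['A', 'B', 'C']
```

Because the code never indexes into a list of edges, swapping in a database-backed representation of E would not change the algorithm.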

The point about putting storage behind an interface is that this is something that is likely to change. Putting it behind an interface keeps this change from cascading through your code. For any particular app, if you're _certain_ it's not going to change, you can go ahead and code to a specific implementation - but in the long run, you're probably wrong.

DatabasesAreMoreThanJustStorage. Besides, even with inter-vendor differences, databases are still often better future-proofing than dedicated structures.

Why are databases often better future-proofing than dedicated structures?

How do the additional features of databases apply to graph algorithms?

Relational thinking generally assumes that any given piece of information may be used by different tasks, people, departments, etc. in the future. Thus, it tries not to build data layouts that are coupled to any particular usage. In my domain this has been a smart assumption to make. (I won't speak for other domains, however; it gets me into trouble.) Future uses of a given piece of information may have nothing to do with graph algorithms. For a few hundred nodes I see no real problem with copying some info into maps and lists for processing, but this may not extrapolate to hundreds of millions of nodes.

Database Views can be used to decouple layouts from usage. This point seems to me to be mixing levels of abstraction.

Perhaps "layout" was not the best word to describe what I was trying to say.


[ You do realize that we're talking about encapsulating behaviour behind an interface, not encapsulating data storage. Graphs can be stored in a large number of ways, but the relational model/SQL are not sufficiently expressive to implement the majority of the standard algorithms. The lesson to draw from this is that whilst relational is a good way to store data, it's a poor way to express behaviour. Therefore in algorithmically sophisticated applications (i.e. anything that requires recursion, graphs, deep maths, etc.) databases are relegated to mere data storage.

I'd also like to point out that your usage of the word "interface" is unique. That's not what the rest of the world considers an interface. Please define how a database can be an interface. The rest of the world considers a database to be an implementation. Communicating with people who have unique and idiosyncratic definitions of common words is... challenging. Either they're ignorant of the standard definitions or they're iconoclasts who wish to obscure the shallowness of their ideas behind proprietary jargon. ]

Rather than calling somebody stupid or odd, or resorting to general ArgumentFromAuthority or ArgumentFromVotes, why not establish some clear criteria for the difference between an interface and an implementation? That would be a more sociable and scientific approach.


Re: but the relational model/SQL are not sufficiently expressive to implement the majority of the standard [graph] algorithms.

{{SQL: not interested. The relational model: yes, of course it's sufficiently expressive.}}

"Standard" by whose measure or proclamation? I agree that the existing algorithms in the literature are not very relational-friendly, but that does not mean that such is not possible.

{{The existing algorithms in the literature are defined in terms of relations (or multirelations, a.k.a. tables). A faithful implementation of the RelationalModel would just allow you to express the mathematics more directly, separating the choice of table representation from the expression of the algorithms.}}

I will be happier if you replace "standard" with "existing" or "known".

[You're not seriously claiming that there are algorithms out there that will be better with relational but we don't know them yet? When something is described as lacking TuringCompleteness, they're not just expressing an opinion; they're describing a fundamental limitation. That means there are things you just cannot and will not ''ever'' be able to express with that thing. Don't blame me; that's just life.]

I did not mean the entire algorithm, but rather some kind of pre-processing to simplify the sequential part of the code. I don't know if such exists or not. I have no interest in playing with this problem, for I consider it mostly an academic toy. If you disagree, then so be it.

Mathematical algorithms are already wonders of simplicity; that's what math is all about. Tables don't simplify algorithms because they can't do anything. SQL doesn't simplify algorithms because it's not capable of expressing them. Code, however, is quite easily capable of directly implementing the algorithm, so nothing else is needed, and anything else would only complicate it. SQL is good for getting the necessary data into memory, nothing more.

I disagree. The amount it helps varies widely from algorithm-to-algorithm, but it is quite often helpful.

[If it is often quite helpful as you claim then give examples of algorithms where this 'pre-processing' takes place and where it was helpful.]

See how simple the SQL is in DotProductInManyProgrammingLanguages.
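To make the dot-product point self-contained (a hedged illustration in Python, since the SQL lives on the page named above): a dot product is just a join on the index column followed by an aggregate, which is exactly the kind of work SQL is good at.

```python
# The "tables" are lists of records; the expression below mirrors
#   SELECT SUM(a.val * b.val) FROM a JOIN b ON a.i = b.i
a = [{"i": 0, "val": 1}, {"i": 1, "val": 2}, {"i": 2, "val": 3}]
b = [{"i": 0, "val": 4}, {"i": 1, "val": 5}, {"i": 2, "val": 6}]

dot = sum(ra["val"] * rb["val"]
          for ra in a for rb in b
          if ra["i"] == rb["i"])        # the join condition

print(dot)  # 1*4 + 2*5 + 3*6 = 32
```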


Re: If table interface == associative array of records, well, that's pretty generic, isn't it?

That would not be sufficient because it has only one key or "look-up" criteria.

A set of records could have multiple associative arrays for lookup, and the key for an array could be multiple fields.
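A small Python sketch of the statement above (record layout and index names are illustrative): one set of records, two lookup indexes, one of them keyed on multiple fields.

```python
from collections import defaultdict

records = [{"fromID": "A", "toID": "B", "weight": 4},
           {"fromID": "A", "toID": "C", "weight": 1},
           {"fromID": "B", "toID": "C", "weight": 2}]

by_from = defaultdict(list)   # index on a single field
by_pair = {}                  # index on a compound key (two fields)
for r in records:
    by_from[r["fromID"]].append(r)
    by_pair[(r["fromID"], r["toID"])] = r

print([r["toID"] for r in by_from["A"]])  # ['B', 'C']
print(by_pair[("B", "C")]["weight"])      # 2
```

The records themselves are shared; the associative arrays are just alternative access paths, which is all an "index" is.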


I did once see an implementation of trees in Relational technology, it was quite impressive how he managed to get node insertion and searches implemented without a recursive query. As I recall, he put two numbers by each node. He counted depth-first, incrementing the count going both down and up, and put by each node both the down and up numbers. Thus he could tell with a simple query whether a needed node was below any given node (target node number between the query node numbers), and could insert using a collect and iterate rather than a recurse. Of course, this is a TREE not a GRAPH, but it was still interesting. -- AlistairCockburn

Some dialects of SQL have built-in tree-oriented operations. As far as I know, relational rules do not rule out the addition of tree operations, such as traversals starting from a given node.
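The numbering scheme Alistair describes (often called the nested-set technique) can be sketched in Python; the function names here are my own. A depth-first walk assigns each node a (down, up) pair, and node x is below node y exactly when y.down < x.down < y.up, so descendant tests need no recursion.

```python
def number_tree(children, root):
    """children: dict node -> list of child nodes. Returns {node: (down, up)}."""
    nums, counter = {}, [0]
    def walk(n):
        counter[0] += 1
        down = counter[0]           # count going down...
        for c in children.get(n, []):
            walk(c)
        counter[0] += 1             # ...and again coming back up
        nums[n] = (down, counter[0])
    walk(root)
    return nums

nums = number_tree({"root": ["a", "b"], "a": ["a1"]}, "root")

def is_below(x, y):
    # a simple non-recursive "query": is x a descendant of y?
    return nums[y][0] < nums[x][0] < nums[y][1]

print(is_below("a1", "root"))  # True
print(is_below("b", "a"))      # False
```

Stored as two numeric columns in a node table, the same interval test becomes a plain BETWEEN query.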


By the way, I have a question and don't know whether the answer exists or not, but I'm not in this branch of algorithms.... ShortestPathWithTimeDecay is where I moved the question to. -- AlistairCockburn


Here is a ColdFusionLanguage version of the algorithm that uses RAM tables and lists. I noticed that the Python version does not change the structure that stores the node info, so passing table copies repeatedly as variable/object parameters is not necessary to use essentially the same algorithm. The Python version essentially just queries a "global" (pass by ref) navigational structure, which can just as well be a relational table. (Note that Cold Fusion's internal table engine is a bit buggy because they hastily tried to strap a static-typed addon into a dynamically typed language, but the problems do not show here.)


 <!--- ColdFusion Path-finder version --->

 <cfset graph = queryNew("fromID,toID")>  <!--- create a temp table --->
 <cfset myAddRow('A','B')>
 <cfset myAddRow('A','C')>
 <cfset myAddRow('B','C')>
 <cfset myAddRow('B','D')>
 <cfset myAddRow('C','D')>
 <cfset myAddRow('D','C')>
 <cfset myAddRow('E','F')>
 <cfset myAddRow('F','C')>

 <cfdump var="#graph#">  <!--- to inspect the table --->

 Results:
 <cfoutput><br>A-to-D: #find_path('A', 'D')#</cfoutput>
 <cfoutput><br>D-to-A: #find_path('D', 'A')#</cfoutput> (should be empty)
 <cfoutput><br>B-to-C: #find_path('B', 'C')#</cfoutput>
 <cfoutput><br>E-to-A: #find_path('E', 'A')#</cfoutput> (should be empty)
 <cfoutput><br>E-to-D: #find_path('E', 'D')#</cfoutput>

 <!--- --------- --->
 <cffunction name="find_path">
   <cfargument name="startNode">
   <cfargument name="endNode">
   <cfargument name="pathList" default="">

   <cfset var usePathList = listAppend(pathList, startNode)>
   <cfif startNode is endNode>
     <cfreturn usePathList>
   </cfif>
   <cfquery name="finder" dbtype="query">
     SELECT * FROM graph WHERE fromID = '#startNode#'
   </cfquery>
   <!--- make a list of toID's so that we don't have to keep the query "open" --->
   <cfset endList = valueList(finder.toID)>
   <cfloop list="#endList#" index="iTo">
     <cfif Not listFind(usePathList, iTo)>
       <cfset newPath = find_path(iTo, endNode, usePathList)>
       <cfif listLen(newPath)>
         <cfreturn newPath>
       </cfif>
     </cfif>
   </cfloop>
   <cfreturn ''>
 </cffunction>

 <!--- ----- --->
 <cffunction name="myAddRow">  <!--- add a row --->
   <cfargument name="p_from">
   <cfargument name="p_to">
   <cfset queryAddRow(graph)>
   <cfset querySetCell(graph,'fromID',p_from)>
   <cfset querySetCell(graph,'toID', p_to)>
 </cffunction>

Output

 Results: 
 A-to-D: A,B,C,D 
 D-to-A: (should be empty) 
 B-to-C: B,C 
 E-to-A: (should be empty) 
 E-to-D: E,F,C,D 

Here is an experimental non-recursive, non-list cursor-based FoxPro approach. I used to use similar techniques to traverse trees. Such a technique can be useful if a recursive search would fill up RAM so that one wants to use disk instead. (Note that RAM caching is normally used, so it is not "pure" disk.) Besides, it is easier to stop and re-start such a thing if using tables. This algorithm incrementally fills a work table with path segments, and uses the prior set of segment matches to find the next set and marks them to avoid revisiting them. In a way it emulates a recursive call stack. It does not currently work for invalid paths because it will get stuck in an infinite loop, but could probably be fixed by adding a "unique" check before copying in records from the data table. I couldn't find a way to do that in less than about 12 lines, so skipped it for now. The output could be made cleaner also.

 *********

 set exact on       && typical context settings
 set deleted on
 set talk off
 close data
 set safety off

 do find_path with 'A', 'D'
 do find_path with 'B', 'C'
 do find_path with 'E', 'D'

 ********************************
 procedure find_path
 parameters startNode, endNode

 ? "Looking from " + startNode + " to " + endNode

 select B
 use wrkGraph.dbf alias work
 zap

 curNode = startNode
 reloop = .t.
 foundIt = .f.
 curRec = 0

 select work
 do while reloop
   append from graph.dbf for fromID = curNode
   if .not. eof() .and. curRec <> 0
     temp = curRec + 1
     goto &temp
   endif
   if eof()
     goto top
   endif
   if eof()
     reloop = .f.
   else
     if toID = endNode        && found target
       reloop = .f.
       foundIt = .t.
     else
       if mark <> 'X'
         curNode = toID
         replace mark with 'X'
       endif
     endif
   endif
 enddo

 *--- Mark Nodes for Display

 if foundIt
   goto bottom
   replace mark with 'P'      && mark nodes on Path
   curNode = fromId
   skip -1
   do while .not. bof()       && loop backward
     if toID = curNode
       replace mark with 'P'
       curNode = fromID
     endif
     skip -1
   enddo
 else
   ? "Not Found"
 endif

 *--- Display the result nodes

 list fields fromID, toID for mark='P' off

 return
 *****************end****************

(Table columns for both tables are fromID, toID, mark. But the original data table does not really need the "mark" column.)

Output:

 Looking from A to D

 FROMID TOID
 A      C
 C      D

 Looking from B to C

 FROMID TOID
 B      D
 D      C

 Looking from E to D

 FROMID TOID
 E      F
 F      C
 C      D

- Top

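The work-table technique above can be sketched, hedged, in Python (names are illustrative): an explicit frontier list and a "mark" set replace the recursive call stack, which is the same trade the FoxPro version makes with its work table and mark column.

```python
def reachable(edges, start):
    """edges: set of (frm, to) pairs; returns all nodes reachable from start."""
    work = [start]        # the work table / frontier
    marked = {start}      # the 'mark' column: avoid revisiting
    while work:
        node = work.pop(0)
        for (f, t) in edges:
            if f == node and t not in marked:
                marked.add(t)
                work.append(t)
    return marked

edges = {("A", "B"), ("A", "C"), ("B", "D"), ("D", "C")}
print(sorted(reachable(edges, "A")))  # ['A', 'B', 'C', 'D']
```

Because the mark set prevents revisiting, this version also terminates on cyclic graphs, the case the FoxPro sketch notes it does not yet handle.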


For those interested in seeing other "relational" solutions to graph algorithm problems check out: http://www.cs.ucla.edu/classes/winter06/cs240A/solutions/cities.txt

Closest cities in SQL(99):

 WITH ALLD(city1, city2, Miles) AS (
     SELECT * FROM distance
         UNION
     SELECT D.city1, ALLD.city2, D.Miles + ALLD.Miles
     FROM distance AS D, ALLD
     WHERE D.city2 = ALLD.city1
 )
 SELECT city1, city2, MIN(Miles) FROM ALLD GROUP BY city1, city2

From the discussions above it seems that relational/set theories are formalized and proven methods that haven't or at least hadn't been fully implemented. Whereas Object-Oriented techniques/heuristics don't seem to be based on any formal/proven (or even provable theories), but are being fully implemented. Go figure.

All the solutions presented here exceed the computational complexity of Dijkstra's algorithm. None use a heap to extract the min (sorry, a table is not a heap). Some do heinous cross products to get an adjacency list. The SQL given above will not terminate if there are cycles in the graph. See http://www.thescripts.com/forum/thread184469.html
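For contrast, here is a hedged sketch (my own, not from the thread) of Dijkstra's algorithm with a real heap for extract-min, reading the graph only through an edge table of (fromID, toID, weight) records:

```python
import heapq

def dijkstra(edges, start, end):
    """edges: list of (fromID, toID, weight); returns (cost, path) or None."""
    heap = [(0, start, [start])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)   # extract-min via the heap
        if node == end:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for f, t, w in edges:    # in effect, SELECT ... WHERE fromID = node
            if f == node and t not in done:
                heapq.heappush(heap, (cost + w, t, path + [t]))
    return None

edges = [("A", "B", 1), ("A", "C", 4), ("B", "C", 2), ("C", "D", 1)]
print(dijkstra(edges, "A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```

The linear scan of the edge table per node is the part a proper index (or adjacency list) would speed up; the heap is what keeps the extract-min step cheap either way.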


CategoryExample, CategoryDataStructure, QueryTraversalVersusRecursion


EditText of this page (last edited August 3, 2007) or FindPage with title or text search