Criteria For Good Math Or Compact Notation

Exchanges in OoLacksMathArgument maintain that the RelationalModel has a good "math or compact notation" and that OO does not, and that "all other things being equal, "having a usable 'math' is better than not having one".

This page attempts to establish criteria by which such statements might be measured in a testable and repeatable (by someone else) way. Think of this as a UnitTest for a putative "math or compact notation".

Can we enumerate some stories about this "math or compact notation"?
Can we identify some measurable and testable criteria for "math", "compact notation", "usable", and "good"?
Can we quantify what we mean by "all other things being equal"?
Can we identify what this math has to do with how code is organized?
Can the findings be applied to all domains, or do different domains require different approaches?

A good example of a compact math notation is the div, grad, and curl notation. Compared to the series of equations it replaces, this notation is easier to write, easier to read, and easier to calculate with.

Criterion: A compact notation focuses on what matters by getting rid of boilerplate.

In my opinion, GangOfFour-style OO patterns are boilerplate for the most part. You copy the patterns instead of referencing them.

It has been suggested by the early researchers of relational theory that relational queries are more compact than their competitors. For example, say you have an existing system and your boss walks in and asks for something like, "I want to know the defect rate of all plastic-containing widgets with more than 3 components manufactured in Houston between these dates and shipped to customer X." Relational notation was more compact and required less "setup work" according to them. They collected "query stories" and used them to test various query systems.

Some background: http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Prehisto.html

Excerpt:

: There was this guy TedCodd who had some kind of strange mathematical notation, but nobody took it very seriously. Ray Boyce was hired at about this time, and we kind of got into this game called the Query Game where we were thinking of ways to express complicated queries. But actually before the Query Game started, I had a conversion experience, and I still remember this. Ted Codd came to visit Yorktown, I think it might have been at this symposium that Irv alluded to. He gave a seminar and a lot of us went to listen to him. This was as I say a revelation for me because Codd had a bunch of queries that were fairly complicated queries and since I'd been studying CODASYL, I could imagine how those queries would have been represented in CODASYL by programs that were five pages long that would navigate through this labyrinth of pointers and stuff. Codd would sort of write them down as one-liners. These would be queries like, "Find the employees who earn more than their managers." [laughter] He just whacked them out and you could sort of read them, and they weren't complicated at all, and I said, "Wow." This was kind of a conversion experience for me, that I understood what the relational thing was about after that.

I suppose it could also be argued that not everything is a query, and thus optimizing your design for just queries may make other things more complicated. But for one, the borderline between query and non-query is a bit fuzzy.

Very little of what I do is a query. -- EricHodges

To each their own.

Saying "... it could also be argued that not everything is a query ..." makes it sound like it could be argued that everything is a query. It can't. Lots of things aren't queries and can't be replaced by queries. -- EricHodges

I suppose that depends on how you define query, but for the sake of argument, let's divide things into query and non-query operations. If you use a notation that is based on queries, then if it grows more complicated over time, you can get such capabilities without encountering a big DiscontinuitySpike. If the query-based notation is not significantly larger or more complex for non-query stuff, then there is not enough of a penalty to counteract the benefits of being able to gently slide into query-land when needed. Plus, you are dealing with one notation instead of two (one for queries, and another for non-queries).

What do you mean by "use a notation that is based on queries"? How do you write a FastFourierTransform with a notation based on queries, for instance? How does any of this apply to programming?

I actually once tried to write a wave-to-frequency-value converter using FoxPro. I would pass a kind of simplified sine wave (or was it a square wave?) template of progressively smaller sizes over the wave point data and subtracted each "side" to find a matching factor. I never finished it, however. It was just a playful experiment. But, using tables made the bookkeeping easier with regard to associating which frequency (template size) corresponded to which point in time, recombining them into a graph, etc. I only had to type "use tableX, browse" to inspect the results during debugging and apply ad hoc data filters. Note that I did not say that tables or queries were good for every application or the entire application. Generally, the query does a good chunk of the processing in biz apps, and the program code does the rest by looping through with IF-statements or whatnot. IF-like decision processing is not a current strong point of query languages, I would note. Well, maybe PrologLanguage fans will disagree. (Perhaps the FFT anecdote should be moved to TableOrientedProgramming.)

It depends what kind of matrix algebra operations are required. Matrix multiplication is just 2 lines in SQL:

select A.i, B.j, sum(A.value*B.value) from A, B where A.j=B.J group by A.i, B.j

Wavelets, for example, require only simple matrix operations and, therefore can be written in SQL. Matrix inversion (and determinant), on the other hand, is hard -- Vadim Tropashko

Story 1:

The math/compact notation enables fast ad hoc queries.

Note that I never claimed "fast" as in machine speed. I rarely time them and tinker only when there is a specific speed complaint. -- top

[I think most folks agree that relational tables enable fast ad hoc queries. It sounds like the math/compact notation we're trying to define here is relational tables and queries. -- bottom]

I dunno. Would not RegularExpressions against flat text satisfy this story? It's pretty fast to query, say, all of wiki for something if you know the syntax.

Doesn't it depend on how the data is stored and represented? Also, how do you do things like Joins in regex or add indexes to avoid sequential searches without changing the calls? Perhaps we can add:

Not tied to a specific physical representation

[It doesn't matter if the story can be satisfied by RegularExpressions on flat files. We're not looking for stories that only this math/compact notation can satisfy, just stories that it does satisfy. (And ad hoc queries using regex on flat files won't be as fast as relational queries unless the flat files have relational table indexes.) -- bottom]

Can we clarify difference between only and does? Maybe the wording is just odd. If a regex search on flat files satisfies "fast ad hoc queries" and "not tied to specific physical representation" then it's a valid candidate for inclusion. No specific dependency on flat files is needed, just suck all the data into RAM and search it as one big character sequence. There's no clear definition of "fast" anyway. -- StevenNewton

[Can we add another story that regex doesn't satisfy? I doubt regex is what -top- means by math/compact notation. -- bottom]

Sure, I would like to see that.

RegularExpressions are indeed compact and have uses. For text parsing they can be really great, but I would not want to use them for extensive data manipulation. Personally, I would like to see something like "generators" instead of RegularExpressions. -- top

Candidate Criteria

Consistent, clear rules
Compact
Relatively simple set of base rules
Transformational into self

The "transformational" rule means that it is possible to convert an expression in the notation to another expression of the same language to provide simplicity, normalization, etc. [Why not use "transformable"?] The conversion or substitution algorithm should be "mechanical" in nature such that a machine can be programmed to do it using well-understood or well-accepted algorithms. BooleanExpression example:

  XOR(A,B) = AND(OR(A,B), NOT(AND(A,B)))

(Note: using prefix notation here)

We can transform the XOR pattern to the longer form and vice versa using Boolean Algebra [see BooleanLogic]. Boolean Algebra is a good example of such notation: A limited set of powerful rules with clear-cut operations and very useful. Just about every ProgrammingLanguage includes some form of it.

Simplify NOT(NOT(NOT(A OR B) OR D) OR NOT(NOT(NOT D OR NOT(C OR D)) OR B)).

Now where did I put my transformation software?'

Can't you have a class that gives the best of both worlds? Something like

  class BooleanObject : SymbolicLogic //assume can be expanded to other branches
  {
    BooleanObject Lhs; //empty if Op.type != Binary
    BooleanOperator Op; //and=^,or=v,not=~ etc
    BooleanObject Rhs;
    BooleanRules* Rules; //transformation rules

    BooleanObject(string Exp); //constructor 
    BooleanObject Transform(BooleanObject); //transforms to CanonicalForm using the Rules
    BooleanObject Eval(BooleanObject); //evaluate to 0=false,1=true or other convention ie -1 for true
  };

Example usage:

  BooleanObject o("1^(~0 v 1)");
  cout << Transform(o);
  cout << Eval(Transform(o));

Similar to SymbolicCpp?

Perhaps, but attempts to extrapolate such to RelationalLanguages has not been very pretty. ExpressionApiComplaints.

HaskellDb uses plain HaskellLanguage constructs as a library of RelationalAlgebra combinators, allowing one to access databases using Haskell code (i.e., without constructing "SELECT X FROM Y WHERE Z" strings). The code is concise and can express any relational algebra operation.

What command(s) actually go to the database engine? Or is the implementation done in Haskell? If so, what would it look like if other languages wanted to access the same data?

I'm moving this conversation to HaskellDb since it's kind of OffTopic here, but OnTopic there.

Ironically, OOP is often more compact than relational! OOP is terse - just look at dot notation, or Ruby, the way you can do lots of things with short lines of code. I don't think compact notation equals good math, because OOP is a compact notation, even though oop lacks math. English lacks math too: english probably started out as an informal mumbling and grunting between hairy humans long ago, without any math backing it up. English is not wrong/bad just because it is (or started out) as informal. One of the reasons people use object relational mappers is because of the terse notation which ends up being easier on the eyes than verbose SQL.

It depends heavily on the language and what is being done with it. SQL was not necessarily designed to be compact, and there are verbose OOP languages.

Even verbose languages like Java offer more terse notation and more power for accessing GUI than a relational language (as discussed in TableOrientedGuiDiscussion). In order to setColor of a widget in java, it is just a matter of terse dot notation syntax and calling a method. Whereas with relational you have no methods, so you are left castrated.. when you update the color column in the database, it doesn't do anything, other than change the text in the database - it doesn't run the code like an object does with its terse syntax. OOP combines both data and code whereas database is just data storage. A novel database no one has yet invented, might be one that has side effects on the client machine.. it could launch events like OOP does. Whether it is violating relational, or a good idea, is to be discussed and debated.

Most existing relational languages are not optimized for compact attribute setting because it's generally assumed some kind of CrudScreen or feeder mechanism will be used instead of a query language. Data entry and query writing are generally considered different activities in database-land. As far as GUI-specific issues, I'll leave that to GUI-specific topics. Tools tend to be optimized for their intended purpose and common usage patterns for that intended purpose. It's my opinion that Java is not well suited for queries at this stage, especially when it comes to cross-app-language usage, many-to-many relationships, response time for large data sets, and concurrency handling. Further, databases better blur the distinction between RAM and disk. And DatabasesAreMoreThanJustStorage. -t

Java isn't suited for queries unless you bring in third party tools like SQL, or some other third party tool.