Table Oriented Code Management Discussion

Discussions based on TableOrientedCodeManagement.

Moved from SeparationOfConcerns:

Working definitions/abbreviations:

I disagree that a GPPL is up to the task for non-trivial systems, and some kind of database-like tool is needed so that one can sift, sort, filter, group, and re-project their view of the concerns per coding, debugging, or reading need. Textual code is insufficient. (Somewhere we already had this debate IINM.) -top

Your statement assumes that GPPLs necessarily lack such features; that assumption seems unjustifiable. Regardless, I think you grossly overrate the potential benefits of your SeparationAndGroupingAreArchaicConcepts design, and I think the reason you overrate it is that you haven't tried - even on paper - to design a language that would use this design. Therefore, you haven't run into the hairy mess of constructing, identifying, classifying, annotating, and describing views over code in some manner the IDE could leverage, especially when dealing with woven code (e.g. consider ErrorHandling without exceptions). I suspect that anything you can usefully view separately (meaning: you don't need the context of the code to make sense of what it's doing), you can already code separately. Separation and annotation could be made much easier with an integrated IDE and Language, but handling of CrossCuttingConcerns still comes down to the basic language primitives no matter whether you express them in text or some sort of Database+GraphicalProgramming mix.

As far as blurring the line between a GPPL and a database; yes it can be done. I will merely consider it an instance of GreencoddsTenthRuleOfProgramming, which you should have easily forecasted if you knew my position well. And I have used databases to "index" modules, routines, variables, sql, schemas, and various properties of these and found such contraptions quite useful even though they lacked many advanced TOP-code-management possibilities. (Perhaps I should describe some of it's features in another topic.) I've thus tried the Yugo version of TableOrientedCodeManagement and found it had pleasant properties. I'm sure the Ferrari version, if built, would be at least as nice. (And, I believe that exception-based error-handling is overrated.) --top

Your 'index' of existing code divisions is useful (IntegratedDevelopmentEnvironments of all sorts do it) but doesn't say much at all regarding the ability to "sift, sort, filter, group, and re-project their view of the [relevant] concerns per coding, debugging, or reading need". The hairy problem is in constructing, classifying, identifying, and annotating code, and describing views over code, in a manner that allows distinction of concerns relevant to the developer. Until you face that problem you'll have a huge "magic happens here" step buried in your assumptions. With regards to your 'Yugo' analogy: you don't have a Yugo; you have a 'bicycle' version of TOP-code-management. It's nice. It's useful. Bicycles are useful, but they're also human-powered, much like indexing existing code divisions made by humans. You have yet to build an internal combustion engine fit for a Yugo, yet here you are dreaming of a Ferrari...

Also, I suggested "consider ErrorHandling without exceptions" in the context of "dealing with woven code." Error handling woven into code is simply well known and easy to compare to alternative mechanisms (such as try/catch exceptions and ResumableException). Your off-hand opinion as to the virtue of exception-based error-handling is not useful to the subject.

Another advantage of TOCMS is that you generally don't have to reinvent and/or recompile an entire engine from scratch for each different language. Most IDE's don't handle that very well (unless they reinvent a database-like system internally).

Scenario A: I want a listing of all functions with "fwd" in their name which are called from functions that use Oracle transactions and that are in modules which are members of the "security" category or have "security" in their module name. And I want to export the list to a spreadsheet. In the listing I want the function name, the calling function name, and the file and path name of the calling function.

Still relies on a separations already made by people: "functions", "modules", explicit "security" classifications. Further, such a query does not operate at the 'concern' granularity except insofar as you rely on SeparationOfConcerns (into functions) having already been accomplished. I'll write up a Scenario B for contrast.

Scenario B: I want a listing of all code that processes errors emitted after security violation by code that interacts with an Oracle database. This code is usually tightly interwoven with HappyPath code and Logging calls that I am not concerned with and do not want in my view. For debugging, I want a live view of this code so that any entry causes a breakpoint after a security violation. I would also like the ability to edit the code, e.g. to add some counters.

Notice that Scenario B asks for a view specific to my concerns, as opposed to requiring SeparationOfConcerns already be achieved in the function & module breakdown. The relevant questions are: is this sort of view of concerns feasible? What sort of language and IDE design would make it possible? To what degree could such a view be sensible and reasonably edit-able if code is simply tossed into a view without larger context? Could there be complex expressions (e.g. parameter evaluation with side-effects) in such a language, or would there be a lot of compromises in how things are expressed to better support classification?

What do you mean by "processing errors emitted". Do you mean a log of run-time errors?

No. The clarification was already there: errors emitted after security violation. I did not specify how errors are emitted, therefore I don't care: I want to know regardless of how errors are emitted (e.g. ErrorCode vs. exception vs. process signal). Whether the errors are logged or not would, I suspect, be included in the code that processes errors emitted after security violations, so I'd probably have it in my view excepting that I specifically requested exclusion of logging calls from my view.

The answer depends heavily on how such information is encoded and what kind of access one has to that info. For example, are the related conventions hard-wired into the app language, or are they based on shop conventions? How is "interacts with Oracle" encoded and/or determined? These are all very language and/or shop-specific questions. I couldn't give a general answer without writing lots of possibility-coverage branches. -t

"The answer depends" you say. It is my impression that you do not remember the relevant questions.

I read it again, and it's still as vague and obtuse as my first pass. You have a very odd writing style. You think weird.

Unlike you, I have a memory that goes back more than one paragraph. Tell me: to which question, precisely, were you suggesting "the answer depends heavily?" Don't give me any bullshit: give me a quoted question. If it isn't a question I asked, or if your evasions don't make sense relative to the question - and I am certain that at least one of those two conditions will be true - then you should seriously reconsider which of us "thinks weird".

Your scenario is simply too open-ended on multiple fronts. That's why I suggest sample pseudo-code, preferably using C-style syntax because most readers are familiar with it, your error log layout and location (file, memory, DB), etc. If you give specific details like that, then I can formulate or comment on a specific solution without having to address different open possibility paths about languages and technology used (files, ram, DB, etc). -t

You have claimed that TOCM allows one to "sift, sort, filter, group, and re-project [the programmer's] view of the concerns per coding, debugging, or reading need." Scenario B expresses a specific set of concerns that need to be sifted, sorted, filtered, grouped, and re-projected. You have also made claims about 'compilers' and IDEs that integrate TOCM principles somehow offering an advantages. Scenario B, which doesn't present any particular formulation of code, is open-ended enough that you can decided how to integrate these TOCM concerns into the code. Indeed, that's one of the relevant questions for Scenario B: "What sort of language and IDE design would make it possible?" If you feel a pseudo-code example and C language syntax is appropriate to answer, then provide it. If you feel Scenario B is irrelevant - that the ability to sift, sort, filter, group, and re-project concerns somehow excludes error-handling and logging, or somehow requires knowing the precise layout of the log (file, memory, DB), etc - then I suggest you explain why TOCM has these limitations rather than complaining to me about leaving you ample room to answer.

Besides, you reap what you sow; if you had provided ample pseudo-code for Scenario A, then maybe your desire for it in Scenario B wouldn't strike me as hypocrisy.

Why didn't you finish exploring Scenario A before introducing another scenario?

Scenario A is logically counter-productive. Your argument above started by saying "I disagree that a GPPL is up to the task [of SeparationOfConcerns] for non-trivial systems, and some kind of database-like tool is needed [...]". And yet here you are, presenting Scenario A, which assumes that the GPPL was up to the task of SeparationOfConcerns - in particular, that the security concerns are separated into dedicated functions and modules. When you're done exploring Scenario A, perhaps you can sell your audience some dehydrated water...

Granularity of Classification

"Separated into" is a matter of degree. With enough programming, everything could be integrated into everything such that each token can be isolated and manged independently. I'm being more realistic in not assuming we have unlimited programming at our disposal, and am assuming that code is tracked to the function level. Now, a function may be tracked has HAVING security-related code in it (belongs to a "security" category, among others), and that level is good enough for most needs. Thus, the security concerns are "separated" (separate-able) up to the function level of granularity. If your claim is that the utility of such a system is worse than what exists now without token-level-granularity, then please clarify such. Further note that the function-granularity is a design decision based on practicality, not something inherently limited by the usage of TOP/RDBMS to manage code. I believe function-level to be a practical balance between tool simplicity and power roughly comparable to page-level physical book indexing having been "good enough" for most publications. -t

Okay, so you're effectively recanting what you stated earlier. You are depending on the GPPL be "up to the task" of SeparationOfConcerns up to the 'function level'. Your use of TOCM is an IDE feature that allows you to 'tag' 'named' functions with a set of named concerns - i.e. to help organize and manage code. I'm not saying this is "worse than what exists now", only that it's far from what you have been suggesting TOCM supports.

By the way, there are many levels of granularity between token-level and function-level. You could also work with anonymous functions, blocks, conditional cases, statements, expressions, sub-expressions.

There are reasons to disfavor function-level granularity:

I favor the FirstClass 'module' level granularity used in such varied languages as ML, Oz, Maude, and Newspeak. And even the SecondClass 'module' level seen in Erlang has the implementation-hiding advantages. You can make the function-level work, though I wouldn't judge it 'GoodEnough' for shared-library development.

Automated Concern Recognition

As far as how the TOCMS "knows" a function is security-related, again this depends on the language and/or shop conventions. If all security-related concerns have to use a certain API, then code parsing may be able to automatically identify security-related functions. How "perfect" this is depends on how perfect we want to build our parser. I've used similar techniques to identify SQL code to "mark" routines/modules as having SQL with roughly 98% accuracy using API conventions and sometimes SQL itself. I could have increased the accuracy with enough twiddling of the recognizer algorithm, but felt it not worth the trade-off. 90% was a big improvement over the roughly 70% doing it the old-fashioned way, when joined with other concerns. And faster. -t

Considering that one of my questions was "what sort of language and IDE design makes it possible", saying "this depends on the language and/or shop conventions" looks like hedging. But, based on what you said so far: the programming language must have 'named functions' (i.e. functions that may be distinguished by some sort of key or external identifier). The IDE must offer the ability to assign attributes to this identifier such that shops and individual libraries may each have their own conventions. Finally, you favor supporting 'pluggable' recognizers that can regenerate assignment of certain attributes. Would this be a correct summary?

I won't commit to "must", but I'll agree to that for now, pending a strong reason to deviate. And the custom library recognition is more a feature/benefit, not a limit. My observation is that the need for such is common.

Deriving from the above: greater support for SeparationOfConcerns is better, to maximize the degree to which different functions handle different concerns. One wishes to avoid 'white-washing' where nearly every function gets tagged with a CrossCuttingConcern (logging, error-handling, concurrency management, etc.).

Maybe that pattern indicates a design smell, such as under-use of SeparateIoFromCalculation. I couldn't say without seeing actual code. But more importantly, the utility of a TOCMS often comes from cross-referencing generic concerns with specific concerns, like the intersection between a specific table, a specific function/module call/reference, and a general concern.

Additionally, for software with a performance risk (PerformanceMatters often when writing SystemsSoftware and certain classes of competitive software (like games and browsers)), the language must optimize/inline well such that SeparationOfConcerns isn't synonymous with 'performance penalty'. Neither of these would be necessary if TOCM delivered on your original statements - TOCM as an alternative to language-level SeparationOfConcerns, SeparationAndGroupingAreArchaicConcepts, etc.

Well, I don't have a lot of experience in performance-centric domains, so I won't comment on that. Again, I don't claim to optimize for every possible domain. Maybe a special TOCMS kit fork is needed for some.

Alternative C from TableOrientedCodeManagement would be more conducive to sub-function granularity. However, going below that level, it may get dicey to edit the function as a unit. -t

Is the originator of TableOrientedCodeManagement aware that a body of source code is already a database in the ideal conceptual representation for code analysis and management, and that any syntax-aware IDE, editor, or other grammar-driven mechanism can trivially parse that source code and support any number of user views and queries of any kind?

A tabular representation may, for some purposes, be appropriate for result of some query on the source code, but I'm hard pressed to see why the code should normally be represented that way. A tabular structure seems, at the very least, to be inflexible, awkward, and suboptimal for the sort of tasks typically performed on source code, and for the sort of queries one might wish to ask of it.

I haven't seen any code editors that offer flexible query techniques and also allow custom multiple categorization. I'm sure such can be built, but so far they don't exist or are specific to one language and still limited. And, why reinvent the database from scratch to do it? Most of the existing tools are focused on syntax and code-completion, not bigger-structure analysis and management. I agree though it may be practical to split up language-specific and syntax issues from the bigger-structure management system so that they can be mixed and matched as needed so that 1,000 wheels don't have to be reinvented for 1,000 app languages. It's a matter of agreeing where one starts and one ends and the common interface between them.

Such code editors don't exist (or are very obscure) because there hasn't been sufficient motivation to build them. However, various lints (see arguably fit that category. In 30+ years of writing code, I can count on one hand the times I've wanted a complex "code query" facility. I suppose if I had one I'd use it more -- I can appreciate its value -- but that still isn't enough to motivate me to build one.

I'm not sure what you mean about "reinvent the database from scratch to do it". Why would I have to do that? It sounds like a lot of work to create a database schema, write a parser to dump source code into the database, then write useful code-oriented relational queries against the schema, plus (presumably) provide some way of reconstructing linear source code from the database. It sounds like relatively little work to create a parser, write a 'code algebra' to work with the parser, and write an appropriate language to construct 'code algebra' queries. Of course, constructing such a tool presumes the ability to write parsers, design algebras, and construct query languages. If you know nowt of such things, perhaps using an existing DBMS system might be a (clunky) alternative.

Why would it be a lot of work to write SQL queries to an existing schema? Plus, there's existing tools and conventions such as QueryByExample that make query writing mostly a matter of dragging and clicking. And what would this "code algebra" look like? Why should someone have to learn such when most developers worth any salt already know SQL?

The sort of queries one would probably want to issue against a codebase may or may not be easy to write depending on the structure of the schema, but intuitively I suspect they're not going to be easy to write.

For example, how would you write a SQL query to retrieve all lines of code that are three lines after a line that invokes any two-parameter log() method in any class inherited from either the Spooler class or the Logger class but not the FileLogger class or any of its subclasses? Using a hypothetical made-up-on-the-fly code algebra, I might write something like:

    invokes("log(?, ?)", 
            classtree("Spooler", $) + classtree("Logger", $) - classtree("FileLogger", $)))
Each named operator accepts two parameters -- a specification of a subset of a collection of source lines, and a collection of lines.

Each operator returns a collection of lines. Note that a "collection of lines" is necessarily conceptual, and not simply an actual collection of lines statically extracted from the source. For example, 'classtree' can't simply return (say) the lines in a class definition and all its derivatives, because invocations of the log() method could be found outside that hierarchy.

'+' is a binary operator that returns a collection of lines consisting of the union of its operands.

'-' is a binary operator that returns the collection of lines in the first operand minus those in the second operand.

'$' represents the sourcebase itself.

Why learn it? Because it is targetted to making code inquiries, and therefore (if properly designed) should be simpler than the equivalent SQL for querying code.

And, I agree that for a well-written code-base, such tools may be more trouble than their worth to set up. However, when a BigBallOfMud is dumped in one's lap, than code analysis becomes much more useful. I've only built TOCMS for BigBallOfMud's.


EditText of this page (last edited July 14, 2010) or FindPage with title or text search