Table Oriented Code Management Discussion

Discussions based on TableOrientedCodeManagement.

Working definitions/abbreviations:

TOCMS - Table-oriented code management system
GPPL - general-purpose programming language

I disagree that a GPPL is up to the task for non-trivial systems, and some kind of database-like tool is needed so that one can sift, sort, filter, group, and re-project their view of the concerns per coding, debugging, or reading need. Textual code is insufficient. (Somewhere we already had this debate IINM.) -top

Your statement assumes that GPPLs necessarily lack such features; that assumption seems unjustifiable. Regardless, I think you grossly overrate the potential benefits of your SeparationAndGroupingAreArchaicConcepts design, and I think the reason you overrate it is that you haven't tried - even on paper - to design a language that would use this design. Therefore, you haven't run into the hairy mess of constructing, identifying, classifying, annotating, and describing views over code in some manner the IDE could leverage, especially when dealing with woven code (e.g. consider ErrorHandling without exceptions). I suspect that anything you can usefully view separately (meaning: you don't need the context of the code to make sense of what it's doing), you can already code separately. Separation and annotation could be made much easier with an integrated IDE and Language, but handling of CrossCuttingConcerns still comes down to the basic language primitives no matter whether you express them in text or some sort of Database+GraphicalProgramming mix.

As far as blurring the line between a GPPL and a database; yes it can be done. I will merely consider it an instance of GreencoddsTenthRuleOfProgramming, which you should have easily forecasted if you knew my position well. And I have used databases to "index" modules, routines, variables, sql, schemas, and various properties of these and found such contraptions quite useful even though they lacked many advanced TOP-code-management possibilities. (Perhaps I should describe some of it's features in another topic.) I've thus tried the Yugo version of TableOrientedCodeManagement and found it had pleasant properties. I'm sure the Ferrari version, if built, would be at least as nice. (And, I believe that exception-based error-handling is overrated.) --top

Your 'index' of existing code divisions is useful (IntegratedDevelopmentEnvironments of all sorts do it) but doesn't say much at all regarding the ability to "sift, sort, filter, group, and re-project their view of the [relevant] concerns per coding, debugging, or reading need". The hairy problem is in constructing, classifying, identifying, and annotating code, and describing views over code, in a manner that allows distinction of concerns relevant to the developer. Until you face that problem you'll have a huge "magic happens here" step buried in your assumptions. With regards to your 'Yugo' analogy: you don't have a Yugo; you have a 'bicycle' version of TOP-code-management. It's nice. It's useful. Bicycles are useful, but they're also human-powered, much like indexing existing code divisions made by humans. You have yet to build an internal combustion engine fit for a Yugo, yet here you are dreaming of a Ferrari...

You have it backward: the tools I've built use primarily automated techniques to classify stuff. You are correct that classification "hints" have to be manually added, where-as a classification system built into a GPPL could enforce some of such. But a "compiler" that's closely tied with a TOP code management system could also check and flag stuff, such as "Routine X has no classification tag". The main "build script" could run the TOCMS along with the usual language compiler (assuming a compiled language) and/or unit test scripts. -t
I have it correct, not backward. IDE's of all sorts automate the indexing itself. The issue is in distinguishing and indexing concerns that aren't ALREADY broken down into separate routines, such that one may usefully "sift, sort, filter, group, and re-project" at the concern granularity rather than code granularity.
The advantage of a TOCMS over existing IntegratedDevelopmentEnvironment's is that one can use off-the-shelf TOP/CRUD tools on them. You're not stuck with the interface that the vendor gives you and don't have to learn screwy API's that reinvent DatabaseVerbs in their idiosyncratic limited way. All you need is a description of the schema.
Okay, so, to be clear, you are disclaiming the ability to "sift, sort, filter, group, and re-project their view of the concerns per coding, debugging, or reading need" as an advantage of TOCMS?
I'm not sure what you are asking. How about looking at Scenario A (below).

Also, I suggested "consider ErrorHandling without exceptions" in the context of "dealing with woven code." Error handling woven into code is simply well known and easy to compare to alternative mechanisms (such as try/catch exceptions and ResumableException). Your off-hand opinion as to the virtue of exception-based error-handling is not useful to the subject.

Another advantage of TOCMS is that you generally don't have to reinvent and/or recompile an entire engine from scratch for each different language. Most IDE's don't handle that very well (unless they reinvent a database-like system internally).

Scenario A: I want a listing of all functions with "fwd" in their name which are called from functions that use Oracle transactions and that are in modules which are members of the "security" category or have "security" in their module name. And I want to export the list to a spreadsheet. In the listing I want the function name, the calling function name, and the file and path name of the calling function.

Still relies on a separations already made by people: "functions", "modules", explicit "security" classifications. Further, such a query does not operate at the 'concern' granularity except insofar as you rely on SeparationOfConcerns (into functions) having already been accomplished. I'll write up a Scenario B for contrast.

Scenario B: I want a listing of all code that processes errors emitted after security violation by code that interacts with an Oracle database. This code is usually tightly interwoven with HappyPath code and Logging calls that I am not concerned with and do not want in my view. For debugging, I want a live view of this code so that any entry causes a breakpoint after a security violation. I would also like the ability to edit the code, e.g. to add some counters.

Notice that Scenario B asks for a view specific to my concerns, as opposed to requiring SeparationOfConcerns already be achieved in the function & module breakdown. The relevant questions are: is this sort of view of concerns feasible? What sort of language and IDE design would make it possible? To what degree could such a view be sensible and reasonably edit-able if code is simply tossed into a view without larger context? Could there be complex expressions (e.g. parameter evaluation with side-effects) in such a language, or would there be a lot of compromises in how things are expressed to better support classification?

What do you mean by "processing errors emitted". Do you mean a log of run-time errors?

No. The clarification was already there: errors emitted after security violation. I did not specify how errors are emitted, therefore I don't care: I want to know regardless of how errors are emitted (e.g. ErrorCode vs. exception vs. process signal). Whether the errors are logged or not would, I suspect, be included in the code that processes errors emitted after security violations, so I'd probably have it in my view excepting that I specifically requested exclusion of logging calls from my view.

The answer depends heavily on how such information is encoded and what kind of access one has to that info. For example, are the related conventions hard-wired into the app language, or are they based on shop conventions? How is "interacts with Oracle" encoded and/or determined? These are all very language and/or shop-specific questions. I couldn't give a general answer without writing lots of possibility-coverage branches. -t

"The answer depends" you say. It is my impression that you do not remember the relevant questions.

I read it again, and it's still as vague and obtuse as my first pass. You have a very odd writing style. You think weird.

Unlike you, I have a memory that goes back more than one paragraph. Tell me: to which question, precisely, were you suggesting "the answer depends heavily?" Don't give me any bullshit: give me a quoted question. If it isn't a question I asked, or if your evasions don't make sense relative to the question - and I am certain that at least one of those two conditions will be true - then you should seriously reconsider which of us "thinks weird".

Your scenario is simply too open-ended on multiple fronts. That's why I suggest sample pseudo-code, preferably using C-style syntax because most readers are familiar with it, your error log layout and location (file, memory, DB), etc. If you give specific details like that, then I can formulate or comment on a specific solution without having to address different open possibility paths about languages and technology used (files, ram, DB, etc). -t

You have claimed that TOCM allows one to "sift, sort, filter, group, and re-project [the programmer's] view of the concerns per coding, debugging, or reading need." Scenario B expresses a specific set of concerns that need to be sifted, sorted, filtered, grouped, and re-projected. You have also made claims about 'compilers' and IDEs that integrate TOCM principles somehow offering an advantages. Scenario B, which doesn't present any particular formulation of code, is open-ended enough that you can decided how to integrate these TOCM concerns into the code. Indeed, that's one of the relevant questions for Scenario B: "What sort of language and IDE design would make it possible?" If you feel a pseudo-code example and C language syntax is appropriate to answer, then provide it. If you feel Scenario B is irrelevant - that the ability to sift, sort, filter, group, and re-project concerns somehow excludes error-handling and logging, or somehow requires knowing the precise layout of the log (file, memory, DB), etc - then I suggest you explain why TOCM has these limitations rather than complaining to me about leaving you ample room to answer.

Besides, you reap what you sow; if you had provided ample pseudo-code for Scenario A, then maybe your desire for it in Scenario B wouldn't strike me as hypocrisy.

Why didn't you finish exploring Scenario A before introducing another scenario?

Scenario A is logically counter-productive. Your argument above started by saying "I disagree that a GPPL is up to the task [of SeparationOfConcerns] for non-trivial systems, and some kind of database-like tool is needed [...]". And yet here you are, presenting Scenario A, which assumes that the GPPL was up to the task of SeparationOfConcerns - in particular, that the security concerns are separated into dedicated functions and modules. When you're done exploring Scenario A, perhaps you can sell your audience some dehydrated water...

Granularity of Classification

"Separated into" is a matter of degree. With enough programming, everything could be integrated into everything such that each token can be isolated and manged independently. I'm being more realistic in not assuming we have unlimited programming at our disposal, and am assuming that code is tracked to the function level. Now, a function may be tracked has HAVING security-related code in it (belongs to a "security" category, among others), and that level is good enough for most needs. Thus, the security concerns are "separated" (separate-able) up to the function level of granularity. If your claim is that the utility of such a system is worse than what exists now without token-level-granularity, then please clarify such. Further note that the function-granularity is a design decision based on practicality, not something inherently limited by the usage of TOP/RDBMS to manage code. I believe function-level to be a practical balance between tool simplicity and power roughly comparable to page-level physical book indexing having been "good enough" for most publications. -t

Okay, so you're effectively recanting what you stated earlier. You are depending on the GPPL be "up to the task" of SeparationOfConcerns up to the 'function level'. Your use of TOCM is an IDE feature that allows you to 'tag' 'named' functions with a set of named concerns - i.e. to help organize and manage code. I'm not saying this is "worse than what exists now", only that it's far from what you have been suggesting TOCM supports.

What specific statement are you referring to?
That TOCM allows one to "sift, sort, filter, group, and re-project [the programmer's] view of the concerns per coding, debugging, or reading need." In particular, you implied that TOCM offers these features where the GPPL is not "up to the task"... i.e. that TOCM has some magic to support independent 'viewing' and 'filtering' concerns that are not separate in source code. Your position on SeparationAndGroupingAreArchaicConcepts makes a similar assertion.
I'm still not clear on what you are getting at.
It still baffles me how you're confused as to what I'm getting at. This entire discussion stemmed from an objection to those words you stated earlier that you are now contradicting. You never should have suggested TOCM as an alternative to achieving SOC in the GPPL.

By the way, there are many levels of granularity between token-level and function-level. You could also work with anonymous functions, blocks, conditional cases, statements, expressions, sub-expressions.

There are reasons to disfavor function-level granularity:

It is often useful to export things other than functions such as types, value constants, and syntax.
- If they are designated as re-usable, they could be managed as separate "include files" if we simply don't make the function-level mandatory (or as a dummy "main" function/module). The "includness" can be at the language level or by the code manager independent of the language.
- It would be smoother if TOCM simply accounted for these things normally. Modern LanguageDesign is often concerned quite directly with these sorts of modularity issues.
- I'd lean toward not tainting the tool with language-specific complexity if existing features handle it sufficiently. If I see specific examples of "failures" for a language or shop conventions, then I may consider other options. Perhaps a "type" can be treated the same way as a class is. The conceptual difference and possibly the syntactic difference is minor in many languages.
- Are you imagining TOCM being entirely independent of the language, using HotComments or something? Because when I hear you discuss TOCM, I imagine that the goal of TOCM is to have the Database as the primary code repository... perhaps a temporal database for versioning. Programmers are presented with editable projections. In this ultimate vision, there are no "separate include files". Where would you include them?
It is often useful to hide functions used internally ('helper' functions) to a module to avoid accidental coupling against implementation details that are likely to change.
- Example? I thought we're already at the function level? "Hiding" is a matter of filtering views.
- To 'hide' a function isn't merely to not view it, but to prevent coupling (i.e. so that other modules cannot link to it). It mostly matters when developing shared libraries. Avoid coupling to helper-classes so you don't have BackwardsCompatibility nightmares a few years down the line when you want to change implementation details. boost::detail namespace is used for this purpose, and is open source, so feel free to rake it for examples.
It doesn't really cover FirstClass constructs (FirstClassFunctions, for example, don't really fit into your schema).
- True, but it's not a technique used often in my domain. Nothing will make every domain happy and I don't intend to try to make everybody 100% happy.
- So long as you're clear on that.
It doesn't touch on DependencyInjection, PolicyInjection, ConfigurationManagement, and related concerns (this is a limitation/weakness of function-level binding).
- A very small percent of the world cares about your pet security system. I know you think it's important, but the audience doesn't.
- DependencyInjection, PolicyInjection, and ConfigurationManagement relate only very peripherally to security. I'm not sure why you bring 'security' up at all. That said, if people - as you say - don't care about security, then how do you account for $2.46 billion spent on anti-virus software, the repeated advertisements regarding identity theft, massive DigitalRightsManagement efforts, the popularity of SSH over Telnet, the explosive popularity of HTTPs, the use of pseudonyms like 'TopMind', and the common use of VirtualPrivateNetwork for telecommuting? People care about security. They also care how it is achieved: they want security without cost to flexibility, reuse, performance, or various limbs.
- I didn't say people "didn't care about security". You are putting words in my mouth. I meant people don't care about your particular security/type-checking system.
- And how would that be related to anything I said earlier? Sure, I agree that they don't care about my particular implementation for security (which is largely independent of TypeSafety (DynamicTyping is sufficient, as exemplified by EeLanguage)). What they do care about are features offered by the brand of security I promote (which is called ObjectCapabilityModel/CapabilitySecurityModel): flexibility, reuse, performance, modularity, simplicity, sharing, multi-organization support. There may be other SecurityModels that offer the same features. But, still, how is security related to DependencyInjection, PolicyInjection, or ConfigurationManagement? The words coming from your mouth made no sense in this context, so I was reasonably hard-pressed to guess your meaning.

I favor the FirstClass 'module' level granularity used in such varied languages as ML, Oz, Maude, and Newspeak. And even the SecondClass 'module' level seen in Erlang has the implementation-hiding advantages. You can make the function-level work, though I wouldn't judge it 'GoodEnough' for shared-library development.

Automated Concern Recognition

As far as how the TOCMS "knows" a function is security-related, again this depends on the language and/or shop conventions. If all security-related concerns have to use a certain API, then code parsing may be able to automatically identify security-related functions. How "perfect" this is depends on how perfect we want to build our parser. I've used similar techniques to identify SQL code to "mark" routines/modules as having SQL with roughly 98% accuracy using API conventions and sometimes SQL itself. I could have increased the accuracy with enough twiddling of the recognizer algorithm, but felt it not worth the trade-off. 90% was a big improvement over the roughly 70% doing it the old-fashioned way, when joined with other concerns. And faster. -t

Considering that one of my questions was "what sort of language and IDE design makes it possible", saying "this depends on the language and/or shop conventions" looks like hedging. But, based on what you said so far: the programming language must have 'named functions' (i.e. functions that may be distinguished by some sort of key or external identifier). The IDE must offer the ability to assign attributes to this identifier such that shops and individual libraries may each have their own conventions. Finally, you favor supporting 'pluggable' recognizers that can regenerate assignment of certain attributes. Would this be a correct summary?

I won't commit to "must", but I'll agree to that for now, pending a strong reason to deviate. And the custom library recognition is more a feature/benefit, not a limit. My observation is that the need for such is common.

Deriving from the above: greater support for SeparationOfConcerns is better, to maximize the degree to which different functions handle different concerns. One wishes to avoid 'white-washing' where nearly every function gets tagged with a CrossCuttingConcern (logging, error-handling, concurrency management, etc.).

Maybe that pattern indicates a design smell, such as under-use of SeparateIoFromCalculation. I couldn't say without seeing actual code. But more importantly, the utility of a TOCMS often comes from cross-referencing generic concerns with specific concerns, like the intersection between a specific table, a specific function/module call/reference, and a general concern.

Additionally, for software with a performance risk (PerformanceMatters often when writing SystemsSoftware and certain classes of competitive software (like games and browsers)), the language must optimize/inline well such that SeparationOfConcerns isn't synonymous with 'performance penalty'. Neither of these would be necessary if TOCM delivered on your original statements - TOCM as an alternative to language-level SeparationOfConcerns, SeparationAndGroupingAreArchaicConcepts, etc.

Well, I don't have a lot of experience in performance-centric domains, so I won't comment on that. Again, I don't claim to optimize for every possible domain. Maybe a special TOCMS kit fork is needed for some.

Alternative C from TableOrientedCodeManagement would be more conducive to sub-function granularity. However, going below that level, it may get dicey to edit the function as a unit. -t

Is the originator of TableOrientedCodeManagement aware that a body of source code is already a database in the ideal conceptual representation for code analysis and management, and that any syntax-aware IDE, editor, or other grammar-driven mechanism can trivially parse that source code and support any number of user views and queries of any kind?

A tabular representation may, for some purposes, be appropriate for result of some query on the source code, but I'm hard pressed to see why the code should normally be represented that way. A tabular structure seems, at the very least, to be inflexible, awkward, and suboptimal for the sort of tasks typically performed on source code, and for the sort of queries one might wish to ask of it.

I haven't seen any code editors that offer flexible query techniques and also allow custom multiple categorization. I'm sure such can be built, but so far they don't exist or are specific to one language and still limited. And, why reinvent the database from scratch to do it? Most of the existing tools are focused on syntax and code-completion, not bigger-structure analysis and management. I agree though it may be practical to split up language-specific and syntax issues from the bigger-structure management system so that they can be mixed and matched as needed so that 1,000 wheels don't have to be reinvented for 1,000 app languages. It's a matter of agreeing where one starts and one ends and the common interface between them.

Such code editors don't exist (or are very obscure) because there hasn't been sufficient motivation to build them. However, various lints (see http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis) arguably fit that category. In 30+ years of writing code, I can count on one hand the times I've wanted a complex "code query" facility. I suppose if I had one I'd use it more -- I can appreciate its value -- but that still isn't enough to motivate me to build one.

I'm not sure what you mean about "reinvent the database from scratch to do it". Why would I have to do that? It sounds like a lot of work to create a database schema, write a parser to dump source code into the database, then write useful code-oriented relational queries against the schema, plus (presumably) provide some way of reconstructing linear source code from the database. It sounds like relatively little work to create a parser, write a 'code algebra' to work with the parser, and write an appropriate language to construct 'code algebra' queries. Of course, constructing such a tool presumes the ability to write parsers, design algebras, and construct query languages. If you know nowt of such things, perhaps using an existing DBMS system might be a (clunky) alternative.

Why would it be a lot of work to write SQL queries to an existing schema? Plus, there's existing tools and conventions such as QueryByExample that make query writing mostly a matter of dragging and clicking. And what would this "code algebra" look like? Why should someone have to learn such when most developers worth any salt already know SQL?

The sort of queries one would probably want to issue against a codebase may or may not be easy to write depending on the structure of the schema, but intuitively I suspect they're not going to be easy to write.

For example, how would you write a SQL query to retrieve all lines of code that are three lines after a line that invokes any two-parameter log() method in any class inherited from either the Spooler class or the Logger class but not the FileLogger class or any of its subclasses? Using a hypothetical made-up-on-the-fly code algebra, I might write something like:

 lineOffset(3,
    invokes("log(?, ?)", 
            classtree("Spooler", $) + classtree("Logger", $) - classtree("FileLogger", $)))

Each named operator accepts two parameters -- a specification of a subset of a collection of source lines, and a collection of lines.

Each operator returns a collection of lines. Note that a "collection of lines" is necessarily conceptual, and not simply an actual collection of lines statically extracted from the source. For example, 'classtree' can't simply return (say) the lines in a class definition and all its derivatives, because invocations of the log() method could be found outside that hierarchy.

'+' is a binary operator that returns a collection of lines consisting of the union of its operands.

'-' is a binary operator that returns the collection of lines in the first operand minus those in the second operand.

'$' represents the sourcebase itself.

Why learn it? Because it is targetted to making code inquiries, and therefore (if properly designed) should be simpler than the equivalent SQL for querying code.

I am not assuming here the that the TOCMS actually parses individual statements into relational form. For one, that'd probably limit the kit to usage for one particular language. I'd prefer a two-phase approach where the DB-centric side narrows down the scope of routines to study further with a language-specific parser and/or a RegEx-like tool. That way each focuses on what it does best, and is still usable for different languages. This is a common technique with TOP: use the query to narrow down a result set which is further analyzed with imperative logic and/or other tools/techniques. This allows the DB to do the heavy-lifting, and then things like RegEx are the tweezers for the fine work. Further, "3-lines after" is not a very realistic/common kind of search because you'd miss one if there happened to be an extra line for some unforeseen purpose. Note that a method/routine cally/caller cross-reference table may be available to eliminate those that don't contain your "log()" call. -t

And, I agree that for a well-written code-base, such tools may be more trouble than their worth to set up. However, when a BigBallOfMud is dumped in one's lap, than code analysis becomes much more useful. I've only built TOCMS for BigBallOfMud's.

CategorySourceManagement