Dynamic Languages And Large Apps

Re TypefulProgramming: "But I still can't see where would dynamic typing prove really useful in a standard business application scenario (which has, say, one million lines of code). Agreed that dynamic typing will be useful when the code base is small and only one or two developers are required to maintain a code."

Why is it that fans of compile-time typing keep suggesting that dynamic languages are not good for "large applications"? First, we need to define "large applications" because dynamic language proponents tend to have different division-of-work techniques (SystemSizeMetrics). The One Big EXE approach that compiloids tend to use is a smell in my opinion. The trick is to split large "applications" into small, relatively independent tasks or events that communicate mostly through a database or state management mechanism. (In languages like SmallTalk, the ST environment becomes the de-facto "database".) That way the developer only has to know the data schemas to "plug in" to the system instead of tons of behavioral interfaces. (However, I do realize that some of you like behavior-centric interfaces for reasons that escape me. Perhaps it is one of the grand personal preferences that we are all just born with.) -- top

A LargeApplication is one where the total volume of code--however packaged--is large. (I won't specify any SystemSizeMetrics here). And for many LargeApplications, dynamic languages work fine.

Total volume per what? Per department? Per company? Where to draw the lines is the point of contention. For example, is Yahoo one huge application or many smaller ones? Yahoo is several large applications--the totality of Yahoo can be sensibily divided into different subsections--a search engine, a bulliten board, a news service, an online store, etc.--but none of those is a small or trivial piece of software. Most of Yahoo is I/O bound, so DynamicLanguages are a good choice. But an application composed of numerous small subsets is still a LargeApplication

I am still do not see a clear-cut definition that gives boundaries for "application". I can agree that "online store" is an application, but not see them all as a one big mega application.

There are also no clear-cut definitions that give boundaries for "large". I'm certain we can agree that both of these things are somewhat relative. One might loosely describe an application as a collective of FunctionalRequirements collectively united or constrained by at least one or two NonFunctionalRequirements, but, though this definition does make useful distinctions between many applications, even it wouldn't draw precise lines.

There are some categories of LargeApplication that would be difficult to implement well in a DynamicLanguage, however, due to strict performance constraints. A RDBMS engine, for example, is something that practically requires being compiled to machine code, and having the hell optimized out of it. Many such optimizations are made possible by StaticTyping and other techniques for HelpingTheCompiler. Of course, I imagine that many RDBMS engines AlternateHardAndSoftLayers--implementing the core I/O stuff in C/C++ (this layer gets down and dirty with the OS), implementing other parts (query optimizers and such) in higher-level languages which are better at that sort of data processing than are C/C++.

{Well, I agree that "system tools" are probably best implemented in a compiled language.}

I'm curious - where does TypeInference fit in the DynamicLanguage spectrum? And has anyone tried to implement a performance-critical RDBMS in ObjectiveCaml? It seems like that language is almost tailor-made for things like that. -- JonathanTang

Any company that targets performance-critical system tools is probably going to pick the "safe" choice of C or C++. Only a lone entrepreneur would probably ever try that with ObjectiveCaml.

[A lone entrepreneur, or the members of the Commercial Users of Functional Programming. See http://cufp.org/]

This page seems to talk more about the definition of a large application rather than about its original topic. What I can say from my personal experience is that static typing saved my day several times because:

most of the time the big application does not have any MicroArchitecture : once some knows what an exe is supposed to do, this exe will most of the time be a huge .c file of several thousands lines of code. This .c file will have ten or twenty global variables.
Global variables are not a necessity. Wrap them in functions, put them in the database, or put them into a global dictionary array instead. Global arrays are not any more evil than global classes.
[And what's that supposed to change? Global vars and a global dictionary are equivalent modulo a trivial syntax transformation, therefore both are equally evil.]
I would like to see specific semi-realistic scenarios. This smells like it's turning into a classic setter/getter HolyWar (which even OO'ers take different sides on. See AccessorsAreEvil).
I agree that static typing can "catch problems". It has benefits, but at the expense of other things. There are already plenty of existing topics on those.
there is not a single UnitTest. Talk me about LegacyCode :) [context?]

This is not a nice example of architectural masterpiece, but this is my reality. -- PhilippeDetournay

I'm not sure I see what a large, monolithic, unstructured hunk of spaghetti-code has to do with dynamic languages. I'll venture a guess or two, though -- if you must change it (and sooner or later you will), then you will inevitably face the task of breaking this monster apart into more manageable pieces. The static nature of CeeLanguage, and the morass of entanglements that surely follows as a result, makes this task tedious and difficult. Oh, and by the way, here's a recipe for attacking the global variables:

Choose one of the globals.
Define a getter and setter with a similar name.
Grep and replace references to the global with calls to the appropriate getter or setter.
Iterate until gone.
... ? ...
Then "much of the refactoring starts to get much easier."

Once you've got them expressed and used as methods, much of the refactoring starts to get much easier.

So you replace a bunch of global variables with global getters and setters. I'm supposing you're planning to use the layer of behavioral indirection for something useful, but I'm still not clear on how switching to getters and setters makes much of the refactoring much easier.

What makes you believe that the "morass of entanglements" "surely follows as a result" from "the static nature of CeeLanguage"? I've seen my share of ExtremelyInterstrangled products written in many languages, from CeeLanguage and VisualBasic to Perl and JavaScript. In my own observations, it seems more often the complexity of the problem domain and the difficulty meeting certain NonFunctionalRequirements (speed/caching, memory space, etc.) that results in such entanglements.